I have a piece of assembly code written in NASM syntax. It is vectorized code, so we want to test it on a Xeon Phi. I apologize if my question sounds naive; this is my first day with a Xeon Phi device. My question is:
How do I test this code? The first problem is that NASM/YASM does not seem to support Xeon Phi yet. Rewriting the code in C would be difficult for me, because the algorithm was designed around specific instructions and was never described in C.
I think the Intel compiler can recognize a .s file as assembly, but its syntax is different from NASM's. Are the syntax and format of such a file the same for Xeon Phi as for a regular CPU? I know the instructions themselves have changed.
Another question: my current code does intensive vector operations (it is written with VEX-encoded AVX/AVX2 instructions). I read somewhere that on Xeon Phi a single thread can issue a vector operation only every other cycle, so running multiple threads per core is recommended, but this raises a question for me. Say my code uses all 32 available zmm registers. The code was designed with one thread per core in mind, as it would be on a CPU, so how values are kept in registers seems like a problem to me. If the algorithm uses 32 zmm registers and I run 4 threads per core, then every thread would need its own 32 zmm registers. I guess this is not possible? So I am just curious...
My questions might sound naive, and I apologize again if they are. I will try to ask more sophisticated questions next time, when I know more about the device.
Thanks a lot!