The best method for inter-processor data communication

Hello,

I measured memory-copy performance on a NUMA machine with four Xeon(R) CPU E5-4620 processors. When I copy data within local memory, I get almost 10GB/s. However, when I copy data from remote memory, performance is much worse, only around 1GB/s. I use memcpy() to copy the data, and each copy is one page (4KB).

I wonder whether Intel processors provide special instructions for inter-processor data movement. I know Intel uses QPI for inter-processor communication. Does it expose an interface to programmers? Is the performance above the best I can get?

Thanks,
Da 

>>...I wonder if Intel processors provides special instructions for inter-processor data movement...

Please take a look at the Instruction Set Reference located at: www.intel.com/content/www/us/en/processors/architectures-software-develo...

>>...I get much worse performance, only around 1GB/s...

Access to local memory is always faster, but a 10x drop in performance is significant. Is the penalty really that large when accessing remote memory?

The best option is to disassemble the memcpy() function and look at its machine code implementation. The rep prefix combined with the movsd instruction is used to copy memory in large quantities.
I think that inter-processor communication at the lowest level is managed by the hardware itself. I do not know whether any programming interface is exposed to the programmer to manage and control inter-processor communication.
Regarding documentation, I would recommend reading the Intel chipset documentation, which probably contains some information about inter-processor communication.

It may be worthwhile to check your memcpy() version and your data alignments. At one time, the __intel_fast_memcpy() substitution made by Intel compilers could be a great help. Any up-to-date memcpy() ought to take advantage of SIMD nontemporal instructions in cases where alignment is compatible. rep movsd should be used only where alignment requires it. The memcpy() supplied by early 64-bit Linux distros was extremely poor. You might want to experiment with 16-, 32-, and 64-byte alignments for both source and destination.
corei7-2 CPUs were supposed to be designed to improve the performance of rep mov loops such as 32-bit gcc might create, but there would still be an advantage in setting alignment so as to use SIMD instructions. Some past CPUs performed poorly with rep mov loops.
In connection with iliyapolak's remark, it would be interesting to use a profiler such as VTune or oprofile to show which instructions are actually used in your slow case.
I'm not sure what causes might be suspect for a slowdown such as you quote on that platform; more than a 2x penalty for remote memory would be disappointing. You should check whether the RAM is compatible and properly distributed among the slots.
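
To make the alignment and nontemporal-store suggestion concrete, here is a minimal sketch of a copy loop built on SSE2 streaming stores. The helper names (is_aligned, copy_nontemporal) are hypothetical and not taken from any library, and the sketch assumes 16-byte-aligned buffers whose length is a multiple of 16. Whether streaming stores actually help when the destination is on a remote node is something to measure, not something this sketch promises.

```c
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is aligned to `align` bytes (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Hypothetical helper: copy n bytes using non-temporal 16-byte stores,
 * which write around the cache instead of filling it.
 * Requires dst and src to be 16-byte aligned and n to be a multiple of 16;
 * returns -1 without copying anything if those assumptions do not hold.
 */
static int copy_nontemporal(void *dst, const void *src, size_t n)
{
    if (!is_aligned(dst, 16) || !is_aligned(src, 16) || (n % 16) != 0)
        return -1;

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; i++)
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));

    _mm_sfence();  /* order the streaming stores before later stores */
    return 0;
}
```

Calling copy_nontemporal(dst, src, 4096) on two page-aligned buffers would copy one page; trying different buffer alignments (16, 32, 64 bytes) with the same loop is one way to run the experiment suggested above.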

Sorry for a question not related to the subject.

>>...NUMA machine with 4 Xeon(R) CPU E5-4620 processors...

Are these NUMA computers expensive? How much would the cheapest computer that supports a NUMA architecture cost? Thanks in advance.

Note: I'm asking because I couldn't find an answer on the web.

>>...NUMA machine with 4 Xeon(R) CPU E5-4620 processors...>>>
This one does not have an Intel-based chipset, but you can calculate the price of the motherboard and the CPUs.
Please follow this link: http://www.tyan.com/product_SKU_spec.aspx?ProductType=MB&pid=670&SKU=600...
And this link: http://www.pcsuperstore.com/products/11113480-Tyan-S8812WGM3NR.html

For Intel-based chipset motherboards, please follow these links: http://www.supermicro.com/products/motherboard/Xeon/C600/X9QR7-TF.cfm
http://www.alvio.com/xABK_PID1237628_supermicro-computer_mbd-x9qr7-tf-o_...

A whole system can easily reach a price of $2500-3000.
For complete solutions, follow these links:
www.supermicro.com/xeon_mp/
http://www.alvio.com/xABK_PID1237628_supermicro-computer_mbd-x9qr7-tf-o_...

>>>You should check whether the RAM is compatible and properly distributed among the slots.>>>
The lower hardware layer, in the form of the chipset's memory controller or the on-die memory controller, should also be considered as a possible cause of the poor memory performance.
Regarding the internal implementation of memcpy(), it is highly probable that the compiler wrongly used rep movsb instead of the rep movsd instruction.

Thank you for all your suggestions.

I started by checking the assembly code of a small test program:
#include <string.h>

char src[4096];
char dest[4096];

int main()
{
    memcpy(dest, src, sizeof(dest));
}

gcc compiles the code into the following assembly code:
0000000000000000 :
0: bf 00 00 00 00 mov $0x0,%edi
5: be 00 00 00 00 mov $0x0,%esi
a: b9 00 02 00 00 mov $0x200,%ecx
f: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
12: c3 retq
The code is pretty straightforward and as expected.
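
For reference, the count loaded into %ecx is 0x200 = 512 quadwords, and 512 x 8 bytes = 4096 bytes, which matches sizeof(dest). A minimal sketch of the same copy written by hand with GCC/Clang extended inline assembly is below; copy_rep_movsq is a hypothetical name, the code is x86-64 only, and it assumes the direction flag is clear, as the System V ABI requires on function entry.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy `nqwords` 8-byte words with "rep movsq".
 * RDI = destination, RSI = source, RCX = count; all three are modified
 * by the instruction, hence the "+" (read-write) constraints.
 */
static void copy_rep_movsq(void *dst, const void *src, size_t nqwords)
{
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(nqwords)
                     :
                     : "memory");
}
```

Calling copy_rep_movsq(dest, src, sizeof(dest) / 8) would reproduce what gcc generated for the test program, which makes it easy to time the string instruction in isolation.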

The Intel compiler compiles it into:
00000000004005c0 :
4005c0: 55 push %rbp
4005c1: 48 89 e5 mov %rsp,%rbp
4005c4: 48 83 e4 80 and $0xffffffffffffff80,%rsp
4005c8: 48 81 ec 80 00 00 00 sub $0x80,%rsp
4005cf: bf 03 00 00 00 mov $0x3,%edi
4005d4: e8 c7 00 00 00 callq 4006a0 <__intel_new_proc_init>
4005d9: 0f ae 1c 24 stmxcsr (%rsp)
4005dd: bf 00 9b 60 00 mov $0x609b00,%edi
4005e2: be 00 ab 60 00 mov $0x60ab00,%esi
4005e7: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
4005ee: ba 00 10 00 00 mov $0x1000,%edx
4005f3: 0f ae 14 24 ldmxcsr (%rsp)
4005f7: e8 54 00 00 00 callq 400650 <_intel_fast_memcpy>
4005fc: 33 c0 xor %eax,%eax
4005fe: 48 89 ec mov %rbp,%rsp
400601: 5d pop %rbp
400602: c3 retq
400603: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400608: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40060f: 00
So the Intel compiler uses _intel_fast_memcpy.

From the performance perspective, the executable compiled by the Intel compiler isn't any faster than the one compiled by gcc. I tried using VTune to profile the compiled code, and it shows me that _intel_fast_memcpy uses most of the time, but it doesn't show me which instructions in _intel_fast_memcpy are time-consuming.

I know this question is more related to topics in other sections: how do I profile code in an external library? As I said, I can't see the instructions in _intel_fast_memcpy. I tried to link the C library into my program statically, but then I got this error:
$ amplxe-cl -collect hotspots ./rand-memcpy 1 8
Error: Binary file of the analysis target does not contain symbols required for profiling. See documentation for more details.
Error: Valid pthread_setcancelstate symbol is not found in the static binary of the analysis target.
So how do I profile _intel_fast_memcpy?

Thanks,
Da

I guess you're looking at 32-bit gcc, which appears to optimize for short or non-aligned copies. If you use icc -static-intel, with long enough copies, you should be able to collect data to view __intel_fast_memcpy in assembly view. It might be interesting to see whether the results change with alignment.

@zhengda1936

Please follow this call instruction 4005f7: e8 54 00 00 00 callq 400650 <_intel_fast_memcpy>
It would be interesting to see the exact machine code implementation of that function.

Guys,

1. Please take a look / read the original post again because I really don't understand these continued pushes to disassemble / debug a memcpy function currently used in his tests.
2. The user is on a NUMA system and this is a "different world" ( I don't have access to any such system at the moment )

The user clearly described that:
...
When I copy data in the local memory, I can get up to almost 10GB/s. However, when I copy data !!! from !!! remote memory, I get much worse performance, only around 1GB/s
...

He uses the same memcpy function in both cases and possibly experiences some hardware issue ( I can be wrong here ), and that has to be taken into account. When he reads data from the remote memory, it looks like he simply switches the Source and Destination pointers in the same memcpy function.

A question to zhengda1936,

Could you post C/C++ source codes of your test-case, please?

Quote:

TimP (Intel) wrote:

I guess you're looking at 32-bit gcc, which appears to optimize for short or non-aligned copies. If you use icc -static-intel, with long enough copies, you should be able to collect data to view __intel_fast_memcpy in assembly view. It might be interesting to see whether the results change with alignment.


No, I run my program in a 64-bit Linux, and I have switched to the Intel compiler.

I have tried -static-intel, but it doesn't seem to link the Intel library statically. With or without -static-intel, the compiled executable is exactly the same (I ran cmp on the two executables). If I add -static as a linker option, I get the same error when I run the executable under VTune.

BTW, I copy 1GB of memory and the memory is aligned to the page size.

Da

If it helps, I post my code here. Basically, it allocates 1GB of memory aligned with a page size in a NUMA node, and tries to copy memory to a specified NUMA node.

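#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <string.h>
#include <numa.h>
#include <numaif.h>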

#define NUM_THREADS 64
#define PAGE_SIZE 4096
#define ENTRY_SIZE PAGE_SIZE
#define ARRAY_SIZE 1073741824

off_t *offset;
unsigned int nentries;
int nthreads;
struct timeval global_start;
char *array;
char *dst_arr;

void permute_offset(off_t *offset, int num)
{
    int i;
    for (i = num - 1; i >= 1; i--) {
        int j = random() % i;
        off_t tmp = offset[j];
        offset[j] = offset[i];
        offset[i] = tmp;
    }
}

float time_diff(struct timeval time1, struct timeval time2)
{
    return time2.tv_sec - time1.tv_sec
        + ((float)(time2.tv_usec - time1.tv_usec))/1000000;
}

/* thread entry point: pthread_create() expects a function returning void *. */
void *rand_read(void *arg)
{
    int fd;
    ssize_t ret;
    int i, j, start_i, end_i;
    ssize_t read_bytes = 0;
    struct timeval start_time, end_time;

    start_i = (long) arg;
    end_i = start_i + nentries / nthreads;
    gettimeofday(&start_time, NULL);
    for (j = 0; j < 8; j++) {
        for (i = start_i; i < end_i; i++) {
            memcpy(dst_arr + offset[i], array + offset[i], ENTRY_SIZE);
            read_bytes += ENTRY_SIZE;
        }
    }
    gettimeofday(&end_time, NULL);
    printf("read %ld bytes, start at %f seconds, takes %f seconds\n",
            read_bytes, time_diff(global_start, start_time),
            time_diff(start_time, end_time));

    pthread_exit((void *) read_bytes);
}

int main(int argc, char *argv[])
{
    int ret;
    int i;
    struct timeval start_time, end_time;
    ssize_t read_bytes = 0;
    pthread_t threads[NUM_THREADS];
    /* the number of entries the array can contain. */
    int node;

    if (argc != 3) {
        fprintf(stderr, "read node_id num_threads\n");
        exit(1);
    }

    nentries = ARRAY_SIZE / ENTRY_SIZE;
    node = atoi(argv[1]);
    offset = valloc(sizeof(*offset) * nentries);
    for (i = 0; i < nentries; i++) {
        offset[i] = ((off_t) i) * ENTRY_SIZE;
    }
    permute_offset(offset, nentries);

#if 0
    int ncpus = numa_num_configured_cpus();
    printf("there are %d cores in the machine\n", ncpus);
    for (i = 0; i < ncpus; i++) {
        printf("cpu %d belongs to node %d\n",
                i, numa_node_of_cpu(i));
    }
#endif
    /* bind to node 0. */
    nodemask_t nodemask;
    nodemask_zero(&nodemask);
    nodemask_set_compat(&nodemask, 0);
    unsigned long maxnode = NUMA_NUM_NODES;
    if (set_mempolicy(MPOL_BIND,
                (unsigned long *) &nodemask, maxnode) < 0) {
        perror("set_mempolicy");
        exit(1);
    }
    printf("run on node 0\n");
    if (numa_run_on_node(0) < 0) {
        perror("numa_run_on_node");
        exit(1);
    }

    array = valloc(ARRAY_SIZE);
    /* we need to avoid the cost of page fault. */
    for (i = 0; i < ARRAY_SIZE; i += PAGE_SIZE)
        array[i] = 0;
    dst_arr = valloc(ARRAY_SIZE);
    /* we need to avoid the cost of page fault. */
    for (i = 0; i < ARRAY_SIZE; i += PAGE_SIZE)
        dst_arr[i] = 0;

    printf("run on node %d\n", node);
    if (numa_run_on_node(node) < 0) {
        perror("numa_run_on_node");
        exit(1);
    }

    nthreads = atoi(argv[2]);
    if (nthreads > NUM_THREADS) {
        fprintf(stderr, "too many threads\n");
        exit(1);
    }

    ret = setpriority(PRIO_PROCESS, getpid(), -20);
    if (ret < 0) {
        perror("setpriority");
        exit(1);
    }

    gettimeofday(&start_time, NULL);
    global_start = start_time;
    for (i = 0; i < nthreads; i++) {
        ret = pthread_create(&threads[i], NULL,
                rand_read, (void *) (long) (nentries / nthreads * i));
        if (ret) {
            perror("pthread_create");
            exit(1);
        }
    }

    for (i = 0; i < nthreads; i++) {
        ssize_t size;
        ret = pthread_join(threads[i], (void **) &size);
        if (ret) {
            perror("pthread_join");
            exit(1);
        }
        read_bytes += size;
    }
    gettimeofday(&end_time, NULL);
    printf("read %ld bytes, takes %f seconds\n",
            read_bytes, end_time.tv_sec - start_time.tv_sec
            + ((float)(end_time.tv_usec - start_time.tv_usec))/1000000);
}

>>>The user is on a NUMA system and this is a "different world" ( I don't have access to any such system at the moment )>>>

It would be very helpful if @zhengda1936 could post his hardware configuration. I'm sure that he has a quad-CPU motherboard, probably manufactured by TYAN or SuperMicro.

>>> Please take a look / read the original post again because I really don't understand these continued pushes to disassemble / debug a memcpy function currently used in his tests>>>

I think that disassembling the memcpy() function and revealing its exact machine code implementation could give us some insight into what is going on under the hood. As I stated earlier in my post, there is a possibility that the 'rep movsb' instruction is used by the compiler.
I do not exclude the possibility of some hardware-related issue.

>>>When I copy data in the local memory, I can get up to almost 10GB/s. However, when I copy data !!! from !!! remote memory, I get much worse performance, only around 1GB/s>>>

It is obvious that in a NUMA architecture one can expect memory transfer speed to degrade when, for example, CPU 0 accesses memory that is non-local (remote) from its "point of view".

Sure, I can provide the hardware configuration. Here is the CPU info:
description: CPU
product: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
vendor: Intel Corp.
physical id: 400
bus info: cpu@0
version: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
slot: CPU1
size: 2200MHz
capacity: 3600MHz
width: 64 bits
clock: 2905MHz

So each CPU has 8 cores and there are 4 CPUs in the machine.

Memory info:
description: DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
product: M393B2G70BH0-YH9
vendor: 00CE00B300CE
physical id: 0
serial: 342F3D9C
slot: DIMM_A1
size: 16GiB
width: 64 bits
clock: 1333MHz (0.8ns)

What other hardware configuration info should I provide to help you diagnose this?

>>>So each CPU has 8 cores and there are 4 CPUs in the machine.>>>

Do you mean 8 threads/4 cores per CPU?
Have you experienced earlier memory speed degradation?

As I show above, gcc compiled memcpy into "rep movsq", and Intel compiler invokes __intel_fast_memcpy.
I used perf to profile my program, and it seems it eventually invokes __intel_ssse3_rep_memcpy and the most time-consuming instructions are:
0.06 : 405657: movaps -0x10(%rsi),%xmm1
38.23 : 40565b: movaps %xmm1,-0x10(%rdi)
1.00 : 40565f: movaps -0x20(%rsi),%xmm2
0.06 : 405663: movaps %xmm2,-0x20(%rdi)
0.18 : 405667: movaps -0x30(%rsi),%xmm3
0.06 : 40566b: movaps %xmm3,-0x30(%rdi)
0.05 : 40566f: movaps -0x40(%rsi),%xmm4
0.01 : 405673: movaps %xmm4,-0x40(%rdi)
0.10 : 405677: movaps -0x50(%rsi),%xmm5
41.82 : 40567b: movaps %xmm5,-0x50(%rdi)
0.47 : 40567f: movaps -0x60(%rsi),%xmm5
0.03 : 405683: movaps %xmm5,-0x60(%rdi)
0.06 : 405687: movaps -0x70(%rsi),%xmm5
0.01 : 40568b: movaps %xmm5,-0x70(%rdi)
0.04 : 40568f: movaps -0x80(%rsi),%xmm5
0.01 : 405693: movaps %xmm5,-0x80(%rdi)
It seems one data copy triggers moving 64 bytes to the remote node, so only the first data copy consumes most CPU time. I thought one data copy would trigger moving 128 bytes to a remote node (since the cache line is 128 bytes).
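
The two hot stores in the listing are at -0x10(%rdi) and -0x50(%rdi), which are 0x40 = 64 bytes apart, so a stall every 64 bytes is consistent with a 64-byte cache line rather than a 128-byte one. As far as I know, x86 processors of this generation use 64-byte lines. A minimal sketch for checking the line size on the machine itself is below; l1d_line_size is a hypothetical helper, and _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension to sysconf(), not part of POSIX, so the sketch returns -1 where it is unavailable.

```c
#include <unistd.h>

/*
 * Hypothetical helper: returns the L1 data cache line size in bytes,
 * or -1 if the C library cannot report it.
 */
static long l1d_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    return n > 0 ? n : -1;   /* some systems report 0 for "unknown" */
#else
    return -1;
#endif
}
```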

Quote:

iliyapolak wrote:

>>>So each CPU has 8 cores and there are 4 CPUs in the machine.>>>

Do you mean 8 threads/4 cores per CPU?
Have you experienced earlier memory speed degradation?

No, 16 threads/8 cores per CPU.
What do you mean by earlier memory speed degradation?
The local memory copy speed is expected.

Hi Da,

It is always the right thing to follow a top-down approach when investigating a problem. That is:

- Source codes ->
- Analysis ->
- Is there a hardware problem?
- Are there any logical errors in the codes? ->
- Could I reproduce a problem? ->
- Could I simplify the test-case? ->
- Could I remove some dependencies on 3rd party software components ->
- Why does my application crash ( if this is the case )? ->
- What else could be wrong with my codes?
- Etc.

It means that if a C/C++ developer tries to investigate the opposite way, that is, following a bottom-up approach ( disassembling first, all the rest later ), a significant amount of project time could be wasted.

From my point of view a Summary of the problem could look like:

- Possible logical problem with the test-case ( very high possibility )
- Possible oversubscription of the processing threads ( high possibility )
- Possible hardware issue with the NUMA system ( very low possibility )
- Possible problem with CRT memcpy function ( low possibility )

A simplified test-case is needed without changing priorities of any threads or a process and ideally it would be nice to have just one thread of normal priority. This is needed to verify that NUMA system doesn't have any hardware issues.

A logic for the simplified test-case could look like:

- one thread test application
- allocate a memory block in a 'local' memory
- copy some data ( some number of times to get an average time )
- invalidate cache lines somehow
- read some data ( some number of times to get an average time )
- save performance numbers
- allocate a memory block in a 'remote' memory
- copy some data ( some number of times to get an average time )
- invalidate cache lines somehow
- read some data ( some number of times to get an average time )
- save performance numbers
- compare results
- repeat the test with more threads ( increase by 2 every time ) until it reaches 64
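
The single-threaded part of the steps above could be sketched roughly as follows. This is only a sketch under stated assumptions: alloc_touched and copy_bandwidth are hypothetical helper names, the buffers should be much larger than the last-level cache so that cached data does not dominate (a stand-in for the "invalidate cache lines" step), and node placement is assumed to be controlled from outside the program, for example by launching it under numactl with its --cpunodebind and --membind options.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE 4096

/*
 * Hypothetical helper: allocate a page-aligned buffer and touch every page,
 * so that page faults (and first-touch placement) happen before timing.
 */
static char *alloc_touched(size_t bytes)
{
    void *p = NULL;
    if (posix_memalign(&p, PAGE, bytes) != 0)
        return NULL;
    memset(p, 0, bytes);
    return p;
}

/*
 * Hypothetical helper: copy the buffer in page-sized memcpy() calls,
 * `reps` times, and return the average bandwidth in bytes per second.
 * Only whole pages are copied and counted.
 */
static double copy_bandwidth(char *dst, const char *src, size_t bytes, int reps)
{
    struct timespec t0, t1;
    size_t pages = bytes / PAGE;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        for (size_t p = 0; p < pages; p++)
            memcpy(dst + p * PAGE, src + p * PAGE, PAGE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)pages * PAGE * reps / secs : 0.0;
}
```

Running the same binary once with CPU and memory on the same node and once with them on different nodes gives the 'local' and 'remote' numbers to compare, with no priority changes and no extra threads involved.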

1. After a very quick code review of the test-case I noticed that a priority of the executing process is changed:
...
setpriority( PRIO_PROCESS, getpid(), -20 );
...
Why do you change the priority of the process?

2. In order to clear up any uncertainties with the 'memcpy' function, I recommend replacing it with an external pure C function ( a couple of minutes to implement, right? )

3. A Virtual Memory Manager ( VMM ) on any OS should have 'Above Normal' or 'High' priority. If processing thread(s) in some test have higher priorities then VMM will be preempted most of the time and any memory operations using 'mem'-like CRT functions will be affected. Also, there will be a performance degradation of the whole operating system. If processing thread(s) have lower priorities, like 'Below Normal' or 'Idle', then they will be preempted most of the time and performance of the test will be affected.

4. A brief high-level overview of the test-case will also help

Best regards,
Sergey
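
A possible shape for the pure C replacement mentioned in point 2 is sketched below. plain_copy is a hypothetical name, and the sketch assumes both pointers are 8-byte aligned and the buffers do not overlap. One caveat worth checking in the disassembly: an optimizing compiler may recognize such a loop and turn it back into a memcpy call, so keeping it in a separate source file built with lower optimization may be necessary.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical plain C replacement for memcpy(): copies 8 bytes at a time,
 * then any remaining tail bytes one by one.
 * Assumes dst and src are 8-byte aligned and do not overlap.
 */
void plain_copy(void *dst, const void *src, size_t n)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    size_t words = n / sizeof(uint64_t);

    for (size_t i = 0; i < words; i++)
        d[i] = s[i];

    unsigned char *db = (unsigned char *)(d + words);
    const unsigned char *sb = (const unsigned char *)(s + words);
    for (size_t i = 0; i < n % sizeof(uint64_t); i++)
        db[i] = sb[i];
}
```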

>>>>...The local memory copy speed is expected...

>>...
>>2. In order to clear any uncertanties with the 'memcpy' function I recommend to replace it with an external pure C function ( a couple
>>of minutes to implement, right? )
>>...

I would verify it first.

>>...It seems one data copy triggers moving 64 bytes to the remote node, so only the first data copy consumes most CPU time. I thought
>>one data copy would trigger moving 128 bytes to a remote node (since the cache line is 128 bytes)...

But your actual performance degradation is ~10 times and that is significant.

Note: I see a call to:
...
valloc( sizeof( *offset ) * nentries );
...
Is it similar to 'calloc'? Is it a NUMA specific function?

>>>It seems one data copy triggers moving 64 bytes to the remote node, so only the first data copy consumes most CPU time>>>
Yes, the most time-consuming instructions each start the transfer of a 64-byte block. What is the content of the rdi register? Is it the address of the remote memory destination?

>>2. In order to clear any uncertanties with the 'memcpy' function I recommend to replace it with an external pure C function ( a couple
>>of minutes to implement, right? )

It would be interesting to know whether your memcpy function is inlined and whether any non-temporal cache hints are used inside it.
Can you post the full disassembly of the memcpy function?

>>>What do you mean by earlier memory speed degradation?>>>

Have you, perhaps during your other tests, previously seen memory read/write speed degrade when remote memory was accessed?

>>>The local memory copy speed is expected.>>>

Yes I agree with you completely.
In the case of remote memory access, some transfer speed penalty is also expected; the question is why you see a 10x slowdown, which is more than expected.
Have you tested remote accesses from all of your nodes? What are the NUMA distances between your nodes? Have you tried accessing the furthest and the nearest possible node?
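
On Linux, the NUMA distances asked about here can be read from sysfs: each /sys/devices/system/node/nodeN/distance file holds one line of space-separated integers, one per node. A minimal sketch for reading them is below; parse_distances and read_node_distances are hypothetical helper names. The same table is also printed by numactl --hardware.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a whitespace-separated list of integers
 * into out[]. Returns the number of values stored (at most max).
 */
static int parse_distances(const char *line, int *out, int max)
{
    int n = 0;
    while (n < max) {
        char *end;
        long v = strtol(line, &end, 10);
        if (end == line)      /* no more numbers on the line */
            break;
        out[n++] = (int)v;
        line = end;
    }
    return n;
}

/*
 * Hypothetical helper: read the distances from `node` to every node.
 * Returns the number of distances read, or -1 if the sysfs file is missing.
 */
static int read_node_distances(int node, int *out, int max)
{
    char path[64];
    char buf[256];

    snprintf(path, sizeof path,
             "/sys/devices/system/node/node%d/distance", node);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok != NULL ? parse_distances(buf, out, max) : -1;
}
```

On a four-socket machine one would expect four values per node, with the smallest value for the node itself; nodes that are two QPI hops apart would show a larger distance than directly connected ones.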

>>>1. After a very quick code review of the test-case I noticed that a priority of the executing process is changed:
...
setpriority( PRIO_PROCESS, getpid(), -20 );
...
Why do you change the priority of the process?>>>

Good observation Sergey.
He should not try to change the process priority, because of the possibility of the process being scheduled to run on another node, in which case all memory accesses would be remote from the NUMA perspective.

>>> A Virtual Memory Manager ( VMM ) on any OS should have 'Above Normal' or 'High' priority. >>>

The Windows memory manager runs its code at DISPATCH_LEVEL, while normal thread activity occurs at PASSIVE_LEVEL; only when a thread is scheduled to run a so-called kernel APC does it run at APC_LEVEL.

Hi Da,

I see that #include directives are not shown completely in your test case. Could you attach the source codes as external c or cpp file?

Thanks in advance.

OK, I guess it was my mistake to give you the impression that remote memory copy is 10 times slower.
In my program, I allocate the source array and the destination array on node 0, and run the memory copy code on node 0, 1, 2, or 3.
If I run the memory copy on node 0, it takes 0.85s. If I run the code on node 1 or 3, it takes 5.74s. On node 2, it takes 7s. So the slowdown is 6.75x and 8.24x, respectively.
Note that here both the source array and the destination array are in remote memory. If I place only the source array in remote memory (on node 0), it takes 2.69s to run the code on node 1. If I place only the destination array in remote memory, it takes 3.27s. That is, writing to remote memory is slower than reading.
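
The timings above can be turned into slowdown factors and bandwidths with a little arithmetic. The sketch below assumes each timing covers the whole run of the posted test, that is, 8 passes over the 1073741824-byte array; the helper names are hypothetical.

```c
#include <stdio.h>

#define PASSES      8
#define ARRAY_BYTES 1073741824.0   /* ARRAY_SIZE in the posted test */

/* Bytes per second for a run that copied the whole array PASSES times. */
static double bandwidth(double seconds)
{
    return PASSES * ARRAY_BYTES / seconds;
}

/* Slowdown of a run relative to the local (node 0) run. */
static double slowdown(double seconds, double local_seconds)
{
    return seconds / local_seconds;
}

/* Print one line per configuration reported in the thread. */
void print_summary(void)
{
    const double local = 0.85;
    const double secs[] = { 0.85, 5.74, 7.0, 2.69, 3.27 };
    const char *label[] = {
        "node 0 (local)", "node 1 or 3, both remote", "node 2, both remote",
        "node 1, source remote only", "node 1, destination remote only"
    };

    for (int i = 0; i < 5; i++)
        printf("%-32s %5.2fs  %5.2fx  %6.2f GB/s\n", label[i], secs[i],
               slowdown(secs[i], local), bandwidth(secs[i]) / 1e9);
}
```

Under that assumption the local run works out to roughly 10.1 GB/s, which agrees with the "almost 10GB/s" in the first post, and the fully remote runs come to about 1.5 GB/s (node 1 or 3) and 1.2 GB/s (node 2).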

I get the same performance with or without setting the priority. I thought I could get more stable results by giving the process a higher priority.

As I showed before, gcc inlines memcpy, while the Intel compiler invokes __intel_fast_memcpy and eventually ends up in __intel_ssse3_rep_memcpy. __intel_ssse3_rep_memcpy is a very large function. I don't think posting its assembly code can help. What do you expect to see from there?

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <string.h>
#include <numa.h>
#include <numaif.h>

>>>don't think posting its assembly code can help. What do you expect to see from there?>>>

I would like to check for the existence of cache hint instructions.

>>>I get the same performance with or without setting priority. I thought I could get more stable result by setting the process with a higher priority.>>>

Can you set the affinity to a specific node?
It seems that execution of your process is being distributed across various nodes.

My program first runs on node 0, and then it runs on a node specified by the user. So I do set affinity and it is controlled by the user.

0000000000405410 <__intel_ssse3_rep_memcpy>:
405410: 48 89 f8 mov %rdi,%rax
405413: 48 81 fa 90 00 00 00 cmp $0x90,%rdx
40541a: 73 34 jae 405450 <__intel_ssse3_rep_memcpy+0x40>
40541c: 40 38 fe cmp %dil,%sil
40541f: 76 19 jbe 40543a <__intel_ssse3_rep_memcpy+0x2a>
405421: 48 01 d6 add %rdx,%rsi
405424: 48 01 d7 add %rdx,%rdi
405427: 4c 8d 1d 4a 3c 00 00 lea 0x3c4a(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
40542e: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405432: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405436: ff e2 jmpq *%rdx
405438: 0f 0b ud2
40543a: 4c 8d 1d f7 39 00 00 lea 0x39f7(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
405441: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405445: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405449: ff e2 jmpq *%rdx
40544b: 0f 0b ud2
40544d: 0f 1f 00 nopl (%rax)
405450: 40 38 fe cmp %dil,%sil
405453: 7e 5b jle 4054b0 <__intel_ssse3_rep_memcpy+0xa0>
405455: f3 0f 6f 06 movdqu (%rsi),%xmm0
405459: 49 89 f8 mov %rdi,%r8
40545c: 48 83 e7 f0 and $0xfffffffffffffff0,%rdi
405460: 48 83 c7 10 add $0x10,%rdi
405464: 49 89 f9 mov %rdi,%r9
405467: 4d 29 c1 sub %r8,%r9
40546a: 4c 29 ca sub %r9,%rdx
40546d: 4c 01 ce add %r9,%rsi
405470: 49 89 f1 mov %rsi,%r9
405473: 49 83 e1 0f and $0xf,%r9
405477: 0f 84 93 00 00 00 je 405510 <__intel_ssse3_rep_memcpy+0x100>
40547d: 8b 0d 85 56 20 00 mov 0x205685(%rip),%ecx # 60ab08 <__libirc_data_cache_size>
405483: 48 39 ca cmp %rcx,%rdx
405486: 0f 83 24 18 00 00 jae 406cb0 <__intel_ssse3_rep_memcpy+0x18a0>
40548c: 4c 8d 1d 25 3e 00 00 lea 0x3e25(%rip),%r11 # 4092b8 <.L_2il0floatpacket.29+0x64c>
405493: 48 81 ea 80 00 00 00 sub $0x80,%rdx
40549a: 4f 63 0c 8b movslq (%r11,%r9,4),%r9
40549e: 4d 01 d9 add %r11,%r9
4054a1: 41 ff e1 jmpq *%r9
4054a4: 0f 0b ud2
4054a6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4054ad: 00 00 00
4054b0: 8b 0d 52 56 20 00 mov 0x205652(%rip),%ecx # 60ab08 <__libirc_data_cache_size>
4054b6: 48 d1 e1 shl %rcx
4054b9: 48 39 ca cmp %rcx,%rdx
4054bc: 0f 87 7e 19 00 00 ja 406e40 <__intel_ssse3_rep_memcpy+0x1a30>
4054c2: 48 01 d7 add %rdx,%rdi
4054c5: 48 01 d6 add %rdx,%rsi
4054c8: f3 0f 6f 46 f0 movdqu -0x10(%rsi),%xmm0
4054cd: 4c 8d 47 f0 lea -0x10(%rdi),%r8
4054d1: 49 89 f9 mov %rdi,%r9
4054d4: 49 83 e1 0f and $0xf,%r9
4054d8: 4c 31 cf xor %r9,%rdi
4054db: 4c 29 ce sub %r9,%rsi
4054de: 4c 29 ca sub %r9,%rdx
4054e1: 49 89 f1 mov %rsi,%r9
4054e4: 49 83 e1 0f and $0xf,%r9
4054e8: 0f 84 c2 00 00 00 je 4055b0 <__intel_ssse3_rep_memcpy+0x1a0>
4054ee: 4c 8d 1d 03 3e 00 00 lea 0x3e03(%rip),%r11 # 4092f8 <.L_2il0floatpacket.29+0x68c>
4054f5: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4054fc: 4f 63 0c 8b movslq (%r11,%r9,4),%r9
405500: 4d 01 d9 add %r11,%r9
405503: 41 ff e1 jmpq *%r9
405506: 0f 0b ud2
405508: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40550f: 00
405510: 49 89 d1 mov %rdx,%r9
405513: 49 c1 e9 08 shr $0x8,%r9
405517: 49 01 d1 add %rdx,%r9
40551a: 8b 0d ec 55 20 00 mov 0x2055ec(%rip),%ecx # 60ab0c <__libirc_data_cache_size_half>
405520: 49 39 c9 cmp %rcx,%r9
405523: 0f 83 87 17 00 00 jae 406cb0 <__intel_ssse3_rep_memcpy+0x18a0>
405529: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405530: 66 0f 6f 0e movdqa (%rsi),%xmm1
405534: 66 0f 7f 0f movdqa %xmm1,(%rdi)
405538: 0f 28 56 10 movaps 0x10(%rsi),%xmm2
40553c: 0f 29 57 10 movaps %xmm2,0x10(%rdi)
405540: 0f 28 5e 20 movaps 0x20(%rsi),%xmm3
405544: 0f 29 5f 20 movaps %xmm3,0x20(%rdi)
405548: 0f 28 66 30 movaps 0x30(%rsi),%xmm4
40554c: 0f 29 67 30 movaps %xmm4,0x30(%rdi)
405550: 0f 28 4e 40 movaps 0x40(%rsi),%xmm1
405554: 0f 29 4f 40 movaps %xmm1,0x40(%rdi)
405558: 0f 28 56 50 movaps 0x50(%rsi),%xmm2
40555c: 0f 29 57 50 movaps %xmm2,0x50(%rdi)
405560: 0f 28 5e 60 movaps 0x60(%rsi),%xmm3
405564: 0f 29 5f 60 movaps %xmm3,0x60(%rdi)
405568: 0f 28 66 70 movaps 0x70(%rsi),%xmm4
40556c: 0f 29 67 70 movaps %xmm4,0x70(%rdi)
405570: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405577: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
40557e: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
405585: 73 a9 jae 405530 <__intel_ssse3_rep_memcpy+0x120>
405587: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
40558c: 48 81 c2 80 00 00 00 add $0x80,%rdx
405593: 48 01 d6 add %rdx,%rsi
405596: 48 01 d7 add %rdx,%rdi
405599: 4c 8d 1d d8 3a 00 00 lea 0x3ad8(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
4055a0: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
4055a4: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4055a8: ff e2 jmpq *%rdx
4055aa: 0f 0b ud2
4055ac: 0f 1f 40 00 nopl 0x0(%rax)
4055b0: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4055b7: 0f 28 4e f0 movaps -0x10(%rsi),%xmm1
4055bb: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
4055bf: 0f 28 56 e0 movaps -0x20(%rsi),%xmm2
4055c3: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
4055c7: 0f 28 5e d0 movaps -0x30(%rsi),%xmm3
4055cb: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
4055cf: 0f 28 66 c0 movaps -0x40(%rsi),%xmm4
4055d3: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
4055d7: 0f 28 6e b0 movaps -0x50(%rsi),%xmm5
4055db: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
4055df: 0f 28 6e a0 movaps -0x60(%rsi),%xmm5
4055e3: 0f 29 6f a0 movaps %xmm5,-0x60(%rdi)
4055e7: 0f 28 6e 90 movaps -0x70(%rsi),%xmm5
4055eb: 0f 29 6f 90 movaps %xmm5,-0x70(%rdi)
4055ef: 0f 28 6e 80 movaps -0x80(%rsi),%xmm5
4055f3: 0f 29 6f 80 movaps %xmm5,-0x80(%rdi)
4055f7: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4055fe: 48 8d 7f 80 lea -0x80(%rdi),%rdi
405602: 48 8d 76 80 lea -0x80(%rsi),%rsi
405606: 73 af jae 4055b7 <__intel_ssse3_rep_memcpy+0x1a7>
405608: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
40560d: 48 81 c2 80 00 00 00 add $0x80,%rdx
405614: 48 29 d7 sub %rdx,%rdi
405617: 48 29 d6 sub %rdx,%rsi
40561a: 4c 8d 1d 17 38 00 00 lea 0x3817(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
405621: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405625: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405629: ff e2 jmpq *%rdx
40562b: 0f 0b ud2
40562d: 0f 1f 00 nopl (%rax)
405630: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405637: 0f 28 4e ff movaps -0x1(%rsi),%xmm1
40563b: 0f 28 56 0f movaps 0xf(%rsi),%xmm2
40563f: 0f 28 5e 1f movaps 0x1f(%rsi),%xmm3
405643: 0f 28 66 2f movaps 0x2f(%rsi),%xmm4
405647: 0f 28 6e 3f movaps 0x3f(%rsi),%xmm5
40564b: 0f 28 76 4f movaps 0x4f(%rsi),%xmm6
40564f: 0f 28 7e 5f movaps 0x5f(%rsi),%xmm7
405653: 44 0f 28 46 6f movaps 0x6f(%rsi),%xmm8
405658: 44 0f 28 4e 7f movaps 0x7f(%rsi),%xmm9
40565d: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
405664: 66 45 0f 3a 0f c8 01 palignr $0x1,%xmm8,%xmm9
40566b: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
405670: 66 44 0f 3a 0f c7 01 palignr $0x1,%xmm7,%xmm8
405677: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
40567c: 66 0f 3a 0f fe 01 palignr $0x1,%xmm6,%xmm7
405682: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
405686: 66 0f 3a 0f f5 01 palignr $0x1,%xmm5,%xmm6
40568c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
405690: 66 0f 3a 0f ec 01 palignr $0x1,%xmm4,%xmm5
405696: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40569a: 66 0f 3a 0f e3 01 palignr $0x1,%xmm3,%xmm4
4056a0: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
4056a4: 66 0f 3a 0f da 01 palignr $0x1,%xmm2,%xmm3
4056aa: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
4056ae: 66 0f 3a 0f d1 01 palignr $0x1,%xmm1,%xmm2
4056b4: 0f 29 17 movaps %xmm2,(%rdi)
4056b7: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
4056be: 0f 83 6c ff ff ff jae 405630 <__intel_ssse3_rep_memcpy+0x220>
4056c4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
4056c9: 48 81 c2 80 00 00 00 add $0x80,%rdx
4056d0: 48 01 d7 add %rdx,%rdi
4056d3: 48 01 d6 add %rdx,%rsi
4056d6: 4c 8d 1d 9b 39 00 00 lea 0x399b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
4056dd: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
4056e1: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4056e5: ff e2 jmpq *%rdx
4056e7: 0f 0b ud2
4056e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4056f0: 0f 28 4e ff movaps -0x1(%rsi),%xmm1
4056f4: 0f 28 56 ef movaps -0x11(%rsi),%xmm2
4056f8: 66 0f 3a 0f ca 01 palignr $0x1,%xmm2,%xmm1
4056fe: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
405702: 0f 28 5e df movaps -0x21(%rsi),%xmm3
405706: 66 0f 3a 0f d3 01 palignr $0x1,%xmm3,%xmm2
40570c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
405710: 0f 28 66 cf movaps -0x31(%rsi),%xmm4
405714: 66 0f 3a 0f dc 01 palignr $0x1,%xmm4,%xmm3
40571a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40571e: 0f 28 6e bf movaps -0x41(%rsi),%xmm5
405722: 66 0f 3a 0f e5 01 palignr $0x1,%xmm5,%xmm4
405728: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
40572c: 0f 28 76 af movaps -0x51(%rsi),%xmm6
405730: 66 0f 3a 0f ee 01 palignr $0x1,%xmm6,%xmm5
405736: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
40573a: 0f 28 7e 9f movaps -0x61(%rsi),%xmm7
40573e: 66 0f 3a 0f f7 01 palignr $0x1,%xmm7,%xmm6
405744: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
405748: 44 0f 28 46 8f movaps -0x71(%rsi),%xmm8
40574d: 66 41 0f 3a 0f f8 01 palignr $0x1,%xmm8,%xmm7
405754: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
405758: 44 0f 28 8e 7f ff ff movaps -0x81(%rsi),%xmm9
40575f: ff
405760: 66 45 0f 3a 0f c1 01 palignr $0x1,%xmm9,%xmm8
405767: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
40576c: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405773: 48 8d 7f 80 lea -0x80(%rdi),%rdi
405777: 48 8d 76 80 lea -0x80(%rsi),%rsi
40577b: 0f 83 6f ff ff ff jae 4056f0 <__intel_ssse3_rep_memcpy+0x2e0>
405781: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405786: 48 81 c2 80 00 00 00 add $0x80,%rdx
40578d: 48 29 d7 sub %rdx,%rdi
405790: 48 29 d6 sub %rdx,%rsi
405793: 4c 8d 1d 9e 36 00 00 lea 0x369e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40579a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40579e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4057a2: ff e2 jmpq *%rdx
4057a4: 0f 0b ud2
4057a6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4057ad: 00 00 00
4057b0: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4057b7: 0f 28 4e fe movaps -0x2(%rsi),%xmm1
4057bb: 0f 28 56 0e movaps 0xe(%rsi),%xmm2
4057bf: 0f 28 5e 1e movaps 0x1e(%rsi),%xmm3
4057c3: 0f 28 66 2e movaps 0x2e(%rsi),%xmm4
4057c7: 0f 28 6e 3e movaps 0x3e(%rsi),%xmm5
4057cb: 0f 28 76 4e movaps 0x4e(%rsi),%xmm6
4057cf: 0f 28 7e 5e movaps 0x5e(%rsi),%xmm7
4057d3: 44 0f 28 46 6e movaps 0x6e(%rsi),%xmm8
4057d8: 44 0f 28 4e 7e movaps 0x7e(%rsi),%xmm9
4057dd: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
4057e4: 66 45 0f 3a 0f c8 02 palignr $0x2,%xmm8,%xmm9
4057eb: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
4057f0: 66 44 0f 3a 0f c7 02 palignr $0x2,%xmm7,%xmm8
4057f7: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
4057fc: 66 0f 3a 0f fe 02 palignr $0x2,%xmm6,%xmm7
405802: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
405806: 66 0f 3a 0f f5 02 palignr $0x2,%xmm5,%xmm6
40580c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
405810: 66 0f 3a 0f ec 02 palignr $0x2,%xmm4,%xmm5
405816: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40581a: 66 0f 3a 0f e3 02 palignr $0x2,%xmm3,%xmm4
405820: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
405824: 66 0f 3a 0f da 02 palignr $0x2,%xmm2,%xmm3
40582a: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
40582e: 66 0f 3a 0f d1 02 palignr $0x2,%xmm1,%xmm2
405834: 0f 29 17 movaps %xmm2,(%rdi)
405837: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
40583e: 0f 83 6c ff ff ff jae 4057b0 <__intel_ssse3_rep_memcpy+0x3a0>
405844: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405849: 48 81 c2 80 00 00 00 add $0x80,%rdx
405850: 48 01 d7 add %rdx,%rdi
405853: 48 01 d6 add %rdx,%rsi
405856: 4c 8d 1d 1b 38 00 00 lea 0x381b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
40585d: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405861: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405865: ff e2 jmpq *%rdx
405867: 0f 0b ud2
405869: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
405870: 0f 28 4e fe movaps -0x2(%rsi),%xmm1
405874: 0f 28 56 ee movaps -0x12(%rsi),%xmm2
405878: 66 0f 3a 0f ca 02 palignr $0x2,%xmm2,%xmm1
40587e: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
405882: 0f 28 5e de movaps -0x22(%rsi),%xmm3
405886: 66 0f 3a 0f d3 02 palignr $0x2,%xmm3,%xmm2
40588c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
405890: 0f 28 66 ce movaps -0x32(%rsi),%xmm4
405894: 66 0f 3a 0f dc 02 palignr $0x2,%xmm4,%xmm3
40589a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40589e: 0f 28 6e be movaps -0x42(%rsi),%xmm5
4058a2: 66 0f 3a 0f e5 02 palignr $0x2,%xmm5,%xmm4
4058a8: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
4058ac: 0f 28 76 ae movaps -0x52(%rsi),%xmm6
4058b0: 66 0f 3a 0f ee 02 palignr $0x2,%xmm6,%xmm5
4058b6: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
4058ba: 0f 28 7e 9e movaps -0x62(%rsi),%xmm7
4058be: 66 0f 3a 0f f7 02 palignr $0x2,%xmm7,%xmm6
4058c4: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
4058c8: 44 0f 28 46 8e movaps -0x72(%rsi),%xmm8
4058cd: 66 41 0f 3a 0f f8 02 palignr $0x2,%xmm8,%xmm7
4058d4: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
4058d8: 44 0f 28 8e 7e ff ff movaps -0x82(%rsi),%xmm9
4058df: ff
4058e0: 66 45 0f 3a 0f c1 02 palignr $0x2,%xmm9,%xmm8
4058e7: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
4058ec: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4058f3: 48 8d 7f 80 lea -0x80(%rdi),%rdi
4058f7: 48 8d 76 80 lea -0x80(%rsi),%rsi
4058fb: 0f 83 6f ff ff ff jae 405870 <__intel_ssse3_rep_memcpy+0x460>
405901: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405906: 48 81 c2 80 00 00 00 add $0x80,%rdx
40590d: 48 29 d7 sub %rdx,%rdi
405910: 48 29 d6 sub %rdx,%rsi
405913: 4c 8d 1d 1e 35 00 00 lea 0x351e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40591a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40591e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405922: ff e2 jmpq *%rdx
405924: 0f 0b ud2
405926: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40592d: 00 00 00
405930: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405937: 0f 28 4e fd movaps -0x3(%rsi),%xmm1
40593b: 0f 28 56 0d movaps 0xd(%rsi),%xmm2
40593f: 0f 28 5e 1d movaps 0x1d(%rsi),%xmm3
405943: 0f 28 66 2d movaps 0x2d(%rsi),%xmm4
405947: 0f 28 6e 3d movaps 0x3d(%rsi),%xmm5
40594b: 0f 28 76 4d movaps 0x4d(%rsi),%xmm6
40594f: 0f 28 7e 5d movaps 0x5d(%rsi),%xmm7
405953: 44 0f 28 46 6d movaps 0x6d(%rsi),%xmm8
405958: 44 0f 28 4e 7d movaps 0x7d(%rsi),%xmm9
40595d: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
405964: 66 45 0f 3a 0f c8 03 palignr $0x3,%xmm8,%xmm9
40596b: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
405970: 66 44 0f 3a 0f c7 03 palignr $0x3,%xmm7,%xmm8
405977: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
40597c: 66 0f 3a 0f fe 03 palignr $0x3,%xmm6,%xmm7
405982: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
405986: 66 0f 3a 0f f5 03 palignr $0x3,%xmm5,%xmm6
40598c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
405990: 66 0f 3a 0f ec 03 palignr $0x3,%xmm4,%xmm5
405996: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40599a: 66 0f 3a 0f e3 03 palignr $0x3,%xmm3,%xmm4
4059a0: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
4059a4: 66 0f 3a 0f da 03 palignr $0x3,%xmm2,%xmm3
4059aa: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
4059ae: 66 0f 3a 0f d1 03 palignr $0x3,%xmm1,%xmm2
4059b4: 0f 29 17 movaps %xmm2,(%rdi)
4059b7: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
4059be: 0f 83 6c ff ff ff jae 405930 <__intel_ssse3_rep_memcpy+0x520>
4059c4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
4059c9: 48 81 c2 80 00 00 00 add $0x80,%rdx
4059d0: 48 01 d7 add %rdx,%rdi
4059d3: 48 01 d6 add %rdx,%rsi
4059d6: 4c 8d 1d 9b 36 00 00 lea 0x369b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
4059dd: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
4059e1: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4059e5: ff e2 jmpq *%rdx
4059e7: 0f 0b ud2
4059e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4059f0: 0f 28 4e fd movaps -0x3(%rsi),%xmm1
4059f4: 0f 28 56 ed movaps -0x13(%rsi),%xmm2
4059f8: 66 0f 3a 0f ca 03 palignr $0x3,%xmm2,%xmm1
4059fe: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
405a02: 0f 28 5e dd movaps -0x23(%rsi),%xmm3
405a06: 66 0f 3a 0f d3 03 palignr $0x3,%xmm3,%xmm2
405a0c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
405a10: 0f 28 66 cd movaps -0x33(%rsi),%xmm4
405a14: 66 0f 3a 0f dc 03 palignr $0x3,%xmm4,%xmm3
405a1a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
405a1e: 0f 28 6e bd movaps -0x43(%rsi),%xmm5
405a22: 66 0f 3a 0f e5 03 palignr $0x3,%xmm5,%xmm4
405a28: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
405a2c: 0f 28 76 ad movaps -0x53(%rsi),%xmm6
405a30: 66 0f 3a 0f ee 03 palignr $0x3,%xmm6,%xmm5
405a36: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
405a3a: 0f 28 7e 9d movaps -0x63(%rsi),%xmm7
405a3e: 66 0f 3a 0f f7 03 palignr $0x3,%xmm7,%xmm6
405a44: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
405a48: 44 0f 28 46 8d movaps -0x73(%rsi),%xmm8
405a4d: 66 41 0f 3a 0f f8 03 palignr $0x3,%xmm8,%xmm7
405a54: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
405a58: 44 0f 28 8e 7d ff ff movaps -0x83(%rsi),%xmm9
405a5f: ff
405a60: 66 45 0f 3a 0f c1 03 palignr $0x3,%xmm9,%xmm8
405a67: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
405a6c: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405a73: 48 8d 7f 80 lea -0x80(%rdi),%rdi
405a77: 48 8d 76 80 lea -0x80(%rsi),%rsi
405a7b: 0f 83 6f ff ff ff jae 4059f0 <__intel_ssse3_rep_memcpy+0x5e0>
405a81: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405a86: 48 81 c2 80 00 00 00 add $0x80,%rdx
405a8d: 48 29 d7 sub %rdx,%rdi
405a90: 48 29 d6 sub %rdx,%rsi
405a93: 4c 8d 1d 9e 33 00 00 lea 0x339e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
405a9a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405a9e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405aa2: ff e2 jmpq *%rdx
405aa4: 0f 0b ud2
405aa6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
405aad: 00 00 00
405ab0: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405ab7: 0f 28 4e fc movaps -0x4(%rsi),%xmm1
405abb: 0f 28 56 0c movaps 0xc(%rsi),%xmm2
405abf: 0f 28 5e 1c movaps 0x1c(%rsi),%xmm3
405ac3: 0f 28 66 2c movaps 0x2c(%rsi),%xmm4
405ac7: 0f 28 6e 3c movaps 0x3c(%rsi),%xmm5
405acb: 0f 28 76 4c movaps 0x4c(%rsi),%xmm6
405acf: 0f 28 7e 5c movaps 0x5c(%rsi),%xmm7
405ad3: 44 0f 28 46 6c movaps 0x6c(%rsi),%xmm8
405ad8: 44 0f 28 4e 7c movaps 0x7c(%rsi),%xmm9
405add: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
405ae4: 66 45 0f 3a 0f c8 04 palignr $0x4,%xmm8,%xmm9
405aeb: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
405af0: 66 44 0f 3a 0f c7 04 palignr $0x4,%xmm7,%xmm8
405af7: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
405afc: 66 0f 3a 0f fe 04 palignr $0x4,%xmm6,%xmm7
405b02: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
405b06: 66 0f 3a 0f f5 04 palignr $0x4,%xmm5,%xmm6
405b0c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
405b10: 66 0f 3a 0f ec 04 palignr $0x4,%xmm4,%xmm5
405b16: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
405b1a: 66 0f 3a 0f e3 04 palignr $0x4,%xmm3,%xmm4
405b20: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
405b24: 66 0f 3a 0f da 04 palignr $0x4,%xmm2,%xmm3
405b2a: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
405b2e: 66 0f 3a 0f d1 04 palignr $0x4,%xmm1,%xmm2
405b34: 0f 29 17 movaps %xmm2,(%rdi)
405b37: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
405b3e: 0f 83 6c ff ff ff jae 405ab0 <__intel_ssse3_rep_memcpy+0x6a0>
405b44: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405b49: 48 81 c2 80 00 00 00 add $0x80,%rdx
405b50: 48 01 d7 add %rdx,%rdi
405b53: 48 01 d6 add %rdx,%rsi
405b56: 4c 8d 1d 1b 35 00 00 lea 0x351b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
405b5d: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405b61: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405b65: ff e2 jmpq *%rdx
405b67: 0f 0b ud2
405b69: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
405b70: 0f 28 4e fc movaps -0x4(%rsi),%xmm1
405b74: 0f 28 56 ec movaps -0x14(%rsi),%xmm2
405b78: 66 0f 3a 0f ca 04 palignr $0x4,%xmm2,%xmm1
405b7e: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
405b82: 0f 28 5e dc movaps -0x24(%rsi),%xmm3
405b86: 66 0f 3a 0f d3 04 palignr $0x4,%xmm3,%xmm2
405b8c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
405b90: 0f 28 66 cc movaps -0x34(%rsi),%xmm4
405b94: 66 0f 3a 0f dc 04 palignr $0x4,%xmm4,%xmm3
405b9a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
405b9e: 0f 28 6e bc movaps -0x44(%rsi),%xmm5
405ba2: 66 0f 3a 0f e5 04 palignr $0x4,%xmm5,%xmm4
405ba8: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
405bac: 0f 28 76 ac movaps -0x54(%rsi),%xmm6
405bb0: 66 0f 3a 0f ee 04 palignr $0x4,%xmm6,%xmm5
405bb6: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
405bba: 0f 28 7e 9c movaps -0x64(%rsi),%xmm7
405bbe: 66 0f 3a 0f f7 04 palignr $0x4,%xmm7,%xmm6
405bc4: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
405bc8: 44 0f 28 46 8c movaps -0x74(%rsi),%xmm8
405bcd: 66 41 0f 3a 0f f8 04 palignr $0x4,%xmm8,%xmm7
405bd4: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
405bd8: 44 0f 28 8e 7c ff ff movaps -0x84(%rsi),%xmm9
405bdf: ff
405be0: 66 45 0f 3a 0f c1 04 palignr $0x4,%xmm9,%xmm8
405be7: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
405bec: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405bf3: 48 8d 7f 80 lea -0x80(%rdi),%rdi
405bf7: 48 8d 76 80 lea -0x80(%rsi),%rsi
405bfb: 0f 83 6f ff ff ff jae 405b70 <__intel_ssse3_rep_memcpy+0x760>
405c01: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405c06: 48 81 c2 80 00 00 00 add $0x80,%rdx
405c0d: 48 29 d7 sub %rdx,%rdi
405c10: 48 29 d6 sub %rdx,%rsi
405c13: 4c 8d 1d 1e 32 00 00 lea 0x321e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
405c1a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405c1e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405c22: ff e2 jmpq *%rdx
405c24: 0f 0b ud2
405c26: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
405c2d: 00 00 00
405c30: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405c37: 0f 28 4e fb movaps -0x5(%rsi),%xmm1
405c3b: 0f 28 56 0b movaps 0xb(%rsi),%xmm2
405c3f: 0f 28 5e 1b movaps 0x1b(%rsi),%xmm3
405c43: 0f 28 66 2b movaps 0x2b(%rsi),%xmm4
405c47: 0f 28 6e 3b movaps 0x3b(%rsi),%xmm5
405c4b: 0f 28 76 4b movaps 0x4b(%rsi),%xmm6
405c4f: 0f 28 7e 5b movaps 0x5b(%rsi),%xmm7
405c53: 44 0f 28 46 6b movaps 0x6b(%rsi),%xmm8
405c58: 44 0f 28 4e 7b movaps 0x7b(%rsi),%xmm9
405c5d: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
405c64: 66 45 0f 3a 0f c8 05 palignr $0x5,%xmm8,%xmm9
405c6b: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
405c70: 66 44 0f 3a 0f c7 05 palignr $0x5,%xmm7,%xmm8
405c77: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
405c7c: 66 0f 3a 0f fe 05 palignr $0x5,%xmm6,%xmm7
405c82: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
405c86: 66 0f 3a 0f f5 05 palignr $0x5,%xmm5,%xmm6
405c8c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
405c90: 66 0f 3a 0f ec 05 palignr $0x5,%xmm4,%xmm5
405c96: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
405c9a: 66 0f 3a 0f e3 05 palignr $0x5,%xmm3,%xmm4
405ca0: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
405ca4: 66 0f 3a 0f da 05 palignr $0x5,%xmm2,%xmm3
405caa: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
405cae: 66 0f 3a 0f d1 05 palignr $0x5,%xmm1,%xmm2
405cb4: 0f 29 17 movaps %xmm2,(%rdi)
405cb7: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
405cbe: 0f 83 6c ff ff ff jae 405c30 <__intel_ssse3_rep_memcpy+0x820>
405cc4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405cc9: 48 81 c2 80 00 00 00 add $0x80,%rdx
405cd0: 48 01 d7 add %rdx,%rdi
405cd3: 48 01 d6 add %rdx,%rsi
405cd6: 4c 8d 1d 9b 33 00 00 lea 0x339b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
405cdd: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405ce1: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405ce5: ff e2 jmpq *%rdx
405ce7: 0f 0b ud2
405ce9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
405cf0: 0f 28 4e fb movaps -0x5(%rsi),%xmm1
405cf4: 0f 28 56 eb movaps -0x15(%rsi),%xmm2
405cf8: 66 0f 3a 0f ca 05 palignr $0x5,%xmm2,%xmm1
405cfe: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
405d02: 0f 28 5e db movaps -0x25(%rsi),%xmm3
405d06: 66 0f 3a 0f d3 05 palignr $0x5,%xmm3,%xmm2
405d0c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
405d10: 0f 28 66 cb movaps -0x35(%rsi),%xmm4
405d14: 66 0f 3a 0f dc 05 palignr $0x5,%xmm4,%xmm3
405d1a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
405d1e: 0f 28 6e bb movaps -0x45(%rsi),%xmm5
405d22: 66 0f 3a 0f e5 05 palignr $0x5,%xmm5,%xmm4
405d28: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
405d2c: 0f 28 76 ab movaps -0x55(%rsi),%xmm6
405d30: 66 0f 3a 0f ee 05 palignr $0x5,%xmm6,%xmm5
405d36: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
405d3a: 0f 28 7e 9b movaps -0x65(%rsi),%xmm7
405d3e: 66 0f 3a 0f f7 05 palignr $0x5,%xmm7,%xmm6
405d44: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
405d48: 44 0f 28 46 8b movaps -0x75(%rsi),%xmm8
405d4d: 66 41 0f 3a 0f f8 05 palignr $0x5,%xmm8,%xmm7
405d54: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
405d58: 44 0f 28 8e 7b ff ff movaps -0x85(%rsi),%xmm9
405d5f: ff
405d60: 66 45 0f 3a 0f c1 05 palignr $0x5,%xmm9,%xmm8
405d67: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
405d6c: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405d73: 48 8d 7f 80 lea -0x80(%rdi),%rdi
405d77: 48 8d 76 80 lea -0x80(%rsi),%rsi
405d7b: 0f 83 6f ff ff ff jae 405cf0 <__intel_ssse3_rep_memcpy+0x8e0>
405d81: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405d86: 48 81 c2 80 00 00 00 add $0x80,%rdx
405d8d: 48 29 d7 sub %rdx,%rdi
405d90: 48 29 d6 sub %rdx,%rsi
405d93: 4c 8d 1d 9e 30 00 00 lea 0x309e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
405d9a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405d9e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405da2: ff e2 jmpq *%rdx
405da4: 0f 0b ud2
405da6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
405dad: 00 00 00
405db0: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405db7: 0f 28 4e fa movaps -0x6(%rsi),%xmm1
405dbb: 0f 28 56 0a movaps 0xa(%rsi),%xmm2
405dbf: 0f 28 5e 1a movaps 0x1a(%rsi),%xmm3
405dc3: 0f 28 66 2a movaps 0x2a(%rsi),%xmm4
405dc7: 0f 28 6e 3a movaps 0x3a(%rsi),%xmm5
405dcb: 0f 28 76 4a movaps 0x4a(%rsi),%xmm6
405dcf: 0f 28 7e 5a movaps 0x5a(%rsi),%xmm7
405dd3: 44 0f 28 46 6a movaps 0x6a(%rsi),%xmm8
405dd8: 44 0f 28 4e 7a movaps 0x7a(%rsi),%xmm9
405ddd: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
405de4: 66 45 0f 3a 0f c8 06 palignr $0x6,%xmm8,%xmm9
405deb: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
405df0: 66 44 0f 3a 0f c7 06 palignr $0x6,%xmm7,%xmm8
405df7: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
405dfc: 66 0f 3a 0f fe 06 palignr $0x6,%xmm6,%xmm7
405e02: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
405e06: 66 0f 3a 0f f5 06 palignr $0x6,%xmm5,%xmm6
405e0c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
405e10: 66 0f 3a 0f ec 06 palignr $0x6,%xmm4,%xmm5
405e16: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
405e1a: 66 0f 3a 0f e3 06 palignr $0x6,%xmm3,%xmm4
405e20: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
405e24: 66 0f 3a 0f da 06 palignr $0x6,%xmm2,%xmm3
405e2a: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
405e2e: 66 0f 3a 0f d1 06 palignr $0x6,%xmm1,%xmm2
405e34: 0f 29 17 movaps %xmm2,(%rdi)
405e37: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
405e3e: 0f 83 6c ff ff ff jae 405db0 <__intel_ssse3_rep_memcpy+0x9a0>
405e44: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405e49: 48 81 c2 80 00 00 00 add $0x80,%rdx
405e50: 48 01 d7 add %rdx,%rdi
405e53: 48 01 d6 add %rdx,%rsi
405e56: 4c 8d 1d 1b 32 00 00 lea 0x321b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
405e5d: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405e61: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405e65: ff e2 jmpq *%rdx
405e67: 0f 0b ud2
405e69: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
405e70: 0f 28 4e fa movaps -0x6(%rsi),%xmm1
405e74: 0f 28 56 ea movaps -0x16(%rsi),%xmm2
405e78: 66 0f 3a 0f ca 06 palignr $0x6,%xmm2,%xmm1
405e7e: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
405e82: 0f 28 5e da movaps -0x26(%rsi),%xmm3
405e86: 66 0f 3a 0f d3 06 palignr $0x6,%xmm3,%xmm2
405e8c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
405e90: 0f 28 66 ca movaps -0x36(%rsi),%xmm4
405e94: 66 0f 3a 0f dc 06 palignr $0x6,%xmm4,%xmm3
405e9a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
405e9e: 0f 28 6e ba movaps -0x46(%rsi),%xmm5
405ea2: 66 0f 3a 0f e5 06 palignr $0x6,%xmm5,%xmm4
405ea8: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
405eac: 0f 28 76 aa movaps -0x56(%rsi),%xmm6
405eb0: 66 0f 3a 0f ee 06 palignr $0x6,%xmm6,%xmm5
405eb6: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
405eba: 0f 28 7e 9a movaps -0x66(%rsi),%xmm7
405ebe: 66 0f 3a 0f f7 06 palignr $0x6,%xmm7,%xmm6
405ec4: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
405ec8: 44 0f 28 46 8a movaps -0x76(%rsi),%xmm8
405ecd: 66 41 0f 3a 0f f8 06 palignr $0x6,%xmm8,%xmm7
405ed4: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
405ed8: 44 0f 28 8e 7a ff ff movaps -0x86(%rsi),%xmm9
405edf: ff
405ee0: 66 45 0f 3a 0f c1 06 palignr $0x6,%xmm9,%xmm8
405ee7: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
405eec: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405ef3: 48 8d 7f 80 lea -0x80(%rdi),%rdi
405ef7: 48 8d 76 80 lea -0x80(%rsi),%rsi
405efb: 0f 83 6f ff ff ff jae 405e70 <__intel_ssse3_rep_memcpy+0xa60>
405f01: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405f06: 48 81 c2 80 00 00 00 add $0x80,%rdx
405f0d: 48 29 d7 sub %rdx,%rdi
405f10: 48 29 d6 sub %rdx,%rsi
405f13: 4c 8d 1d 1e 2f 00 00 lea 0x2f1e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
405f1a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405f1e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405f22: ff e2 jmpq *%rdx
405f24: 0f 0b ud2
405f26: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
405f2d: 00 00 00
405f30: 48 81 ea 80 00 00 00 sub $0x80,%rdx
405f37: 0f 28 4e f9 movaps -0x7(%rsi),%xmm1
405f3b: 0f 28 56 09 movaps 0x9(%rsi),%xmm2
405f3f: 0f 28 5e 19 movaps 0x19(%rsi),%xmm3
405f43: 0f 28 66 29 movaps 0x29(%rsi),%xmm4
405f47: 0f 28 6e 39 movaps 0x39(%rsi),%xmm5
405f4b: 0f 28 76 49 movaps 0x49(%rsi),%xmm6
405f4f: 0f 28 7e 59 movaps 0x59(%rsi),%xmm7
405f53: 44 0f 28 46 69 movaps 0x69(%rsi),%xmm8
405f58: 44 0f 28 4e 79 movaps 0x79(%rsi),%xmm9
405f5d: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
405f64: 66 45 0f 3a 0f c8 07 palignr $0x7,%xmm8,%xmm9
405f6b: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
405f70: 66 44 0f 3a 0f c7 07 palignr $0x7,%xmm7,%xmm8
405f77: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
405f7c: 66 0f 3a 0f fe 07 palignr $0x7,%xmm6,%xmm7
405f82: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
405f86: 66 0f 3a 0f f5 07 palignr $0x7,%xmm5,%xmm6
405f8c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
405f90: 66 0f 3a 0f ec 07 palignr $0x7,%xmm4,%xmm5
405f96: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
405f9a: 66 0f 3a 0f e3 07 palignr $0x7,%xmm3,%xmm4
405fa0: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
405fa4: 66 0f 3a 0f da 07 palignr $0x7,%xmm2,%xmm3
405faa: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
405fae: 66 0f 3a 0f d1 07 palignr $0x7,%xmm1,%xmm2
405fb4: 0f 29 17 movaps %xmm2,(%rdi)
405fb7: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
405fbe: 0f 83 6c ff ff ff jae 405f30 <__intel_ssse3_rep_memcpy+0xb20>
405fc4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
405fc9: 48 81 c2 80 00 00 00 add $0x80,%rdx
405fd0: 48 01 d7 add %rdx,%rdi
405fd3: 48 01 d6 add %rdx,%rsi
405fd6: 4c 8d 1d 9b 30 00 00 lea 0x309b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
405fdd: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
405fe1: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
405fe5: ff e2 jmpq *%rdx
405fe7: 0f 0b ud2
405fe9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
405ff0: 0f 28 4e f9 movaps -0x7(%rsi),%xmm1
405ff4: 0f 28 56 e9 movaps -0x17(%rsi),%xmm2
405ff8: 66 0f 3a 0f ca 07 palignr $0x7,%xmm2,%xmm1
405ffe: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406002: 0f 28 5e d9 movaps -0x27(%rsi),%xmm3
406006: 66 0f 3a 0f d3 07 palignr $0x7,%xmm3,%xmm2
40600c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406010: 0f 28 66 c9 movaps -0x37(%rsi),%xmm4
406014: 66 0f 3a 0f dc 07 palignr $0x7,%xmm4,%xmm3
40601a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40601e: 0f 28 6e b9 movaps -0x47(%rsi),%xmm5
406022: 66 0f 3a 0f e5 07 palignr $0x7,%xmm5,%xmm4
406028: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
40602c: 0f 28 76 a9 movaps -0x57(%rsi),%xmm6
406030: 66 0f 3a 0f ee 07 palignr $0x7,%xmm6,%xmm5
406036: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
40603a: 0f 28 7e 99 movaps -0x67(%rsi),%xmm7
40603e: 66 0f 3a 0f f7 07 palignr $0x7,%xmm7,%xmm6
406044: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
406048: 44 0f 28 46 89 movaps -0x77(%rsi),%xmm8
40604d: 66 41 0f 3a 0f f8 07 palignr $0x7,%xmm8,%xmm7
406054: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
406058: 44 0f 28 8e 79 ff ff movaps -0x87(%rsi),%xmm9
40605f: ff
406060: 66 45 0f 3a 0f c1 07 palignr $0x7,%xmm9,%xmm8
406067: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
40606c: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406073: 48 8d 7f 80 lea -0x80(%rdi),%rdi
406077: 48 8d 76 80 lea -0x80(%rsi),%rsi
40607b: 0f 83 6f ff ff ff jae 405ff0 <__intel_ssse3_rep_memcpy+0xbe0>
406081: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406086: 48 81 c2 80 00 00 00 add $0x80,%rdx
40608d: 48 29 d7 sub %rdx,%rdi
406090: 48 29 d6 sub %rdx,%rsi
406093: 4c 8d 1d 9e 2d 00 00 lea 0x2d9e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40609a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40609e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4060a2: ff e2 jmpq *%rdx
4060a4: 0f 0b ud2
4060a6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4060ad: 00 00 00
4060b0: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4060b7: 0f 28 4e f8 movaps -0x8(%rsi),%xmm1
4060bb: 0f 28 56 08 movaps 0x8(%rsi),%xmm2
4060bf: 0f 28 5e 18 movaps 0x18(%rsi),%xmm3
4060c3: 0f 28 66 28 movaps 0x28(%rsi),%xmm4
4060c7: 0f 28 6e 38 movaps 0x38(%rsi),%xmm5
4060cb: 0f 28 76 48 movaps 0x48(%rsi),%xmm6
4060cf: 0f 28 7e 58 movaps 0x58(%rsi),%xmm7
4060d3: 44 0f 28 46 68 movaps 0x68(%rsi),%xmm8
4060d8: 44 0f 28 4e 78 movaps 0x78(%rsi),%xmm9
4060dd: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
4060e4: 66 45 0f 3a 0f c8 08 palignr $0x8,%xmm8,%xmm9
4060eb: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
4060f0: 66 44 0f 3a 0f c7 08 palignr $0x8,%xmm7,%xmm8
4060f7: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
4060fc: 66 0f 3a 0f fe 08 palignr $0x8,%xmm6,%xmm7
406102: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
406106: 66 0f 3a 0f f5 08 palignr $0x8,%xmm5,%xmm6
40610c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
406110: 66 0f 3a 0f ec 08 palignr $0x8,%xmm4,%xmm5
406116: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40611a: 66 0f 3a 0f e3 08 palignr $0x8,%xmm3,%xmm4
406120: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
406124: 66 0f 3a 0f da 08 palignr $0x8,%xmm2,%xmm3
40612a: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
40612e: 66 0f 3a 0f d1 08 palignr $0x8,%xmm1,%xmm2
406134: 0f 29 17 movaps %xmm2,(%rdi)
406137: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
40613e: 0f 83 6c ff ff ff jae 4060b0 <__intel_ssse3_rep_memcpy+0xca0>
406144: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406149: 48 81 c2 80 00 00 00 add $0x80,%rdx
406150: 48 01 d7 add %rdx,%rdi
406153: 48 01 d6 add %rdx,%rsi
406156: 4c 8d 1d 1b 2f 00 00 lea 0x2f1b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
40615d: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406161: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406165: ff e2 jmpq *%rdx
406167: 0f 0b ud2
406169: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
406170: 0f 28 4e f8 movaps -0x8(%rsi),%xmm1
406174: 0f 28 56 e8 movaps -0x18(%rsi),%xmm2
406178: 66 0f 3a 0f ca 08 palignr $0x8,%xmm2,%xmm1
40617e: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406182: 0f 28 5e d8 movaps -0x28(%rsi),%xmm3
406186: 66 0f 3a 0f d3 08 palignr $0x8,%xmm3,%xmm2
40618c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406190: 0f 28 66 c8 movaps -0x38(%rsi),%xmm4
406194: 66 0f 3a 0f dc 08 palignr $0x8,%xmm4,%xmm3
40619a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40619e: 0f 28 6e b8 movaps -0x48(%rsi),%xmm5
4061a2: 66 0f 3a 0f e5 08 palignr $0x8,%xmm5,%xmm4
4061a8: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
4061ac: 0f 28 76 a8 movaps -0x58(%rsi),%xmm6
4061b0: 66 0f 3a 0f ee 08 palignr $0x8,%xmm6,%xmm5
4061b6: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
4061ba: 0f 28 7e 98 movaps -0x68(%rsi),%xmm7
4061be: 66 0f 3a 0f f7 08 palignr $0x8,%xmm7,%xmm6
4061c4: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
4061c8: 44 0f 28 46 88 movaps -0x78(%rsi),%xmm8
4061cd: 66 41 0f 3a 0f f8 08 palignr $0x8,%xmm8,%xmm7
4061d4: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
4061d8: 44 0f 28 8e 78 ff ff movaps -0x88(%rsi),%xmm9
4061df: ff
4061e0: 66 45 0f 3a 0f c1 08 palignr $0x8,%xmm9,%xmm8
4061e7: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
4061ec: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4061f3: 48 8d 7f 80 lea -0x80(%rdi),%rdi
4061f7: 48 8d 76 80 lea -0x80(%rsi),%rsi
4061fb: 0f 83 6f ff ff ff jae 406170 <__intel_ssse3_rep_memcpy+0xd60>
406201: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406206: 48 81 c2 80 00 00 00 add $0x80,%rdx
40620d: 48 29 d7 sub %rdx,%rdi
406210: 48 29 d6 sub %rdx,%rsi
406213: 4c 8d 1d 1e 2c 00 00 lea 0x2c1e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40621a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40621e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406222: ff e2 jmpq *%rdx
406224: 0f 0b ud2
406226: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40622d: 00 00 00
406230: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406237: 0f 28 4e f7 movaps -0x9(%rsi),%xmm1
40623b: 0f 28 56 07 movaps 0x7(%rsi),%xmm2
40623f: 0f 28 5e 17 movaps 0x17(%rsi),%xmm3
406243: 0f 28 66 27 movaps 0x27(%rsi),%xmm4
406247: 0f 28 6e 37 movaps 0x37(%rsi),%xmm5
40624b: 0f 28 76 47 movaps 0x47(%rsi),%xmm6
40624f: 0f 28 7e 57 movaps 0x57(%rsi),%xmm7
406253: 44 0f 28 46 67 movaps 0x67(%rsi),%xmm8
406258: 44 0f 28 4e 77 movaps 0x77(%rsi),%xmm9
40625d: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
406264: 66 45 0f 3a 0f c8 09 palignr $0x9,%xmm8,%xmm9
40626b: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
406270: 66 44 0f 3a 0f c7 09 palignr $0x9,%xmm7,%xmm8
406277: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
40627c: 66 0f 3a 0f fe 09 palignr $0x9,%xmm6,%xmm7
406282: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
406286: 66 0f 3a 0f f5 09 palignr $0x9,%xmm5,%xmm6
40628c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
406290: 66 0f 3a 0f ec 09 palignr $0x9,%xmm4,%xmm5
406296: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40629a: 66 0f 3a 0f e3 09 palignr $0x9,%xmm3,%xmm4
4062a0: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
4062a4: 66 0f 3a 0f da 09 palignr $0x9,%xmm2,%xmm3
4062aa: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
4062ae: 66 0f 3a 0f d1 09 palignr $0x9,%xmm1,%xmm2
4062b4: 0f 29 17 movaps %xmm2,(%rdi)
4062b7: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
4062be: 0f 83 6c ff ff ff jae 406230 <__intel_ssse3_rep_memcpy+0xe20>
4062c4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
4062c9: 48 81 c2 80 00 00 00 add $0x80,%rdx
4062d0: 48 01 d7 add %rdx,%rdi
4062d3: 48 01 d6 add %rdx,%rsi
4062d6: 4c 8d 1d 9b 2d 00 00 lea 0x2d9b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
4062dd: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
4062e1: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4062e5: ff e2 jmpq *%rdx
4062e7: 0f 0b ud2
4062e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4062f0: 0f 28 4e f7 movaps -0x9(%rsi),%xmm1
4062f4: 0f 28 56 e7 movaps -0x19(%rsi),%xmm2
4062f8: 66 0f 3a 0f ca 09 palignr $0x9,%xmm2,%xmm1
4062fe: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406302: 0f 28 5e d7 movaps -0x29(%rsi),%xmm3
406306: 66 0f 3a 0f d3 09 palignr $0x9,%xmm3,%xmm2
40630c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406310: 0f 28 66 c7 movaps -0x39(%rsi),%xmm4
406314: 66 0f 3a 0f dc 09 palignr $0x9,%xmm4,%xmm3
40631a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40631e: 0f 28 6e b7 movaps -0x49(%rsi),%xmm5
406322: 66 0f 3a 0f e5 09 palignr $0x9,%xmm5,%xmm4
406328: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
40632c: 0f 28 76 a7 movaps -0x59(%rsi),%xmm6
406330: 66 0f 3a 0f ee 09 palignr $0x9,%xmm6,%xmm5
406336: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
40633a: 0f 28 7e 97 movaps -0x69(%rsi),%xmm7
40633e: 66 0f 3a 0f f7 09 palignr $0x9,%xmm7,%xmm6
406344: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
406348: 44 0f 28 46 87 movaps -0x79(%rsi),%xmm8
40634d: 66 41 0f 3a 0f f8 09 palignr $0x9,%xmm8,%xmm7
406354: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
406358: 44 0f 28 8e 77 ff ff movaps -0x89(%rsi),%xmm9
40635f: ff
406360: 66 45 0f 3a 0f c1 09 palignr $0x9,%xmm9,%xmm8
406367: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
40636c: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406373: 48 8d 7f 80 lea -0x80(%rdi),%rdi
406377: 48 8d 76 80 lea -0x80(%rsi),%rsi
40637b: 0f 83 6f ff ff ff jae 4062f0 <__intel_ssse3_rep_memcpy+0xee0>
406381: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406386: 48 81 c2 80 00 00 00 add $0x80,%rdx
40638d: 48 29 d7 sub %rdx,%rdi
406390: 48 29 d6 sub %rdx,%rsi
406393: 4c 8d 1d 9e 2a 00 00 lea 0x2a9e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40639a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40639e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4063a2: ff e2 jmpq *%rdx
4063a4: 0f 0b ud2
4063a6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4063ad: 00 00 00
4063b0: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4063b7: 0f 28 4e f6 movaps -0xa(%rsi),%xmm1
4063bb: 0f 28 56 06 movaps 0x6(%rsi),%xmm2
4063bf: 0f 28 5e 16 movaps 0x16(%rsi),%xmm3
4063c3: 0f 28 66 26 movaps 0x26(%rsi),%xmm4
4063c7: 0f 28 6e 36 movaps 0x36(%rsi),%xmm5
4063cb: 0f 28 76 46 movaps 0x46(%rsi),%xmm6
4063cf: 0f 28 7e 56 movaps 0x56(%rsi),%xmm7
4063d3: 44 0f 28 46 66 movaps 0x66(%rsi),%xmm8
4063d8: 44 0f 28 4e 76 movaps 0x76(%rsi),%xmm9
4063dd: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
4063e4: 66 45 0f 3a 0f c8 0a palignr $0xa,%xmm8,%xmm9
4063eb: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
4063f0: 66 44 0f 3a 0f c7 0a palignr $0xa,%xmm7,%xmm8
4063f7: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
4063fc: 66 0f 3a 0f fe 0a palignr $0xa,%xmm6,%xmm7
406402: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
406406: 66 0f 3a 0f f5 0a palignr $0xa,%xmm5,%xmm6
40640c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
406410: 66 0f 3a 0f ec 0a palignr $0xa,%xmm4,%xmm5
406416: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40641a: 66 0f 3a 0f e3 0a palignr $0xa,%xmm3,%xmm4
406420: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
406424: 66 0f 3a 0f da 0a palignr $0xa,%xmm2,%xmm3
40642a: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
40642e: 66 0f 3a 0f d1 0a palignr $0xa,%xmm1,%xmm2
406434: 0f 29 17 movaps %xmm2,(%rdi)
406437: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
40643e: 0f 83 6c ff ff ff jae 4063b0 <__intel_ssse3_rep_memcpy+0xfa0>
406444: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406449: 48 81 c2 80 00 00 00 add $0x80,%rdx
406450: 48 01 d7 add %rdx,%rdi
406453: 48 01 d6 add %rdx,%rsi
406456: 4c 8d 1d 1b 2c 00 00 lea 0x2c1b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
40645d: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406461: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406465: ff e2 jmpq *%rdx
406467: 0f 0b ud2
406469: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
406470: 0f 28 4e f6 movaps -0xa(%rsi),%xmm1
406474: 0f 28 56 e6 movaps -0x1a(%rsi),%xmm2
406478: 66 0f 3a 0f ca 0a palignr $0xa,%xmm2,%xmm1
40647e: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406482: 0f 28 5e d6 movaps -0x2a(%rsi),%xmm3
406486: 66 0f 3a 0f d3 0a palignr $0xa,%xmm3,%xmm2
40648c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406490: 0f 28 66 c6 movaps -0x3a(%rsi),%xmm4
406494: 66 0f 3a 0f dc 0a palignr $0xa,%xmm4,%xmm3
40649a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40649e: 0f 28 6e b6 movaps -0x4a(%rsi),%xmm5
4064a2: 66 0f 3a 0f e5 0a palignr $0xa,%xmm5,%xmm4
4064a8: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
4064ac: 0f 28 76 a6 movaps -0x5a(%rsi),%xmm6
4064b0: 66 0f 3a 0f ee 0a palignr $0xa,%xmm6,%xmm5
4064b6: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
4064ba: 0f 28 7e 96 movaps -0x6a(%rsi),%xmm7
4064be: 66 0f 3a 0f f7 0a palignr $0xa,%xmm7,%xmm6
4064c4: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
4064c8: 44 0f 28 46 86 movaps -0x7a(%rsi),%xmm8
4064cd: 66 41 0f 3a 0f f8 0a palignr $0xa,%xmm8,%xmm7
4064d4: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
4064d8: 44 0f 28 8e 76 ff ff movaps -0x8a(%rsi),%xmm9
4064df: ff
4064e0: 66 45 0f 3a 0f c1 0a palignr $0xa,%xmm9,%xmm8
4064e7: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
4064ec: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4064f3: 48 8d 7f 80 lea -0x80(%rdi),%rdi
4064f7: 48 8d 76 80 lea -0x80(%rsi),%rsi
4064fb: 0f 83 6f ff ff ff jae 406470 <__intel_ssse3_rep_memcpy+0x1060>
406501: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406506: 48 81 c2 80 00 00 00 add $0x80,%rdx
40650d: 48 29 d7 sub %rdx,%rdi
406510: 48 29 d6 sub %rdx,%rsi
406513: 4c 8d 1d 1e 29 00 00 lea 0x291e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40651a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40651e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406522: ff e2 jmpq *%rdx
406524: 0f 0b ud2
406526: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40652d: 00 00 00
406530: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406537: 0f 28 4e f5 movaps -0xb(%rsi),%xmm1
40653b: 0f 28 56 05 movaps 0x5(%rsi),%xmm2
40653f: 0f 28 5e 15 movaps 0x15(%rsi),%xmm3
406543: 0f 28 66 25 movaps 0x25(%rsi),%xmm4
406547: 0f 28 6e 35 movaps 0x35(%rsi),%xmm5
40654b: 0f 28 76 45 movaps 0x45(%rsi),%xmm6
40654f: 0f 28 7e 55 movaps 0x55(%rsi),%xmm7
406553: 44 0f 28 46 65 movaps 0x65(%rsi),%xmm8
406558: 44 0f 28 4e 75 movaps 0x75(%rsi),%xmm9
40655d: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
406564: 66 45 0f 3a 0f c8 0b palignr $0xb,%xmm8,%xmm9
40656b: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
406570: 66 44 0f 3a 0f c7 0b palignr $0xb,%xmm7,%xmm8
406577: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
40657c: 66 0f 3a 0f fe 0b palignr $0xb,%xmm6,%xmm7
406582: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
406586: 66 0f 3a 0f f5 0b palignr $0xb,%xmm5,%xmm6
40658c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
406590: 66 0f 3a 0f ec 0b palignr $0xb,%xmm4,%xmm5
406596: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40659a: 66 0f 3a 0f e3 0b palignr $0xb,%xmm3,%xmm4
4065a0: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
4065a4: 66 0f 3a 0f da 0b palignr $0xb,%xmm2,%xmm3
4065aa: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
4065ae: 66 0f 3a 0f d1 0b palignr $0xb,%xmm1,%xmm2
4065b4: 0f 29 17 movaps %xmm2,(%rdi)
4065b7: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
4065be: 0f 83 6c ff ff ff jae 406530 <__intel_ssse3_rep_memcpy+0x1120>
4065c4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
4065c9: 48 81 c2 80 00 00 00 add $0x80,%rdx
4065d0: 48 01 d7 add %rdx,%rdi
4065d3: 48 01 d6 add %rdx,%rsi
4065d6: 4c 8d 1d 9b 2a 00 00 lea 0x2a9b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
4065dd: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
4065e1: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4065e5: ff e2 jmpq *%rdx
4065e7: 0f 0b ud2
4065e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4065f0: 0f 28 4e f5 movaps -0xb(%rsi),%xmm1
4065f4: 0f 28 56 e5 movaps -0x1b(%rsi),%xmm2
4065f8: 66 0f 3a 0f ca 0b palignr $0xb,%xmm2,%xmm1
4065fe: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406602: 0f 28 5e d5 movaps -0x2b(%rsi),%xmm3
406606: 66 0f 3a 0f d3 0b palignr $0xb,%xmm3,%xmm2
40660c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406610: 0f 28 66 c5 movaps -0x3b(%rsi),%xmm4
406614: 66 0f 3a 0f dc 0b palignr $0xb,%xmm4,%xmm3
40661a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40661e: 0f 28 6e b5 movaps -0x4b(%rsi),%xmm5
406622: 66 0f 3a 0f e5 0b palignr $0xb,%xmm5,%xmm4
406628: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
40662c: 0f 28 76 a5 movaps -0x5b(%rsi),%xmm6
406630: 66 0f 3a 0f ee 0b palignr $0xb,%xmm6,%xmm5
406636: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
40663a: 0f 28 7e 95 movaps -0x6b(%rsi),%xmm7
40663e: 66 0f 3a 0f f7 0b palignr $0xb,%xmm7,%xmm6
406644: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
406648: 44 0f 28 46 85 movaps -0x7b(%rsi),%xmm8
40664d: 66 41 0f 3a 0f f8 0b palignr $0xb,%xmm8,%xmm7
406654: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
406658: 44 0f 28 8e 75 ff ff movaps -0x8b(%rsi),%xmm9
40665f: ff
406660: 66 45 0f 3a 0f c1 0b palignr $0xb,%xmm9,%xmm8
406667: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
40666c: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406673: 48 8d 7f 80 lea -0x80(%rdi),%rdi
406677: 48 8d 76 80 lea -0x80(%rsi),%rsi
40667b: 0f 83 6f ff ff ff jae 4065f0 <__intel_ssse3_rep_memcpy+0x11e0>
406681: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406686: 48 81 c2 80 00 00 00 add $0x80,%rdx
40668d: 48 29 d7 sub %rdx,%rdi
406690: 48 29 d6 sub %rdx,%rsi
406693: 4c 8d 1d 9e 27 00 00 lea 0x279e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40669a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40669e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4066a2: ff e2 jmpq *%rdx
4066a4: 0f 0b ud2
4066a6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4066ad: 00 00 00
4066b0: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4066b7: 66 0f 6f 4e f4 movdqa -0xc(%rsi),%xmm1
4066bc: 0f 28 56 04 movaps 0x4(%rsi),%xmm2
4066c0: 0f 28 5e 14 movaps 0x14(%rsi),%xmm3
4066c4: 0f 28 66 24 movaps 0x24(%rsi),%xmm4
4066c8: 0f 28 6e 34 movaps 0x34(%rsi),%xmm5
4066cc: 0f 28 76 44 movaps 0x44(%rsi),%xmm6
4066d0: 0f 28 7e 54 movaps 0x54(%rsi),%xmm7
4066d4: 44 0f 28 46 64 movaps 0x64(%rsi),%xmm8
4066d9: 44 0f 28 4e 74 movaps 0x74(%rsi),%xmm9
4066de: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
4066e5: 66 45 0f 3a 0f c8 0c palignr $0xc,%xmm8,%xmm9
4066ec: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
4066f1: 66 44 0f 3a 0f c7 0c palignr $0xc,%xmm7,%xmm8
4066f8: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
4066fd: 66 0f 3a 0f fe 0c palignr $0xc,%xmm6,%xmm7
406703: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
406707: 66 0f 3a 0f f5 0c palignr $0xc,%xmm5,%xmm6
40670d: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
406711: 66 0f 3a 0f ec 0c palignr $0xc,%xmm4,%xmm5
406717: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40671b: 66 0f 3a 0f e3 0c palignr $0xc,%xmm3,%xmm4
406721: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
406725: 66 0f 3a 0f da 0c palignr $0xc,%xmm2,%xmm3
40672b: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
40672f: 66 0f 3a 0f d1 0c palignr $0xc,%xmm1,%xmm2
406735: 0f 29 17 movaps %xmm2,(%rdi)
406738: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
40673f: 0f 83 6b ff ff ff jae 4066b0 <__intel_ssse3_rep_memcpy+0x12a0>
406745: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
40674a: 48 81 c2 80 00 00 00 add $0x80,%rdx
406751: 48 01 d7 add %rdx,%rdi
406754: 48 01 d6 add %rdx,%rsi
406757: 4c 8d 1d 1a 29 00 00 lea 0x291a(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
40675e: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406762: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406766: ff e2 jmpq *%rdx
406768: 0f 0b ud2
40676a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
406770: 0f 28 4e f4 movaps -0xc(%rsi),%xmm1
406774: 0f 28 56 e4 movaps -0x1c(%rsi),%xmm2
406778: 66 0f 3a 0f ca 0c palignr $0xc,%xmm2,%xmm1
40677e: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406782: 0f 28 5e d4 movaps -0x2c(%rsi),%xmm3
406786: 66 0f 3a 0f d3 0c palignr $0xc,%xmm3,%xmm2
40678c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406790: 0f 28 66 c4 movaps -0x3c(%rsi),%xmm4
406794: 66 0f 3a 0f dc 0c palignr $0xc,%xmm4,%xmm3
40679a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40679e: 0f 28 6e b4 movaps -0x4c(%rsi),%xmm5
4067a2: 66 0f 3a 0f e5 0c palignr $0xc,%xmm5,%xmm4
4067a8: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
4067ac: 0f 28 76 a4 movaps -0x5c(%rsi),%xmm6
4067b0: 66 0f 3a 0f ee 0c palignr $0xc,%xmm6,%xmm5
4067b6: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
4067ba: 0f 28 7e 94 movaps -0x6c(%rsi),%xmm7
4067be: 66 0f 3a 0f f7 0c palignr $0xc,%xmm7,%xmm6
4067c4: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
4067c8: 44 0f 28 46 84 movaps -0x7c(%rsi),%xmm8
4067cd: 66 41 0f 3a 0f f8 0c palignr $0xc,%xmm8,%xmm7
4067d4: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
4067d8: 44 0f 28 8e 74 ff ff movaps -0x8c(%rsi),%xmm9
4067df: ff
4067e0: 66 45 0f 3a 0f c1 0c palignr $0xc,%xmm9,%xmm8
4067e7: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
4067ec: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4067f3: 48 8d 7f 80 lea -0x80(%rdi),%rdi
4067f7: 48 8d 76 80 lea -0x80(%rsi),%rsi
4067fb: 0f 83 6f ff ff ff jae 406770 <__intel_ssse3_rep_memcpy+0x1360>
406801: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406806: 48 81 c2 80 00 00 00 add $0x80,%rdx
40680d: 48 29 d7 sub %rdx,%rdi
406810: 48 29 d6 sub %rdx,%rsi
406813: 4c 8d 1d 1e 26 00 00 lea 0x261e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40681a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40681e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406822: ff e2 jmpq *%rdx
406824: 0f 0b ud2
406826: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40682d: 00 00 00
406830: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406837: 0f 28 4e f3 movaps -0xd(%rsi),%xmm1
40683b: 0f 28 56 03 movaps 0x3(%rsi),%xmm2
40683f: 0f 28 5e 13 movaps 0x13(%rsi),%xmm3
406843: 0f 28 66 23 movaps 0x23(%rsi),%xmm4
406847: 0f 28 6e 33 movaps 0x33(%rsi),%xmm5
40684b: 0f 28 76 43 movaps 0x43(%rsi),%xmm6
40684f: 0f 28 7e 53 movaps 0x53(%rsi),%xmm7
406853: 44 0f 28 46 63 movaps 0x63(%rsi),%xmm8
406858: 44 0f 28 4e 73 movaps 0x73(%rsi),%xmm9
40685d: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
406864: 66 45 0f 3a 0f c8 0d palignr $0xd,%xmm8,%xmm9
40686b: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
406870: 66 44 0f 3a 0f c7 0d palignr $0xd,%xmm7,%xmm8
406877: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
40687c: 66 0f 3a 0f fe 0d palignr $0xd,%xmm6,%xmm7
406882: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
406886: 66 0f 3a 0f f5 0d palignr $0xd,%xmm5,%xmm6
40688c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
406890: 66 0f 3a 0f ec 0d palignr $0xd,%xmm4,%xmm5
406896: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
40689a: 66 0f 3a 0f e3 0d palignr $0xd,%xmm3,%xmm4
4068a0: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
4068a4: 66 0f 3a 0f da 0d palignr $0xd,%xmm2,%xmm3
4068aa: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
4068ae: 66 0f 3a 0f d1 0d palignr $0xd,%xmm1,%xmm2
4068b4: 0f 29 17 movaps %xmm2,(%rdi)
4068b7: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
4068be: 0f 83 6c ff ff ff jae 406830 <__intel_ssse3_rep_memcpy+0x1420>
4068c4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
4068c9: 48 81 c2 80 00 00 00 add $0x80,%rdx
4068d0: 48 01 d7 add %rdx,%rdi
4068d3: 48 01 d6 add %rdx,%rsi
4068d6: 4c 8d 1d 9b 27 00 00 lea 0x279b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
4068dd: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
4068e1: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4068e5: ff e2 jmpq *%rdx
4068e7: 0f 0b ud2
4068e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4068f0: 0f 28 4e f3 movaps -0xd(%rsi),%xmm1
4068f4: 0f 28 56 e3 movaps -0x1d(%rsi),%xmm2
4068f8: 66 0f 3a 0f ca 0d palignr $0xd,%xmm2,%xmm1
4068fe: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406902: 0f 28 5e d3 movaps -0x2d(%rsi),%xmm3
406906: 66 0f 3a 0f d3 0d palignr $0xd,%xmm3,%xmm2
40690c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406910: 0f 28 66 c3 movaps -0x3d(%rsi),%xmm4
406914: 66 0f 3a 0f dc 0d palignr $0xd,%xmm4,%xmm3
40691a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
40691e: 0f 28 6e b3 movaps -0x4d(%rsi),%xmm5
406922: 66 0f 3a 0f e5 0d palignr $0xd,%xmm5,%xmm4
406928: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
40692c: 0f 28 76 a3 movaps -0x5d(%rsi),%xmm6
406930: 66 0f 3a 0f ee 0d palignr $0xd,%xmm6,%xmm5
406936: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
40693a: 0f 28 7e 93 movaps -0x6d(%rsi),%xmm7
40693e: 66 0f 3a 0f f7 0d palignr $0xd,%xmm7,%xmm6
406944: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
406948: 44 0f 28 46 83 movaps -0x7d(%rsi),%xmm8
40694d: 66 41 0f 3a 0f f8 0d palignr $0xd,%xmm8,%xmm7
406954: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
406958: 44 0f 28 8e 73 ff ff movaps -0x8d(%rsi),%xmm9
40695f: ff
406960: 66 45 0f 3a 0f c1 0d palignr $0xd,%xmm9,%xmm8
406967: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
40696c: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406973: 48 8d 7f 80 lea -0x80(%rdi),%rdi
406977: 48 8d 76 80 lea -0x80(%rsi),%rsi
40697b: 0f 83 6f ff ff ff jae 4068f0 <__intel_ssse3_rep_memcpy+0x14e0>
406981: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406986: 48 81 c2 80 00 00 00 add $0x80,%rdx
40698d: 48 29 d7 sub %rdx,%rdi
406990: 48 29 d6 sub %rdx,%rsi
406993: 4c 8d 1d 9e 24 00 00 lea 0x249e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
40699a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
40699e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
4069a2: ff e2 jmpq *%rdx
4069a4: 0f 0b ud2
4069a6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4069ad: 00 00 00
4069b0: 48 81 ea 80 00 00 00 sub $0x80,%rdx
4069b7: 0f 28 4e f2 movaps -0xe(%rsi),%xmm1
4069bb: 0f 28 56 02 movaps 0x2(%rsi),%xmm2
4069bf: 0f 28 5e 12 movaps 0x12(%rsi),%xmm3
4069c3: 0f 28 66 22 movaps 0x22(%rsi),%xmm4
4069c7: 0f 28 6e 32 movaps 0x32(%rsi),%xmm5
4069cb: 0f 28 76 42 movaps 0x42(%rsi),%xmm6
4069cf: 0f 28 7e 52 movaps 0x52(%rsi),%xmm7
4069d3: 44 0f 28 46 62 movaps 0x62(%rsi),%xmm8
4069d8: 44 0f 28 4e 72 movaps 0x72(%rsi),%xmm9
4069dd: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
4069e4: 66 45 0f 3a 0f c8 0e palignr $0xe,%xmm8,%xmm9
4069eb: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
4069f0: 66 44 0f 3a 0f c7 0e palignr $0xe,%xmm7,%xmm8
4069f7: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
4069fc: 66 0f 3a 0f fe 0e palignr $0xe,%xmm6,%xmm7
406a02: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
406a06: 66 0f 3a 0f f5 0e palignr $0xe,%xmm5,%xmm6
406a0c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
406a10: 66 0f 3a 0f ec 0e palignr $0xe,%xmm4,%xmm5
406a16: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
406a1a: 66 0f 3a 0f e3 0e palignr $0xe,%xmm3,%xmm4
406a20: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
406a24: 66 0f 3a 0f da 0e palignr $0xe,%xmm2,%xmm3
406a2a: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
406a2e: 66 0f 3a 0f d1 0e palignr $0xe,%xmm1,%xmm2
406a34: 0f 29 17 movaps %xmm2,(%rdi)
406a37: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
406a3e: 0f 83 6c ff ff ff jae 4069b0 <__intel_ssse3_rep_memcpy+0x15a0>
406a44: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406a49: 48 81 c2 80 00 00 00 add $0x80,%rdx
406a50: 48 01 d7 add %rdx,%rdi
406a53: 48 01 d6 add %rdx,%rsi
406a56: 4c 8d 1d 1b 26 00 00 lea 0x261b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
406a5d: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406a61: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406a65: ff e2 jmpq *%rdx
406a67: 0f 0b ud2
406a69: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
406a70: 0f 28 4e f2 movaps -0xe(%rsi),%xmm1
406a74: 0f 28 56 e2 movaps -0x1e(%rsi),%xmm2
406a78: 66 0f 3a 0f ca 0e palignr $0xe,%xmm2,%xmm1
406a7e: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406a82: 0f 28 5e d2 movaps -0x2e(%rsi),%xmm3
406a86: 66 0f 3a 0f d3 0e palignr $0xe,%xmm3,%xmm2
406a8c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406a90: 0f 28 66 c2 movaps -0x3e(%rsi),%xmm4
406a94: 66 0f 3a 0f dc 0e palignr $0xe,%xmm4,%xmm3
406a9a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
406a9e: 0f 28 6e b2 movaps -0x4e(%rsi),%xmm5
406aa2: 66 0f 3a 0f e5 0e palignr $0xe,%xmm5,%xmm4
406aa8: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
406aac: 0f 28 76 a2 movaps -0x5e(%rsi),%xmm6
406ab0: 66 0f 3a 0f ee 0e palignr $0xe,%xmm6,%xmm5
406ab6: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
406aba: 0f 28 7e 92 movaps -0x6e(%rsi),%xmm7
406abe: 66 0f 3a 0f f7 0e palignr $0xe,%xmm7,%xmm6
406ac4: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
406ac8: 44 0f 28 46 82 movaps -0x7e(%rsi),%xmm8
406acd: 66 41 0f 3a 0f f8 0e palignr $0xe,%xmm8,%xmm7
406ad4: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
406ad8: 44 0f 28 8e 72 ff ff movaps -0x8e(%rsi),%xmm9
406adf: ff
406ae0: 66 45 0f 3a 0f c1 0e palignr $0xe,%xmm9,%xmm8
406ae7: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
406aec: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406af3: 48 8d 7f 80 lea -0x80(%rdi),%rdi
406af7: 48 8d 76 80 lea -0x80(%rsi),%rsi
406afb: 0f 83 6f ff ff ff jae 406a70 <__intel_ssse3_rep_memcpy+0x1660>
406b01: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406b06: 48 81 c2 80 00 00 00 add $0x80,%rdx
406b0d: 48 29 d7 sub %rdx,%rdi
406b10: 48 29 d6 sub %rdx,%rsi
406b13: 4c 8d 1d 1e 23 00 00 lea 0x231e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
406b1a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406b1e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406b22: ff e2 jmpq *%rdx
406b24: 0f 0b ud2
406b26: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
406b2d: 00 00 00
406b30: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406b37: 0f 28 4e f1 movaps -0xf(%rsi),%xmm1
406b3b: 0f 28 56 01 movaps 0x1(%rsi),%xmm2
406b3f: 0f 28 5e 11 movaps 0x11(%rsi),%xmm3
406b43: 0f 28 66 21 movaps 0x21(%rsi),%xmm4
406b47: 0f 28 6e 31 movaps 0x31(%rsi),%xmm5
406b4b: 0f 28 76 41 movaps 0x41(%rsi),%xmm6
406b4f: 0f 28 7e 51 movaps 0x51(%rsi),%xmm7
406b53: 44 0f 28 46 61 movaps 0x61(%rsi),%xmm8
406b58: 44 0f 28 4e 71 movaps 0x71(%rsi),%xmm9
406b5d: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
406b64: 66 45 0f 3a 0f c8 0f palignr $0xf,%xmm8,%xmm9
406b6b: 44 0f 29 4f 70 movaps %xmm9,0x70(%rdi)
406b70: 66 44 0f 3a 0f c7 0f palignr $0xf,%xmm7,%xmm8
406b77: 44 0f 29 47 60 movaps %xmm8,0x60(%rdi)
406b7c: 66 0f 3a 0f fe 0f palignr $0xf,%xmm6,%xmm7
406b82: 0f 29 7f 50 movaps %xmm7,0x50(%rdi)
406b86: 66 0f 3a 0f f5 0f palignr $0xf,%xmm5,%xmm6
406b8c: 0f 29 77 40 movaps %xmm6,0x40(%rdi)
406b90: 66 0f 3a 0f ec 0f palignr $0xf,%xmm4,%xmm5
406b96: 0f 29 6f 30 movaps %xmm5,0x30(%rdi)
406b9a: 66 0f 3a 0f e3 0f palignr $0xf,%xmm3,%xmm4
406ba0: 0f 29 67 20 movaps %xmm4,0x20(%rdi)
406ba4: 66 0f 3a 0f da 0f palignr $0xf,%xmm2,%xmm3
406baa: 0f 29 5f 10 movaps %xmm3,0x10(%rdi)
406bae: 66 0f 3a 0f d1 0f palignr $0xf,%xmm1,%xmm2
406bb4: 0f 29 17 movaps %xmm2,(%rdi)
406bb7: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
406bbe: 0f 83 6c ff ff ff jae 406b30 <__intel_ssse3_rep_memcpy+0x1720>
406bc4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406bc9: 48 81 c2 80 00 00 00 add $0x80,%rdx
406bd0: 48 01 d7 add %rdx,%rdi
406bd3: 48 01 d6 add %rdx,%rsi
406bd6: 4c 8d 1d 9b 24 00 00 lea 0x249b(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
406bdd: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406be1: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406be5: ff e2 jmpq *%rdx
406be7: 0f 0b ud2
406be9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
406bf0: 0f 28 4e f1 movaps -0xf(%rsi),%xmm1
406bf4: 0f 28 56 e1 movaps -0x1f(%rsi),%xmm2
406bf8: 66 0f 3a 0f ca 0f palignr $0xf,%xmm2,%xmm1
406bfe: 0f 29 4f f0 movaps %xmm1,-0x10(%rdi)
406c02: 0f 28 5e d1 movaps -0x2f(%rsi),%xmm3
406c06: 66 0f 3a 0f d3 0f palignr $0xf,%xmm3,%xmm2
406c0c: 0f 29 57 e0 movaps %xmm2,-0x20(%rdi)
406c10: 0f 28 66 c1 movaps -0x3f(%rsi),%xmm4
406c14: 66 0f 3a 0f dc 0f palignr $0xf,%xmm4,%xmm3
406c1a: 0f 29 5f d0 movaps %xmm3,-0x30(%rdi)
406c1e: 0f 28 6e b1 movaps -0x4f(%rsi),%xmm5
406c22: 66 0f 3a 0f e5 0f palignr $0xf,%xmm5,%xmm4
406c28: 0f 29 67 c0 movaps %xmm4,-0x40(%rdi)
406c2c: 0f 28 76 a1 movaps -0x5f(%rsi),%xmm6
406c30: 66 0f 3a 0f ee 0f palignr $0xf,%xmm6,%xmm5
406c36: 0f 29 6f b0 movaps %xmm5,-0x50(%rdi)
406c3a: 0f 28 7e 91 movaps -0x6f(%rsi),%xmm7
406c3e: 66 0f 3a 0f f7 0f palignr $0xf,%xmm7,%xmm6
406c44: 0f 29 77 a0 movaps %xmm6,-0x60(%rdi)
406c48: 44 0f 28 46 81 movaps -0x7f(%rsi),%xmm8
406c4d: 66 41 0f 3a 0f f8 0f palignr $0xf,%xmm8,%xmm7
406c54: 0f 29 7f 90 movaps %xmm7,-0x70(%rdi)
406c58: 44 0f 28 8e 71 ff ff movaps -0x8f(%rsi),%xmm9
406c5f: ff
406c60: 66 45 0f 3a 0f c1 0f palignr $0xf,%xmm9,%xmm8
406c67: 44 0f 29 47 80 movaps %xmm8,-0x80(%rdi)
406c6c: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406c73: 48 8d 7f 80 lea -0x80(%rdi),%rdi
406c77: 48 8d 76 80 lea -0x80(%rsi),%rsi
406c7b: 0f 83 6f ff ff ff jae 406bf0 <__intel_ssse3_rep_memcpy+0x17e0>
406c81: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406c86: 48 81 c2 80 00 00 00 add $0x80,%rdx
406c8d: 48 29 d7 sub %rdx,%rdi
406c90: 48 29 d6 sub %rdx,%rsi
406c93: 4c 8d 1d 9e 21 00 00 lea 0x219e(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
406c9a: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406c9e: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406ca2: ff e2 jmpq *%rdx
406ca4: 0f 0b ud2
406ca6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
406cad: 00 00 00
406cb0: f3 0f 6f 0e movdqu (%rsi),%xmm1
406cb4: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406cb9: 66 0f 7f 0f movdqa %xmm1,(%rdi)
406cbd: 48 83 ea 10 sub $0x10,%rdx
406cc1: 48 83 c6 10 add $0x10,%rsi
406cc5: 48 83 c7 10 add $0x10,%rdi
406cc9: 8b 0d 35 3e 20 00 mov 0x203e35(%rip),%ecx # 60ab04 <__libirc_largest_cache_size_half>
406ccf: 48 39 ca cmp %rcx,%rdx
406cd2: 77 03 ja 406cd7 <__intel_ssse3_rep_memcpy+0x18c7>
406cd4: 48 89 d1 mov %rdx,%rcx
406cd7: 48 29 ca sub %rcx,%rdx
406cda: 48 81 fa 00 10 00 00 cmp $0x1000,%rdx
406ce1: 0f 86 a6 00 00 00 jbe 406d8d <__intel_ssse3_rep_memcpy+0x197d>
406ce7: 49 89 c9 mov %rcx,%r9
406cea: 49 c1 e1 03 shl $0x3,%r9
406cee: 4c 39 ca cmp %r9,%rdx
406cf1: 76 06 jbe 406cf9 <__intel_ssse3_rep_memcpy+0x18e9>
406cf3: 48 01 ca add %rcx,%rdx
406cf6: 48 31 c9 xor %rcx,%rcx
406cf9: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406d00: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406d07: 0f 18 8e 00 02 00 00 prefetcht0 0x200(%rsi)
406d0e: 0f 18 8e 00 03 00 00 prefetcht0 0x300(%rsi)
406d15: f3 0f 6f 06 movdqu (%rsi),%xmm0
406d19: f3 0f 6f 4e 10 movdqu 0x10(%rsi),%xmm1
406d1e: f3 0f 6f 56 20 movdqu 0x20(%rsi),%xmm2
406d23: f3 0f 6f 5e 30 movdqu 0x30(%rsi),%xmm3
406d28: f3 0f 6f 66 40 movdqu 0x40(%rsi),%xmm4
406d2d: f3 0f 6f 6e 50 movdqu 0x50(%rsi),%xmm5
406d32: f3 0f 6f 76 60 movdqu 0x60(%rsi),%xmm6
406d37: f3 0f 6f 7e 70 movdqu 0x70(%rsi),%xmm7
406d3c: 0f ae e8 lfence
406d3f: 66 0f e7 07 movntdq %xmm0,(%rdi)
406d43: 66 0f e7 4f 10 movntdq %xmm1,0x10(%rdi)
406d48: 66 0f e7 57 20 movntdq %xmm2,0x20(%rdi)
406d4d: 66 0f e7 5f 30 movntdq %xmm3,0x30(%rdi)
406d52: 66 0f e7 67 40 movntdq %xmm4,0x40(%rdi)
406d57: 66 0f e7 6f 50 movntdq %xmm5,0x50(%rdi)
406d5c: 66 0f e7 77 60 movntdq %xmm6,0x60(%rdi)
406d61: 66 0f e7 7f 70 movntdq %xmm7,0x70(%rdi)
406d66: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
406d6d: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
406d74: 73 8a jae 406d00 <__intel_ssse3_rep_memcpy+0x18f0>
406d76: 0f ae f8 sfence
406d79: 48 81 f9 80 00 00 00 cmp $0x80,%rcx
406d80: 0f 82 96 00 00 00 jb 406e1c <__intel_ssse3_rep_memcpy+0x1a0c>
406d86: 48 81 c2 80 00 00 00 add $0x80,%rdx
406d8d: 48 01 ca add %rcx,%rdx
406d90: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406d97: 0f 18 86 c0 01 00 00 prefetchnta 0x1c0(%rsi)
406d9e: 0f 18 86 80 02 00 00 prefetchnta 0x280(%rsi)
406da5: 0f 18 87 c0 01 00 00 prefetchnta 0x1c0(%rdi)
406dac: 0f 18 87 80 02 00 00 prefetchnta 0x280(%rdi)
406db3: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406dba: f3 0f 6f 06 movdqu (%rsi),%xmm0
406dbe: f3 0f 6f 4e 10 movdqu 0x10(%rsi),%xmm1
406dc3: f3 0f 6f 56 20 movdqu 0x20(%rsi),%xmm2
406dc8: f3 0f 6f 5e 30 movdqu 0x30(%rsi),%xmm3
406dcd: f3 0f 6f 66 40 movdqu 0x40(%rsi),%xmm4
406dd2: f3 0f 6f 6e 50 movdqu 0x50(%rsi),%xmm5
406dd7: f3 0f 6f 76 60 movdqu 0x60(%rsi),%xmm6
406ddc: f3 0f 6f 7e 70 movdqu 0x70(%rsi),%xmm7
406de1: 66 0f 7f 07 movdqa %xmm0,(%rdi)
406de5: 66 0f 7f 4f 10 movdqa %xmm1,0x10(%rdi)
406dea: 66 0f 7f 57 20 movdqa %xmm2,0x20(%rdi)
406def: 66 0f 7f 5f 30 movdqa %xmm3,0x30(%rdi)
406df4: 66 0f 7f 67 40 movdqa %xmm4,0x40(%rdi)
406df9: 66 0f 7f 6f 50 movdqa %xmm5,0x50(%rdi)
406dfe: 66 0f 7f 77 60 movdqa %xmm6,0x60(%rdi)
406e03: 66 0f 7f 7f 70 movdqa %xmm7,0x70(%rdi)
406e08: 48 8d b6 80 00 00 00 lea 0x80(%rsi),%rsi
406e0f: 48 8d bf 80 00 00 00 lea 0x80(%rdi),%rdi
406e16: 0f 83 7b ff ff ff jae 406d97 <__intel_ssse3_rep_memcpy+0x1987>
406e1c: 48 81 c2 80 00 00 00 add $0x80,%rdx
406e23: 48 01 d6 add %rdx,%rsi
406e26: 48 01 d7 add %rdx,%rdi
406e29: 4c 8d 1d 48 22 00 00 lea 0x2248(%rip),%r11 # 409078 <.L_2il0floatpacket.29+0x40c>
406e30: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406e34: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406e38: ff e2 jmpq *%rdx
406e3a: 0f 0b ud2
406e3c: 0f 1f 40 00 nopl 0x0(%rax)
406e40: 48 01 d6 add %rdx,%rsi
406e43: 48 01 d7 add %rdx,%rdi
406e46: f3 0f 6f 46 f0 movdqu -0x10(%rsi),%xmm0
406e4b: 4c 8d 47 f0 lea -0x10(%rdi),%r8
406e4f: 49 89 f9 mov %rdi,%r9
406e52: 48 83 e7 f0 and $0xfffffffffffffff0,%rdi
406e56: 49 29 f9 sub %rdi,%r9
406e59: 4c 29 ce sub %r9,%rsi
406e5c: 4c 29 ca sub %r9,%rdx
406e5f: 8b 0d 9f 3c 20 00 mov 0x203c9f(%rip),%ecx # 60ab04 <__libirc_largest_cache_size_half>
406e65: 48 39 ca cmp %rcx,%rdx
406e68: 77 03 ja 406e6d <__intel_ssse3_rep_memcpy+0x1a5d>
406e6a: 48 89 d1 mov %rdx,%rcx
406e6d: 48 29 ca sub %rcx,%rdx
406e70: 48 81 fa 00 10 00 00 cmp $0x1000,%rdx
406e77: 0f 86 a4 00 00 00 jbe 406f21 <__intel_ssse3_rep_memcpy+0x1b11>
406e7d: 49 89 c9 mov %rcx,%r9
406e80: 49 c1 e1 03 shl $0x3,%r9
406e84: 4c 39 ca cmp %r9,%rdx
406e87: 76 06 jbe 406e8f <__intel_ssse3_rep_memcpy+0x1a7f>
406e89: 48 01 ca add %rcx,%rdx
406e8c: 48 31 c9 xor %rcx,%rcx
406e8f: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406e96: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406e9d: 0f 18 8e 00 fe ff ff prefetcht0 -0x200(%rsi)
406ea4: 0f 18 8e 00 fd ff ff prefetcht0 -0x300(%rsi)
406eab: f3 0f 6f 4e f0 movdqu -0x10(%rsi),%xmm1
406eb0: f3 0f 6f 56 e0 movdqu -0x20(%rsi),%xmm2
406eb5: f3 0f 6f 5e d0 movdqu -0x30(%rsi),%xmm3
406eba: f3 0f 6f 66 c0 movdqu -0x40(%rsi),%xmm4
406ebf: f3 0f 6f 6e b0 movdqu -0x50(%rsi),%xmm5
406ec4: f3 0f 6f 76 a0 movdqu -0x60(%rsi),%xmm6
406ec9: f3 0f 6f 7e 90 movdqu -0x70(%rsi),%xmm7
406ece: f3 44 0f 6f 46 80 movdqu -0x80(%rsi),%xmm8
406ed4: 0f ae e8 lfence
406ed7: 66 0f e7 4f f0 movntdq %xmm1,-0x10(%rdi)
406edc: 66 0f e7 57 e0 movntdq %xmm2,-0x20(%rdi)
406ee1: 66 0f e7 5f d0 movntdq %xmm3,-0x30(%rdi)
406ee6: 66 0f e7 67 c0 movntdq %xmm4,-0x40(%rdi)
406eeb: 66 0f e7 6f b0 movntdq %xmm5,-0x50(%rdi)
406ef0: 66 0f e7 77 a0 movntdq %xmm6,-0x60(%rdi)
406ef5: 66 0f e7 7f 90 movntdq %xmm7,-0x70(%rdi)
406efa: 66 44 0f e7 47 80 movntdq %xmm8,-0x80(%rdi)
406f00: 48 8d 76 80 lea -0x80(%rsi),%rsi
406f04: 48 8d 7f 80 lea -0x80(%rdi),%rdi
406f08: 73 8c jae 406e96 <__intel_ssse3_rep_memcpy+0x1a86>
406f0a: 0f ae f8 sfence
406f0d: 48 81 f9 80 00 00 00 cmp $0x80,%rcx
406f14: 0f 82 90 00 00 00 jb 406faa <__intel_ssse3_rep_memcpy+0x1b9a>
406f1a: 48 81 c2 80 00 00 00 add $0x80,%rdx
406f21: 48 01 ca add %rcx,%rdx
406f24: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406f2b: 0f 18 86 40 fe ff ff prefetchnta -0x1c0(%rsi)
406f32: 0f 18 86 80 fd ff ff prefetchnta -0x280(%rsi)
406f39: 0f 18 87 40 fe ff ff prefetchnta -0x1c0(%rdi)
406f40: 0f 18 87 80 fd ff ff prefetchnta -0x280(%rdi)
406f47: 48 81 ea 80 00 00 00 sub $0x80,%rdx
406f4e: f3 0f 6f 4e f0 movdqu -0x10(%rsi),%xmm1
406f53: f3 0f 6f 56 e0 movdqu -0x20(%rsi),%xmm2
406f58: f3 0f 6f 5e d0 movdqu -0x30(%rsi),%xmm3
406f5d: f3 0f 6f 66 c0 movdqu -0x40(%rsi),%xmm4
406f62: f3 0f 6f 6e b0 movdqu -0x50(%rsi),%xmm5
406f67: f3 0f 6f 76 a0 movdqu -0x60(%rsi),%xmm6
406f6c: f3 0f 6f 7e 90 movdqu -0x70(%rsi),%xmm7
406f71: f3 44 0f 6f 46 80 movdqu -0x80(%rsi),%xmm8
406f77: 66 0f 7f 4f f0 movdqa %xmm1,-0x10(%rdi)
406f7c: 66 0f 7f 57 e0 movdqa %xmm2,-0x20(%rdi)
406f81: 66 0f 7f 5f d0 movdqa %xmm3,-0x30(%rdi)
406f86: 66 0f 7f 67 c0 movdqa %xmm4,-0x40(%rdi)
406f8b: 66 0f 7f 6f b0 movdqa %xmm5,-0x50(%rdi)
406f90: 66 0f 7f 77 a0 movdqa %xmm6,-0x60(%rdi)
406f95: 66 0f 7f 7f 90 movdqa %xmm7,-0x70(%rdi)
406f9a: 66 44 0f 7f 47 80 movdqa %xmm8,-0x80(%rdi)
406fa0: 48 8d 76 80 lea -0x80(%rsi),%rsi
406fa4: 48 8d 7f 80 lea -0x80(%rdi),%rdi
406fa8: 73 81 jae 406f2b <__intel_ssse3_rep_memcpy+0x1b1b>
406faa: f3 41 0f 7f 00 movdqu %xmm0,(%r8)
406faf: 48 81 c2 80 00 00 00 add $0x80,%rdx
406fb6: 48 29 d6 sub %rdx,%rsi
406fb9: 48 29 d7 sub %rdx,%rdi
406fbc: 4c 8d 1d 75 1e 00 00 lea 0x1e75(%rip),%r11 # 408e38 <.L_2il0floatpacket.29+0x1cc>
406fc3: 49 63 14 93 movslq (%r11,%rdx,4),%rdx
406fc7: 49 8d 14 13 lea (%r11,%rdx,1),%rdx
406fcb: ff e2 jmpq *%rdx
406fcd: 0f 0b ud2
406fcf: 90 nop
406fd0: f2 0f f0 46 80 lddqu -0x80(%rsi),%xmm0
406fd5: f3 0f 7f 47 80 movdqu %xmm0,-0x80(%rdi)
406fda: f2 0f f0 46 90 lddqu -0x70(%rsi),%xmm0
406fdf: f3 0f 7f 47 90 movdqu %xmm0,-0x70(%rdi)
406fe4: f2 0f f0 46 a0 lddqu -0x60(%rsi),%xmm0
406fe9: f3 0f 7f 47 a0 movdqu %xmm0,-0x60(%rdi)
406fee: f2 0f f0 46 b0 lddqu -0x50(%rsi),%xmm0
406ff3: f3 0f 7f 47 b0 movdqu %xmm0,-0x50(%rdi)
406ff8: f2 0f f0 46 c0 lddqu -0x40(%rsi),%xmm0
406ffd: f3 0f 7f 47 c0 movdqu %xmm0,-0x40(%rdi)
407002: f2 0f f0 46 d0 lddqu -0x30(%rsi),%xmm0
407007: f3 0f 7f 47 d0 movdqu %xmm0,-0x30(%rdi)
40700c: f2 0f f0 46 e0 lddqu -0x20(%rsi),%xmm0
407011: f3 0f 7f 47 e0 movdqu %xmm0,-0x20(%rdi)
407016: f2 0f f0 46 f0 lddqu -0x10(%rsi),%xmm0
40701b: f3 0f 7f 47 f0 movdqu %xmm0,-0x10(%rdi)
407020: c3 retq
407021: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407028: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40702f: 00
407030: f2 0f f0 86 71 ff ff lddqu -0x8f(%rsi),%xmm0
407037: ff
407038: f3 0f 7f 87 71 ff ff movdqu %xmm0,-0x8f(%rdi)
40703f: ff
407040: f2 0f f0 46 81 lddqu -0x7f(%rsi),%xmm0
407045: f3 0f 7f 47 81 movdqu %xmm0,-0x7f(%rdi)
40704a: f2 0f f0 46 91 lddqu -0x6f(%rsi),%xmm0
40704f: f3 0f 7f 47 91 movdqu %xmm0,-0x6f(%rdi)
407054: f2 0f f0 46 a1 lddqu -0x5f(%rsi),%xmm0
407059: f3 0f 7f 47 a1 movdqu %xmm0,-0x5f(%rdi)
40705e: f2 0f f0 46 b1 lddqu -0x4f(%rsi),%xmm0
407063: f3 0f 7f 47 b1 movdqu %xmm0,-0x4f(%rdi)
407068: f2 0f f0 46 c1 lddqu -0x3f(%rsi),%xmm0
40706d: f3 0f 7f 47 c1 movdqu %xmm0,-0x3f(%rdi)
407072: f2 0f f0 46 d1 lddqu -0x2f(%rsi),%xmm0
407077: f3 0f 7f 47 d1 movdqu %xmm0,-0x2f(%rdi)
40707c: f2 0f f0 46 e1 lddqu -0x1f(%rsi),%xmm0
407081: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
407086: f3 0f 7f 47 e1 movdqu %xmm0,-0x1f(%rdi)
40708b: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
407090: c3 retq
407091: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407098: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40709f: 00
4070a0: 48 8b 56 f1 mov -0xf(%rsi),%rdx
4070a4: 48 8b 4e f8 mov -0x8(%rsi),%rcx
4070a8: 48 89 57 f1 mov %rdx,-0xf(%rdi)
4070ac: 48 89 4f f8 mov %rcx,-0x8(%rdi)
4070b0: c3 retq
4070b1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4070b8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4070bf: 00
4070c0: f2 0f f0 86 72 ff ff lddqu -0x8e(%rsi),%xmm0
4070c7: ff
4070c8: f3 0f 7f 87 72 ff ff movdqu %xmm0,-0x8e(%rdi)
4070cf: ff
4070d0: f2 0f f0 46 82 lddqu -0x7e(%rsi),%xmm0
4070d5: f3 0f 7f 47 82 movdqu %xmm0,-0x7e(%rdi)
4070da: f2 0f f0 46 92 lddqu -0x6e(%rsi),%xmm0
4070df: f3 0f 7f 47 92 movdqu %xmm0,-0x6e(%rdi)
4070e4: f2 0f f0 46 a2 lddqu -0x5e(%rsi),%xmm0
4070e9: f3 0f 7f 47 a2 movdqu %xmm0,-0x5e(%rdi)
4070ee: f2 0f f0 46 b2 lddqu -0x4e(%rsi),%xmm0
4070f3: f3 0f 7f 47 b2 movdqu %xmm0,-0x4e(%rdi)
4070f8: f2 0f f0 46 c2 lddqu -0x3e(%rsi),%xmm0
4070fd: f3 0f 7f 47 c2 movdqu %xmm0,-0x3e(%rdi)
407102: f2 0f f0 46 d2 lddqu -0x2e(%rsi),%xmm0
407107: f3 0f 7f 47 d2 movdqu %xmm0,-0x2e(%rdi)
40710c: f2 0f f0 46 e2 lddqu -0x1e(%rsi),%xmm0
407111: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
407116: f3 0f 7f 47 e2 movdqu %xmm0,-0x1e(%rdi)
40711b: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
407120: c3 retq
407121: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407128: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40712f: 00
407130: 48 8b 56 f2 mov -0xe(%rsi),%rdx
407134: 48 8b 4e f8 mov -0x8(%rsi),%rcx
407138: 48 89 57 f2 mov %rdx,-0xe(%rdi)
40713c: 48 89 4f f8 mov %rcx,-0x8(%rdi)
407140: c3 retq
407141: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407148: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40714f: 00
407150: f2 0f f0 86 73 ff ff lddqu -0x8d(%rsi),%xmm0
407157: ff
407158: f3 0f 7f 87 73 ff ff movdqu %xmm0,-0x8d(%rdi)
40715f: ff
407160: f2 0f f0 46 83 lddqu -0x7d(%rsi),%xmm0
407165: f3 0f 7f 47 83 movdqu %xmm0,-0x7d(%rdi)
40716a: f2 0f f0 46 93 lddqu -0x6d(%rsi),%xmm0
40716f: f3 0f 7f 47 93 movdqu %xmm0,-0x6d(%rdi)
407174: f2 0f f0 46 a3 lddqu -0x5d(%rsi),%xmm0
407179: f3 0f 7f 47 a3 movdqu %xmm0,-0x5d(%rdi)
40717e: f2 0f f0 46 b3 lddqu -0x4d(%rsi),%xmm0
407183: f3 0f 7f 47 b3 movdqu %xmm0,-0x4d(%rdi)
407188: f2 0f f0 46 c3 lddqu -0x3d(%rsi),%xmm0
40718d: f3 0f 7f 47 c3 movdqu %xmm0,-0x3d(%rdi)
407192: f2 0f f0 46 d3 lddqu -0x2d(%rsi),%xmm0
407197: f3 0f 7f 47 d3 movdqu %xmm0,-0x2d(%rdi)
40719c: f2 0f f0 46 e3 lddqu -0x1d(%rsi),%xmm0
4071a1: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
4071a6: f3 0f 7f 47 e3 movdqu %xmm0,-0x1d(%rdi)
4071ab: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
4071b0: c3 retq
4071b1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4071b8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4071bf: 00
4071c0: 48 8b 56 f3 mov -0xd(%rsi),%rdx
4071c4: 48 8b 4e f8 mov -0x8(%rsi),%rcx
4071c8: 48 89 57 f3 mov %rdx,-0xd(%rdi)
4071cc: 48 89 4f f8 mov %rcx,-0x8(%rdi)
4071d0: c3 retq
4071d1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4071d8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4071df: 00
4071e0: f2 0f f0 86 74 ff ff lddqu -0x8c(%rsi),%xmm0
4071e7: ff
4071e8: f3 0f 7f 87 74 ff ff movdqu %xmm0,-0x8c(%rdi)
4071ef: ff
4071f0: f2 0f f0 46 84 lddqu -0x7c(%rsi),%xmm0
4071f5: f3 0f 7f 47 84 movdqu %xmm0,-0x7c(%rdi)
4071fa: f2 0f f0 46 94 lddqu -0x6c(%rsi),%xmm0
4071ff: f3 0f 7f 47 94 movdqu %xmm0,-0x6c(%rdi)
407204: f2 0f f0 46 a4 lddqu -0x5c(%rsi),%xmm0
407209: f3 0f 7f 47 a4 movdqu %xmm0,-0x5c(%rdi)
40720e: f2 0f f0 46 b4 lddqu -0x4c(%rsi),%xmm0
407213: f3 0f 7f 47 b4 movdqu %xmm0,-0x4c(%rdi)
407218: f2 0f f0 46 c4 lddqu -0x3c(%rsi),%xmm0
40721d: f3 0f 7f 47 c4 movdqu %xmm0,-0x3c(%rdi)
407222: f2 0f f0 46 d4 lddqu -0x2c(%rsi),%xmm0
407227: f3 0f 7f 47 d4 movdqu %xmm0,-0x2c(%rdi)
40722c: f2 0f f0 46 e4 lddqu -0x1c(%rsi),%xmm0
407231: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
407236: f3 0f 7f 47 e4 movdqu %xmm0,-0x1c(%rdi)
40723b: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
407240: c3 retq
407241: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407248: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40724f: 00
407250: 48 8b 56 f4 mov -0xc(%rsi),%rdx
407254: 8b 4e fc mov -0x4(%rsi),%ecx
407257: 48 89 57 f4 mov %rdx,-0xc(%rdi)
40725b: 89 4f fc mov %ecx,-0x4(%rdi)
40725e: c3 retq
40725f: 90 nop
407260: f2 0f f0 86 75 ff ff lddqu -0x8b(%rsi),%xmm0
407267: ff
407268: f3 0f 7f 87 75 ff ff movdqu %xmm0,-0x8b(%rdi)
40726f: ff
407270: f2 0f f0 46 85 lddqu -0x7b(%rsi),%xmm0
407275: f3 0f 7f 47 85 movdqu %xmm0,-0x7b(%rdi)
40727a: f2 0f f0 46 95 lddqu -0x6b(%rsi),%xmm0
40727f: f3 0f 7f 47 95 movdqu %xmm0,-0x6b(%rdi)
407284: f2 0f f0 46 a5 lddqu -0x5b(%rsi),%xmm0
407289: f3 0f 7f 47 a5 movdqu %xmm0,-0x5b(%rdi)
40728e: f2 0f f0 46 b5 lddqu -0x4b(%rsi),%xmm0
407293: f3 0f 7f 47 b5 movdqu %xmm0,-0x4b(%rdi)
407298: f2 0f f0 46 c5 lddqu -0x3b(%rsi),%xmm0
40729d: f3 0f 7f 47 c5 movdqu %xmm0,-0x3b(%rdi)
4072a2: f2 0f f0 46 d5 lddqu -0x2b(%rsi),%xmm0
4072a7: f3 0f 7f 47 d5 movdqu %xmm0,-0x2b(%rdi)
4072ac: f2 0f f0 46 e5 lddqu -0x1b(%rsi),%xmm0
4072b1: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
4072b6: f3 0f 7f 47 e5 movdqu %xmm0,-0x1b(%rdi)
4072bb: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
4072c0: c3 retq
4072c1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4072c8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4072cf: 00
4072d0: 48 8b 56 f5 mov -0xb(%rsi),%rdx
4072d4: 8b 4e fc mov -0x4(%rsi),%ecx
4072d7: 48 89 57 f5 mov %rdx,-0xb(%rdi)
4072db: 89 4f fc mov %ecx,-0x4(%rdi)
4072de: c3 retq
4072df: 90 nop
4072e0: f2 0f f0 86 76 ff ff lddqu -0x8a(%rsi),%xmm0
4072e7: ff
4072e8: f3 0f 7f 87 76 ff ff movdqu %xmm0,-0x8a(%rdi)
4072ef: ff
4072f0: f2 0f f0 46 86 lddqu -0x7a(%rsi),%xmm0
4072f5: f3 0f 7f 47 86 movdqu %xmm0,-0x7a(%rdi)
4072fa: f2 0f f0 46 96 lddqu -0x6a(%rsi),%xmm0
4072ff: f3 0f 7f 47 96 movdqu %xmm0,-0x6a(%rdi)
407304: f2 0f f0 46 a6 lddqu -0x5a(%rsi),%xmm0
407309: f3 0f 7f 47 a6 movdqu %xmm0,-0x5a(%rdi)
40730e: f2 0f f0 46 b6 lddqu -0x4a(%rsi),%xmm0
407313: f3 0f 7f 47 b6 movdqu %xmm0,-0x4a(%rdi)
407318: f2 0f f0 46 c6 lddqu -0x3a(%rsi),%xmm0
40731d: f3 0f 7f 47 c6 movdqu %xmm0,-0x3a(%rdi)
407322: f2 0f f0 46 d6 lddqu -0x2a(%rsi),%xmm0
407327: f3 0f 7f 47 d6 movdqu %xmm0,-0x2a(%rdi)
40732c: f2 0f f0 46 e6 lddqu -0x1a(%rsi),%xmm0
407331: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
407336: f3 0f 7f 47 e6 movdqu %xmm0,-0x1a(%rdi)
40733b: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
407340: c3 retq
407341: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407348: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40734f: 00
407350: 48 8b 56 f6 mov -0xa(%rsi),%rdx
407354: 8b 4e fc mov -0x4(%rsi),%ecx
407357: 48 89 57 f6 mov %rdx,-0xa(%rdi)
40735b: 89 4f fc mov %ecx,-0x4(%rdi)
40735e: c3 retq
40735f: 90 nop
407360: f2 0f f0 86 77 ff ff lddqu -0x89(%rsi),%xmm0
407367: ff
407368: f3 0f 7f 87 77 ff ff movdqu %xmm0,-0x89(%rdi)
40736f: ff
407370: f2 0f f0 46 87 lddqu -0x79(%rsi),%xmm0
407375: f3 0f 7f 47 87 movdqu %xmm0,-0x79(%rdi)
40737a: f2 0f f0 46 97 lddqu -0x69(%rsi),%xmm0
40737f: f3 0f 7f 47 97 movdqu %xmm0,-0x69(%rdi)
407384: f2 0f f0 46 a7 lddqu -0x59(%rsi),%xmm0
407389: f3 0f 7f 47 a7 movdqu %xmm0,-0x59(%rdi)
40738e: f2 0f f0 46 b7 lddqu -0x49(%rsi),%xmm0
407393: f3 0f 7f 47 b7 movdqu %xmm0,-0x49(%rdi)
407398: f2 0f f0 46 c7 lddqu -0x39(%rsi),%xmm0
40739d: f3 0f 7f 47 c7 movdqu %xmm0,-0x39(%rdi)
4073a2: f2 0f f0 46 d7 lddqu -0x29(%rsi),%xmm0
4073a7: f3 0f 7f 47 d7 movdqu %xmm0,-0x29(%rdi)
4073ac: f2 0f f0 46 e7 lddqu -0x19(%rsi),%xmm0
4073b1: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
4073b6: f3 0f 7f 47 e7 movdqu %xmm0,-0x19(%rdi)
4073bb: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
4073c0: c3 retq
4073c1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4073c8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4073cf: 00
4073d0: 48 8b 56 f7 mov -0x9(%rsi),%rdx
4073d4: 8b 4e fc mov -0x4(%rsi),%ecx
4073d7: 48 89 57 f7 mov %rdx,-0x9(%rdi)
4073db: 89 4f fc mov %ecx,-0x4(%rdi)
4073de: c3 retq
4073df: 90 nop
4073e0: f2 0f f0 86 78 ff ff lddqu -0x88(%rsi),%xmm0
4073e7: ff
4073e8: f3 0f 7f 87 78 ff ff movdqu %xmm0,-0x88(%rdi)
4073ef: ff
4073f0: f2 0f f0 46 88 lddqu -0x78(%rsi),%xmm0
4073f5: f3 0f 7f 47 88 movdqu %xmm0,-0x78(%rdi)
4073fa: f2 0f f0 46 98 lddqu -0x68(%rsi),%xmm0
4073ff: f3 0f 7f 47 98 movdqu %xmm0,-0x68(%rdi)
407404: f2 0f f0 46 a8 lddqu -0x58(%rsi),%xmm0
407409: f3 0f 7f 47 a8 movdqu %xmm0,-0x58(%rdi)
40740e: f2 0f f0 46 b8 lddqu -0x48(%rsi),%xmm0
407413: f3 0f 7f 47 b8 movdqu %xmm0,-0x48(%rdi)
407418: f2 0f f0 46 c8 lddqu -0x38(%rsi),%xmm0
40741d: f3 0f 7f 47 c8 movdqu %xmm0,-0x38(%rdi)
407422: f2 0f f0 46 d8 lddqu -0x28(%rsi),%xmm0
407427: f3 0f 7f 47 d8 movdqu %xmm0,-0x28(%rdi)
40742c: f2 0f f0 46 e8 lddqu -0x18(%rsi),%xmm0
407431: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
407436: f3 0f 7f 47 e8 movdqu %xmm0,-0x18(%rdi)
40743b: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
407440: c3 retq
407441: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407448: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40744f: 00
407450: 48 8b 56 f8 mov -0x8(%rsi),%rdx
407454: 48 89 57 f8 mov %rdx,-0x8(%rdi)
407458: c3 retq
407459: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407460: f2 0f f0 86 79 ff ff lddqu -0x87(%rsi),%xmm0
407467: ff
407468: f3 0f 7f 87 79 ff ff movdqu %xmm0,-0x87(%rdi)
40746f: ff
407470: f2 0f f0 46 89 lddqu -0x77(%rsi),%xmm0
407475: f3 0f 7f 47 89 movdqu %xmm0,-0x77(%rdi)
40747a: f2 0f f0 46 99 lddqu -0x67(%rsi),%xmm0
40747f: f3 0f 7f 47 99 movdqu %xmm0,-0x67(%rdi)
407484: f2 0f f0 46 a9 lddqu -0x57(%rsi),%xmm0
407489: f3 0f 7f 47 a9 movdqu %xmm0,-0x57(%rdi)
40748e: f2 0f f0 46 b9 lddqu -0x47(%rsi),%xmm0
407493: f3 0f 7f 47 b9 movdqu %xmm0,-0x47(%rdi)
407498: f2 0f f0 46 c9 lddqu -0x37(%rsi),%xmm0
40749d: f3 0f 7f 47 c9 movdqu %xmm0,-0x37(%rdi)
4074a2: f2 0f f0 46 d9 lddqu -0x27(%rsi),%xmm0
4074a7: f3 0f 7f 47 d9 movdqu %xmm0,-0x27(%rdi)
4074ac: f2 0f f0 46 e9 lddqu -0x17(%rsi),%xmm0
4074b1: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
4074b6: f3 0f 7f 47 e9 movdqu %xmm0,-0x17(%rdi)
4074bb: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
4074c0: c3 retq
4074c1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4074c8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4074cf: 00
4074d0: 8b 56 f9 mov -0x7(%rsi),%edx
4074d3: 8b 4e fc mov -0x4(%rsi),%ecx
4074d6: 89 57 f9 mov %edx,-0x7(%rdi)
4074d9: 89 4f fc mov %ecx,-0x4(%rdi)
4074dc: c3 retq
4074dd: 0f 1f 00 nopl (%rax)
4074e0: f2 0f f0 86 7a ff ff lddqu -0x86(%rsi),%xmm0
4074e7: ff
4074e8: f3 0f 7f 87 7a ff ff movdqu %xmm0,-0x86(%rdi)
4074ef: ff
4074f0: f2 0f f0 46 8a lddqu -0x76(%rsi),%xmm0
4074f5: f3 0f 7f 47 8a movdqu %xmm0,-0x76(%rdi)
4074fa: f2 0f f0 46 9a lddqu -0x66(%rsi),%xmm0
4074ff: f3 0f 7f 47 9a movdqu %xmm0,-0x66(%rdi)
407504: f2 0f f0 46 aa lddqu -0x56(%rsi),%xmm0
407509: f3 0f 7f 47 aa movdqu %xmm0,-0x56(%rdi)
40750e: f2 0f f0 46 ba lddqu -0x46(%rsi),%xmm0
407513: f3 0f 7f 47 ba movdqu %xmm0,-0x46(%rdi)
407518: f2 0f f0 46 ca lddqu -0x36(%rsi),%xmm0
40751d: f3 0f 7f 47 ca movdqu %xmm0,-0x36(%rdi)
407522: f2 0f f0 46 da lddqu -0x26(%rsi),%xmm0
407527: f3 0f 7f 47 da movdqu %xmm0,-0x26(%rdi)
40752c: f2 0f f0 46 ea lddqu -0x16(%rsi),%xmm0
407531: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
407536: f3 0f 7f 47 ea movdqu %xmm0,-0x16(%rdi)
40753b: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
407540: c3 retq
407541: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407548: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40754f: 00
407550: 8b 56 fa mov -0x6(%rsi),%edx
407553: 8b 4e fc mov -0x4(%rsi),%ecx
407556: 89 57 fa mov %edx,-0x6(%rdi)
407559: 89 4f fc mov %ecx,-0x4(%rdi)
40755c: c3 retq
40755d: 0f 1f 00 nopl (%rax)
407560: f2 0f f0 86 7b ff ff lddqu -0x85(%rsi),%xmm0
407567: ff
407568: f3 0f 7f 87 7b ff ff movdqu %xmm0,-0x85(%rdi)
40756f: ff
407570: f2 0f f0 46 8b lddqu -0x75(%rsi),%xmm0
407575: f3 0f 7f 47 8b movdqu %xmm0,-0x75(%rdi)
40757a: f2 0f f0 46 9b lddqu -0x65(%rsi),%xmm0
40757f: f3 0f 7f 47 9b movdqu %xmm0,-0x65(%rdi)
407584: f2 0f f0 46 ab lddqu -0x55(%rsi),%xmm0
407589: f3 0f 7f 47 ab movdqu %xmm0,-0x55(%rdi)
40758e: f2 0f f0 46 bb lddqu -0x45(%rsi),%xmm0
407593: f3 0f 7f 47 bb movdqu %xmm0,-0x45(%rdi)
407598: f2 0f f0 46 cb lddqu -0x35(%rsi),%xmm0
40759d: f3 0f 7f 47 cb movdqu %xmm0,-0x35(%rdi)
4075a2: f2 0f f0 46 db lddqu -0x25(%rsi),%xmm0
4075a7: f3 0f 7f 47 db movdqu %xmm0,-0x25(%rdi)
4075ac: f2 0f f0 46 eb lddqu -0x15(%rsi),%xmm0
4075b1: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
4075b6: f3 0f 7f 47 eb movdqu %xmm0,-0x15(%rdi)
4075bb: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
4075c0: c3 retq
4075c1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4075c8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4075cf: 00
4075d0: 8b 56 fb mov -0x5(%rsi),%edx
4075d3: 8b 4e fc mov -0x4(%rsi),%ecx
4075d6: 89 57 fb mov %edx,-0x5(%rdi)
4075d9: 89 4f fc mov %ecx,-0x4(%rdi)
4075dc: c3 retq
4075dd: 0f 1f 00 nopl (%rax)
4075e0: f2 0f f0 86 7c ff ff lddqu -0x84(%rsi),%xmm0
4075e7: ff
4075e8: f3 0f 7f 87 7c ff ff movdqu %xmm0,-0x84(%rdi)
4075ef: ff
4075f0: f2 0f f0 46 8c lddqu -0x74(%rsi),%xmm0
4075f5: f3 0f 7f 47 8c movdqu %xmm0,-0x74(%rdi)
4075fa: f2 0f f0 46 9c lddqu -0x64(%rsi),%xmm0
4075ff: f3 0f 7f 47 9c movdqu %xmm0,-0x64(%rdi)
407604: f2 0f f0 46 ac lddqu -0x54(%rsi),%xmm0
407609: f3 0f 7f 47 ac movdqu %xmm0,-0x54(%rdi)
40760e: f2 0f f0 46 bc lddqu -0x44(%rsi),%xmm0
407613: f3 0f 7f 47 bc movdqu %xmm0,-0x44(%rdi)
407618: f2 0f f0 46 cc lddqu -0x34(%rsi),%xmm0
40761d: f3 0f 7f 47 cc movdqu %xmm0,-0x34(%rdi)
407622: f2 0f f0 46 dc lddqu -0x24(%rsi),%xmm0
407627: f3 0f 7f 47 dc movdqu %xmm0,-0x24(%rdi)
40762c: f2 0f f0 46 ec lddqu -0x14(%rsi),%xmm0
407631: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
407636: f3 0f 7f 47 ec movdqu %xmm0,-0x14(%rdi)
40763b: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
407640: c3 retq
407641: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407648: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40764f: 00
407650: 8b 56 fc mov -0x4(%rsi),%edx
407653: 89 57 fc mov %edx,-0x4(%rdi)
407656: c3 retq
407657: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
40765e: 00 00
407660: f2 0f f0 86 7d ff ff lddqu -0x83(%rsi),%xmm0
407667: ff
407668: f3 0f 7f 87 7d ff ff movdqu %xmm0,-0x83(%rdi)
40766f: ff
407670: f2 0f f0 46 8d lddqu -0x73(%rsi),%xmm0
407675: f3 0f 7f 47 8d movdqu %xmm0,-0x73(%rdi)
40767a: f2 0f f0 46 9d lddqu -0x63(%rsi),%xmm0
40767f: f3 0f 7f 47 9d movdqu %xmm0,-0x63(%rdi)
407684: f2 0f f0 46 ad lddqu -0x53(%rsi),%xmm0
407689: f3 0f 7f 47 ad movdqu %xmm0,-0x53(%rdi)
40768e: f2 0f f0 46 bd lddqu -0x43(%rsi),%xmm0
407693: f3 0f 7f 47 bd movdqu %xmm0,-0x43(%rdi)
407698: f2 0f f0 46 cd lddqu -0x33(%rsi),%xmm0
40769d: f3 0f 7f 47 cd movdqu %xmm0,-0x33(%rdi)
4076a2: f2 0f f0 46 dd lddqu -0x23(%rsi),%xmm0
4076a7: f3 0f 7f 47 dd movdqu %xmm0,-0x23(%rdi)
4076ac: f2 0f f0 46 ed lddqu -0x13(%rsi),%xmm0
4076b1: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
4076b6: f3 0f 7f 47 ed movdqu %xmm0,-0x13(%rdi)
4076bb: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
4076c0: c3 retq
4076c1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4076c8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4076cf: 00
4076d0: 66 8b 56 fd mov -0x3(%rsi),%dx
4076d4: 66 8b 4e fe mov -0x2(%rsi),%cx
4076d8: 66 89 57 fd mov %dx,-0x3(%rdi)
4076dc: 66 89 4f fe mov %cx,-0x2(%rdi)
4076e0: c3 retq
4076e1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4076e8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4076ef: 00
4076f0: f2 0f f0 86 7e ff ff lddqu -0x82(%rsi),%xmm0
4076f7: ff
4076f8: f3 0f 7f 87 7e ff ff movdqu %xmm0,-0x82(%rdi)
4076ff: ff
407700: f2 0f f0 46 8e lddqu -0x72(%rsi),%xmm0
407705: f3 0f 7f 47 8e movdqu %xmm0,-0x72(%rdi)
40770a: f2 0f f0 46 9e lddqu -0x62(%rsi),%xmm0
40770f: f3 0f 7f 47 9e movdqu %xmm0,-0x62(%rdi)
407714: f2 0f f0 46 ae lddqu -0x52(%rsi),%xmm0
407719: f3 0f 7f 47 ae movdqu %xmm0,-0x52(%rdi)
40771e: f2 0f f0 46 be lddqu -0x42(%rsi),%xmm0
407723: f3 0f 7f 47 be movdqu %xmm0,-0x42(%rdi)
407728: f2 0f f0 46 ce lddqu -0x32(%rsi),%xmm0
40772d: f3 0f 7f 47 ce movdqu %xmm0,-0x32(%rdi)
407732: f2 0f f0 46 de lddqu -0x22(%rsi),%xmm0
407737: f3 0f 7f 47 de movdqu %xmm0,-0x22(%rdi)
40773c: f2 0f f0 46 ee lddqu -0x12(%rsi),%xmm0
407741: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
407746: f3 0f 7f 47 ee movdqu %xmm0,-0x12(%rdi)
40774b: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
407750: c3 retq
407751: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407758: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40775f: 00
407760: 0f b7 56 fe movzwl -0x2(%rsi),%edx
407764: 66 89 57 fe mov %dx,-0x2(%rdi)
407768: c3 retq
407769: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407770: f2 0f f0 86 7f ff ff lddqu -0x81(%rsi),%xmm0
407777: ff
407778: f3 0f 7f 87 7f ff ff movdqu %xmm0,-0x81(%rdi)
40777f: ff
407780: f2 0f f0 46 8f lddqu -0x71(%rsi),%xmm0
407785: f3 0f 7f 47 8f movdqu %xmm0,-0x71(%rdi)
40778a: f2 0f f0 46 9f lddqu -0x61(%rsi),%xmm0
40778f: f3 0f 7f 47 9f movdqu %xmm0,-0x61(%rdi)
407794: f2 0f f0 46 af lddqu -0x51(%rsi),%xmm0
407799: f3 0f 7f 47 af movdqu %xmm0,-0x51(%rdi)
40779e: f2 0f f0 46 bf lddqu -0x41(%rsi),%xmm0
4077a3: f3 0f 7f 47 bf movdqu %xmm0,-0x41(%rdi)
4077a8: f2 0f f0 46 cf lddqu -0x31(%rsi),%xmm0
4077ad: f3 0f 7f 47 cf movdqu %xmm0,-0x31(%rdi)
4077b2: f2 0f f0 46 df lddqu -0x21(%rsi),%xmm0
4077b7: f3 0f 7f 47 df movdqu %xmm0,-0x21(%rdi)
4077bc: f2 0f f0 46 ef lddqu -0x11(%rsi),%xmm0
4077c1: f2 0f f0 4e f0 lddqu -0x10(%rsi),%xmm1
4077c6: f3 0f 7f 47 ef movdqu %xmm0,-0x11(%rdi)
4077cb: f3 0f 7f 4f f0 movdqu %xmm1,-0x10(%rdi)
4077d0: c3 retq
4077d1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4077d8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4077df: 00
4077e0: 0f b6 56 ff movzbl -0x1(%rsi),%edx
4077e4: 88 57 ff mov %dl,-0x1(%rdi)
4077e7: c3 retq
4077e8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4077ef: 00
4077f0: f2 0f f0 46 70 lddqu 0x70(%rsi),%xmm0
4077f5: f3 0f 7f 47 70 movdqu %xmm0,0x70(%rdi)
4077fa: f2 0f f0 46 60 lddqu 0x60(%rsi),%xmm0
4077ff: f3 0f 7f 47 60 movdqu %xmm0,0x60(%rdi)
407804: f2 0f f0 46 50 lddqu 0x50(%rsi),%xmm0
407809: f3 0f 7f 47 50 movdqu %xmm0,0x50(%rdi)
40780e: f2 0f f0 46 40 lddqu 0x40(%rsi),%xmm0
407813: f3 0f 7f 47 40 movdqu %xmm0,0x40(%rdi)
407818: f2 0f f0 46 30 lddqu 0x30(%rsi),%xmm0
40781d: f3 0f 7f 47 30 movdqu %xmm0,0x30(%rdi)
407822: f2 0f f0 46 20 lddqu 0x20(%rsi),%xmm0
407827: f3 0f 7f 47 20 movdqu %xmm0,0x20(%rdi)
40782c: f2 0f f0 46 10 lddqu 0x10(%rsi),%xmm0
407831: f3 0f 7f 47 10 movdqu %xmm0,0x10(%rdi)
407836: f2 0f f0 06 lddqu (%rsi),%xmm0
40783a: f3 0f 7f 07 movdqu %xmm0,(%rdi)
40783e: c3 retq
40783f: 90 nop
407840: f2 0f f0 46 7f lddqu 0x7f(%rsi),%xmm0
407845: f3 0f 7f 47 7f movdqu %xmm0,0x7f(%rdi)
40784a: f2 0f f0 46 6f lddqu 0x6f(%rsi),%xmm0
40784f: f3 0f 7f 47 6f movdqu %xmm0,0x6f(%rdi)
407854: f2 0f f0 46 5f lddqu 0x5f(%rsi),%xmm0
407859: f3 0f 7f 47 5f movdqu %xmm0,0x5f(%rdi)
40785e: f2 0f f0 46 4f lddqu 0x4f(%rsi),%xmm0
407863: f3 0f 7f 47 4f movdqu %xmm0,0x4f(%rdi)
407868: f2 0f f0 46 3f lddqu 0x3f(%rsi),%xmm0
40786d: f3 0f 7f 47 3f movdqu %xmm0,0x3f(%rdi)
407872: f2 0f f0 46 2f lddqu 0x2f(%rsi),%xmm0
407877: f3 0f 7f 47 2f movdqu %xmm0,0x2f(%rdi)
40787c: f2 0f f0 46 1f lddqu 0x1f(%rsi),%xmm0
407881: f3 0f 7f 47 1f movdqu %xmm0,0x1f(%rdi)
407886: f2 0f f0 46 0f lddqu 0xf(%rsi),%xmm0
40788b: f2 0f f0 0e lddqu (%rsi),%xmm1
40788f: f3 0f 7f 47 0f movdqu %xmm0,0xf(%rdi)
407894: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407898: c3 retq
407899: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4078a0: 48 8b 56 07 mov 0x7(%rsi),%rdx
4078a4: 48 8b 0e mov (%rsi),%rcx
4078a7: 48 89 57 07 mov %rdx,0x7(%rdi)
4078ab: 48 89 0f mov %rcx,(%rdi)
4078ae: c3 retq
4078af: 90 nop
4078b0: f2 0f f0 46 7e lddqu 0x7e(%rsi),%xmm0
4078b5: f3 0f 7f 47 7e movdqu %xmm0,0x7e(%rdi)
4078ba: f2 0f f0 46 6e lddqu 0x6e(%rsi),%xmm0
4078bf: f3 0f 7f 47 6e movdqu %xmm0,0x6e(%rdi)
4078c4: f2 0f f0 46 5e lddqu 0x5e(%rsi),%xmm0
4078c9: f3 0f 7f 47 5e movdqu %xmm0,0x5e(%rdi)
4078ce: f2 0f f0 46 4e lddqu 0x4e(%rsi),%xmm0
4078d3: f3 0f 7f 47 4e movdqu %xmm0,0x4e(%rdi)
4078d8: f2 0f f0 46 3e lddqu 0x3e(%rsi),%xmm0
4078dd: f3 0f 7f 47 3e movdqu %xmm0,0x3e(%rdi)
4078e2: f2 0f f0 46 2e lddqu 0x2e(%rsi),%xmm0
4078e7: f3 0f 7f 47 2e movdqu %xmm0,0x2e(%rdi)
4078ec: f2 0f f0 46 1e lddqu 0x1e(%rsi),%xmm0
4078f1: f3 0f 7f 47 1e movdqu %xmm0,0x1e(%rdi)
4078f6: f2 0f f0 46 0e lddqu 0xe(%rsi),%xmm0
4078fb: f2 0f f0 0e lddqu (%rsi),%xmm1
4078ff: f3 0f 7f 47 0e movdqu %xmm0,0xe(%rdi)
407904: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407908: c3 retq
407909: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407910: 48 8b 56 06 mov 0x6(%rsi),%rdx
407914: 48 8b 0e mov (%rsi),%rcx
407917: 48 89 57 06 mov %rdx,0x6(%rdi)
40791b: 48 89 0f mov %rcx,(%rdi)
40791e: c3 retq
40791f: 90 nop
407920: f2 0f f0 46 7d lddqu 0x7d(%rsi),%xmm0
407925: f3 0f 7f 47 7d movdqu %xmm0,0x7d(%rdi)
40792a: f2 0f f0 46 6d lddqu 0x6d(%rsi),%xmm0
40792f: f3 0f 7f 47 6d movdqu %xmm0,0x6d(%rdi)
407934: f2 0f f0 46 5d lddqu 0x5d(%rsi),%xmm0
407939: f3 0f 7f 47 5d movdqu %xmm0,0x5d(%rdi)
40793e: f2 0f f0 46 4d lddqu 0x4d(%rsi),%xmm0
407943: f3 0f 7f 47 4d movdqu %xmm0,0x4d(%rdi)
407948: f2 0f f0 46 3d lddqu 0x3d(%rsi),%xmm0
40794d: f3 0f 7f 47 3d movdqu %xmm0,0x3d(%rdi)
407952: f2 0f f0 46 2d lddqu 0x2d(%rsi),%xmm0
407957: f3 0f 7f 47 2d movdqu %xmm0,0x2d(%rdi)
40795c: f2 0f f0 46 1d lddqu 0x1d(%rsi),%xmm0
407961: f3 0f 7f 47 1d movdqu %xmm0,0x1d(%rdi)
407966: f2 0f f0 46 0d lddqu 0xd(%rsi),%xmm0
40796b: f2 0f f0 0e lddqu (%rsi),%xmm1
40796f: f3 0f 7f 47 0d movdqu %xmm0,0xd(%rdi)
407974: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407978: c3 retq
407979: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407980: 48 8b 56 05 mov 0x5(%rsi),%rdx
407984: 48 8b 0e mov (%rsi),%rcx
407987: 48 89 57 05 mov %rdx,0x5(%rdi)
40798b: 48 89 0f mov %rcx,(%rdi)
40798e: c3 retq
40798f: 90 nop
407990: f2 0f f0 46 7c lddqu 0x7c(%rsi),%xmm0
407995: f3 0f 7f 47 7c movdqu %xmm0,0x7c(%rdi)
40799a: f2 0f f0 46 6c lddqu 0x6c(%rsi),%xmm0
40799f: f3 0f 7f 47 6c movdqu %xmm0,0x6c(%rdi)
4079a4: f2 0f f0 46 5c lddqu 0x5c(%rsi),%xmm0
4079a9: f3 0f 7f 47 5c movdqu %xmm0,0x5c(%rdi)
4079ae: f2 0f f0 46 4c lddqu 0x4c(%rsi),%xmm0
4079b3: f3 0f 7f 47 4c movdqu %xmm0,0x4c(%rdi)
4079b8: f2 0f f0 46 3c lddqu 0x3c(%rsi),%xmm0
4079bd: f3 0f 7f 47 3c movdqu %xmm0,0x3c(%rdi)
4079c2: f2 0f f0 46 2c lddqu 0x2c(%rsi),%xmm0
4079c7: f3 0f 7f 47 2c movdqu %xmm0,0x2c(%rdi)
4079cc: f2 0f f0 46 1c lddqu 0x1c(%rsi),%xmm0
4079d1: f3 0f 7f 47 1c movdqu %xmm0,0x1c(%rdi)
4079d6: f2 0f f0 46 0c lddqu 0xc(%rsi),%xmm0
4079db: f2 0f f0 0e lddqu (%rsi),%xmm1
4079df: f3 0f 7f 47 0c movdqu %xmm0,0xc(%rdi)
4079e4: f3 0f 7f 0f movdqu %xmm1,(%rdi)
4079e8: c3 retq
4079e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4079f0: 48 8b 56 04 mov 0x4(%rsi),%rdx
4079f4: 48 8b 0e mov (%rsi),%rcx
4079f7: 48 89 57 04 mov %rdx,0x4(%rdi)
4079fb: 48 89 0f mov %rcx,(%rdi)
4079fe: c3 retq
4079ff: 90 nop
407a00: f2 0f f0 46 7b lddqu 0x7b(%rsi),%xmm0
407a05: f3 0f 7f 47 7b movdqu %xmm0,0x7b(%rdi)
407a0a: f2 0f f0 46 6b lddqu 0x6b(%rsi),%xmm0
407a0f: f3 0f 7f 47 6b movdqu %xmm0,0x6b(%rdi)
407a14: f2 0f f0 46 5b lddqu 0x5b(%rsi),%xmm0
407a19: f3 0f 7f 47 5b movdqu %xmm0,0x5b(%rdi)
407a1e: f2 0f f0 46 4b lddqu 0x4b(%rsi),%xmm0
407a23: f3 0f 7f 47 4b movdqu %xmm0,0x4b(%rdi)
407a28: f2 0f f0 46 3b lddqu 0x3b(%rsi),%xmm0
407a2d: f3 0f 7f 47 3b movdqu %xmm0,0x3b(%rdi)
407a32: f2 0f f0 46 2b lddqu 0x2b(%rsi),%xmm0
407a37: f3 0f 7f 47 2b movdqu %xmm0,0x2b(%rdi)
407a3c: f2 0f f0 46 1b lddqu 0x1b(%rsi),%xmm0
407a41: f3 0f 7f 47 1b movdqu %xmm0,0x1b(%rdi)
407a46: f2 0f f0 46 0b lddqu 0xb(%rsi),%xmm0
407a4b: f2 0f f0 0e lddqu (%rsi),%xmm1
407a4f: f3 0f 7f 47 0b movdqu %xmm0,0xb(%rdi)
407a54: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407a58: c3 retq
407a59: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407a60: 48 8b 56 03 mov 0x3(%rsi),%rdx
407a64: 48 8b 0e mov (%rsi),%rcx
407a67: 48 89 57 03 mov %rdx,0x3(%rdi)
407a6b: 48 89 0f mov %rcx,(%rdi)
407a6e: c3 retq
407a6f: 90 nop
407a70: f2 0f f0 46 7a lddqu 0x7a(%rsi),%xmm0
407a75: f3 0f 7f 47 7a movdqu %xmm0,0x7a(%rdi)
407a7a: f2 0f f0 46 6a lddqu 0x6a(%rsi),%xmm0
407a7f: f3 0f 7f 47 6a movdqu %xmm0,0x6a(%rdi)
407a84: f2 0f f0 46 5a lddqu 0x5a(%rsi),%xmm0
407a89: f3 0f 7f 47 5a movdqu %xmm0,0x5a(%rdi)
407a8e: f2 0f f0 46 4a lddqu 0x4a(%rsi),%xmm0
407a93: f3 0f 7f 47 4a movdqu %xmm0,0x4a(%rdi)
407a98: f2 0f f0 46 3a lddqu 0x3a(%rsi),%xmm0
407a9d: f3 0f 7f 47 3a movdqu %xmm0,0x3a(%rdi)
407aa2: f2 0f f0 46 2a lddqu 0x2a(%rsi),%xmm0
407aa7: f3 0f 7f 47 2a movdqu %xmm0,0x2a(%rdi)
407aac: f2 0f f0 46 1a lddqu 0x1a(%rsi),%xmm0
407ab1: f3 0f 7f 47 1a movdqu %xmm0,0x1a(%rdi)
407ab6: f2 0f f0 46 0a lddqu 0xa(%rsi),%xmm0
407abb: f2 0f f0 0e lddqu (%rsi),%xmm1
407abf: f3 0f 7f 47 0a movdqu %xmm0,0xa(%rdi)
407ac4: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407ac8: c3 retq
407ac9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407ad0: 48 8b 56 02 mov 0x2(%rsi),%rdx
407ad4: 48 8b 0e mov (%rsi),%rcx
407ad7: 48 89 57 02 mov %rdx,0x2(%rdi)
407adb: 48 89 0f mov %rcx,(%rdi)
407ade: c3 retq
407adf: 90 nop
407ae0: f2 0f f0 46 79 lddqu 0x79(%rsi),%xmm0
407ae5: f3 0f 7f 47 79 movdqu %xmm0,0x79(%rdi)
407aea: f2 0f f0 46 69 lddqu 0x69(%rsi),%xmm0
407aef: f3 0f 7f 47 69 movdqu %xmm0,0x69(%rdi)
407af4: f2 0f f0 46 59 lddqu 0x59(%rsi),%xmm0
407af9: f3 0f 7f 47 59 movdqu %xmm0,0x59(%rdi)
407afe: f2 0f f0 46 49 lddqu 0x49(%rsi),%xmm0
407b03: f3 0f 7f 47 49 movdqu %xmm0,0x49(%rdi)
407b08: f2 0f f0 46 39 lddqu 0x39(%rsi),%xmm0
407b0d: f3 0f 7f 47 39 movdqu %xmm0,0x39(%rdi)
407b12: f2 0f f0 46 29 lddqu 0x29(%rsi),%xmm0
407b17: f3 0f 7f 47 29 movdqu %xmm0,0x29(%rdi)
407b1c: f2 0f f0 46 19 lddqu 0x19(%rsi),%xmm0
407b21: f3 0f 7f 47 19 movdqu %xmm0,0x19(%rdi)
407b26: f2 0f f0 46 09 lddqu 0x9(%rsi),%xmm0
407b2b: f2 0f f0 0e lddqu (%rsi),%xmm1
407b2f: f3 0f 7f 47 09 movdqu %xmm0,0x9(%rdi)
407b34: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407b38: c3 retq
407b39: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407b40: 48 8b 56 01 mov 0x1(%rsi),%rdx
407b44: 48 8b 0e mov (%rsi),%rcx
407b47: 48 89 57 01 mov %rdx,0x1(%rdi)
407b4b: 48 89 0f mov %rcx,(%rdi)
407b4e: c3 retq
407b4f: 90 nop
407b50: f2 0f f0 46 78 lddqu 0x78(%rsi),%xmm0
407b55: f3 0f 7f 47 78 movdqu %xmm0,0x78(%rdi)
407b5a: f2 0f f0 46 68 lddqu 0x68(%rsi),%xmm0
407b5f: f3 0f 7f 47 68 movdqu %xmm0,0x68(%rdi)
407b64: f2 0f f0 46 58 lddqu 0x58(%rsi),%xmm0
407b69: f3 0f 7f 47 58 movdqu %xmm0,0x58(%rdi)
407b6e: f2 0f f0 46 48 lddqu 0x48(%rsi),%xmm0
407b73: f3 0f 7f 47 48 movdqu %xmm0,0x48(%rdi)
407b78: f2 0f f0 46 38 lddqu 0x38(%rsi),%xmm0
407b7d: f3 0f 7f 47 38 movdqu %xmm0,0x38(%rdi)
407b82: f2 0f f0 46 28 lddqu 0x28(%rsi),%xmm0
407b87: f3 0f 7f 47 28 movdqu %xmm0,0x28(%rdi)
407b8c: f2 0f f0 46 18 lddqu 0x18(%rsi),%xmm0
407b91: f3 0f 7f 47 18 movdqu %xmm0,0x18(%rdi)
407b96: f2 0f f0 46 08 lddqu 0x8(%rsi),%xmm0
407b9b: f2 0f f0 0e lddqu (%rsi),%xmm1
407b9f: f3 0f 7f 47 08 movdqu %xmm0,0x8(%rdi)
407ba4: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407ba8: c3 retq
407ba9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407bb0: 48 8b 16 mov (%rsi),%rdx
407bb3: 48 89 17 mov %rdx,(%rdi)
407bb6: c3 retq
407bb7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
407bbe: 00 00
407bc0: f2 0f f0 46 77 lddqu 0x77(%rsi),%xmm0
407bc5: f3 0f 7f 47 77 movdqu %xmm0,0x77(%rdi)
407bca: f2 0f f0 46 67 lddqu 0x67(%rsi),%xmm0
407bcf: f3 0f 7f 47 67 movdqu %xmm0,0x67(%rdi)
407bd4: f2 0f f0 46 57 lddqu 0x57(%rsi),%xmm0
407bd9: f3 0f 7f 47 57 movdqu %xmm0,0x57(%rdi)
407bde: f2 0f f0 46 47 lddqu 0x47(%rsi),%xmm0
407be3: f3 0f 7f 47 47 movdqu %xmm0,0x47(%rdi)
407be8: f2 0f f0 46 37 lddqu 0x37(%rsi),%xmm0
407bed: f3 0f 7f 47 37 movdqu %xmm0,0x37(%rdi)
407bf2: f2 0f f0 46 27 lddqu 0x27(%rsi),%xmm0
407bf7: f3 0f 7f 47 27 movdqu %xmm0,0x27(%rdi)
407bfc: f2 0f f0 46 17 lddqu 0x17(%rsi),%xmm0
407c01: f3 0f 7f 47 17 movdqu %xmm0,0x17(%rdi)
407c06: f2 0f f0 46 07 lddqu 0x7(%rsi),%xmm0
407c0b: f2 0f f0 0e lddqu (%rsi),%xmm1
407c0f: f3 0f 7f 47 07 movdqu %xmm0,0x7(%rdi)
407c14: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407c18: c3 retq
407c19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407c20: 8b 56 03 mov 0x3(%rsi),%edx
407c23: 8b 0e mov (%rsi),%ecx
407c25: 89 57 03 mov %edx,0x3(%rdi)
407c28: 89 0f mov %ecx,(%rdi)
407c2a: c3 retq
407c2b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
407c30: f2 0f f0 46 76 lddqu 0x76(%rsi),%xmm0
407c35: f3 0f 7f 47 76 movdqu %xmm0,0x76(%rdi)
407c3a: f2 0f f0 46 66 lddqu 0x66(%rsi),%xmm0
407c3f: f3 0f 7f 47 66 movdqu %xmm0,0x66(%rdi)
407c44: f2 0f f0 46 56 lddqu 0x56(%rsi),%xmm0
407c49: f3 0f 7f 47 56 movdqu %xmm0,0x56(%rdi)
407c4e: f2 0f f0 46 46 lddqu 0x46(%rsi),%xmm0
407c53: f3 0f 7f 47 46 movdqu %xmm0,0x46(%rdi)
407c58: f2 0f f0 46 36 lddqu 0x36(%rsi),%xmm0
407c5d: f3 0f 7f 47 36 movdqu %xmm0,0x36(%rdi)
407c62: f2 0f f0 46 26 lddqu 0x26(%rsi),%xmm0
407c67: f3 0f 7f 47 26 movdqu %xmm0,0x26(%rdi)
407c6c: f2 0f f0 46 16 lddqu 0x16(%rsi),%xmm0
407c71: f3 0f 7f 47 16 movdqu %xmm0,0x16(%rdi)
407c76: f2 0f f0 46 06 lddqu 0x6(%rsi),%xmm0
407c7b: f2 0f f0 0e lddqu (%rsi),%xmm1
407c7f: f3 0f 7f 47 06 movdqu %xmm0,0x6(%rdi)
407c84: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407c88: c3 retq
407c89: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407c90: 8b 56 02 mov 0x2(%rsi),%edx
407c93: 8b 0e mov (%rsi),%ecx
407c95: 89 57 02 mov %edx,0x2(%rdi)
407c98: 89 0f mov %ecx,(%rdi)
407c9a: c3 retq
407c9b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
407ca0: f2 0f f0 46 75 lddqu 0x75(%rsi),%xmm0
407ca5: f3 0f 7f 47 75 movdqu %xmm0,0x75(%rdi)
407caa: f2 0f f0 46 65 lddqu 0x65(%rsi),%xmm0
407caf: f3 0f 7f 47 65 movdqu %xmm0,0x65(%rdi)
407cb4: f2 0f f0 46 55 lddqu 0x55(%rsi),%xmm0
407cb9: f3 0f 7f 47 55 movdqu %xmm0,0x55(%rdi)
407cbe: f2 0f f0 46 45 lddqu 0x45(%rsi),%xmm0
407cc3: f3 0f 7f 47 45 movdqu %xmm0,0x45(%rdi)
407cc8: f2 0f f0 46 35 lddqu 0x35(%rsi),%xmm0
407ccd: f3 0f 7f 47 35 movdqu %xmm0,0x35(%rdi)
407cd2: f2 0f f0 46 25 lddqu 0x25(%rsi),%xmm0
407cd7: f3 0f 7f 47 25 movdqu %xmm0,0x25(%rdi)
407cdc: f2 0f f0 46 15 lddqu 0x15(%rsi),%xmm0
407ce1: f3 0f 7f 47 15 movdqu %xmm0,0x15(%rdi)
407ce6: f2 0f f0 46 05 lddqu 0x5(%rsi),%xmm0
407ceb: f2 0f f0 0e lddqu (%rsi),%xmm1
407cef: f3 0f 7f 47 05 movdqu %xmm0,0x5(%rdi)
407cf4: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407cf8: c3 retq
407cf9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407d00: 8b 56 01 mov 0x1(%rsi),%edx
407d03: 8b 0e mov (%rsi),%ecx
407d05: 89 57 01 mov %edx,0x1(%rdi)
407d08: 89 0f mov %ecx,(%rdi)
407d0a: c3 retq
407d0b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
407d10: f2 0f f0 46 74 lddqu 0x74(%rsi),%xmm0
407d15: f3 0f 7f 47 74 movdqu %xmm0,0x74(%rdi)
407d1a: f2 0f f0 46 64 lddqu 0x64(%rsi),%xmm0
407d1f: f3 0f 7f 47 64 movdqu %xmm0,0x64(%rdi)
407d24: f2 0f f0 46 54 lddqu 0x54(%rsi),%xmm0
407d29: f3 0f 7f 47 54 movdqu %xmm0,0x54(%rdi)
407d2e: f2 0f f0 46 44 lddqu 0x44(%rsi),%xmm0
407d33: f3 0f 7f 47 44 movdqu %xmm0,0x44(%rdi)
407d38: f2 0f f0 46 34 lddqu 0x34(%rsi),%xmm0
407d3d: f3 0f 7f 47 34 movdqu %xmm0,0x34(%rdi)
407d42: f2 0f f0 46 24 lddqu 0x24(%rsi),%xmm0
407d47: f3 0f 7f 47 24 movdqu %xmm0,0x24(%rdi)
407d4c: f2 0f f0 46 14 lddqu 0x14(%rsi),%xmm0
407d51: f3 0f 7f 47 14 movdqu %xmm0,0x14(%rdi)
407d56: f2 0f f0 46 04 lddqu 0x4(%rsi),%xmm0
407d5b: f2 0f f0 0e lddqu (%rsi),%xmm1
407d5f: f3 0f 7f 47 04 movdqu %xmm0,0x4(%rdi)
407d64: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407d68: c3 retq
407d69: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407d70: 8b 16 mov (%rsi),%edx
407d72: 89 17 mov %edx,(%rdi)
407d74: c3 retq
407d75: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
407d7a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
407d80: f2 0f f0 46 73 lddqu 0x73(%rsi),%xmm0
407d85: f3 0f 7f 47 73 movdqu %xmm0,0x73(%rdi)
407d8a: f2 0f f0 46 63 lddqu 0x63(%rsi),%xmm0
407d8f: f3 0f 7f 47 63 movdqu %xmm0,0x63(%rdi)
407d94: f2 0f f0 46 53 lddqu 0x53(%rsi),%xmm0
407d99: f3 0f 7f 47 53 movdqu %xmm0,0x53(%rdi)
407d9e: f2 0f f0 46 43 lddqu 0x43(%rsi),%xmm0
407da3: f3 0f 7f 47 43 movdqu %xmm0,0x43(%rdi)
407da8: f2 0f f0 46 33 lddqu 0x33(%rsi),%xmm0
407dad: f3 0f 7f 47 33 movdqu %xmm0,0x33(%rdi)
407db2: f2 0f f0 46 23 lddqu 0x23(%rsi),%xmm0
407db7: f3 0f 7f 47 23 movdqu %xmm0,0x23(%rdi)
407dbc: f2 0f f0 46 13 lddqu 0x13(%rsi),%xmm0
407dc1: f3 0f 7f 47 13 movdqu %xmm0,0x13(%rdi)
407dc6: f2 0f f0 46 03 lddqu 0x3(%rsi),%xmm0
407dcb: f2 0f f0 0e lddqu (%rsi),%xmm1
407dcf: f3 0f 7f 47 03 movdqu %xmm0,0x3(%rdi)
407dd4: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407dd8: c3 retq
407dd9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407de0: 66 8b 56 01 mov 0x1(%rsi),%dx
407de4: 66 8b 0e mov (%rsi),%cx
407de7: 66 89 57 01 mov %dx,0x1(%rdi)
407deb: 66 89 0f mov %cx,(%rdi)
407dee: c3 retq
407def: 90 nop
407df0: f2 0f f0 46 72 lddqu 0x72(%rsi),%xmm0
407df5: f3 0f 7f 47 72 movdqu %xmm0,0x72(%rdi)
407dfa: f2 0f f0 46 62 lddqu 0x62(%rsi),%xmm0
407dff: f3 0f 7f 47 62 movdqu %xmm0,0x62(%rdi)
407e04: f2 0f f0 46 52 lddqu 0x52(%rsi),%xmm0
407e09: f3 0f 7f 47 52 movdqu %xmm0,0x52(%rdi)
407e0e: f2 0f f0 46 42 lddqu 0x42(%rsi),%xmm0
407e13: f3 0f 7f 47 42 movdqu %xmm0,0x42(%rdi)
407e18: f2 0f f0 46 32 lddqu 0x32(%rsi),%xmm0
407e1d: f3 0f 7f 47 32 movdqu %xmm0,0x32(%rdi)
407e22: f2 0f f0 46 22 lddqu 0x22(%rsi),%xmm0
407e27: f3 0f 7f 47 22 movdqu %xmm0,0x22(%rdi)
407e2c: f2 0f f0 46 12 lddqu 0x12(%rsi),%xmm0
407e31: f3 0f 7f 47 12 movdqu %xmm0,0x12(%rdi)
407e36: f2 0f f0 46 02 lddqu 0x2(%rsi),%xmm0
407e3b: f2 0f f0 0e lddqu (%rsi),%xmm1
407e3f: f3 0f 7f 47 02 movdqu %xmm0,0x2(%rdi)
407e44: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407e48: c3 retq
407e49: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407e50: 0f b7 16 movzwl (%rsi),%edx
407e53: 66 89 17 mov %dx,(%rdi)
407e56: c3 retq
407e57: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
407e5e: 00 00
407e60: f2 0f f0 46 71 lddqu 0x71(%rsi),%xmm0
407e65: f3 0f 7f 47 71 movdqu %xmm0,0x71(%rdi)
407e6a: f2 0f f0 46 61 lddqu 0x61(%rsi),%xmm0
407e6f: f3 0f 7f 47 61 movdqu %xmm0,0x61(%rdi)
407e74: f2 0f f0 46 51 lddqu 0x51(%rsi),%xmm0
407e79: f3 0f 7f 47 51 movdqu %xmm0,0x51(%rdi)
407e7e: f2 0f f0 46 41 lddqu 0x41(%rsi),%xmm0
407e83: f3 0f 7f 47 41 movdqu %xmm0,0x41(%rdi)
407e88: f2 0f f0 46 31 lddqu 0x31(%rsi),%xmm0
407e8d: f3 0f 7f 47 31 movdqu %xmm0,0x31(%rdi)
407e92: f2 0f f0 46 21 lddqu 0x21(%rsi),%xmm0
407e97: f3 0f 7f 47 21 movdqu %xmm0,0x21(%rdi)
407e9c: f2 0f f0 46 11 lddqu 0x11(%rsi),%xmm0
407ea1: f3 0f 7f 47 11 movdqu %xmm0,0x11(%rdi)
407ea6: f2 0f f0 46 01 lddqu 0x1(%rsi),%xmm0
407eab: f2 0f f0 0e lddqu (%rsi),%xmm1
407eaf: f3 0f 7f 47 01 movdqu %xmm0,0x1(%rdi)
407eb4: f3 0f 7f 0f movdqu %xmm1,(%rdi)
407eb8: c3 retq
407eb9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
407ec0: 0f b6 16 movzbl (%rsi),%edx
407ec3: 88 17 mov %dl,(%rdi)
407ec5: c3 retq
407ec6: 90 nop
407ec7: 90 nop
407ec8: 90 nop
407ec9: 90 nop
407eca: 90 nop
407ecb: 90 nop
407ecc: 90 nop
407ecd: 90 nop
407ece: 90 nop
407ecf: 90 nop

I prefer to read assembly code in Intel syntax. Can you translate it from AT&T syntax to Intel syntax?

>>>OK, I guess it's my mistake to give you the impression that remote memory copy is 10 times slower.
In my program, I allocate the source array and the destination array on node 0, and run the memory copy code on nodes 0, 1, 2, and 3.
If I run the memory copy on node 0, it takes 0.85s. If I run the code on node 1 or 3, it takes 5.74s. On node 2, it takes 7s. So the slowdown is 6.75x and 8.24x, respectively.>>>

When your code moves further away in the NUMA topology, the memory read/write slowdown grows with the distance "travelled".
Looking at your case, I think that some kernel component, such as the memory manager or one of its subunits (an allocator), could be responsible for the
large performance penalty.
Can you access and manipulate the NUMA-related API programmatically? I see two NUMA-related header files.
By simple reasoning, NUMA is managed at the level of the chipset and the on-die memory controllers, so in theory it could expose
some kind of programming interface to the BIOS/EFI and then to the OS kernel, which in turn could be accessed by higher-level code.
Maybe a little bit of tweaking could help.

I need the numa.h and numaif.h header files. Could you upload them? I'll try to create a reproducer during the next couple of days.

Da, did you try to execute your test with a lower number of threads?

>>>I need the numa.h and numaif.h header files. Could you upload them? I'll try to create a reproducer during the next couple of days.>>>

I suppose the NUMA header files could be very helpful in understanding the NUMA-related management functions.
I have read the NUMA-related ACPI documentation, but besides the distance table I was not able to find any programming interface.
Maybe the exact chipset documentation could shed some light.

>>>Da, did you try to execute your test with a lower number of threads?>>>
Don't you suspect that some kernel memory allocator could be responsible for such a large degradation in memory access speed?
