Intel ISA Extensions

Understanding my Benchmarks

I wrote a benchmark to compare the possible speedup of SSE over scalar execution, but I don't understand the results I get.
The following loop:
movaps 0x10(%rax),%xmm1
cmpltps %xmm1,%xmm0
movaps 0x20(%rax),%xmm0
cmpltps %xmm0,%xmm1
movaps 0x30(%rax),%xmm1
cmpltps %xmm1,%xmm0
add $0x40,%rax
movaps (%rax),%xmm0
cmpltps %xmm0,%xmm1
cmp %rax,%rbx
ja loop
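
For comparison, here is a minimal C sketch of the kind of scalar-versus-SSE measurement described above. It is an assumption about the intent, not the poster's actual benchmark. The function names `count_lt_scalar` and `count_lt_sse` are hypothetical. In the assembly loop every `cmpltps` result is overwritten by the following `movaps`; this sketch instead keeps the compare results by counting them.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_cmplt_ps, _mm_movemask_ps */
#endif

/* Scalar baseline: count positions where a[i] < b[i]. */
size_t count_lt_scalar(const float *a, const float *b, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (a[i] < b[i]);
    return count;
}

/* SSE variant (hypothetical name): four compares per CMPLTPS.
   Falls back to the scalar loop when SSE is not available at build time. */
size_t count_lt_sse(const float *a, const float *b, size_t n)
{
    size_t i = 0, count = 0;
    assert(a != NULL && b != NULL);
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);               /* unaligned load */
        __m128 vb = _mm_loadu_ps(b + i);
        int m = _mm_movemask_ps(_mm_cmplt_ps(va, vb)); /* one bit per lane */
        count += (size_t)((m & 1) + ((m >> 1) & 1) +
                          ((m >> 2) & 1) + ((m >> 3) & 1));
    }
#endif
    for (; i < n; i++)                                 /* scalar tail */
        count += (a[i] < b[i]);
    return count;
}
```

Timing both functions over the same L1-resident arrays gives a like-for-like comparison. Both must return the same count, which also checks that the vector path is correct.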

help on detecting stalls (identifying structural hazards) in assembly code

Hi All,
Our project is to optimize instruction scheduling in gcc by detecting structural hazards. We are trying to come up with a test case for this: a scenario in which one instruction is stalled because the resource it needs is in use by another instruction. However, we have been unable to do so.
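
One possible test case, offered as a sketch rather than a confirmed answer: on many x86 cores the floating-point divider is not fully pipelined, so divisions with no data dependence between them can still queue on that one unit. The two hypothetical functions below run one dependent chain of divisions and four mutually independent chains. The chains start from different values so the compiler cannot merge them.

```c
#include <assert.h>

/* One dependent chain: each division needs the previous result,
   so this loop is bound by divide latency alone. */
double div_one_chain(double x, double d, long iters)
{
    assert(d != 0.0);
    for (long i = 0; i < iters; i++)
        x = x / d;
    return x;
}

/* Four chains with no data dependence between them. If the divider
   were fully pipelined these could overlap; where it is not, they
   contend for the same unit (a structural hazard). */
double div_four_chains(double x, double d, long iters)
{
    double a = x, b = x + 1.0, c = x + 2.0, e = x + 3.0;
    assert(d != 0.0);
    for (long i = 0; i < iters; i++) {
        a = a / d;
        b = b / d;
        c = c / d;
        e = e / d;
    }
    return a + b + c + e;
}
```

To use it, time each function (for example with `clock()`) over a large iteration count. If the four-chain version takes clearly longer than the one-chain version, the extra time is a hint that the independent divisions are waiting on the divider. The actual ratio depends on the processor and compiler flags, so none is claimed here.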

is there a standard format in which we provide architecture specific information to a software

Hi All,
Our project requires us to supply architecture-specific information to gcc (not the .md files in gcc). We need to specify the number of cycles taken per instruction for the 686 architecture.

Question: Is there a standard format for defining architecture-specific information for software that requires it? Is the architecture-specific information for the Pentium Dual Core architecture available in a format that any such software can read?

Target Architecture: 686 processor
Working on: Intel Pentium Dual Core processor

how to turn off out-of-order execution in Intel processor

Hi All,
Our project is to optimize instruction scheduling in gcc by detecting structural hazards. The algorithm we use assumes that the processor does not execute instructions out of order.

Question: Is there a command or mechanism to turn off out-of-order execution on an Intel processor?
Target Architecture: 686 processor
Working on: Intel Pentium Dual Core processor

Thanking You,

Out of order execution

Is there a simulator and/or a general procedure one can follow to predict which instructions will be executed in what order (assuming all data is in the L1 cache)? I'm having a hard time understanding why a given instruction sequence executes much faster than another. I suspect it's due to out-of-order execution and register renaming, but I've found no tangible reason yet. Any help would be appreciated.

Parallel instructions for detecting MSB in array of bytes

First time posting, not sure if correct forum, but...

I have a large array of bytes: at most 1600, but usually up to 128.

Each byte typically carries 7 bits of information, and the MSB is used as a sentinel, so the MSB is set in only a small portion of the bytes.

Currently I'm looping through them, but is there a better way, using SSEx, to process 128 bytes in parallel and get back an SSE vector with bits set for each byte?

Any suggestions?

Thank you,
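
One approach that fits this question is the SSE2 `PMOVMSKB` instruction (`_mm_movemask_epi8`), which gathers the MSB of each of 16 bytes into a 16-bit integer. An SSE register is 128 bits wide, so one load covers 16 bytes rather than 128. The sketch below is a minimal illustration: the function name `msb_mask` and the bit layout of the output are assumptions, and the result is an ordinary bitmask in memory, not an SSE vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_movemask_epi8 */
#endif

/* Hypothetical helper: bit (i % 8) of mask[i / 8] is set when data[i]
   has its MSB set. `mask` must hold at least (n + 7) / 8 bytes. */
void msb_mask(const uint8_t *data, size_t n, uint8_t *mask)
{
    size_t i = 0;
    assert(data != NULL && mask != NULL);
    memset(mask, 0, (n + 7) / 8);
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        unsigned m = (unsigned)_mm_movemask_epi8(v); /* 16 MSBs -> 16 bits */
        mask[i / 8]     = (uint8_t)(m & 0xFF);       /* bytes i .. i+7    */
        mask[i / 8 + 1] = (uint8_t)(m >> 8);         /* bytes i+8 .. i+15 */
    }
#endif
    for (; i < n; i++)                               /* scalar tail */
        if (data[i] & 0x80)
            mask[i / 8] |= (uint8_t)(1u << (i % 8));
}
```

Since the sentinel is rare, a caller can skip any 16-byte group whose mask bits are all zero and inspect individual bits only in the groups that are flagged.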

[smp] processor disabled

I am trying to manage the multiprocessor initialization.

I found a little program [1] that implements the multiprocessor specification [2], but in the processor entries ([2], page 4-7) I see that the APs are disabled.

How can I enable the processors?
Is it a SW or HW problem?


Thank you.
Daniel M.

LZCNT on Core i7

Let me start with: I know that LZCNT is not supported on the Core i7.

However, when I run the instruction on my Core i7 I do not get an illegal instruction exception, as I would expect.
Instead it performs a BSR instruction.
This is while working in 64-bit mode and using 64-bit registers.

Is this a bug, expected behavior, or is running unsupported instructions undefined (and this is therefore OK)?

For reference the opcodes are:
LZCNT: 0x0F 0xBD (same as BSR, but with a 0xF3 prefix)

Any feedback would be helpful.
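
As background for this question: LZCNT support is reported by CPUID leaf 0x80000001, ECX bit 5, and on processors without it the 0xF3 prefix is ignored, so the same bytes decode as BSR. The two instructions return different values: BSR gives the index of the highest set bit, LZCNT gives the number of leading zeros. The sketch below shows a runtime check and the relationship between the two results. The function names are hypothetical, the reference routines are plain C rather than the instructions themselves, and the `<cpuid.h>` path assumes GCC or Clang on x86.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>      /* GCC/Clang helper: __get_cpuid */
#define HAVE_CPUID_H 1
#endif

/* Returns 1 if CPUID reports LZCNT, 0 if it does not,
   -1 if this build has no way to ask. */
int cpu_has_lzcnt(void)
{
#ifdef HAVE_CPUID_H
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;                 /* extended leaf not available */
    return (int)((ecx >> 5) & 1); /* ECX bit 5: LZCNT */
#else
    return -1;
#endif
}

/* Reference for what BSR computes: index of the highest set bit.
   BSR's result is undefined for 0, so x must be nonzero here. */
unsigned bsr64_ref(uint64_t x)
{
    unsigned idx = 0;
    assert(x != 0);
    while (x >>= 1)
        idx++;
    return idx;
}

/* Reference for what LZCNT computes: leading zero count, 64 for 0. */
unsigned lzcnt64_ref(uint64_t x)
{
    return x ? 63u - bsr64_ref(x) : 64u;
}
```

For a nonzero 64-bit input the two results always sum to 63. Code that emits the LZCNT encoding without first checking `cpu_has_lzcnt()` therefore gets a BSR result where it expected a leading-zero count, rather than a fault.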

