Links to instruction documentation

Thomas Willhalm (Intel)'s picture
  • The Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A and 2B (available here) are the instruction set reference.
  • Haswell (2013) new instructions are in the programmer's reference manual.
  • In appendix C of the Intel 64 and IA-32 Architectures Optimization Reference Manual (available here), the latencies and throughput of instructions are listed.
  • The documentation of the Intel C++ Compiler contains documentation of the intrinsics.
  • The AVX Programming Reference and examples for using AVX are available on the AVX community page. (The interactive Intel Intrinsics Guide is also available there, which is useful for SSE programming as well.)
  • The Intel Software Development Emulator (Intel SDE) allows simulation of future instructions.
ron_bennett@mentor.com's picture

Thomas,

Is there a downloadable PDF of the Optimization Reference Manual? I'm not finding it.

Also, is there any published data on the expected performance of the various AVX intrinsics relative to SSE, broken down by cache level? For example, vmulps is 2X faster in L1, 1.8X faster in L2, etc. Maybe that's a dumb question, but it's hard to tell if code is optimal without some idea of the ideal hardware throughput.

Thanks for the pointers,
Ron

ron_bennett@mentor.com's picture

A second search of the Intel site turned up a downloadable PDF copy of the June 2011 Optimization Guide.

Brijender Bharti (Intel)'s picture

Hi,
Please use the following link:
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia...

It will open the reading pane. In the top right-hand corner there is a down-arrow button, which means download (next to print).

ron_bennett@mentor.com's picture

Thanks for the tip. I missed the little arrow.

Shih Kuo (Intel)'s picture

With revision 40 of the Intel 64 and IA-32 Architectures Software Developer's Manual (SDM) just published, we are pleased to announce that paper versions of the SDM are now available via a print-on-demand fulfillment model (see links below) through a 3rd-party vendor.

The print-on-demand model of hard copy fulfillment of SDM provides several advantages over the previous bulk-printing operation:

1. We expect the new model can sustain itself indefinitely as it no longer relies on long-range budget forecasting and unpredictable funding supply. Bulk printing requires substantial budget for printing, warehousing, and ancillary costs associated with either shipping or governmental regulations. Three years ago, the funding source dried up, and the operation went into hibernation.

2. We expect print-on-demand orders to be generally fulfilled with the same up-to-date version that is available on the web. Web updates of the SDM occur approximately on a quarterly cadence. When we did bulk printing, the lag between shipping out the final master and receiving a truckload of stock into the warehouse was up to 3 months.

We want to acknowledge that the unit cost of print-on-demand to the purchaser is higher than with bulk printing, and our publishing operation will do as much as we can to help our hard copy customers get the most mileage out of their purchases. There are a few things related to that aspect:

a. We implemented a 7-volume partition due to the physical page-count constraint required by the print service vendor. Currently that constraint sits at 740 pages.

b. The order price of each volume is set by the print vendor (as the vendor is a for-profit entity). Intel uploads the finalized master with zero royalty.

c. Considering (i) the frequent update schedule of web versions and (ii) that large updates often concentrate on a subset of the 7 volumes and occur at a slower pace than the quarterly updates, we did some chapter-level reorganization. The objective is to let hard copy SDM users who wish to keep up with the subject matter of their interest re-order only selected volume(s), and only infrequently, instead of ordering all 7 volumes repeatedly.

For example, readers whose primary hard copy resources are instruction reference pages can focus on Volumes 2A and 2B; the virtualization audience can focus on Volume 3C; a performance monitoring tool developer may focus on volume 3B, etc.

d. Our initial vendor of print-on-demand will be www.lulu.com. In our limited experience as a customer there, we find there are material advantages to being on their mailing list. We typically receive a few email promotions each month, ranging from xx% site wide sale to free-shipping offers. So that may be of interest to hard copy readers.

In the new fulfillment model, the 7-volume PDF set of the SDM is available for purchase at the links below*. In the future, several other IA manuals (e.g. Software Optimization Manual) will be available through the same 3rd-party print-on-demand vendor.

*NOTE: Due to manual restructuring, please download the file and review prior to purchasing to ensure you are ordering the volume(s) with information you are interested in.

Volume 1 Basic Architecture: http://www.lulu.com/product/paperback/intel%c2%ae-64-and-ia-32-architectures-software-developers-manual-volume-1-basic-architecture/18596113

Volume 2A Instruction Set Reference A-L: http://www.lulu.com/product/paperback/intel%c2%ae-64-and-ia-32-architectures-software-developers-manual-volume-2a-instruction-set-reference-a-l/18595762

Volume 2B Instruction Set Reference M-Z: http://www.lulu.com/product/paperback/intel%c2%ae-64-and-ia-32-architectures-software-developers-manual-volume-2b-instruction-set-reference-m-z/18621112

Volume 2C Instruction Set Reference: http://www.lulu.com/product/paperback/intel%c2%ae-64-and-ia-32-architectures-software-developers-manual-volume-2c-instruction-set-reference/18621165

Volume 3A System Programming Guide, Part 1: http://www.lulu.com/product/paperback/intel%c2%ae-64-and-ia-32-architectures-software-developers-manual-volume-3a-system-programming-guide-part-1/18621230

Volume 3B System Programming Guide, Part 2: http://www.lulu.com/product/paperback/intel%c2%ae-64-and-ia-32-architectures-software-developers-manual-volume-3b-system-programming-guide-part-2/18621276

Volume 3C System Programming Guide, Part 3: http://www.lulu.com/product/paperback/intel%c2%ae-64-and-ia-32-architectures-software-developers-manual-volume-3c-system-programming-guide-part-3/18621297

Igor Levicki's picture

It would be very handy to have Instruction Set and intrinsic Reference in a CHM file. Any chance of creating that?

-- Regards, Igor Levicki If you find my post helpful, please rate it and/or select it as a best answer where applicable. Thank you.
Shih Kuo (Intel)'s picture

Rev. 26 of the Intel 64 and IA-32 Architectures Software Optimization Manual is live now.
In the next few weeks, a hardcopy option of Rev. 26 (in a two-volume partition) is expected to be available from lulu.com as well.

Shih Kuo (Intel)'s picture

Hi Igor
We don't have plans to produce additional formats at this time.
Thx for your input.

Sergey Kostrov's picture

Thank you for the list of links in Post #6!

Best regards,
Sergey

stefan.dragnev's picture

Thanks for posting the links to lulu.com. It is great to have a paper copy of the manuals so I don't have to be in front of a computer just to read them.

Is there a chance Intel can also offer a hardcover option for the manuals in addition to the already available paperback option? It will increase the cost somewhat, but it will make the printed manuals much more durable considering their size. Lulu already offers a standard hardcover option, but when I spoke to their customer support I was told that it is at the sole discretion of the author of each publication whether lulu.com will offer their book/manual in hardcover.

Thanks,
Stefan Dragnev

Jason Perry's picture

Could we please get an answer to the question posed right above my post? (#11) I too would like these in hardcover - wouldn't mind paying the extra $$ for this option - and if it's just a matter of Intel saying, "It's okay for people to order this format" I'm left here wondering... Why not? Is there something not obvious that we're missing here? Thanks ~ Jason

Shih Kuo (Intel)'s picture

Hi Stefan/Jason

Thank you for your inputs and your interest in a hardbound option.

Since we resumed the softbound print-on-demand fulfillment model, the ship-out data we have from the vendor indicates a constant but small volume of demand with each revision. Regardless of the demand volume, our commitment to continued softbound availability does not change.

At the same time, we are pursuing operational improvement that can lower the cost on the user side. The factor that has room for optimization is page count, as we had chosen zero royalty from the beginning. So we are in the process of adjusting our production to use a slightly larger format to reduce the page count.

A second factor that can affect users in certain geographies is the cost of overseas shipment, over which we don't have direct control. From our understanding, the current vendor's physical printing facilities are located in the US, Canada, France, the UK, and Australia. So some of the historically largest consumption markets, like China, Brazil, and India, would bear higher shipping costs on top of the merchandise cost. We are willing to investigate the feasibility of expanding the print-on-demand fulfillment model into locally supplied distribution if available. We welcome referral information about local print-on-demand vendors for us to investigate, along with cost estimates of overseas shipment given by the current supplier. Please direct your feedback on current cost and local print-on-demand supplier referrals to "intelsdm@intel.com" with the subject heading "local print-on-demand referral".

In terms of whether to initiate hardbound options, we would like to see more data before making a decision.

The most important factor to sway our decision is user demand. Based on our softbound data, and considering the cost delta, release frequency, and other logistic obstacles, I feel it is more prudent to defer a decision. Intel SDM readers who wish to see the availability of hardbound options can direct feedback to "intelsdm@intel.com" with the subject heading "hardbound SDM" and provide information on the limit of acceptable cost increase for a hardbound volume.

sg03ty's picture

thank you

thietkelogo's picture
Quoting stefan.dragnev Thanks for posting the links to lulu.com. It is great to have a paper copy of the manuals so I don't have to be in front of a computer just to read them.

Is there a chance Intel can also offer a hardcover option for the manuals in addition to the already available paperback option? It will increase the cost somewhat, but it will make the printed manuals much more durable considering their size. Lulu already offers a standard hardcover option, but when I spoke to their customer support I was told that it is at the sole discretion of the author of each publication whether lulu.com will offer their book/manual in hardcover.

Thanks,
Stefan Dragnev

Thanks for the link, it helped me solve some problems.

Roman Dementiev (Intel)'s picture

Hi,

There is a recent Intel Developer Forum 2012 presentation on AVX2 and Bit Manipulation New Instructions. Slides: http://intel.com/go/idfsessions (session ARCS005).

Best regards,
Roman

shin's picture

Hi
I have downloaded all the files I needed from the links. Thanks for sharing them all.

shin
Sergey Kostrov's picture

Hi everybody,

Where could I find latencies for the MOVNTDQ and VMOVNTDQ instructions?

Unfortunately, the latest edition of "Intel Optimization Reference Manual" ( 04.2012 ) doesn't have any data for these two instructions in Appendix C.

Best regards,
Sergey

Shih Kuo (Intel)'s picture

This is just my personal view...

1. The instruction in question is for streaming store usage, when the programmer does not intend to consume the stored data immediately. So the rationale for designing an algorithm around the latency of such an instruction seems questionable if the intent includes optimizing for performance.

2. I think it is easy to picture what will happen if you try to write a directed test by introducing a dependency: the delay you expose will reflect the store data operation travelling from the memory pipeline to system RAM, plus other factors. Your mileage will vary depending on many non-CPU factors, and the results likely won't form a sharply peaked distribution.

3. If the intent is to figure out how far to hoist the streaming store ahead of eventual consumption, I suspect you have to deal with a range that's likely volatile. So trial may be your best tool.
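
As an editorial illustration of the "trial" approach suggested above, here is a minimal sketch, assuming an x86 compiler with SSE2 support and a C11 library. The helper names (fill_streaming, fill_cached, time_fill) are hypothetical and not taken from any Intel manual. The sketch times a whole buffer fill, so it gives a throughput-style figure for streaming versus regular stores, not a single-instruction latency.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_stream_si128, _mm_store_si128 */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: fill a 16-byte-aligned buffer with non-temporal
   (streaming) stores. _mm_stream_si128 maps to MOVNTDQ, typically emitted
   in its VEX form (VMOVNTDQ) when the file is compiled for AVX. */
static void fill_streaming(void *dst, size_t bytes, int value)
{
    assert(((uintptr_t)dst & 15u) == 0 && bytes % 16 == 0);
    __m128i v = _mm_set1_epi32(value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; ++i)
        _mm_stream_si128(p + i, v);
    _mm_sfence();  /* order the streaming stores before later stores and the end timestamp */
}

/* Hypothetical helper: same fill with ordinary (write-back cacheable) stores. */
static void fill_cached(void *dst, size_t bytes, int value)
{
    assert(((uintptr_t)dst & 15u) == 0 && bytes % 16 == 0);
    __m128i v = _mm_set1_epi32(value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; ++i)
        _mm_store_si128(p + i, v);
}

/* Hypothetical helper: elapsed seconds for one call of `fill`.
   Uses the C11 timespec_get clock, which is not guaranteed to be monotonic;
   a platform-specific monotonic clock would be a better choice in practice. */
static double time_fill(void (*fill)(void *, size_t, int),
                        void *dst, size_t bytes, int value)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    fill(dst, bytes, value);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

A caller would allocate a buffer much larger than the last-level cache (for instance with _mm_malloc(64u << 20, 16)), run time_fill many times for each variant, and compare the spread of timings, not one number. Consistent with the points above, the results should be expected to vary with the platform, memory configuration, and system load.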

Sergey Kostrov's picture

>>...So trial may be your best tool...

Shih,

I really appreciate your feedback. My question is: should we always try to get latencies from our own tests?

The latest edition of "Intel Optimization Reference Manual" ( 04.2012 ) has lots of details about these two instructions, but for some unexplained reason latencies are not specified. I'll try to do some tests in about 2-3 weeks after I receive a new computer system, but I'd like to get some information as soon as possible. Would you be able to forward my question to Intel Hardware Engineers, please?

Once again, where could I find latencies for the MOVNTDQ and VMOVNTDQ instructions?

Best regards,
Sergey

Shih Kuo (Intel)'s picture

Hi Sergey

The best advice I could offer is to borrow from an article I read about David Chaiken's recommendation on algorithm.

To design a good algorithm, think about its performance model underneath.

If a hardware engineer gives me a single number on this, I am certain that is not a complete picture, and it would be a disservice to publish a number given the complexity of situations that software can be deployed into across the wide variety of platforms.

A number in CPU core cycles will certainly be useless, considering the uncore operates in a different clock domain. I believe the DRAM sub-system may bring another clock domain into the picture.

If your software gets deployed on a multi-socket platform, what kind of complications will snooping bring?

The round trip for a piece of data to move from a register to DRAM and then be fetched back to the consumer will be an arduous journey.

If your algorithm is able to deal with a range of values, would that manifest to end users as desirable experience (selecting from one end of opportunistic (fragile) value to the other end of higher confidence with larger time constant)?

If you can replace the longer journey with shorter ones, then write-back cacheable memory and locality considerations come into play, which makes the question of streaming store round-trip latency moot.

Sergey Kostrov's picture

>>...If a hardware engineer gives me a single number on this, I am certain that is not a complete picture...

Could you get that number? I'm sorry, but let us decide what to do next. As I've said several times:

"...The latest edition of "Intel Optimization Reference Manual" ( 04.2012 ) has lots of details about these two instructions, but for some unexplained reason latencies are not specified..."

I also don't see any logic in your statements:

...David Chaiken's recommendation on algorithm...
...I am certain that is not a complete picture...
...A number in CPU core cycle will certainly be useless...
...If your algorithm is able to deal with a range of values...
...If you can replace the longer journey with shorter ones...

Shih, we would like to see just two numbers ( !!! ), that is latencies for two Intel instructions and nothing else. Do you understand this?

Thomas Willhalm (Intel)'s picture

Sergey,

As Shih has pointed out, the latency depends on several factors. Therefore, we need some information about the system that you are using. What is the core architecture that you are using? Which platform do you have? What are the core and uncore frequencies? What DIMMs are you using (speed and rank) and how are they populated? Do you need the loaded latency, and if so, what is the bandwidth that you have?

Kind regards
Thomas

Sergey Kostrov's picture

Hi everybody,

>>...
>>What is the core architecture that you are using? Which platform do you have? What is the core and uncore frequency? What DIMMs
>>are you using (speed and rank) and how are they populated?

Here are some technical specs for my system:

Dell Precision Mobile M4700
Intel Core i7-3840QM ( Ivy Bridge / 4 cores / 8 logical processors )( http://ark.intel.com/compare/70846 )
16GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit

Best regards,
Sergey

iliyapolak's picture

>>>Once again, where could I find latencies for the MOVNTDQ and VMOVNTDQ instructions?>>>

The latency of MOVNTDQ is given in Agner Fog's instruction tables, where it is listed as ~400 cycles for Haswell CPUs.
