<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Wed, 25 Nov 2009 14:27:25 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network Comments feed</title>
    <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/feed/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>By Gary Schaps</title>
      <description><![CDATA[ 
Revealing the password required to extract files from &quot;topology_enumeration_07082008.zip&quot; would certainly make it more user friendly.  Or, am I missing something?
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-192</link>
      <pubDate>Wed, 06 Aug 2008 14:29:02 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-192</guid>
    </item>
    <item>
      <title>By Intel Software Network Support</title>
      <description><![CDATA[ 
Hello - the zip file has been replaced with a new, non-encrypted zip file.  We apologize for the error.
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-196</link>
      <pubDate>Thu, 07 Aug 2008 16:02:02 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-196</guid>
    </item>
    <item>
      <title>By Igor Levicki</title>
      <description><![CDATA[ 
I would very much appreciate it if Intel updated AP-485 to include the CPUID leaf 0xB definition in more detail. I hate reverse-engineering someone else's code.
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-249</link>
      <pubDate>Fri, 12 Sep 2008 12:08:22 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-249</guid>
    </item>
    <item>
      <title>By shihjong kuo</title>
      <description><![CDATA[ 
You can also look up the definition of CPUID leaf 0BH fields in the CPUID instruction reference pages of Vol 2A.
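
For readers who want the field layout without reverse-engineering the reference code, here is a minimal C sketch that decodes one sub-leaf of CPUID leaf 0BH. It assumes GCC or Clang on x86 (for <cpuid.h>, __get_cpuid_max and __cpuid_count). The names decode_leaf0b, print_leaf0b_levels and struct leaf0b_level are hypothetical, and the field positions should be checked against the CPUID reference pages of Vol 2A.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of one CPUID leaf 0BH sub-leaf (see the CPUID pages of Vol 2A). */
struct leaf0b_level {
    unsigned shift;        /* EAX[4:0]: right-shift on x2APIC ID to get the next level's ID */
    unsigned num_logical;  /* EBX[15:0]: logical processors at this level */
    unsigned level_number; /* ECX[7:0]: echoes the sub-leaf index */
    unsigned level_type;   /* ECX[15:8]: 0 = invalid, 1 = SMT, 2 = Core */
    uint32_t x2apic_id;    /* EDX[31:0]: x2APIC ID of the current logical processor */
};

/* Decode the raw registers returned by CPUID(EAX=0BH, ECX=sub-leaf). */
static struct leaf0b_level decode_leaf0b(uint32_t eax, uint32_t ebx,
                                         uint32_t ecx, uint32_t edx)
{
    struct leaf0b_level l;
    l.shift        = eax & 0x1F;
    l.num_logical  = ebx & 0xFFFF;
    l.level_number = ecx & 0xFF;
    l.level_type   = (ecx >> 8) & 0xFF;
    l.x2apic_id    = edx;
    return l;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
/* Walk the sub-leaves until the level type reads 0 (invalid). */
void print_leaf0b_levels(void)
{
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, NULL) < 0x0B) {
        puts("CPUID leaf 0BH not supported");
        return;
    }
    for (unsigned sub = 0; sub < 16; sub++) {
        __cpuid_count(0x0B, sub, eax, ebx, ecx, edx);
        struct leaf0b_level l = decode_leaf0b(eax, ebx, ecx, edx);
        if (l.level_type == 0)
            break;
        printf("level %u type %u shift %u logical %u x2apic %u\n",
               l.level_number, l.level_type, l.shift, l.num_logical,
               (unsigned)l.x2apic_id);
    }
}
#endif
```

Because CPUID reports data only for the logical processor it executes on, a program that wants every x2APIC ID would pin its thread to each logical processor in turn and run the loop there.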
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-253</link>
      <pubDate>Mon, 15 Sep 2008 19:02:33 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-253</guid>
    </item>
    <item>
      <title>By CQ Tang</title>
      <description><![CDATA[ 
I don't believe that the way the code determines the cache topology is correct.

I have a machine with two packages, two dies on each package, and two cores on
each die; the two cores on a die share an L2 cache.

p0-p7 are the OS processor numbers. p0, p2, p4, and p6 are on the first package, and p0 and p4 are
on the same die and share an L2 cache, but your code shows that p0 and p2 share an L2 cache.

I looked at the CPUID specification; there is no relation between the cache hierarchy and the APIC ID.

I don't know how the core ID is assigned to each core and used to construct the APIC ID.

In my two-package, two-die, two-core case, how is the core ID assigned to each core?

Thanks for email reply



 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-7866</link>
      <pubDate>Sat, 27 Sep 2008 22:30:08 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-7866</guid>
    </item>
    <item>
      <title>By Shih Kuo (Intel)</title>
      <description><![CDATA[ Sometimes expectation or intuition can lead to a discovery; other times it can trip us up instead. 

If you think an OS API is providing you with information that is not consistent with the reference code, I can go over the details of the reference code's printout and of the information you collected from the API.

If your intuition makes you believe any portion of the reference code's printout is questionable, can you send me the printout in a text file? You can send it to shihjong.kuo@intel.com.

Please remember that a "die" inside a package is not a topology level that can be enumerated by the reference code, nor can any OS enumerate whether the multiple cores inside a physical package are implemented on a single die or on two dies. This is an area where intuition can play tricks on you.

Another aspect is that each component of the software stack that reports topology information can implement its own numbering scheme for logical processors, so you have to be careful, when you refer to p0, p1, etc., about which two pieces of software you are comparing.

To expand a little more on your last question, here is what happens in the boot-up process:

1. At reset, APIC IDs are assigned at the hardware level; at this point control has not been transferred to any software, not even the BIOS.
2. After the BIOS takes control of the BSP, it rounds up the other logical processors in the system (application processors, or APs, in BIOS parlance) and prepares internal data structures with information about the APIC IDs of the BSP and APs. The BIOS can, of course, have its own numbering system.
3. After the BIOS transfers control to the OS along with the data structures defined by ACPI, how the OS chooses to identify individual logical processors is up to each OS. Linux uses an ordinal numbering system starting from 0; Windows uses an affinity mask. You can associate each individual bit of an affinity mask with an ordinal number. But on the same hardware, dual-booting Linux and Windows, this 0-based ordinal numbering scheme may or may not be the same between the two.
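
The numbers from steps 1 and 3 can be related in code roughly as follows. This is a minimal C sketch: the function names are hypothetical, and the convention that affinity-mask bit n corresponds to ordinal n is an assumption for illustration, since the OS decides its own numbering. The initial APIC ID from step 1 is the value CPUID leaf 01H reports in EBX[31:24] for the logical processor executing it.

```c
#include <stdint.h>

/* Step 1: CPUID.01H:EBX[31:24] holds the 8-bit initial APIC ID assigned
 * by hardware at reset; it does not depend on BIOS or OS numbering. */
static unsigned initial_apic_id_from_leaf1_ebx(uint32_t ebx)
{
    return (ebx >> 24) & 0xFF;
}

/* Step 3 (assumed convention): bit n of an affinity mask <=> ordinal n.
 * Valid only for ordinals below 64. */
static uint64_t ordinal_to_mask(unsigned ordinal)
{
    return (uint64_t)1 << ordinal;
}

/* Returns the ordinal of the lowest set bit, or -1 for an empty mask. */
static int mask_to_ordinal(uint64_t mask)
{
    for (int i = 0; i < 64; i++)
        if (mask & ((uint64_t)1 << i))
            return i;
    return -1;
}
```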

So if you're asking about #1, I can go over the printout from running the reference code on your machine. If you want to ask about #2 or #3, that's up to the BIOS and the OS. ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-8094</link>
      <pubDate>Fri, 03 Oct 2008 20:41:27 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-8094</guid>
    </item>
    <item>
      <title>By Sam Baskinger</title>
      <description><![CDATA[ 
The file mk_32.sh has the line: gcc -m32 -g -c util_os.c -o util_os.c

I think that it was meant to be: gcc -m32 -g -c util_os.c -o util_os.o

The output file should be a .o instead of a .c. :) Regardless, thank you for the code! While I have your attention, are there any caveats for the snazzy new i7s? Thanks!


 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-9254</link>
      <pubDate>Wed, 26 Nov 2008 10:29:43 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-9254</guid>
    </item>
    <item>
      <title>By Alec Fistrovici</title>
      <description><![CDATA[ I am trying to find information about buying, or perhaps building, a laptop-type portable PC with one of the latest Intel mobile quad-core chips. Does Intel already configure or produce a mobile quad-core Xeon with 64-bit program compatibility, or is there a configuration utility tool for matching a specific mobile quad-core chip to the proper mobile motherboard? I can't find information on the Intel site relating to this issue. Any advice or info would be appreciated. Thanks. E-mail: bobcat445@rogers.com
Regards, Alec Fistrovici ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-12465</link>
      <pubDate>Fri, 26 Dec 2008 16:33:22 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-12465</guid>
    </item>
    <item>
      <title>By FvM</title>
      <description><![CDATA[ Hello, I have a problem with my laptop, but I want to recover it because I like it. ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-18372</link>
      <pubDate>Sat, 07 Feb 2009 19:27:08 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-18372</guid>
    </item>
    <item>
      <title>By Shih Kuo (Intel)</title>
      <description><![CDATA[ Sam, thanks for catching the typo. The reference code should work fine for i7s. ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-19855</link>
      <pubDate>Thu, 19 Feb 2009 12:54:50 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-19855</guid>
    </item>
    <item>
      <title>By lio</title>
      <description><![CDATA[ I have run into the same problem described by CQ Tang. 
Our platform consists of two quad-core Xeon 5345 (Clovertown) processors. 
From /proc/cpuinfo, #0, #1, #4, and #5 are in package 0. 
And with the reference code, the result is:
#0 and #1 share one L2 cache
#4 and #5 share another L2 cache
But the result from a ping-pong test shows that:
Communication between #0 and #4 has the best performance.
Communication between #0 and #1 has the same performance as communication between #0 and #5, and that performance is much lower than communication between #0 and #4.
Communication between #0 and #2, #0 and #3, #0 and #6, and #0 and #7 all have the same performance.
This shows that #0 and #4 share the same L2 cache.

I have repeated the test many times, and I think the reference code may have some bugs.  

P.S. Your work is great! ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-20203</link>
      <pubDate>Mon, 23 Feb 2009 07:05:18 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-20203</guid>
    </item>
    <item>
      <title>By ananthnarayan_s</title>
      <description><![CDATA[ The build script bug reported by Sam doesn't appear to have been fixed. :) ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-20646</link>
      <pubDate>Thu, 05 Mar 2009 20:28:58 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-20646</guid>
    </item>
    <item>
      <title>By Shih Kuo (Intel)</title>
      <description><![CDATA[ I had a few offline email exchanges to root-cause Mr. Lio's observations. How the expected performance characteristics correlate with the example output of the topology enumeration may be of interest to other folks. 

To recap the background, the situation under test was ping-pong data transmission between a pair of affinitized transmit/receive agents. 

If the logical processors to which the transmitter/receiver pair are affinitized share the same L2 cache, one can expect the latency of such ping-pong data transmission to be lower than if the data have to go out on the bus. 

The ping-pong test was run on a Linux system, and affinity was controlled using a mask equivalent to the CPU_SET API. The hardware is a dual-socket quad-core system.
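
For reference, pinning the calling thread to one ordinal-numbered logical processor with the Linux CPU_SET API looks roughly like this. It is a minimal sketch assuming glibc, which needs _GNU_SOURCE defined before any header for CPU_SET and sched_setaffinity; pin_to_ordinal is a hypothetical helper, not part of the reference code.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread (pid 0) to the logical processor with the given
 * 0-based Linux ordinal. Returns 0 on success, -1 on failure. */
int pin_to_ordinal(int ordinal)
{
    cpu_set_t set;
    if (ordinal < 0 || ordinal >= CPU_SETSIZE)
        return -1;                 /* outside the range cpu_set_t can hold */
    CPU_ZERO(&set);
    CPU_SET(ordinal, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```

In a ping-pong test, the transmitter and receiver threads would each call it with one ordinal of the chosen pair; which pair actually shares an L2 is the question answered in the rest of this comment.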

So how does one map a pair of ordinal numbers passed to the CPU_SET API and be certain those ordinal numbers select two processor cores sharing the same L2?
In this case there are three sets of numbers that are relevant:

1. The composite affinity mask for the target L2. For a cache level shared by two logical processors, the composite mask will have two non-zero bits. 

The tutorial portion of the code example can print a small box diagram that shows the mapping of L2 to affinity masks; the diagram below depicts the L2 cache level in one of the two quad-core processors:

         +----+----+----+----+
Cache    |   L2    |   L2    |
Size     |   4M    |   4M    |
CmbMsk   |   11    |   22    |
         +----+----+----+----+

2. Since CPU_SET uses an ordinal index to specify an individual logical processor, we need a listing of each logical processor's affinity mask relative to the ordinal numbers that software can use to iterate over individual logical processors. The example code also produces the following listing, which connects the three sets of numbers (the combined mask of each L2, and the affinity mask and APIC ID of each logical processor, listed in numerical order of the ordinal numbers):

Affinity mask 00000001 - apic id 0
Affinity mask 00000002 - apic id 2
Affinity mask 00000004 - apic id 4
Affinity mask 00000008 - apic id 6
Affinity mask 00000010 - apic id 1
Affinity mask 00000020 - apic id 3
Affinity mask 00000040 - apic id 5
Affinity mask 00000080 - apic id 7

3. The composite mask value of 11H corresponds to affinity mask values 01H (ordinal number 0 in the listing) and 10H (ordinal number 4). So the ordinal number pairs that software can use to affinitize the ping-pong transmitter/receiver test to take advantage of the shared L2 are (0, 4), (1, 5), (2, 6), and (3, 7). The first pair matches what Mr. Lio observed.
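
The three steps above can be sketched in C. This is a minimal sketch, not the reference code's implementation: composite_masks and bits_for are hypothetical names, and it assumes the number of logical processors sharing the L2 (CPUID.04H:EAX[25:14] + 1 for that cache) is 2, as the diagram implies. Logical processors whose APIC IDs are equal after shifting out the low ceil(log2(sharing)) bits are grouped, and their affinity bits are ORed into one composite mask.

```c
#include <stdint.h>

/* Number of APIC ID bits spanned by `count` sharers: ceil(log2(count)). */
static unsigned bits_for(unsigned count)
{
    unsigned bits = 0;
    while ((1u << bits) < count)
        bits++;
    return bits;
}

/* apic_ids[i] is the APIC ID of the logical processor with ordinal i
 * (affinity bit i); max_sharing is the number of logical processors
 * sharing the cache. Writes one composite mask per cache instance and
 * returns how many instances were found (at most n). */
static unsigned composite_masks(const unsigned *apic_ids, unsigned n,
                                unsigned max_sharing,
                                uint64_t *masks, unsigned *cache_ids)
{
    unsigned shift = bits_for(max_sharing);
    unsigned count = 0;
    for (unsigned i = 0; i < n && i < 64; i++) {
        unsigned id = apic_ids[i] >> shift;    /* cache instance ID */
        unsigned k = 0;
        while (k < count && cache_ids[k] != id)
            k++;
        if (k == count) {                      /* first sharer seen */
            cache_ids[count] = id;
            masks[count++] = 0;
        }
        masks[k] |= (uint64_t)1 << i;          /* add this ordinal's bit */
    }
    return count;
}
```

Fed the eight APIC IDs from the listing in ordinal order (0, 2, 4, 6, 1, 3, 5, 7) with a sharing count of 2, this grouping yields masks 11H, 22H, 44H, and 88H, and the set bits of each mask give the ordinal pairs (0, 4), (1, 5), (2, 6), and (3, 7).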

It is important to realize that the numerical order of an ordinal-numbered index (for CPU_SET in Linux or SetThreadAffinityMask in Windows) does not imply a direct mapping to any cache level (L1 data cache, L1 instruction cache, L2, etc.). 

In a single-socket quad-core system, you may see the following:
         +----+----+----+----+
Cache    |   L2    |   L2    |
Size     |   4M    |   4M    |
CmbMsk   |   3     |   c     |
         +----+----+----+----+

and an ordinal numbered listing of 

Affinity mask 00000001 - apic id 0
Affinity mask 00000002 - apic id 1
Affinity mask 00000004 - apic id 2
Affinity mask 00000008 - apic id 3

One interesting consequence is that the optimal pairing for affinity control (via an affinity mask or equivalent) is highly workload-specific.

As Mr. Lio observed, pairing logical processors (0, 1) in a dual-socket quad-core topology did not produce the best latency in the transmitter/receiver test. On a single-socket quad-core system, that pairing would. 

On the other hand, one factor in scheduling heuristics is minimizing cache evictions due to context switches. More than 90% of wall-clock cycles are often spent in the system idle loop, implying it is common to have several unsubscribed logical processors available when a runnable task is added to the queue. 

On the dual-socket quad-core system shown above, using a consecutive sequence of ordinal numbers to schedule a new task can take advantage of the L2 capacity without causing evictions for other tasks that were already running.

But applying the same consecutive-ordinal heuristic to a single-socket quad-core system could result in two tasks competing for L2 capacity. So cache evictions caused by starting another task may have a visible performance impact, depending on the size of the respective working sets. 

  ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-20755</link>
      <pubDate>Mon, 09 Mar 2009 12:48:31 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-20755</guid>
    </item>
    <item>
      <title>By Kevin</title>
      <description><![CDATA[ A couple of minor points:

1. The Unix shell files should have Unix line endings.

2. When the build script was run, the following warning was generated:

cpu_topo.c: In function ‘DumpCPUIDArray’:
cpu_topo.c:1857: warning: comparison is always false due to limited range of data type
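
The code at cpu_topo.c line 1857 isn't shown here, so this is only a hypothetical illustration of what that GCC warning usually means: a variable of a signed 8-bit type is compared against a constant it can never hold, such as 0xFF. Plain char is signed with GCC on x86, which is why declaring such a variable unsigned silences the warning. The function names below are made up for the example.

```c
#include <stdbool.h>

/* A signed char holds only -128..127, so after promotion to int it can
 * never equal 0xFF (255); GCC may warn that this is always false. */
bool is_ff_signed(signed char c)
{
    return c == 0xFF;
}

/* An unsigned char holds 0..255, so the same comparison can succeed. */
bool is_ff_unsigned(unsigned char c)
{
    return c == 0xFF;
}
```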

It looks like the 'cres' variable should perhaps be unsigned. ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-24547</link>
      <pubDate>Tue, 19 May 2009 02:09:32 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-24547</guid>
    </item>
    <item>
      <title>By Intel Software Network Blogs &#187; Parallel Programming Talk - Counting Cores on the Listener Question Show</title>
      <description><![CDATA[ n/a ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-25442</link>
      <pubDate>Tue, 02 Jun 2009 09:47:20 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-25442</guid>
    </item>
    <item>
      <title>By Parallel Programming Talk - Counting Cores on the Listener Question Show</title>
      <description><![CDATA[ n/a ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-25457</link>
      <pubDate>Tue, 02 Jun 2009 11:43:53 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-25457</guid>
    </item>
    <item>
      <title>By Kumaravel</title>
      <description><![CDATA[ How do I find the number of physical processors and the number of cores per physical processor on an Intel Xeon E5540 processor?
I have tried using CPUID, but I could not find the exact bit location. My code works fine on a single (Intel Xeon) processor system.

Thanks
Kumaravel ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-31117</link>
      <pubDate>Wed, 16 Sep 2009 04:32:38 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-31117</guid>
    </item>
    <item>
      <title>By Geoffrey Grinton</title>
      <description><![CDATA[ I have been using this code for some time - thanks!
But I am also aware of, and have started using, the code in Khang Nguyen's article and the HTMultiCore library.
Now I am trying to reconcile the two sets of data, and need help please.
Here is the output on my Lenovo ThinkPad, using the distributed sample programs.

HTMultiCore tells me:
   Hyper-Threading Technology: Not Capable
   System: Multi-core
   Number of available logical processors per physical processor: 1
   Number of available cores per physical processor: 2
   Number of physical processors: 1

cpu_topo tells me:
  Number of logical processors visible to the OS: 2
  Number of logical processors visible to this process: 2 
  Number of processor cores visible to this process: 2
  Number of physical packages visible to this process: 1

And, for reference, the systeminfo command tells me:
  System type:       X86-based PC
  Processor(s):       2 Processor(s) Installed.
                           [01]: x86 Family 6 Model 15 Stepping 10 GenuineIntel ~1995 Mhz
                           [02]: x86 Family 6 Model 15 Stepping 10 GenuineIntel ~1994 Mhz

It is unclear to me whether the terms "physical processor" (HTMultiCore) and "physical package" (cpu_topo) are synonymous. If they are meant to be, then it seems to me that I am getting different answers, since in one case I see 1 logical processor per physical processor, and in the other case 2 logical processors "visible", which I presume means per physical package, since there is only one of them.

Can you help me understand how to interpret these results please?

Thanks.
Geoffrey
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-33016</link>
      <pubDate>Tue, 20 Oct 2009 21:58:00 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-33016</guid>
    </item>
    <item>
      <title>By Shih Kuo (Intel)</title>
      <description><![CDATA[ Kumaravel,

The number of physical processors is system-level information; the CPUID instruction only reports raw data within the domain of the logical processor it runs on. If you look at the reference source code, you can find the utility routine GetSysProcessorPackageCount(). Another utility routine reports the total number of cores in the system across all physical processors; you can derive the core count per physical processor from that information. 
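
As a sketch of that derivation on processors that report CPUID leaf 0BH (check that the maximum basic leaf is at least 0BH), package and core counts can be obtained by collecting the x2APIC ID of every logical processor and counting distinct IDs after shifting out the lower topology levels. count_topology and its helpers are hypothetical names, not the reference code's GetSysProcessorPackageCount(); the shift widths are assumed to be the EAX[4:0] values of the SMT-type and Core-type sub-leaves, and gathering the IDs requires running CPUID on each logical processor, for example after pinning to it.

```c
#include <stdint.h>

/* Count distinct values of (id >> shift); O(n^2), fine for small n. */
static unsigned count_distinct_shifted(const uint32_t *ids, unsigned n,
                                       unsigned shift)
{
    unsigned distinct = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = 0;
        while (j < i && (ids[j] >> shift) != (ids[i] >> shift))
            j++;
        if (j == i)            /* no earlier ID maps to the same value */
            distinct++;
    }
    return distinct;
}

struct topo_counts {
    unsigned packages;
    unsigned cores;
    unsigned cores_per_package;
    unsigned logical_per_core;
};

/* smt_shift: EAX[4:0] of the SMT-level sub-leaf of CPUID.0BH.
 * pkg_shift: EAX[4:0] of the Core-level sub-leaf of CPUID.0BH.
 * ids: x2APIC IDs of all n logical processors in the system. */
static struct topo_counts count_topology(const uint32_t *ids, unsigned n,
                                         unsigned smt_shift, unsigned pkg_shift)
{
    struct topo_counts t;
    t.packages = count_distinct_shifted(ids, n, pkg_shift);
    t.cores = count_distinct_shifted(ids, n, smt_shift);
    t.cores_per_package = t.packages ? t.cores / t.packages : 0;
    t.logical_per_core = t.cores ? n / t.cores : 0;
    return t;
}
```

With synthetic IDs for two packages of four cores with two threads each (SMT shift 1, package shift 4, IDs 0-7 and 16-23), this returns 2 packages, 8 cores, 4 cores per package, and 2 logical processors per core.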

Geoffrey

My colleague Khang is aware that his library code needs to be updated to correct this inconsistency: the value it reports is the number of logical processors per core, but the text heading was misleading.


As a separate note, Windows 7 and Windows Server 2008 R2 have added support for GROUP_AFFINITY, with new APIs supporting up to 4 groups of affinity masks. I expect to provide updated reference code that includes support for GROUP_AFFINITY in the near future.
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-34515</link>
      <pubDate>Wed, 11 Nov 2009 13:56:10 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/#comment-34515</guid>
    </item>
  </channel></rss>