Use SSE3 instructions to improve synchronization between multiple agents. This technique is intended for system software, which can use it to provide more efficient thread-synchronization primitives.
Use the MONITOR and MWAIT instructions. MONITOR defines an address range used to monitor write-back stores. MWAIT is used to indicate that the software thread is waiting for a write-back store to the address range defined by the MONITOR instruction.
Software should know the exact length of the region that the MONITOR/MWAIT instructions monitor for writes. Allocating and using a region shorter than the processor's triggering area can lead to false wake-ups, caused by writes to unrelated data variables that happen to lie in the triggering area. Conversely, allocating a region longer than the triggering area can lead to the processor not waking when it should. CPUID reports the exact length of the triggering area. This length has no relationship to any cache-line size in the system, and software should not make any assumptions to that effect. Based on the size reported by CPUID, the OS or other system software should dynamically allocate structures with appropriate padding. If correct allocation is not practical, do not use MONITOR/MWAIT.
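As a minimal sketch of the CPUID query described above: on x86, CPUID.01H:ECX bit 3 indicates MONITOR/MWAIT support, and leaf 05H reports the smallest monitor-line size in EAX[15:0] and the largest in EBX[15:0], both in bytes. The names monitor_line_t, decode_monitor_leaf and query_monitor_line are hypothetical, and the __get_cpuid helper assumes the <cpuid.h> header shipped with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (assumption) */
#endif

/* Monitor-line sizes in bytes, as reported by CPUID leaf 05H. */
typedef struct {
    uint32_t smallest;  /* EAX[15:0] */
    uint32_t largest;   /* EBX[15:0] */
} monitor_line_t;

/* Decode the raw EAX/EBX values returned by CPUID leaf 05H. */
monitor_line_t decode_monitor_leaf(uint32_t eax, uint32_t ebx)
{
    monitor_line_t m;
    m.smallest = eax & 0xFFFFu;
    m.largest  = ebx & 0xFFFFu;
    return m;
}

/* Query the running processor.
 * Returns 0 and fills *out on success, or -1 if MONITOR/MWAIT is not
 * reported (or the build target is not x86). */
int query_monitor_line(monitor_line_t *out)
{
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.01H:ECX bit 3 is the MONITOR/MWAIT feature flag. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 3)))
        return -1;

    /* Leaf 05H carries the monitor-line sizes. */
    if (!__get_cpuid(5, &eax, &ebx, &ecx, &edx))
        return -1;

    *out = decode_monitor_leaf(eax, ebx);
    return (out->largest != 0) ? 0 : -1;
#else
    return -1;
#endif
}
```

Keeping the decoding separate from the query lets the bit-field handling be checked on any machine, including virtual machines that hide the MONITOR feature flag.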
While a single length should suffice for single-cluster systems, setting up the data layout for systems with multiple clusters will most likely be more complicated. Depending on the mechanism implemented by the chipset in such a system, a single monitor-line size may not suffice.
Typically, software will have a set of data variables that it monitors for writes. It will be necessary to locate these in the monitor-triggering area. To eliminate false wake-ups due to writes to other variables, software will need to add padding around the monitored variables. This is referred to as the padded area.
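One way to build such a padded area is sketched below. It assumes the triggering area is aligned to its own length, so a region that starts on a multiple of the monitor-line size and spans a whole number of lines shares no triggering area with other data. The line argument stands for the size obtained from CPUID; padded_area, round_up_to_line and padded_area_alloc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A heap region holding the monitored variables plus padding. */
typedef struct {
    void  *base;       /* pointer returned by calloc; pass this to free() */
    void  *monitored;  /* line-aligned start of the padded area */
    size_t size;       /* length of the padded area, a whole number of lines */
} padded_area;

/* Round n up to the next multiple of line (line must be nonzero). */
size_t round_up_to_line(size_t n, size_t line)
{
    assert(line > 0);
    return ((n + line - 1) / line) * line;
}

/* Allocate a zeroed padded area large enough for payload bytes.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int padded_area_alloc(padded_area *pa, size_t payload, size_t line)
{
    assert(pa != NULL);
    if (payload == 0 || line == 0)
        return -1;

    size_t size = round_up_to_line(payload, line);

    /* Over-allocate by line - 1 bytes so an aligned start always fits. */
    void *base = calloc(1, size + line - 1);
    if (base == NULL)
        return -1;

    uintptr_t addr = (uintptr_t)base;
    uintptr_t rem  = addr % line;
    uintptr_t aligned = rem ? addr + (line - rem) : addr;

    pa->base = base;
    pa->monitored = (void *)aligned;
    pa->size = size;
    return 0;
}
```

The monitored variables would then live at pa.monitored, and nothing else should be placed inside the pa.size bytes that follow it. Using modulo arithmetic rather than a bit mask avoids assuming that the reported size is a power of two.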
Multiple events other than a write to the triggering address range can cause a processor that executed MWAIT to wake up. These include the following:
Power-management-related events, such as Thermal Monitor, Enhanced Intel SpeedStep® technology transitions, or chipset-driven STPCLK# assertion, will not cause the monitor event-pending bit to be cleared. Debug traps and faults will not cause the monitor event-pending bit to be cleared either.
The example below shows the typical usage of MONITOR/MWAIT:
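A minimal sketch of the usual check, MONITOR, re-check, MWAIT pattern follows. Because the instructions are normally executable only by system software at privilege level 0, the sketch takes them as callbacks in a hypothetical mwait_ops structure; in kernel code on x86 the callbacks might wrap the _mm_monitor and _mm_mwait intrinsics from <pmmintrin.h>. The demo_* functions are user-mode stand-ins that only simulate wake-ups, and all names here are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks for the two instructions. */
typedef struct {
    void (*monitor)(const volatile void *addr); /* arm monitoring of addr */
    void (*mwait)(void);                        /* wait for a store or other wake event */
} mwait_ops;

/* Wait until *trigger becomes nonzero.
 * Returns the number of times mwait was entered. */
unsigned wait_for_store(const volatile int *trigger, const mwait_ops *ops)
{
    unsigned waits = 0;

    assert(trigger != NULL && ops != NULL);
    while (*trigger == 0) {
        ops->monitor(trigger);   /* arm the monitor on the trigger address */
        if (*trigger != 0)       /* re-check: the store may have landed */
            break;               /* before the monitor was armed */
        ops->mwait();            /* may also return for unrelated events, */
        waits++;                 /* so the loop tests the trigger again */
    }
    return waits;
}

/* User-mode stand-ins: two simulated spurious wake-ups, then the store. */
volatile int demo_flag = 0;
int demo_spurious = 2;

void demo_monitor(const volatile void *addr)
{
    (void)addr;                  /* nothing to arm in the simulation */
}

void demo_mwait(void)
{
    if (demo_spurious > 0)
        demo_spurious--;         /* woke up without a store to the trigger */
    else
        demo_flag = 1;           /* simulate another agent's store */
}
```

The second check between the two calls closes the window in which a store arrives after the first test but before the monitor is armed. The outer loop handles wake-ups caused by events other than a store to the monitored range.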