Deprecating the PCOMMIT Instruction

Executive Summary

The PCOMMIT instruction has been deprecated.  Although it was documented earlier, Intel has dropped it from consideration for future products.  This blog post explains the details behind that decision.


Enabling Persistent Memory Programming

In preparation for emerging persistent memory technologies, such as Intel DIMMs based on 3D XPoint™ technology, Intel has defined several new instructions to enable the persistent memory programming model.  First, there are two new optimized cache flushing instructions, CLWB and CLFLUSHOPT.  These instructions are described in the Intel Architecture Instruction Set Extensions Programming Reference and are slated to appear on various platforms, including those supporting the Intel DIMM.  They provide a high-performance way to flush stores from the CPU cache to the persistence domain, the term for the portion of a platform’s data path where stores are power-fail safe.
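As a rough illustration of how CLWB and CLFLUSHOPT are typically used, here is a minimal C sketch of a helper that flushes every cache line covering a byte range and then fences. It assumes 64-byte cache lines and the GCC/Clang x86 intrinsics from <immintrin.h>; `persist_range` and `flush_line` are hypothetical names, and which flush instruction gets compiled in depends on flags such as -mclwb or -mclflushopt. On ordinary DRAM this only writes lines back from the cache; the data is persistent only if the range maps persistent memory on a platform whose persistence domain covers the WPQ.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Assumption: 64-byte cache lines, as on current Intel CPUs. */
#define CACHE_LINE_SIZE 64u

/* Flush one cache line, preferring CLWB, then CLFLUSHOPT, then CLFLUSH.
 * The branch compiled in depends on the -m / -march flags in use. */
static void flush_line(void *p)
{
#if defined(__CLWB__)
    _mm_clwb(p);
#elif defined(__CLFLUSHOPT__)
    _mm_clflushopt(p);
#elif defined(__SSE2__)
    _mm_clflush(p);
#else
    (void)p; /* non-x86 build: nothing to flush in this sketch */
#endif
}

/* Flush every cache line overlapping [addr, addr + len), then issue a
 * store fence so the flushes are ordered before any later stores.
 * Returns the number of cache lines flushed. */
static size_t persist_range(void *addr, size_t len)
{
    uintptr_t line = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t count = 0;

    for (; line < end; line += CACHE_LINE_SIZE, count++)
        flush_line((void *)line);

#if defined(__SSE__)
    _mm_sfence();
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* orders only; no flush */
#endif
    return count;
}
```

An application would call something like persist_range(record, sizeof *record) after writing a record into a mapped persistent memory range and before treating it as durable. The start address is aligned down because the flush instructions operate on whole cache lines, so a small write that straddles a line boundary needs two flushes.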

Originally, the set of new instructions included one called PCOMMIT, intended for use on platforms where flushing from the CPU cache was not sufficient to reach the persistence domain.  On those platforms, an additional step using PCOMMIT was required to ensure that stores had passed from memory controller write pending queues to the DIMM, which is the persistence domain on those platforms.

The picture below illustrates the data path taken by a store (MOV) to persistent memory.

[Figure: mov flow, the data path of a store (MOV) to persistent memory]

As shown above, when an application executes a MOV instruction, the store typically ends up in the CPU caches.  Instructions like CLWB can then be used to flush the store from the CPU cache, after which the store may spend some time in the write pending queue (WPQ) of the memory controller.  The larger dashed box represents the power-fail safe persistence domain on a platform that is designed to flush the WPQ automatically on power failure or shutdown.  One platform-level feature that performs this flushing is called Asynchronous DRAM Refresh, or ADR.

When the persistent memory programming model was first designed, there was a concern that ADR would be a rarely available platform feature, so the PCOMMIT instruction was added to provide a way to achieve persistence on machines without ADR (platforms where the persistence domain is the smaller dashed box in the picture above).  However, it turns out that platforms planning to support the Intel DIMM are also planning to support ADR, so the need for PCOMMIT is gone.  The result is a simpler, single programming model in which the application needs no logic to detect whether PCOMMIT is required.  For this reason, PCOMMIT is being deprecated before ever shipping on an Intel CPU.  Older software needs no continued support for the instruction, because no CPU has ever executed it: the opcode has always produced an invalid opcode exception and will continue to do so.
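The difference between the two persistence domains can be summarized as a toy model. The C sketch below is not a hardware interface; the enum and function names are hypothetical, and it ignores custom platforms that also flush the CPU caches on power failure. It simply encodes where a store sits along the data path and whether that location survives a power failure with and without ADR.

```c
#include <stdbool.h>

/* Toy model: where a store to persistent memory currently sits. */
enum store_location {
    IN_CPU_CACHE, /* after MOV, before any flush */
    IN_WPQ,       /* after CLWB/CLFLUSHOPT + SFENCE: accepted by the memory controller */
    ON_DIMM       /* written to the persistent media itself */
};

/* Does a store at this location survive a power failure?
 * - With ADR, the WPQ is flushed on power failure, so it lies inside the
 *   persistence domain (the larger dashed box).
 * - Without ADR, only the DIMM is persistent (the smaller dashed box);
 *   this is the case PCOMMIT was originally meant to cover. */
static bool survives_power_fail(enum store_location loc, bool platform_has_adr)
{
    switch (loc) {
    case ON_DIMM:
        return true;
    case IN_WPQ:
        return platform_has_adr;
    case IN_CPU_CACHE:
    default:
        return false;
    }
}
```

Because platforms supporting the Intel DIMM provide ADR, a store only has to reach the WPQ, which a cache flush followed by a fence accomplishes; pushing it further with PCOMMIT adds nothing.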

As shown in the picture above, a platform may still provide a way to flush the WPQ (labeled WPQ Flush).  Unlike the PCOMMIT instruction, this is a kernel-only facility: the kernel uses it to flush commands written to DIMM command registers, or in the rare case where it needs to ensure that something is immediately flushed to the DIMM.  Applications are typically unaware that the WPQ Flush mechanism exists.


The Simpler Programming Model

The picture below shows sample instruction sequences for storing two values (10 and 20) to persistent memory locations.

[Figure: two instruction sequences that store 10 and 20 to persistent memory, with PCOMMIT (left) and without it (right)]

The sequence on the left was required on platforms that lacked the ADR feature to flush the WPQ on power failure or shutdown.  Since ADR is now a requirement for persistent memory support, the simpler sequence on the right can be used on all platforms.
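The simpler sequence (two stores, a CLWB for each stored location, then a single SFENCE) might look like this in C with compiler intrinsics. This is a minimal sketch assuming GCC or Clang on x86; the function names are hypothetical, and CLFLUSH is substituted when the compiler is not targeting CLWB (no -mclwb). The pointers are assumed to refer to mapped persistent memory; the older non-ADR sequence additionally issued PCOMMIT and a second SFENCE after the first SFENCE.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Write back one cache line: CLWB when the compiler targets it,
 * otherwise CLFLUSH (part of the SSE2 baseline on x86-64). */
static void write_back(void *p)
{
#if defined(__CLWB__)
    _mm_clwb(p);
#elif defined(__SSE2__)
    _mm_clflush(p);
#else
    (void)p; /* non-x86 build: no cache flush in this sketch */
#endif
}

/* SFENCE orders the preceding write-backs before any later stores. */
static void store_fence(void)
{
#if defined(__SSE__)
    _mm_sfence();
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* ADR-era sequence: MOV, MOV, CLWB, CLWB, SFENCE (no PCOMMIT).
 * x and y are assumed to point into mapped persistent memory.
 * The deprecated non-ADR sequence appended PCOMMIT and another SFENCE. */
static void store_10_and_20(int64_t *x, int64_t *y)
{
    *x = 10;        /* MOV  [x], 10 */
    *y = 20;        /* MOV  [y], 20 */
    write_back(x);  /* CLWB [x]     */
    write_back(y);  /* CLWB [y]     */
    store_fence();  /* SFENCE: on an ADR platform, both stores are now in the persistence domain */
}
```

Because the flushes are followed by one fence, the two write-backs can proceed in parallel; the fence is what the application waits on before relying on both values being persistent.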


Operating System and Toolchain Changes

To prepare for persistent memory programming, some operating systems, compilers, assemblers, and libraries were modified to use the PCOMMIT instruction.  Since the instruction was not guaranteed to exist on a given platform, any software using PCOMMIT did so only if the appropriate CPUID flag indicated that PCOMMIT was supported (the exact flag is CPUID.(EAX=07H, ECX=0H):EBX, bit 22).  Since PCOMMIT is deprecated, that CPUID flag is now reserved and will always be zero, so any code that uses PCOMMIT is dead code that will never execute.
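The CPUID gate described above can be sketched in C. This assumes GCC 7+ or Clang, whose <cpuid.h> provides `__get_cpuid_count`; `ebx_reports_pcommit` and `cpu_reports_pcommit` are hypothetical names. The bit test is kept in a separate pure function so the decoding logic can be checked without depending on the CPU it runs on.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.(EAX=07H, ECX=0H):EBX bit 22 was the PCOMMIT feature flag.
 * It is now reserved and always reads as zero. */
#define CPUID7_EBX_PCOMMIT_BIT 22

/* Decode the (former) PCOMMIT flag from a leaf-7 EBX value. */
static bool ebx_reports_pcommit(uint32_t ebx)
{
    return (ebx >> CPUID7_EBX_PCOMMIT_BIT) & 1u;
}

/* Query the running CPU; returns false if leaf 7 is unavailable
 * or the build is not for x86. */
static bool cpu_reports_pcommit(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid_count returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ebx_reports_pcommit(ebx);
#else
    return false; /* not an x86 CPU */
#endif
}
```

Legacy code of the form "if (cpu_reports_pcommit()) { ... PCOMMIT; SFENCE ... }" is exactly the dead code described above: with the flag reserved as zero, the branch is never taken.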

This harmless dead code can be removed over time, but as of this writing, all known operating systems supporting persistent memory, as well as the Non-Volatile Memory Libraries (NVML), have already been updated to remove all uses of PCOMMIT.



The programming model for persistent memory on Intel CPUs has been simplified by deprecating the PCOMMIT instruction before its first implementation.  Most software, including the Non-Volatile Memory Libraries, has already been updated to reflect this change.


Glossary of Terms

Persistent Domain (Power-fail Protected Domain)
When storing to pmem, the point along the path taken by the store at which the store is considered persistent.

ADR (Asynchronous DRAM Refresh)
A platform-level feature in which the power supply signals other system components that power failure is imminent, causing the Write Pending Queues in the memory subsystem to be flushed.

PCOMMIT
An instruction allowing an application to flush the memory subsystem Write Pending Queues on demand.  With ADR required, this instruction is no longer necessary and is being deprecated.

WPQ (sometimes called TPQ)
Write Pending Queues in the memory subsystem.

CLFLUSH, CLFLUSHOPT, CLWB
Instructions that flush lines from the CPU caches.  CLWB and CLFLUSHOPT are recent additions for better pmem performance.

CPUID
The mechanism that allows software to detect which features a CPU supports.




David Z.

Hi Andy. Thanks for clearing things up. I was using the ISA manual, and I think it would have been clearer if the textual descriptions of CLFLUSH* and CLWB referred to "all cache hierarchies" rather than "the cache hierarchy", which is confusing when two cores don't happen to share a last level cache.

Rudoff, Andy M (Intel)

@David Z: Yes.  CLWB is a cache coherent operation, so it flushes a dirty cache line regardless of which CPU's cache it is in.

David Z.

Thank you for your quick and detailed feedback. What about preemption before the CLWB and then resumption on another CPU? Does the CLWB reach out across the bus and force the other CPU to flush the cache line?

Rudoff, Andy M (Intel)

@David Z: The SDM says this about CLWB:

CLWB instruction is ordered only by store-fencing operations. For example, software can use an SFENCE, MFENCE, XCHG, or LOCK-prefixed instructions to ensure that previous stores are included in the write-back.

So as long as the context switch ends up invoking some store-fencing operation, that operation will fence the CLWBs issued so far.  In practice, context switches always end up taking a lock, so the LOCK prefix used during that operation provides the fence.  (If there were some obscure OS out there that doesn't do any store-fencing operations on context switch, it would need to be modified to issue an SFENCE when switching, but I don't think you'll find any such beast.)

David Z.

How is this obligation supposed to work in presence of preemptive multitasking? I.e. if the OS preempts a thread after CLWB but before SFENCE, there is no guarantee that the OS will resume the thread on the same core or even the same CPU. Is the OS obligated to flush the whole cache hierarchy and SFENCE on every context switch just in case the thread is doing persistent memory activity?


Rudoff, Andy M (Intel)

@David Z: Using the CPU cache flush instructions is a semantic obligation, unless you have a custom platform that flushes those caches automatically (indicated by an ACPI property added in ACPI 6.2).  So for general-purpose software, the cache flushes are required and using only an SFENCE won't guarantee anything about persistence.


David Z.

When using persistent memory, are CLFLUSH/CLFLUSHOPT/CLWB performance optimizations or a semantic obligation? Said differently, what happens when only SFENCE or a LOCK-prefixed instruction is used to sequence stores to memory?
