Intel® Software Guard Extensions Tutorial Part 9: Power Events and Data Sealing

Download [ZIP 598KB]

In part 9 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series we’ll address some of the complexities surrounding the suspend and resume power cycle. Our application needs to do more than just survive power transitions: it must also provide a smooth user experience without compromising overall security. First, we’ll discuss what happens to enclaves when the system resumes from the sleep state and provide general advice on how to manage power transitions in an Intel SGX application. We’ll examine the data sealing capabilities of Intel SGX and show how they can help smooth the transitions between power states, while also pointing out some of the serious pitfalls that can occur when they are used improperly. Finally, we’ll apply these techniques to the Tutorial Password Manager in order to create a smooth user experience.

You can find a list of all the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Source code is provided with this installment of the series.

Suspend, Hibernate, and Resume

Applications must be able to survive a sleep and resume cycle. When the system resumes from suspend or hibernation, applications should return to their previous state, or, if necessary, create a new state specifically to handle the wake event. What applications shouldn’t do is become unstable or crash as a direct result of that change in the power state. Call this the “rule zero” of managing power events.

Most applications don’t actually need special handling for these events. When the system suspends, the application state is preserved because RAM is still powered on. When the system hibernates, the RAM is saved to a special hibernation file on disk, which is used to restore the system state when it’s powered back on. You don’t need to add code to enable or take advantage of this core feature of the OS. There are two notable exceptions, however:

  • Applications that rely on physical hardware that isn’t guaranteed to be preserved across power events, such as CPU caches.
  • Scenarios where possible changes to the system context can affect program logic. For example, a location-based application can be moved hundreds of miles while it’s sleeping and would need to re-acquire its location. An application that works with sensitive data may choose to guard against theft by reprompting the user for his or her password.

Our Tutorial Password Manager actually falls into both categories. Certainly, if a laptop running our password manager is stolen, the thief would potentially have access to the victim’s passwords until they explicitly closed the application or locked the vault. The first category, though, may be less obvious: Intel SGX is a hardware feature that is not preserved across power events.

We can demonstrate this by running the Tutorial Password Manager, unlocking the vault, suspending the system, waking it back up, and then trying to read a password or edit one of the accounts. Follow those sequences, and you’ll get one of the error dialogs shown in Figure 1 or Figure 2.

Figure 1. Error received when attempting to edit an account after resuming from sleep.

Figure 2. Error received when attempting to view an account password after resuming from sleep.

As currently written, the Tutorial Password Manager violates rule zero: it becomes unstable after resuming from a sleep operation. The application needs special handling for power events.

Enclaves and Power Events

When a processor leaves S0 or S1 for a lower-power state, the enclave page cache (EPC) is destroyed: all EPC pages are erased along with their encryption keys. Since enclaves store their code and data in the EPC, when the EPC goes away the enclaves go with it. This means that enclaves do not survive power events that take the system to state S2 or lower.

Table 1 provides a summary of the power states.

Table 1. CPU power states

State

Description

S0

Active run state. The CPU is executing instructions, and background tasks are running even if the system appears idle and the display is powered off.

S1

Processor caches are flushed, CPU stops executing instructions. Power to CPU and RAM is maintained. Devices may or may not power off. This is a high-power standby state, sometimes called “power on suspend.”

S2

CPU is powered off. CPU context and contents of the system cache are lost.

S3

RAM is powered on to preserve its contents. A standby or sleep state.

S4

RAM is saved to nonvolatile storage in a hibernation file before powering off. When powered on, the hibernation file is read in to restore the system state. A hibernation state.

S5

“Soft off.” The system is off but some components are powered to allow a full system power-on via some external event, such as Wake-on-LAN, a system management component, or a connected device.

Power state S1 is not typically seen on modern systems, and state S2 is uncommon in general. Most CPUs go to power state S3 when put in “sleep” mode and drop to S4 when hibernating to disk.

The Windows* OS provides a mechanism for applications to subscribe to wakeup events, but that won’t help any ECALLs that are in progress when the power transition occurs (and, by extension, any OCALLs either since they are launched from inside of ECALLs). When the enclave is destroyed, the execution context for the ECALL is destroyed with it, any nested OCALLs and ECALLs are destroyed, and the outer-most ECALL immediately returns with a status of SGX_ERROR_ENCLAVE_LOST.

It is important to note that any OCALLs that are in progress are destroyed without warning, which means any changes they are making in unprotected memory will potentially be incomplete. Since unprotected memory is maintained or restored when resuming from the S3 and S4 power states, it is important that developers use reliable and robust procedures to prevent partial write corruptions. Applications must not end up in an indeterminate or invalid state when power resumes.

General Advice for Managing Power Transitions

Planning for power transitions begins before a sleep or hibernation event occurs. Decide how extensive the enclave recovery needs to be. Should the application be able to pick up exactly where it left off without user intervention? Will it resume interrupted tasks, restart them, or just abort? Will the user interface, if any, reflect the change in state? The answers to these questions will drive the rest of the application design. As a general rule, the more autonomous and seamless the recovery is, the more complex the program logic will need to be.

An application may also have different levels of recovery at different points. Some stages of an application may be easier to seamlessly recover from than others, and in some execution contexts it may not make sense or even be good security practice to attempt a seamless recovery at all.

Once the overall enclave recovery strategy has been identified, the process of preparing an enclave for a power event is as follows:

  1. Determine the minimal state information and data that needs to be saved in order to reconstruct the enclave.
  2. Periodically seal the state information and save it to unprotected memory (data sealing is discussed below). The sealed state data can be sent back to the main application as an [out] pointer parameter to an ECALL, or the ECALL can make an OCALL specifically to save state data.
  3. When an SGX_ERROR_ENCLAVE_LOST code is returned by an ECALL, explicitly destroy the enclave and then recreate it. It is strongly recommended that applications explicitly destroy the enclave with a call to sgx_enclave_destroy().
  4. Restore the enclave state using an ECALL that is designed to do so.

It is important to save the enclave state to untrusted memory before a power transition occurs. Even if the OS is able to send an event to an application when it is about to enter a standby mode, there are no guarantees that the application will have sufficient time to act before the system physically goes to sleep.

Data Sealing

When an enclave needs to preserve data across instantiations, either in preparation for a power event or between executions of the parent application, it needs to send that data out to untrusted memory. The problem with untrusted memory, however, is exactly that: it is untrusted. It is neither encrypted nor integrity checked, so any data sent outside the enclave in the clear is potentially leaking secrets. Furthermore, if that data were to be modified in untrusted memory, future instantiations of the enclave would not be able to detect that the modification occurred.

To address this problem, Intel SGX provides a capability called data sealing. When data is sealed, it is encrypted with advanced encryption standard (AES) in Galois/Counter Mode (GCM) using a 128-bit key that is derived from CPU-specific key material and some additional inputs, guided by one of two key policies. The use of AES-GCM provides both confidentiality of the data being sealed and integrity checking when the data is read back in and unsealed (decrypted).

As mentioned above, the key used in data sealing is derived from several inputs. The two key policies defined by data sealing determine what those inputs are:

  • MRSIGNER. The encryption key is derived from the CPU’s key material, the security version number (SVN), and the enclave signing key used by the developer. Data sealed using MRSIGNER can be unsealed by other enclaves on that same system that originate from the same software vendor (enclaves that share the same signing key). The use of an SVN allows enclaves to unseal data that was sealed by previous versions of an enclave, but prevents older enclaves from unsealing data from newer versions. It allows enclave developers to enforce software version upgrades.
  • MRENCLAVE. The encryption key is derived from the CPU’s key material and the enclave’s cryptographic signature. Data signed with the MRENCLAVE policy can only be unsealed by that exact enclave on that system.

Note that the CPU is a common component in the two key policies. Each processor has some random, hardware-based key material—physical circuitry on the processor—which is built into it as part of the manufacturing process. This ensures that data sealed by an enclave on one CPU cannot be unsealed by enclaves on another CPU. Each CPU will result in a different signing key, even if all other aspects of the signing policy (enclave measurement, enclave signing key, SVN) are the same.

The data sealing and unsealing API is really a set of convenience functions. They provide a high-level interface to the underlying AES-GCM encryption and 128-bit key derivation functions.

Once data has been sealed in the enclave, it can be sent out to untrusted memory and optionally written to disk.

Caveats

There is a caveat with data sealing, though, and it has significant security implications. Your enclave API needs to include an ECALL that will take sealed data as an input and then unseal it. However, Intel SGX does not authenticate the calling application, so you cannot assume that only your application is loading your enclave. This means that your enclave can be loaded and executed by anyone, even applications you didn’t write. As you might recall from Part 1, enclave applications are divided into two parts: the trusted part, which is made up of the enclaves, and the untrusted part, which is the rest of the application. These terms, “trusted” and “untrusted,” are chosen deliberately.

Intel SGX cannot authenticate the calling application because this would require a trusted execution chain that runs from system power-on all the way through boot, the OS load, and launching the application. This is far outside the scope of Intel SGX, which limits the trusted execution environment to just the enclaves themselves. Because there’s no way for the enclave to validate the caller, each enclave must be written defensibly. Your enclave cannot make any assumptions about the application that has called into it. An enclave must be written under the assumption that any application can load it and execute its API, and that its ECALLs can be executed in any order.

Normally this is not a significant constraint, but sealing and unsealing data complicates matters significantly because both the sealed data and the means to unseal it are exposed to arbitrary applications. The enclave API must not allow applications to use sealed data to bypass security mechanisms.

Take the following scenario as an example: A file encryption program wants to save end users the hassle of re-entering their password every time the application runs, so it seals their password using the data sealing functions and the MRENCLAVE policy, and then writes the sealed data to disk. When the application starts, it looks for the sealed data file, and if it’s present, reads it in and makes an ECALL to unseal the data and restore the user’s password into the enclave.

The problems with this hypothetical application are two-fold:

  • It assumes that it is the only application that will ever load the enclave.
  • It doesn’t authenticate the end user when the data is unsealed.

A malicious software developer can write their own application that loads the same enclave and follows the same procedure (looks for the sealed data file, and invokes the ECALL to unseal it inside the enclave). While the malicious application can’t expose the user’s password, it can use the enclave’s ECALLs to encrypt and decrypt the user’s files using their stored password, which is nearly as bad. The malicious user has gained the ability to decrypt files without having to know the user’s password at all!

A non-Intel SGX version of this same application that offered this same convenience feature would also be vulnerable, but that’s not the point. If the goal is to use Intel SGX features to harden the application’s security, those same features should not be undermined by poor programming practices!

Managing Power Transitions in the Tutorial Password Manager

Now that we understand how power events affect enclaves and know what tools are available to assist with the recovery process, we can turn our attention to the Tutorial Password Manager. As currently written, it has two problems:

  • It becomes unstable after a power event.
  • It assumes the password vault should remain unlocked after the system resumes.

Before we can solve the first problem we need to address the second one, and that means making some design decisions.

Sleep and Resume Behavior

The big decision that needs to be made for the Tutorial Password Manager is whether or not to lock the password vault when the system resumes from a sleep state.

The primary argument for locking the password vault after a sleep/resume cycle is to protect the password database in case the physical system is stolen while it’s suspended. This would prevent the thief from being able to access the password database after waking up the device. However, having the system lock the password vault immediately can also be a user interface friction: sometimes, aggressive power management settings cause a running system to sleep while the user is still in front of the device. If the user wakes the system back up immediately, they might be irritated to find that their password vault has been locked.

This issue really comes down to balancing user convenience against security, so the right approach is to give the user control over the application’s behavior. The default will be for the password vault to lock immediately upon suspend/resume, but the user can configure the application to wait up to 10 minutes after the sleep event before the vault is forcibly locked.

Intel® Software Guard Extensions and Non-Intel Software Guard Extensions Code Paths

Interestingly, the default behavior of the Intel SGX code path differs from that of the non-Intel SGX code path. Enclaves are destroyed during the sleep/resume cycle, which means that we effectively lock the password vault as a result. To give the user the illusion that the password vault never locked at all, we have to not only reload the vault file from disk, but also explicitly unlock it again without forcing the user to re-enter their password (this has some security implications, which we discuss below).

For the non-Intel SGX code path, the vault is just stored in regular memory. When the system resumes, system memory is unchanged and the application continues as normal. Thus, the default behavior is that an unlocked password vault remains unlocked when the system resumes.

Application Design

With the behavior of the application decided, we turn to the application design. Both code paths need to handle the sleep/resume cycle and place the vault in the correct state: locked or unlocked.

The Non-Intel Software Guard Extensions Code Path

This is the simpler of the two code paths. As mentioned above, the non-Intel SGX code path will, by default, leave the password vault unlocked if it was unlocked when the system went to sleep. When the system resumes it only needs to see how long it slept: if the sleep time exceeds the maximum configured by the user, the password vault should be explicitly locked.

To keep track of the sleep duration, we’ll need a periodic heartbeat that records the current time. This time will serve as the “sleep start” time when the system resumes. For security, the heartbeat time will be encrypted using the database key.

The Intel Software Guard Extensions Code Path

No matter how the application is configured, the system will need code to recreate the enclave and reopen the password vault. This will put the vault in the locked state.

The application will then need to see how long it has been sleeping. If the sleep time was less than the maximum configured by the user, the password vault needs to be explicitly unlocked without prompting the user for his or her master passphrase. In order to do that the application needs the passphrase, and that means the passphrase must be saved to untrusted memory so that it can be read back in when the system is restored.

The only safe way to save a secret to untrusted memory is to use data sealing, but this presents a significant security issue: As mentioned previously, our enclave can be loaded by any application, and the same ECALL that is used to unseal the master password will be available for anyone to use. Our password manager application exposes secrets to the end user (their passwords), and the master password is the only means of authenticating the user. The point of keeping the password vault unlocked after the sleep/resume cycle is to prevent the user from having to authenticate. That means we are creating a logic flow where a malicious user could potentially use our enclave’s API to unseal the user’s master password and then extract their account and password data.

In order to mitigate this risk, we’ll do the following:

  • Data will be sealed using the MRENCLAVE policy.
  • Sealed data will be kept in memory only. Writing it to disk would increase the attack surface.
  • In addition to sealing the password, we’ll also include the process ID. The enclave will require that the process ID of the calling process match the one that was saved when unsealing the data. If they don’t match, the vault will be left in the locked state.
  • The current system time will be sealed periodically using a heartbeat function. This will serve as the “sleep start” time.
  • The sleep duration will be checked in the enclave.

Note that verification logic must be in the enclave where it cannot be modified or manipulated.

This is not a perfect solution, but it helps. A malicious application would need to scrape the sealed data from memory, crash the user’s existing process, and then create new processes over and over until it gets one with the same process ID. It will have to do all of this before the lock timeout is reached (or take control of the system clock).

Common Needs

Both code paths will need some common infrastructure:

  • A timer to provide the heartbeat. We’ll use a timer interval of 15 seconds.
  • An event handler that is called when the system resumes from a sleep state.
  • Safe handling for any potential race conditions, since wakeup events are asynchronous.
  • Code that updates the UI to reflect the “locked” state of the password vault

Implementation

We won’t go over every change in the code base, but we’ll look at the major components and how they work.

User Options

The lock timeout value is set in the new Tools -> Options configuration dialog, shown in Figure 3.

Figure 3. Configuration options.

This parameter is saved immediately to the Windows registry under HKEY_LOCAL_USER and is loaded by the application on startup. If the registry value is not present, the lock timeout defaults to zero (lock the vault immediately after going to sleep).

The Intel SGX code path also saves this value in the enclave.

The Heartbeat

Figure 4 shows the declaration for the Heartbeat class which is ultimately responsible for recording the vault’s state information. The heartbeat is only run if state information is needed, however. If the user has set the lock timeout to zero, we don’t need to maintain state because we know to lock the vault immediately when the system resumes.

class PASSWORDMANAGERCORE_API Heartbeat {
	class PasswordManagerCoreNative *nmgr;
	HANDLE timer;
	void start_timer();
public:
	Heartbeat();
	~Heartbeat();
	void set_manager(PasswordManagerCoreNative *nmgr_in);
	void heartbeat();

	void start();
	void stop();
};

Figure 4. The Heartbeat class.

The PasswordManagerCoreNative class gains a Heartbeat object as a class member, and the Heartbeat object is initialized with a reference back to the containing PasswordManagerCoreNative object.

The Heartbeat class obtains a timer from CreateTimerQueueTimer and executes the callback function heartbeat_proc when the timer expires, as shown in Figure 5. The timer is sent a reference to the Heartbeat object, which in turn calls the heartbeat method in the Heartbeat class, which in turn calls the heartbeat method in PasswordManagerCoreNative and restarts the timer.

static void CALLBACK heartbeat_proc(PVOID param, BOOLEAN fired)
{
   // Call the heartbeat method in the Heartbeat object
	Heartbeat *hb = (Heartbeat *)param;
	hb->heartbeat();
}

Heartbeat::Heartbeat()
{
	timer = NULL;
}

Heartbeat::~Heartbeat()
{
	if (timer == NULL) DeleteTimerQueueTimer(NULL, &timer, NULL);
}

void Heartbeat::set_manager(PasswordManagerCoreNative *nmgr_in)
{
	nmgr = nmgr_in;

}

void Heartbeat::heartbeat ()
{
	// Call the heartbeat method in the native password manager
	// object. Restart the timer unless there was an error.

	if (nmgr->heartbeat()) start_timer();
}

void Heartbeat::start()
{
	stop();

	// Perform our first heartbeat right away.

	if (nmgr->heartbeat()) start_timer();
}

void Heartbeat::start_timer()
{
	// Set our heartbeat timer. Use the default Timer Queue

	CreateTimerQueueTimer(&timer, NULL, (WAITORTIMERCALLBACK)heartbeat_proc,
		(void *)this, HEARTBEAT_INTERVAL_SECS * 1000, 0, 0);
}

void Heartbeat::stop()
{
	// Stop the timer (if it exists)

	if (timer != NULL) {
		DeleteTimerQueueTimer(NULL, timer, NULL);
		timer = NULL;
	}
}

Figure 5. The Heartbeat class methods and timer callback function.

The heartbeat method in the PasswordManagerCoreNative object maintains the state information. To prevent partial write corruption, it has a two-element array of state data and an index pointer to the current index (0 or 1). The new state information is obtained from:

  • The new ECALL ve_heartbeat in the Intel SGX code path (by way of ew_heartbeat in EnclaveBridge.cpp).
  • The Vault method heartbeat in the non-Intel SGX code path.

After the new state has been received, it updates the next element (alternating between elements 0 and 1) of the array, and then updates the index pointer. The last operation is our atomic update, ensuring that the state information is complete before we officially mark it as the “current” state.

Intel Software Guard Extensions code path

The ve_heartbeat ECALL simply calls the heartbeat method in the E_Vault object, as shown in Figure 6.

int E_Vault::heartbeat(char *state_data, uint32_t sz)
{
	sgx_status_t status;
	vault_state_t vault_state;
	uint64_t ts;
	
	// Copy the db key

	memcpy(vault_state.db_key, db_key, 16);

	// To get the system time and PID we need to make an OCALL

	status = ve_o_process_info(&ts, &vault_state.pid);
	if (status != SGX_SUCCESS) return NL_STATUS_SGXERROR;

	vault_state.lastheartbeat = (sgx_time_t)ts;

	// Storing both the start and end times provides some 
	// protection against clock manipulation. It's not perfect,
	// but it's better than nothing.

	vault_state.lockafter = vault_state.lastheartbeat + lock_delay;
	
	// Saves us an ECALL to have to reset this when the vault is restored.

	vault_state.lock_delay = lock_delay;

	// Seal our data with the MRENCLAVE policy. We defined our
	// struct as packed to support working on the address
	// directly like this.	

	status = sgx_seal_data(0, NULL, sizeof(vault_state_t), (uint8_t *)&vault_state, sz, (sgx_sealed_data_t *) state_data);
	if (status != SGX_SUCCESS) return NL_STATUS_SGXERROR;
	
	return NL_STATUS_OK;
}

Figure 6. The heartbeat in the enclave.

It has to obtain the current system time and the process ID, and to do this we have added our first OCALL to the enclave, ve_o_process_info. When the OCALL returns, we update our state information and then call sgx_seal_data to seal it into the state_data buffer.

One restriction of the Intel SGX seal and unseal functions is that they can only operate on enclave memory. That means the state_data parameter must be a marshaled data buffer when used in this manner. If you need to write sealed data to a raw pointer that references untrusted memory (one that is passed with the user_check parameter), you must first seal the data to an enclave-local data buffer and then copy it over.

The OCALL is defined in EnclaveBridge.cpp:

// OCALL to retrieve the current process ID and
// local system time.

void SGX_CDECL ve_o_process_info(uint64_t *ts, uint64_t *pid)
{
	DWORD dwpid= GetCurrentProcessId();
	time_t ltime;

	time(&ltime);

	*ts = (uint64_t)ltime;
	*pid = (uint64_t)dwpid;
}

Because the heartbeat runs asynchronously, two threads can enter the enclave at the same time. This means the number of Thread Control Structures (TCSs) allocated to the enclave must be increased from the default of 1 to 2. This can be done one of two ways:

  1. Right-click the Enclave project, select Intel SGX Configuration -> Enclave Settings to bring up the configuration window, and then set Thread Number to 2 (see Figure 7).
  2. Edit the Enclave.config.xml file in the Enclave project directly, and then change the <TCSNum> parameter to 2.

Figure 7. Enclave settings dialog.

Detecting Suspend and Resume Events

A suspend and resume cycle will destroy the enclave, and that will be detected by the next ECALL. However, we shouldn’t rely on this mechanism to perform enclave recovery, because we need to act as soon as the system wakes up from the sleep state. That means we need an event listener to receive the power state change messages that are generated by Windows.

The best place to capture these is in the user interface layer. In addition to performing the enclave recovery, we must be able to lock the password vault if the system was in the sleep state longer than maximum sleep time set in the user options. When the vault is locked, the user interface also needs to be updated to reflect the new vault state.

One limitation of the Windows Presentation Foundation* is that it does not provide event hooks for power-related messages. The workaround is to hook in to the message handler for the underlying window handle. Our main application window and all of our dialog windows need a listener so that we can gracefully close each one.

The hook procedure for the main window is shown in Figure 8.

private IntPtr Main_Power_Hook(IntPtr hwnd, int msg, IntPtr wParam, IntPtr lParam, ref bool handled)
{
    UInt16 pmsg;

    // C# doesn't have definitions for power messages, so we'll get them via C++/CLI. It returns a
    // simple UInt16 that defines only the things we care about.
    pmsg= PowerManagement.message(msg, wParam, lParam);

    if ( pmsg == PowerManagementMessage.Suspend )
    {
        mgr.suspend();
        handled = true;
    } else if (pmsg == PowerManagementMessage.Resume)
    {
        int vstate = mgr.resume();

        if (vstate == ResumeVaultState.Locked) lockVault();
        handled = true;
    }

    return IntPtr.Zero;
}

Figure 8. Message hook for the main window.

To get at the messages, the handler must dip down to native code. This is done using the new PowerManagement class, which defines a static function called message, shown in Figure 9. It returns one of four values:

PWR_MSG_NONE

The message was not a power event.

PWR_MSG_OTHER

The message was power-related, but not a suspend or resume message.

PWR_MSG_RESUME

The system has woken up from a low-power or sleep state.

PWR_MSG_SUSPEND

The system is suspending to a low-power state.

UINT16 PowerManagement::message(int msg, IntPtr wParam, IntPtr lParam)
{
	INT32 subcode;

	// We only care about power-related messages

	if (msg != WM_POWERBROADCAST) return PWR_MSG_NONE;

	subcode = wParam.ToInt32();

	if ( subcode == PBT_APMRESUMEAUTOMATIC ) return PWR_MSG_RESUME;
	else if (subcode == PBT_APMSUSPEND ) return PWR_MSG_SUSPEND;

	// Don't care about other power events.

	return PWR_MSG_OTHER;
}

Figure 9. The message listener.

We actually listen for both suspend and resume messages here, but the suspend handler does very little work. When a system is transitioning to a sleep state, an application has less than 2 seconds to act on the power message. All we do with the sleep message is stop the heartbeat. This isn’t strictly necessary, and is just a precaution against having a heartbeat execute while the system is suspending.

The resume message is handled by calling the resume method in PasswordManagerCore. It’s job is to figure out whether the vault should be locked or unlocked. It does this by checking the current system time against the saved vault state (if any). If there’s no state, or if the system has slept longer than the maximum allowed, it returns ResumeVaultState.Locked.

Restoring the Enclave

In the Intel SGX code path, the enclave has to be recreated before the enclave state information can be checked. The code for this is shown in Figure 10.

bool PasswordManagerCore::restore_vault(bool flag_async)
{
	bool got_lock= false;	
	int rv;
	
	// Only let one thread do the restore if both come in at the 
	// same time. A spinlock approach is inefficient but simple.
	// This is OK for our application, but a high-performance
	// application (or one with a long-running work loop)
	// would want something else.

	try {
		slock.Enter(got_lock);

		if (_nlink->supports_sgx()) {
			bool do_restore = true;

			// This part is only needed for enclave-based vaults.

			if (flag_async) {
				// If we are entering as a result of a power event,
				// make sure the vault has not already been restored
				// by the synchronous/UI thread (ie, a failed ECALL).

				rv = _nlink->ping_vault();
				if (rv != NL_STATUS_LOST_ENCLAVE) do_restore = false;
				// If do_store is false, then we'll also use the
				// last value of rv_restore as our return value.
				// This will tell us whether or not we should lock the
				// vault.
			}

			if (do_restore) {
				// If the vaultfile isn't open then we are locked or hadn't
				// been opened to be begin with.

				if (!vaultfile->is_open()) {
					// Have we opened a vault yet?
					if (vaultfile->get_vault_path()->Length == 0) goto restore_error;

					// We were explicitly locked, so reopen.
					rv = vaultfile->open_read(vaultfile->get_vault_path());
					if (rv != NL_STATUS_OK) goto restore_error;
				}

				// Reinitialize the vault from the header.

				rv = _vault_reinitialize();
				if (rv != NL_STATUS_OK) goto restore_error;

				// Now, call to the native object to restore the vault state.
				rv = _nlink->restore_vault_state();
				if (rv != NL_STATUS_OK) goto restore_error;

				// The database password was restored to the vault. Now restore
				// the vault, itself.

				rv = send_vault_data();
			restore_error:
				restore_rv = (rv == NL_STATUS_OK);
			}
		}
		else {
			rv = _nlink->check_vault_state();
			restore_rv = (rv == NL_STATUS_OK);
		}

		slock.Exit(false);
	}
	catch (...) {
		// We don't need to do anything here.		
	}

	return restore_rv;
}

Figure 10. The restore_vault() method.

The enclave and vault are reinitialized from the vault data file, and the vault state is restored using the method restore_vault_state in PasswordManagerCoreNative.

Which Thread Restores the Vault State?

The Tutorial Password Manager can have up to three threads executing at any given time. They are:

  • The main UI
  • The heartbeat
  • The power event handler

Only one of these threads should be responsible for actually restoring the enclave, but it is possible that both the heartbeat and the main UI thread are in the middle of an ECALL when a power event occurs. In that case, both ECALLs will fail with the error code SGX_ERR_ENCLAVE_LOST while the power event handler is executing. Given this potential race condition, it’s necessary to decide which thread is given the job of enclave recovery.

If the lock timeout is set to zero, there won’t be a heartbeat thread at all, so it doesn’t make sense to put enclave recovery logic there. If the heartbeat ECALL returns SGX_ERR_ENCLAVE_LOST, it simply stops the heartbeat and assumes other threads will be dealing with it.

That leaves the UI thread and the power event handler, and a good argument can be made that both threads need the ability to recover an enclave. The event handler will catch all suspend/resume cycles immediately, so it make sense to have enclave recovery happen there. However, as we pointed out earlier it is entirely possible for a power event to occur during an active ECALL on the UI thread, and there’s no reason to prevent that thread from starting the recovery, especially since it might occur before the power event message is received. This not only provides a safety net in case the event handler fails to execute for some reason, but it also provides a quick and easy retry loop for the operation.

Since we can’t have both of these threads run the recovery at the same time, we need to use locking to ensure that only the first thread to arrive is given the job. The second one simply waits for the first to finish.

It’s also possible that a failed ECALL will complete the recovery process before the event handler enters the recovery loop. To prevent the event handler from blindly repeating the enclave recovery procedure, we have added a quick test to make sure the enclave hasn’t already been recreated.

Detection in the UI Thread

The UI thread detects power events by looking for ECALLs that fail with SGX_ERR_LOST_ENCLAVE. The wrapper functions in EnclaveBridge.cpp automatically relaunch the enclave and pass the error NL_STATUS_ENCLAVE_RECREATED back up to the PasswordManagerCore object.

Each method in PasswordManagerCore handles this return code uniquely. Some methods, such as initialize, initialize_from_header, and lock_vault don’t actually have to restore state at all, but most of the others do and they call in to restore_vault as show in Figure 11.

int PasswordManagerCore::accounts_password_to_clipboard(UInt32 idx)
{
	UINT32 index = idx;
	int rv;
	int tries = 3;

	while (tries--) {
		rv = _nlink->accounts_password_to_clipboard(index);
		if (rv == NL_STATUS_RECREATED_ENCLAVE) {
			if (!restore_vault()) {
				rv = NL_STATUS_LOST_ENCLAVE;
				tries = 0;
			}
		}
		else break;
	}

	return rv;
}

Figure 11. Detecting a power event on the main UI thread.

Here, the method gets three attempts to restore the vault before giving up. This retry count of three is an arbitrary limit: it’s not likely that we’ll have multiple power events in rapid succession but it’s possible. Though we don’t want to just give up after one attempt, we also don’t want to loop forever in case there’s a system issue that prevents the enclave from ever being recreated.

Restoring and Checking State

The last step is to examine the state data for the vault and determine whether the vault should be locked or unlocked. In the Intel SGX code path, the sealed state data is sent into the enclave where it is unsealed, and then compared to current system data obtained from the OCALL ve_o_process_info. This method, restore_state, is shown in Figure 12.

int E_Vault::restore_state(char *state_data, uint32_t sz)
{
	sgx_status_t status;
	vault_state_t vault_state;
	uint64_t now, thispid;
	uint32_t szout = sz;

	// First, make an OCALL to get the current process ID and system time.
	// Make these OCALLs so that the parameters aren't be supplied by the
	// ECALL (which would make it trivial for the calling process to fake 
	// this information)

	status = ve_o_process_info(&now, &thispid);
	if (status != SGX_SUCCESS) {
		// Zap the state data.
		memset_s(state_data, sz, 0, sz);
		return NL_STATUS_SGXERROR;
	}

	status = sgx_unseal_data((sgx_sealed_data_t *)state_data, NULL, 0, (uint8_t *)&vault_state, &szout);
	// Zap the state data.
	memset_s(state_data, sz, 0, sz);

	if (status != SGX_SUCCESS) return NL_STATUS_SGXERROR;

	if (thispid != vault_state.pid) return NL_STATUS_PERM;
	if (now < vault_state.lastheartbeat) return NL_STATUS_PERM;
	if (now > vault_state.lockafter) return NL_STATUS_PERM;

	// Everything checks out. Restore the key and mark the vault as unlocked.

	lock_delay = vault_state.lock_delay;

	memcpy(db_key, vault_state.db_key, 16);
	_VST_CLEAR(_VST_LOCKED);

	return NL_STATUS_OK;
}

Figure 12. Restoring state in the enclave.

Note that unsealing data is programmatically simpler than sealing it: the key derivation and policy information is embedded in the sealed data blob. Unlike data sealing there is only one unseal function, sgx_unseal_data, and it takes fewer parameters than its counterpart.

This method returns NL_STATUS_OK if the vault is restored to the unlocked state, and NL_STATUS_PERM if it is restored to the locked state.

Lingering Issues

The Tutorial Password Manager as currently implemented still has issues that need to be addressed.

  • There is still a race condition in the enclave recovery logic. Because the ECALL wrappers in EnclaveBridge.cpp immediately recreate the enclave before returning an error code to the PasswordManagerCore layer, it is possible for the power event handler thread to enter the restore_vault method after the enclave has been recreated but before the enclave recovery has completed. This can cause the power event handler to return the wrong status to the UI layer, placing the UI in the “locked” or “unlocked” state incorrectly.
  • We depend on the system clock when validating our state data, but the system clock is actually untrusted. A malicious user can manipulate the time in order to force the password vault into an unlocked state when the system wakes up (this can be addressed by using trusted time, instead).

Summary

In order to prevent cold boot attacks and other attacks against memory images in RAM, Intel SGX destroys the Enclave Page Cache whenever the system enters a low-power state. However, this added security comes at a price: software complexity that can’t be avoided. All real-world Intel SGX applications need to plan for power events and incorporate enclave recovery logic because failing to do so will lead to runtime errors during the application’s execution.

Power event planning can rapidly escalate the application’s level of sophistication. The user experience needs of the Tutorial Password Manager took us from a single-threaded application with relatively simple constructs to one with multiple, asynchronous threads, locking, and atomic memory updates via simple journaling. As a general rule, seamless enclave recovery requires careful design and a significant amount of added program logic.

Sample Code

The code sample for this part of the series builds against the Intel SGX SDK version 1.7 using Microsoft Visual Studio* 2015.

Release Notes

  • Running a mixed-mode Intel SGX application under the debugger in Visual Studio will cause an exception to be thrown if a power event is triggered. The exception occurs when an ECALL detects the lost enclave and returns SGX_ERROR_LOST_ENCLAVE.
  • The non-Intel SGX code path was updated to use Microsoft’s DPAPI to store the database encryption key. This is a better solution than the in-memory XOR’ing.

Coming Up Next

In Part 10 of the series, we’ll discuss debugging mixed-mode Intel SGX applications with Visual Studio. Stay tuned!

For more complete information about compiler optimizations, see our Optimization Notice.
There are downloads available under the Intel® Software Export Warning license. Download Now