Find Your Leaked Persistent Memory Objects Using the Persistent Memory Development Kit (PMDK)

Download Sample Code

Introduction

Bugs in programs that dynamically allocate and deallocate objects can leak memory, and persistent memory (PMEM) code is no different. In volatile programs, memory leaks are mainly a problem of resource exhaustion and performance degradation. In PMEM, where a leaked object survives restarts, we also need to think about data corruption and loss. Fortunately, the C/C++ libpmemobj library from the Persistent Memory Development Kit (PMDK) has a useful feature to test for (as well as recover from) PMEM leaks. In this article, I start with a brief introduction to the problem of memory leaks. I then present a simple C++ program with an obvious memory leak bug. Finally, I show how to discover (as well as recover from) the bug using the feature available in libpmemobj.

What Is a Memory Leak?

In computing systems, complex applications allocate the memory they need dynamically. This is because applications rarely know the exact amount of memory they are going to need during the entirety of their computation. As an example, consider a web server. The amount of memory the server needs at a particular moment depends directly on the number of concurrent clients it is serving. If we are running multiple applications on the same system, we want each of them to free, at some point, the memory it no longer needs, in order to make room for the other applications.

Leaks occur when memory that the operating system (OS) allocated to an application is no longer reachable from the application itself, usually as a result of programming errors (for example, losing the last reference to a memory region on the heap). Since the OS is not allowed to free this memory until the application finishes, the result is that memory leaks tend to accumulate over time and exhaust resources, ultimately degrading system performance or even causing application crashes.
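To make this concrete, here is a minimal sketch of how losing the last reference leads to exhaustion. It is a toy model, not real allocator code: toy_heap, SLOTS, and leaky_request() are hypothetical names invented for this illustration, and the "heap" is just a small array of slots.

```cpp
#include <array>
#include <cstddef>

// Hypothetical toy heap with a fixed number of slots (assumption: 4).
constexpr std::size_t SLOTS = 4;

struct toy_heap {
        std::array<bool, SLOTS> in_use{};

        // Returns the index of a newly allocated slot, or -1 if the heap is full.
        int
        allocate ()
        {
                for (std::size_t i = 0; i < SLOTS; ++i) {
                        if (!in_use[i]) {
                                in_use[i] = true;
                                return static_cast<int> (i);
                        }
                }
                return -1;
        }

        // Frees a slot previously returned by allocate().
        void
        release (int slot)
        {
                in_use[static_cast<std::size_t> (slot)] = false;
        }

        // Number of slots currently allocated.
        std::size_t
        used () const
        {
                std::size_t n = 0;
                for (bool b : in_use)
                        n += b ? 1 : 0;
                return n;
        }
};

// Serves one "request". Returns false when the heap is exhausted.
bool
leaky_request (toy_heap &heap)
{
        int handle = heap.allocate (); // first allocation
        if (handle < 0)
                return false;
        handle = heap.allocate ();     // BUG: the only handle to the first slot is overwritten
        if (handle < 0)
                return false;
        heap.release (handle);         // only the second slot is ever freed
        return true;
}
```

Each successful call leaves one slot allocated but unreachable, so after a few requests the heap is full even though the program holds no handles at all.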

Traditionally, this problem is tackled either automatically, using a garbage collector (GC) (which can be part of the programming language itself, as in Java*, or added as an external library), or manually, using a design pattern with support for reference counting (for example, the std::shared_ptr class in C++). In the former case, the GC scans the program's memory to find unreachable objects and frees the memory they occupy. In the latter, all pointer objects pointing to the same underlying object share a reference count, and it is the responsibility of the last remaining pointer object to free the memory. Note that, apart from avoiding leaks, this technique also prevents us from double-freeing objects or accessing already freed ones.
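As a small illustration of the reference counting approach in volatile C++ code, the sketch below uses std::shared_ptr. The counted type and the shared_owners() function are hypothetical helpers added only to make the count observable.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical type that counts how many of its instances are alive.
struct counted {
        static std::size_t live;
        counted () { ++live; }
        ~counted () { --live; }
};
std::size_t counted::live = 0;

// Returns the reference count observed while two owners exist.
long
shared_owners ()
{
        std::shared_ptr<counted> a = std::make_shared<counted> ();
        long owners = 0;
        {
                std::shared_ptr<counted> b = a; // count goes from 1 to 2
                owners = a.use_count ();
        }                                       // b is destroyed: count back to 1
        return owners;
} // a is destroyed here: the count reaches 0 and the object is freed
```

After shared_owners() returns, counted::live is back to zero: the last owner released the object without any explicit delete.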

The main libraries at the core of PMDK are implemented in C, which does not support garbage collection natively. In addition, none of the libraries included in PMDK has a custom GC. It should be said, though, that there is a Java API for libpmemobj (implemented as persistent collections) where the Java garbage collection works for PMEM code as it would for any other program. This API is out of the scope of this article, although I plan to write more about it in the future (stay tuned).

The option of persistent reference counting is not available either. Consider that pointer objects in volatile memory can point to persistent objects and hence increase their reference count. If power were lost in this scenario, the volatile pointers would disappear while the persistent counts stayed incremented, so the counts would no longer make sense on the next boot. Fortunately, libpmemobj offers a different way to test for (as well as recover from) PMEM leaks.
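The problem can be sketched with a toy model. Nothing here is PMDK code: pmem_object, volatile_ref, acquire(), and release() are hypothetical stand-ins, and the "crash" is simulated simply by never calling release().

```cpp
#include <cstddef>

// Toy stand-in for an object in persistent memory: the counter is stored
// inside the object, so it would survive a restart.
struct pmem_object {
        std::size_t refcount = 0;
};

// Toy stand-in for a pointer that lives in volatile memory (DRAM).
struct volatile_ref {
        pmem_object *target;
};

// Taking a reference increments the persistent counter.
volatile_ref
acquire (pmem_object &obj)
{
        ++obj.refcount;
        return volatile_ref{&obj};
}

// Dropping a reference decrements the persistent counter.
void
release (volatile_ref ref)
{
        --ref.target->refcount;
}

// Normal execution: the reference is released before the program ends.
std::size_t
refcount_after_clean_exit (pmem_object &obj)
{
        volatile_ref ref = acquire (obj);
        release (ref);
        return obj.refcount;
}

// Simulated power failure: the volatile reference vanishes and release()
// is never called, so the persistent counter keeps a stale value.
std::size_t
refcount_after_crash (pmem_object &obj)
{
        volatile_ref ref = acquire (obj);
        (void) ref; // power is lost here
        return obj.refcount;
}
```

In the crash case the counter stays at 1 although nothing references the object anymore, so a scheme that frees objects when the count reaches zero would never free it.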

A Persistent Memory Leaking Program

The leaky code

The following code sample (which can be downloaded from GitHub* by clicking the button at the top of the article) uses the C++ bindings of the libpmemobj library (part of PMDK), making an already user-friendly C library (which supports high-level abstractions such as objects and transactions) even friendlier. Nevertheless, in order to be able to access the built-in feature that allows us to recover from PMEM leaks (more on that later), we still need to use part of the C API.

Another library used here is the C++ boost::filesystem library from the Boost C++ library collection, mainly to check for and remove the pool file in an OS-independent way.

The following listing corresponds to the file leaker.cpp:

#include "common.hpp"

using namespace std;
using namespace pmem;
using namespace pmem::obj;
namespace fs = boost::filesystem;
/* globals */
pool<root> pop;
/* main */
int
main (int argc, char *argv[])
{
        /* reading params */
        if (argc < 2) {
                cout << "USE " << string (argv[0]) << " pmem-file ";
                cout << endl << flush;
                return 1;
        }
        /* Opening pmem-file */
        if (fs::exists (argv[1])) { /* file exists, deleting it... */
                cout << "pmem-file '" << string (argv[1]) << "' exists. ";
                cout << "Do you want to overwrite it? (Y/n) ";

                string input;
                getline (cin, input);
                if (input == "n") {
                        cout << "bye bye" << endl << flush;
                        return 0;
                }
                fs::remove (argv[1]);
        }
        pop = pool<root>::create (argv[1], "PMEMLEAK", POOLSIZE, S_IRWXU);
        auto proot = pop.get_root ();
        /* initialization... */
        transaction::exec_tx (pop, [&] {
                proot->num_employees = 0;
                proot->employees = nullptr;
        });
        /* creating some objects, leaking even IDs (except ID 0) */
        persistent_ptr<employee> new_ptr;
        for (size_t i = 64; i > 0; i--) {
                transaction::exec_tx (pop, [&] {
                        new_ptr = make_persistent<employee> ();

                        new_ptr->id = i - 1;
                        pmemobj_tx_add_range_direct (new_ptr->name, SSIZE);
                        pmemobj_tx_add_range_direct (new_ptr->department, SSIZE);
                        strcpy (new_ptr->name, "Test Name");
                        strcpy (new_ptr->department, "Fake Department");

                        if ((i - 1) % 2 == 1
                            || (i - 1) == 0) { /* only linking odd IDs and ID 0 */
                                new_ptr->next = proot->employees;
                                proot->employees = new_ptr;
                        }
                        proot->num_employees = proot->num_employees + 1;
                });
        }

        return 0;
}

The first thing this program does is create a memory pool in the file passed as a parameter, using pool<root>::create(). If the file already exists, the user is asked whether it should be overwritten. If the answer is n, the program exits.

After the pool is created, the program goes on to allocate 64 employee objects and store them in the root data structure.

The PMEM data structures used are defined in a header file called common.hpp:

#include <boost/filesystem.hpp>
#include <cstring> /* strcpy() */
#include <iostream>
#include <libpmemobj++/make_persistent.hpp>
#include <libpmemobj++/p.hpp>
#include <libpmemobj++/persistent_ptr.hpp>
#include <libpmemobj++/pool.hpp>
#include <libpmemobj++/transaction.hpp>
#include <sys/stat.h> /* S_IRWXU */

#define POOLSIZE ((size_t) (1024 * 1024 * 64)) /* 64 MB */

/* PMEM data structures */
#define SSIZE 256
struct employee {
        pmem::obj::persistent_ptr<employee> next;
        pmem::obj::p<size_t> id;
        char name[SSIZE];
        char department[SSIZE];
};
struct root {
        pmem::obj::p<size_t> num_employees;
        pmem::obj::persistent_ptr<employee> employees;
};

Nothing too fancy here, as you can see. We have the definition of the root object, which is always the first object in a PMEM pool (it serves as the main anchor to link all of the other objects created in the program), and the definition of the employee object. The root object holds a linked list of employees (and its size). The C++ classes pmem::obj::p<> (for basic types) and pmem::obj::persistent_ptr<> (for pointers to complex types) tell the library to pay attention to those memory regions during transactions (changes made to those objects are logged and rolled back in the event of a failure in the middle of a transaction). You may have noticed that neither employee.name nor employee.department is declared using these special C++ classes. In this case, the reason is purely didactic (I just wanted to show how you can also add memory regions to transactions explicitly, using the C API).

Now, let’s go back to the main code in leaker.cpp. The for() loop at the end iterates 64 times, creating one object each time. Here we can see how the memory regions for the strings name and department are added explicitly to the transaction with pmemobj_tx_add_range_direct(). The memory leak is introduced after the strings are set: only objects with an odd ID (plus the object with ID 0) are linked to proot->employees, while num_employees is incremented for every object.

How to recover

What we have after running the leaker program is a corrupted PMEM data structure. The persistent variable num_employees tells us that we have 64 entries in our list, when in reality we only have 33 (the 32 odd IDs plus ID 0). Worse still, the other 31 entries are lost (for now) and occupy precious space in our PMEM pool.
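These numbers can be checked with a few lines of plain C++ that replay the loop condition from leaker.cpp without touching persistent memory. The leak_report structure and the classify_ids() function are hypothetical helpers written for this check only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: which IDs end up in the list and which do not.
struct leak_report {
        std::vector<std::size_t> linked;
        std::vector<std::size_t> leaked;
};

// Replays the linking condition used in leaker.cpp for 'count' objects.
leak_report
classify_ids (std::size_t count)
{
        leak_report report;
        for (std::size_t i = count; i > 0; i--) {
                std::size_t id = i - 1;
                if (id % 2 == 1 || id == 0)
                        report.linked.push_back (id); // odd IDs and ID 0
                else
                        report.leaked.push_back (id); // remaining even IDs
        }
        return report;
}
```

For count = 64, linked holds 33 IDs and leaked holds the 31 even IDs from 62 down to 2.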

Fortunately, libpmemobj keeps track of all the allocated objects (of any type) in a special internal list, and this list can be consulted (and manipulated) using the C API of the library.

The following code snippet corresponds to the file fixer.cpp:

...
/* recovering missing IDs */
PMEMoid raw_root = proot->employees.raw ();
PMEMoid raw_obj;
persistent_ptr<employee> emp;
size_t recovered = 0;
/* iterating over all allocated objects in the pool... */
POBJ_FOREACH (pop.get_handle (), raw_obj)
{
        /* checking if object is of type 'employee' */
        if (pmemobj_type_num (raw_obj) == pmemobj_type_num (raw_root)) {
                /* transforming to persistent_ptr (C to C++) */
                emp = persistent_ptr<employee> (raw_obj);
                /* checking if the ID of this object is in the missing list */
                bool found = false;
                for (vector<size_t>::iterator it = missing_ids.begin ();
                     it != missing_ids.end (); ++it) {
                        if (emp->id == *it) {
                                found = true;
                                break;
                        }
                }
                if (found) { /* recovering object */
                        transaction::exec_tx (pop, [&] {
                                emp->next = proot->employees;
                                proot->employees = emp;
                        });
                        recovered += 1;
                }
        }
}
cout << recovered << " objects recovered." << endl << flush;
...

The first thing we do is to get the C pointer object (of type PMEMoid) for the head of our employee list. This pointer (raw_root) is used to check object type when iterating through the internal list (although in this case we only have objects of type employee).

The list is iterated using the macro POBJ_FOREACH() (internally, this macro is composed of two functions: pmemobj_first() and pmemobj_next()). We need to pass to the macro the C handle for the pool—which we can get from the C++ object by calling pop.get_handle()—and a PMEMoid object (raw_obj) to point to the current position as we iterate the list. In the body of the loop, we perform the following steps:

  1. We check that the current object is of the type we are looking for by comparing its type to that of the head of our employee list. For this, we use the C function pmemobj_type_num().
  2. We transform the PMEM C pointer to the current object back to C++. For this, we create a new object of type persistent_ptr<employee>, passing the PMEMoid object to the constructor.
  3. Finally, we iterate over the list of missing IDs to check whether the current object is missing. If it is, we insert it back into the list (at the head, inside a transaction).
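The steps above can be modeled without PMDK to make the control flow easier to follow. This is only a sketch under stated assumptions: toy_pool, toy_object, obj_type, relink_missing(), and list_length() are hypothetical names, a std::vector stands in for the library's internal list of allocated objects, and array indices stand in for persistent pointers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr int NONE = -1; // stand-in for a null persistent pointer

enum class obj_type { employee, other };

struct toy_object {
        obj_type type;
        std::size_t id;
        int next; // index of the next list entry, or NONE
};

struct toy_pool {
        std::vector<toy_object> objects; // stand-in for the internal object list
        int head = NONE;                 // stand-in for proot->employees
};

// Walks every allocated object and relinks the ones whose ID is missing.
std::size_t
relink_missing (toy_pool &pool, const std::vector<std::size_t> &missing_ids)
{
        std::size_t recovered = 0;
        for (std::size_t i = 0; i < pool.objects.size (); ++i) {
                toy_object &obj = pool.objects[i];
                if (obj.type != obj_type::employee)
                        continue; // step 1: wrong type
                // step 2 (C to C++ conversion) has no counterpart in this model
                if (std::find (missing_ids.begin (), missing_ids.end (), obj.id)
                    == missing_ids.end ())
                        continue; // step 3: object is not missing
                obj.next = pool.head; // insert at the head of the list
                pool.head = static_cast<int> (i);
                ++recovered;
        }
        return recovered;
}

// Counts the entries reachable from the head of the list.
std::size_t
list_length (const toy_pool &pool)
{
        std::size_t n = 0;
        for (int i = pool.head; i != NONE;
             i = pool.objects[static_cast<std::size_t> (i)].next)
                ++n;
        return n;
}
```

Unlike fixer.cpp, this model has no transactions, so it says nothing about what happens if the process is interrupted halfway through the relinking.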

Constructing the list of missing IDs for this case is straightforward so I will not put the code here (but check the code at the repository if you want to know how).
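For readers who want a starting point anyway, one possible approach is sketched below. It is not taken from the repository and may differ from what the sample code does. It assumes, as in leaker.cpp, that IDs run from 0 to num_employees - 1, and that the IDs reachable from proot->employees have already been collected into a vector. The function name find_missing_ids() is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns, in ascending order, every ID in [0, expected_count) that does
// not appear in reachable_ids.
std::vector<std::size_t>
find_missing_ids (const std::vector<std::size_t> &reachable_ids,
                  std::size_t expected_count)
{
        std::vector<bool> seen (expected_count, false);
        for (std::size_t id : reachable_ids) {
                if (id < expected_count)
                        seen[id] = true; // IDs outside the range are ignored
        }

        std::vector<std::size_t> missing;
        for (std::size_t id = 0; id < expected_count; ++id) {
                if (!seen[id])
                        missing.push_back (id);
        }
        return missing;
}
```

Here expected_count would be the value stored in num_employees, which still counts the leaked objects.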

Let's run it!

Now we can run it to see if it really does the job. First, we call the leaker and construct the corrupted data structure (this assumes that a PMEM device—real or emulated using RAM—is mounted at /mnt/mem):

$ ./leaker /mnt/mem/leaked_objects.pool

After this, let’s run the checker (included in the code; it is basically the fixer but just outputting the list of missing IDs without fixing anything) to see what is going on:

$ ./checker /mnt/mem/leaked_objects.pool
There are 31 missing IDs. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62,
$

This is exactly what we were expecting. The leaker program didn't link any employees with even IDs (other than ID 0) to the main data structure. It is time to run the fixer:

$ ./fixer /mnt/mem/leaked_objects.pool
There are 31 missing IDs.
Do you want to fix missing objects? (Y/n) Y
31 objects recovered.
$

Awesome. Now we can run either the checker or the fixer (or both) to make sure everything is in order:

$ ./checker /mnt/mem/leaked_objects.pool
There are 0 missing IDs.
$ ./fixer /mnt/mem/leaked_objects.pool
There are 0 missing IDs.
$

As a final thought, note that if all you need for your PMEM data structure is a simple linked list, you do not need to implement one at all. Every time you allocate or delete an object, you are adding it to or removing it from this internal list anyway, so there is no need to implement the add/delete logic. In addition, this data structure is used internally by the library itself and has been implemented with performance in mind (since the library depends on it), so it may very well be the best option in terms of performance.

The full API for this list is the following:

  • POBJ_FIRST(pop, t): Get first element of type t from pool pop
  • POBJ_NEXT(o): Get next element after object o (and of the same type as o)
  • POBJ_FOREACH(pop, varoid): Iterates over all objects in the pool pop using varoid to access each object
  • POBJ_FOREACH_SAFE(pop, varoid, nvaroid): Safe variant of POBJ_FOREACH() in which pmemobj_free() on varoid is allowed
  • POBJ_FOREACH_TYPE(pop, var): Iterates over all objects in the pool of a specific type (the type of variable var)
  • POBJ_FOREACH_SAFE_TYPE(pop, var, nvar): Safe variant of POBJ_FOREACH_TYPE() in which pmemobj_free() on var is allowed

Summary

In this article, I gave a brief introduction to the problem of memory leaks. I then presented a simple C++ PMEM program with an obvious leak bug, and showed how you can use the internal list of allocated objects in libpmemobj to recover leaked objects (through the C API of the library). At the end of the article, I discussed how this internal list can also be used as a main data structure (if the only structure needed is a simple linked list).

About the Author

Eduardo Berrocal joined Intel as a Cloud Software Engineer in July 2017 after receiving his PhD in Computer Science from the Illinois Institute of Technology (IIT) in Chicago, Illinois. His doctoral research interests were focused on (but not limited to) data analytics and fault tolerance for high-performance computing. In the past he worked as a summer intern at Bell Labs (Nokia), as a research aide at Argonne National Laboratory, as a scientific programmer and web developer at the University of Chicago, and as an intern in the CESVIMA laboratory in Spain.

Resources

  1. The Persistent Memory Development Kit (PMDK), http://pmem.io/pmdk/
  2. Persistent Collections for Java, https://github.com/pmem/pcj
  3. The Boost C++ Library Collection, http://www.boost.org/
  4. How to emulate Persistent Memory, http://pmem.io/2016/02/22/pm-emulation.html
  5. Link to sample code in GitHub, https://github.com/pmem/pmdk-examples/tree/master/pmem_leak