by Daniel Robbins
In this two-part series, I'll introduce you to two technologies that can be used to accelerate the process of compiling C and C++ programs: ccache and distcc. In this article, we take a look at ccache, a system that allows accelerated compilation by using a novel on-disk compilation cache. Ccache is one of those technologies that sounds weird at first, but proves to be of tremendous value once you get around to installing it and seeing it in action.
In the Linux world, one of the most common CPU-intensive tasks is the compilation of software from sources. Many software packages such as glibc, XFree86 and Mozilla can take several hours to compile, and many of us would like to speed up compilation so that it isn't such a chore.
Up until recently, there weren't many options available for Linux systems. Many hoped that better clustering technology such as OpenMosix would help to distribute compilation work over an entire cluster and thus reduce the time needed to compile things. Unfortunately OpenMosix is ill-suited for this type of application. The reason? OpenMosix does a great job migrating long-running processes, but the average "gcc" process usually lasts only a few seconds if that. Most of the time, these compiler processes don't exist for enough time to justify the effort required to have OpenMosix migrate them to an idle node.
So, what other options are available? Well, nearly all of us have used a powerful compilation tool called "make." Make will normally only recompile the parts of a program that have changed. When make does its thing, it can save us a good deal of CPU time.
However, make does have its limits. A lot of the time, we need to compile software from scratch and don't have the luxury of already having a partially compiled source tree available. And often, we need to perform a "make clean" to wipe out any existing object files, particularly if we apply a bad patch and need to revert to an earlier version of our sources, for example. Also, if we happen to be using SRPMS or another similar software building technology, our sources are unpacked from their original tarballs each time, so we're continually starting from scratch with every compile. Make is only helpful as a work-saving tool if you're able to keep and re-use your build tree, which isn't always possible.
So until recently, there hasn't been a good general-purpose tool for speeding up compilation of software under Linux. Then ccache came on the scene. Ccache works by caching the results of your compiles. Then, if you ever need to compile the same source code using the same compilation options and same compiler, ccache pulls the result from its cache rather than launching a CPU-intensive compiler process. The end result? You can "make clean; make" inside a source tree and have the rebuild take minutes instead of hours. Even though your object files were wiped out of your build tree, they're still available from the compiler cache. This can be extremely handy when compiling the Linux kernel source tree. If you make any significant change to your kernel compilation options, a "make mrproper" is often needed. Unfortunately, this has the side effect of wiping out all your object files. But with ccache, all your previously compiled object files can be pulled from the compiler cache, saving tons of CPU time.
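To make the idea concrete, here is a minimal sketch of the lookup-or-compile pattern described above. It is not ccache's own code: the function name cached_run, the CACHE_DIR variable and its default location are all hypothetical, and the caller is assumed to supply a key that already identifies the compile.

```shell
# Hypothetical cache location for this sketch only; it is not ccache's directory.
CACHE_DIR=${CACHE_DIR:-$HOME/.sketch-cache}

# cached_run KEY OUTPUT COMMAND [ARGS...]
# Serve OUTPUT from $CACHE_DIR when KEY is already known; otherwise run
# COMMAND (which must create OUTPUT) and remember the result under KEY.
cached_run() {
    key=$1
    output=$2
    shift 2
    mkdir -p "$CACHE_DIR" || return 1
    if [ -f "$CACHE_DIR/$key" ]; then
        # Cache hit: copy the stored result, no compiler process is started.
        cp "$CACHE_DIR/$key" "$output"
        return 0
    fi
    # Cache miss: do the real work, then keep a copy for next time.
    "$@" || return $?
    cp "$output" "$CACHE_DIR/$key"
}
```

Called twice with the same key, the second call copies the stored file instead of running the command, which is why a rebuild after "make clean" can be so much cheaper than the first build.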
Ccache can also speed things up in other ways. Let's say you compile and install openssh-3.4 from source. But then openssh-3.4_p1 comes out. To compile this new version, you need to create a completely new openssh-3.4_p1 build tree and throw away your original openssh-3.4 build directory. If you use ccache, any source files that haven't changed between openssh-3.4 and openssh-3.4_p1 will still be in the cache and will not need to be recompiled! That may save you a few minutes of CPU time in the case of openssh, which is great. But when upgrading other more complex packages like XFree86 or glibc, this capability can save hours of CPU time.
When I first heard of ccache, I immediately wondered whether it was robust enough for production use. As the Chief Architect of the Gentoo Linux distribution, I could definitely stand to benefit from solid compiler acceleration technology, but to be useful, the technology had to work for everything. In the past, I had tried other compiler acceleration technologies, and they had worked fine when compiling most build trees, but would crash and burn maybe 5% of the time. While this may be adequate for someone who only builds a few standard build trees over and over again, it definitely made the technology useless for compiling an entire Linux distribution of 2100+ packages from source.
Fortunately, ccache is incredibly well-designed and takes every possible step to ensure that it produces the exact same output that would be produced by "gcc" or "g++" directly. For example, ccache takes the following steps to ensure that it always does the right thing:
- ccache checks the mtime and size of the actual compiler binary to ensure that it hasn't changed.
- ccache determines if the pre-processed C/C++ source has changed, thus catching changes to headers.
- ccache compares compiler options and will not serve something from its cache if the compiler options are different.
- ccache will not cache any linking steps, because they involve external binaries that may have changed.
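The checks listed above can be pictured as ingredients of a single lookup key. The following is only a sketch under stated assumptions: the function name cache_key is hypothetical, sha256sum is a stand-in for whatever hash ccache uses internally, and the stat options are those of GNU coreutils.

```shell
# cache_key COMPILER PREPROCESSED_FILE [OPTIONS...]
# Print a hex digest that changes whenever the compiler binary, the
# compiler options, or the preprocessed source changes.
cache_key() {
    compiler=$1
    preprocessed=$2
    shift 2
    {
        # Compiler identity: size and mtime of the binary (GNU stat, -L follows symlinks).
        stat -L -c '%s %Y' "$compiler"
        # Compiler options, one per line, so argument boundaries are part of the hash.
        printf '%s\n' "$@"
        # Preprocessed source, which already contains the text of every included header.
        cat "$preprocessed"
    } | sha256sum | cut -d' ' -f1
}
```

A preprocessed file could be produced with something like "gcc -E foo.c -o foo.i". Hashing that file rather than foo.c itself is what lets a header change invalidate the cached object.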
Andrew Tridgell, the author of ccache, answers the "Is it safe?" question on the ccache Web site:
"Is it safe?
Yes. The most important aspect of a compiler cache is to always produce exactly the same output that the real compiler would produce. This includes providing exactly the same object files and exactly the same compiler warnings that would be produced if you use the real compiler. The only way you should be able to tell that you are using ccache is the speed.
I have coded ccache very carefully to try to provide these guarantees."
Based on my tests, his claim is accurate. I've used ccache to compile XFree86, Mozilla, gcc, binutils, glibc, KDE 3, GNOME 2, and many other packages without a hitch.
Fortunately for us, installing ccache is quite easy. First, head over to http://ccache.samba.org and download the most recent version of the ccache tarball (currently "ccache-1.9.tar.gz"). Then extract, configure, compile and install as follows:
# cd /tmp
# tar xzvf /path/to/ccache-1.9.tar.gz
# cd ccache-1.9
# ./configure --prefix=/usr
# make
Ccache is compiled. Now we'll install ccache as follows:
# mkdir /usr/bin/ccache
# cp ccache /usr/bin/ccache/
# gzip -9 ccache.1
# cp ccache.1.gz /usr/share/man/man1
You'll notice that we're installing ccache into a /usr/bin/ccache directory. There's a method to this madness: in a bit, we'll be able to add /usr/bin/ccache to our path when we want to enable ccache, and remove it from our path when we don't. Putting ccache stuff in its own directory makes this possible.
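As a rough illustration of that on/off switch, the two helper functions below add the directory to the front of PATH or strip it out again. The names ccache_on and ccache_off are made up for this sketch and are not part of ccache.

```shell
# ccache_off: remove every /usr/bin/ccache entry from PATH.
ccache_off() {
    PATH=$(printf '%s' "$PATH" | tr ':' '\n' | grep -vx '/usr/bin/ccache' | paste -sd: -)
}

# ccache_on: put /usr/bin/ccache at the very front of PATH, exactly once.
ccache_on() {
    ccache_off
    PATH=/usr/bin/ccache:$PATH
}
```

Because ccache_on always cleans up first, calling it repeatedly does not pile up duplicate entries.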
Now, we need to set up some symbolic links in the /usr/bin/ccache directory. We'll create a bunch of symlinks that have the same names as our common C and C++ compilers, and they will point to the main "ccache" executable. First, some basic symlinks:
# cd /usr/bin/ccache
# ln -s ccache gcc
# ln -s ccache cc
# ln -s ccache c++
# ln -s ccache g++
Next, we need to create symlinks using the longhand architecture-specific names of our compilers. To find the correct names to use on your system, type "ls /usr/bin/*gcc" (and likewise for g++ and c++) to see what shows up. On my system, I need to create the following symlinks:
# ln -s ccache i686-pc-linux-gnu-c++
# ln -s ccache i686-pc-linux-gnu-g++
# ln -s ccache i686-pc-linux-gnu-gcc
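If you'd rather not type each symlink by hand, a small loop can do the same job. This is only a sketch: link_compilers is a hypothetical helper, and the glob patterns are an assumption about how compilers are named on your system, so check what it finds before trusting it.

```shell
# link_compilers COMPILER_DIR WRAPPER_DIR
# For every compiler-like executable found in COMPILER_DIR, create a symlink
# of the same name in WRAPPER_DIR pointing at "ccache" (relative, as above).
link_compilers() {
    compiler_dir=$1
    wrapper_dir=$2
    for path in "$compiler_dir"/cc "$compiler_dir"/gcc \
                "$compiler_dir"/c++ "$compiler_dir"/g++ \
                "$compiler_dir"/*-gcc "$compiler_dir"/*-g++ "$compiler_dir"/*-c++; do
        # Skip names that do not exist, including glob patterns that matched nothing.
        [ -x "$path" ] || continue
        name=${path##*/}
        # Leave existing entries alone so the loop is safe to re-run.
        [ -e "$wrapper_dir/$name" ] || [ -L "$wrapper_dir/$name" ] ||
            ln -s ccache "$wrapper_dir/$name"
    done
}
```

With the layout used in this article, it would be run as root as "link_compilers /usr/bin /usr/bin/ccache".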
I'd better explain how all these symlinks allow ccache to be transparently integrated into the build environment. To enable ccache, we'll add /usr/bin/ccache to the beginning of our path, ensuring that it comes before /usr/bin. This way, when a Makefile tries to run "gcc," it'll run /usr/bin/ccache/gcc instead of using /usr/bin/gcc directly. Of course, /usr/bin/ccache/gcc is a symbolic link to our /usr/bin/ccache/ccache executable. When ccache starts up, it detects that it is being run through a symlink named "gcc," and will then find our regular "gcc" executable in /usr/bin. It then checks its cache. If it already has the result of this compilation in the cache, it can write the cache file to the proper location on disk and exit. Otherwise, it'll execute gcc and cache the result for later use. Similar steps will occur if our Makefile tries to call "g++," "cc," "i686-pc-linux-gnu-gcc," and so forth.
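The "find our regular gcc" step can be sketched in a few lines of shell. ccache itself is a C program and its real lookup may differ in detail; find_real_compiler is a hypothetical name, and the sketch assumes the only thing to avoid is the wrapper directory itself.

```shell
# find_real_compiler NAME WRAPPER_DIR
# Print the first executable called NAME on $PATH that does not live in
# WRAPPER_DIR; return 1 if there is none.
find_real_compiler() {
    name=$1
    wrapper_dir=$2
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        IFS=$old_ifs
        # Skip our own directory, otherwise the wrapper would find itself.
        [ "$dir" = "$wrapper_dir" ] && continue
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

A wrapper script invoked through a symlink would obtain NAME from its own invocation name, for example with name=$(basename "$0"), which is how one executable can stand in for gcc, g++, cc and the rest.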
So that's the theory behind this setup. Now, let's complete our ccache configuration so that we use ccache under a normal user account.
Once under a normal user account, decide how much storage you want to allocate for the compiler cache. Then, initialize your cache and set this maximum bound as follows (I chose two gigabytes):
$ /usr/bin/ccache/ccache -M 2G
The previous command initializes a cache directory at ~/.ccache. If you prefer to locate your cache directory somewhere else, use the CCACHE_DIR environment variable to point to another location. Just be sure that your user account has read and write access to this location.
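The rule above (CCACHE_DIR if set, otherwise ~/.ccache) and the access requirement can be checked with a couple of one-liners. Both function names are hypothetical, and the sketch treats an empty CCACHE_DIR the same as an unset one, which is an assumption on my part.

```shell
# resolve_cache_dir: print the cache directory implied by the environment.
resolve_cache_dir() {
    printf '%s\n' "${CCACHE_DIR:-$HOME/.ccache}"
}

# check_cache_dir DIR: succeed only if DIR exists and is readable and writable
# by the current user.
check_cache_dir() {
    [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}
```

For example, check_cache_dir "$(resolve_cache_dir)" || echo "cache directory is not usable" gives a quick sanity check before kicking off a long build.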
Now, all that's left to do is to add /usr/bin/ccache to the beginning of our PATH environment variable. If you're a bash user (if you don't know whether you are or not, you probably are) this can be done by adding the following line to the end of your ~/.bashrc file:
export PATH="/usr/bin/ccache:${PATH}"
If you've decided to store your compiler cache at a location besides ~/.ccache, you should also add the following line to your ~/.bashrc, substituting the location you chose:
export CCACHE_DIR="/path/to/your/cache"
You'll also want to ensure that your ~/.bash_profile sets the path appropriately. Typically, I instruct my ~/.bash_profile to source my ~/.bashrc. This ensures that my PATH (and my CCACHE_DIR variable) are set up correctly when I log in, and its presence in ~/.bashrc also allows me to take advantage of ccache remotely using rsh or "ssh email@example.com ( cd ~/builddir; make )" if I so desire. To tell ~/.bash_profile to source your ~/.bashrc file, add the following to the end of your ~/.bash_profile (or ~/.bash_login if that's what you use):
source ~/.bashrc
ccache in Action
Now, if you log out and log back in, you should be able to type "echo $PATH" and see /usr/bin/ccache right smack at the beginning of it. You're now ready to use ccache. Go ahead and compile your favorite source tree, such as the Linux kernel. Then run "make clean" followed by "make" to build the sources again. You should notice that the second compile goes significantly faster than the first. To ensure that ccache is working for you, view your cache statistics by typing:
$ /usr/bin/ccache/ccache -s
cache hit                           9155
cache miss                         48666
called for link                     6349
multiple source files                192
compile failed                       872
preprocessor error                   148
not a C/C++ file                    3042
autoconf compile/link              11735
unsupported compiler option        11195
no input file                       4177
files in cache                     97332
cache size                           1.3 Gbytes
max cache size                       2.0 Gbytes
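The two numbers that matter most here are "cache hit" and "cache miss". As a small sketch, the hypothetical helper below reads statistics in the format shown above (one counter per line, value in the last column) and prints the hit rate as a percentage; other ccache versions may label these lines differently.

```shell
# hit_rate: read "ccache -s" style output on stdin and print
# hits / (hits + misses) as a percentage, or "n/a" if neither was seen.
hit_rate() {
    LC_ALL=C awk '
        /^cache hit/  { hits = $NF }
        /^cache miss/ { misses = $NF }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * hits / total
        }'
}
```

It would be used as "/usr/bin/ccache/ccache -s | hit_rate". For the statistics above, that works out to 9155 / (9155 + 48666), or roughly 15.8 percent, over the whole lifetime of the cache.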
You definitely won't have a cache this full after a single compile, but your statistics should indicate that ccache is indeed active. Also note that to take advantage of ccache, you'll need to build sources under the user account you just configured. So ccache won't automatically be active for root unless you configure ccache for the root account, which you can do by following the exact user configuration steps described earlier. Also note that multiple users do not normally share a .ccache directory. While it is possible to share a .ccache directory among several users, it is not a recommended practice due to security implications.
I hope you've enjoyed this small introduction to ccache. Please join me in the next article, when we take a look at ccache's cousin, distcc, which will allow us to easily and efficiently distribute compilation work over many machines. :)
Visit http://ccache.samba.org/, the home of ccache.
Be sure to check out the ccache man page; type "man ccache".
About the Author
Residing in Albuquerque, New Mexico, Daniel Robbins is the Chief Architect of Gentoo Linux (http://www.gentoo.org), an advanced ports-based Linux for ia32, PowerPC, Sparc and Sparc64 systems. He currently writes articles, tutorials and tips for the IBM developerWorks Linux Zone and has also served as a contributing author for several books, including Samba Unleashed and SuSE Linux Unleashed. Daniel enjoys spending time with his wife, Mary, and his daughter, Hadassah. You can contact Daniel at firstname.lastname@example.org.