Performance and cmake_parse_arguments

The only variable “type” that exists in the CMake language is the humble string. The language uses some library code on top of this fundamental type to weakly implement other types, like numbers and lists.

Lists in CMake are implemented as semicolon separated strings. If you wanted to iterate or find something in a list, then you’d tokenise it and work with the tokens. That’s what the built-in list family of functions do under the good.

Function call arguments in CMake are implemented as a list as well. The runtime sets a variable called ARGV in the function’s scope. It also helpfully maps values from that list sequentially to the names passed to function when it was defined. Excess list items in the “call arguments” are put in ARGN. Most of the time you’ll only ever deal with named arguments, but if you want to have a function call with variadic arguments you’ll need to deal with ARGN.

Things start to break down when you want to pass lists to functions. If you want to pass a value directly to a function, so that one of its arguments contains the value you just passed, then usually you would dereference the variable in the function call, like so:

function_call (${MY_VARIABLE})

Things start to break down when you want to pass a list. CMake parses space-separated identifies as a “list”. If you dereference two list-containing variables next to each other, you get a single list. This makes cases like the following (which are perfectly reasonable) work the way you expect:

set (MY_LIST

When this code runs, CMake sees something like this:

set (MY_LIST

Unfortunately, this makes life hard when you want to call a function:

endfunction ()

my_function (${MY_LIST} ${MY_STRING})

When MY_LIST and MY_STRING get expanded, CMake sees a single list, as follows:

my_function ("ITEM_ONE;ITEM_TWO;STRING")

And when CMake maps everything to variable names:


This is almost certainly what you would not expect. After all, the two variable dereferences were space separated and looked like they were intended to fill two separate arguments. Alas, that’s not how CMake sees things. Its just one big flattened list.

There’s a few solutions to this problem, but they all require the caller to keep track of when the intention is to pass a list as opposed to a single item of that list.

The first option is to quote the variable dereference at the call-site.

my_function ("${MY_LIST}" "${MY_STRING}")

The second option is to pass the name of the list as opposed to its value. This works because scopes have runtime lifetime as opposed to structural lifetime, so any live variables on the stack prior to the function call will also be available in that function’s body:

my_function (MY_LIST ${MY_STRING})

The third option, which appears to be the most prevalent, is to use a system of keyword arguments to denote what values as opposed to map to which names:


The idea at this point would be to loop through all the items in ARGN and use the “markers” to determine where to set or append values. That’s exactly what cmake_parse_arguments does. However, as with most things its always a question of trading usability for performance, and the performance implications can get very scary very quickly.

cmake_parse_arguments has a concept of “option arguments”, “single value arguments” and “multi value arguments”. If I were to use a table to summarise:

option arguments: Set to `ON` or `OFF` depending on whether name is present.
single value arguments: Set as “active” when encountered. Active variable is overwritten with subsequent values until another variable becomes “active”.
multi value arguments: Set as “active” when encountered. Subsequent values appended until another variable becomes “active”.

In order to implement this, you need to iterate all the values in ARGN (N) and then check whether any one of them matches a marker in either the option (M), single value (O) or multi-value arguments (P). So its O(NMOP). It gets really slow when you start passing the contents of long lists as the “value” to a multi-value token.

As an example, I just finished doing some profiling on a project I was working on, where CMake was taking a long time to run. Profiling indicated that cmake_parse_arguments was taking 38 seconds to run, which is absurdly long. I was calling cmake_parse_arguments to pass each line from a file I had just read using file (STRINGS ...). It so happened that this file can be quite lengthy in some circumstances, which meant that cmake_parse_arguments had to do a lot of needless parsing. It was just faster to pass the filename in the end and open it in the local function. Making that change cut runtime to a few milliseconds.

As a general guideline, I now think that cmake_parse_arguments should probably be used sparingly, when you don’t expect callers to give you a huge number of arguments. The way it works was always inherently going to be quite CPU-intense. If you’ve got a slow-running project, then passing too much stuff to cmake_parse_arguments may well be the culprit.

Creating mini-distribution containers with “fake” root access on travis-ci

For most of the projects I’ve started in the last two years, I’ve been using a service called Travis CI. Travis CI is a free-for-open-source projects continuous integration service which runs in the cloud. Its also really easy to set up – just specify a list of steps (as shell commands) in a .travis.yml in the root directory of your project. If they all return with a successful exit code (e.g., 0), then your build is considered passing. Travis CI runs the same script on every new revision of your project and on all its pull-requests. It ties into GitHub’s status notification API and is just all-round super useful.

What makes Travis CI so easy to use for all kinds of projects is that for each build you get an OpenVZ virtual machine with full root access, based on Ubuntu 12.04 LTS. You can use apt to install all your dependencies, add new repositories to your hearts content, download arbitrary files, execute arbitrary code, etc etc.

Moving to Containers

One of the big downsides of virtual machines though is that they import a considerable amount of overhead. In order to provision them on demand, you need to have a a bunch of running instances in memory that you can dispatch jobs to. Once a job is completed, you need to roll back the disk image to an earlier state, kill the instance and clone a new one from an existing “clean” image. Keeping all these virtual machines around consumes a non-trivial amount of resources and in the cloud that costs money. We’ve recognised that virtual machines are not really the way to go for the future of the cloud for a little while now, and more lightweight “container” solutions like Docker and LXD are seeing increased adoption.

Container based solutions are kind of like a “chroot-on-steriods”, in that they provide a way to run a (more or less) isolated user-space on top of an existing kernel, just like any other process. There’s very little overhead involved. Travis CI recently started rolling out infrastructure based on Docker, where builds can be provisioned in seconds as opposed to minutes. I’ve tested this on some of my own projects and it really is true – in non-peak times builds have been provisioned within five to ten seconds of pushing code, and in peak times, about 30 seconds. That is an insanely good turnaround time for continuous integration.

Problems with Containers

The caveat with contained based solutions, however, is that everything runs as a much more restricted user. You don’t have access to sudo and as such you don’t have access to tools like apt. This makes doing a lot of the build tasks which were easy to do on the OpenVZ based infrastructure almost impossible on containers. Travis CI has suggested using precompile binaries uploaded to S3 and downloaded as part of your build process as a replacement for apt in the time being. That’s not really an ideal solution, especially when you want to track a rolling package release cycle.


I was quite keen on switching over as many of my builds to the container based infrastructure as possible. But the lack of root access was going to be a bit of a problem as most of my builds require the installation of packages outside the default install set.

I initially had the idea of using debootstrap to create a chroot inside the container where I could install my own stuff, just like how a pbuilder works.  Unfortunately both chroot and debootstrap require root access.

I did, however, come across another interesting project which could fill the niche quite well. PRoot (short for ptrace-root) is a project that uses the Linux ptrace utility to hook system calls and effectively pretend to be the root user operating on another root filesystem. This works quite well in the majority of cases – applications think that they are running as the root user and also believe that the directory you pass to the proot command is the root directory.

Most linux distributions ship a “minimal” or “core” version – usually a few megabytes, which contains the bare necessities to bootstrap and install packages, but is otherwise a fully-functioning, booting filesystem. This can be extracted to a subdirectory and used directly with proot. An added bonus is that the proot authors have added support for Qemu user space binary translation, which means that you can download a distribution root filesystem for another CPU architecture and have its code dynamically translated to run on the host architecture directly.

Using proot, it is possible to create a mini-distribution where apt can be installed to install whatever packages you want to install, and to run and link to the resulting packages inside the mini-distribution. This was perfect for use with travis-ci’s containers.

Incidentally, Travis CI also enabled build caching for projects running on the container based infrastructure. This mean that you can cache the directory the mini-distribution was created in between builds to avoid having to download and install packages in it all the time.

Introducing Polysquare Travis Container

I wanted to make this functionality easy to use for people looking to move to the container based infrastructure, so I’ve created a project called polysquare-travis-container on GitHub. It isn’t available on PyPI, but you can install it with the following:

pip install git+

Two commands are available. The first, psq-travis-container-create allows you to create a mini-distribution in a specified directory. It automatically downloads proot and qemu for your CPU architecture. The –distro, CONTAINER_DISTRO environment variable allows you to specify the name of a Linux Distribution to use (Ubuntu, Fedora, Debian). The –release, CONTAINER_RELEASE option and environment variable allow you to specify the name of the release to use. –arch, CONTAINER_ARCH are used to specify a target CPU architecture.  You can also specify –repositories PATH_TO_FILE and –packages PATH_TO_FILE to specify files containing lists of repositories and packages to be installed inside that mini-distribution.

If a container exists in the specified directory with that configuration, it will be retained and nothing will be re-downloaded. This allows you to seamlessly integrate the mini-distribution with the caching system.

psq-travis-container-exec can be used to execute commands inside a container. It reads the same options and environment variables as psq-travis-container-create as well as an optional –cmd to specify the command to run. The command is looked up in the mini-distribution’s PATH, so –cmd bash would run the mini-distribution’s version of bash and not the host’s.

This is what it looks like on your build output:

✓ Using pre-existing proot distribution
Configured Distribution:
 - Distribution Name: Ubuntu
 - Release: precise
 - Architecture: amd64
- Package System: Dpkg
✓ Using pre-existing folder for distro Ubuntu precise (amd64)
✓ Container has been set up in /home/travis/container

Concluding Notes

I hope this project is useful to anyone who was thinking about moving to the new container based infrastructure after it was announced late last year. I’ve already started using it for one of my own projects (which I’ll post about later) and I plan to move many more to it in future.

Buffer your IO on CMake!

A small CMake optimisation win that’s probably worth sharing.


file (APPEND

can massively impact your configure time performance. Especially on Windows, where it appears that filesystem writes are unbuffered (at least this is the case for CMake on Windows or Windows generally. I’m not sure).


foreach (VAR RANGE 500)
endforeach ()
file (WRITE output ${MY_FILE_CONTENTS})


foreach (VAR RANGE 500)
    file (APPEND output "${VAR}\n")
endforeach ()
$ time cmake -P BufferedIO.cmake

real 0m.032s
user 0m0.000s
sys 0m0.000s

$ time cmake -P AppendEverything.cmake

real 0m.657s
user 0m0.000s
sys 0m0.000s


Using debuggers in everyday life

One of the things that I teach students that I mentor is that an understanding of how computers work, at least on the lower level, gives you an unexplainable edge. Its like being a doctor, or a lawyer or a mechanic, where your field of specialty (which is really something that you interact with in everyday life) stops becoming a black box of magic and starts becoming something you are empowered to interact with in a meaningful way.

Recently I was drafting up some important documents for someone close to me and we had spent hours and hours working on. Every detail was critical. Unfortunately, our intense concentration meant that we forgot to save the document and this happened after attempting to insert a new paragraph:


The application had hung completely and looked like its was stuck in an infinite loop. We’d have to throw away the three or four hours of work as there was no way we were going to get the contents of that (now multipage) document.

Or was there?

It turns out that having been a hobbyist programmer for years, I knew a thing or two about debugging. I also knew that when you type something into a document, the contents of that document are usually somewhere in memory – and most simple text editors would probably just store the document’s contents in memory as plain or marked up text. That’s what I’d do if I wrote a simple text editor.

If we could just search the program’s memory for the contents that we can still see on-screen, then we might be able to retrieve a fair bit of the document.

It turns out that debuggers are well suited to that task.

LLDB Start


The debugger I’m using is called LLDB and it come with Xcode. It’s part of the LLVM suite of tools.

LLDB like most debuggers has a scripting capability and there’s a script shipped by default that lets you search heap memory for a matching string called cstr_refs.



Just by searching for the very first string, we found an allocation of 320 bytes.

If its just a simple nul-terminated string all the way to the end, we can just cast the pointer to a string and print out the rest of it.

Printing out string


There’s our recovered contents!

Sometimes we’re not so lucky and we don’t get a null-terminated string. Or the memory ends up being fragmented across a few different pools.

When that happens we can just read memory, so long as its within the program’s mapped range, with the memory command.

memory read

You can use debuggers for all kinds of cool stuff, including momentarily getting back those few hours of lost work.

HasNoExceptMemFun type trait

When creating policy based classes we can use std::enable_if with SFINAE and type traits as a mostly-works substitute for real concepts in order to enforce certain things about those policies. This is somewhat like the override keyword that clients can have to ensure that they conform exactly to part of an interface, including const ‘ness and noexcept .There does not appear to be any type traits to determine if a type’s member function is const or noexcept . This is important if you want to call a template argument’s member function and need to maintain a const or noexcept guarantee. noexcept in particular is particularly easy to get wrong as one could inadvertently call a function which throws an exception inside another noexcept func, risking calls to std::terminate .By mucking about with std::declval and perfect forwarding we can test a member function to see if it is noexcept .

template <class Object, class MemFun, class... Args>
struct HasNoExceptMemFun
    static constexpr bool value = noexcept (((std::declval <Object> ()).*(MemFun ()))(std::declval <Args> ()...));

template <class Object, class MemFun>
struct EnableIfHasNoExceptMemFun :
    public std::enable_if <HasNoExceptMemFun <Object, MemFun>::value>::type

template <class T,
          typename = typename EnableIfHasNoExceptMemFun <T, decltype (&T::MyFunc)>::type>
struct MyTemplate
    void MyFunc () noexcept
        T t;
        t.MyFunc ();

One might be tempted to say gesundheit after seeing something like that, but it can be decomposed fairly easily:

class... Args

is what’s known as a parameter pack and effectively allows you to have an arbitrary number of parameters to a template. We use this to grab the types of the arguments to the member function.

noexcept ()

Generates a const-expression based on whether or not a particular operation is noexcept. If any operation within noexcept () is not noexcept (true) then noexcept () will return constexpr false.

std::declval <Object>

Is the magic that allows us to skip having to “construct” Object in order to “call” MemFun inside the noexcept block. It simply turns a type into an rvalue reference. One might be posed to ask why such a seemingly unsafe construct (turn nothing into an rvalue reference?!) made its way into the standard, but it is effectively a function that does nothing in practice and is only there to help the compiler. With our new eval reference, we can go ahead and call MemFun

((std::declval <Object> ()).*(MemFun ()))

Since the rvalue reference returned by std::declval is a temporary, we need to put it in brackets. .* is the pointer-to-member operator which allows us to access some member by providing a pointer to it. Since MemFun is a type and not a pointer we need to “construct” it and also put it in braces since it is a temporary. Finally the entire operation needs to be put in brackets as operator() takes precedence over operator.*.

(std::declval <Args>...)

Turns out that std::declval can be used with parameter packs too, so we’re going to get “free” rvalue references to all of MemFun‘s parameters.

Now that we’ve “called” MemFun with Object the value will be stored in the static bool const-expression value.

From there we can use it with std::enable_if.

std::enable_if <HasNoExceptMemFun <Object, MemFun>::value>::type

std::enable_if is basically one of the closest things we have to concepts at the moment. Effectively, it creates an erroneous substitution if the value passed as its first argument is false and a valid substitution if the value passed as its first argument is true. The second argument, if specified, represents the type that the nested typedef type will be.

Remember that HasNoExceptMemFun will be true if the arguments are a no-except member function of that type and false if they aren’t. Thus, it is only in the true situation that we’re able to inherit from type.

In order to use this, we define a small helper as follows (helps to avoid really long lines):

template <class Object, class MemFun>
struct EnableIfHasNoExceptMemFun :
    public std::enable_if <HasNoExceptMemFun <Object, MemFun>::value>::type

And then we use that helper like so

template <class T,
          typename = typename EnableIfHasNoExceptMemFun <T, decltype (&T::MyFunc)>::type>

Remember that the MemFun template parameter is a type and not a pointer so we need to convert the actual member function pointer which we wish to check to its type. We can do that using decltype.

From there, we can finally be sure that in calling that member function of the template parameter that we’re not running the risk of having exceptions thrown, because we can guarantee that the function is noexcept.

Feel free to play with it here.

Iterator Invalidation

You might be fooled into thinking by using std::vector and avoiding bare arrays created with new T[] that you’ll be avoiding all possible old-fashioned pointer-misuse bugs from the C era for that variable, for instance, null pointer dereferences, out of bounds access, memory corruption, etc. Not so – if you keep iterators  and references around to elements of a vector and then do anything to change its size, the standard says that those iterators and references may be invalidated by that operation(*). More importantly, you don’t get a warning about this and static analysis tools like cppcheck might not detect it in all cases (for instance, collection is modified by a function further down the call stack).

At least for operations like push_back which only affect the end of a group of iterators – there’s a simple trick you can do to save and restore a group of iterators after such an operation.

namespace detail
    /* Main implementation of KeepIteratorsAlive -
     * counts down through all the iterators through
     * a recursive call until we get to N = 0, at which
     * point call the function in question */
    template <typename Function,
              typename Collection,
              typename IteratorTuple,
              size_t   N>
    struct PreserveIterator
        operator () (Function      const &function,
                     Collection          &collection,
                     IteratorTuple const &tuple)
            /* We need to call std::begin twice as the value of it
             * may change after we've called our iterator-modifying
             * function */
            auto distance = std::distance (std::begin (collection),
                                           std::get <N - 1> (tuple));
            auto result (PreserveIterator <Function,
                                           N - 1> () (function,
            std::get <N - 1> (result) = std::begin (collection) +
            return result;

    /* Specialisation for N == 0, at this point
     * we call the function and return an empty tuple
     * to fill up with the correct iterators again */
    template <typename Function,
              typename Collection,
              typename IteratorTuple>
    struct PreserveIterator <Function, Collection, IteratorTuple, 0>
        operator () (Function      const &function,
                     Collection          &collection,
                     IteratorTuple const &tuple)
            function ();
            return IteratorTuple ();

template <typename Function,
          typename Collection,
          typename IteratorTuple>
KeepIteratorsAlive (Function      const &function,
                    Collection          &collection,
                    IteratorTuple const &tuple)
    typedef Function F;
    typedef Collection C;
    typedef IteratorTuple T;
    constexpr size_t const S = std::tuple_size <IteratorTuple>::value;

    return detail::PreserveIterator <F, C, T, S> () (function,

You can use it like

std::vector <MyType> vector;
vector.push_back (MyType ());
auto it = vector.begin ();
std::tie (it) = KeepIteratorsAlive ([&vector]() {
                    vector.push_back (MyType ());
                std::make_tuple (it));

As for reference and pointer stability – this is more difficult. You can either wrap the object type in std::unique_ptr (and take a reference to it) or std::shared_ptr (and make a copy of the std::shared_ptr), which means you’ll pay the cost of another allocation and indirection costs on access, or you can assign each object a unique identifier by which you can use to look it up in the std::vector later.

If you want to avoid messing around with iterators, additional allocation or identifiers and in general iterator, reference and pointer stability is more important than performance then you can use boost::stable_vector

The main difference between the two is that boost::stable_vector is not contiguous, so if you have a lot of internal fragmentation (for instance, in these tests, 2 random erases for every three insertions) it can be a lot slower than std::vector. If array access performance is more important than insert/delete, then iterator adjustment and identifier tracking above can be used.

[smspillaz@平和猫とケーキ build (work)] $ ./src/benchmark 2000
2000 of boost::stable_vector (cold)
 0.000143s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
2000 of boost::stable_vector (warm)
 0.000111s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
2000 of std::vector (cold)
 0.000048s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
2000 of std::vector (warm)
 0.000040s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
2000 of std::list (cold)
 0.000104s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
2000 of std::list (warm)
 0.000069s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)

(*) Detailed by Louis Feng here.