libanimation for everyone

Big important edit to set the record straight

This blog post got reposted on reddit with the title “Compiz effects are coming back to GNOME-Shell”. That is not what is happening here at all. Don’t expect any of this stuff to make it into any upcoming Shell releases, because it hasn’t been discussed with anyone in the Shell team.

Instead, what I was trying to say was that I was looking into whether it was possible to do so and created libanimation in the process. Libanimation is a math library. Someone still has to write a binding on the compositor side to consume its outputs and make it work. I can say that at least for GNOME-Shell it is certainly possible to do so, but someone else would need to write it and upstream it.

End big important edit: original blog post upcoming.

Something I worked on when I first started at Endless was the rather interesting task of “making Wobbly Windows from Compiz work on GNOME-Shell”.

This task is interesting in the sense that GNOME-Shell doesn’t really work in the way that the wobbly plugin for Compiz wanted it to. There is the “Wobbly Windows” extension for the shell, but that was sort of out for Endless, since we didn’t want to rely on the extensions mechanism and that extension also didn’t quite work the way I had come to know and love.

What I really wanted to do with this project was replicate the same smooth wobbly windows from Compiz in the shell itself.

Where do you even start on such a project? The first hard part is trying to get the math to work and knowing that you have the math working. Thus, libwobbly was born – a library that reimplements the same math that Compiz itself used, allowing callers to create a 3×3 mesh of springs representing the physical effect of “wobbly” windows and providing functions to interpolate into that 3×3 mesh. Then, we used libwobbly in our fork of GNOME-Shell along with ClutterDeformEffect and a bridge to get the same, buttery smooth wobbly windows in GNOME-Shell.

However, this approach had its shortcomings. Up until very recently it only exported a C++ interface, because I very foolishly was obstinate about keeping things in C++ at the time. This made integration into the shell a pain for a variety of reasons, among them frequent ABI breaks due to compiler or boost changes, needing to have a whole separate GObject-Introspection typelib just to handle the C++ stuff, separate build rules, etc etc. Of course, this state of affairs wasn’t well appreciated by others who work on our fork of the shell, with the running joke that whenever the shell breaks, it was always libwobbly’s fault.

For a few reasons I won’t get into, window animations became a priority at Endless again and we wanted to see if we could do a better take on libwobbly. I didn’t want to throw the whole thing out, since it is quite well tested and a lot of thought went into its design.

The result is an improved library, libanimation designed to be used from C++ programs with a C++ interface as well as C and language bindings with a GObject interface wrapping the C++ interface. In future, we should also be able to create emscripten bindings to the C++ interface, allowing it to be used directly in web programs too.

The name change represents a broadening of the project scope. For a few reasons, we wanted to preserve not only wobbly windows, but also the other (decent) effects from Compiz too, among them zoom, bounce, glide and magic lamp. The library does not insist on taking over any of the rendering or scene graph management. Instead, libanimation is designed to provide “the math” and can be seen as a black box, where you feed it a requested animation configuration, a step function (linear, reverse, quadratic, sigmoid) and can “get out” a series of affine transformations or deformed co-ordinates given a pair of unit co-ordinates.

Over time, more animations will be added. I hope that the library will be useful to authors of other compositors or applications and help to preserve some of the more magical parts of Compiz as technology itself matches forward.

Serverless React Boilerplate

I’ve been keeping my eye on serverless as a way to do hosting and deployments for my next react project when I get around to it and this week I finally got a chance to look into it.

React Serverless.png

The tl;dr is that I’ve forked react-boilerplate as react-boilerplate-serverless if you just want to get started with serverless deployments. You should be able to immediately use npm run serverless-deploy.

For the uninitiated, the main difference brought with so-called “serverless” applications is that they don’t run on a dedicated virtual machine or container that is running 24/7. Instead, your code is abstracted behind infinitely scaleable architecture and you only pay for the execution time of your code as opposed to the server itself. This is the sort of automatic scaleability that people like to talk about with cloud computing.

There’s a few important distinctions with serverless computing that comes with its architecture. First, we call serverless deployments “functions” because they really are just that – the infrastructure hosting the serverless service takes care of all the network IO and just calls your “function” when a particular endpoint is accessed. In that sense, you lose control over how network IO is handled and the process which is handling that network IO. The “function” you are deploying in this case is really a loadable library which can be a collection of scripts and data but with no associated running “process”. Instead the library is loaded at runtime, which means that the serverless computing provider can bill you only for the executing time your library actually uses in handling requests.

In practice, this distinction tends not to matter so much. It turns out to be relatively trivial (with some caveats) to turn any express-like application into a serverless application thanks to tools like serverless-http. If you wrap your express application with the serverless wrapper from that package, infrastructure like AWS lambda will just call your route handlers directly after being invoked from a frontend server like AWS API Gateway. To your application, this is essentially no different than the route handlers being invoked asynchronously by libuv and node’s nextTick function.

There’s some other practical annoyances to be aware of as well, though they can all be mitigated.

Support for compression and binary data is still sketchy at best

Unfortunately, AWS API Gateway handles binary content in a particularly awkward way, expecting it to be serialised to base64 by the request handler (e.g., your serverless function) before it goes back to the API Gateway (which should, in theory, unserialise it back into binary). For a long time, API Gateway didn’t even support binary responses and only gained support for it last year making it possible to actually host a server that serves image data directly with Lambda. But you still need to manually specify the MIME types that should be treated as binary on the API Gateway configuration. I’ve found that this works well enough for things like images but it completely broke down when it came to compressed text.

Most node servers, including the server set up in react-boilerplate use compression-middleware which allows for the client and server to agree for responses to be gzipped. Unfortunately, API Gateway chokes on the binary data if you give it the compressed response directly and specifying that the binary data be encoded in base64 causes the client to choke. I haven’t found any way to get around this problem other than just disabling compression. Maybe the story will improve in time to come and we won’t have to worry about messing about with settings to support binary content.

Execution time

Since popular servers like node are single-threaded and event based you probably shouldn’t be performing any too-long-running or expensive tasks from within your server anyway. But you really need to watch out with in-function asynchronous tasks that could take longer than 10 minutes, since the execution time is limited to 300 seconds. This could be a problem if you need to upload large files with the help of the function. One workaround for that is to just have the client upload the files directly to an S3 bucket, though that might require a little more stuffing about than you would like.

Everything is stateless

Again, if you designed your web server in line with RESTful principles this shouldn’t be a problem in theory, though it could be a problem if you have logic that runs on first start.

You don’t own the infrastructure

If you’re used to Platform-as-a-Service this isn’t a huge deal, but its worth keeping in mind that you don’t control the runtime that your application is “running” on. In fact, you don’t even control whether your application is running at a given moment, other than when it is handling requests.


Check it out

So far I’ve had enough luck to be able to get a render of a page including images and associated scripts and styles, which proves that this definitely can be done. Hopefully this will become a more affordable and simpler way to deploy simple servers in the future, without having to worry about provisioning entire virtual machines or containers running 24/7 for the sake of hosting.

Exploring faster blurs

This week I a saw blog post on Hacker News entitled “Fastest Gaussian Blur” (by Ivan Kutskir). That caught my interest, because one thing that always irked me back when I worked on Unity was the fact that our UI blurs felt quite sluggish and there didn’t seem to be much we could do to speed them up given the size of the blur radii that we were using. I had a Sunday afternoon to kill and recently broke my ankle again (no gym or rock climbing), so I figured why not try implementing some of them.

Gaussian Blur

The post goes over four different algorithms for implementing gaussian blur, the first one being the standard approach of just doing an R-squared pass over every pixel and calculating its normalised gaussian distribution weights inline. As you can imagine, this is terribly slow, because you need to sample R-squared pixels for the entire image (and needlessly recompute the gaussian weights!). As is typically the case, its also the most trivial to implement in a fragment shader – you just take the input image and do two nested loops for the radius, sampling each neighbouring pixel and applying its calculated gaussian weight.

As a simple blur shader, its pretty satisfying to implement, since it is probably the most accurate in terms of what we consider a “blur” to be. But alas, the performance sucks. So can we do better?

Box Blur

Well, the article says yes, we can. Algorithms 2 and 3 in the article go over a technique called “box blur”, which is basically a way to approximate gaussian blur without having to compute weights. The trick here is to do several passes over the image each with different radii that you work out using some calculus. The article goes into more detail about why the math behind this works. As it says, the nice thing about box blurs is that you don’t need to worry about computing weights – the application of different blur radii tends to converge on the same result.

The only difference Algorithms 2 and 3 is that Algorithm 3 explicitly separates out the horizontal and vertical passes (thus, you have an O(2(nr)) operation as opposed to an O(n*r^2) operation. Most blurring approaches I’ve seen tend to do this because the complexity saving alone outweighs the cost of doing multiple framebuffer passes. In practice, it does turn out to be more performant, which is nice.

Precomputed Weights with a Data Texture

I was tempted to go back and try implement this in GNOME-Shell, though I saw that someone had already beaten me to it. Initially I was a little defeated, but then I became curious looking at the approach. It seems as though this extension uses a bunch of hardcoded weights separated into “bands”, though it initialises those weights on each pixel. I thought that was a little odd, so I dug a bit deeper into what it was doing by reimplementing it.

What was neat about this approach is that it is essentially a 2-pass gaussian blur, O(2(n * r)), except without the cost of having to recompute all those weights all the time. That’s handy, though since the weights are precomputed, its easy to run into a situation where you “run out” of weights if you want a bigger blur kernel.

Luckily, the author included a tool to compute weights up to a given blur radius. What was interesting is that the weights are put into “bands” for each stepping of 5 units on the radius. I presume that this was done to save on space inside the fragment shader since they are all hardcoded.

Now, the size of the computed weights array changes as you increase the radius, since that gaussian distribution function kind of “flattens out” and more area along the x-axis becomes significant. The original author dealt with this problem by hardcoding a size-31 array that only fills up once we get to the last “band” and assigning the each hardcoded weights array to it depending on which “band” your radius was in. Unfortunately, the notation the author uses for directly assigning the array isn’t implemented in OpenGL ES 2 (which is what WebGL uses). First I tried just brute-forcing the problem and using a regex to generate code that assigned the weights directly, but I realised that there is a much nicer alternative.

You can’t pass variable length arrays to shaders through uniforms in GLSL – I believe that doing so would probably violate the linker’s expectations as to memory layout etc. But you can pass handles to textures to sample instead!

Textures used to be this special datatype in OpenGL, but thanks to OES_texture_float and hacks where we treat textures as “memory”, they’re now useful as general purpose variable-length storage. So much so its how GPGPU in WebGL works  – the threejs people even have an example on how to use it. I once did a project where I used the depth buffer value to dynamically scale a blur radius – maybe that can be the subject of a future blog post.

So of course, what you can do to pass these gaussian weights down to the shader without having to hardcode or precompute them is to create a texture using GL_FLOAT as the storage type and set the red channel to the weight value. Then just sample the texture using texture2D inside the fragment shader to get the weight value for a given radius offset.

In my testing, such an implementation seemed to have pretty good performance even for very large blur kernels, however, it appears as though the sampling mechanism diverges from what I’m used to with gaussian blur  and we end up giving more weight to things outside the focal point of the pixel very quickly. Perhaps something to be looked into for next time.


Explaining Boyer-Moore

I had to implement the Boyer-Moore string search algorithm from scratch for an advanced algorithms class in 2015. It remember it took ages to get my head around the various concepts and I was so happy when I finally got it. So it was unsurprisingly disappointing that fast-forward to today, I’ve forgotten how it works.

Luckily, I’m on a bit of a mission to learn Rust and algorithms are a pretty good testing bed. So this article is partly for my benefit to summarise my own findings and partly for the benefit of the reader who might be scratching their head about how it works.

Naive Substring Search

To understand why such a big deal is made about Boyer-Moore, you need to appreciate why “naive” substring search scales so poorly.  “naive” substring search is basically exactly how you’d expect to implement a substring search:

char needle[] = "some word";
char haystack[] = "Why oh why must I always find the word, some word, any word";

size_t needle_len = strlen(needle);
size_t haystack_len = strlen(haystack);

for (size_t i = 0; i < (haystack_len - needle_len); ++i) {
    bool match = true;
    for (size_t j = 0; j < needle_len; ++j) {
        if (needle[i] != haystack[i + j]) {
            match = false;

    if (match) {
        return i;

return -1;

This implementation of substring search is dutiful – it will try and match, from front to back, the needle at every position in the haystack. The problem is exactly that, it will try and match the needle at every single position in the haystack. In the worst case, it needs to check the length of the needle against the length of the haystack, haystack times. We say that this is O(nm).

If you’re just matching one or two characters against a very long body of text, its probably not too bad. But if you’re thinking of checking if a paragraph is contained a book, you’re gonna have a bad time.

If you visualize what is going on in the naive search, you’ll probably notice all the unnecessary work:

Why oh why must I always find the word, some word, any word
some word
 some word
  some word
   some word
    some word
     some word
      some word

The astute reader might see this and realize that in every case above, we’re just matching the letter “s” over and over again against cases that are futile anyway, because for the entire string to match, the last letter must also necessarily match and … it doesn’t. So lets handle that case now:

char needle[] = "some word";
char haystack[] = "Why oh why must I always find the word, some word, any word";

size_t needle_len = strlen(needle);
size_t haystack_len = strlen(haystack);

for (size_t i = 0; i < (haystack_len - needle_len); ++i) {
    bool match = true;
    for (size_t j = 0; j < needle_len; ++j) {
        /* Compare back to front */
        if (needle[(needle_len - 1) - j] != haystack[i + ((needle_len - 1) - j)]) {
            match = false;

    if (match) {
        return i;

return -1;

Main difference here is that we compare back to front – so at least we’re catching all those futile mismatches early. While this is a little better, it is still technically O(nm) – the same problem could just happen in reverse, all the characters at the back could match, except for the first one. Now you’re back to square one.

Skipping Ahead

The impatient reader might wonder why we’re bothering to re-examine all those sub-patterns within the pattern we just failed. That alignment didn’t match, so why not check the next one instead? That would make the whole thing O(n), right?

Why oh why must I always find the word, some word, any word
some word
         some word
                  some word
                           some word
                                    some word
                                             some word
                                                  some word

Damn! We missed the match! The problem with this approach is that in skipping words you might partly overlap the match, then partly overlap another part of the match, but never actually see whole match. So this substring search approach is actually incorrect. What we’ll need to do “intelligently” figure out how many characters to skip ahead so that we’ll eventually align with the match in the haystack. And that is exactly what Boyer-Moore is about!

The Boyer-Moore algorithm defines two lookup tables, the bad character table and the good suffix table. The tl;dr of both of these tables is as follows:

  1. The bad character table tells you how many characters to shift the pattern forward if you want to align the character in the haystack string where the mismatch occurred with the closest character in the needle string such that the two will match. The theory goes that if there’s at least a character matching in the middle of the pattern, there’s some likelihood that the new alignment itself is a match.
  2. The good suffix table tells you how many characters to shift the pattern forward such that the substring into the pattern you just matched (eg, all the characters that did match up until the mismatch) occurs again. Generally speaking the good suffix table doesn’t come into play very much, but it is relevant in cases where you have repeating patterns within the needle string and you need to check the repeated pattern against the input string. Usually this comes up in the context of trying to match DNA strings, where you only have an alphabet of GATC.

That probably didn’t mean very much without examples, so lets start with the bad character rule. Look again at what we were doing when we tried to just blindly skip ahead by the text length:

Why oh why must I always find the word, some word, any word
some word
         some word
                  some word
                           some word
                                    some word
                                             some word
                                                  some word

If you’re observant, you’ll see that we got close, but we just missed out, specifically, here:

ord, some 
some word

We mismatched on the last character (remember, starting from the back!), d != e. But we know that the character ‘e’ occurs in our pattern! If we just shift the pattern forward such that the two ‘e’ characters align….

some word 
some word

Ah-hah! We have our match! So by using the bad character rule we were able to keep skipping words(*) all the way until we got close to the match, and then we just aligned with the match! That’s very very close to linear performance for something that was previously quadratic!

So where does the good suffix rule come in to it? Well, to illustrate this, I’ll have to use another example:

ababa abcbb abcab

Yes, this is slightly more contrived, but it demonstrates that the bad character rule on its own is not sufficient to ensure correct matching:

ababa abcbb abcab abccb

Uh-oh, we missed it! So why did that happen? Well, we got close, but then this happened:

ababa abcbb abcab abccb

“ab” matched “ab” at the end of the needle, which was a good start. But then “c” didn’t match “a”. Since ” ” does not occurr within our string, we skip forward by five characters (eg, the length of the needle) and completely miss the fact that we could have aligned “ab” at the start of the string with the “ab” that we just matched! The good suffix rule accounts for this. We kept a record of how far back in the string the same subpattern “ab” occurs again. In this case, it is three characters away, because we matched two characters, we shift forward by three:

ababa abcbb abcab abccb

And we’re done!

Computing the tables

By far the most annoying part of implementing the algorithm the algorithm is computing the tables for the bad character rule and good suffix rule. Most of the content I’ve seen around this has literally been as useful as code snippits with cryptic variable names or just snapshots of the already-computed tables. I’m going to walk you through how the tables actually work, step by step.

Starting with the bad character rule table, recall what the bad character rule actually did:

Why oh why must I always find the word, some word, any word
some word
         some word
                  some word
                           some word        5 <- (shift for e from d)
                                    some word
                                         some word (shift by only 5)

We saw that we mismatched on ‘e’ at the last character in the needle. We know from there that the character ‘e’ is five characters before that last character, so we shift forward only five characters the next time. So it stands to reason that we’ll need to compute a 2D table – for every character in the alphabet and every position in the needle, how far back do we need to look in order to the character in the alphabet in the needle?Again, it isn’t the character in the needle at the given index that we’re interested in, its the character in the alphabet that we’re looking at (and we do it for each character in the alphabet). So yes, this is quite a large table:

    a b c d e f g h i j k l m n o p q r s t u v ... w
0 s
1 o                                     1
2 m                             1       2
3 e         1               1   2       3
4           2               2   3       4
5 w         3               3   4       5
6 o         4               4   5       6           1
7 r         5               5   6       7           2
8 d         6               6   7     1 8           3

Notice a convenient pattern? All we’re doing is figuring out how many characters backwards into the needle we need to go in order to reach that character in the alphabet. If we can’t find it, then no worries, we just don’t put an entry  there (or typically, we’d put some sentinel value like 0 or -1). In rust, you might have code that looks like this:

fn find_pending_character_index(chars: &Vec<char>, start: usize, needle: &char) -> usize {
    let len = chars.len();

    for i in (start +1)..len {
        if chars[i] == *needle {
            return i - start;

    return 0;
fn bad_character_table(pattern: &str) -> Vec<Vec<usize>> {
    /* Go backwards through the string for each character and work out how many characters
     * away that character is (backwards) */
    let mut table =vec![vec![(0asusize); pattern.len()]; ASCII_LOWER.len()];

    /* String gets reversed here */
    let chars: Vec<char>= pattern.chars().rev().collect();
    let len = chars.len();

    /* Enumerate through the characters, back to front (idx here starts
     * at zero, but think of it as (len - 1) - idx */
    for (idx, _) in chars.iter().enumerate() {
        for (alphabet_index, c) in ASCII_LOWER.iter().enumerate() {
            /* How many characters from this one until we reach the end
             * or see this character again */
            let res = find_pending_character_index(&chars, idx, c);
            /* Since idx was zero-index, we invert it here, insert
             * number of steps backwards we have to go to find this
             * character in the alphabet */
            table[alphabet_index][(len - 1) - idx] = res;

    return table;

Okay, now for the good suffix table. Thankfully, it is much smaller than the bad character table, but it is a little more complicated to generate. Recall what the good suffix rule was doing:

ababa abcbb abcab abccb
         abcab <- (shift forward by 3 to align subpattern)
            abcab (shifted only by 3, not by 5)

So essentially we need to keep track of, for each suffix how many characters we can look backwards in the needle to find the same suffix again. Simple enough to say it, but how do we actually implement something like that?

Well, the best way to explain it is that for each suffix, we have a sliding window starting from one character before which is the same size of the suffix. We scan both the substring pointed to by the window and the suffix to see if there is a match, and if so, we take note of how many characters away our window was:

    b (suffix)
   _ (window)
 b (suffix appeared again, 4 characters away)

Same thing with the “ab” suffix…

   ab (suffix)
ab (suffix appeared again, 3 characters away)

So our overall good suffix table will look like this:

0 1 2 3 4
a b c a b
      3 4

What does that look like in implementation? Well, something like this:

fn good_suffix_table(pattern: &str) -> Vec<usize> {
    /* For each character in the string, take the subpattern from char[i]..char[len]
     * and see if such a subpattern exists elsewhere in the string, moving the window
     * back one by one. This is an O(N^2) operation over the pattern, since you have
     * to check the subpattern over each window from the current position. */
    let chars: Vec<char>= pattern.chars().rev().collect();
    let mut table =vec![(0asusize); chars.len()];

    for idx in 0..chars.len() {
        /* Add one here since the outer loop was exclusive, we do not want the inner
         * loop to be exclusive of the last index */
       for candidate in 1..((chars.len() +1) - idx) {
           /* Checking two empty subpatterns against each other doesn't make
            * much sense - we want a full shift in that case */
           if idx > 0 && matching_subpatterns(&chars, 0, candidate, idx) {
               table[(chars.len() - 1) - idx] = candidate;

    return table;

Putting it all together

So now that we went to all that effort to construct our good suffix and bad character tables, how do we make use of them in the overall algorithm? Well, it is remarkably simple:

fn boyer_moore(input: &str,
               pattern: &str,
               bad_character: &Vec<Vec<usize>>,
               good_suffix: &Vec<usize>) -> Option<usize> {
    /* Some obvious cases */
    let pattern_chars: Vec<char>= pattern.chars().collect();
    let input_chars: Vec<char>= input.chars().collect();

    if pattern_chars.len() > input_chars.len() {
        return None;

    /* Align at the first character */
    let mut alignment = 0;
    let max_alignment = input.len() - pattern.len();

    loop {
        let mut mismatch =false;

        for (pattern_index, i) in (alignment..(alignment + pattern_chars.len())).enumerate().rev() {
            if input_chars[i] != pattern_chars[pattern_index] {
               mismatch = true;

               /* If we're on the last alignment, return now,
                * as there are no matches */
               if alignment == (input.len() - pattern.len()) {
                  return None;

               /* Mismatch occurred here, look up this character/index pair
                * in the good suffix and bad character tables to see how far
                * forward to shift. If the shift amount is zero, shift
                * forward by the entire pattern size */
               let charidx = (input_chars[i] as usize) - ('a' as usize);
               /* Shift amount is the higher of either the bad character
                * rule or the good suffix rule */
               let shift = std::cmp::max(bad_character[charidx][pattern_index],

               if shift == 0 {
                   /* No shift found, shift as far as we can, either to
                    * the end or by the length of the needle */
                   alignment = std::cmp::min(max_alignment,
                                             alignment + pattern_chars.len());
               } else {
                   alignment = std::cmp::min(max_alignment, alignment + shift);


    /* No mismatches, return the current alignment */

    if !mismatch {
        return Some(alignment);

Essentially what we’re doing is starting at the beginning and matching from back to front. If we find a mismatch, we record which character we found a mismatch on and look up both the input character and the mismatch character index in the pattern in the good suffix and bad character tables. If those tell us how far to shift, we shift by that amount. Otherwise, we shift forward by the full pattern length.

And that, is how you get super fast linear time string matching!

A better way to think about Quicksort

Recently I’ve been learning haskell, a purely functional programming language. I still don’t really understand why I’m doing that; I think functional style is great and it makes a lot of sense to divide programs up into mostly atomic units, composable at the domain level. In a functional style, you generally try to put the majority of your code into imdepotent functions whose outputs are strictly defined by their inputs. The function itself is kind of a black box and checks the “functional style” checkbox as long as there’s no observable side effects. Imperative shell, functional core. I find that programs that use a functional style is a lot easier to reason about, to test and to re-use. Purely functional programming on the other hand always seemed like overkill to me, sacrificing comprehensibility for the sake of functional programming principles.

So what does this have to do with Quicksort? Well, it turns out that Chapter 5 of Learn You a Haskell has a very motivating example:

quicksort :: (Ord a) => [a] -> [a]  
quicksort [] = []  
quicksort (x:xs) =  
    let smallerSorted = quicksort [a | a <- xs, a <= x] 
        biggerSorted = quicksort [a | a <- xs, a > x] 
    in  smallerSorted ++ [x] ++ biggerSorted

Now I appreciate that haskell’s syntax is a little obtuse, so lets rephrase that in another more widely known language:

def quicksort(elements):
    if elements:
        pivot = elements[0]
        smaller = quicksort([e for e in elements[1:] if e =< pivot])
        larger = quicksort([e for e in elements[1:] if e > pivot])
        return smaller + [pivot] + larger

    return []

When I first saw this example it was like a lightbulb went off in my head. Quicksort finally made sense. It wasn’t this weird looking algorithm that just happened to work and have decent time complexity for some reason. See, if you ever took a Data Structures and Algorithms course at university, you were probably taught (in-place) Quicksort like this:

quicksort(elements, left, right) {
    if (left == right) {

    pivot = partition(elements, left, right)
    quicksort(elements, left, pivot)
    quicksort(elements, pivot, right)

Which probably lead to the first “WTF” – in my case – “WTF does a partition and a pivot have to do with sorting?”

partition(elements, left, right) {
    cmp = elements[left];
    pivot = left + 1;

    for (i = (left + 1); i < right; ++i) {
        if (elements[i] < cmp) {
            tmp = elements[pivot];
            elements[pivot] = elements[i];
            elements[i] = elements[pivot];


    tmp = elements[pivot];
    elements[pivot] = cmp;
    elements[left] = tmp;

    return pivot - 1;

At which point 95% of people in the class would think be thinking “WTF????????”

And then the prof will usually open their mouth and say something like “so first you pick a pivot” and then everyone in the room will immediately switch off and just pray that they can rote-learn their way through the upcoming test and forget about it.

Really, the functional way of looking at it makes far more sense.

But hang on, you say, “my textbook said that Quicksort was O(n log n) in the best case and O(n^2) in the worst case – why would you want to use it over merge sort when merge sort is guaranteed to be O(n log n) and both methods require concatenating lists together?”

That’s a good point. The purely functional style doesn’t allow for inner mutability of the array, so you can’t exactly sort in place. But it is a good way to understand what on earth is going on with the in-place sort. But to understand that, first you need to understand why the recurrent version of quicksort works and how it is related to merge sort.

If you’ve seen the recurrent version of merge sort, it looks sort of similar to the recurrent version of quicksort:

def mergesort(elements):
    if len(elements) <= 1:
        return elements;

    midway = floor(len(elements) / 2)
    left = mergesort(elements[0:midway])
    right = mergesort(elements[midway:1])

    merged = []

    leftIdx = 0
    rightIdx = 0
    while leftIdx < len(left) or rightIdx < len(right):
        if leftIdx < len(left) and left[leftIdx] < right[rightIdx]:
            leftIdx += 1
        else if rightIdx < len(right):
            rightIdx += 1

    return merged

An important property of comparison sorts is that lower time complexity bound is n log n. That is to say that in order to provide some sort of operation which gets the array into a “more sorted” state than it was before, you need to evaluate all the elements and at best you’ll probably have to do that operation about log n times. So what is the incremental thing that merge-sort is doing here? Well, it looks at two chunks of an array and once it is done with its O(n) operation, it guarantees the inner ordering of that chunk, so if you were looking at [1, 5] and [4, 7], the result would be [1,4,5,7]. But it doesn’t say anything about the ordering of the resultant chunk in relation to other chunks in the array, nor does it guarantee that that chunk contains all the elements in within the range 1-7 that also appear in the array. All that said, the more you keep comparing two separate chunks and guaranteeing their inner ordering, the more “sorted” the array becomes. Because we kept halving the array, it stands to reason that if you do this about log n times, you’ll eventually end up with a sorted array.

If you compare the two recurrent equations, you’ll realise that quicksort is the same thing but in reverse. Quicksort guarantees that the outer ordering between two chunks is maintained – each number in the right hand side will always be greater than each number in the left hand side. But it doesn’t say anything about the inner ordering of those two chunks. All that said, if you keep going and doing the same thing on each of the two chunks, you’ll eventually get to the point where there is no outer ordering to be maintained and thus each chunk has an inner ordering. Again, in the best case, you’ll probably be halving the array, so you can recursively maintain outer-ordering between two chunks in log n time (although in the worst case, you’ll be splitting the array into two parts – 1 and n – 1 elements, at which point you have to maintain outer-ordering n times).

Okay, so what does this have to do with the overly-complicated looking in-place partition function? Well, imagine there was some atomic function that could take an array and make it outer ordered, such that all the elements in one part were always greater than all the elements in some other part just by looking at each element in the list once. That’s what partition does. All the elements less than a certain value get swapped over to the left hand side and all the elements greater than it get swapped over to the right hand side. Then you just put the centrepiece back in place and you’re done.

I still think the functional style is a lot easier to understand though. Maybe purely functional programming isn’t so bad after all.

Vale Unity

Its been almost six years since Ubuntu shipped with Unity as the default desktop and compiz as the underlying compositor. For every release since then, a similar software stack has shipped on every subsequent release up until 17.04 . Those ten releases make up about half of the Ubuntu desktop’s lifespan and certainly more than half of the person-hours invested into the project, so today’s announcement to wind it down is a pretty significant moment.

I haven’t worked on Unity or Ubuntu for a very long time now, but the announcement today has filled me with a couple of emotions. On one hand its disappointing to know that your legacy and a project that you personally invested a lot into is going away. Time truly is the destroyer of all things. I know Mark and a lot of other people at Canonical must be feeling particularly gutted right now. On the other, maybe its a sigh of relief. Unity certainly hasn’t been the most conflict free or easy project and maybe with this difficult decision the people involved finally have the freedom to move on without feeling guilty.

The Team who Built Unity

I have nothing but good things to say about the Desktop Experience Team at Canonical – later the Product Strategy Team and the associated Ubuntu community. Our core engineering staff was a killer combo. We had two people, Jason and Neil, who had amazing experience with graphical toolkits and a passion for Desktop GL. We had Mirco, Gord and Jay who knew how to extract every last ounce of performance from a GPU. We had Tim and Alan who were a marvellous leaders and a champion of stability and software maturity. We had Alex, Thomi and Thomas who championed automated testing. We had Neil again because he literally filled far more shoes, so he deserves more than one mention. We had Michal and Mikkel –  who were a masters of desktop search. We had Didier and Seb, who could get things shipped, no matter what. We had Robert – who knew how to put the pieces together for groundbreaking web-integration. We Marco and Andrea, who took the lead in maintaining the desktop even when nobody ask them to. We had Daniel who was a leader in desktop quality. We had John, Otto and Rosie who were amazing designers. We had Jorge and Jono who were personable and enthusiastic community managers. There were countless others making invaluable contributions. And then we had me – and I’m just humbled to have crossed paths with all these people.

We were co-workers but we were also friends. Not only friends, but very close, lifelong friends. We often talked about non-work related stuff to each other and really went the extra mile to support each other. We were under a lot of pressure to make the dream a reality and when you gel well together, that pressure makes friendships even stronger. We didn’t see each other in person much, but when we did, it honestly made up some of the happiest times of my life.

The Fun Moments

We had a lot of great moments with each other building Unity. There was a time when we had to instruct Mark on how to pull and build a new build back when we didn’t have stable updates in the alpha release. Mark was on a cellular connection and had to log out of IRC in order to restart his session. I remember Jason saying “whelp, if he doesn’t come back in 1 minute, assume we are all fired”. That got a few laughs.

There was another time when the DX team and the Design team had an ongoing prank-war with each other. This started with the “flies in the ice-cream” prank and eventually culminated with a sardine pizza being sent to Jason from England. I still have no idea how that happened. Then there was also the bug report.

We slipped all sorts of easter eggs into Unity. One, which when launched with UNITY_NEKO=1, replaced all images in the dash with images of cats. Another which almost made it in filled the screen with fire when the konami code was pressed.

The Low Moments

Of course, Unity’s development had its bad times. The development was, understandably not well received by the established community members who saw the project as fragmentation. We faced numerous quality control problems when trying to get the desktop out on such a tight timeline. The stacking bugs kept me up day and night and required a lot of personal sacrifice to get to a state where they stopped occurring so frequently. Some of our users were randomly hateful and/or toxic. For instance, someone in a Reddit AMA told Jason that they hoped he would be “hit by a bus”. That’s not a very nice thing to say. In the later half of the 11.10 cycle, I began to feel overwhelmed with matters going on in both my personal life and the status of the project and entered a deep state of depression. There was additional pressure in the company as Unity 2D was developed and the projects inevitably, though perhaps not intentionally, were pitted against each other.

Then there were the post 12.10 staff departures. It was really sad to see the team break apart almost as quickly as it formed. I left during this time as well after most of my friends on the team had left.

Unity 8

Unity 8 was a total reboot of the project, learning from the failures of Unity 7. There were a lot of high hopes. Canonical had recently been on a recruitment drive and hired a lot of smart people. On paper, Unity 8 looked technically groundbreaking. Unity was going to move away from an ageing, unmaintainable and limited display server architecture to one which was pervasively multithreaded, could take better advantage of mobile and desktop hardware, had high test coverage and really top-notch developers working on it. Global package management was going away in favour of isolated and atomically upgradeable containers known as ‘snaps’. The OS was going to make mobile hardware a first class citizen. Touch was going to become the new interaction paradigm. Everything was going to become more internet connected and integrated than before. Apps were going to become smarter on power and resource usage. The project was moving to a toolkit which at the time had an incredible backing and momentum behind it. Legacy apps were going to be supported. “We have to go deeper” was the motto of the team.

The pieces were in motion. A very real demo of this was ready in 2013 for announcement to the world. The technology was so ground breaking that it was going to continue under wraps for a while until it could reach perfection. And then we’d be in free software utopia. Convergence would become a reality. Ubuntu would win, by sheer force of being better than anything that came before it. The plan, idea and resources available to execute on that plan were perfect.

And then it was delayed.

And then it was delayed again.

And then it was delayed for a third time.

And now its not shipping.

An outsider’s postmortem

I’d be lying if I said I didn’t really see this coming, but not for the reasons that other people might say. The strategic decisions made were necessary – X11 was seriously limiting our ability to deliver on features and stability, without atomic updates, its hard to give people a secure and stable phone – but they were ultimately fatal. I remember back when we decided to rewrite compiz and completely underestimated the scale and effort to make that work in a bug-free way. It killed the project. Once you rewrite one thing, its very difficult to resist the urge to start rewriting other things while you’re at it. Porting compiz to C++ wasn’t enough; we had to split rendering into plugins; we had to support the non-composting case; we had to support reparenting; we had to support a replacement for function pointer hooking; we had to drop multi-screen and we had to move to a more up to date rendering model. All at once. We had to understand code that we hadn’t written and code where the original authors were long gone from the project. And to make it look like progress was made, we had to bolt on more features on top of the existing rewritten ones with fewer development resources than the ones who had already burnt out and quit.

About this time last year Unity 8 started to look an awful lot like compiz++ on a larger scale. Not only was it going to use Qt, it was going to use an entirely new display server! Not only was it going to use an entirely new display server, it was going to use an entirely new driver model! Not only was it going to use an entirely new driver model, it was going to use Android as the base for mobile! Not only was it going to use Android and Debian, but Debian would go away and we’d have containerised applications! Not only was it going to use containerised applications, but the applications would be all-new and written in Qt! And not only all that, but its going to be perfect!

The delays just kept on happening. And after ever delay, so as to motivate the base and show that Unity was still relevant, more scope was added. More scope led to more delays! More delays led to more scope! And its not going to ship until its perfect!

I became worried around this time last year that it was never going to ship. The reality was that “perfect” became a moving target. And the thing about aiming for perfection is that you can never hit it until you’ve had your imperfect product in front of users for a few years. Ironically, Unity 7 is now seen as “perfect” for a lot of people, because they don’t remember the change from GNOME 2 to Unity. And now there’s going to be another transition in moving to GNOME 3.


Unity was all about turning a community made platform into a product. The goals were noble – we wanted to get this amazing bit of community goodness into the hands of so many other users. We wanted to preach the word of free software. We wanted to show people that they do have an alternative to the big four companies that control the software they use on a daily basis. We wanted to show people that free software could win! And the only way to do that was to turn it into a product that was so irresistible and incredible that you couldn’t not buy it.

And that’s where the problem was. The developers making it wanted a community but in order to get it to where it needed to be, it needed to become a product. There needed to be one direction instead of many. Linux was no longer about choice. It was about the masses. Unity had lost its soul.

Where its all going

For now, it looks like Canonical has folded on client to double down on server. That was a very difficult decision to make and I applaud the leadership for their courage and for their ability to recognise what was becoming sunk cost. Its sad for the developers, product people and community involved, but at the end of the day, you have to make money.

Free software on the desktop might enter a dark age soon. There aren’t very many companies in this space now.  Intel was out as of halfway through last year. Google is focused on Android and Chrome OS. Novell and Sun folded a long time ago. Nokia fell off a burning platform. IBM is nowhere to be found. And now Canonical is pivoting away too. Samsung’s Tizen is potentially an interesting player, but everyone knows its a plan B.  Red Hat and SUSE remain and so does Endless. An of course there is still Collabora, Codethink, Igalia and other free software contracting firms, though (edit) their participation depends on the re-use of our technology by larger players outside the desktop space. There’s room to grow, but there has certainly been a lot of disruptive change lately.

(edit 10/04/2017: This section got a little more traffic than I was expecting and I wrote it quite quickly. I’ve made some changes to better reflect the state of affairs based on feedback. I’ve replaced “their participation depends on the larger players” with “their participation depends on re-use of our technology by larger players outside the desktop space” and clarified that its not all doom and gloom! Just that there has been a lot of change lately and things aren’t pointing in the most golden direction right now)

A new hope

Lets not forget where the desktop free software revolution came from. It came from ordinary people, you and I, who wanted to make a difference. It came from people who didn’t want to accept the Microsoft and Apple duopoly on computing. Even if the resources start to dry up now, it doesn’t mean that free software is gone forever. Some of the best innovations came during the post-bubble period of 2000-2010 where the software world had become stagnant with Windows XP and IE6.

There will always be people who want to do something different. A community will form again. And once everyone has a clearer head, free software will rise again.


When you relied on us to grow your population and economy, your birthright to exclude the people they loved too ceased.

When you wanted to spread your culture and cultural norms beyond your border, your birthright to exclude the people who accepted the change and allowed that culture to spread ceased.

When you wanted to mine our land for minerals, your birthright to exclude the workers and their families ceased.

When you wanted to give your sons and daughters the opportunity to marry the person they loved, your birthright to exclude their mothers, fathers, family and friends ceased.

When you wanted to rely on foreign talents to build your intellectual capital and hoard knowledge your birthright to exclude those who would give you that knowledge ceased.

When you wanted to rely on us to do the jobs nobody else wanted to do because they paid terribly and had awful working conditions, your birthright to exclude the workers and their community ceased.

When you wanted to rely on us to sell you goods and services at prices which are grossly unfair to us, your birthright to exclude the workers who toiled for hours to make those goods for you ceased.

When you wanted to rely on us to teach you our language and culture so that you could make your military intelligence even better, your birthright to exclude the people who carried that language and culture through the ages ceased.

When you bombed us, forcing millions of us to fear for our lives and flee, when you created instability on our home town because of your own agenda, when we all had to walk for months and pack ourselves into over-full boats and buses not knowing where we were going or if we were even going to live to make another home for ourselves, your birthright to exclude us ceased.