Regression Testing for crashes with Google Test

One common use for automated testing is to put tests in place that reproduce bugs, so that once the bug is fixed and the failing test passes, the same kind of bug can’t sneak back in unnoticed.

There is a small caveat when doing this with bugs that result in segfaults, however: a test that fails by crashing will take the entire test binary down with it, meaning that you won’t see the results of any tests that run afterwards. It’s also difficult to see at a glance what it was in the test itself that crashed, without using a debugger of some sort.

Thankfully, Google Test provides a method to test for things-that-shouldn’t-crash: its death-test machinery forks the process and checks the exit code (and output) of the forked child.

So my new best practice for writing these sorts of tests is now:

#include <cstdlib>   // exit ()
#include <iostream>  // std::cerr
#include <memory>    // std::unique_ptr

#include <gtest/gtest.h>

using ::testing::ExitedWithCode;

TEST(FooDeathTest, FooPtrFromBarIsSafe)
{
    std::unique_ptr <Foo> foo (from_bar ());

    EXPECT_EXIT({
        foo->baz = 1;
        std::cerr << "Success!" << std::endl;
        exit (0);
    }, ExitedWithCode (0), "Success");
}

Instead of:

TEST(Foo, FooPtrFromBarIsSafe)
{
    std::unique_ptr <Foo> foo (from_bar ());
    foo->baz = 1; // won't crash if foo is valid
}

There are three benefits to doing it this way:

  1. Whoever is reading the test source can instantly see that the thing you’re testing is that some operation succeeds and exits cleanly.

This is especially the case once EXPECT_EXIT ({ }, ExitedWithCode (0), "") becomes somewhat of an idiom.
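If that idiom comes up a lot, you can capture it in a small helper. This is just a sketch of mine – the EXPECT_NO_CRASH name is hypothetical and not part of Google Test:

// Hypothetical wrapper around the EXPECT_EXIT idiom above – not part
// of Google Test. Top-level commas in the statement would confuse the
// preprocessor, so keep the body simple.
#define EXPECT_NO_CRASH(statement) \
    EXPECT_EXIT({ statement; exit (0); }, \
                ::testing::ExitedWithCode (0), "")

A passing test body then reads as a single line: EXPECT_NO_CRASH (foo->baz = 1);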

  2. You can test the inverse – that is, you can test that something will crash – in the same style:

TEST(FooDeathTest, FooPtrFromBarIsNotSafe)
{
    std::unique_ptr <Foo> foo (from_bar ());

    EXPECT_DEATH({
        foo->baz = 1;
        exit (0); // if the write doesn't crash, this clean exit makes the death test fail
    }, "");
}

  3. You get a much more informative error message at first glance:

[RUN    ] FooDeathTest.FooPtrFromBarIsSafe
Failed expectation (lineno):
    Expected that the block:
    {
        foo->baz = 1;
        std::cerr << "Success!" << std::endl;
        exit (0);
    }
    Would exit with the message "Success" and an exit code of 0;
Actual:
    Exited with no message and an exit code of 9
[    FAIL] FooDeathTest.FooPtrFromBarIsSafe
(other tests run here)

As opposed to:

[RUN    ] Foo.FooPtrFromBarIsSafe
Segmentation Fault
(no other tests run here)

Impostor

I’ve been getting a bit of impostor syndrome lately, something which I feel is holding me back a fair bit. I’m not sure what it is, but it seems like every time I try to act on a moment of inspiration, I start to second-guess myself. Is what I’m doing a useful endeavor? Maybe someone else has already done this? Am I good enough to write this yet?

Of course, it’s terribly annoying – I’ve been left with a bunch of half-finished projects for a broader initiative, which seems to create a nasty feedback loop confirming the sense of unworthiness.

Perhaps this is the wrong place to ask this – but does anybody have a good idea on how to make these thoughts go away? I’d much rather be presenting cool things in this space than saying that I’ve talked myself out of yet another idea.

Moving forward

In the past few months I’ve been involved in a bit of a public disagreement about my relationship with the Ubuntu project, namely in terms of some work I was doing on Compiz and Unity from December to February.

While I believed that my statements represented the truth at the time, in hindsight I believe they were more or less a product of my frustration. I want to publicly apologize for that. Now that I’ve had a chat with a few of the people involved, I’d like to share a clarified timeline and sequence of events to set the record straight, and I also want to share the plan that we’ve come up with to move forward.

I blogged in December that I had come up with a set of changes for both Compiz and Unity to take advantage of the new GLX_EXT_buffer_age extension. Importantly, these changes were substantial, requiring changes across three key areas of the stack and testing on several different hardware and driver combinations.

I had posted a set of changes up for review around that time, but the review process didn’t happen for a number of largely unrelated reasons. First of all, it was around the time of the December holidays, so people were just starting to take breaks. Second of all, there were a few staff movements within Canonical which meant that the number of reviewers available for that work was lower than it normally would have been.

In late February the decision was taken by the team that manages the Ubuntu distribution to branch Compiz for raring and only accept minor changes. I was consulted on that decision and agreed to it. I’ve been told that the rest of the team at Canonical working on Unity was also informed of this. I was unhappy at the time that no review of my changes had occurred in that two-month period, but I also accepted that we were close to feature freeze and it wouldn’t make sense to land any large changes then. At this point, I decided to stop working on those changes, with a view to picking them up again for 13.10.

Around the same time, a discussion about a rolling release took place on the public mailing list and at vUDS.

I posted in various comments that I was disappointed with the lack of review over the past two months, attributing it to the confusion caused by the rolling release proposal and an apparent preference for a closed development model. These comments were mistaken. The real cause appears simply to be that there was a staff shortage, and that not enough was done by me to draw the staff’s attention to these proposals.

The rolling release proposal said that this model should be adopted “now”, and many people on the list agreed. Adopting the “rolling release” for 13.04 would mean two things:

  1. 13.04 would become the “rolling release” and would not be “released” at its usual release point.
  2. With no release of 13.04, there would also be no feature freeze.

Whilst we were operating under that assumption and whilst that discussion was ongoing, I was told that my changes were likely to be merged in at some point in the near future. As such, the review process started again in spite of the previous agreement to keep large changes out of raring. This was because both the reviewers and myself believed that these changes were now appropriate for the release. My main error here was failing to contact the distribution team to check if there had been a misunderstanding.

In the meantime, we faced a number of high and critical regressions stemming from a small fix for Java and a workaround for a hang in VirtualBox. This didn’t do a whole lot for anybody’s confidence. It just so happened that the regressions we kept having were in parts of the code that were difficult to get any kind of automated testing around.

The review process ended again a week or two later, because the release team was unhappy with taking those changes for raring: they would present a risk by breaking feature freeze for the 13.04 release, which was now happening. I was disheartened that this was the case and said so on the merge review.

To be honest, I don’t think my behavior here was very constructive. The better approach would have been to engage properly with the distribution team to devise proper test plans and QA strategies, and to clarify with everyone what the true status of all pending work was. That being said, things finally came to a head today. The distribution team and I have had a good long talk about everything, and we’ve come up with a plan to move forward and past this mess. The conflict and war of words on my part were entirely unnecessary to reach this resolution.

I want to make it clear that these difficulties were the fault of nobody. Nobody was negligent, nobody was inconsiderate (except perhaps myself) and nobody was acting in bad faith. These difficulties were the product of circumstance, and simply didn’t appear that way to me at the time.

It is anticipated that the relevant changes will be slated for review and testing for 13.10 at the beginning of the cycle. The test plan is going to include the usual test suite, plus ensuring that we don’t have any Autopilot failures, as well as some extensive user testing to try and uncover any subtle bugs that may have been missed while this work was being QA’d.

Right now, I encourage everyone to test the PPA that Trevino has put up, which contains the most up-to-date version of these changes synced with trunk for nux, unity and compiz. Test it on both nvidia and non-nvidia hardware, since the changes affect both. File (tagged) bugs and I’ll get on to them when I can.

Let’s make 13.10 rock!

Integrating vera++ with CMake

I’ve been looking into making automated code style checks part of a broader QA plan for a project I’ve been working on, while it’s still relatively young. Astyle seems to be good at handling lots of different languages, but it doesn’t have any mode that errors out when it sees something in the code that it thinks is wrong. I like the idea of automated formatting, but for complicated expressions and macros, automated formatters can sometimes get things wrong. I wanted something that required manual intervention, but could still be run on an automated basis, and that was designed specifically for C++.

I noticed that Mir was using vera++ to do something like this, but there wasn’t any kind of automated build rule for running these checks on a continuous basis. I wanted to ensure that we ran checks on the files that were actually built, so that style failures became actionable errors when the target was built. I also wanted to ensure that we didn’t run style checks on things like autogenerated or imported code, which would be difficult to exclude using a find rule.

Finally, I realized that having these checks integrated with the buildsystem could be useful for lots of other projects too, so I wanted to make it something separate from what I was working on already.

As such, I’ve just posted the first bits of veracpp-cmake on my GitHub profile. veracpp-cmake is basically just a couple of CMake modules to integrate vera++ into your build. First of all, it provides a FindVeraPP module so that you can use vera++ with find_package. It also provides some useful commands.

verapp_import_default_rules_into_subdirectory_on_target
verapp_import_default_transformations_into_subdirectory_on_target
verapp_import_default_profiles_into_subdirectory_on_target

These import the default rules, transformations and profiles installed with vera++ into a project subdirectory (usually in the build dir) just before some target is executed. This means that you don’t have to fork the default vera++ rules and can instead import them dynamically from the user’s installation. All three commands are just wrappers around

verapp_copy_files_in_dir_to_subdir_on_target

which provides a rule to copy files from one subdirectory to another just before a target is executed. Using this rule, you can copy your own rules, profiles or transformations into the same subdirectory as the default ones at build time, and then use both depending on which profile is set. Vera++ requires that all the rules and so on live in the same directory tree (e.g. scripts/rules/ and scripts/transformations/, alongside profiles/).

verapp_profile_check_source_files_conformance

This function does a lot of the grunt work of checking source files for compliance. You just provide it with a target, the path to the vera++ scripts, a profile, and whether style-check failures should be fatal or nonfatal errors. Then, just before your target is linked, its source files will be scanned for style errors. That way, it integrates nicely with IDEs that parse the output of make.

Hopefully this will be useful to other projects. You can import it as a git submodule, adjust your CMAKE_MODULE_PATH, and use it right away. The only caveat is that you need to both add it as a subdirectory with add_subdirectory () and include it with include (), because we need to build a small tool to set the exit status correctly when vera++ exits. I’ve filed a bug about this and hopefully I can get rid of this requirement soon.
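To give a feel for how the pieces fit together, here is a minimal sketch of a CMakeLists.txt. Only the command names come from veracpp-cmake itself – the included module name, argument names and argument order are my assumptions based on the descriptions above, so check the README rather than copying this verbatim:

cmake_minimum_required (VERSION 2.8)
project (MyProject)

# veracpp-cmake imported as a git submodule. As noted above, it currently
# needs to be both added as a subdirectory (to build a small helper tool)
# and included.
list (APPEND CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/veracpp-cmake)
add_subdirectory (veracpp-cmake)
include (VeraPPUtilities) # hypothetical module name
find_package (VeraPP REQUIRED)

add_executable (my_program src/main.cpp)

# Copy the stock vera++ rules and profiles into one directory tree under
# the build dir just before my_program is built (signatures assumed).
set (VERA_DIR ${CMAKE_BINARY_DIR}/vera)
verapp_import_default_rules_into_subdirectory_on_target (
    ${VERA_DIR}/scripts/rules my_program)
verapp_import_default_profiles_into_subdirectory_on_target (
    ${VERA_DIR}/profiles my_program)

# Scan my_program's sources against the default profile just before it is
# linked, treating style failures as fatal errors (argument order assumed).
verapp_profile_check_source_files_conformance (
    ${VERA_DIR} default my_program ERROR)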

On Publishing Private Correspondence

There’s been a drama fairly close to home (which I won’t identify) related to the graphics world, and a lot of it has devolved into posting logs of correspondence in public forums.

Initially, it started with posting snippets of public IRC channel logs in public fora such as mailing lists, news forums and publicly accessible webpages. This irked me a little: while I think that the people posting that data were making an important point and needed evidence to do so, publishing chat logs is frowned upon by the Freenode channel guidelines. Those guidelines read:

If you’re considering publishing channel logs, think it through. The freenode network is an interactive environment. Even on public channels, most users don’t weigh their comments with the idea that they’ll be enshrined in perpetuity. For that reason, few participants publish logs.

If you just want to publish a single conversation, be careful to get permission from each participant. Provide as much context as you can. Avoid the temptation to publish or distribute logs without permission in order to portray someone in a bad light. The reputation you save will most likely be your own.

There isn’t anything legally wrong with publishing information obtained from a public forum, just as there isn’t anything legally wrong with recording or photographing public places and publishing the result. However, it’s an ethically dubious area. Freenode is not logged by default, and the channel guidelines recommend that where channels are logged, participants should be informed of this fact:

If you’re publishing logs on an ongoing basis, your channel topic should reflect that fact.

The difference between a public forum that is archived and one that is not is subtle but important. In the former, it is assumed that participants are interacting in a way that makes their interaction publicly accessible even to those not in the forum. In the latter, the assumption only goes as far as making the interaction accessible to those present. In the two kinds of fora, people are comfortable communicating different kinds of things. Publishing logs after the fact breaks the assumption in the latter case.

I think the situation got worse once members took to forums to publish logs of private IRC conversations, not held in any public channel. To me, this was certainly crossing a line. While I understand that the participants publishing the logs were again making a perfectly valid point, the point in question had already been made and could have been made with hearsay evidence alone. I should have spoken up about it earlier, but kept quiet, partially because I’m guilty of doing this myself (though only in a private context).

What really motivated me to write this post is that a prominent journalist within the community published a series of private emails relating to this ongoing drama. While the journalist in question was exposing a scheme which I believe is unethical, they really should have known better than to publish that private correspondence. Both participants in that conversation had an expectation that the correspondence was private and off the record. The journalist broke that expectation.

Publishing private and off-the-record information is sometimes justifiable. This might be the case where publishing the information would serve a great public benefit, raising awareness of a situation in which the public has been made vulnerable by another person’s actions. What distinguishes this case from those, however, is that no actions had been taken to put the public at a special disadvantage. The public didn’t have a right to know, because knowing would not have put them in a better position.

To those who are thinking of posting any more private correspondence in public, just don’t. Don’t do it. The point that you wish to make can be made in so many other ways, and it can easily be made by other reputable sources backing up your claim. In addition, by posting this information in public, especially where the information was conveyed with a reasonable expectation of confidentiality, you are exposing yourself to liability in Tort or Equity depending on your jurisdiction. That means that a person can bring an action against you for publishing this information, and in some cases, can be awarded monetary damages.

To the journalist who posted private correspondence in public – Good luck getting anybody to talk to you off the record again. Your credibility is damaged, and it will take a substantial amount of justification for it to be repaired.

Thinking about Code Review in Free Software

Code review can be a bit of a recipe for drama. There was a largish amount of drama in a project close to me quite recently that stemmed from patch review, and it got me thinking about how we handle this in free software.

In free software, code review, along with other practices that we call “agile practices” (such as continuous integration, unit testing, behavior-driven design and test-driven development), is a relatively new thing in some projects, especially those in the desktop stack.

Code review tends to sit at an intersection which might otherwise be called an “HR nightmare”. There are lots of competing factors which can make things very dramatic.

  1. The person submitting the patch has invested some time and effort into it.
  2. The reviewer’s responsibility is mediating between the change proposed by the patch and the quality of the codebase as a whole.
  3. People disagree on the best way to do things, and sometimes these differences are fundamental and irreconcilable.
  4. People have differing views on what the purpose of review is, and what priorities should be given to reviews.
  5. Reviews are often seen as a “chore” and a necessary part of project management.

Beck and Meszaros have used the terms “Code Smells” and “Test Smells” to describe what’s wrong with the non-agile way of doing both of those things. Perhaps it’s time we coined the term “Review Smells” for looking at how we can do review better? Though maybe not. Maybe it would be better to look at what makes for a good review, and how we as a community can do review better.

Reviews aren’t for rejecting what’s bad, but growing what’s good

In most (legacy) projects, code review generally starts because a manager heard about this new-fangled thing called “agile development”, and how code review along with other agile practices would do amazing things like reduce technical debt and improve quality, allowing engineers to be more efficient, which in turn means that the organization can cut ballooning costs without adding resources. Managers say “we’ve had enough of this, we’re not allowing any more crap in the codebase, so we’re introducing code review”.

While parts of this are certainly true, it’s not really the right way to start doing reviews. If you want to stop crappy code from going into the codebase, then you don’t hire engineers who don’t know what they’re doing. Of course, in free software this isn’t really an option.

Generally I live my life by the mantra “people don’t think they’re evil”. If someone proposes a change to a project, they generally think they’re trying to make it better. And generally speaking – they are, whether or not that be adding a new spec’d out feature, or fixing a bug or cleaning up some of the code.

This, of course, doesn’t mean that you just accept every change because all changes are amazing. The point is that no change is perfect, and the job of the reviewers is to mentor the person proposing the change so they can make it the best it can possibly be. Good reviewers ask questions and provide suggestions on:

  1. How can we make this change, and prevent regressions in other areas?
  2. How can we ensure this change is well tested, so that it can’t accidentally be stomped on in future?
  3. How can we make the code even clearer to everyone who uses the project than it is now?
  4. How can we make this code even faster than it is now?

Those are the kinds of questions that promote healthy discussion and help both the reviewee and the reviewer learn new things in the process. It’s often the case that in reviews like this, both parties come up with a solution that is even better than either one of them could have produced alone. It’s an environment that promotes collaboration and rewards both the reviewer and the reviewee.

It also means that the quality of your codebase will improve more than if the policy is simply to reject things that don’t meet the standards. A policy of saying “no” to anything you don’t like, without providing mentorship, might mean that bugs never get fixed or that specs never get completed, because nobody wants to go through that process only to run the very high risk of being turned down again.

Keep code reviews about code, and RFCs for specs

I’ve seen it many times before: someone proposes a patch to change the behavior of a system, and the patch gets rejected because the system wasn’t meant to behave that way in the first place. That’s a fair call for the maintainers – the scope of the software needs to remain determinate, as does its specified behavior.

The best thing to do in this case is document exactly how your system is supposed to work, even for the bits that haven’t been completed yet.

Then review becomes a two-step process – first, contributors propose an RFC to change the proposed behavior, get that added to the specification, and then they propose the code to make that specification a reality.

The result: no wasted time writing patches that get turned down because of an unwanted change in behavior, and clearer expectations for everyone involved.

Use a centralized review system

Many free software projects use the model of “patches on a mailing list”. This works for small-scale projects and small-scale patches with a small number of maintainers, because the patches just flow in with the rest of their email. It gets really out of hand for large projects. Here are some of the problems with using email to manage patches:

  1. The email filibuster can kill pretty much anything: the huge problem with mailing lists is that they invite endless discussion, and email is not very good at keeping context. Stuff can be discussed endlessly, and it’s often not about the code.
  2. Keeping track of multiple patches is a pain: email doesn’t give you a centralized list of unmerged patches. It’s just all over the place in your inbox. Better hope that someone tagged it with [PATCH].
  3. Making changes to patches is a pain, and slow: if you want to make a change to a patch on a mailing list, you have to rebase all of your patches in your VCS checkout, undo a bunch of commits and redo them. Then you have to mail the new patches to the list and go through the review process all over again, with all of the original context lost in history. Granted, tools like quilt make the first part of this a little easier, but not the second part.

There are so many tools out there for keeping track of patches and reviews nowadays. There are project hosts like GitHub and Launchpad, which provide integrated code review based on the merge model, and there are tools you can host yourself like Patchwork, Review Board and Gerrit, and, if you don’t mind paying, proprietary tools like Crucible from Atlassian.

All these tools take the pain out of patch-management. The developer just hacks away on their own clone of the repo in their own branch, pushes stuff to that branch and then when ready, proposes a “merge” of that branch into mainline. Most tools allow you to make comments directly on specific parts of the code, and automatically update diffs as soon as new changes are made.

Automate!

There is so much about patch review that is totally boring. Nobody likes hand-inspecting a piece of code to make sure it fits all the formatting and style conventions, that it has adequate test coverage, and that it doesn’t have any bugs that could be found by static analysis.

The good news is that most of this stuff can be automated. At Canonical we ran the same continuous-integration job on every active merge proposal which, at least in the form that I worked with it, checked that the branch in its current state:

  1. Merges cleanly
  2. Builds
  3. Passes all tests
  4. Installs correctly
  5. Passes any runtime tests

You can do so much more with continuous integration too. You can check that the code matches the style conventions (StyleCop, astyle). You can do some rudimentary static analysis with clang’s scan-build tool. You can check that all the various #ifdef combinations build correctly. You can check for performance regressions with some standardized performance tests. In code review, more statistics about how a change affects the codebase are king: they inform reviews rather than leaving reviewers to guess at how changes might affect the product. That being said, metrics shouldn’t drive review, but rather inform it. A goal of review should be to understand why the metrics say what they say and whether that’s important, and then use that to determine where to go next with the patch.

Apply the “better-in-than-out” principle

The thing about perfect is that it’s impossible. The question for any good review is: “would we rather ship the proposed change in its current state tomorrow than ship what trunk is today?”. If so, then know where to put the boundaries of the review’s scope and call it a day. All software is a work in progress. If you know where to go after the patch is merged, then there’s no sense delaying it any longer when it could be serving a good purpose now.

Set a review deadline

One of the things that can absolutely kill patch contributions and leave lots of patches lying around everywhere is a review that goes on for a seemingly endless period of time. Especially in free software, people get frustrated, people get distracted, and then they move on, leaving the maintainer wondering what to do with a patch that still had some follow-up left.

Review deadlines really help here. Evaluate the complexity of the change, think about what it affects and what work needs to be done in the first instance, and then set a deadline based on that. Have both parties engage with the review process until that date, then apply the better-in-than-out principle, or, if new circumstances arise, renegotiate the deadline. Having clear expectations about where a patch is going and how long a review will take removes a real source of demotivation amongst contributors.

Make it mandatory for everyone

There’s nothing worse than a project where some people are treated as better than others. Make code review mandatory for everyone, including the person who has been working on the project for ten years. Not only will veterans learn new things from fresh blood in the project, but it also instills a sense of responsibility and equality in new contributors, because they feel as though everyone is on an equal footing and not at some prejudicial disadvantage by virtue of being new.

This isn’t an exhaustive list of all the things that will make code review a rewarding process as opposed to a dramatic one, but it certainly makes up some of the major factors that I’ve found in code review processes that are functional as opposed to dysfunctional.