Integrating vera++ with CMake

I’ve been looking into making automated code style checks part of a broader QA plan for a project I’ve been working on while it’s still relatively young. Astyle seems to be good at handling lots of different languages, but it doesn’t have a mode to error out when it sees something in the code that it thinks is wrong. I like the idea of automated formatting, but automated formatters can get complicated expressions and macros wrong. I wanted something that required manual intervention but could still be run on an automated basis, and that was designed specifically for C++.

I noticed that Mir was using vera++ to do something like this, but there wasn’t any kind of automated build rule for running these checks on a continuous basis. I wanted to ensure that we ran the checks on the files that were actually built, so that style failures surfaced as actionable errors when the target was built. I also wanted to ensure that we didn’t run style checks on things like autogenerated or imported code, which would be difficult to exclude using a find rule.

Finally, I realized that having these checks integrated with the buildsystem could be useful for lots of other projects too, so I wanted to make it something separate from what I was working on already.

As such, I’ve just posted the first bits of veracpp-cmake on my GitHub profile. veracpp-cmake is basically just two CMake modules to integrate vera++ into your build. First of all, it provides a FindVeraPP module so that you can use it with find_package. It also provides some useful commands:

verapp_import_default_rules_into_subdirectory_on_target
verapp_import_default_transformations_into_subdirectory_on_target
verapp_import_default_profiles_into_subdirectory_on_target

Will import the default rules, transformations and profiles installed with vera++ into a project subdirectory (usually in the build dir) just before some target is executed. This means that you don’t have to fork the default vera++ rules and can instead import them dynamically from the user’s installation. All three of these commands are just wrappers around

verapp_copy_files_in_dir_to_subdir_on_target

Which provides a rule to copy files from one directory to another just before a target is executed. Using this rule, you can copy your own rules, profiles or transformations into the same subdirectory as the default ones at build time and then use either, depending on which profile is set. Vera++ requires that all the rules, transformations and profiles live in the same directory tree (e.g. scripts/rules/, scripts/transformations/ and profiles/ under a common root).
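To make that concrete, here’s a rough sketch of how I picture this looking in a CMakeLists.txt. The argument order and names below are assumptions of mine rather than the documented signatures, so check the module documentation before copying any of it:

    # Hypothetical sketch only: argument order and names are assumptions,
    # not the documented signatures of these commands.
    find_package (VeraPP REQUIRED)

    set (PROJECT_VERAPP_DIR ${CMAKE_CURRENT_BINARY_DIR}/vera)

    # Pull the default rules shipped with the user's vera++ installation
    # into our build tree just before my_target is built.
    verapp_import_default_rules_into_subdirectory_on_target (${PROJECT_VERAPP_DIR}/scripts/rules
                                                             my_target)

    # Copy our own project-specific rules into the same directory, so that
    # profiles can refer to both the default and the custom rules.
    verapp_copy_files_in_dir_to_subdir_on_target (${CMAKE_CURRENT_SOURCE_DIR}/vera/rules
                                                  ${PROJECT_VERAPP_DIR}/scripts/rules
                                                  my_target)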

verapp_profile_check_source_files_conformance

This function does a lot of the grunt work of checking source files for conformance. You just provide it with a target, the path to the vera++ scripts, a profile, and whether style-check failures should be treated as fatal or non-fatal errors. Then, just before your target is linked, its source files will be scanned for style errors. That way, it integrates nicely with IDEs that parse the output of make.
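Based on that description, a call might look something like the following. The parameter order, the profile name and the ERROR keyword are all assumptions on my part, so treat this as a sketch rather than the real API:

    # mylib and MYLIB_SOURCES stand in for your real target and sources.
    add_library (mylib ${MYLIB_SOURCES})

    # Hypothetical invocation: check mylib's source files against the
    # "default" profile just before mylib is linked, treating any
    # style-check failure as a fatal error.
    verapp_profile_check_source_files_conformance (${CMAKE_CURRENT_BINARY_DIR}/vera
                                                   default
                                                   mylib
                                                   ERROR)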

Hopefully this will be useful to other projects. You can import it as a git submodule, adjust your CMAKE_MODULE_PATH and use it right away. The only caveat is that you need to both add it as a subdirectory with add_subdirectory () and include it with include (), because we need to build a small tool to set the exit status correctly when vera++ exits. I’ve filed a bug about this, and hopefully I can get rid of the requirement soon.
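Pulling it into another project might then look roughly like this; the submodule path and the included module name are just examples of mine, not necessarily what the repository actually uses:

    # After something like:
    #   git submodule add <veracpp-cmake URL> cmake/veracpp-cmake
    # (the submodule path here is only an example)
    set (CMAKE_MODULE_PATH
         ${CMAKE_MODULE_PATH}
         ${CMAKE_CURRENT_SOURCE_DIR}/cmake/veracpp-cmake)

    # Both steps are currently needed: add_subdirectory () builds the small
    # exit-status tool, while include () makes the macros available.
    add_subdirectory (${CMAKE_CURRENT_SOURCE_DIR}/cmake/veracpp-cmake)
    include (VeraPPUtilities)   # module name is a guess; see the README

    find_package (VeraPP REQUIRED)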

On Publishing Private Correspondence

There’s been some recent drama (which I won’t identify) in the graphics world, and a lot of it has devolved into posting logs of correspondence in public forums.

Initially, it started with people posting snippets of logs from public IRC channels in public fora such as mailing lists, news forums and publicly accessible webpages. This irked me a little, because while I think that the people posting that material were making an important point and needed evidence to do so, publishing chat logs is frowned upon by the Freenode channel guidelines. Those guidelines read:

If you’re considering publishing channel logs, think it through. The freenode network is an interactive environment. Even on public channels, most users don’t weigh their comments with the idea that they’ll be enshrined in perpetuity. For that reason, few participants publish logs.

If you just want to publish a single conversation, be careful to get permission from each participant. Provide as much context as you can. Avoid the temptation to publish or distribute logs without permission in order to portray someone in a bad light. The reputation you save will most likely be your own.

There isn’t anything legally wrong with publishing information obtained from a public forum, just as there isn’t anything legally wrong with recording or photographing public places and publishing the results. However, it’s an ethically dubious area. Freenode channels are not logged by default, and the channel guidelines recommend that where channels are logged, participants should be informed of that fact:

If you’re publishing logs on an ongoing basis, your channel topic should reflect that fact.

The difference between a public forum that is archived and one that is not is subtle but important. In the former, participants assume that their interactions are publicly accessible even to those not present in the forum. In the latter, the assumption extends only to those who are present. People are comfortable communicating different kinds of things in each. Publishing logs after the fact breaks the assumption in the latter case.

I think the situation got worse once people took to forums to publish logs of private IRC conversations that didn’t happen in any public channel. To me, this was certainly crossing a line. While I understand that the people publishing the logs were again making a perfectly valid point, that point had already been made and could have been made with hearsay evidence alone. I should have spoken up about it earlier, but I kept quiet, partially because I’m guilty of doing this myself (though only in a private context).

What really motivated me to write this post is that a prominent journalist within the community published a series of private emails relating to this ongoing drama. While the journalist in question was illustrating the procurement of a scheme which I believe is unethical, they really should have known better than to publish that private correspondence. Both participants in that conversation had an expectation that it was private and off the record. The journalist broke that expectation.

Publishing private and off-the-record information is sometimes justifiable. This might be the case where publishing it would serve a great public benefit and raise awareness of a situation in which the public is made vulnerable by another person’s actions. What distinguishes this case from those, however, is that no action had been taken that put the public at a special disadvantage. The public didn’t have a right to know, because knowing would not have put them in a better position.

To those who are thinking of posting any more private correspondence in public: just don’t. Don’t do it. The point that you wish to make can be made in so many other ways, and it can easily be made by having other reputable sources back up your claim. In addition, by posting this information in public, especially where it was conveyed with a reasonable expectation of confidentiality, you are exposing yourself to liability in tort or equity, depending on your jurisdiction. That means a person can bring an action against you for publishing the information and, in some cases, be awarded monetary damages.

To the journalist who posted private correspondence in public: good luck getting anybody to talk to you off the record again. Your credibility is damaged, and it will take a substantial effort to repair it.

Thinking about Code Review in Free Software

Code review can be a bit of a recipe for drama. There was a fair amount of drama in a project close to me quite recently that stemmed from patch review, and it got me thinking about how we handle this in free software.

In free software, code review, along with other practices that we call “agile practices” (such as continuous integration, unit testing, behavior-driven design and test-driven development), is a relatively new thing in some projects, especially those in the desktop stack.

Code review tends to sit at an intersection of concerns that might otherwise be called an “HR nightmare”. There are lots of competing factors which can make things very dramatic.

  1. The person submitting the patch has invested some time and effort into it.
  2. The reviewer’s responsibility is to mediate between the change proposed by the patch and the quality of the codebase as a whole.
  3. People disagree on the best way to do things, and sometimes these differences are fundamental and irreconcilable.
  4. People have differing views on what the purpose of review is, and what priorities should be given to reviews.
  5. Reviews are often seen as a “chore” and a necessary part of project management.

Beck and Meszaros have used the terms “Code Smells” and “Test Smells” to describe what’s wrong with the non-agile way of doing both of those things; perhaps it’s time we coined the term “Review Smells” for looking at how we can do review better? Though maybe not. Maybe it would be better to look at what makes for a good review, and at how we as a community can do review better.

Reviews aren’t for rejecting what’s bad, but for growing what’s good

In most (legacy) projects, code review generally starts because a manager heard about this new-fangled thing called “agile development” and how code review, along with other agile practices, would do amazing things like reduce technical debt and improve quality, allowing engineers to be even more efficient, which in turn means that your organization can cut ballooning costs without adding resources. Managers say “we’ve had enough of this, we’re not allowing any more crap into the codebase, so we’re introducing code review”.

While parts of this are certainly true, it’s not really the right way to start doing reviews. If you want to stop crappy code from going into the codebase, then don’t hire engineers who don’t know what they’re doing. Of course, in free software this isn’t really an option.

Generally I live my life by the mantra “people don’t think they’re evil”. If someone proposes a change to a project, they generally think they’re making it better. And generally speaking, they are, whether that’s adding a newly spec’d-out feature, fixing a bug or cleaning up some of the code.

This, of course, doesn’t mean that you just accept the change because all changes are amazing. The point is that no change is perfect, and the job of the reviewer is to mentor the person proposing the change so that they can make it the best it can possibly be. Good reviewers ask questions and provide suggestions on:

  1. How can we make this change, and prevent regressions in other areas?
  2. How can we ensure this change is well tested, so that it can’t accidentally be stomped on in future?
  3. How can we make the code even clearer to everyone who uses the project than it is now?
  4. How can we make this code even faster than it is now?

Those are the kinds of questions that promote healthy discussion and help both the reviewee and the reviewer learn new things in the process. It’s often the case that in reviews like this, both parties come up with a solution that is better than either of them could have produced alone. It’s an environment that promotes collaboration and rewards both the reviewer and the reviewee.

It also means that the quality of your codebase will improve more than it would if the policy were simply to reject things that don’t meet the standards. Having a policy of saying “no” to anything you don’t like, without providing mentorship, might mean that bugs never get fixed or that specs never get completed, because nobody wants to go through that process only to run the very high risk of being turned down again.

Keep code reviews about code, and RFCs for specs

I’ve seen it many times before: someone proposes a patch to change the behavior of a system, and the patch gets rejected because the system wasn’t meant to behave that way in the first place. That’s a fair call for the maintainers; the scope of the software needs to remain determinate, as does its specified behavior.

The best thing to do in this case is document exactly how your system is supposed to work, even for the bits that haven’t been completed yet.

Then review becomes a two-step process: first, contributors propose an RFC to change the specified behavior and get it added to the specification, and then they propose the code to make that specification a reality.

No wasted time writing patches that get turned down because of an unwanted change in behavior. Clearer expectations for everyone involved.

Use a centralized review system

Many free software projects use the model of “patches on a mailing list”. This works for small-scale projects and small-scale patches with a small number of maintainers, because the patches just flow in with the rest of their email. It gets really out of hand for large projects. Here are some of the problems with using email to manage patches:

  1. The email filibuster can kill pretty much anything: The huge problem with mailing lists is that they invite endless discussion, and email is not very good at keeping context. Things can be discussed endlessly, and it’s often not about the code.
  2. Keeping track of multiple patches is a pain: Email doesn’t give you a centralized list of unmerged patches. They’re just scattered all over your inbox. Better hope that someone tagged the thread with [PATCH].
  3. Making changes to patches is a pain, and slow: If you want to change a patch on a mailing list, you have to rebase all of your patches in your VCS checkout, undo a bunch of commits and redo them. Then you have to mail the new patches to the list and go through the review process all over again, with all of the original context lost in history. Granted, tools like quilt make the first part of this a little easier, but not the second.

There are so many tools out there for keeping track of patches and reviews nowadays. There are project hosts like GitHub and Launchpad, which provide integrated code review based on the merge model, and there are tools you can host yourself like Patchwork, ReviewBoard and Gerrit, as well as, if you don’t mind paying, proprietary tools like Crucible from Atlassian.

All these tools take the pain out of patch-management. The developer just hacks away on their own clone of the repo in their own branch, pushes stuff to that branch and then when ready, proposes a “merge” of that branch into mainline. Most tools allow you to make comments directly on specific parts of the code, and automatically update diffs as soon as new changes are made.

Automate!

There is so much about patch review that is totally boring. Nobody likes hand-inspecting a piece of code to make sure it fits all the formatting and style conventions, making sure that it has adequate test coverage, making sure that it doesn’t have any bugs that could be found by static analysis.

The good news is that most of this stuff can be automated. At Canonical we ran the same continuous-integration job on every active merge proposal, which, at least in the form that I worked with it, checked that the branch in its current state:

  1. Merges cleanly
  2. Builds
  3. Passes all tests
  4. Installs correctly
  5. Passes any runtime tests

You can do so much more with continuous integration too. You can check that the code matches the style conventions (StyleCop, astyle), do some rudimentary static analysis with clang’s scan-build tool, check that all the various #ifdef combinations build correctly, and check for performance regressions by having some standardized performance tests. More data about how your change affects the codebase is king; it informs reviews rather than leaving reviewers to guess how changes in the code might affect your product. That said, metrics shouldn’t drive review, but rather inform it. A goal of review should be to understand why the metrics say what they say, decide whether that matters, and then use that to determine where to go next with the patch.
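As a rough illustration of how some of these extra checks might be wired up in a CMake project so that CI fails when they do, here’s a sketch; the two helper scripts named below are hypothetical placeholders, not real tools:

    # Hypothetical wiring: each automated check becomes a test, so any CI
    # job that runs "ctest" (or "make test") fails when a check fails.
    # check_style.cmake and run_analysis.cmake are placeholder scripts
    # standing in for whatever style checker or analyser you actually use.
    enable_testing ()

    add_test (NAME style-conventions
              COMMAND ${CMAKE_COMMAND} -P ${CMAKE_SOURCE_DIR}/cmake/check_style.cmake)

    add_test (NAME static-analysis
              COMMAND ${CMAKE_COMMAND} -P ${CMAKE_SOURCE_DIR}/cmake/run_analysis.cmake)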

Apply the “better-in-than-out” principle

The thing about perfect is that it’s impossible. The question for any good review is: “would the proposed change, in its current state, be something we’d rather ship tomorrow than what trunk is today?” If so, work out where the boundaries of the review’s scope lie and call it a day. All software is a work in progress. If you know where to go after the patch is merged, there’s no sense delaying it any longer when it could be serving a good purpose now.

Set a review deadline

One of the things that can absolutely kill a patch review, and leave lots of patches lying around everywhere, is a review that goes on for a seemingly endless period of time. Especially in free software, people get frustrated, people get distracted, and then they move on, leaving the maintainer wondering what to do with a patch that still had some follow-up left.

Review deadlines really help here. Evaluate the complexity of the change, think about what it affects and what work needs to be done in the first instance, and then set a deadline based on that. Have both parties engage with the review process until that date, then apply the better-in-than-out principle, or, if new circumstances arise, renegotiate the deadline. Having clear expectations about where a patch is going and how long a review will take removes a real source of demotivation amongst contributors.

Make it mandatory for everyone

There’s nothing worse than a project where some people are treated as better than others. Make code review mandatory for everyone, including the person who has been working on the project for ten years. Not only will they learn new things from fresh blood in the project that they might not have thought of, but it also instills a sense of responsibility and equality in new contributors, because they feel as though everyone is on an equal footing and not at some kind of prejudicial disadvantage by virtue of being new.

This isn’t an exhaustive list of all the things that will make code review a rewarding process as opposed to a dramatic one, but it certainly makes up some of the major factors that I’ve found in code review processes that are functional as opposed to dysfunctional.

Delivering Free Software to the Masses

It’s only somewhat recently (probably within the last fifteen years, starting with Darwin at Apple) that organizations with large marketing budgets have tried to take free and open source software and deliver it to the masses. This aligns fairly well with the overall goal of free and open source software, which is mostly a Darwinian one: to show that free software is the best kind of software, and that it will come out on top as the dominant ideology in the clash between free software and non-free software.

The question for a lot of organizations is … okay, this is great, but how?

You see, there’s a trend I’ve noticed that I haven’t quite been able to put a name to. I wrote a paper on it last semester for my senior communications project, although I kind of wish I had devoted a little more time to this particular aspect of it.

And then this blog post from Martin Owens came along which got me thinking about it again.

My theory is this: as the breadth of the audience you are trying to reach goes up, community participation and involvement goes down. And if that really is the rule rather than the exception, then I think it is impossible to achieve the goals of “deliver free software to the masses” and “have it built entirely by a passionate and active community of volunteers” at the same time.

I call this the “Apple / Google / Canonical” model of Open Source or Free Software. Darwin and Android are still Open Source projects. Ubuntu Touch is Free Software. But it’s a different kind of Free Software than, say, GNOME or KDE. As far as most people should be concerned, Open Source and Free Software have kind of already won the Darwinist struggle. Android makes up the majority of the smartphone market, Apple is growing a huge amount in the desktop market, and iOS has Darwin at its core.

It’s a bit of a letdown, though, and somewhat anticlimactic, because the community members who were so vocal and passionate about free and open source software were not really involved in any of this to a great extent.

That sounds pretty cynical, and I wish it weren’t true. I’ve been looking for ways to make it untrue, and I’m sure every company in this position has as well. I mean, what company acting in its shareholders’ best interests wouldn’t want free, no-strings-attached labor to achieve its goals? Well, maybe there are strings attached.

Somewhere along the line, societal forces caused the community to stop being an “asset” and start becoming a “risk”. The kind of body required to sustain such a massive distribution effort in such a complex market can’t be a charity forever. Somebody has to make money, or at least break even, otherwise it’s just not worth it. Here are some of the risks that the “community” might start posing if you’re a big company:

  • No accountability mechanism: When you are a company spending money on a project, having inputs to that project that you can’t fire and cut off is a kind of scary prospect. It really messes up your broader business plan if your engineering department comes back and tells you that they can’t really predict the future of your codebase because they don’t have full political control over it. Sure, if someone keeps on screwing up, you can just cut off their commit access and ignore their patches, but what are the PR repercussions of doing that? And then what if you have zero room for risk because your system is continuously deployed? Who takes the blame when someone else screws up and your company distributes that software? Who are people going to sue when things go wrong?
  • Trade secrets and competition can’t also be transparent and open: One of the most effective ways to compete with someone is to take them by surprise. Do lots of R&D to figure out what their shortcomings are, come up with something, and then, right when they don’t expect it, boom, sweep the market out from underneath their feet. This is how every company ever is successful. If you want to reach a mass market and unseat the people who already have the lion’s share of the pie, this is how you do it. But then how do you involve your community? Well, you don’t. You can’t be open and transparent about it unless you’re okay with your competitive effort being largely ineffective. You can go the “get people to sign an NDA” approach, but even that has its risks. One screw-up by that other person and you’ve lost a huge competitive opportunity. And your legal recourse is pretty weak: maybe you can get an injunction, but there are jurisdictional issues, and by the time you get one it’s already too late. Maybe you can sue that person, although they’re probably not going to be worth as much as the opportunity you’ve just lost. Better not to be open about anything.
  • People don’t care about politics: Sad, but true. Most people don’t know that the world runs on open source and free software. People don’t know that they’re probably holding something which is largely composed of open source software. To them, it’s just a phone: it runs apps, it looks cool and it cost $0 on a 24-month contract. I’m sure people would like to care about the politics, but they just don’t have time to. It’s basically the whole reason we have representative government in most countries: politics is hard, and I can’t care about everything because I have a job to get to. The best way to get free software into the hands of everyone is by creating products that work within our existing market system. Politics just won’t work. And it might even turn some people off.
  • The community can’t keep pace with business developments: In a business you work 9-5. You have mailing lists internal to the business. You talk to people face-to-face and figure out the next solutions there. Businesses just don’t have time to keep consulting with their community counterparts, and community members have day jobs to go to and families or education to focus on. They aren’t in the same room. The only way to fix this problem is to put them in the same room, and then pay them so that they don’t have to worry about going to some other job. Then everyone can work on free software! Wait, never mind.

So after all of this, I’m really not surprised that I keep seeing this trend appear over and over again. I want someone to prove me wrong, but I suspect it’s going to be a very long time before that happens.

The good news is that it appears spreading free software to the masses really is an attainable goal. The bad news is that it’s not really going to happen in the spirit that we all expected. But I think some trade-offs just need to be accepted.

Mir Display Server

Congratulations to Canonical on the release of the Mir Display Server. Mir will be an important step for them in delivering the converged Ubuntu product and I wish them all the best.

It isn’t something I’ll be able to work on in the future though. I’m trying my best to focus on my studies for now, and I don’t want to get sucked into it.