Optimizing backporting collateral evolutions
In November 2005, the first paper that, to my knowledge, coined the term "collateral evolutions" was published. The paper, authored by Yoann Padioleau, Gilles Muller, and our very own kernel hacker Julia L. Lawall, formalized what collateral evolutions are, with an emphasis on Linux kernel development: evolutions of Linux kernel APIs that require corresponding updates to Linux device driver code. You may not have heard the term "collateral evolution", but you have likely seen patches from Julia already, and if her name rings a bell you probably associate it with some odd-looking patch commit log messages like this:
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@

(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
    when != ret = e3
*if (x == NULL || ...)
{
  ... when != ret = e4
*  return ret;
}
//
This is all in SmPL (Semantic Patch Language), and the research team who wrote the paper above designed the language specifically to describe collateral evolutions in Linux kernel development. The paper states that about 70% of operating system software consists of device drivers and that 30% of software updates to a Linux kernel consist of addressing collateral evolutions. The paper provides some case studies on collateral evolutions and proposes work on "Coccinelle", an engine that lets developers evolve the Linux kernel by optimizing how collateral evolutions are expressed and applied.
It's 2012, and by now there is likely no subsystem in the Linux kernel that has not taken advantage of SmPL. In fact, we're well beyond that in terms of research and in how we can apply SmPL to evolve the Linux kernel. I'm not going to get into the technical details; instead I'll only provide references so you can do your homework if you are interested in the topic. I will, however, share my conclusions and the practical implications that I am observing and foresee for us in Linux kernel development. My interest in SmPL was heightened when I was trying to move drivers out of the staging area of the Linux kernel and saw how broadly SmPL could be used. The biggest potential use case I saw for SmPL, though, was for some work I had been focusing on for a while: automatically backporting the Linux kernel.
Everyone and their mother backports code, but historically they have tended to backport only their own things: their own driver, their own stacks, at times for their own Linux distributions and only for a set of supported Linux kernels, and at times by forking the Linux kernel and never merging things back upstream. For a while now I have taken a slightly different approach to backporting: backport all drivers in a subsystem, backport for all Linux distributions with priority given to all known supported Linux kernel releases, share as much code as possible, and always prioritize upstream. It turns out that what you end up having to backport are collateral evolutions. The same SmPL patch that helps you evolve the Linux kernel could therefore, in theory, be reversed and used to backport that same collateral evolution. I reviewed this with Julia and it is theoretically possible. However, although quite a few Linux kernel developers use SmPL to write collateral evolutions of the Linux kernel, not all evolutions are written this way, and we should not assume we can convince everyone to do so. Furthermore, the learning curve for SmPL is steep.
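As a rough illustration of the "reversed patch" idea, here is a minimal Python sketch. It is not SmPL and not how Coccinelle works: it models a collateral evolution as a plain regular-expression rewrite rule and a backport as the inverse rule. The names foo_alloc and foo_alloc_dev are hypothetical and do not refer to real kernel APIs.

```python
import re

# Toy stand-in for a semantic patch: a (pattern, replacement) pair.
# foo_alloc and foo_alloc_dev are hypothetical names, not real kernel APIs.
# The assumed evolution renames the call and swaps its two arguments.
FORWARD = (r"\bfoo_alloc\(\s*(\w+)\s*,\s*(\w+)\s*\)", r"foo_alloc_dev(\2, \1)")

# The backport is the same evolution written in the opposite direction.
REVERSE = (r"\bfoo_alloc_dev\(\s*(\w+)\s*,\s*(\w+)\s*\)", r"foo_alloc(\2, \1)")


def apply_rule(rule, source):
    """Apply one rewrite rule to every matching call in the source text."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, source)


old_driver = "buf = foo_alloc(dev, size);"
new_driver = apply_rule(FORWARD, old_driver)   # evolve to the new API
backported = apply_rule(REVERSE, new_driver)   # return to the old API
```

The round trip only works here because the assumed evolution loses no information. An evolution that drops an argument, for example, has no mechanical inverse, which hints at why not everything can be backported this way.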
At the 2011 Linux Plumbers conference I met Julia for the first time, and there she revealed to me the holy grail for the big picture of collateral evolutions, and what we would need in order to use them for backporting given that not everyone can be expected to write collateral evolutions in SmPL: spdiff. Julia explained that Jesper Andersen wrote this utility to generate SmPL for you, provided you give it two patches that illustrate a collateral evolution. The implications are huge for Linux kernel development. You don't really need to learn SmPL to write SmPL. You don't even need to learn SmPL to apply a collateral evolution across all the Linux kernel subsystems that require changes for it. It also means that if you backport a collateral evolution for two drivers in the Linux kernel, you could in theory backport that collateral evolution for all other Linux kernel drivers.
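To make the "two examples are enough" idea concrete, here is a toy Python sketch. It is an assumption-laden simplification and not how spdiff works internally: it only infers token-for-token renames that appear in every before/after pair, then applies them (or their inverse) to a third driver. The names bar_register, bar_register_device, and drv_a/drv_b/drv_c are hypothetical.

```python
import re


def tokens(line):
    """Split C-like source into identifiers and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", line)


def infer_renames(before, after):
    """Collect token substitutions when both versions have the same shape."""
    b, a = tokens(before), tokens(after)
    if len(b) != len(a):
        return set()
    return {(old, new) for old, new in zip(b, a) if old != new}


def common_renames(examples):
    """Keep only the substitutions seen in every before/after example."""
    return set.intersection(*(infer_renames(b, a) for b, a in examples))


def apply_renames(renames, line):
    """Rewrite each identifier that has an entry in the rename mapping."""
    mapping = dict(renames)
    return re.sub(r"\w+", lambda m: mapping.get(m.group(0), m.group(0)), line)


# Two hand-made changes (hypothetical drivers) illustrating one evolution.
examples = [
    ("ret = bar_register(drv_a);", "ret = bar_register_device(drv_a);"),
    ("err = bar_register(drv_b);", "err = bar_register_device(drv_b);"),
]
rule = common_renames(examples)

# Apply the inferred rule to a third driver, then invert it for a backport.
forward = apply_renames(rule, "rc = bar_register(drv_c);")
backport = apply_renames({(new, old) for old, new in rule}, forward)
```

Taking the intersection across examples is what filters out driver-specific differences, such as variable names, and keeps only the change the examples share.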
If true (and I plan on proving this over time), the implications are significant for us. It means we can focus on advancing the Linux kernel faster than ever before while worrying less about backporting a collateral evolution onto older kernels. As I will explain in later blog posts, not everything is easily backported, but fortunately a lot of things are. With a bit of work, and an understanding of how we can structure code to make backporting easier, we may be able to backport code even faster over time. If we also express new collateral evolutions of the Linux kernel in SmPL, it may mean fewer bugs caused by collateral evolutions (although this would need to be proven), and also fewer bugs in backporting, since the backports themselves would be the direct inverse of the expressed SmPL.