So after stating that I was thinking about what comes "after convergence" I rambled on a bit about social networks and the cloud before veering wildly towards applications written as sets of processes, or multi-process architectures (MPA). I wrote about the common benefits of MPA and then the common downsides of them. Before trying to pull this full-circle back to the original starting point all those blog entries ago, I need to say a few things about opportunities presented by MPA systems that are rarely taken advantage of and why that is.
I don't want to give the impression that I think I've come up with the ideas in this blog entry. Most already exist in production somewhere in the world out there. There are certainly great academic papers to be found on each of the topics below. It just seems to be a set of topics that is not commonly known, thought about or discussed, particularly outside the server room. So nothing here is new, though I suspect at least some of the ideas might be new to many of the readers.
Before getting into the topics at hand, I'd also like to give a nod to the Erlang community. When I started toying about with the questions that inspired this line of thought, I went looking for solutions that already existed. There almost always is someone, somewhere working on a particular way to approach a given problem. This is the beauty of a global Internet with so many people trained to work with computers. While looking around at various things, Erlang caught my eye as it is designed for radical MPA applications. While messing around with Erlang I began to understand just how desperately little most applications were getting out of the MPA pattern, and it helped illuminate new possible paths of exploration to tough challenges we face, such as actually getting the most out of "convergence" when the reality is people have multiple devices (if only because each person has their own device). It's pretty amazing what many of the people in that community are up to .. for instance:
May as well start with this one since I posted that video above; that way I can pretend it was a clever segue instead of just nerd porn. ;) Typically when we think of upgrading an application, we think of upgrading all of it. We also generally expect to have to restart the application to get the benefits of those upgrades. Our package managers on Linux often tell us that explicitly, in fact.
There are exceptions to this. The most obvious example is plugins: you can change a plugin on disk and, at least on next start, you get new functionality without touching the rest of the installed application components. When the plugins are scripted, it becomes possible to do this at runtime.
Generally, however, applications are upgraded monolithically. This is in large part because componentization, MPA or not, is not as common as it ought to be. It's also in part because components rely on interfaces and too many developers lack the discipline to keep those interfaces consistent over time. (The web is the worst for this, but every time I see Firefox checking plugins after a minor version upgrade I shake my head.) It also doesn't fit very nicely into the software delivery systems we typically use, all of which tend to be built around the application level in granularity.
Those are all solvable issues. What is less obvious is how to handle code reloading at runtime when the system is in active use. When upgrading a single component while the application is running, how does one separate it safely from the rest of the running code to perform an in-place update of the code to be executed?
With an MPA approach the answer seems fairly straightforward: bring up a new process, switch to using the new process, kill the old one. Since the processes are already separated from each other, upgrading one component on the fly ought to be possible without fiddling with the rest of the application. It's less straightforward in practice, however, as one needs to handle any tasks the old process is still processing, any pending messages sitting in the old process' queue (or whatever is being used for message passing), and the handover of all messaging connections to the new process. It is possible, as the video above demonstrates, but even in Erlang, where this was designed into the system, one needs to approach it with a measure of thoughtfulness.
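To make the "bring up, switch, drain, kill" sequence a little more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: threads with queues stand in for real processes, and the `Worker` and `Router` names are made up for this example rather than taken from any existing framework. It is not how Erlang does it.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to shut down


class Worker:
    """Stand-in for one process: a mailbox plus a loop applying a handler."""

    def __init__(self, handler, results):
        self.mailbox = queue.Queue()
        self.handler = handler
        self.results = results
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is STOP:
                return
            self.results.append(self.handler(msg))


class Router:
    """Callers only talk to the router, so the worker behind it can be swapped."""

    def __init__(self, worker):
        self._lock = threading.Lock()
        self._worker = worker

    def send(self, msg):
        with self._lock:
            self._worker.mailbox.put(msg)

    def upgrade(self, new_worker):
        # 1. switch: from here on, new messages go to the new worker
        with self._lock:
            old, self._worker = self._worker, new_worker
        # 2. drain: the old worker finishes whatever is already in its mailbox
        old.mailbox.put(STOP)
        # 3. retire: wait for the old worker to exit
        old.thread.join()

    def stop(self):
        with self._lock:
            self._worker.mailbox.put(STOP)
        self._worker.thread.join()


results = []
router = Router(Worker(lambda m: f"v1:{m}", results))
router.send(1)
router.send(2)
router.upgrade(Worker(lambda m: f"v2:{m}", results))  # swap code while running
router.send(3)
router.stop()
print(results)  # ['v1:1', 'v1:2', 'v2:3']
```

Note what the sketch conveniently skips: the old and new workers share no state, there are no open connections to hand over, and nothing is in the middle of a long-running task. Those are exactly the parts that make the real thing hard.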
This is an uncommon feature because it is not easy to get right and pretty well all the existing systems out there lack any support for it whatsoever. However, if we want applications that run "forever", then this needs to become part of our toolbox. Given that the Linux kernel people have been working on addressing this issue (granted, for them restarting the "application" means a device restart which is even more drastic than restarting a single application), our applications ought to catch up.
Most MPA applications spin up processes and let them sit there until the process is finished being useful. For things like web and file servers using this design, the definition of "useful" is pretty obvious: when the client connection drops and/or the client request is done being handled, tear down the process.
Even then there is a fly in the ointment: spinning up a process can take more time than one wants and so perhaps it makes sense to just have a bunch of processes sitting there laying in wait to process incoming requests one after the other. Having just started apache2 on my laptop to check that it does what I remember it doing, indeed I have a half-dozen httpd-prefork processes sitting around doing nothing but wasting resources. The documentation says: "Apache always tries to maintain several spare or idle server processes, which stand ready to serve incoming requests. In this way, clients do not need to wait for a new child processes to be forked before their requests can be served."
Keeping processes around can also make it a lot easier when creating an MPA structure: spin up everything you need and let it sit around. This is a great way to avoid complex "partial ready" states in the application.
With multiple processes, one would hope it would be possible to simply hibernate a specific process on demand. That way it exists, for all intents and purposes, but isn't using resources. This keeps the application state simple and can avoid at least some of the process spin-up costs. Unfortunately, this is really, really hard to do with 100% fidelity. People have tried to add this kind of feature to the Linux kernel, but it has never made it in. One of the biggest challenges is handling files and sockets sensibly during hibernation.
This article on deploying node.js applications with systemd is interesting in its own right, but also shows that people really want this kind of functionality. Of course, the article is just showing a way to stop and start a process based on request activity, which isn't quite the same thing at all.
Surprise: Erlang actually has this covered. It isn't a magical silver bullet and you don't want to use it with processes that are under constant usage (it has overhead; this kind of feature necessarily always will), but it works reliably and indeed can help with resource usage.
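For those who don't live in Erlang, the basic trade can be approximated at the application level. The Python sketch below is an assumption-laden toy, not real process hibernation: the `HibernatingWorker` class is hypothetical, its state is plain data that `pickle` can write to disk, and it holds no files or sockets (which, as noted above, is where the real difficulty lives).

```python
import os
import pickle
import tempfile


class HibernatingWorker:
    """Keeps state in memory while active; parks it on disk when hibernated."""

    def __init__(self, name, state_dir):
        self.path = os.path.join(state_dir, f"{name}.state")
        self.state = {"handled": 0}  # live, in-memory state

    def hibernate(self):
        # Write the state out, then drop the in-memory copy.
        with open(self.path, "wb") as f:
            pickle.dump(self.state, f)
        self.state = None

    def _wake(self):
        with open(self.path, "rb") as f:
            self.state = pickle.load(f)
        os.remove(self.path)

    def handle(self, msg):
        # The first message after hibernation pays the wake-up cost.
        if self.state is None:
            self._wake()
        self.state["handled"] += 1
        return self.state["handled"]


worker = HibernatingWorker("mailcheck", tempfile.mkdtemp())
worker.handle("sync")
worker.hibernate()            # nothing held in memory until the next message
print(worker.handle("sync"))  # 2
```

The overhead mentioned above shows up here too: every wake-up costs a read and a deserialization, so this only pays off for workers that sit idle far more often than they run.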
Every time I look at all those processes in Akonadi just sitting there when they know they won't be doing a thing until the next sync or mail check I despair just a little. Every time I notice that Chrome has a process running for those tabs showing static content, chewing up 15-30 MB of memory each, I wish process hibernation was common.
If you look through the code of Akonadi resources, such as the imap resource in kdepim-runtime/resources/imap, you will quickly notice that they are really rather big monolithic applications. The imap resource spins up numerous finite state machines (though they often aren't implemented as "true" FSMs, but as procedural code that is setting and checking state-value-bearing variables everywhere) and handles a large number of asynchronous jobs within its monolith. As a result it is just over ten thousand lines of code, and that isn't even counting the imap or the Akonadi/KDE generic resources libraries it uses. That's a large chunk of code doing a lot of things.
That's kind of odd. Akonadi is an MPA application, but its components are traditional single-process job multiplexers. This is through no fault or shortcoming of Akonadi or its developers. In fact, I feel kind of bad for referencing Akonadi so often in these entries because it is actually an extremely well done piece of software that has matured nicely by this point. It's just one of the few MPA applications written for the desktop in wide usage where one can't blame these kinds of warts on anything other than the underlying language and frameworks ... it's because Akonadi is good that I keep bringing it up as an example, as it highlights the limits of what is possible with the way we do things right now. Ok, enough "mea culpa" to the Akonadi team ... ;)
It would be very cool if the complex imap resource was itself an MPA system. State handling would dramatically simplify and due to this I am pretty sure a significant percentage of those 10,000+ lines of code would just vanish. (I took another drift through the code base of the imap resource this morning while the little one had a nap to check on these things. :)
Additionally, if it was "radically" MPA then a lot of the defensive programming could simply be dropped. Defensive programming is what most of us are quite used to: call a function, check for all the error states we can think of and respond accordingly and only then handle the "good" cases. This is problematic as humans are pretty bad at thinking of all possible error states when systems move beyond "trivial" in complexity. With radically MPA systems, however, each job can be handled by a separate process and should something other than success happen it can simply halt. No state needs to be preserved or reset; at most the other processes that are waiting for news back on how the job went may want to be informed that something went sideways so they can also either halt or continue on. (Yes, logging, too.) This not only makes the code much easier for humans to write, as one only needs to write for what is expected rather than try to list everything that is unexpected, but makes the code base radically smaller as most of the "if (failureCondition)" branches that pepper our code today simply melt away.
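A rough sketch of what "write only for what is expected" can look like, here in Python with each job run in its own interpreter process via `subprocess`. The `run_job` helper is a made-up name for this illustration, and spawning a whole interpreter per job is precisely the kind of expense discussed next; the point is only the shape of the code.

```python
import subprocess
import sys


def run_job(code):
    """Run one job in its own process; report whether it succeeded."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    ok = proc.returncode == 0
    if not ok:
        # The only failure handling anywhere: log it and tell the caller.
        print(f"job halted with exit code {proc.returncode}", file=sys.stderr)
    return ok, proc.stdout.strip()


# The job itself contains no error checks at all: it either works or it halts.
print(run_job("print(int('42') * 2)"))    # (True, '84')
print(run_job("print(int('oops') * 2)"))  # (False, '')
```

The job body has no `try`, no return-code checks and no cleanup. When the second job hits bad input it just dies, its memory goes away with it, and the parent learns one bit of news: it didn't work.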
This doesn't happen because processes are expensive, frameworks that can handle such radical MPA systems are hard to write, and few pre-made ones exist.
With any MPA system, but particularly radically MPA applications, the opportunity for process supervision arises. Supervision is when one process watches other processes and decides their fate: when to start them, when to stop them, when (and how!) to restart them on failure.
I first became interested in the possibilities for system robustness due to supervision when systems such as systemd started coming together. systemd, however, falls wildly short of the possibilities. For what is perhaps the definitive example of supervision one ought to look to Erlang.
In Erlang applications, which are encouraged to be MPA, one defines a supervision tree. Not only can you have individual processes supervised for failure, but you can have nested trees of supervisors, each with a policy to follow when processes are created and when they fail. You can, for instance, tell the supervisor that a particular group of processes all rely on each other and so should one fail they should all fail and be restarted. You can define timeouts for responsiveness, how many times to restart processes and such things.
This allows one to define the state of the full set of processes in a robust manner, from process execution through to finality. This is a key to robust, long-lived applications.
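The restart policies are easier to picture with a toy model. The Python sketch below borrows the strategy names `one_for_one` and `one_for_all` from Erlang/OTP, but everything else is an assumption made for illustration: the `Supervisor` class is hypothetical, no real processes are started (a child is just a counter of how many times it has been started), and the restart limit ignores the time window that OTP also takes into account.

```python
class Supervisor:
    """Toy model of a supervisor's restart policy; no real processes involved."""

    def __init__(self, strategy, max_restarts):
        self.strategy = strategy          # "one_for_one" or "one_for_all"
        self.max_restarts = max_restarts  # give up after this many restarts
        self.restarts = 0
        self.children = {}                # name -> number of times started

    def start_child(self, name):
        self.children[name] = self.children.get(name, 0) + 1

    def child_failed(self, name):
        """Apply the policy; return the names of the children restarted."""
        if self.restarts >= self.max_restarts:
            # In a tree, this is where the parent supervisor takes over.
            raise RuntimeError("restart limit reached; escalating")
        self.restarts += 1
        if self.strategy == "one_for_one":
            victims = [name]                # only the failed child
        else:
            victims = list(self.children)   # the whole group goes down together
        for child in victims:
            self.start_child(child)
        return victims


sup = Supervisor("one_for_all", max_restarts=3)
sup.start_child("imap_connection")
sup.start_child("imap_sync")
print(sup.child_failed("imap_sync"))  # ['imap_connection', 'imap_sync']
```

Nesting falls out naturally: a supervisor that gives up is itself just a failed child from the point of view of the supervisor above it.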
With the MPA model it is trivial to spread a task out across multiple machines. This is extremely common in the server room when it comes to large applications and as such is quite well understood. There are large numbers of libraries and frameworks that make this easier, from message queueing systems to service discovery mechanisms. The desirability of this is quite obvious: finishing large tasks quickly is often beyond the reach of individual machines. So put a bunch of them together and figure out how to make them work together on problems. Voila, problem solved. (Writing blogs is fun: you can make the amazingly difficult and complex appear in a puff of magic smoke with a simple "voila" .. ;)
Outside the server room this is hardly ever used, however. MPA systems aimed at non-server workloads tend to assume they all run on the same system. Well, we live in a different world than we did twenty years ago.
People have multiple rather powerful computers with them. Right now I have my laptop, a tablet, two ARM devices and a smartphone. The printer sitting next to me isn't very powerful, but it runs a full modern OS as well.
We also routinely use services that exist on other machines, but instead of letting processes coordinate as we would if they were local we tend to blast data around in bulk or create complex protocols that allow the local machine to dictate to the remote machine what we'd like it to do for us. The protocols tend to result in multiple layers of indirection on the remote side: JSON over HTTP hits a web server which forwards the request to a bit of scripted code that interacts with the server to construct some sort of response which it then encodes into JSON and sends back over HTTP to be reconstituted on the local side ... and should the client ever want something ever so slightly different, too bad.
This makes using two end-user devices together really rather difficult. The byzantine client-server architecture ensures that it is non-trivial to do simple things and that significant pieces of software must be run on any devices one wishes to coordinate. People are aware of the annoyances: Apple has Handoff, KDE has KDE Connect, .. but what happens if I'm running Akonadi (that guy again!) on my Plasma Desktop system and I'd like it to be able to find results from data on my Plasma Active tablet? Yeah, not simple to solve .. unless Akonadi could run agents remotely as transparently as it can locally. Which would mean more complexity in the Akonadi code base, and every other MPA app that might benefit from this. As it is additional functionality and probably nobody has yet hit this use case, Akonadi is not capable of it and the use case is likely to never be fulfilled once people run into it.
It is the combination of thinking with blinders on ("JSON over HTTP.. merp merp!") and the difficulty level of "remote processes are the same as local processes" that prevents this from happening in places we could benefit from.
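What "remote processes are the same as local processes" means for the calling code can be sketched in a few lines of Python. This is a deliberately tiny illustration with hypothetical names (`LocalMailbox`, `RemoteMailbox`, `receive_one`): a `socket.socketpair()` stands in for a connection to another device, and newline-delimited JSON is just a convenient encoding picked for the example. Discovery, authentication and failure handling, which are the hard parts, are not shown.

```python
import json
import queue
import socket


class LocalMailbox:
    """A process on this machine: messages go straight into a queue."""

    def __init__(self):
        self.q = queue.Queue()

    def send(self, msg):
        self.q.put(msg)


class RemoteMailbox:
    """Same send() interface, but the message crosses a socket as a JSON line."""

    def __init__(self, sock):
        self.sock = sock

    def send(self, msg):
        self.sock.sendall(json.dumps(msg).encode("utf-8") + b"\n")


def receive_one(sock):
    """Receiving side: read one newline-terminated JSON message."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed before a full message arrived")
        buf += chunk
    return json.loads(buf)


def ask_for_results(mailbox, term):
    # The caller neither knows nor cares where the mailbox lives.
    mailbox.send({"op": "search", "term": term})


here, there = socket.socketpair()  # stand-in for a link to the tablet
ask_for_results(LocalMailbox(), "invoice")
ask_for_results(RemoteMailbox(here), "invoice")
print(receive_one(there))  # {'op': 'search', 'term': 'invoice'}
```

The interesting line is `ask_for_results`: it is identical for both cases. Getting a framework to provide that guarantee for real, across real devices, is the difficulty level referred to above.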
This goes hand-in-hand with the remote process possibility, but takes it up a notch. In addition to having remote processes, how amazing would it be to be able to migrate processes between machines? This also has been done; it's a pretty common pattern in big multi-agent systems from what I understand. This carries a lot of the same requirements and challenges of process hibernation, and probably would only be possible with specially designated processes. Security is another concern, but not an insurmountable obstacle.
Along with the lack of remote processes, this is probably the main reason it is so hard to transfer the picture from your phone to your laptop: byzantine structures must exist on all systems to do the simplest of things, and every time we wish to do a new simple thing that byzantine structure needs an upgrade, typically on both sides. How awful. This is probably also my cue to moan about Bluetooth, but this blog entry is already long enough.
Many MPA applications already get a security leg up by having their applications separated by the operating system. They can't poke around at each other's memory locations, crashing doesn't take down the whole system, etc. This is really just a few tips of what is really a monumental iceberg, however.
It ought to be possible to put each process in its own container with its own runtime restrictions. The Linux middleware people working on things like kdbus and systemd along with the plethora of containerization projects out there are racing to bring this to individual applications. MPA applications ought to take advantage of these things to keep attack vectors down while allowing each process in the swarm access to all the functionality it needs.
Combined with radical MPA, this could be a significant boost: imagine all external input (e.g. from the network) being processed into safe, internal data structures in a process that runs in its own security context. Should something nasty happen, like a crash, it has access to nothing useful.
OpenSSH already went in this direction many years ago with privilege separation, but with modern containerization we could take this to a whole new level that would be of interest to many more applications.
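The shape of that idea, with untrusted input turned into safe internal data by a separate process, can be sketched in Python. Big caveats apply: the `parse_untrusted` helper and the toy `key=number` format are invented for this example, and the child here is an ordinary process with no container or restricted security context around it. Adding that confinement is the part the frameworks would need to make easy.

```python
import json
import subprocess
import sys

# Code run in the child process. In a real system this process would also be
# confined (container, dropped privileges, ...); that part is NOT shown here.
PARSER = r"""
import json, sys
raw = sys.stdin.buffer.read()
key, _, value = raw.decode("utf-8").partition("=")
json.dump({"key": key.strip(), "value": int(value)}, sys.stdout)
"""


def parse_untrusted(raw):
    """Parse raw bytes in a separate process; return a dict, or None on failure."""
    proc = subprocess.run(
        [sys.executable, "-c", PARSER], input=raw, capture_output=True
    )
    if proc.returncode != 0:
        return None  # the parser died; nothing of ours went down with it
    return json.loads(proc.stdout)


print(parse_untrusted(b"size=42"))    # {'key': 'size', 'value': 42}
print(parse_untrusted(b"\xff\xfe="))  # None
```

The parent only ever sees a small, well-formed structure or nothing at all. Whatever the malformed input does to the parser happens in a process that holds none of the application's state.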
Again, it means having the right frameworks available to make this easy to do.
... in conclusion
So we've now seen the common benefits of MPA, the common downsides of MPA and finally some more exotic possibilities which are rarely taken advantage of but would be very beneficial as common patterns.
Next we will look at how to bring this all together and attempt to describe a more perfect system which avoids the negatives and offers as many of the positives as possible.
The ultimate goal is completely robust applications with improved security and flexibility that can work more naturally (from a human's perspective) in multi-device environments and, perhaps, allow us to start tackling the more thorny issues of privacy and social computing.