
Docker security with SELinux (Opensource.com)

Dan Walsh looks at container security on Opensource.com. "I hear and read about a lot of people assuming that Docker containers actually sandbox applications—meaning they can run random applications on their system as root with Docker. They believe Docker containers will actually protect their host system [...] Stop assuming that Docker and the Linux kernel protect you from malware."

What are useful online tools for Linux


As you know, GNU/Linux is much more than just an OS. There is literally a whole sphere of the Internet dedicated to the penguin OS. If you are reading this post, you are probably inclined to read about Linux online. Among all the pages that you can find on the subject, there are a couple […]

The post What are useful online tools for Linux appeared first on Xmodulo.


green processes: multiple process architectures for everyone

I've been rambling-writing for a couple weeks now about multi-process architecture (MPA) as seen in applications such as Akonadi or Chrome. After covering both the up- and down-sides, I also wrote a bit about missed opportunities: things that should be possible with MPA but which are rarely seen in practice. I kept promising to write about my thoughts on solutions to the downsides and missed opportunities, and that's where we are now.

The answer is deceptively simple: green processes. These are like green threads, which a few people actually mentioned in the comments sections of those past blog entries. Like green threads, green processes are mimics of native processes but run in a virtual machine (VM) instead. Some of the benefits are similar: you can have far less overhead per process (Erlang's per-process overhead is just 309 words) and you can launch thousands of them, even if you only have one CPU core to work with, without grinding the system to a halt. More usefully, if you have a 4-core system the VM can schedule N processes on exactly 4 native threads, making full use of the system while minimizing things like context-switch costs. (That does trivialize the complexity of the task by summarizing it in a single sentence ... )
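
To make that concrete, here is a minimal Erlang sketch (Erlang being the system whose numbers I quoted above); the module name and counts are my own invention. It spawns 100,000 green processes and collects one message from each, something native processes could never do this cheaply:

-module(swarm).
-export([run/0]).

%% Spawn N green processes; each sends one message back to the parent.
run() ->
    Parent = self(),
    N = 100000,
    [spawn(fun() -> Parent ! {done, I} end) || I <- lists:seq(1, N)],
    collect(N).

%% Wait until all N processes have reported in.
collect(0) -> ok;
collect(N) -> receive {done, _} -> collect(N - 1) end.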

Unlike green threads, however, green processes provide separation of memory access and require message passing rather than shared memory objects. They also more naturally follow the failure semantics of processes, such as the ability to crash. That may seem like a trivial set of differences, but the implications are hugely significant.

For instance, with crashing processes you can limit (and in many cases simply get rid of entirely) defensive programming. This requires a suitable language to go with the VM, but with such a thing in hand code can be written that only works in known-good (or white-listed) states. Everything else causes a crash, and this gets propagated appropriately through the system.
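
A hedged sketch of that style in Erlang, with parse/1 standing in for real application logic: the pattern match asserts the known-good state, and anything else crashes the process for a supervisor to deal with.

-module(config).
-export([read_config/1]).

read_config(Path) ->
    %% file:read_file/1 returns {ok, Binary} or {error, Reason}; the
    %% match below crashes this process on anything but success. No
    %% error-handling branches are written at all.
    {ok, Bin} = file:read_file(Path),
    parse(Bin).

parse(Bin) -> Bin.   %% hypothetical stand-in for real parsing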

That system can include process supervisors which can restart processes or kill related ones in response to failures. It should also be noted that one still wants exceptions as those are for a rather different use case: non-fatal errors with known causes. (Essentially white-listed undesired behaviors.)

With message passing being the only way to coordinate between processes, you also are a tiny step away from distributed applications. After all, it doesn't look any different to the process in the VM if the message came from a process in the same VM or from a process elsewhere. So you can now split up your application between multiple VMs on the same system or on completely different systems connected by a network.
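
Erlang makes this location transparency literal: the send operator is identical for local and remote processes. A sketch, where farm is my own module name and 'worker@otherhost' a hypothetical node:

-module(farm).
-export([start_workers/1, loop/0]).

start_workers(Node) ->
    Remote = spawn(Node, farm, loop, []),   %% runs on another VM, e.g. 'worker@otherhost'
    Local  = spawn(farm, loop, []),         %% runs in this VM
    Remote ! {job, 42},                     %% travels over the network
    Local  ! {job, 42}.                     %% stays in this VM; same syntax either way

loop() ->
    receive
        {job, N} -> io:format("job ~p handled on ~p~n", [N, node()]), loop()
    end.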

This could even be done at runtime or as a configuration option. Consider a system where one process needs to use unsafe native code (more on that later) and the rest simply do their thing safely in the VM. That one process could be moved to a separate VM (complete with process supervision) to keep it away from the rest of the processes. If the native code takes down that process or the native code leads to a security vulnerability, you now have process separation at the native OS level. If that process only causes problems on some systems (say, a graphical app being run on systems with a particular vendor's particularly crashtastic GPU drivers), then it could be moved out of process on only those systems and kept in-process otherwise.

Due to the lightweight nature of green processes one can afford to break an application up into a massively concurrent system. Each bit of I/O can live in its own process, for instance. Spawning a process for each socket or file operation becomes reasonable. Synchronous finite state machines (FSMs), which are very useful and, when not forced into asynchronous behavior by event loops, very simple to write and manage, become second nature.
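
As a quick illustration (my own toy example, in Erlang syntax): when a synchronous FSM owns its own green process, each state is simply a function, and a blocking receive makes transitions read top to bottom instead of being scattered across event-loop callbacks.

-module(door).
-export([locked/0]).

%% Each state is a function; each transition is a tail call.
locked() ->
    receive
        {unlock, 1234} -> open();    %% correct code: change state
        _              -> locked()   %% anything else: stay locked
    end.

open() ->
    receive
        lock -> locked();
        _    -> open()
    end.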

A VM with green processes ends up resolving, or at least noticeably improving on, nearly all the downsides of native processes while also making most of the missed opportunities low-hanging fruit, sometimes even for free (as seen with distributing processes). What impact could this have?

Let's look at Akonadi. It could be a hundred processes instead of the two dozen it currently is on my system (I have a lot of accounts ...), making the code much simpler in the process (no more async job and async FSM implementations) while having dramatically lower overhead. Not only would the per-process cost be lower; even with message passing the cost could be essentially zero, as objects (not just serialized memory buffers!) are passed using copy-on-write (COW) and/or reference-counted storage. On smaller systems with only one or two cores, the entire application would automatically "collapse" down to a single-threaded application that would be nearly identical in nature (and therefore also in performance) to a purposefully single-threaded, single-process, async event-driven application. This is a new avenue for scalability. Another one would be in how remote Akonadi processes would come along for free; want to run Akonadi on a server but access the data locally? Want to spread your Akonadi processes across a network of systems? No problem. Meanwhile, all the benefits of the multi-process system it currently enjoys, including robustness, would be retained.

Or how about Plasma? Every DataEngine, every model and every widget could live in its own process in the VM. Since it looks like one process to the native OS (because it is), the actual UI could be delegated to a specific thread, and even on x.org it could paint all day even if a DataEngine or other component were to die. A new Plasmoid, even one with compiled code, could be installed and hot-reloaded without restarting the desktop shell. KRunner could be sucked into the main plasma-shell process without risking stability or robustness, shrinking the overall footprint of the system. In fact, all of the separate processes could be pulled into the same VM while retaining all of the current benefits, including re-use on other systems that don't want plasma-shell.

It isn't all picnic baskets and fluffy white clouds, however.

Running natively compiled C/C++/ASM code in your VM will crash the entire VM when that code segfaults. So if one were to use Qt, for example, from such a VM, any crash in Qt would bring down the whole thing. The benefits are only offered to code running in the VM. Perhaps using multiple VMs for just those parts of the system would be a solution in some cases. Given the message passing of such a system, it is even possible to run native-code applications external to the VM and have them appear inside the VM like any other local-to-the-VM process (to the other processes in the VM, anyway) by grafting the message passing protocol onto that native application.
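
Erlang models exactly this with ports: an external native program is wrapped so that, to processes inside the VM, it looks like a local process and dies alone if it crashes. A hedged sketch, where ./native_helper is a hypothetical executable that speaks a length-prefixed binary protocol in Erlang's external term format:

-module(native_bridge).
-export([call_native/1]).

call_native(Request) ->
    %% {packet, 4} gives each message a 4-byte length prefix; the
    %% external program must honor the same framing.
    Port = open_port({spawn, "./native_helper"}, [binary, {packet, 4}]),
    Port ! {self(), {command, term_to_binary(Request)}},
    receive
        {Port, {data, Bin}} -> binary_to_term(Bin)
    end.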

C/C++ isn't particularly suited to being run in such a VM, at least not with today's tools. So either new languages are needed or we need new tools. Or both. A new language for such a VM would not be the worst of ideas, actually: look at how QML has vastly improved life in the UI world. Similarly, a properly designed language could make application development much safer, much easier and much faster. (To those who live-and-breathe Python/Ruby/Perl/etc. that's not news, I'm sure.) That does mean that some amount of application code would need to be rewritten to take advantage of such a VM. This is probably desirable in any case since most applications will need rethinking in terms of massive concurrency. If the VM supports easily binding in existing native code, this could even be done in stages. We know that can work, as this is exactly how Plasma was transitioned from imperative QGraphicsView code to declarative QML code. Still, there would be a transition period.

Finally, such a VM is not trivial to do correctly. It ends up becoming a micro-operating-system in its own right, with memory management and process schedulers. Care needs to be taken with how messages are passed to keep overhead low there, as well. Making such a thing would be a significant investment. One could almost certainly start with existing art, but a lot of work would still be required.

Speaking of existing art: as I was considering the "after convergence" questions earlier this year, I ended up looking around for systems already tackling the problems I saw related to MPA, and I came across Erlang, which does nearly all of the above. So if you are interested in seeing what this can look like in practice, I definitely recommend looking into it. It is not a perfect system, but it is about the only one I could find, academic projects aside, that addresses these issues head-on. I think the original designers of the runtime were way ahead of their time.

Erlang itself is probably not 'the answer', however. It is an amazing system that is just fantastic for server-side work. (On that note, I think it is a much better fit for Akonadi than C++ ever could be, or for Bodega than node.js ever could be.) However, it lacks some of the features I'd really like to see in such a system, such as cross-VM process migration (non-trivial to implement properly, granted); it is not well suited to numeric processing (not an issue for most data-centric servers); and it has some legacy issues related to things like strings (though that has come leaps and bounds in recent years as well). I don't care about syntax warts, as they aren't that bad in Erlang, and if you think they are then simply use the rather nicer Elixir that runs on the same VM.

However, a system designed specifically for the needs of client-side user interface code could be marvelous. Systems under the application would remain in C/C++ and the UI should be done declaratively (with e.g. QML), but the application could be something else entirely that makes writing the applications in between those two parts far easier and ...

... more importantly than easier development, gift them with super-powers current systems are unlikely to ever offer. Multi-process architecture, despite the inherent benefits like increased application robustness, is really just a stepping stone towards being able to do more interesting things.

With the premise of a massively-concurrent system with minimized costs and a consistent message passing system in place as axiomatic assumptions for future applications, we can begin to explore new possibilities that open up.

You see, after convergence, which allows us to forget that devices are somehow different from each other (without hobbling the user interface into stupidity in the process), we can start thinking about how to use all those devices as a single fabric, be they remote servers, you and your friend's phones, your phone and your boss' laptop .. be they located on the same or different network segments .. be they in a consistent or a dynamic set of states ..

Still, before plunging in that direction conceptually, there are useful insights remaining to be brought into this house of thought. Next I will be exploring methods of authentication.

Until then, consider this question: How many separate online accounts do you have?

(I bet almost everyone will get the answer to that one "wrong". *devilish smile*)
jepler (2 days ago):
green processes = in-process sandboxing, and that's the model we decided *didn't* work.

opportunities presented by multi-process architectures

So after stating that I was thinking about what comes "after convergence" I rambled on a bit about social networks and the cloud before veering wildly towards applications written as sets of processes, or multi-process architectures (MPA). I wrote about the common benefits of MPA and then the common downsides of them. Before trying to pull this full-circle back to the original starting point all those blog entries ago, I need to say a few things about opportunities presented by MPA systems that are rarely taken advantage of and why that is.

I don't want to give the impression that I think I've come up with the ideas in this blog entry. Most already exist in production somewhere in the world out there. There are certainly great academic papers to be found on each of the topics below. It just seems to be a set of topics that is not commonly known, thought about or discussed, particularly outside the server room. So nothing here is new, though I suspect at least some of the ideas might be new to many of the readers.

Before getting into the topics at hand, I'd also like to give a nod to the Erlang community. When I started toying about with the questions that inspired this line of thought, I went looking for solutions that already existed. There almost always is someone, somewhere working on a particular way to approach a given problem. This is the beauty of a global Internet with so many people trained to work with computers. While looking around at various things, Erlang caught my eye, as it is designed for radical MPA applications. While messing around with Erlang I began to understand just how desperately little most applications were getting out of the MPA pattern, and it helped illuminate new possible paths of exploration to tough challenges we face, such as actually getting the most out of "convergence" when the reality is that people have multiple devices (if only because each person has their own device). It's pretty amazing what many of the people in that community are up to .. for instance:

[embedded video: a live hot-code-upgrade demonstration, referenced below]

Hot Upgrades

May as well start with this one since I posted that video above; that way I can pretend it was a clever segue instead of just nerd porn. ;) Typically when we think of upgrading an application, we think of upgrading all of it. We also generally expect to have to restart the application to get the benefits of those upgrades. Our package managers on Linux often tell us that explicitly, in fact.

There are exceptions to this. The most obvious example is plugins: you can change a plugin on disk and, at least on next start, you get new functionality without touching the rest of the installed application components. When the plugins are scripted, it becomes possible to do this at runtime. 

In fact, sometime between Plasma version 4.2 and 4.4 I worked up a patch that reloaded widgets written in Javascript when they changed on disk. With that patch, the desktop could just continue on and pieces could be upgraded here or there. I never merged it into the mainline code as I wasn't convinced it would be a commonly useful thing and there was some overhead to it all.

Generally, however, applications are upgraded monolithically. This is in large part because componentization, MPA or not, is not as common as it ought to be. It's also in part because components rely on interfaces and too many developers lack the discipline to keep those interfaces consistent over time. (The web is the worst for this, but every time I see Firefox checking plugins after a minor version upgrade I shake my head.) It also doesn't fit very nicely into the software delivery systems we typically use, all of which tend to be built around the application level in granularity.

Those are all solvable issues. What is less obvious is how to handle code reloading at runtime when the system is in active use. When upgrading a single component while the application is running, how does one separate it safely from the rest of the running code to perform an in-place update of the code to be executed?

With an MPA approach the answer seems fairly straight-forward: bring up a new process, switch to using the new process, kill the old one. Since the processes are already separated from each other, upgrading one component on the fly ought to be possible without fiddling with the rest of the application. It's less straight-forward in practice, however, as one needs to handle any tasks the old process is still processing, any pending messages sitting in the old process' queue (or whatever is being used for message passing), and hand over all messaging connections to the new process. It is possible, though, as the video above demonstrates, but even in Erlang, where this was designed into the system, one needs to approach it with a measure of thoughtfulness.
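
The core of Erlang's mechanism is disarmingly small: the VM keeps up to two versions of a module loaded, and a fully qualified recursive call jumps to the newest one. A minimal sketch (the module and its messages are my own invention):

-module(counter).
-export([start/0, loop/1]).

start() -> spawn(counter, loop, [0]).

loop(N) ->
    receive
        {count, From} ->
            From ! {value, N},
            %% The fully qualified call (counter:loop, not plain loop)
            %% picks up the newest loaded version of this module, so
            %% recompiling and reloading upgrades the running process
            %% in place, state intact.
            counter:loop(N + 1)
    end.

Reload the module with l(counter). in the Erlang shell and the running process upgrades at its next message; draining queues and handing over connections, as noted above, is the part that still takes thought.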

This is an uncommon feature because it is not easy to get right and pretty well all the existing systems out there lack any support for it whatsoever. However, if we want applications that run "forever", then this needs to become part of our toolbox. Given that the Linux kernel people have been working on addressing this issue (granted, for them restarting the "application" means a device restart which is even more drastic than restarting a single application), our applications ought to catch up.

Process hibernation

Most MPA applications spin up processes and let them sit there until the process is finished being useful. For things like web and file servers using this design, the definition of "useful" is pretty obvious: when the client connection drops and/or the client request is done being handled, tear down the process. 

Even then there is a fly in the ointment: spinning up a process can take more time than one wants, and so perhaps it makes sense to just have a bunch of processes sitting there lying in wait to process incoming requests one after the other. Having just started apache2 on my laptop to check that it does what I remember it doing, indeed I have a half-dozen httpd-prefork processes sitting around doing nothing but wasting resources. The documentation says: "Apache always tries to maintain several spare or idle server processes, which stand ready to serve incoming requests. In this way, clients do not need to wait for a new child processes to be forked before their requests can be served."

Keeping processes around can also make it a lot easier when creating an MPA structure: spin up everything you need and let it sit around. This is a great way to avoid complex "partial ready" states in the application.

With multiple processes, one would hope it would be possible to simply hibernate a specific process on demand. That way it exists, for all intents and purposes, but isn't using resources. This keeps the application state simple and can avoid at least some of the process spin-up costs. Unfortunately, this is really, really hard to do with 100% fidelity. People have tried to add this kind of feature to the Linux kernel, but it has never made it in. One of the biggest challenges is handling files and sockets sensibly during hibernation.

This article on deploying node.js applications with systemd is interesting in its own rights, but also shows that people really want this kind of functionality. Of course, the article is just showing a way to stop and start a process based on request activity which isn't quite the same thing at all.

Surprise: Erlang actually has this covered. It isn't a magical silver bullet and you don't want to use it with processes that are under constant usage (it has overhead; this kind of feature necessarily always will), but it works reliably and indeed can help with resource usage.
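
Concretely, it is a single call: erlang:hibernate/3 throws away the process's call stack, garbage-collects it down to a minimal footprint, and parks it until the next message arrives, at which point execution resumes in the given function. A sketch with a hypothetical worker module:

-module(idle_worker).
-export([start/1, loop/1]).

start(State) -> spawn(idle_worker, loop, [State]).

loop(State) ->
    receive
        {work, From, Job} ->
            From ! {done, {Job, State}},   %% stand-in for real work
            %% Nothing to do until the next request: discard the call
            %% stack, shrink memory, and sleep. The next message
            %% resumes execution at idle_worker:loop(State).
            erlang:hibernate(idle_worker, loop, [State])
    end.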

Every time I look at all those processes in Akonadi just sitting there when they know they won't be doing a thing until the next sync or mail check I despair just a little. Every time I notice that Chrome has a process running for those tabs showing static content, chewing up 15-30 MB of memory each, I wish process hibernation was common.

Radical MPA

If you look through the code of Akonadi resources, such as the imap resource in kdepim-runtime/resources/imap, you will quickly notice that they are really rather big, monolithic applications. The imap resource spins up numerous finite state machines (though they often aren't implemented as "true" FSMs, but as procedural code that is setting and checking state-value-bearing variables everywhere) and handles a large number of asynchronous jobs within its monolith. As a result it is just over ten thousand lines of code, and that isn't even counting the imap or the Akonadi/KDE generic resource libraries it uses. That's a large chunk of code doing a lot of things.

That's kind of odd. Akonadi is an MPA application, but its components are traditional single-process job multiplexers. This is through no fault or shortcoming of Akonadi or its developers. In fact, I feel kind of bad for referencing Akonadi so often in these entries, because it is actually an extremely well done piece of software that has matured nicely by this point. It's just one of the few MPA applications in wide usage written for the desktop, and these kinds of warts can't be blamed on anything other than the underlying language and frameworks ... it's because Akonadi is good that I keep bringing it up as an example, as it highlights the limits of what is possible with the way we do things right now. Ok, enough "mea culpa" to the Akonadi team ... ;)

It would be very cool if the complex imap resource was itself an MPA system. State handling would dramatically simplify and due to this I am pretty sure a significant percentage of those 10,000+ lines of code would just vanish. (I took another drift through the code base of the imap resource this morning while the little one had a nap to check on these things. :)

Additionally, if it were "radically" MPA then a lot of the defensive programming could simply be dropped. Defensive programming is what most of us are quite used to: call a function, check for all the error states we can think of and respond accordingly, and only then handle the "good" cases. This is problematic, as humans are pretty bad at thinking of all possible error states once systems move beyond "trivial" in complexity. With radically MPA systems, however, each job can be handled by a separate process, and should something other than success happen it can simply halt. No state needs to be preserved or reset; at most, the other processes that are waiting for news back on how the job went may want to be informed that something went sideways so they can also either halt or continue on. (Yes, logging, too.) This not only makes the code much easier for humans to write, as one only needs to write for what is expected rather than try to list everything that is not expected, but it makes the code base radically smaller, as most of the "if (failureCondition)" branches that pepper our code today simply melt away.
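
A hedged sketch of what that looks like in Erlang (jobs and perform/1 are my own names): each job runs in its own process that only implements the success path, and the caller learns of failure through a monitor rather than through error-checking branches.

-module(jobs).
-export([run/1]).

run(Job) ->
    %% Run the job in its own process; a monitor reports how it ended.
    {Pid, Ref} = spawn_monitor(fun() -> exit({done, perform(Job)}) end),
    receive
        {'DOWN', Ref, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            {failed, Reason}   %% the crash reason; nothing to clean up
    end.

perform({add, A, B}) -> A + B.   %% only the known-good case is written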

This doesn't happen because processes are expensive, frameworks that can handle such radical MPA systems are hard to write, and few pre-made ones exist.

Supervision

With any MPA system, but particularly radically MPA applications, the opportunity for process supervision arises. Supervision is when one process watches other processes and decides their fate: when to start them, when to stop them, when (and how!) to restart them on failure.

I first became interested in the possibilities for system robustness due to supervision when systems such as systemd started coming together. systemd, however, falls wildly short of the possibilities. For what is perhaps the definitive example of supervision one ought to look to Erlang.

In Erlang applications, which are encouraged to be MPA, one defines a supervision tree. Not only can you have individual processes supervised for failure, but you can have nested trees of supervisors, each with a policy to follow when processes are created and when they fail. You can, for instance, tell the supervisor that a particular group of processes all rely on each other, and so should one fail they should all fail and be restarted. You can define timeouts for responsiveness, how many times to restart processes, and such things.
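
In code, such a tree is just a data structure returned from a callback. A hedged sketch of an OTP supervisor, with fetcher and parser as hypothetical child modules; one_for_all encodes exactly the "should one fail, they all fail and restart" policy, giving up after five restarts within ten seconds:

-module(app_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, app_sup}, app_sup, []).

init([]) ->
    RestartPolicy = {one_for_all, 5, 10},   %% strategy, max restarts, per seconds
    Children =
        [{fetcher, {fetcher, start_link, []}, permanent, 5000, worker, [fetcher]},
         {parser,  {parser,  start_link, []}, permanent, 5000, worker, [parser]}],
    {ok, {RestartPolicy, Children}}.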

This allows one to define the state of the full set of processes in a robust manner, from process execution through to finality. This is a key to robust, long-lived applications.

Processing non-locality

With the MPA model it is trivial to spread a task out across multiple machines. This is extremely common in the server room when it comes to large applications and as such is quite well understood. There are large numbers of libraries and frameworks that make this easier, from message queueing systems to service discovery mechanisms. The desirability of this is quite obvious: finishing large tasks quickly is often beyond the reach of individual machines. So put a bunch of them together and figure out how to make them work together on problems. Voila, problem solved. (Writing blogs is fun: you can make the amazingly difficult and complex appear in a puff of magic smoke with a simple "voila" .. ;)

Outside the server room this is hardly ever used, however. MPA systems aimed at non-server workloads tend to assume they all run on the same system. Well, we live in a different world than we did twenty years ago. 

People have multiple rather powerful computers with them. Right now I have my laptop, a tablet, two ARM devices and a smartphone. The printer sitting next to me isn't very powerful, but it runs a full modern OS as well.

We also routinely use services that exist on other machines, but instead of letting processes coordinate as we would if they were local we tend to blast data around in bulk or create complex protocols that allow the local machine to dictate to the remote machine what we'd like it to do for us. The protocols tend to result in multiple layers of indirection on the remote side: JSON over HTTP hits a web server which forwards the request to a bit of scripted code that interacts with the server to construct some sort of response which it then encodes into JSON and sends back over HTTP to be reconstituted on the local side ... and should the client ever want something ever so slightly different, too bad.

This makes using two end-user devices together really rather difficult. The byzantine client-server architecture ensures that it is non-trivial to do simple things and that significant pieces of software must be run on any devices one wishes to coordinate. People are aware of the annoyances: Apple has Handoff, KDE has KDE Connect, .. but what happens if I'm running Akonadi (that guy again!) on my Plasma Desktop system and I'd like it to be able to find results from data on my Plasma Active tablet? Yeah, not simple to solve .. unless Akonadi could run agents remotely as transparently as it can locally. That would mean more complexity in the Akonadi code base, and in every other MPA app that might benefit from this. As it is additional functionality and probably nobody has hit this use case yet, Akonadi is not capable of it, and the use case is likely never to be fulfilled once people do run into it.

It is the combination of thinking with blinders on ("JSON over HTTP.. merp merp!") and the difficulty level of "remote processes are the same as local processes" that prevents this from happening in places we could benefit from.

Process migration

This goes hand-in-hand with the remote process possibility, but takes it up a notch. In addition to having remote processes, how amazing would it be to be able to migrate processes between machines? This also has been done; it's a pretty common pattern in big multi-agent systems from what I understand. This carries a lot of the same requirements and challenges of process hibernation, and probably would only be possible with specially designated processes. Security is another concern, but not an insurmountable obstacle.

Along with lack of remote processes, this is probably the sole reason it is so hard to transfer the picture from your phone to your laptop: byzantine structures must exist on all systems to do the simplest of things, and every time we wish to do a new simple thing that byzantine structure needs an upgrade, typically on both sides. How awful. This is probably also my cue to moan about Bluetooth, but this blog entry is already long enough.

Security

Many MPA applications already get a security leg up by having their applications separated by the operating system. They can't poke around at each other's memory locations, crashing doesn't take down the whole system, etc. This is really just a few tips of what is really a monumental iceberg, however.

It ought to be possible to put each process in its own container with its own runtime restrictions. The Linux middleware people working on things like kdbus and systemd along with the plethora of containerization projects out there are racing to bring this to individual applications. MPA applications ought to take advantage of these things to keep attack vectors down while allowing each process in the swarm access to all the functionality it needs. 

Combined with radical MPA, this could be a significant boost: imagine all external input (e.g. from the network) being processed into safe, internal data structures in a process that runs in its own security context. Should something nasty happen, like a crash, it has access to nothing useful.

OpenSSH already went in this direction many years ago with privilege separation, but with modern containerization we could take this to a whole new level that would be of interest to many more applications.

Again, it means having the right frameworks available to make this easy to do.

... in conclusion

So we've now seen the common benefits of MPA, the common downsides of MPA and finally some more exotic possibilities which are rarely taken advantage of but would be very beneficial as common patterns.

Next we will look at how to bring this all together and attempt to describe a more perfect system which avoids the negatives and offers as many of the positives as possible.

The ultimate goal is completely robust applications with improved security and flexibility that can work more naturally (from a human's perspective) in multi-device environments and, perhaps, allow us to start tackling the more thorny issues of privacy and social computing.

Racing to writeness to wrongness leads


In the Perl world I’m mostly known as a guy who hacks on Perl 6 stuff. Less known is that outside of the Perl world, I spend a lot of my time with the .Net platform. C#, despite a rather uninspiring initial couple of releases, has escaped Java-think and grown into a real multi-paradigm language. It’s not Perl, but it’s certainly not unpleasant, and can even be a good bit of fun to work with nowadays. My work with it right now typically involves teaching, along with various mentoring and trouble-shooting tasks.

The Windows world has always been rather into threads – at least as long as I’ve been around. .Net is also, as a result. Want to do some computation in your GUI app? Well, better farm it off to a thread, so the main thread can keep the UI responsive to the user’s needs. Want to do network I/O? Well, that could probably use some asynchronous programming – and the completion handler will be run on some thread or other. Then the results will probably want marshaling somehow. (That used to hurt; now things are better.) Building a web application? Better learn to love threads. You don’t actually get any choice in the matter: having multiple request-processing threads is the unspoken, unquestioned, default in a web application on .Net.

Of course, just because threads are almost ubiquitous doesn’t mean the average developer – or even the above-average developer – gets things right. A bunch of my recent trouble-shooting gigs have boiled down to dealing with a lack of understanding of multi-threaded programming. “So, we embed this 80s library in our web application, but things tend to crash under load.” “How do you deal with the fact that 80s library likely isn’t threadsafe?” “It…what?” “Oh, my…”

So anyway, back to Perl 6. Last year, we managed to get Rakudo on the JVM. And, given we now ran on a VM where folks deploy heavily threaded software every day, and with no particular progress to be seen on concurrency in Perl 6 for years, I did what I usually seem to end up doing: get fed up with the way things are and figure I should try to make them better. Having spent a bunch of years working with and teaching about parallel, asynchronous, and concurrent programming outside of the Perl world, it was time for worlds to collide.

And oh hell, collide they do. Let’s go word counting:

my %word_counts;
for @files -> $filename {
    for slurp($filename).words {
         %word_counts{$_}++;
    }
}

OK, so that’s the sequential implementation. But how about one that processes the files in parallel? Well, it seems there are a bunch of files, and that seems like a natural way to parallelize the work. So, here goes:

my %word_counts;
await do for @files -> $filename {
    start {
        for slurp($filename).words {
            %word_counts{$_}++;
        }
    }
}

Here, start creates a Promise, which is kept when the code inside of it completes. That work is scheduled to be done on the thread pool, and the code calling start continues onward, moving on to create another Promise for the next file. Soon enough, the thread pool's input queue is nicely occupied with work, and threads are chugging through it. The loop is in a context that means it produces results (the Promise objects) thanks to our use of the do keyword. We give them to await, which waits for them all to get done. Perfect, right?

Well, not so fast. First of all, are hashes thread safe? That is, if I try to write to a hash from multiple threads, what will happen? Well, good question. And the answer, if you try this out on Rakudo on JVM today, is you’ll end up with a hosed hash, in all likelihood. OK. Go on. Say what you’re thinking. Here’s one guess at a common response: “But…but…but…OH NO WE MUST MAKE IT WORK, this is Perl, not Java, dammit!” Well, OK, OK, let’s try to make it work…

So the hash ain’t threadsafe. Let’s go put implicit locking in hash access. We’ll slow down everything for it, but maybe with biased locking it won’t be so bad. Maybe we can build a smart JIT that invalidates the JITted code when you start a thread. Maybe escape analysis will save the common cases, because we can prove that we’ll never share things. Maybe we can combine escape analysis and trace JIT! (Hey, anybody know if there’s a paper on that?) Heck, we gotta build smart optimizations to make Perl 6 perform anyway…

So anyway, a patch or two later and our hashes are now nicely thread safe. We’re good, right? Well, let’s run it and…ohhhh…wrong answer. Grr. Tssk. Why, oh why? Well, look at this:

%word_counts{$_}++;

What does the post-increment operator do? It reads a value out of a scalar, gets its successor, and shoves the result in the scalar. Two threads enter. Both read a 41. Both add 1. Both store 42. D’oh. So, how do we fix this? Hm. Well, maybe we could make ++ take a lock on the scalar. Now we’re really, really going to need some good optimization, if we ever want tight loops doing ++ to perform. Like, inlining and then lifting locks…if we can get away with it semantically. Or one of the tricks mentioned earlier. Anyway, let’s suppose we do it. Hmm. For good measure, maybe we’d better ponder some related cases.

%word_counts{$_} += 1;

Not idiomatic here, of course, but we can easily imagine other scenarios where we want something like this. So, we’d better make all the assignment meta-ops lock the target too…uh…and hold the lock during the invocation of the + operator. Heck, maybe we can not do locks in the spin-lock or mutex sense, but go with optimistic concurrency control, given + is pure and we can always retry it if it fails. So, fine, that’s the auto-increment and the assignment meta-ops sorted. But wait…what about this:

%word_counts{$_} = %word_counts{$_} + 1;

Well, uhh…darn. I dunno. Maybe we can figure something out here, because having that behave differently than the += case feels really darn weird. But let’s not get bogged down with side-problems, let’s get back to our original one. My hash is thread safe! My ++ is atomic, by locks, or some other technique. We’re good now, aren’t we?

Nope, still not. Why? Turns out, there’s a second data race on this line:

%word_counts{$_}++;

Why does this work when we never saw the word before? Auto-vivification, of course. We go to look up the current scalar to auto-increment it. But it doesn’t exist. So we create one, but we can’t install it unless we know it will be assigned; just looking for a key shouldn’t make it come to exist. So we put off the installation of the scalar in the hash until it’s assigned. So, two threads come upon the word “manatee”. Both go and ask the hash for the scalar under that key. Access to the hash is already protected, so the requests are serialized (in the one-after-the-other sense). The hash each time notices that there’s no scalar in that slot. It makes one, attaches to it the fact that it should be stored into the hash if the scalar is assigned to, and hands it back. The ++ sees the undefined value in the scalar, and sticks a 1 in there. The assignment causes the scalar to be bound to the hash…uh…wait, that’s two scalars. We made two. So, we lose a word count. Manatees may end up appearing a little less popular than dugongs as a result.

[image: hue-manatee]

How do we fix this one? Well, that’s kinda tricky. At first, we might wonder if it’s not possible to just hold some lock on something for the whole line. But…how do we figure that out? Trying to work out a locking scheme for the general case of auto-viv – once we mix it with binding – feels really quite terrifying, as this REPL session reveals:

> my %animals; my $gerenuk := %animals<gerenuk>; say %animals.perl;
().hash
> $gerenuk = 'huh, what is one of those?'; say %animals.perl;
("gerenuk" => "huh, what is one of those?").hash

So, what’s my point in all of this? Simply, that locking is not just about thread safety, but also about the notion of transaction scope. Trying to implicitly lock stuff to ensure safe mutation on behalf of the programmer means you’ll achieve thread safety at a micro level. However, it’s very, very unlikely that will overlap with the unspoken and uncommunicated transaction scope the programmer had in mind – or didn’t even know they needed to have in mind. What achieving safety at the micro level will most certainly achieve, however, is increasing the time it takes for the programmer to discover the real problems in their program. If anything, we want such inevitably unreliable programs to reliably fail, not reliably pretend to work.

I got curious and googled for transaction scope inference, wondering if there is a body of work out there on trying to automatically figure these things out. My conclusion is that either it’s called something else, I’m crap at Google today, or I just created a thankless PhD topic for somebody. (If I did: I’m sorry. Really. :-) My hunch is that the latter is probably the case, though. Consider this one:

while @stuff {
    my $work = @stuff.pop;
    ...
}

Where should the implicit transaction go here? Well, it should take in the boolification of @stuff and the call to pop. So any such general analysis is clearly inter-statement, except that we don’t want to hard-code it for boolification and popping, so it’s interprocedural, but then method calls are late-bound, so it’s undecidable. Heck, it’d be that way even in boring Java. With Perl you can go meta-programming, and then even your method dispatch algorithm might be late bound.

At this point, we might ponder software transactional memory. That’s very much on-topic, and only serves to reinforce my point: in STM, you’re given a mechanism to define your transaction scope:

my %word_counts;
await do for @files -> $filename {
    start {
        for slurp($filename).words {
            # THE FOLLOWING IS IMAGINARY SYNTAX. No, I can't 
            # hack you up a prototype down the pub, it's *hard*!
            atomic { %word_counts{$_}++ }
        }
    }
}

This looks very nice, but let’s talk about the hardware for a bit.

Yes, the hardware. The shiny multi-core thing we’re trying to exploit in all of this. The thing that really, really, really, hates on code that writes to shared memory. How so? It all comes down to caches. To make this concrete, we’ll consider the Intel i7. I’m going to handwave like mad, because I’m tired and my beer’s nearly finished, but if you want the gory details see this PDF. Each core has a Level 1 cache (actually, two: one for instructions and one for data). If the data we need is in it, great: we stall for just 4 cycles to get hold of it. The L1 cache is fast, but also kinda small (generally, memory that is fast needs more transistors per byte we store, meaning you can’t have that much of it). The second level cache – also per core – is larger. It’s a bit slower, but not too bad; you’ll wait about 10 cycles for it to give you the data. (Aside: modern performance programming is thus more about cache efficiency than it is about instruction count.) There’s then a level 3 cache, which is shared between the cores. And here’s where things get really interesting.

As a baseline, a hit in the level 3 cache is around 40 cycles if the memory is unshared between cores. Let’s imagine I’m a CPU core wanting to write to memory at 0xDEADBEEF. I need to get exclusive access to that bit of memory in order to do so. That means before I can safely write it, I need to make sure that any other core with it in its caches (L1/L2) tosses what it knows, because that will be outdated after my write. If some other core shares it, the cost of obtaining the cache line from L3 goes up to around 65 cycles. But what if the other core has modified it? Then it’s around 75 cycles. From this, we can see that pretty much any write to shared memory, if another core was last to write, is going to be incurring a cost of around 75 cycles. Compare that to just several cycles for unshared memory.

So how does our approach to parallelizing our word count look in the light of this? Let’s take a look at it again:

my %word_counts;
await do for @files -> $filename {
    start {
        for slurp($filename).words {
            %word_counts{$_}++;
        }
    }
}

Locks are just memory, so if we inserted those automatically – even if we did work out a good way to do so – then taking the lock is a shared memory write. That’s before we go updating the memory associated with the hash table to install entries, and the memory of the scalars to update the counts. What if we STM it? Even if we keep modifications in a local modification buffer, we still have to commit at some point, and that’s going to have to be a write to shared memory. In fact, that’s the thing that bothers me about STM. It’s a really, really great mechanism – way superior to locks, composable, and I imagine not too hard to teach – but its reason for existing is to make writes to shared memory happen in a safe, transactional, way. And it’s those writes that the hardware makes costly. Anyway, I’m getting side-tracked again. The real point is that our naive parallelization of our program – even if we can find ways to make it work reliably – is a disaster when considered in the light of how the hardware works.

So, what to do? Here’s an alternative.

# Produce a word counts hash per file - totally unshared!
my @all_counts = await do for @files -> $filename {
    start {
        my %word_counts;
        for slurp($filename).words {
            %word_counts{$_}++;
        }
        %word_counts
    }
}

# Bring them together into a single result.
my %totals;
for @all_counts {
    %totals{.key} += .value;
}
say %totals.elems;

Those familiar with map-reduce will probably have already recognized the pattern here. The first part of the program does the work for each file, producing its own word count hash (the map). This is completely thread local. Afterwards, we bring all of the results together into a single hash (the reduce). This is doing reads of data written by another thread, of course. But that’s the cheaper case, and once we get hold of the cache lines with the hash and scalars, and start to chug through it, we’re not going to be competing for it with anything else.

Of course, the program we get at the end is a bit longer. However, it’s also not hard to imagine having some built-ins that make patterns like this shorter to get in place. In fact, I think that’s where we need to be expending effort in the Perl 6 concurrency work. Yes, we need to harden MoarVM so that you can’t segfault it even if you do bad things. Yes, we should write a module that introduces a monitor keyword; a monitor is a class that automatically takes a lock around each of its method calls:

monitor ThreadSafeLoggingThingy {
    has @!log;

    method log($msg) {
        push @!log, $msg;
    }

    method latest($n) {
        $n < @!log
            ?? @!log[*-$n .. *]
            !! @!log[]
    }
}

Yes, we should do an Actor one too. We could even provide a trait:

my @a is monitor;

Which would take @a and wrap it up in a monitor that locks and delegates all its calls to the underlying array. However, by this point, we’re treading dangerously close to forgetting the importance of transaction scope. At the start of the post, I told the story of the hopelessly unsafe calls to a legacy library from a multi-threaded web application. I had it hunted down and fixed in a morning because it exploded, loud and clear, once I started subjecting it to load tests. Tools to help find such bugs exist. By contrast, having to hunt bugs in code that is threadsafe, non-explosive, but subtly wrong in the placing of its transaction boundaries, is typically long and drawn out – and where automated tools can help less.

In closing, we most certainly should take the time to offer newbie-friendly concurrent, parallel, and asynchronous programming experiences in Perl 6. However, I feel that needs to be done by guiding folks towards safe, teachable, understandable patterns of a CSP (Communicating Sequential Processes) nature. Perl may be about Doing The Right Thing, and Doing What I Mean. But nobody means their programs to do what the hardware hates, and the right thing isn’t to make broken things sort-of-work by sweeping complex decisions on transaction scope under the carpet. “I did this thing I thought was obvious and it just blew up,” can be answered with, “here’s a nice tutorial on how to do it right; ask if you need help.” By contrast, “your language’s magic to make stuff appear to work just wasted days of my life” is a sure-fire way to get a bad reputation among the competent. And that’s the last thing we want.



Evaluating Free Software for procurement


When you’re a public body, how do you evaluate Free Software solutions, and how do you procure them? Recently I’ve been getting this question fairly regularly. Here are the main resources I point people to.

First stop: a guideline from the European Commission on “public procurement of open source software”. This answers most of the fundamental questions, such as “is it ok for us to just download a program?” (yes), or “How can we specify that we really want Free Software and Open Standards?”. When it comes to evaluating potential solutions, the EC guideline is pretty curt.

The UK government has produced an “Open Source procurement toolkit”. This is a very useful resource. It highlights Open Standards and the need to avoid lock-in. The documents in the toolkit are clearly structured and well written.

They sometimes make the common mistake of describing “open source” and “commercial software” as opposites. Lots of commercial companies that have successfully built their business around Free Software would beg to differ. But this is a minor quibble with a generally very useful resource.

So let’s say you’re putting out a call for tender for a Free Software-based solution. How do you evaluate and compare the different bidders? Here, the Swedes have some helpful advice for you. In early 2011, one of Sweden’s two national procurement agencies launched a number of Free Software framework contracts. They specified some pretty detailed criteria for evaluating suppliers. Some of them are pretty nifty: In one example, bidders can score the highest number of points if they have committed code to a project, and the project has accepted and integrated it.

The original documents are in Swedish (of course), and have been translated into German. If you read neither language, this presentation by Daniel Melin has a pretty good overview. You’ll also want to check out this write-up of Sweden’s and other countries’ public sector approaches to procuring Free Software.

Thankfully, there are many other useful resources on this topic. If you want to see your favourite one included here, please get in touch!
