Writing a Threading Library

The first assignment for my Operating Systems class is to write a threading library for Linux. It doesn’t need to be comprehensive, but it should support most of the “basic” functions: init, create, wait, join, exit, and so on. It should also come with a mutex library for synchronization (which, incidentally, the threading library should use itself so it doesn’t trip over its own feet).

It’s one of the first programming assignments I’ve had in a while that I’ve really gotten into. The code itself is surprisingly straightforward, just nuanced. What I mean by that, I suppose, is that the algorithms for doing this aren’t too mentally taxing (yet), but the API I’m using (ucontext.h’s getcontext, setcontext, makecontext, and swapcontext functions) is completely foreign to me, as is the concept of signal handling in C. There’s an alternative (setjmp.h’s setjmp, longjmp, sigsetjmp, and siglongjmp functions) which is often used for more comprehensive user-level threading libraries (I originally assumed pthreads uses it, but Linux’s pthreads implementation, NPTL, is built on kernel threads rather than setjmp), but it’s a bit more difficult, and what I’m using now is “good enough”. I may still switch to setjmp, but only once I’ve gotten the current implementation working.
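To make the ucontext calls concrete, here is a minimal sketch of one context handing control back and forth with its caller. It is not the code from my library: the names task, run_basic_demo, and STACK_SIZE, and the 64 KiB stack size, are made up for illustration.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)  /* arbitrary size chosen for this sketch */

static ucontext_t main_ctx;     /* saved state of the caller */
static ucontext_t task_ctx;     /* saved state of the hypothetical "thread" */
static volatile int steps;      /* how far the task has progressed */

/* The task yields by swapping back to the caller. It is never resumed
 * after the second yield, so it never reaches the end of the function. */
static void task(void)
{
    steps = 1;
    swapcontext(&task_ctx, &main_ctx);
    steps = 2;
    swapcontext(&task_ctx, &main_ctx);
}

/* Returns 2 if both hand-offs happened in order, -1 otherwise. */
int run_basic_demo(void)
{
    void *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    steps = 0;
    if (getcontext(&task_ctx) == -1) {   /* fill in a valid starting state */
        free(stack);
        return -1;
    }
    task_ctx.uc_stack.ss_sp = stack;     /* the context runs on its own stack */
    task_ctx.uc_stack.ss_size = STACK_SIZE;
    task_ctx.uc_link = NULL;             /* acceptable only because task() never returns */
    makecontext(&task_ctx, task, 0);     /* entry point, zero arguments */

    swapcontext(&main_ctx, &task_ctx);   /* run the task until its first yield */
    int after_first = steps;
    swapcontext(&main_ctx, &task_ctx);   /* resume it until its second yield */
    int after_second = steps;

    free(stack);
    return (after_first == 1 && after_second == 2) ? 2 : -1;
}
```

Calling run_basic_demo() from main should return 2 if the two swaps behave as described. The stack and uc_link fields have to be filled in before makecontext is called.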

Anyway, ucontext. It abstracts the concepts that I need pretty well, but there are some leaks. For instance, valgrind, a popular program analyzer, reports a bunch of “interesting” memory reads and writes, plus a few uses of uninitialized memory. Even though the accesses are (I believe) completely valid (and the memory is initialized, dammit), the fact that contexts are being swapped and the code pointer is being thrown around all over the place seems to be throwing it for a loop. Maybe this is an issue with any threading library, though, and simply a limitation of our ability to analyze memory accesses. I don’t know.

The warnings reported by valgrind don’t seem to correspond to any functional problems. There are other, more substantial issues, however. For instance, the concept of a “thread” is something I’m inventing on top of these contexts (the quotes will be used from now on to distinguish between my simulated “threads” and the operating system’s understanding of a thread). Because of this, I ran into an issue with my program terminating. As it turns out, when setcontext switches to a context, and that context reaches the end of the function it was told to run with no successor context to fall back to, the thread exits. And not my “thread”, either: the actual kernel-recognized thread my application is running in. Oops.

It was reasonably easy to fix. You’re allowed to give a context another context to return to when it’s finished running (the uc_link field). This is similar to a callback function, only it requires more memory and yet more code to initialize the context. And I have to create one for each “thread” (more on this later). So, I make a context which starts running a function that turns the currently-executing “thread” into a zombie and re-runs the scheduler. I would have needed this anyway. It would have been nice not to have my entire program exit just because some context returned, but I can see why the API was designed this way: it’s not entirely clear what a context should do when it’s finished. You’re not really running multiple programs at once using these contexts; you’re simply saving a context’s state, then swapping to it when a scheduler is run. How should a context know what other context to go to when it’s done executing unless you specifically tell it? I’m going to have to extend this zombie function later, though. Once I implement wait, join, and the like, a “thread” will have different things to do when it terminates. For instance, when a “thread” is joined, it should clean itself up entirely and hand its return value back to the joiner. When it’s waited on, it should yield back to the waiter. And finally, when nobody seems to care about it, it has to sit around in a zombie state until the program exits, just in case another piece of code ever ends up referring to it later.
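A rough sketch of that fix follows, with a per-“thread” exit context wired in as the successor. The struct layout and the names zombify, init_ctx, and run_zombie_demo are invented for this example and are not my library’s actual API. The real scheduler is replaced by a single saved context.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)  /* arbitrary size chosen for this sketch */

enum state { RUNNING, ZOMBIE };

/* Hypothetical per-"thread" bookkeeping. */
struct thread {
    ucontext_t ctx;        /* runs the thread's function */
    ucontext_t exit_ctx;   /* successor: runs zombify() when the function returns */
    void *stack;
    void *exit_stack;
    volatile enum state state;
};

static ucontext_t sched_ctx;     /* stands in for the scheduler */
static struct thread *current;   /* the "thread" currently running */

/* A thread body that simply returns, which is the problematic case. */
static void body(void) { }

/* Runs in exit_ctx: mark the "thread" dead and go back to the scheduler. */
static void zombify(void)
{
    current->state = ZOMBIE;
    setcontext(&sched_ctx);
}

/* uc_stack and uc_link must be set before makecontext is called. */
static void init_ctx(ucontext_t *ctx, void *stack, ucontext_t *link,
                     void (*fn)(void))
{
    getcontext(ctx);
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = STACK_SIZE;
    ctx->uc_link = link;
    makecontext(ctx, fn, 0);
}

/* Returns 1 if the "thread" ended up a zombie and control came back here. */
int run_zombie_demo(void)
{
    struct thread t;
    t.state = RUNNING;
    t.stack = malloc(STACK_SIZE);
    t.exit_stack = malloc(STACK_SIZE);
    if (t.stack == NULL || t.exit_stack == NULL) {
        free(t.stack);
        free(t.exit_stack);
        return -1;
    }

    init_ctx(&t.exit_ctx, t.exit_stack, NULL, zombify);
    init_ctx(&t.ctx, t.stack, &t.exit_ctx, body);

    current = &t;
    swapcontext(&sched_ctx, &t.ctx);  /* body() returns, zombify() brings us back */

    int became_zombie = (t.state == ZOMBIE);
    free(t.stack);
    free(t.exit_stack);
    return became_zombie;
}
```

With uc_link left NULL on t.ctx, the return from body() would end the calling thread instead of coming back to run_zombie_demo().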

I mentioned earlier that I have to create a zombie context for every “thread” I make. I had hoped I could make one zombie context that they could all share, but my initial attempts to do so have all failed; I’ll return to it later. Essentially, a context captures not only the registers and stack in use at that point in time, but also where the code is currently being executed (i.e., the code pointer). So when a zombie “thread” switches to the shared context to zombify itself, it enters the function and continues down. I haven’t found a reasonable way to “reset” the context to its original state, so other zombies which later swap to this context find themselves past the cleanup and zombification steps.
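One approach that might sidestep the reset problem, and which I have not tried in my library, is to stop fighting the saved position and put the zombification step inside a loop. The shared function marks the current “thread” as a zombie, then uses swapcontext (rather than setcontext) to return to the scheduler, so the shared context is re-saved in the middle of the loop. The next “thread” that falls into it resumes there and loops around to the zombification step again. The sketch below assumes a trivial run-to-completion scheduler, and every name in it (reaper, init_ctx, run_shared_reaper_demo) is hypothetical.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)  /* arbitrary size chosen for this sketch */
#define NTHREADS 3

static ucontext_t sched_ctx;             /* stands in for the scheduler */
static ucontext_t reaper_ctx;            /* the single shared exit context */
static ucontext_t thread_ctx[NTHREADS];
static volatile int current;             /* index of the running "thread" */
static volatile int ran[NTHREADS];
static volatile int zombie[NTHREADS];

/* Each "thread" records that it ran, then returns into reaper_ctx. */
static void body(void)
{
    ran[current] = 1;
}

/* Shared exit function: it never returns. Each swapcontext re-saves
 * reaper_ctx inside the loop, so the next arrival loops back to the top. */
static void reaper(void)
{
    for (;;) {
        zombie[current] = 1;
        swapcontext(&reaper_ctx, &sched_ctx);
    }
}

/* uc_stack and uc_link must be set before makecontext is called. */
static void init_ctx(ucontext_t *ctx, void *stack, ucontext_t *link,
                     void (*fn)(void))
{
    getcontext(ctx);
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = STACK_SIZE;
    ctx->uc_link = link;
    makecontext(ctx, fn, 0);
}

/* Returns how many "threads" both ran and were zombified, or -1 on error. */
int run_shared_reaper_demo(void)
{
    void *stacks[NTHREADS + 1] = { NULL };
    int i, count = 0, ok = 1;

    for (i = 0; i <= NTHREADS; i++) {
        stacks[i] = malloc(STACK_SIZE);
        if (stacks[i] == NULL)
            ok = 0;
    }

    if (ok) {
        init_ctx(&reaper_ctx, stacks[NTHREADS], NULL, reaper);
        for (i = 0; i < NTHREADS; i++) {
            ran[i] = 0;
            zombie[i] = 0;
            init_ctx(&thread_ctx[i], stacks[i], &reaper_ctx, body);
        }
        /* Trivial scheduler: run each "thread" to completion in turn. */
        for (i = 0; i < NTHREADS; i++) {
            current = i;
            swapcontext(&sched_ctx, &thread_ctx[i]);
        }
        for (i = 0; i < NTHREADS; i++)
            count += (ran[i] && zombie[i]);
    }

    for (i = 0; i <= NTHREADS; i++)
        free(stacks[i]);
    return ok ? count : -1;
}
```

This only works if one “thread” at a time is inside the shared function, which holds here because nothing preempts it. With signal-driven scheduling, the zombification step would need to be protected (by the mutex, or by blocking the signal) before this could be trusted.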

Another possibility would be to create two zombie contexts. When one runs, it cleans up the other (which should theoretically no longer be in use, with proper mutex usage), reallocates the memory and creates a new, fresh zombie context, points the “current” zombie context to the newly-created one, and yields back to the scheduler. The next time a zombie is run, it uses the new context, frees and reallocates the (previously) old one, then points at that one again. I’ll try this approach whenever I have time.

Besides these issues, I’m proceeding quite quickly through the assignment. I’ll have to dedicate a lot of time to it over the weekend, but I’m not worried about the deadline, since I have nothing else due in that period (I do have one test, though). I’ve already implemented thread creation, scheduling, thread cleanup, and grabbing the current thread handle, which is trivial in my implementation, so I might as well have. (Note to self: I may want to use this function rather than referring to the current_thread pointer directly, just to be safe.) Only three of the several API functions I’m supposed to implement are finished (create, get current thread, and init), but I have most of the actual threading code written, so writing the other publicly-exported functions should be relatively minor tasks that simply get, poke, or manipulate the structures I’ve already set up using functions I’ve already written. At least, I hope it’s that simple.