Systems Programming

How do we define systems programming?

Wikipedia and friends will tell you that the term “Systems Programming” refers to software that reaches low-level components of the technology stack in order to enable other software to be more capable and performant. And that’s true as far as it goes, but it doesn’t sing.

I would say, rather: Systems programming is when we enable users to imagine that a computer is something other than a computer.

Consider the four classic systems programming examples: Operating system, compiler, database, network protocol. Each of these presents its users with an imaginary, simpler computer. The compiler, for instance, lets you imagine that computers have structured data and polymorphism and lists and things, even though we know that they only have flat arrays of bytes. A network protocol lets you imagine that things like streams exist, or that your NIC talks IP and that IP networks make connections, or that datagrams arrive error-free or not at all.
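To make the compiler's illusion concrete, here is a minimal Rust sketch of one structured value next to the flat bytes underneath it. The Point type and the to_bytes/from_bytes helpers are hypothetical, and the little-endian, x-then-y encoding is one chosen by hand for illustration. It is not a claim about how rustc actually lays the struct out in memory.

```rust
/// A structured value, as the compiler lets us imagine it.
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Flatten a Point into a plain byte array: x then y, each little-endian.
/// (An explicit encoding we chose, not rustc's in-memory layout.)
fn to_bytes(p: &Point) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&p.x.to_le_bytes());
    out[4..].copy_from_slice(&p.y.to_le_bytes());
    out
}

/// Recover the imaginary structure from the real bytes.
fn from_bytes(b: &[u8; 8]) -> Point {
    let x = i32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    let y = i32::from_le_bytes([b[4], b[5], b[6], b[7]]);
    Point { x, y }
}

fn main() {
    let p = Point { x: 1, y: -2 };
    let bytes = to_bytes(&p);
    // prints "Point { x: 1, y: -2 } <-> [1, 0, 0, 0, 254, 255, 255, 255]"
    println!("{:?} <-> {:?}", p, bytes);
    assert_eq!(from_bytes(&bytes), p);
}
```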

(Of course we also work in the realm of imagination. When I do systems programming I make transistors do backflips for me even though I got a C in microelectronics, because a microprocessor presents me with an imaginary view of its basquillions of bipolar junctions and metal oxide tunneling whatnots. This is the essence of computer engineering: Make simplified models of things work as if they were real.)

By allowing users to work in this realm of imagination, we make them more capable, we make their code less error-prone, we make their code make sense. We turn their fifty correct lines of code per day into fifty better, more powerful, more graceful lines. Fifty lines of Python today can do more than the menagerie of odd geniuses at Data General could do in a year.

I like databases because they touch on all of the flavors of systems engineering. Good databases have optimizing transpilers in them, and they work operating system filesystem abstractions to the breaking point, and they frequently involve high-throughput IPC that necessarily resembles networking. A good database rings all the bells.

(The downside is that the conventional user-facing interface for a DB is SQL, which is just terrible. Because this project is deliberately toy-like, we will simply fix that by not using SQL as our interface. When I am choosing, I do not need to choose terrible things. More on this later.)

Structural Implications

It follows that every systems programming project has at least a front end – the imaginary interface that will bring power and joy to our users – and a back end – the place where this system meets its next lower imaginary.

In practice both of these ends are super irritating. The front end is focused on user tasks, and if we wanted to think about user tasks we would have become users. The back end is focused on banging on imaginary raw metal, and we also are not drummers. Everything cool happens in between – the middle end.
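The front/middle/back split can be sketched in a few lines of Rust. This is a hypothetical toy, not this project's code: the front end parses sums like "1 + x + 2" into a list of terms (the IR), the middle end does the fun part by folding the constants together, and the back end emits text for the next layer down. All names here (Term, front_end, middle_end, back_end) are made up for illustration.

```rust
/// IR: a sum is just a list of terms.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Const(i64),
    Var(String),
}

/// Front end: user-facing syntax ("1 + x + 2") into IR.
fn front_end(src: &str) -> Result<Vec<Term>, String> {
    src.split('+')
        .map(str::trim)
        .map(|t| {
            if t.is_empty() {
                Err("empty term".to_string())
            } else if let Ok(n) = t.parse::<i64>() {
                Ok(Term::Const(n))
            } else {
                Ok(Term::Var(t.to_string()))
            }
        })
        .collect()
}

/// Middle end: fold every constant into a single trailing term.
fn middle_end(terms: Vec<Term>) -> Vec<Term> {
    let mut sum = 0i64;
    let mut out = Vec::new();
    for t in terms {
        match t {
            Term::Const(n) => sum += n,
            Term::Var(v) => out.push(Term::Var(v)),
        }
    }
    // Keep the constant unless it is a zero next to variables.
    if sum != 0 || out.is_empty() {
        out.push(Term::Const(sum));
    }
    out
}

/// Back end: emit for the next imaginary layer down (here, plain text).
fn back_end(terms: &[Term]) -> String {
    terms
        .iter()
        .map(|t| match t {
            Term::Const(n) => n.to_string(),
            Term::Var(v) => v.clone(),
        })
        .collect::<Vec<_>>()
        .join(" + ")
}

fn main() {
    let ir = front_end("1 + x + 2 + y").expect("parse failed");
    let optimized = middle_end(ir);
    println!("{}", back_end(&optimized)); // prints "x + y + 3"
}
```

Note that the ends are thin and boring while all of the interesting decisions live in middle_end, which is the point of the paragraph above.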

The Middle End

The term “middle end” has not caught on, so we usually use the compiler term “Intermediate Representation” (IR; formally a different thing but colloquialism reigns in this messy realm). In compilers, the IR is where all the fun math happens, the point where the user’s ideas of how a computer works have been stripped down to what they actually want and we get to figure out how to turn that into what will actually happen.

As such, the choice of a good IR is a matter of heavy, heavy theory. If this were a real company rather than Barzai banging on his Das Keyboard, we would employ whole-ass algebraists to do nothing but think brilliant thoughts all day. At one previous company we had an entire team of geniuses who mathed amazing math all day to deliver a 13% speedup every single month forever.

I am not a math super genius, and probably neither are you, so what we do is steal an IR from someone smarter. In the case of databases, this work was done by Codd, the great database theorist. Codd tells us that the right IR for a database is Relational Algebra.
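To give the pin something to hold, here is a minimal sketch of a relational-algebra IR in Rust: two of Codd's operators, selection and projection, over a scan of a base table. The Rel type, the eval function, and the string-only rows are hypothetical simplifications for illustration, not the IR this project will actually use. It also uses bag semantics (duplicate rows are kept, as in SQL) rather than the set semantics of textbook relational algebra.

```rust
use std::collections::HashMap;

/// A row maps column names to values (strings only, to keep the toy small).
type Row = HashMap<String, String>;

/// Hypothetical relational-algebra IR.
#[derive(Debug)]
enum Rel {
    /// Leaf: every row of a named base table.
    Scan(String),
    /// Selection (sigma): keep rows where column == value.
    Select { input: Box<Rel>, column: String, value: String },
    /// Projection (pi): keep only the listed columns.
    /// Bag semantics: duplicate rows are NOT removed.
    Project { input: Box<Rel>, columns: Vec<String> },
}

/// Evaluate a plan against an in-memory catalog of tables.
fn eval(rel: &Rel, db: &HashMap<String, Vec<Row>>) -> Result<Vec<Row>, String> {
    match rel {
        Rel::Scan(name) => db
            .get(name)
            .cloned()
            .ok_or_else(|| format!("no such table: {name}")),
        Rel::Select { input, column, value } => Ok(eval(input, db)?
            .into_iter()
            .filter(|row| row.get(column) == Some(value))
            .collect()),
        Rel::Project { input, columns } => Ok(eval(input, db)?
            .into_iter()
            .map(|row| {
                columns
                    .iter()
                    .filter_map(|c| row.get(c).map(|v| (c.clone(), v.clone())))
                    .collect::<Row>()
            })
            .collect()),
    }
}

/// Helper to build a row from string pairs.
fn row(pairs: &[(&str, &str)]) -> Row {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    let mut db = HashMap::new();
    db.insert(
        "people".to_string(),
        vec![
            row(&[("name", "Ada"), ("city", "London")]),
            row(&[("name", "Bob"), ("city", "Paris")]),
        ],
    );
    // pi_name(sigma_city=London(people))
    let plan = Rel::Project {
        input: Box::new(Rel::Select {
            input: Box::new(Rel::Scan("people".to_string())),
            column: "city".to_string(),
            value: "London".to_string(),
        }),
        columns: vec!["name".to_string()],
    };
    let result = eval(&plan, &db).unwrap();
    println!("{result:?}"); // prints [{"name": "Ada"}]
}
```

Because a plan is just a tree of values, a middle end can rewrite it (for example, pushing a selection below a projection) before eval ever runs, which is what makes this a good IR.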

We’re going to get into that more – lots more – later. For now, put a pin in it and move on.

Implementation Strategies

For a large system with multiple layers, there is only one way to begin, and that is to build “a thread through the system” (h/t Joe Boykin) – a single example that touches every level.

This is probably a controversial statement. We could begin by writing interface design documents, for instance, or detailed specifications, or by implementing each layer one by one as a standalone artifact. However, those approaches are premature.

In particular: Interface design documents for a long project have the reverse-Conway's-law effect of dictating organizational structure forever. This will cause your organization to become brittle and hidebound and will tank morale. Don't commit to any interface until you have a prototype in hand to steer you away from bad ideas.

Implementing parts standalone requires a whole lot of work to reify their interfaces so they can stand alone, an effort which initially appears justified because it improves testability, but which never quite aligns with the tests you actually want to write.

Both of these approaches are much, much more valuable once you have even the most trivial of prototypes, because the prototype can provide proof-in-practice of the architecture – and, more to the point, falsify it.

Yes, AN ARCHITECTURE CAN BE FALSE.

An architecture is an assertion that a system structured thisaway can accomplish an unknown future task describable thataway. Of course that assertion can be false! We don’t even know the future task! It’s a miracle that assertion is ever true!

Your architecture has errors, and the cost of an error in architecture grows at least quadratically over the course of development. So we need to falsify our architecture rapidly and repeatedly.

A “thread through the system” is actually subject to ongoing revision, so it’s more than just a prototype. And it provides a whole lot less than a working system, so it’s a lot less than an MVP. It’s a point of departure for our journey into the realms of imagination. It is a cardboard box that we will imagine into a starship via crayons and scotch tape. It’s a mixed metaphor.
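As a concrete picture, here is a minimal sketch of what a thread through the system might look like in Rust. Every name in it (Plan, parse, optimize, execute, run, the "scan <table>" syntax, and the hard-coded greetings table) is hypothetical and is not this project's actual design. The point is only that every layer exists and handles exactly one case, so one example can flow end to end.

```rust
/// IR: the only plan the thread supports so far.
#[derive(Debug, PartialEq)]
enum Plan {
    ScanAll { table: String },
}

/// Front end: accepts exactly one query shape, "scan <table>".
fn parse(src: &str) -> Result<Plan, String> {
    let words: Vec<&str> = src.split_whitespace().collect();
    match words.as_slice() {
        ["scan", table] => Ok(Plan::ScanAll { table: table.to_string() }),
        _ => Err(format!("not supported yet: {src:?}")),
    }
}

/// Middle end: no optimizations yet; pass the plan through unchanged.
fn optimize(plan: Plan) -> Plan {
    plan
}

/// Back end: a single hard-coded table stands in for real storage.
fn execute(plan: &Plan) -> Result<Vec<&'static str>, String> {
    match plan {
        Plan::ScanAll { table } if table == "greetings" => Ok(vec!["hello", "world"]),
        Plan::ScanAll { table } => Err(format!("no such table: {table}")),
    }
}

/// One call that touches every layer.
fn run(src: &str) -> Result<Vec<&'static str>, String> {
    let plan = optimize(parse(src)?);
    execute(&plan)
}

fn main() {
    match run("scan greetings") {
        Ok(rows) => println!("{rows:?}"), // prints ["hello", "world"]
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Each stub is deliberately useless in the general case. Growing the thread means widening one layer at a time while keeping run() working, and any layer that refuses to widen gracefully is evidence against the architecture.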

This PR

This PR starts a skeleton main() that we will hang parsers off of as we build our first thread.

It brings in our first external dependency, the extensible command-line parser clap; we added it via:

cargo add clap --features derive,cargo

This blog post corresponds to repository state post_05

Lunar metadata: This is a contraction phase; the density of the codebase grows.