First things first

Before we do anything else, we start a repository. Nothing fancy: I’ll use some default GitHub presets, along with my standard bits and pieces.

A lawyer once told me that Apache 2 is the sensible default license so we’ll start with that. I’ll be the only contributor to this repository, and I can freely relicense in the future as long as that’s the case.

I think everyone who’s been in the game for a while has their own standard starting repository configuration. I try to hew pretty closely to the “happy path” of my tools, since it’s easier to start on the happy path and refactor away from it than to start somewhere else and try to get back.

The first choice in any project is the language, which is closely tied to the second question, whether to work top-down or bottom-up. I’m a bottom-up guy, and the legitimate hotness for low-level systems programming these days is Rust. I don’t know Rust very well. I’ve written maybe a couple of thousand lines total, but most programming languages are similar enough that after your first twenty or thirty they start to rhyme nicely.

Rust provides a project source and build manager called Cargo, and I’ll use that to set up the world.

cargo init .

Cargo sets this up as a binary project. That might or might not be right for us; we’ll see as we go.

Cargo follows the convention that a project starts in a src directory. That’s not my usual style, but we’ll roll with it while we get our bearings.
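For the curious, here is a rough sketch of what a binary project's entry point in src/main.rs can look like. A fresh cargo init gives you a main function that prints a greeting; the greeting helper below is my own hypothetical addition, not something Cargo generates. I've added it to show that logic kept in plain functions is easy to move into a library later if the binary layout turns out to be the wrong call.

```rust
// src/main.rs: the entry point of a binary crate.
// `greeting` is a hypothetical helper added for illustration;
// Cargo's generated file just prints the message directly from main.
fn greeting() -> &'static str {
    "Hello, world!"
}

fn main() {
    // Keep main thin: it only wires the helper to standard output.
    println!("{}", greeting());
}
```

Running cargo run from the project root should build this and print the greeting.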


This blog post corresponds to repository state post_02

Lunar metadata: This is an expansion phase; the scope of the codebase grows.


Let's Make a Database

Call me Barzai (my IRL name is easily found, but we’re internet friends so let’s use internet names; if you are polite you will refrain from doxing). I’m a software engineer, classically trained, and have decided to spread some experience.

I’m doing this for a few reasons, both altruistic and selfish, public and private. But the big ones are:

  • There are a lot more people who need to know practical systems engineering than are teaching it,
  • The quality of online tutorial resources that are narrative (not just complete projects) and project-sized (neither just two or three files nor impossibly huge) is very poor, and
  • When people are bad at this, the costs of their unskillful work fall on others, including sometimes on me.

So we’re going to make a database.

This isn’t going to be a big fancy SQL monster like Oracle, or even something comparable to open-source databases like MySQL or Postgres. This is going to be something small and a little bit silly. But it’s going to show you some of the things that I’ve learned in my decades as a software engineer that might help you brush up your systems engineering chops, or just be a fun romp through some styles of code that most people don’t get to play with.

On the way we might hit networking, parsing, language design, test methodology, polyglot codebases, performance optimization, design philosophy, and a million other things. I think it will be fun!

Here are some ground rules:

  • I’m using free tools where practical. It should be possible to reproduce this work without spending more than a few dollars – none at all, if possible – because this is educational material and if cost is a barrier to education then we are all worse off in the end.
    • Of course cost should not be a barrier to most anyone in a position to use this tutorial, but “should” and $5 buys a mediocre coffee these days.
    • Conversely, none of the “free tools” are really quite free. If you gain profit by a free tool, it would be proper to donate a bit to the maintainers, wouldn’t it?
  • I’m deliberately choosing an unfamiliar language. I’ll be using Rust for this project, even though my own background is C, C++, and Python (and Java. And Haskell. And XQuery. And…).
    • This is because stumbling through bad code and unfamiliar toolchains is a core part of systems engineering. If I were working in C++ this would be too easy for me and I would unconsciously gloss over details.
    • If you say “I am uncomfortable writing code in an unfamiliar language” then this is the wrong level for you. In my view, mature systems programming begins where programming languages start to blur together, which for most people seems to be after 8-10 years of full-time software work in three different languages.
  • I’m going to call my shots – I will make design decisions and choices of tools explicitly. This will force me to make some of my implicit knowledge explicit.
  • This will be narrative – it will consist of reasonably PR-sized chunks, interspersed in places with amusing stories, as one would experience the growth of such a codebase over time.
    • Concretely, each post here will correspond to a git revision. It will be possible to see the state of the code “at the time of” the relevant design decision.
  • This will be subjective – I have a point of view and an attitude, solecisms and idiosyncrasies, ethics and outright curmudgeonly antique habits of thought and action; I hope that by making my own point of view clear, you can see how you might make your own personal style an asset in your own development.
  • This will have approximately weekly posts describing an approximately daily pace – each post will resemble a day’s work for a mature (not super-high velocity) developer. An old rule of thumb states that a developer can average fifty correct lines of code per day.
    • There will be posts here with more than fifty lines – sometimes considerably more. Those will contain correspondingly many defective lines of code.
    • There will be posts here with zero lines – design days, watercooler storytelling days, and so forth.
  • There will be sarcasm and swearing. If you don’t associate software engineering with sarcasm and swearing, you probably aren’t quite the target audience; come back in a couple of years when your mental health is worse.

If you agree, let’s begin…


This blog post corresponds to repository state post_01