Oct 14, 2019
Mu: A minimal hobbyist computing stack

It is far better to have an under-featured product that is rock solid, fast, and small than one that covers what an expert would consider the complete requirements.”
Richard Gabriel's best summary (pg 219) of his essay, “Worse is Better”

Over the past year I've been working on a minimal-dependency hobbyist computing stack (everything above the processor) called Mu. The goal is to:

  1. build up infrastructure people can enjoy programming on,
  2. using as little code as possible, so that people can also hack on the underpinnings, modifying them to suit diverse desires.

Conventional stacks kinda support 1 if you squint, but they punt on 2, so it can take years to understand just one piece of infrastructure (like the C compiler). It looks like nobody understands the entire stack anymore. I'd like Mu to be a stack a single person can hold in their head all at once, and modify in radical ways.

(While it should support understanding everything, you aren't expected to understand everything. Mu tries to reward curiosity, but should get out of the way when you're just trying to get something done.)

One implication of fitting in a single head: Mu constrains the number of supported languages. Languages have a way of growing into isolated universes, and interoperation between languages adds its own complexities. It seems better for future readers if the stack minimizes the number of such boundaries, even if writers are inconvenienced somewhat.

I eventually want to make it easy to swap out one kind of language for another, so that people can program in Lisp-like or Python-like or C-like syntax according to their taste. But regardless of which notations any given Mu computer has, it will be parsimonious in the number of notations readers have to learn to understand and take ownership of a single computer.

Each of these notations needs to be simple enough that it can be implemented out of lower-level notations. The fact that C compilers are written in C contributes a lot of the complexity that makes compilers black magic to most people.

A year in, it's surprising how inevitable the design has seemed. If the journey starts at a specific processor architecture and the goal is to minimize layers of translation above it while paying attention to dependencies, the final destination seems quite inevitable (ignoring minor syntactic choices). This discovery seems worth sharing more broadly, if only so others can prove me wrong and suggest alternative designs.


Since Mu looks quite different from conventional stacks, I need a few blog posts to cover all the ground. My plan is to give Mu just 3 distinct languages (only the first currently exists; the third is not even planned yet).

  • Level 1: just the processor's instruction set with some syntax sugar
  • Level 2: a memory-safe language
  • Level 3: an expressive high-level language

These levels don't quite map to familiar terms. Level 1 has some attributes of machine code and Assembly. Level 2 has some attributes of Assembly and very high-level languages.

An alternative view of this stack is the primary affordance each level provides:

  • Level 1: structured control flow
  • Level 2: strong typing
  • Level 3: mould the computer to how people think

Each of these affordances requires a global change to the programming model. Within each level I try to keep things as habitable as possible using just local syntax sugar (tools that don't need to understand the entire codebase).

The rest of this post focuses on level 1. The sequel describes a preliminary design for level 2. I won't get to level 3 until I finish building level 2. So far I've written 40k lines of code (LoC) in level 1, and 30k LoC in an earlier prototype that has some resemblance to level 2. Since I want each level to be habitable in isolation, it seems like a good idea to write a decent amount of code in it before moving on to higher levels. (This is a big difference from past approaches to bootstrapping, which try to do as little as possible in each level before jumping to higher levels. The lower levels inevitably end up being hard for newcomers to understand or interactively take apart.)

Level 1: SubX

At the lowest level my goal is to pick a processor and make the experience of programming in raw machine code not completely suck.

I mentioned above that each level builds on levels below, but that's a lie at this level. I don't enjoy editing binary and don't want my readers to have to either. Instead I have an alternative plan, with two prongs:

  • A C++ translator that converts a textual format to binary.
  • A self-hosted translator that converts the textual format to binary.

The C++ translator is more familiar, but pulls in some gigantic dependencies. The self-hosted translator will look strange but have a tiny codebase and a tiny surface area (just 3 OS syscalls, for example). Both translators will emit identical binaries, which is helpful when debugging the system.

Complete details for how to operate these tools are in the Readme. This post is a quick tour of the major design choices, but if it's interesting you should clone the repo and read the Readme in a text editor. Mu is really intended to be played with interactively, rather than passively read.

Ok, design choices. My computer runs x86, and it's the most open platform I have. So Mu is going to run on x86. It is explicitly not designed to be portable. So we'll start with a quick tour of x86. Here's what you need to know.

The x86 instruction set is 40 years old. It started out as an 8-bit processor, then turned into a 16-bit, 32-bit and now 64-bit processor. It has accumulated hacks and bolted on features over its history. To help it fit in my head, I've chosen a fairly regular subset of x86 for SubX (hence the name), focusing on just 32-bit values (and a couple of instructions for 8-bit bytes so that I can iterate over strings).


Machine code for any processor consists of linear sequences of instructions. All the nested block structure and nested calls of higher-level languages are gone by this point. All you have is long lists of instructions with the ability to conditionally skip some subsequences and run some subsequences repeatedly. (Really, instructions are just numbers, so all you have are long sequences of numbers.)

Every instruction in machine code starts with an opcode, some series of bits that specifies which instruction to run. Assembly languages then provide friendly names for opcodes like `add` and `jump`. In a complex instruction set like for the x86 processor, `add` maps to a whole family of opcodes, and there's significant logic to deduce the opcode for an instruction based on its arguments. It also takes a significant amount of code to provide good error messages. There are many reasonable-seeming combinations of operands that x86 doesn't let you add together, and it's hard to compress that knowledge into a short error message tailored to a specific situation.

SubX doesn't provide names for opcodes; you have to use the raw opcodes directly. This eliminates code for translating names to opcodes, and it also enormously simplifies error messages. The programmer needs to work from the list of opcodes, and each error needs to only handle a single case. In practice, this hasn't felt like a major hindrance, because understanding error messages returned by Assemblers often requires understanding the underlying opcodes anyway. I've been finding good error messages to be more valuable than syntactic conveniences. (I'd love feedback on this decision.)

To see the complete list of opcodes supported by SubX at any point in time, type:

$ bootstrap help opcodes


Each instruction operates on some small number of operands. In x86 instructions can't take more than 2 operands for the most part. Since there are only two operands, and since binary operations like addition are fundamental, most instructions both read and write from one operand:

  • Add operand o1 to o2;
  • subtract o1 from o2;
  • compute the bitwise `AND` of o1 and o2, and store the result in o1;
  • …and so on.

Operands may be stored in either registers or memory. Registers are some small number of named (really numbered) locations. The x86 processor has 8:

  • eax (0)
  • ecx (1)
  • edx (2)
  • ebx (3)
  • esp (4)
  • ebp (5)
  • esi (6)
  • edi (7)

Operands in memory are also usually specified using registers somehow:

  • `*eax`: value in memory at address provided in eax
  • `*(ecx+4)`: value in memory at address ecx+4
  • …and so on.

All this gets turned into numbers, but we don't need the details for this post. Consult the Readme for details.


One big problem programming in raw x86 machine code is that instructions are not all the same size. Instruction boundaries aren't aligned to every 4 bytes, or something like that. The processor is parsing future instruction boundaries as it reads in earlier instructions. It's really easy to accidentally add an extra byte or forget a byte to an instruction. When that happens, bytes that were intended to be opcodes can be interpreted as operands, and vice versa. The program goes silently off the rails, and may not show an error message until much later. The difficulty of debugging such errors is arguably the single biggest obstacle to a good programming experience.

This is the point at which Assembly languages nope out and go build a whole new syntax for themselves, categorizing opcodes into instruction names and so on. Since SubX needs to be self-hosted and every feature in it needs to be programmed in SubX, its solution to this problem is to stay with machine code, but add lots of space for metadata. Here's a sample instruction in SubX:

89/<- 5/rm32/ebp 4/r32/esp # copy esp to ebp 

Instructions lie all on one line. They consist of multiple words separated by whitespace. Each word contains a datum until the first slash. The datum is the only part that makes it into the binary. Everything in a word after the datum and the first slash is metadata. As you see, you can have multiple bits of metadata squirrelled away in a word, all separated by slashes.

Metadata is optional and often ignored, so it's a good place for little comments. However, certain special words for argument types trigger error-checking. In the above instruction, Mu ignores the `/<-` and reads just the ‘89’. However, it knows that opcode 89 expects a `/rm32` and `/r32` argument. If it fails to see one of them in the rest of the instruction, it immediately flags an error. If it sees any of the other known argument types in the instruction when it doesn't expect them—it immediately flags an error. (The register names `/ebp` and `esp` are just comments to aid the reader.)

$ bootstrap translate init.linux examples/ex1.subx -o a.elf
'bb/copy-to-ebx' (copy imm32 to EBX): missing imm32 operand

(The code samples in this post hide some details that aren't important on a first encounter with Mu. I'll elide the actual opcodes further down.)

While this use of metadata is specific to the properties of x86, they're a general mechanism for lots of different checks one may want to apply. I used them in several ways in a previous prototype. I've started to think of the structure of words and metadata as “s-statements”, by analogy with s-expressions. A fairly fundamental uniform syntax that can be used in diverse situations where one doesn't want arbitrary nesting.

Structured programming

As I mentioned above, the primary motivation for the SubX layer is to make working with control flow more ergonomic. The x86 processor tracks what instruction to execute using the special eip register. This register can't be modified by most instructions. Only jumps and calls modify it.

jump 237  # add 237 to eip 

It gets tedious to adjust the byte offsets every time you add or delete an instruction. So SubX provides named labels for specific points in the instruction stream, just like conventional Assembly languages. For example:

jump-if-equal $foo
$foo:  # jump to here 

jump-if-equal 237

Another convenience (not usually found in Assembly language) is the special labels `{` and `}`, `break` and `loop`.

   jump-if-equal break
   jump loop

   jump-if-equal $break1
   jump $loop1

Basically the `{` and `}` get translated into labels, and `break` gets translated into a jump to the enclosing `}`. Correspondingly, `loop` gets translated to a jump to the enclosing (earlier) `{`. This syntax is surprisingly ergonomic and proved surprisingly easy to teach to non-programmers during the Mu1 prototype. Where you would say, in an imperative language:

if (<predicate>) {

in SubX you would say:

<flag> = <predicate>
   jump-unless <flag> break


Idiomatic machine code programs consist of functions operating on data. No matter what your high-level language does, and no matter what processor it runs on, at bottom it's translated to some interplay between functions and data. This section gets into that interplay in some detail.

We keep a program's functions and data segregated in memory to a code segment and data segment, respectively. (There are a variety of historic reasons for this that are not interesting. There is also one currently topical reason that is very interesting: security. A segment is a contiguous block of memory that gets a single access restriction. Code segments can be executed from but not written to (after the OS loads them initially). Data segments can be written but not executed. This W^X constraint is a critical pillar for securing computers; bad things happen when programs can be induced to modify their own code.)

In addition to the data segment a program starts out with, it can request various segments of empty working memory. A crucial one needed for functions to work is the stack. Stacks help keep local variables in functions isolated from each other. In particular, they're necessary for recursive functions that call themselves either directly or indirectly. Each call must get new copies of locals that don't interfere with other calls.

Here's one way function calls can work (used by many modern platforms): Before making a call the caller pushes arguments on the stack. The function can now access them from the stack. After it returns the caller pops the arguments off the stack to clean up. Recursive calls get separate frames on the stack.

Since the stack is temporary space that gets cleaned up after a function returns, it's also an ideal place for local variables. Just push stuff on the stack and make sure to clean it up before you return. The caller is none the wiser.

Managing arguments and local variables does require each function to know precisely where they live. One unambiguous way to specify arguments and local variables (again used by many modern platforms) is as offsets off of a special stack pointer register (esp above). The x86 instruction set provides `push` instructions that automatically decrement esp, and `pop` instructions that automatically increment esp. (The stack grows downward.)

Bottomline: calls in SubX consist by convention of some number of `push` instructions, one `call` instruction, and some cleanup of the stack. SubX provides the following syntax sugar:

(f o1 o2 o3)

push o3
push o2
push o1
call f
add 12 to esp # pop 3 args * 4 bytes each


Strings (arrays of bytes, ignoring character encodings) are a workhorse of high-level languages, but Assembly doesn't make them convenient to deal with. Programming in SubX involves writing lots of automated tests, and tests are most useful when they give good error messages. So passing strings into functions is a crucial mechanism. SubX allows string literals. When it encounters one, it appends the new literal to the data segment and replaces it with its label in the code segment.

 == code
(f o1 o2 "foo")

 == data

 == code
(f o1 o2 $string1)

 == data
   66 6f 6f  # Utf-8 for ‘f’ ‘o’ ‘o’


One of the ways I'm able to sling large programs at such a low level is by writing lots of automated tests. A test harness isn't a common sight in Assembly programming. SubX adds a single new mechanism that makes testing ergonomic: when emitting a binary it generates a new function called `run-tests` that calls every function in the program that starts with ‘test-’.

Putting all this syntax sugar together, here's a SubX function to compute the factorial of a number, along with one automated test:

factorial: # n : int → result/eax : int
# function prologue
89/<- ebp 4/r32/esp
# save registers
# if (n <= 1) return 1
81 7/subop/compare *(ebp+8) # n is at *(ebp+8)
7f/jump-if-greater break/disp8
b8/copy-to-eax 1/imm32
# otherwise return n * factorial(n-1)
7e/jump-if-lesser-or-equal break/disp8
# ebx = n-1
8b/-> *(ebp+8) 3/r32/ebx
# return n * factorial(ebx)
(factorial ebx) # → eax
f7 4/subop/multiply-into-eax *(ebp+8)
# restore registers
# function epilogue
89/<- esp 5/r32/ebp

(factorial 5)
(check-ints-equal 120 eax "failure: factorial(5)")

(Ignore all the magic numbers for opcodes, just trust that ‘5b’ means ‘pop to ebx’, that ‘5/r32’ means ‘ebp’, and so on.)


SubX is designed to be easy to self-host, so it mixes and matches features from machine code, conventional Assembly and higher-level programming languages:

  • No mnemonics; programmer must provide all numeric opcodes and operands directly.
  • Error-checking using metadata.
  • Syntax sugar for function calls and expressions like `*(ebp+8)`.
  • Literal strings.
  • Automated test harness.

The combination of these mechanisms has been ergonomic enough that I've written 40k LoC in SubX over the past year. I've successfully gotten SubX bootstrapped in itself. I've built an emulator for SubX that can emit traces of instructions executed, and this trace provides me with a time-travel debugging experience. I can also package up SubX programs with a third-party OS kernel into bootable disk images that run natively or on a cloud server. In spite of these milestones, the syntax above is still (obviously) not entirely copacetic. In the next post I describe my attempts to design the next level up, with strong type-safety and memory-safety. Rather to my surprise, the design process continues to seem inevitable.

(Thanks Garth Goldwater and Remo Dentato for helpful feedback on drafts of this post.)


  • Anonymous, 2019-10-18: I am unclear on why the actual opcodes themselves were retained in code that will obviously need to be fed through an assembler to produce a binary. Why not elide them and simply create a more ergonomic assembly language? Is the concern that the complexity of the assembler would threaten habitability more than the foreign-seeming syntax and necessity to memorize opcodes rather than mneumonics does?   
    • Kartik Agaram, 2020-01-04: Yeah, though I'm open to revisiting this decision. The trouble with conventional Assembly languages is that a mnemonic may expand to many opcodes depending on the arguments. That complicates things greatly.

      There's a ticket open in the project to design mnemonics for SubX opcodes in a 1:1 manner. It's a hard problem. We want the mnemonics to be more useful/memorable than opcodes, but not too verbose. I haven't been able to come up with a design that satisfies these constraints, but maybe somebody else can?

Comments gratefully appreciated. Please send them to me by any method of your choice and I'll include them here.

RSS (?)
twtxt (?)
Station (?)