A Step-By-Step Guide to Building a Programming Language
The world is full of programming languages with all kinds of different use cases. Most of these languages, though, are built for very general purposes – sometimes, we may want to design a language to fit a very specific use case (e.g. Facebook designed React to make it easier to develop their web applications, and Apple recently developed Pkl, a language designed to make configurations easier. There are many more examples of this in various fields). As such, knowing how to build a programming language is a useful skill to have in your belt.
In this article, we will build an interpreted programming language from scratch and learn a little bit about both the Lambda Calculus and programming languages as a whole along the way. The language we build here will be fairly esoteric, but the process should hopefully give you an idea of how to design your own use-case-specific languages (and teach you useful information about how programming languages function under the hood).
The Game Plan
Since we're building an interpreted language¹, our overarching flow will look something like this:

Basically, we start with some concrete syntax (code) written in our target language (the language that we are trying to write), pass it to some parser that converts it to an abstract syntax tree (a tree representation of the code that's easy to work with), and then pass that to an interpreter that "runs" the abstract syntax tree to give us a final result. Note that the parser and interpreter are written in some already existing host language – C's original parser and compiler, for instance, were written in assembly.
** Note: My use of "parser" here encapsulates the entire parsing process. Usually, lexing is done prior to "parsing", but in this case parsing is just the process of taking concrete syntax to abstract syntax, whatever that may look like.
As an example, consider the following specification for a simple language for basic arithmetic:
EXPR = number
| EXPR + EXPR
| EXPR - EXPR
| EXPR * EXPR
| EXPR / EXPR
| (EXPR)
The above, by the way, is an EBNF for a context-free grammar². I won't delve too deeply into what this means here, but all Programming languages written in a form like this are parse-able in polynomial time via the CYK algorithm. For this EBNF, something like (4 + 4) * 3
is a valid program, but something like def f(x): return 5; f(5)
is not.
Suppose we are now given the concrete syntax (4 + 4) * 3
. After parsing, we should get an abstract syntax tree (AST) like this:

Then our interpreter will start at the root and recursively go down the tree until we get our answer, namely 24.
Note quickly that this grammar is ambiguous – for instance, how should the expression 4 + 4 * 3
parse? It could either parse as the above (that is, (4 + 4) * 3
), or it could also parse as 4 + (4 * 3)
– neither parsing is inherently more "correct" in the way that we have specified the language, as both are valid parse trees. In cases like this, the parser will have to arbitrarily make a decision about how to parse our language.
Designing our Syntax
By the flow chart, our first step should logically be to design our concrete syntax. What you choose to make your syntax is completely up to you. I decided to create EmojiLang, a (horrible) language that ensures an extremely colorful screen while you are typing. The grammar is below:
grammar EmojiLang;
program: '