A Step-By-Step Guide to Building a Programming Language

Author:Murphy | View: 24597 | Time: 2025-03-22 21:09:34

The world is full of programming languages with all kinds of different use cases. Most of these languages, though, are built for very general purposes – sometimes, we may want to design a language to fit a very specific use case (e.g. Facebook designed React to make it easier to develop their web applications, and Apple recently developed Pkl, a language designed to make configurations easier. There are many more examples of this in various fields). As such, knowing how to build a programming language is a useful skill to have in your belt.

In this article, we will build an interpreted programming language from scratch and learn a little bit about both the Lambda Calculus and programming languages as a whole along the way. The language we build here will be fairly esoteric, but the process should hopefully give you an idea of how to design your own use-case-specific languages (and teach you useful information about how programming languages function under the hood).

The Game Plan

Since we're building an interpreted language¹, our overarching flow will look something like this:

Basically, we start with some concrete syntax (code) written in our target language (the language that we are trying to write), pass it to some parser that converts it to an abstract syntax tree (a tree representation of the code that's easy to work with), and then pass that to an interpreter that "runs" the abstract syntax tree to give us a final result. Note that the parser and interpreter are written in some already existing host language – C's original parser and compiler, for instance, were written in assembly.

** Note: My use of "parser" here encapsulates the entire parsing process. Usually, lexing is done prior to "parsing", but in this case parsing is just the process of taking concrete syntax to abstract syntax, whatever that may look like.

As an example, consider the following specification for a simple language for basic arithmetic:

EXPR = number
       | EXPR + EXPR
       | EXPR - EXPR
       | EXPR * EXPR
       | EXPR / EXPR
       | (EXPR)

The above, by the way, is an EBNF for a context-free grammar². I won't delve too deeply into what this means here, but all Programming languages written in a form like this are parse-able in polynomial time via the CYK algorithm. For this EBNF, something like (4 + 4) * 3 is a valid program, but something like def f(x): return 5; f(5) is not.

Suppose we are now given the concrete syntax (4 + 4) * 3 . After parsing, we should get an abstract syntax tree (AST) like this:

Then our interpreter will start at the root and recursively go down the tree until we get our answer, namely 24.

Note quickly that this grammar is ambiguous – for instance, how should the expression 4 + 4 * 3 parse? It could either parse as the above (that is, (4 + 4) * 3), or it could also parse as 4 + (4 * 3) – neither parsing is inherently more "correct" in the way that we have specified the language, as both are valid parse trees. In cases like this, the parser will have to arbitrarily make a decision about how to parse our language.

Designing our Syntax

By the flow chart, our first step should logically be to design our concrete syntax. What you choose to make your syntax is completely up to you. I decided to create EmojiLang, a (horrible) language that ensures an extremely colorful screen while you are typing. The grammar is below:

grammar EmojiLang;

program: '         Tags:          Hands On Tutorials Lambda Calculus Programming Programming Languages Software Engineering          
          
		  
		  Add Fav


      

      
      
      
      
      Murphy
      Add friends
      View space
      Message
      
      
      
      
Recommend
◦ Infinitely scalable storage for Kubernetes
◦ Introducing NumPy, Part 3: Manipulating Arrays
◦ Setting A Dockerized Python Environment – The Hard Way
◦ Using MLflow with ATOM to track all your machine learning experiments without additional code
◦ Apollo and Design Choices of Video Large Multimodal Models (LMMs)
◦ Software specification in agile projects
◦ Flamingo – Intuitively and Exhaustively Explained
◦ Diving Deeper with Structured Outputs
◦ Introduction to Clustering Algorithms
◦ RecList 2.0: Open-Source Systematic Testing of ML Models
◦ The Coming Copyright Reckoning for Generative AI
◦ Command Line Interface (CLI) Tutorial - How Advanced Users Interact with Computers