Building a Simple Scripting Language Interpreter in Haskell from Scratch
Introduction
Haskell is perhaps the most natural language for writing interpreters and compilers. Its algebraic data types map perfectly to abstract syntax trees, pattern matching makes evaluation elegant, and the type system catches entire classes of bugs at compile time. In this post, we'll build a complete interpreter for a simple scripting language — from lexer to evaluator — entirely from scratch.
Our language, which we'll call Lark, will support:
- Integer and boolean literals
- Arithmetic and comparison operators
- Variables and let bindings
- If/else expressions
- Lambda functions and closures
Project Setup
mkdir lark && cd lark
cabal init --non-interactive
Add to your lark.cabal:
build-depends: base ^>=4.17, containers, text
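For reference, a minimal executable stanza might look like the following (the module names are assumptions matching the files we create below):

```cabal
executable lark
  main-is:          Main.hs
  other-modules:    Lexer, AST, Parser, Eval
  build-depends:    base ^>=4.17, containers, text
  default-language: Haskell2010
```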
Step 1: The Lexer
The lexer transforms raw source text into a stream of tokens. In Haskell, we represent tokens as an algebraic data type:
module Lexer where
data Token
= TInt Int
| TBool Bool
| TIdent String
| TPlus | TMinus | TStar | TSlash
| TEq | TNeq | TLt | TGt
| TAssign
| TLParen | TRParen
| TLet | TIn | TIf | TThen | TElse
| TFun | TArrow
| TAnd | TOr | TNot
| TEOF
deriving (Show, Eq)
Now the lexer itself. We'll write it as a simple recursive function that consumes characters one at a time; the character predicates come from Data.Char, so this import sits at the top of the Lexer module:
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)
tokenize :: String -> [Token]
tokenize [] = [TEOF]
tokenize (c:cs)
| isSpace c = tokenize cs
| isDigit c = lexInt (c:cs)
| isAlpha c = lexIdent (c:cs)
| otherwise = lexSymbol c cs
lexInt :: String -> [Token]
lexInt s =
let (digits, rest) = span isDigit s
in TInt (read digits) : tokenize rest
lexIdent :: String -> [Token]
lexIdent s =
let (word, rest) = span isAlphaNum s
tok = case word of
"true" -> TBool True
"false" -> TBool False
"let" -> TLet
"in" -> TIn
"if" -> TIf
"then" -> TThen
"else" -> TElse
"fun" -> TFun
"and" -> TAnd
"or" -> TOr
"not" -> TNot
_ -> TIdent word
in tok : tokenize rest
lexSymbol :: Char -> String -> [Token]
lexSymbol c cs = case (c, cs) of
('+', _) -> TPlus : tokenize cs
('-', '>':rest)-> TArrow : tokenize rest
('-', _) -> TMinus : tokenize cs
('*', _) -> TStar : tokenize cs
('/', _) -> TSlash : tokenize cs
('=', '=':rest)-> TEq : tokenize rest
('=', _)       -> TAssign : tokenize cs
('!', '=':rest)-> TNeq : tokenize rest
('<', _) -> TLt : tokenize cs
('>', _) -> TGt : tokenize cs
('(', _) -> TLParen : tokenize cs
(')', _) -> TRParen : tokenize cs
_ -> error $ "Unexpected character: " ++ [c]
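The span-based technique is worth seeing in isolation. Here's a miniature, self-contained version over a hypothetical two-token language (the names Tok and toks are illustrative, not part of Lark):

```haskell
import Data.Char (isDigit, isSpace)

data Tok = TNum Int | TOp Char deriving (Show, Eq)

toks :: String -> [Tok]
toks [] = []
toks (c:cs)
  | isSpace c = toks cs                            -- skip whitespace
  | isDigit c = let (ds, rest) = span isDigit (c:cs)
                in TNum (read ds) : toks rest      -- greedily consume digits
  | otherwise = TOp c : toks cs                    -- everything else is an operator

main :: IO ()
main = print (toks "12 + 345")
-- [TNum 12,TOp '+',TNum 345]
```

The full lexer above is the same loop with more token cases and a two-character lookahead for symbols like -> and ==.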
Step 2: The Abstract Syntax Tree
The AST represents the structure of our program. This is where Haskell truly shines — algebraic data types model it perfectly:
module AST where
type Name = String
data Expr
= Lit Literal
| Var Name
| BinOp Op Expr Expr
| UnOp UnaryOp Expr
| If Expr Expr Expr
| Let Name Expr Expr
| Lam Name Expr
| App Expr Expr
deriving (Show, Eq)
data Literal
= LInt Int
| LBool Bool
deriving (Show, Eq)
data Op
= Add | Sub | Mul | Div
| Eq | Neq | Lt | Gt
| And | Or
deriving (Show, Eq)
data UnaryOp
= Neg | Not
deriving (Show, Eq)
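To make the correspondence concrete, here is the expression 1 + 2 * 3 as an AST value. The types are duplicated (trimmed to the constructors used) so the snippet compiles standalone:

```haskell
type Name = String

data Expr = Lit Literal | Var Name | BinOp Op Expr Expr deriving (Show, Eq)
data Literal = LInt Int | LBool Bool deriving (Show, Eq)
data Op = Add | Sub | Mul | Div deriving (Show, Eq)

-- 1 + 2 * 3 parses as 1 + (2 * 3): Mul binds tighter, so it sits
-- deeper in the tree, under the Add node.
example :: Expr
example = BinOp Add (Lit (LInt 1)) (BinOp Mul (Lit (LInt 2)) (Lit (LInt 3)))

main :: IO ()
main = print example
```

Getting the parser to build exactly this shape is what the precedence hierarchy in the next step is for.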
Step 3: The Parser
We'll write a hand-rolled recursive descent parser. The key insight is that operator precedence maps to a hierarchy of parsing functions:
module Parser where
import AST
import Lexer
type Parser a = [Token] -> Either String (a, [Token])
-- Entry point
parse :: [Token] -> Either String Expr
parse tokens = do
(expr, rest) <- parseExpr tokens
case rest of
[TEOF] -> Right expr
_ -> Left $ "Unexpected tokens: " ++ show rest
parseExpr :: Parser Expr
parseExpr tokens = case tokens of
(TLet : TIdent name : TAssign : rest) -> do
(val, rest2) <- parseExpr rest
case rest2 of
(TIn : rest3) -> do
(body, rest4) <- parseExpr rest3
Right (Let name val body, rest4)
_ -> Left "Expected 'in' after let binding"
(TIf : rest) -> do
(cond, rest2) <- parseExpr rest
case rest2 of
(TThen : rest3) -> do
(t, rest4) <- parseExpr rest3
case rest4 of
(TElse : rest5) -> do
(f, rest6) <- parseExpr rest5
Right (If cond t f, rest6)
_ -> Left "Expected 'else'"
_ -> Left "Expected 'then'"
(TFun : TIdent x : TArrow : rest) -> do
(body, rest2) <- parseExpr rest
Right (Lam x body, rest2)
_ -> parseOr tokens
-- Precedence levels: or < and < cmp < add < mul < unary < app < atom
parseOr, parseAnd, parseCmp, parseAdd, parseMul :: Parser Expr
parseOr = parseBinOp [(TOr, Or)] parseAnd
parseAnd = parseBinOp [(TAnd, And)] parseCmp
parseCmp = parseBinOp [(TEq, Eq),(TNeq, Neq),(TLt, Lt),(TGt, Gt)] parseAdd
parseAdd = parseBinOp [(TPlus, Add),(TMinus, Sub)] parseMul
parseMul = parseBinOp [(TStar, Mul),(TSlash, Div)] parseUnary
parseBinOp :: [(Token, Op)] -> Parser Expr -> Parser Expr
parseBinOp ops next tokens = do
(left, rest) <- next tokens
go left rest
  where
    go left [] = Right (left, [])
    go left ts@(t : rest1) =
      case lookup t ops of
        Just op -> do
          (right, rest2) <- next rest1
          go (BinOp op left right) rest2
        Nothing -> Right (left, ts)
parseUnary :: Parser Expr
parseUnary (TMinus : rest) = do
(e, rest2) <- parseUnary rest
Right (UnOp Neg e, rest2)
parseUnary (TNot : rest) = do
(e, rest2) <- parseUnary rest
Right (UnOp Not e, rest2)
parseUnary tokens = parseApp tokens
parseApp :: Parser Expr
parseApp tokens = do
(f, rest) <- parseAtom tokens
go f rest
where
go f tokens =
case parseAtom tokens of
Right (arg, rest) -> go (App f arg) rest
Left _ -> Right (f, tokens)
parseAtom :: Parser Expr
parseAtom (TInt n : rest) = Right (Lit (LInt n), rest)
parseAtom (TBool b : rest) = Right (Lit (LBool b), rest)
parseAtom (TIdent x : rest)= Right (Var x, rest)
parseAtom (TLParen : rest) = do
(e, rest2) <- parseExpr rest
case rest2 of
(TRParen : rest3) -> Right (e, rest3)
_ -> Left "Expected closing parenthesis"
parseAtom (t : _) = Left $ "Unexpected token: " ++ show t
parseAtom [] = Left "Unexpected end of input"
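One subtlety worth highlighting: the go loop in parseBinOp produces left-associated trees, so 10 - 3 - 2 means (10 - 3) - 2, not 10 - (3 - 2). The loop is essentially a left fold, which this standalone sketch mirrors (E, chain, and evalE are illustrative names, not part of the Lark parser):

```haskell
import Data.List (foldl')

-- A minimal subtraction-only expression type.
data E = N Int | Sub E E deriving Show

-- Fold an operand chain left-to-right, exactly like parseBinOp's go loop:
-- each new operand becomes the right child of the tree built so far.
chain :: [Int] -> E
chain []     = error "empty chain"
chain (n:ns) = foldl' (\acc m -> Sub acc (N m)) (N n) ns

evalE :: E -> Int
evalE (N n)     = n
evalE (Sub a b) = evalE a - evalE b

main :: IO ()
main = print (evalE (chain [10, 3, 2]))  -- (10 - 3) - 2 = 5
```

If go recursed on the right instead of looping, subtraction and division would silently associate the wrong way.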
Step 4: The Evaluator
The evaluator is the heart of the interpreter. We define a Value type for runtime values, and an Env for the environment (variable bindings):
module Eval where
import AST
import qualified Data.Map.Strict as Map
-- Runtime values
data Value
= VInt Int
| VBool Bool
| VClosure Name Expr Env
deriving (Show)
-- Environment: maps variable names to values
type Env = Map.Map Name Value
-- Evaluation
eval :: Env -> Expr -> Either String Value
eval env expr = case expr of
Lit (LInt n) -> Right (VInt n)
Lit (LBool b) -> Right (VBool b)
Var name ->
case Map.lookup name env of
Just v -> Right v
Nothing -> Left $ "Unbound variable: " ++ name
Let name val body -> do
v <- eval env val
eval (Map.insert name v env) body
If cond t f -> do
cv <- eval env cond
case cv of
VBool True -> eval env t
VBool False -> eval env f
_ -> Left "Condition must be a boolean"
Lam x body -> Right (VClosure x body env)
App f arg -> do
fv <- eval env f
av <- eval env arg
case fv of
VClosure x body closureEnv ->
eval (Map.insert x av closureEnv) body
_ -> Left "Cannot apply non-function"
BinOp op l r -> do
lv <- eval env l
rv <- eval env r
evalBinOp op lv rv
UnOp op e -> do
v <- eval env e
evalUnOp op v
evalBinOp :: Op -> Value -> Value -> Either String Value
evalBinOp op (VInt a) (VInt b) = case op of
Add -> Right $ VInt (a + b)
Sub -> Right $ VInt (a - b)
Mul -> Right $ VInt (a * b)
Div -> if b == 0
then Left "Division by zero"
else Right $ VInt (a `div` b)
Eq -> Right $ VBool (a == b)
Neq -> Right $ VBool (a /= b)
Lt -> Right $ VBool (a < b)
Gt -> Right $ VBool (a > b)
_ -> Left "Type error in binary operation"
evalBinOp And (VBool a) (VBool b) = Right $ VBool (a && b)
evalBinOp Or (VBool a) (VBool b) = Right $ VBool (a || b)
evalBinOp _ _ _ = Left "Type error in binary operation"
evalUnOp :: UnaryOp -> Value -> Either String Value
evalUnOp Neg (VInt n) = Right $ VInt (-n)
evalUnOp Not (VBool b) = Right $ VBool (not b)
evalUnOp _ _ = Left "Type error in unary operation"
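The reason VClosure stores its defining environment is that later bindings must not leak into it. Because Data.Map is immutable, inserting into an environment produces a new map and leaves captured ones untouched. A small sketch of that idea (capture stands in for VClosure; the names here are illustrative):

```haskell
import qualified Data.Map.Strict as Map

type Env = Map.Map String Int

-- A stand-in for VClosure: a function that remembers the Env it closed over.
capture :: Env -> (Int -> Int)
capture env x = Map.findWithDefault 0 "n" env + x

main :: IO ()
main = do
  let env1  = Map.fromList [("n", 5)]
      add5  = capture env1               -- captures env1 as it is right now
      _env2 = Map.insert "n" 100 env1    -- builds a NEW map; env1 is unchanged
  print (add5 37)                        -- still uses n = 5
```

This is why eval can freely Map.insert for let and application without any copying or save/restore logic.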
Step 5: Putting It All Together
module Main where
import Lexer
import Parser
import Eval
import qualified Data.Map.Strict as Map
run :: String -> Either String Value
run source = do
let tokens = tokenize source
ast <- parse tokens
eval Map.empty ast
main :: IO ()
main = do
  -- Test: function composition with closures
  let program = unlines
        [ "let double = fun n -> n * 2 in"
        , "let compose = fun f -> fun g -> fun x -> f (g x) in"
        , "compose double double 10"
        ]
case run program of
Left err -> putStrLn $ "Error: " ++ err
Right val -> print val
Testing It Out
Let's try a few programs:
-- Arithmetic
let x = 10 in let y = 32 in x + y
-- Result: VInt 42
-- Higher-order functions
let apply = fun f -> fun x -> f x in
let double = fun n -> n * 2 in
apply double 21
-- Result: VInt 42
-- Closures
let adder = fun n -> fun x -> n + x in
let add5 = adder 5 in
add5 37
-- Result: VInt 42
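One more, exercising comparisons and if/else:

```
-- Comparison and if/else
let max = fun a -> fun b -> if a > b then a else b in
max 17 42
-- Result: VInt 42
```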
What's Missing (And How to Add It)
This interpreter deliberately omits several features to keep it focused. Here's what you'd add next:
Recursion: Our current let doesn't support recursion because the variable isn't in scope during evaluation of its own definition. Add a letrec construct or use a fix combinator.
Strings: Add LString String to Literal and handle string concatenation in evalBinOp.
Lists: Add LList [Expr] and primitive operations like head, tail, cons.
A REPL: Wrap main in a loop that reads a line, runs it, prints the result, and repeats.
Error positions: Thread source positions through the lexer and AST so error messages can show line/column numbers.
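The fix-combinator route can be prototyped in Haskell itself before wiring it into Lark. Note this is a host-language sketch, not Lark code: it terminates only because Haskell is lazy, so in Lark's strict evaluator this exact definition would diverge, and you'd need eta-expansion (or a built-in letrec) instead:

```haskell
-- fix f is the fixed point of f: fix f = f (fix f).
fix :: (a -> a) -> a
fix f = f (fix f)

-- Factorial without any recursive binding: the recursion lives in fix.
-- The lambda receives "itself" as its first argument, rec.
fact :: Int -> Int
fact = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))

main :: IO ()
main = print (fact 10)
```

A letrec in Lark would do the analogous knot-tying inside eval, making the bound name visible in the environment captured by its own closure.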
Why Haskell?
After going through this implementation, the advantages should be clear:
- ADTs + pattern matching make AST traversal exhaustive — the compiler warns if you forget a case
- Either for errors makes error propagation explicit and composable, without exceptions
- Immutable data means environments are naturally persistent: closures capture their scope without copying
- The type system catches most structural bugs before you even run the code
A comparable implementation in a dynamically typed language would be shorter to write initially, but far harder to extend safely. In Haskell, adding a new expression type means the compiler tells you every place you need to handle it.
Conclusion
We've built a complete interpreter in under 300 lines of Haskell: a lexer, a recursive descent parser with proper operator precedence, and a tree-walking evaluator with closures. The full source is available on GitHub.
The next step is adding a type checker — Haskell's type system makes implementing Hindley-Milner type inference particularly elegant, and that's a story for another post.