6 min read

Building a Simple Scripting Language Interpreter in Haskell from Scratch


Introduction

Haskell is perhaps the most natural language for writing interpreters and compilers. Its algebraic data types map perfectly to abstract syntax trees, pattern matching makes evaluation elegant, and the type system catches entire classes of bugs at compile time. In this post, we'll build a complete interpreter for a simple scripting language — from lexer to evaluator — entirely from scratch.

Our language, which we'll call Lark, will support:

  • Integer and boolean literals
  • Arithmetic and comparison operators
  • Variables and let bindings
  • If/else expressions
  • Lambda functions and closures
  • Recursion

Project Setup

mkdir lark && cd lark
cabal init --non-interactive

Add to your lark.cabal:

build-depends: base ^>=4.17, containers, text

Step 1: The Lexer

The lexer transforms raw source text into a stream of tokens. In Haskell, we represent tokens as an algebraic data type:

module Lexer where

data Token
  = TInt Int
  | TBool Bool
  | TIdent String
  | TPlus | TMinus | TStar | TSlash
  | TEq | TNeq | TLt | TGt
  | TLParen | TRParen
  | TLet | TIn | TIf | TThen | TElse
  | TFun | TArrow
  | TAnd | TOr | TNot
  | TEOF
  deriving (Show, Eq)

Now the lexer itself. We'll write it as a simple recursive function that consumes characters one at a time:

import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

tokenize :: String -> [Token]
tokenize [] = [TEOF]
tokenize (c:cs)
  | isSpace c  = tokenize cs
  | isDigit c  = lexInt (c:cs)
  | isAlpha c  = lexIdent (c:cs)
  | otherwise  = lexSymbol c cs

lexInt :: String -> [Token]
lexInt s =
  let (digits, rest) = span isDigit s
  in TInt (read digits) : tokenize rest

lexIdent :: String -> [Token]
lexIdent s =
  let (word, rest) = span isAlphaNum s
      tok = case word of
              "true"  -> TBool True
              "false" -> TBool False
              "let"   -> TLet
              "in"    -> TIn
              "if"    -> TIf
              "then"  -> TThen
              "else"  -> TElse
              "fun"   -> TFun
              "and"   -> TAnd
              "or"    -> TOr
              "not"   -> TNot
              _       -> TIdent word
  in tok : tokenize rest

lexSymbol :: Char -> String -> [Token]
lexSymbol c cs = case (c, cs) of
  ('+', _)       -> TPlus   : tokenize cs
  ('-', '>':rest)-> TArrow  : tokenize rest
  ('-', _)       -> TMinus  : tokenize cs
  ('*', _)       -> TStar   : tokenize cs
  ('/', _)       -> TSlash  : tokenize cs
  ('=', '=':rest)-> TEq     : tokenize rest
  ('!', '=':rest)-> TNeq    : tokenize rest
  ('<', _)       -> TLt     : tokenize cs
  ('>', _)       -> TGt     : tokenize cs
  ('(', _)       -> TLParen : tokenize cs
  (')', _)       -> TRParen : tokenize cs
  _              -> error $ "Unexpected character: " ++ [c]

Step 2: The Abstract Syntax Tree

The AST represents the structure of our program. This is where Haskell truly shines — algebraic data types model it perfectly:

module AST where

type Name = String

data Expr
  = Lit Literal
  | Var Name
  | BinOp Op Expr Expr
  | UnOp UnaryOp Expr
  | If Expr Expr Expr
  | Let Name Expr Expr
  | Lam Name Expr
  | App Expr Expr
  deriving (Show, Eq)

data Literal
  = LInt Int
  | LBool Bool
  deriving (Show, Eq)

data Op
  = Add | Sub | Mul | Div
  | Eq  | Neq | Lt  | Gt
  | And | Or
  deriving (Show, Eq)

data UnaryOp
  = Neg | Not
  deriving (Show, Eq)

Step 3: The Parser

We'll write a hand-rolled recursive descent parser. The key insight is that operator precedence maps to a hierarchy of parsing functions:

module Parser where

import AST
import Lexer

type Parser a = [Token] -> Either String (a, [Token])

-- Entry point
parse :: [Token] -> Either String Expr
parse tokens = do
  (expr, rest) <- parseExpr tokens
  case rest of
    [TEOF] -> Right expr
    _      -> Left $ "Unexpected tokens: " ++ show rest

parseExpr :: Parser Expr
parseExpr tokens = case tokens of
  (TLet : TIdent name : rest) -> do
    (val, rest2) <- parseExpr rest
    case rest2 of
      (TIn : rest3) -> do
        (body, rest4) <- parseExpr rest3
        Right (Let name val body, rest4)
      _ -> Left "Expected 'in' after let binding"

  (TIf : rest) -> do
    (cond, rest2)  <- parseExpr rest
    case rest2 of
      (TThen : rest3) -> do
        (t, rest4) <- parseExpr rest3
        case rest4 of
          (TElse : rest5) -> do
            (f, rest6) <- parseExpr rest5
            Right (If cond t f, rest6)
          _ -> Left "Expected 'else'"
      _ -> Left "Expected 'then'"

  (TFun : TIdent x : TArrow : rest) -> do
    (body, rest2) <- parseExpr rest
    Right (Lam x body, rest2)

  _ -> parseOr tokens

-- Precedence levels: or < and < cmp < add < mul < unary < app < atom
parseOr, parseAnd, parseCmp, parseAdd, parseMul :: Parser Expr
parseOr   = parseBinOp [(TOr, Or)]     parseAnd
parseAnd  = parseBinOp [(TAnd, And)]   parseCmp
parseCmp  = parseBinOp [(TEq, Eq),(TNeq, Neq),(TLt, Lt),(TGt, Gt)] parseAdd
parseAdd  = parseBinOp [(TPlus, Add),(TMinus, Sub)] parseMul
parseMul  = parseBinOp [(TStar, Mul),(TSlash, Div)] parseUnary

parseBinOp :: [(Token, Op)] -> Parser Expr -> Parser Expr
parseBinOp ops next tokens = do
  (left, rest) <- next tokens
  go left rest
  where
    go left tokens =
      case lookup (head tokens) ops of
        Just op -> do
          (right, rest) <- next (tail tokens)
          go (BinOp op left right) rest
        Nothing -> Right (left, tokens)

parseUnary :: Parser Expr
parseUnary (TMinus : rest) = do
  (e, rest2) <- parseUnary rest
  Right (UnOp Neg e, rest2)
parseUnary (TNot : rest) = do
  (e, rest2) <- parseUnary rest
  Right (UnOp Not e, rest2)
parseUnary tokens = parseApp tokens

parseApp :: Parser Expr
parseApp tokens = do
  (f, rest) <- parseAtom tokens
  go f rest
  where
    go f tokens =
      case parseAtom tokens of
        Right (arg, rest) -> go (App f arg) rest
        Left _            -> Right (f, tokens)

parseAtom :: Parser Expr
parseAtom (TInt n : rest)  = Right (Lit (LInt n), rest)
parseAtom (TBool b : rest) = Right (Lit (LBool b), rest)
parseAtom (TIdent x : rest)= Right (Var x, rest)
parseAtom (TLParen : rest) = do
  (e, rest2) <- parseExpr rest
  case rest2 of
    (TRParen : rest3) -> Right (e, rest3)
    _ -> Left "Expected closing parenthesis"
parseAtom tokens = Left $ "Unexpected token: " ++ show (head tokens)

Step 4: The Evaluator

The evaluator is the heart of the interpreter. We define a Value type for runtime values, and an Env for the environment (variable bindings):

module Eval where

import AST
import qualified Data.Map.Strict as Map

-- Runtime values
data Value
  = VInt Int
  | VBool Bool
  | VClosure Name Expr Env
  deriving (Show)

-- Environment: maps variable names to values
type Env = Map.Map Name Value

-- Evaluation
eval :: Env -> Expr -> Either String Value
eval env expr = case expr of

  Lit (LInt n)  -> Right (VInt n)
  Lit (LBool b) -> Right (VBool b)

  Var name ->
    case Map.lookup name env of
      Just v  -> Right v
      Nothing -> Left $ "Unbound variable: " ++ name

  Let name val body -> do
    v    <- eval env val
    eval (Map.insert name v env) body

  If cond t f -> do
    cv <- eval env cond
    case cv of
      VBool True  -> eval env t
      VBool False -> eval env f
      _           -> Left "Condition must be a boolean"

  Lam x body -> Right (VClosure x body env)

  App f arg -> do
    fv  <- eval env f
    av  <- eval env arg
    case fv of
      VClosure x body closureEnv ->
        eval (Map.insert x av closureEnv) body
      _ -> Left "Cannot apply non-function"

  BinOp op l r -> do
    lv <- eval env l
    rv <- eval env r
    evalBinOp op lv rv

  UnOp op e -> do
    v <- eval env e
    evalUnOp op v

evalBinOp :: Op -> Value -> Value -> Either String Value
evalBinOp op (VInt a) (VInt b) = case op of
  Add -> Right $ VInt  (a + b)
  Sub -> Right $ VInt  (a - b)
  Mul -> Right $ VInt  (a * b)
  Div -> if b == 0
           then Left "Division by zero"
           else Right $ VInt (a `div` b)
  Eq  -> Right $ VBool (a == b)
  Neq -> Right $ VBool (a /= b)
  Lt  -> Right $ VBool (a < b)
  Gt  -> Right $ VBool (a > b)
  _   -> Left "Type error in binary operation"
evalBinOp And (VBool a) (VBool b) = Right $ VBool (a && b)
evalBinOp Or  (VBool a) (VBool b) = Right $ VBool (a || b)
evalBinOp _ _ _ = Left "Type error in binary operation"

evalUnOp :: UnaryOp -> Value -> Either String Value
evalUnOp Neg (VInt n)  = Right $ VInt  (-n)
evalUnOp Not (VBool b) = Right $ VBool (not b)
evalUnOp _ _           = Left "Type error in unary operation"

Step 5: Putting It All Together

module Main where

import Lexer
import Parser
import Eval
import qualified Data.Map.Strict as Map

run :: String -> Either String Value
run source = do
  let tokens = tokenize source
  ast <- parse tokens
  eval Map.empty ast

main :: IO ()
main = do
  -- Test: factorial using recursion via let + lambda
  let program = unlines
        [ "let fact = fun n ->"
        , "  if n == 0 then 1"
        , "  else n * fact (n - 1)"
        , "in fact 10"
        ]
  case run program of
    Left err -> putStrLn $ "Error: " ++ err
    Right val -> print val

Testing It Out

Let's try a few programs:

-- Arithmetic
let x = 10 in let y = 32 in x + y
-- Result: VInt 42

-- Higher-order functions
let apply = fun f -> fun x -> f x in
let double = fun n -> n * 2 in
apply double 21
-- Result: VInt 42

-- Closures
let adder = fun n -> fun x -> n + x in
let add5 = adder 5 in
add5 37
-- Result: VInt 42

What's Missing (And How to Add It)

This interpreter deliberately omits several features to keep it focused. Here's what you'd add next:

Recursion: Our current let doesn't support recursion because the variable isn't in scope during evaluation of its own definition. Add a letrec construct or use a fix combinator.

Strings: Add LString String to Literal and handle string concatenation in evalBinOp.

Lists: Add LList [Expr] and primitive operations like head, tail, cons.

A REPL: Wrap main in a loop that reads a line, runs it, prints the result, and repeats.

Error positions: Thread source positions through the lexer and AST so error messages can show line/column numbers.


Why Haskell?

After going through this implementation, the advantages should be clear:

  • ADTs + pattern matching make AST traversal exhaustive — the compiler warns if you forget a case
  • Either for errors makes error propagation explicit and composable without exceptions
  • Immutable data means environments are naturally persistent — closures capture their scope without copying
  • The type system catches most structural bugs before you even run the code

A comparable implementation in a dynamically typed language would be shorter to write initially, but far harder to extend safely. In Haskell, adding a new expression type means the compiler tells you every place you need to handle it.


Conclusion

We've built a complete interpreter in under 300 lines of Haskell: a lexer, a recursive descent parser with proper operator precedence, and a tree-walking evaluator with closures. The full source is available on GitHub.

The next step is adding a type checker — Haskell's type system makes implementing Hindley-Milner type inference particularly elegant, and that's a story for another post.