Building a Simple Scripting Language Interpreter in Haskell from Scratch


Introduction

Haskell is perhaps the most natural language for writing interpreters and compilers. Its algebraic data types map perfectly to abstract syntax trees, pattern matching makes evaluation elegant, and the type system catches entire classes of bugs at compile time. In this post, we'll build a complete interpreter for a simple scripting language — from lexer to evaluator — entirely from scratch.

Our language, which we'll call Lark, will support:

  • Integer and boolean literals
  • Arithmetic and comparison operators
  • Variables and let bindings
  • If/else expressions
  • Lambda functions and closures
  • Recursion

Project Setup

mkdir lark && cd lark
cabal init --non-interactive

Add to your lark.cabal:

build-depends: base ^>=4.17, containers, text

Step 1: The Lexer

The lexer transforms raw source text into a stream of tokens. In Haskell, we represent tokens as an algebraic data type:

module Lexer where

data Token
  = TInt Int
  | TBool Bool
  | TIdent String
  | TPlus | TMinus | TStar | TSlash
  | TEq | TNeq | TLt | TGt
  | TLParen | TRParen
  | TLet | TIn | TIf | TThen | TElse
  | TFun | TArrow
  | TAnd | TOr | TNot
  | TEOF
  deriving (Show, Eq)

Now the lexer itself. We'll write it as a simple recursive function that consumes characters one at a time:

import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

tokenize :: String -> [Token]
tokenize [] = [TEOF]
tokenize (c:cs)
  | isSpace c  = tokenize cs
  | isDigit c  = lexInt (c:cs)
  | isAlpha c  = lexIdent (c:cs)
  | otherwise  = lexSymbol c cs

lexInt :: String -> [Token]
lexInt s =
  let (digits, rest) = span isDigit s
  in TInt (read digits) : tokenize rest

lexIdent :: String -> [Token]
lexIdent s =
  let (word, rest) = span isAlphaNum s
      tok = case word of
              "true"  -> TBool True
              "false" -> TBool False
              "let"   -> TLet
              "in"    -> TIn
              "if"    -> TIf
              "then"  -> TThen
              "else"  -> TElse
              "fun"   -> TFun
              "and"   -> TAnd
              "or"    -> TOr
              "not"   -> TNot
              _       -> TIdent word
  in tok : tokenize rest

lexSymbol :: Char -> String -> [Token]
lexSymbol c cs = case (c, cs) of
  ('+', _)       -> TPlus   : tokenize cs
  ('-', '>':rest)-> TArrow  : tokenize rest
  ('-', _)       -> TMinus  : tokenize cs
  ('*', _)       -> TStar   : tokenize cs
  ('/', _)       -> TSlash  : tokenize cs
  ('=', '=':rest)-> TEq     : tokenize rest
  ('!', '=':rest)-> TNeq    : tokenize rest
  ('<', _)       -> TLt     : tokenize cs
  ('>', _)       -> TGt     : tokenize cs
  ('(', _)       -> TLParen : tokenize cs
  (')', _)       -> TRParen : tokenize cs
  _              -> error $ "Unexpected character: " ++ [c]

Step 2: The Abstract Syntax Tree

The AST represents the structure of our program. This is where Haskell truly shines — algebraic data types model it perfectly:

module AST where

type Name = String

data Expr
  = Lit Literal
  | Var Name
  | BinOp Op Expr Expr
  | UnOp UnaryOp Expr
  | If Expr Expr Expr
  | Let Name Expr Expr
  | Lam Name Expr
  | App Expr Expr
  deriving (Show, Eq)

data Literal
  = LInt Int
  | LBool Bool
  deriving (Show, Eq)

data Op
  = Add | Sub | Mul | Div
  | Eq  | Neq | Lt  | Gt
  | And | Or
  deriving (Show, Eq)

data UnaryOp
  = Neg | Not
  deriving (Show, Eq)

Step 3: The Parser

We'll write a hand-rolled recursive descent parser. The key insight is that operator precedence maps to a hierarchy of parsing functions:

module Parser where

import AST
import Lexer

type Parser a = [Token] -> Either String (a, [Token])

-- Entry point
parse :: [Token] -> Either String Expr
parse tokens = do
  (expr, rest) <- parseExpr tokens
  case rest of
    [TEOF] -> Right expr
    _      -> Left $ "Unexpected tokens: " ++ show rest

parseExpr :: Parser Expr
parseExpr tokens = case tokens of
  (TLet : TIdent name : rest) -> do
    (val, rest2) <- parseExpr rest
    case rest2 of
      (TIn : rest3) -> do
        (body, rest4) <- parseExpr rest3
        Right (Let name val body, rest4)
      _ -> Left "Expected 'in' after let binding"

  (TIf : rest) -> do
    (cond, rest2)  <- parseExpr rest
    case rest2 of
      (TThen : rest3) -> do
        (t, rest4) <- parseExpr rest3
        case rest4 of
          (TElse : rest5) -> do
            (f, rest6) <- parseExpr rest5
            Right (If cond t f, rest6)
          _ -> Left "Expected 'else'"
      _ -> Left "Expected 'then'"

  (TFun : TIdent x : TArrow : rest) -> do
    (body, rest2) <- parseExpr rest
    Right (Lam x body, rest2)

  _ -> parseOr tokens

-- Precedence levels: or < and < cmp < add < mul < unary < app < atom
parseOr, parseAnd, parseCmp, parseAdd, parseMul :: Parser Expr
parseOr   = parseBinOp [(TOr, Or)]     parseAnd
parseAnd  = parseBinOp [(TAnd, And)]   parseCmp
parseCmp  = parseBinOp [(TEq, Eq),(TNeq, Neq),(TLt, Lt),(TGt, Gt)] parseAdd
parseAdd  = parseBinOp [(TPlus, Add),(TMinus, Sub)] parseMul
parseMul  = parseBinOp [(TStar, Mul),(TSlash, Div)] parseUnary

parseBinOp :: [(Token, Op)] -> Parser Expr -> Parser Expr
parseBinOp ops next tokens = do
  (left, rest) <- next tokens
  go left rest
  where
    go left tokens =
      case lookup (head tokens) ops of
        Just op -> do
          (right, rest) <- next (tail tokens)
          go (BinOp op left right) rest
        Nothing -> Right (left, tokens)

parseUnary :: Parser Expr
parseUnary (TMinus : rest) = do
  (e, rest2) <- parseUnary rest
  Right (UnOp Neg e, rest2)
parseUnary (TNot : rest) = do
  (e, rest2) <- parseUnary rest
  Right (UnOp Not e, rest2)
parseUnary tokens = parseApp tokens

parseApp :: Parser Expr
parseApp tokens = do
  (f, rest) <- parseAtom tokens
  go f rest
  where
    go f tokens =
      case parseAtom tokens of
        Right (arg, rest) -> go (App f arg) rest
        Left _            -> Right (f, tokens)

parseAtom :: Parser Expr
parseAtom (TInt n : rest)  = Right (Lit (LInt n), rest)
parseAtom (TBool b : rest) = Right (Lit (LBool b), rest)
parseAtom (TIdent x : rest)= Right (Var x, rest)
parseAtom (TLParen : rest) = do
  (e, rest2) <- parseExpr rest
  case rest2 of
    (TRParen : rest3) -> Right (e, rest3)
    _ -> Left "Expected closing parenthesis"
parseAtom tokens = Left $ "Unexpected token: " ++ show (head tokens)

Step 4: The Evaluator

The evaluator is the heart of the interpreter. We define a Value type for runtime values, and an Env for the environment (variable bindings):

module Eval where

import AST
import qualified Data.Map.Strict as Map

-- Runtime values
data Value
  = VInt Int
  | VBool Bool
  | VClosure Name Expr Env
  deriving (Show)

-- Environment: maps variable names to values
type Env = Map.Map Name Value

-- Evaluation
eval :: Env -> Expr -> Either String Value
eval env expr = case expr of

  Lit (LInt n)  -> Right (VInt n)
  Lit (LBool b) -> Right (VBool b)

  Var name ->
    case Map.lookup name env of
      Just v  -> Right v
      Nothing -> Left $ "Unbound variable: " ++ name

  Let name val body -> do
    v    <- eval env val
    eval (Map.insert name v env) body

  If cond t f -> do
    cv <- eval env cond
    case cv of
      VBool True  -> eval env t
      VBool False -> eval env f
      _           -> Left "Condition must be a boolean"

  Lam x body -> Right (VClosure x body env)

  App f arg -> do
    fv  <- eval env f
    av  <- eval env arg
    case fv of
      VClosure x body closureEnv ->
        eval (Map.insert x av closureEnv) body
      _ -> Left "Cannot apply non-function"

  BinOp op l r -> do
    lv <- eval env l
    rv <- eval env r
    evalBinOp op lv rv

  UnOp op e -> do
    v <- eval env e
    evalUnOp op v

evalBinOp :: Op -> Value -> Value -> Either String Value
evalBinOp op (VInt a) (VInt b) = case op of
  Add -> Right $ VInt  (a + b)
  Sub -> Right $ VInt  (a - b)
  Mul -> Right $ VInt  (a * b)
  Div -> if b == 0
           then Left "Division by zero"
           else Right $ VInt (a `div` b)
  Eq  -> Right $ VBool (a == b)
  Neq -> Right $ VBool (a /= b)
  Lt  -> Right $ VBool (a < b)
  Gt  -> Right $ VBool (a > b)
  _   -> Left "Type error in binary operation"
evalBinOp And (VBool a) (VBool b) = Right $ VBool (a && b)
evalBinOp Or  (VBool a) (VBool b) = Right $ VBool (a || b)
evalBinOp _ _ _ = Left "Type error in binary operation"

evalUnOp :: UnaryOp -> Value -> Either String Value
evalUnOp Neg (VInt n)  = Right $ VInt  (-n)
evalUnOp Not (VBool b) = Right $ VBool (not b)
evalUnOp _ _           = Left "Type error in unary operation"

Step 5: Putting It All Together

module Main where

import Lexer
import Parser
import Eval
import qualified Data.Map.Strict as Map

run :: String -> Either String Value
run source = do
  let tokens = tokenize source
  ast <- parse tokens
  eval Map.empty ast

main :: IO ()
main = do
  -- Test: factorial using recursion via let + lambda
  let program = unlines
        [ "let fact = fun n ->"
        , "  if n == 0 then 1"
        , "  else n * fact (n - 1)"
        , "in fact 10"
        ]
  case run program of
    Left err -> putStrLn $ "Error: " ++ err
    Right val -> print val

Testing It Out

Let's try a few programs:

-- Arithmetic
let x = 10 in let y = 32 in x + y
-- Result: VInt 42

-- Higher-order functions
let apply = fun f -> fun x -> f x in
let double = fun n -> n * 2 in
apply double 21
-- Result: VInt 42

-- Closures
let adder = fun n -> fun x -> n + x in
let add5 = adder 5 in
add5 37
-- Result: VInt 42

What's Missing (And How to Add It)

This interpreter deliberately omits several features to keep it focused. Here's what you'd add next:

Recursion: Our current let doesn't support recursion because the variable isn't in scope during evaluation of its own definition. Add a letrec construct or use a fix combinator.

Strings: Add LString String to Literal and handle string concatenation in evalBinOp.

Lists: Add LList [Expr] and primitive operations like head, tail, cons.

A REPL: Wrap main in a loop that reads a line, runs it, prints the result, and repeats.

Error positions: Thread source positions through the lexer and AST so error messages can show line/column numbers.


Why Haskell?

After going through this implementation, the advantages should be clear:

  • ADTs + pattern matching make AST traversal exhaustive — the compiler warns if you forget a case
  • Either for errors makes error propagation explicit and composable without exceptions
  • Immutable data means environments are naturally persistent — closures capture their scope without copying
  • The type system catches most structural bugs before you even run the code

A comparable implementation in a dynamically typed language would be shorter to write initially, but far harder to extend safely. In Haskell, adding a new expression type means the compiler tells you every place you need to handle it.


Conclusion

We've built a complete interpreter in under 300 lines of Haskell: a lexer, a recursive descent parser with proper operator precedence, and a tree-walking evaluator with closures. The full source is available on GitHub.

The next step is adding a type checker — Haskell's type system makes implementing Hindley-Milner type inference particularly elegant, and that's a story for another post.


Read more

伊斯法罕的石榴

壹·两个名字 她有两个名字。 一个叫莎赫拉(شهلا),波斯语,意为"黑眸"。这是母亲给的,在伊斯法罕朱法区的犹太会堂里,拉比念诵祝祷词时,母亲抱着她,对父亲说:"你看她的眼睛,像石榴籽一样黑。" 另一个叫希拉(שִׁירָה),希伯来语,意为"诗歌"。这是她到特拉维夫的第三天,在移民局的铁皮桌前自己选的。办事员问她要不要改一个希伯来名字,她想了想,说:"希拉。" 办事员没问为什么。 她也没解释。 只有她自己知道——母亲在伊斯法罕的庭院里,总爱低声哼一首波斯古诗。哈菲兹的。调子缠绵,像扎因代河的水,绕过三十三孔桥,流远了,还能听见。 她想留住那个调子。 所以选了"

By yuki

慕容复国记

一则燕祚再兴的架空笑谈,兼致某个地中海沿岸的平行宇宙 引子 天下苦慕容复久矣。 不是苦他作恶,是苦他执念。 想那姑苏慕容氏,自五胡乱华以降,前燕、后燕、南燕、西燕,起起落落,兴亡如翻饼。到得慕容复这一代,手中既无一城一地,帐下亦无一兵一卒,偏偏还揣着一颗滚烫的复国心,逢人便谈"兴复大燕",说得旁人尴尬,自己倒慷慨激昂。 江湖人士评曰:"此人武功尚可,唯患复国癫。" 段誉叹息,王语嫣摇头,包不同欲言又止,邓百川默默饮酒。 然而—— 诸君且住。 倘若天道好还,气运轮转,慕容复当真复了国呢? 这故事,便从一封密信说起。 第一章:应许之地 宋哲宗元祐六年,暮春。 慕容复独坐燕子坞琴房,案上摊着一幅泛黄的舆图——大燕故疆,自辽东至河北,横亘千里。他用朱笔圈了又圈,

By yuki

燕王本纪外传·慕容垂复国记

太元九年,秦师败绩于淝水。 天下哗然。 苻坚大帝——那位曾以百万之众投鞭断流、豪言"我之兵力,投鞭于江,足断其流"的旷世英雄——率师南下,结果被谢玄那帮北府兵打了个落花流水,仓皇北顾,连皇帝袍子都险些留在淮南。 慕容垂在军中听闻此讯,面不改色,心中却已盘算好了第七十二套方案。 垂字道明,小字阿六敦,慕容氏之麒麟子也。 此公前半生,堪称中古史上最高规格的受气包—— 二十岁,以少胜多,大破宇文部,班师献捷,群臣妒之。 三十岁,大破段部,威震辽东,太后慕容可足浑氏怒之。 四十岁,以五千破晋军三万,时论以为"慕容垂一出,诸将皆废",于是诸将联名弹劾之。 兄长慕容评——那位以斗量金、以车载银、聚财无算的大司马——尤恨之入骨,欲借刀杀之。 垂乃叹曰: "大丈夫立功天地之间,

By yuki

超导:电阻消失背后的量子秘密

引言 1911年,荷兰物理学家昂内斯(Heike Kamerlingh Onnes)在将汞冷却到4.2K(约-269°C)时,发现其电阻突然完全消失。这一现象被命名为超导(Superconductivity)。 电阻为零,意味着电流可以在超导体中永久流动而不损耗任何能量。这不是电阻"很小",而是精确为零——实验上已验证超导电流的衰减时间超过10万年。 超导不只是一个工程奇迹,更是量子力学在宏观尺度上最壮观的表现之一。理解超导,需要从量子力学的深层结构出发。 一、超导的基本现象 超导体有两个标志性特征,缺一不可: 1.1 零电阻 普通金属的电阻来自电子与晶格振动(声子)和杂质的碰撞。温度越低,碰撞越少,电阻越小——但在普通金属中,电阻只是趋近于零,永远不会精确为零。 超导体在转变温度(临界温度 $T_c$)以下,电阻精确为零。这不是量的差异,而是质的相变。

By yuki