lean-pl-tutorials/tutorial-02-semantics/08-syntax-representation.org

* Unit 8 --- Representing Syntax
:PROPERTIES:
:CUSTOM_ID: unit-8-representing-syntax
:END:
*Tutorial 2: PL Semantics in Lean* · [[../README.org][← Back to README]]

** Goals
:PROPERTIES:
:CUSTOM_ID: goals
:END:
- Encode lambda calculus terms as an inductive type
- Understand three binding representations:
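  1. *Named* (variables are strings: simple, but α-equivalent terms such as
     =λx. x= and =λy. y= are not syntactically equal)
  2. *de Bruijn indices* (variables are numbers: α-equivalence becomes
     syntactic equality, but substitution requires index shifting)
  3. *Locally nameless* (free variables are named, bound variables are
     indexed: a compromise between the two)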

We'll use de Bruijn indices as the main representation for the rest of this
tutorial, and show locally nameless only for comparison.

** Sources
:PROPERTIES:
:CUSTOM_ID: sources
:END:
- syndikos/lean4-stlc =Syntax.lean=: https://github.com/syndikos/lean4-stlc
- Chris Henson 2025: https://chrishenson.net/posts/2025-05-10-formalized_lambda_calculus.html
- chenson2018/LeanScratch: https://github.com/chenson2018/LeanScratch
- Software Foundations Vol.2: https://softwarefoundations.cis.upenn.edu/

** Exercises
:PROPERTIES:
:CUSTOM_ID: exercises
:END:
#+begin_src lean
-- 8.1 — Named representation
inductive NamedTerm where
  | var (x : String)
  | lam (x : String) (body : NamedTerm)
  | app (f arg : NamedTerm)
deriving Repr

-- The identity function: λx. x
def idNamed : NamedTerm := NamedTerm.lam "x" (NamedTerm.var "x")

-- Encode λx. λy. x  (K combinator)
def kNamed : NamedTerm :=
  sorry

-- Encode λf. λx. f (f x)  (Church numeral 2)
def twoNamed : NamedTerm :=
  sorry

-- 8.2 — de Bruijn representation
-- Variables are numbers: 0 = nearest binder, 1 = next, etc.
inductive DBTerm where
  | var (idx : Nat)    -- variable reference by binding distance
  | lam (body : DBTerm) -- λ. body  (no name needed!)
  | app (f arg : DBTerm)
deriving Repr

-- λ. λ. 1  (= λx. λy. x  in named form)
def kDB : DBTerm := DBTerm.lam (DBTerm.lam (DBTerm.var 1))

-- λ. 0  (= λx. x  in named form)
def idDB : DBTerm := DBTerm.lam (DBTerm.var 0)

-- Encode λf. λx. f (f x)  (Church 2)
def twoDB : DBTerm :=
  sorry

-- 8.3 — Locally nameless
-- Free variables are strings, bound variables are de Bruijn indices
-- (You don't need to implement this fully — just understand the idea)
inductive LNTerm where
  | fvar (x : String)   -- free variable
  | bvar (idx : Nat)    -- bound variable (de Bruijn)
  | lam (body : LNTerm) -- binder
  | app (f arg : LNTerm)
deriving Repr
#+end_src
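
The claim that α-equivalence is "free" with de Bruijn indices can be checked
on a small example. The following is a minimal sketch, not part of the
exercises: the names =AlphaSketch=, =Named=, =DB=, =lookup=, and =toDB= are
illustrative assumptions, and the types are redeclared inside a namespace so
that =DecidableEq= can be derived without changing the definitions above.

#+begin_src lean
namespace AlphaSketch

inductive Named where
  | var (x : String)
  | lam (x : String) (body : Named)
  | app (f arg : Named)
deriving Repr, DecidableEq

inductive DB where
  | var (idx : Nat)
  | lam (body : DB)
  | app (f arg : DB)
deriving Repr, DecidableEq

-- Position of the first occurrence of `x` in the binder stack.
-- The head of the list is the innermost (nearest) binder.
def lookup (x : String) : List String → Option Nat
  | [] => none
  | y :: ys => if x == y then some 0 else (lookup x ys).map (· + 1)

-- Convert a named term to de Bruijn form under the binder stack `env`.
-- Returns `none` if the term mentions a variable that is not bound in `env`.
def toDB (env : List String) : Named → Option DB
  | .var x => (lookup x env).map DB.var
  | .lam x body => (toDB (x :: env) body).map DB.lam
  | .app f arg => do
      let f' ← toDB env f
      let arg' ← toDB env arg
      pure (DB.app f' arg')

def idX : Named := .lam "x" (.var "x")   -- λx. x
def idY : Named := .lam "y" (.var "y")   -- λy. y

#eval idX == idY                     -- false: the names differ
#eval toDB [] idX == toDB [] idY     -- true: both become λ. 0

end AlphaSketch
#+end_src

The two named terms are α-equivalent but structurally different, while their
de Bruijn translations are the same term, so plain equality suffices.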

*** Key insight for PL semantics
:PROPERTIES:
:CUSTOM_ID: key-insight-for-pl-semantics
:END:
With de Bruijn indices, a *typing context* =Γ = x₁:τ₁, x₂:τ₂, ...= no longer
needs the names: it is just a list of types, and a variable is an index into
that list. The most recently added binding has index 0, the one before it has
index 1, and so on. The typing rules in Lean then need no name-clash
avoidance.
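
As a rough illustration of the idea, here is a sketch of simply typed rules
over a list-based context. It is an assumption about how such rules might
look, not the formulation used later in the tutorial; =CtxSketch=, =Ty=, =Tm=,
=Ctx=, =lookupTy=, and =HasType= are hypothetical names. The head of the list
is the most recent binding, so it sits at index 0.

#+begin_src lean
namespace CtxSketch

inductive Ty where
  | base
  | arrow (dom cod : Ty)
deriving Repr

-- de Bruijn terms with a type annotation on each binder
inductive Tm where
  | var (idx : Nat)
  | lam (dom : Ty) (body : Tm)
  | app (f arg : Tm)
deriving Repr

-- A context is a list of types; no variable names are stored.
abbrev Ctx := List Ty

-- Index into the context: 0 is the most recently bound variable.
def lookupTy : Ctx → Nat → Option Ty
  | [], _ => none
  | τ :: _, 0 => some τ
  | _ :: Γ, n + 1 => lookupTy Γ n

inductive HasType : Ctx → Tm → Ty → Prop where
  | var {Γ : Ctx} {n : Nat} {τ : Ty} :
      lookupTy Γ n = some τ → HasType Γ (Tm.var n) τ
  | lam {Γ : Ctx} {σ τ : Ty} {body : Tm} :
      HasType (σ :: Γ) body τ → HasType Γ (Tm.lam σ body) (Ty.arrow σ τ)
  | app {Γ : Ctx} {σ τ : Ty} {f arg : Tm} :
      HasType Γ f (Ty.arrow σ τ) → HasType Γ arg σ → HasType Γ (Tm.app f arg) τ

-- λ. 0 has type base → base in the empty context.
example : HasType [] (Tm.lam Ty.base (Tm.var 0)) (Ty.arrow Ty.base Ty.base) :=
  HasType.lam (HasType.var rfl)

end CtxSketch
#+end_src

Going under a binder is just a cons onto the context, and the variable rule
is a list lookup, so no freshness side conditions appear anywhere.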

--------------

← [[../tutorial-01-basics/07-dependent-types.org][Tutorial 1 --- Unit 7]] · Next: [[file:09-substitution.org][Unit 9 --- Substitution]]