SydML
Table of Contents
- 1. About
- 2. Notice board
- 3. Motivation
- 4. Project structure
- 5. Compiler overview
- 6. Languages
- 7. Progress
- 8. Resources
- 8.1. Reference projects
- 8.2. Parsing
- 8.3. IRs — SSA, ANF, and CPS
- 8.3.1. SSA vs ANF: source code, slides.
- 8.3.2. Reddit discussion about ANF between Hécate and Colin
- 8.3.3. [X] Matt Might — A-Normalization: Why and How
- 8.3.4. SSA is Functional Programming
- 8.3.5. [X] How to compile with continuations
- 8.3.6. Appel — Continuation Passing, Closure Passing Style
- 8.3.7. Compiling with Continuations and LLVM
- 8.3.8. [-] [#A] Compiling without continuations
- 8.3.9. Compiling with Continuations, Continued
- 8.3.10. Slides implementing Compiling with Continuations, Continued
- 8.3.11. Compiling with CPS
- 8.3.12. Original ANF paper: The Essence of Compiling with Continuations
- 8.3.13. Email discussing MLton's IR and some history
- 8.3.14. Comparing CPS, ANF, and SSA for compiling functional languages
- 8.3.15. A Functional Perspective on SSA Optimisation Algorithms
- 8.3.16. The CPS/ANF saga
- 8.3.17. Tarditi — Design and Implementation of Code Optimiziations for a Type-Directed Compiler for Standard ML (ANF)
- 8.3.18. Paper summarizing the Tarditi thesis
- 8.3.19. A Correspondence between CPS and SSA
- 8.3.20. [X] Colin — Compiling the λ-calculus
- 8.4. Compiler Architecture
- 8.4.1. rustc demand-driven-compilation
- 8.4.2. [X] Query-based compiler architectures
- 8.4.3. [X] Three Architectures for a Responsive IDE
- 8.4.4. Practical Algorithms for Incremental Software Development Environments
- 8.4.5. Query-based compiler architectures TALK
- 8.4.6. A journey through incremental computation TALK
- 8.4.7. Making Fast Incremental Compiler for Huge Codebase TALK
- 8.4.8. How to load 1m lines of Ruby in 5s TALK
- 8.4.9. Compilers are Databases TALK
- 8.4.10. Anders Hejlsberg on Modern Compiler Construction TALK
- 8.4.11. Responsive compilers TALK
- 8.5. Miscellaneous
- 8.5.1. [-] [#A] Lessons from Writing a Compiler
- 8.5.2. [X] Matt Might — Closure Conversion
- 8.5.3. Towards a practical programming language based on dependent type theory
- 8.5.4. Implementing a JIT Compiled Language with Haskell and LLVM
- 8.5.5. [X] Build systems à la carte: Theory and Practice
- 8.5.6. A Reddit thread about functional IRs
- 8.5.7. Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-order Languages
- 8.5.8. Custom Cabal builds
- 8.5.9. Nix tree-sitter examples
- 8.5.10. Tarditi — No Assembly Required: Compiling SML to C
1. About
SydML is currently planned to be a functional programming language with strict semantics.
Github renders this document very poorly, which conflicts with my style of organisation. See the rendered HTML here.
2. Notice board
2.1. Productivity
I've been off my meds since
, and caffeine lost its kick about a week-and-a-half ago; consequently, my productivity has plummeted.His colleague Alfréd Rényi said, "a mathematician is a machine for turning coffee into theorems", and Erd ős drank copious quantities; this quotation is often attributed incorrectly to Erdős, but Erdős himself ascribed it to Rényi. After his mother's death in 1971 he started taking antidepressants and amphetamines, despite the concern of his friends, one of whom (Ron Graham) bet him $500 that he could not stop taking them for a month. Erdős won the bet, but complained that it impacted his performance: "You've showed me I'm not an addict. But I didn't get any work done. I'd get up in the morning and stare at a blank piece of paper. I'd have no ideas, just like an ordinary person. You've set mathematics back a month." After he won the bet, he promptly resumed his use of Ritalin and Benzedrine.
2.2. We're so back!!!
I am medicated again, for the first time in about a month!
3. Motivation
3.1. Demonstrate that I've learned from last year's failures
The entire project collapsed at the last minute, likely as a result of the following points.
3.1.1. Very poor planning.
A lack of thorough planning led to extreme unstability, ultimately leaving me with a broken compiler days before the project was due.
To prevent this, I intend to thoroughly architect the compiler and each pass beforehand, down to the individual signatures of key components. This plan would be sure to account for not only high-level passes, but also
- where errors are thrown and caught,
- how line and column information is retained,
3.1.2. Not enough testing.
The volatile codebase did not mesh will with my tests against internal APIs. The few tests that I had could not keep up with the rapidly-changing code, and often didn't even compile.
To address this, tests will be written against a stable command-line interface, treating the entire compiler as a black box.
- I plan on following a Golden testing model.
- For source–to-machine-code compilations, this is a straightforward check that an example input matches an output file, bit-for-bit.
- Testing other functions, such as type-checking, will require serialisation to (JSON|S-expressions), which is useful feature for debugging in any case.
- Github CI/CD.
3.2. New goals
While last year's compiler was primarily language-focused, I now intend to focus on compiler techniques.
3.2.1. IRs — ANF and SSA
4. Project structure
4.1. sydml
4.1.1. app
4.1.2. src
4.1.3. test
4.2. presydml
4.3. vendor
Vendored dependencies.
4.3.1. tree-sitter
The tree-sitter bindings for Haskell leave a lot to be desired. My patches include the following:
- Add binding to
ts_node_string()
. - Add bindings to the "edit" API.
4.3.2. qbe
This library provides a Haskell representation of QBE's IL. Patches include the following:
- Bump the version bounds of
base
,text
,bytestring
, anddeepseq
. - Add support for allocation instructions. This patch is the work of Ellis
Kesterton, whose merge request to
qbe-hs
has been idle upstream for over a year.
4.3.3. compiler-explorer
A tool for viewing compiler output and IRS. My patches include:
- Add a Nix flake.
- Add support for presydml.
4.4. doc
Documentation. Currently, this solely serves as a home for the rendered HTML of the README.
4.5. tree-sitter-sydml
Houses Sydc's parser.
5. Compiler overview
Pass | Produced IL | Module type | |
---|---|---|---|
External | |||
Source | |||
Parse (tree-sitter) | Tree-sitter AST | ||
Sydc | |||
Parse | AST | Surface.Module Parse |
|
Rename | AST | Surface.Module Rename |
|
Type-check | Typed AST | TBD | |
Elaborate | Core | Core.Module |
|
ToANF | ANF | ANF.Module |
|
Closure-convert | ANF | ANF.Module |
|
LambdaLift | Lambda-lifted ANF | ANF.LambdaLiftedModule |
|
ToQBE | QBE | QBE.Program |
|
Export | |||
External | |||
qbe | Assembly | ||
($CC) |
Machine |
6. Languages
6.1. SydML
6.1.1. Syntax
- We use Haskell-style layouts, meaning that one can group productions by either alignment, or semicolons and braces.
6.2. Core
Core is an explicitly-typed IL based on System Fω. Internally
e ::= x Variables | λx:τ. e Term abstractions | Λx:κ. e Type abstractions | e e Term applications | e @τ Type applications | let x = e in e Let-bindings | letrec (x = e)* in e Recursive let-bindings | case x of {P → e}* Case-expressions τ ::= K Type constructors | τ τ Constructor application | τ → τ Function types | ∀x:k. τ Universal quantification k ::= Type The kind of all types | k → k Higher kinds
6.3. ANF
6.4. QBE
7. Progress
Headline | Time | ||
---|---|---|---|
Total time | 3d 17:32 | ||
Progress | 3d 17:32 | ||
Milestones | 3d 6:41 | ||
0 Preemptive Planning [6/6] | 5:29 | ||
1 Research Toys [7/7] | 1d 18:42 | ||
2 Layer 1 [6/7] | 1d 0:41 | ||
3 The Empty Module [0/8] | 5:49 | ||
Other | 10:51 | ||
Nix setup | 7:26 | ||
Documentation | 1:44 | ||
GitHub Pages action | 0:26 |
7.1. Milestones
Milestone | Parser | Elab | Type-checker | ->ANF | Closure-conv. | Hoist | ->QBE |
---|---|---|---|---|---|---|---|
Empty module | |||||||
Int literal decl | |||||||
if 1 then 2 else 3 | |||||||
The identity fn |
7.1.1. DONE 0 Preemptive Planning [6/6]
DEADLINE:
- DONE MLton-esque overview.
- DONE Create a skeleton project.
- DONE Milestone table.
- DONE Define a CLI.
- DONE Write tests for empty files.
- Current blocker: I'm not sure how to uniformly run tests both against
black-box executables, as well as typical unit tests. Ideally, both would be
run by
cabal test
. Potential solution:
main :: IO () - main = ... + main = getArgs >>= main' + main' :: List String -> IO () + main' xs = do + opts <- execParserPure _ _ + ...
- Current blocker: I'm not sure how to uniformly run tests both against
black-box executables, as well as typical unit tests. Ideally, both would be
run by
- DONE Define Milestone 2.
7.1.2. DONE 1 Research Toys [7/7]
Implement the core algorithms as toys with toothpick scaffolding.
- After implementing closure-conversion, I think it may be a good idea to initially work uncurried by default, à la SML, and implement the necessary optimisations for curry-heavy code later.
- The value of statically-typed IRs is not to be undermined.
[X]
Known issue: There is some problem with closure conversion. Try compiling:
λ> pretty . upToConvert $ Lam.Prim $ PrimPrintInt $ Lam.zCombinator `Lam.App` ((Lam.Lam "x" $ Lam.Lam "y" $ Lam.Var "x") `Lam.App` Lam.IntVal 3)
- DONE The Untyped λ-calculus
- Augmented with tuples.
- DONE Rename
- DONE LowerANF
- DONE Closure-convert
- DONE Hoist
- DONE ToQBE
As of this commit, the toy compiler can compile its first program:
λ> :m + *Presydc.SSA λ> writePipeline "/tmp/t/t.qbe" $ Lam.IntVal 123
$ nix-shell -p qbe gcc [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t [nix-shell]$ ./t ; echo $? 123
As of this commit, it can compile arithmetic primitives and impure
print
calls:λ> :m + *Presydc.SSA λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ Lam.Prim $ Lam.PrimAdd (Lam.IntVal 2) (Lam.IntVal 3)
$ nix-shell -p qbe gcc [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t [nix-shell]$ ./t 5
As of this commit, it can compile trivial conditional expressions:
λ> writePipeline "/tmp/t/t.qbe" $ Lam.IfThenElse (Lam.IntVal 1) (Lam.IntVal 2) (Lam.IntVal 3)
[nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t [nix-shell]$ ./t ; echo $? 2
As of this commit, the factorial program sucessfully compiles and executes. Two hidden bugs were fixed in the process:
freeVars
did not function as expects on application nodes.freeVars (LetApp x f ys m) = toHashSetOf (each . #Var) ys + <> HS.singleton f <> (freeVars m & HS.delete x)
Three typos in the
hoist
function led to both branches of an if-expression being the unhoisted 'true' branch — I have no idea how this happened. :Pgo (IfThenElse c t f) = do Hoisted fs js t' <- go t - Hoisted fs' js' t' <- go f + Hoisted fs' js' f' <- go f (th,el) <- each mkFresh ("then","else") - let bt = Join th Nothing t + let bt = Join th Nothing t' - bf = Join el Nothing t + bf = Join el Nothing f' pure $ Hoisted (fs >< fs') (Seq.fromList [bt,bf] >< js >< js') $ IfThenElse c (Jump th Nothing) (Jump el Nothing)
-- (λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))) (λ rec n. if n then n * rec (n - 1) else 1) 3 ) λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ Lam.zCombinator `Lam.App` (Lam.Lam "rec" $ Lam.Lam "n" $ Lam.IfThenElse (Lam.Var "n") (Lam.Prim $ PrimMul (Lam.Var "n") (Lam.Var "rec" `Lam.App` (Lam.Prim $ PrimSub (Lam.Var "n") (Lam.IntVal 1)))) (Lam.IntVal 1)) `Lam.App` Lam.IntVal 3
As of this commit, it can compile closures and applications thereupon, marking the completion of the ToQBE pass!
λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ (Lam.Lam "x" $ Lam.Lam "y" $ Lam.Var "x") `Lam.App` Lam.IntVal 1 `Lam.App` Lam.IntVal 2
[nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t [nix-shell]$ ./t 1
- DONE Demo
- Maybe using godbolt?
7.1.3. TODO 2 Layer 1 [6/7]
Round three of planning. Define the core data structures, signatures for functions between them, and tests for those functions.
- DONE Big decisions!
- DONE Core data structures
[9/9]
- DONE
Syd
monad
- DONE Error effect and ADT
- DONE
Located
- DONE Surface AST
- DONE Core AST
- DONE ANF AST
- KILL Closure-converted AST
- KILL Hoisted AST
- KILL CodeGen record à la Idris
- DONE
- DONE Pass signatures
[6/6]
- KILL Debug flags
[0/7]
Pushed back. These can be implemented once the ASTs in question are more fleshed out.
- DONE Tests
- DONE CI/CD
- w/ omnix
- TODO Record demo
7.2. Other
I am not convinced that some of these tasks should count towards "work time;" regardless, I choose to log them.
7.2.1. Nix setup
7.2.2. Documentation
7.2.3. GitHub Pages action
8. Resources
8.1. Reference projects
8.1.1. sixten/sixty
- the latter is a rewrite of the former.
- the leading example of demand-driven compilers.
- implemented in glorious haskell.
8.1.2. eclair
- implements datalog.
- targets llvm.
- demand-driven, using rock.
- implemented in glorious haskell.
8.1.3. hnix
- an interpreter for the nix language.
- noted only for its use of recursion schemes.
- implemented in glorious haskell.
8.1.4. Hécate — boreal
The lovely Hécate's boreal
compiler has been an excelent reference. Many of our
design decisions align, which has been invaluably convenient for myself }:3.
- parses with tree-sitter.
- ANF-based IR.
- demand-driven.
- haskell-like source language.
- client-server.
- targets lua.
- implemented in glorious haskell.
8.1.5. elm
- targets JS.
- haskell-like.
- written by a moron. 💔
- implemented in glorious haskell.
8.1.6. purescript
- haskell-like source language.
- implemented in glorious haskell.
8.1.7. MLton SML
- many different IRs.
- whole-program optimising.
- pretty high-level decent documentation.
- implemented in SML.
8.1.8. Moscow ML SML
8.1.9. HaMLet SML
- Self-described as a reference implementation.
8.1.10. amy
- targets LLVM.
- ANF-based IR.
- strict.
- otherwise haskell-like.
- implemented in glorious haskell.
8.1.12. Essentials of Compilation
- A book about compiling Racket and Python to x86-64 assembly.
8.3. IRs — SSA, ANF, and CPS
8.3.1. SSA vs ANF: source code, slides.
8.3.3. [X] Matt Might — A-Normalization: Why and How
8.3.5. [X] How to compile with continuations
8.3.8. [-] [#A] Compiling without continuations
Describes ANF and join points as used in GHC.
- talk
- Formalises join points with typing, operational semantics, and syntax.
- Describes heavyweight optimisations for join points.
- The paper's implementation allows for recursive join points.
8.3.11. Compiling with CPS
8.3.16. The CPS/ANF saga
8.3.17. Tarditi — Design and Implementation of Code Optimiziations for a Type-Directed Compiler for Standard ML (ANF)
This thesis emphasizes a "pay as you go" compilation strategy where simple, monorphic code should not be slower simply because the language supports higher-level code. Also, monomorphic functions should get specialized to machine-type functions. C compilers focus on loops, so functional compilers should focus on recursive functions. Explains lots of optimizations with a lambda calculus IR
8.3.20. [X] Colin — Compiling the λ-calculus
8.4. Compiler Architecture
8.4.2. [X] Query-based compiler architectures
8.4.3. [X] Three Architectures for a Responsive IDE
8.4.5. Query-based compiler architectures TALK
8.4.6. A journey through incremental computation TALK
8.4.8. How to load 1m lines of Ruby in 5s TALK
8.4.9. Compilers are Databases TALK
8.4.10. Anders Hejlsberg on Modern Compiler Construction TALK
8.4.11. Responsive compilers TALK
8.5. Miscellaneous
8.5.1. [-] [#A] Lessons from Writing a Compiler
Covers a lot of topics that textbooks miss.
8.5.2. [X] Matt Might — Closure Conversion
8.5.6. A Reddit thread about functional IRs
Nearly all of their linked resources are included in this document.
8.5.8. Custom Cabal builds
- Will be necessary to build tree-sitter-sydml.