SydML

Table of Contents

1. About

SydML is currently planned to be a functional programming language with strict semantics.

Github renders this document very poorly, which conflicts with my style of organisation. See the rendered HTML here.

2. Notice board

2.1. [2024-09-18 Wed] Productivity

I've been off my meds since [2024-08-20 Tue], and caffeine lost its kick about a week-and-a-half ago; consequently, my productivity has plummeted.

His colleague Alfréd Rényi said, "a mathematician is a machine for turning coffee into theorems", and Erd ős drank copious quantities; this quotation is often attributed incorrectly to Erdős, but Erdős himself ascribed it to Rényi. After his mother's death in 1971 he started taking antidepressants and amphetamines, despite the concern of his friends, one of whom (Ron Graham) bet him $500 that he could not stop taking them for a month. Erdős won the bet, but complained that it impacted his performance: "You've showed me I'm not an addict. But I didn't get any work done. I'd get up in the morning and stare at a blank piece of paper. I'd have no ideas, just like an ordinary person. You've set mathematics back a month." After he won the bet, he promptly resumed his use of Ritalin and Benzedrine.

2.2. [2024-09-20 Fri] We're so back!!!

I am medicated again, for the first time in about a month!

3. Motivation

3.1. Demonstrate that I've learned from last year's failures

The entire project collapsed at the last minute, likely as a result of the following points.

3.1.1. Very poor planning.

A lack of thorough planning led to extreme unstability, ultimately leaving me with a broken compiler days before the project was due.

To prevent this, I intend to thoroughly architect the compiler and each pass beforehand, down to the individual signatures of key components. This plan would be sure to account for not only high-level passes, but also

  • where errors are thrown and caught,
  • how line and column information is retained,

3.1.2. Not enough testing.

The volatile codebase did not mesh will with my tests against internal APIs. The few tests that I had could not keep up with the rapidly-changing code, and often didn't even compile.

To address this, tests will be written against a stable command-line interface, treating the entire compiler as a black box.

  • I plan on following a Golden testing model.
  • For source–to-machine-code compilations, this is a straightforward check that an example input matches an output file, bit-for-bit.
  • Testing other functions, such as type-checking, will require serialisation to (JSON|S-expressions), which is useful feature for debugging in any case.
  • Github CI/CD.

3.2. New goals

While last year's compiler was primarily language-focused, I now intend to focus on compiler techniques.

3.2.1. IRs — ANF and SSA

4. Project structure

4.1. sydml

4.1.1. app

4.1.2. src

4.1.3. test

4.2. presydml

4.3. vendor

Vendored dependencies.

4.3.1. tree-sitter

The tree-sitter bindings for Haskell leave a lot to be desired. My patches include the following:

  • Add binding to ts_node_string().
  • Add bindings to the "edit" API.

4.3.2. qbe

This library provides a Haskell representation of QBE's IL. Patches include the following:

  • Bump the version bounds of base, text, bytestring, and deepseq.
  • Add support for allocation instructions. This patch is the work of Ellis Kesterton, whose merge request to qbe-hs has been idle upstream for over a year.

4.3.3. compiler-explorer

A tool for viewing compiler output and IRS. My patches include:

  • Add a Nix flake.
  • Add support for presydml.

4.4. doc

Documentation. Currently, this solely serves as a home for the rendered HTML of the README.

4.5. tree-sitter-sydml

Houses Sydc's parser.

5. Compiler overview

  Pass Produced IL Module type
External      
    Source  
  Parse (tree-sitter) Tree-sitter AST  
Sydc      
  Parse AST Surface.Module Parse
  Rename AST Surface.Module Rename
  Type-check Typed AST TBD
  Elaborate Core Core.Module
  ToANF ANF ANF.Module
  Closure-convert ANF ANF.Module
  LambdaLift Lambda-lifted ANF ANF.LambdaLiftedModule
  ToQBE QBE QBE.Program
  Export    
External      
  qbe Assembly  
  ($CC) Machine  

6. Languages

6.1. SydML

6.1.1. Syntax

  • We use Haskell-style layouts, meaning that one can group productions by either alignment, or semicolons and braces.

6.2. Core

Core is an explicitly-typed IL based on System Fω. Internally

e ::= x                     Variables
   |  λx:τ. e               Term abstractions
   |  Λx:κ. e               Type abstractions
   |  e e                   Term applications
   |  e @τ                  Type applications
   |  let x = e in e        Let-bindings
   |  letrec (x = e)* in e  Recursive let-bindings
   |  case x of {P → e}*    Case-expressions

τ ::= K                     Type constructors
   |  τ τ                   Constructor application
   |  τ → τ                 Function types
   |  ∀x:k. τ               Universal quantification

k ::= Type                  The kind of all types
   |  k → k                 Higher kinds

6.3. ANF

6.4. QBE

7. Progress

Table 1: Clock summary at [2024-09-21 Sat 03:26]
Headline Time    
Total time 3d 17:32    
Progress 3d 17:32    
  Milestones   3d 6:41  
    0 Preemptive Planning [6/6]     5:29
    1 Research Toys [7/7]     1d 18:42
    2 Layer 1 [6/7]     1d 0:41
    3 The Empty Module [0/8]     5:49
  Other   10:51  
    Nix setup     7:26
    Documentation     1:44
    GitHub Pages action     0:26

7.1. Milestones

Milestone Parser Elab Type-checker ->ANF Closure-conv. Hoist ->QBE
Empty module              
Int literal decl              
if 1 then 2 else 3              
The identity fn              

7.1.1. DONE 0 Preemptive Planning [6/6]

DEADLINE: <2024-09-03 Tue>

  1. DONE MLton-esque overview.
  2. DONE Create a skeleton project.
  3. DONE Milestone table.
  4. DONE Define a CLI.
  5. DONE Write tests for empty files.
    • Current blocker: I'm not sure how to uniformly run tests both against black-box executables, as well as typical unit tests. Ideally, both would be run by cabal test.
    • Potential solution:

        main :: IO ()
      - main = ...
      + main = getArgs >>= main'
      
      + main' :: List String -> IO ()
      + main' xs = do
      +   opts <- execParserPure _ _
      +   ...
      
  6. DONE Define Milestone 2.

7.1.2. DONE 1 Research Toys [7/7]

Implement the core algorithms as toys with toothpick scaffolding.

  • After implementing closure-conversion, I think it may be a good idea to initially work uncurried by default, à la SML, and implement the necessary optimisations for curry-heavy code later.
  • The value of statically-typed IRs is not to be undermined.
  • [X]

    Known issue: There is some problem with closure conversion. Try compiling:

    λ> pretty . upToConvert $ Lam.Prim $ PrimPrintInt $ Lam.zCombinator `Lam.App` ((Lam.Lam "x" $ Lam.Lam "y" $ Lam.Var "x") `Lam.App` Lam.IntVal 3)
    
  1. DONE The Untyped λ-calculus
    • Augmented with tuples.
  2. DONE Rename
  3. DONE LowerANF
  4. DONE Closure-convert
  5. DONE Hoist
  6. DONE ToQBE
    • As of this commit, the toy compiler can compile its first program:

      λ> :m + *Presydc.SSA
      λ> writePipeline "/tmp/t/t.qbe" $ Lam.IntVal 123
      
      $ nix-shell -p qbe gcc
      [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t
      [nix-shell]$ ./t ; echo $?
      123
      
    • As of this commit, it can compile arithmetic primitives and impure print calls:

      λ> :m + *Presydc.SSA
      λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ Lam.Prim $ Lam.PrimAdd (Lam.IntVal 2) (Lam.IntVal 3)
      
      $ nix-shell -p qbe gcc
      [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t
      [nix-shell]$ ./t
      5
      
    • As of this commit, it can compile trivial conditional expressions:

      λ> writePipeline "/tmp/t/t.qbe" $ Lam.IfThenElse (Lam.IntVal 1) (Lam.IntVal 2) (Lam.IntVal 3)
      
      [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t
      [nix-shell]$ ./t ; echo $?
      2
      
    • As of this commit, the factorial program sucessfully compiles and executes. Two hidden bugs were fixed in the process:

      • freeVars did not function as expects on application nodes.

         freeVars (LetApp x f ys m) = toHashSetOf (each . #Var) ys
        +                           <> HS.singleton f
                                   <> (freeVars m & HS.delete x)
        
      • Three typos in the hoist function led to both branches of an if-expression being the unhoisted 'true' branch — I have no idea how this happened. :P

         go (IfThenElse c t f) = do
           Hoisted fs js t' <- go t
        -   Hoisted fs' js' t' <- go f
        +   Hoisted fs' js' f' <- go f
           (th,el) <- each mkFresh ("then","else")
        -   let bt = Join th Nothing t
        +   let bt = Join th Nothing t'
        -       bf = Join el Nothing t
        +       bf = Join el Nothing f'
           pure $ Hoisted (fs >< fs') (Seq.fromList [bt,bf] >< js >< js') $
             IfThenElse c (Jump th Nothing) (Jump el Nothing)
        
      -- (λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))) (λ rec n. if n then n * rec (n - 1) else 1) 3
      )
      λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ Lam.zCombinator `Lam.App` (Lam.Lam "rec" $ Lam.Lam "n" $ Lam.IfThenElse (Lam.Var "n") (Lam.Prim $ PrimMul (Lam.Var "n") (Lam.Var "rec" `Lam.App` (Lam.Prim $ PrimSub (Lam.Var "n") (Lam.IntVal 1)))) (Lam.IntVal 1)) `Lam.App` Lam.IntVal 3
      
    • As of this commit, it can compile closures and applications thereupon, marking the completion of the ToQBE pass!

      λ> writePipeline "/tmp/t/t.qbe" $ Lam.Prim $ PrimPrintInt $ (Lam.Lam "x" $ Lam.Lam "y" $ Lam.Var "x") `Lam.App` Lam.IntVal 1 `Lam.App` Lam.IntVal 2
      
      [nix-shell]$ qbe t.qbe > t.s && gcc t.s -o t
      [nix-shell]$ ./t
      1
      
  7. DONE Demo

7.1.3. TODO 2 Layer 1 [6/7]

Round three of planning. Define the core data structures, signatures for functions between them, and tests for those functions.

  1. DONE Big decisions!
    1. NO Side effects?

      Decided against, primarily due to top-level side-effects.

      1. Impure
        1. Pros
          • Rely less on optimisations — use ref types.
          • Easy implementation.
        2. Cons
          • Top-level side effects lead to a lot of pain in implementation and use.
      2. Pure
        1. Pros
          • I have Haskellbrain.
          • More interesting implementation.
        2. Cons
          • Potentially complicated to make effecient.
    2. OKAY ML Modules?

      We can postpone this thought.

  2. DONE Core data structures [9/9]
    1. DONE Syd monad
    2. DONE Error effect and ADT
    3. DONE Located
    4. DONE Surface AST
    5. DONE Core AST
    6. DONE ANF AST
    7. KILL Closure-converted AST
    8. KILL Hoisted AST
    9. KILL CodeGen record à la Idris
  3. DONE Pass signatures [6/6]
    1. DONE Parse
    2. DONE Elaborate
    3. DONE TypeCheck
    4. DONE ClosureConvert
    5. DONE Hoist
    6. DONE ToQBE
  4. KILL Debug flags [0/7]

    Pushed back. These can be implemented once the ASTs in question are more fleshed out.

    1. KILL -ddump-ast
    2. KILL -ddump-core
    3. KILL -ddump-type-check
    4. KILL -ddump-anf
    5. KILL -ddump-closure-convert
    6. KILL -ddump-hoist
    7. KILL -ddump-qbe
  5. DONE Tests
  6. DONE CI/CD
  7. TODO Record demo

7.1.4. TODO 3 The Empty Module [0/8]

  1. TODO Parse
  2. TODO Rename
  3. TODO Type-check
  4. TODO Elaborate
  5. TODO Closure-convert
  6. TODO Hoist
  7. TODO LowerANF
  8. TODO ToQBE

7.2. Other

I am not convinced that some of these tasks should count towards "work time;" regardless, I choose to log them.

7.2.1. Nix setup

7.2.2. Documentation

7.2.3. GitHub Pages action

8. Resources

8.1. Reference projects

8.1.1. sixten/sixty

  • the latter is a rewrite of the former.
  • the leading example of demand-driven compilers.
  • implemented in glorious haskell.

8.1.2. eclair

  • implements datalog.
  • targets llvm.
  • demand-driven, using rock.
  • implemented in glorious haskell.

8.1.3. hnix

  • an interpreter for the nix language.
  • noted only for its use of recursion schemes.
  • implemented in glorious haskell.

8.1.4. Hécate — boreal

The lovely Hécate's boreal compiler has been an excelent reference. Many of our design decisions align, which has been invaluably convenient for myself }:3.

8.1.5. elm

  • targets JS.
  • haskell-like.
  • written by a moron. 💔
  • implemented in glorious haskell.

8.1.6. purescript

  • haskell-like source language.
  • implemented in glorious haskell.

8.1.7. MLton   SML

8.1.8. Moscow ML   SML

8.1.9. HaMLet   SML

  • Self-described as a reference implementation.

8.1.10. amy

  • targets LLVM.
  • ANF-based IR.
  • strict.
  • otherwise haskell-like.
  • implemented in glorious haskell.

8.1.11. Idris 2

  • Elegantly supports multiple and third-party backends.
  • Uses ANF.

8.1.12. Essentials of Compilation

  • A book about compiling Racket and Python to x86-64 assembly.

8.3. IRs — SSA, ANF, and CPS

8.3.1. SSA vs ANF: source code, slides.

8.3.8. [-] [#A] Compiling without continuations

Describes ANF and join points as used in GHC.

  • talk
  • Formalises join points with typing, operational semantics, and syntax.
  • Describes heavyweight optimisations for join points.
  • The paper's implementation allows for recursive join points.

8.3.17. Tarditi — Design and Implementation of Code Optimiziations for a Type-Directed Compiler for Standard ML (ANF)

This thesis emphasizes a "pay as you go" compilation strategy where simple, monorphic code should not be slower simply because the language supports higher-level code. Also, monomorphic functions should get specialized to machine-type functions. C compilers focus on loops, so functional compilers should focus on recursive functions. Explains lots of optimizations with a lambda calculus IR

Author: Madeleine Sydney Ślaga

Created: 2024-09-21 Sat 13:16

Validate