Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Seq Architecture

Seq is a concatenative (stack-based) programming language with static typing, row polymorphism, and green-thread concurrency.

Core Design Principles

  1. Values are independent of stack structure - A value can be duplicated, shuffled, or stored without corruption. The stack is a contiguous array of 8-byte tagged pointer values.

  2. Functional style - Operations produce new values rather than mutating. list.push returns a new list, it doesn’t modify the original.

  3. Static typing with inference - Stack effects are checked at compile time. Row polymorphism (..rest) allows generic stack-polymorphic functions.

  4. Concatenative composition - Functions compose by juxtaposition. f g means “do f, then g”. No explicit argument passing.

Project Structure

Crate graph

Arrows point from a crate to the crates it depends on at compile time.

graph TD
    core[seq-core<br/><i>values, stack, arena</i>]
    runtime[seq-runtime<br/><i>scheduler, I/O, stdlib ops</i>]
    compiler[seq-compiler<br/><i>parser, typechecker, codegen</i>]
    lsp[seq-lsp<br/><i>language server</i>]
    repl[seq-repl<br/><i>seqr TUI REPL</i>]
    vim[vim-line<br/><i>vim-motion line editor</i>]

    runtime --> core
    compiler --> runtime
    lsp --> compiler
    repl --> compiler
    repl --> vim

seq-repl launches seq-lsp as a subprocess for completions — that’s a runtime dependency, not a compile-time one, so it doesn’t appear in the graph above.

Layout

patch-seq/
├── crates/
│   ├── core/           # Rust - seq-core foundational types
│   │   └── src/
│   │       ├── value.rs        # Value enum (Int, Float, String, Variant, Channel, etc.)
│   │       ├── tagged_stack.rs # Contiguous 8-byte tagged pointer stack
│   │       ├── arena.rs        # Thread-local bump allocator (bumpalo)
│   │       ├── seqstring.rs    # Reference-counted string type
│   │       └── memory_stats.rs # Memory tracking
│   ├── compiler/       # Rust - seqc compiler
│   │   ├── src/
│   │   │   ├── ast.rs          # AST, union definitions, match expressions
│   │   │   ├── parser.rs       # Forth-style parser
│   │   │   ├── typechecker.rs  # Row-polymorphic type inference
│   │   │   ├── builtins.rs     # Type signatures for builtins
│   │   │   ├── codegen/        # LLVM IR generation (inline ops, control flow, FFI)
│   │   │   ├── unification.rs  # Type unification
│   │   │   ├── ffi.rs          # FFI manifest parser and codegen
│   │   │   ├── lint.rs         # Syntactic lint engine
│   │   │   └── capture_analysis.rs  # Closure capture analysis
│   │   └── stdlib/             # Seq standard library (.seq files)
│   ├── runtime/        # Rust - libseq_runtime.a
│   │   └── src/
│   │       ├── arithmetic.rs   # Math operations
│   │       ├── io.rs           # I/O operations
│   │       ├── scheduler.rs    # May coroutine scheduler
│   │       ├── channel.rs      # CSP-style channels
│   │       ├── weave.rs        # Generator/coroutine weaves
│   │       ├── closures.rs     # Closure invocation
│   │       ├── string_ops.rs   # String operations
│   │       ├── variant_ops.rs  # Variant operations
│   │       ├── list_ops.rs     # List operations
│   │       ├── map_ops.rs      # Map operations
│   │       ├── file.rs         # File I/O
│   │       ├── dns.rs          # DNS worker pool (may-aware)
│   │       ├── tcp.rs          # TCP (may-aware: listen/accept/connect/read/write)
│   │       ├── udp.rs          # UDP (bind/send-to/recv-from)
│   │       ├── tls.rs          # TLS client (rustls + ring)
│   │       ├── http_client.rs  # HTTP/1.1 client (hand-rolled over may + rustls)
│   │       └── ...             # + diagnostics, watchdog, signal, etc.
│   ├── lsp/            # Rust - seq-lsp language server
│   ├── repl/           # Rust - seqr TUI REPL (ratatui-based)
│   └── vim-line/       # Rust - vim-motion line editor library
├── examples/           # Example programs
└── docs/               # Documentation

Value Types

Values are defined in core/src/value.rs:

#![allow(unused)]
fn main() {
#[repr(C)]
pub enum Value {
    Int(i64),                    // Discriminant 0
    Float(f64),                  // Discriminant 1
    Bool(bool),                  // Discriminant 2
    String(SeqString),           // Discriminant 3 - reference-counted
    Symbol(SeqString),           // Discriminant 4 - symbolic identifiers (:foo)
    Variant(Arc<VariantData>),   // Discriminant 5 - Arc for O(1) cloning
    Map(Box<HashMap<...>>),      // Discriminant 6 - key-value dictionary
    Quotation { wrapper, impl_ },// Discriminant 7 - function pointers (dual calling conventions)
    Closure { fn_ptr, env },     // Discriminant 8 - function + Arc-shared captured values
    Channel(Arc<ChannelData>),   // Discriminant 9 - MPMC sender/receiver pair
    WeaveCtx { yield_chan, resume_chan }, // Discriminant 10 - generator coroutine context
}

pub struct VariantData {
    pub tag: SeqString,          // Symbol-based tag for dynamic variant construction
    pub fields: Box<[Value]>,
}
}

The Rust Value enum is 40 bytes (used for storage and FFI), but on the stack values are encoded as 8-byte tagged pointers (StackValue = u64). Integers are stored inline (shifted left 1, low bit set); all other types are heap-allocated behind 8-byte-aligned pointers with a tag in the low 3 bits.

Stack Model

The stack is a contiguous array of 8-byte tagged pointer values (StackValue), defined in core/src/tagged_stack.rs:

#![allow(unused)]
fn main() {
pub type StackValue = u64;  // 8 bytes — tagged pointer encoding

pub struct TaggedStack {
    pub base: *mut StackValue,  // Heap-allocated array
    pub sp: usize,              // Stack pointer (next free slot)
    pub capacity: usize,        // Current allocation size
}
}

This design enables:

  • Inline LLVM IR operations - Integer arithmetic, comparisons, and boolean ops execute directly in generated code without FFI calls
  • Cache-friendly layout - Contiguous memory access patterns
  • O(1) stack operations - No linked-list traversal or allocation per push/pop

Key operations:

  • push(stack, value) -> stack' - Add value to top
  • pop(stack) -> (stack', value) - Remove and return top
  • dup, drop, swap, rot, over, pick, roll - Stack shuffling

Type System

Stack Effects

Every function has a stack effect: ( input -- output )

: add ( Int Int -- Int ) ... ;
: dup ( T -- T T ) ... ;
: swap ( A B -- B A ) ... ;

Row Polymorphism

The ..rest syntax captures “everything else on the stack”:

: my-dup ( ..rest T -- ..rest T T )
  dup
;

This means my-dup works regardless of what’s below the top value.

Type Inference

Types are inferred at compile time. The type checker:

  1. Assigns fresh type variables to unknowns
  2. Collects constraints from operations
  3. Unifies constraints to solve for types
  4. Reports errors if unification fails

Variants (Algebraic Data Types)

Variants are tagged unions with N fields:

# Create using low-level constructors (Symbol tag + N fields)
42 "hello" :MyTag variant.make-2    # Tag :MyTag with fields [42, "hello"]
:Empty variant.make-0               # Tag :Empty with no fields

# Access
variant.tag           # ( Variant -- Symbol )
variant.field-count   # ( Variant -- Int )
0 variant.field-at    # ( Variant Int -- Value )

# Functional append (for building dynamic collections)
value variant.append  # ( Variant Value -- Variant' )

In practice, union definitions generate typed Make-VariantName constructors and match expressions for safe, named field access. The low-level variant.make-N API is used by the stdlib for dynamic variant construction.

JSON Tags

The JSON library (stdlib/json.seq) uses Symbol-based variant tags:

  • :JsonNull (0 fields)
  • :JsonBool (1 field: Int, 0 or 1)
  • :JsonNumber (1 field: Float)
  • :JsonString (1 field: String)
  • :JsonArray (N fields: elements)
  • :JsonObject (2N fields: key1 val1 key2 val2 …)

Control Flow

Conditionals

condition if
  # then-branch
else
  # else-branch
then

The condition must be a Bool value (produced by comparisons, logical operations, or true/false literals). The type checker enforces this — passing an Int where a Bool is expected is a compile error.

Both branches must have the same stack effect.

Recursion

Words can call themselves:

: factorial ( Int -- Int )
  dup 1 i.<= if
    drop 1
  else
    dup 1 i.- factorial i.*
  then
;

Tail calls are optimized via LLVM’s musttail - deep recursion won’t overflow. See docs/TCO_GUIDE.md for details.

Concurrency (Strands)

Seq uses May coroutines for cooperative concurrency:

# Spawn a strand (green thread)
[ ... code ... ] strand.spawn    # ( Quotation -- Int ) returns strand ID

# Channels for communication
chan.make                         # ( -- Int ) returns channel ID
value chan-id chan.send            # ( Value Int -- )
chan-id chan.receive               # ( Int -- Value )

# Cooperative yield
chan.yield                        # Let other strands run

Note: Current implementation has known issues with heavy concurrent workloads.

Why May (Not Tokio)

Seq uses the may crate for stackful coroutines (fibers) rather than Rust’s async/await ecosystem (Tokio, async-std). Key reasons:

  1. No async coloring - With may, a Seq strand.spawn creates a fiber that can call blocking operations (channel send/receive, I/O) and implicitly yield. No async/await syntax pollution spreading through the call stack.

  2. Erlang/Go mental model - Fits Seq’s concatenative style naturally. [ code ] strand.spawn creates a lightweight fiber. Thousands can run concurrently with message passing via channels. This matches how Go goroutines and Erlang processes work - simple synchronous-looking code that yields cooperatively.

  3. Simpler FFI - LLVM-generated code calls synchronous Rust functions. No async runtime ceremony or Future plumbing required.

  4. M:N scheduling - Like Tokio, may multiplexes many fibers across a small thread pool. We get lightweight concurrency without one OS thread per fiber.

M:N Threading: Best of Both Worlds

Early concurrency implementations had to choose between two models:

ModelMappingProsCons
Green threads (early Java)M:1Cheap, fast switchSingle CPU only
Native OS threads1:1Multi-CPUExpensive (~1MB stack), slow switch

May provides M:N scheduling - many lightweight coroutines distributed across all CPU cores:

  • Lightweight - Strands use a fixed 128KB stack (configurable via SEQ_STACK_SIZE), not 1MB
  • Multi-core - Work-stealing scheduler spreads load across all CPUs
  • Fast context switch - Cooperative yield, no kernel involvement
  • No blocking - When one strand waits on I/O, others run on that core

This means Seq programs get the programming simplicity of green threads (spawn thousands of concurrent tasks cheaply) with the performance of native threads (utilizing all available CPUs). Write sequential code that scales.

Tradeoff: libc for stdout

May’s implicit yields can occur inside any function call. Rust’s stdout() uses an internal RefCell that panics if one coroutine holds a borrow, yields, and another coroutine on the same OS thread tries to borrow. This is because RefCell tracks borrows per-thread, not per-coroutine.

We bypass this by calling libc::write(1, ...) directly, protected by may::sync::Mutex (which yields the coroutine when contended rather than blocking the OS thread). This is a small price for may’s cleaner programming model.

See runtime/src/io.rs for the implementation.

Runtime Configuration

The scheduler can be tuned via environment variables:

VariableDefaultDescription
SEQ_STACK_SIZE131072 (128KB)Coroutine stack size in bytes
SEQ_POOL_CAPACITY10000Cached coroutine pool size
SEQ_WATCHDOG_SECS0 (disabled)Threshold for “stuck strand” detection
SEQ_WATCHDOG_INTERVAL5Watchdog check frequency (seconds)
SEQ_WATCHDOG_ACTIONwarnAction on stuck strand: warn or exit
SEQ_REPORTunset (disabled)At-exit KPI report: 1 (human/stderr), json (JSON/stderr), json:/path (JSON to file), words (human + per-word counts)

Diagnostics Feature

The runtime includes optional diagnostics for production debugging:

  • Strand registry - Tracks active strands with spawn timestamps
  • SIGQUIT handler - Dumps runtime stats on kill -3 <pid>
  • Watchdog - Detects strands running longer than threshold
  • At-exit report - SEQ_REPORT env var dumps KPIs (wall clock, strands, memory, channels) when the program exits. Compile with seqc build --instrument to include per-word call counts

These are controlled by the diagnostics Cargo feature (enabled by default):

# In Cargo.toml - disable for minimal overhead
seq-runtime = { version = "...", default-features = false }

When disabled, the runtime skips strand registry operations and signal handler setup, eliminating ~O(1024) scans and SystemTime::now() syscalls per spawn.

Note: Benchmarking shows the diagnostics overhead is negligible compared to May’s coroutine spawn syscalls. The feature is primarily useful for production deployments where kill -3 debugging is needed.

Memory Management

The tagged pointer stack eliminates per-operation allocation overhead for integers (inline in the 8-byte slot) and provides O(1) push/pop for all types. The stack is a single contiguous array that grows/shrinks by adjusting the stack pointer. Heap types (String, Variant, Closure) use reference counting for correct cleanup.

Arena Allocation

Problem: String operations (concatenation, substring, parsing) create many short-lived intermediate strings. Reference counting each one adds overhead.

Solution: Thread-local bump allocator (via bumpalo crate).

  • Allocation is a pointer bump (~5ns vs ~100ns for malloc)
  • No individual deallocation - entire arena reset at once
  • Reset when strand exits or when arena exceeds 10MB threshold
  • 20x faster than global allocator for allocation-heavy workloads

Thread-local vs strand-local: The arena is per-OS-thread, not per-strand. If may migrates a strand between threads (rare), some memory stays in the old arena until another strand on that thread exits. This is acceptable - the common case (strand stays on one thread) is fast, and the 10MB auto-reset prevents unbounded growth in the rare migration case.

See core/src/arena.rs for implementation.

Reference Counting

SeqString uses atomic reference counting for strings that escape the arena:

  • Strings passed through channels are cloned to the global allocator
  • Strings stored in closures use reference counting
  • Arena strings are fast for local computation; refcounted strings are safe for sharing across strands

This hybrid approach gives us arena speed for the common case (local string manipulation) and correctness for cross-strand communication.

Inline LLVM IR vs FFI

The tagged stack design enables inline code generation for performance-critical operations. Integer arithmetic, comparisons, and boolean operations execute directly in generated LLVM IR without FFI calls to the runtime:

; Example: inline integer add
%a = load i64, ptr %slot1_ptr
%b = load i64, ptr %slot1_ptr.1
%result = add i64 %a, %b
store i64 %result, ptr %slot1_ptr

Complex operations (string handling, variants, closures) still call into the Rust runtime for memory safety and code maintainability.

The main Word and Exit Codes

Every executable Seq program defines a main word with one of two allowed signatures:

: main ( -- )      # void main: process exits with code 0
: main ( -- Int )  # int main: returned Int becomes the process exit code

The compiler rejects any other shape (extra inputs, multiple outputs, non-Int output) at type-check time.

For main ( -- Int ), the top-of-stack Int at the end of main is written to a runtime global by the generated seq_main, and the C-level main function returns it as the process exit code (truncated to i32). This means Seq programs compose naturally with shell tooling: &&, ||, $?, set -e, CI gates, and test harnesses all work as expected.

: main ( -- Int )
  do-work
  0    # success
;
./myprog && echo "succeeded" || echo "failed with $?"

Script mode (seqc script.seq) inherits this behavior automatically because it just execs the compiled binary.

Compilation Pipeline

  1. Parse - Tokenize and build AST (parser.rs)
  2. Type Check - Infer and verify stack effects (typechecker.rs)
  3. Codegen - Emit LLVM IR (codegen.rs)
  4. Link - LLVM compiles IR, links with libseq_runtime.a
# Compile a .seq file
./target/release/seqc build myprogram.seq -o myprogram

# Keep IR for inspection
./target/release/seqc build myprogram.seq -o myprogram --keep-ir
cat myprogram.ll

Standard Library

Include System

include std:json    # Loads stdlib/json.seq
include foo         # Loads ./foo.seq

JSON (stdlib/json.seq)

Parsing:

include std:json

"[1, 2, 3]" json-parse    # ( String -- JsonValue Bool )

Serialization:

json-value json-serialize  # ( JsonValue -- String )

Functional builders:

json-empty-array 1 int->float json-number array-with 2 int->float json-number array-with
# Result: [1, 2]

json-empty-object "name" json-string "John" json-string obj-with
# Result: {"name": "John"}

Current Limitations

  1. No loop keywords - Use recursion with TCO (tail call optimization is guaranteed)
  2. Serialization size limits - Arrays > 3 elements, objects > 2 pairs show as [...]/{...}
  3. roll type checking - 3 roll works at runtime but type checker can’t fully verify

Building and CI

The justfile is the source of truth for all build, test, and lint operations. Forgejo Actions calls these recipes directly — there is no duplication between local development and CI.

just build       # build runtime, compiler, LSP
just test        # run all unit tests
just lint        # clippy with warnings as errors
just ci          # everything CI runs: fmt-check, lint, test, build,
                 # examples, integration tests, seq lint

Run just ci before pushing — it’s the same pipeline that runs in Forgejo Actions and will catch formatting, clippy, test, and lint failures locally.

Toolchain pinning

The Rust toolchain is pinned in two places that must always agree:

  • rust-toolchain.toml — used by rustup for local development
  • .forgejo/workflows/*.yml — every workflow that calls dtolnay/rust-toolchain@master declares the same explicit toolchain: input (ci-linux.yml, release.yml, bench.yml, bench-calibrate.yml)

All cargo commands in the ci pipeline use --locked so a stale Cargo.lock is a build failure rather than a silent re-resolve.

Running Programs

# Compile and run
./target/release/seqc build myprogram.seq -o /tmp/prog
/tmp/prog

# With arguments
/tmp/prog arg1 arg2