Seq Architecture
Seq is a concatenative (stack-based) programming language with static typing, row polymorphism, and green-thread concurrency.
Core Design Principles
-
Values are independent of stack structure - A value can be duplicated, shuffled, or stored without corruption. The stack is a contiguous array of 8-byte tagged pointer values.
-
Functional style - Operations produce new values rather than mutating.
list.pushreturns a new list, it doesn’t modify the original. -
Static typing with inference - Stack effects are checked at compile time. Row polymorphism (
..rest) allows generic stack-polymorphic functions. -
Concatenative composition - Functions compose by juxtaposition.
f gmeans “do f, then g”. No explicit argument passing.
Project Structure
Crate graph
Arrows point from a crate to the crates it depends on at compile time.
graph TD
core[seq-core<br/><i>values, stack, arena</i>]
runtime[seq-runtime<br/><i>scheduler, I/O, stdlib ops</i>]
compiler[seq-compiler<br/><i>parser, typechecker, codegen</i>]
lsp[seq-lsp<br/><i>language server</i>]
repl[seq-repl<br/><i>seqr TUI REPL</i>]
vim[vim-line<br/><i>vim-motion line editor</i>]
runtime --> core
compiler --> runtime
lsp --> compiler
repl --> compiler
repl --> vim
seq-repl launches seq-lsp as a subprocess for completions — that’s a
runtime dependency, not a compile-time one, so it doesn’t appear in the
graph above.
Layout
patch-seq/
├── crates/
│ ├── core/ # Rust - seq-core foundational types
│ │ └── src/
│ │ ├── value.rs # Value enum (Int, Float, String, Variant, Channel, etc.)
│ │ ├── tagged_stack.rs # Contiguous 8-byte tagged pointer stack
│ │ ├── arena.rs # Thread-local bump allocator (bumpalo)
│ │ ├── seqstring.rs # Reference-counted string type
│ │ └── memory_stats.rs # Memory tracking
│ ├── compiler/ # Rust - seqc compiler
│ │ ├── src/
│ │ │ ├── ast.rs # AST, union definitions, match expressions
│ │ │ ├── parser.rs # Forth-style parser
│ │ │ ├── typechecker.rs # Row-polymorphic type inference
│ │ │ ├── builtins.rs # Type signatures for builtins
│ │ │ ├── codegen/ # LLVM IR generation (inline ops, control flow, FFI)
│ │ │ ├── unification.rs # Type unification
│ │ │ ├── ffi.rs # FFI manifest parser and codegen
│ │ │ ├── lint.rs # Syntactic lint engine
│ │ │ └── capture_analysis.rs # Closure capture analysis
│ │ └── stdlib/ # Seq standard library (.seq files)
│ ├── runtime/ # Rust - libseq_runtime.a
│ │ └── src/
│ │ ├── arithmetic.rs # Math operations
│ │ ├── io.rs # I/O operations
│ │ ├── scheduler.rs # May coroutine scheduler
│ │ ├── channel.rs # CSP-style channels
│ │ ├── weave.rs # Generator/coroutine weaves
│ │ ├── closures.rs # Closure invocation
│ │ ├── string_ops.rs # String operations
│ │ ├── variant_ops.rs # Variant operations
│ │ ├── list_ops.rs # List operations
│ │ ├── map_ops.rs # Map operations
│ │ ├── file.rs # File I/O
│ │ ├── dns.rs # DNS worker pool (may-aware)
│ │ ├── tcp.rs # TCP (may-aware: listen/accept/connect/read/write)
│ │ ├── udp.rs # UDP (bind/send-to/recv-from)
│ │ ├── tls.rs # TLS client (rustls + ring)
│ │ ├── http_client.rs # HTTP/1.1 client (hand-rolled over may + rustls)
│ │ └── ... # + diagnostics, watchdog, signal, etc.
│ ├── lsp/ # Rust - seq-lsp language server
│ ├── repl/ # Rust - seqr TUI REPL (ratatui-based)
│ └── vim-line/ # Rust - vim-motion line editor library
├── examples/ # Example programs
└── docs/ # Documentation
Value Types
Values are defined in core/src/value.rs:
#![allow(unused)]
fn main() {
#[repr(C)]
pub enum Value {
Int(i64), // Discriminant 0
Float(f64), // Discriminant 1
Bool(bool), // Discriminant 2
String(SeqString), // Discriminant 3 - reference-counted
Symbol(SeqString), // Discriminant 4 - symbolic identifiers (:foo)
Variant(Arc<VariantData>), // Discriminant 5 - Arc for O(1) cloning
Map(Box<HashMap<...>>), // Discriminant 6 - key-value dictionary
Quotation { wrapper, impl_ },// Discriminant 7 - function pointers (dual calling conventions)
Closure { fn_ptr, env }, // Discriminant 8 - function + Arc-shared captured values
Channel(Arc<ChannelData>), // Discriminant 9 - MPMC sender/receiver pair
WeaveCtx { yield_chan, resume_chan }, // Discriminant 10 - generator coroutine context
}
pub struct VariantData {
pub tag: SeqString, // Symbol-based tag for dynamic variant construction
pub fields: Box<[Value]>,
}
}
The Rust Value enum is 40 bytes (used for storage and FFI), but on the stack
values are encoded as 8-byte tagged pointers (StackValue = u64). Integers are
stored inline (shifted left 1, low bit set); all other types are heap-allocated
behind 8-byte-aligned pointers with a tag in the low 3 bits.
Stack Model
The stack is a contiguous array of 8-byte tagged pointer values (StackValue),
defined in core/src/tagged_stack.rs:
#![allow(unused)]
fn main() {
pub type StackValue = u64; // 8 bytes — tagged pointer encoding
pub struct TaggedStack {
pub base: *mut StackValue, // Heap-allocated array
pub sp: usize, // Stack pointer (next free slot)
pub capacity: usize, // Current allocation size
}
}
This design enables:
- Inline LLVM IR operations - Integer arithmetic, comparisons, and boolean ops execute directly in generated code without FFI calls
- Cache-friendly layout - Contiguous memory access patterns
- O(1) stack operations - No linked-list traversal or allocation per push/pop
Key operations:
push(stack, value) -> stack'- Add value to toppop(stack) -> (stack', value)- Remove and return topdup,drop,swap,rot,over,pick,roll- Stack shuffling
Type System
Stack Effects
Every function has a stack effect: ( input -- output )
: add ( Int Int -- Int ) ... ;
: dup ( T -- T T ) ... ;
: swap ( A B -- B A ) ... ;
Row Polymorphism
The ..rest syntax captures “everything else on the stack”:
: my-dup ( ..rest T -- ..rest T T )
dup
;
This means my-dup works regardless of what’s below the top value.
Type Inference
Types are inferred at compile time. The type checker:
- Assigns fresh type variables to unknowns
- Collects constraints from operations
- Unifies constraints to solve for types
- Reports errors if unification fails
Variants (Algebraic Data Types)
Variants are tagged unions with N fields:
# Create using low-level constructors (Symbol tag + N fields)
42 "hello" :MyTag variant.make-2 # Tag :MyTag with fields [42, "hello"]
:Empty variant.make-0 # Tag :Empty with no fields
# Access
variant.tag # ( Variant -- Symbol )
variant.field-count # ( Variant -- Int )
0 variant.field-at # ( Variant Int -- Value )
# Functional append (for building dynamic collections)
value variant.append # ( Variant Value -- Variant' )
In practice, union definitions generate typed Make-VariantName constructors
and match expressions for safe, named field access. The low-level
variant.make-N API is used by the stdlib for dynamic variant construction.
JSON Tags
The JSON library (stdlib/json.seq) uses Symbol-based variant tags:
:JsonNull(0 fields):JsonBool(1 field: Int, 0 or 1):JsonNumber(1 field: Float):JsonString(1 field: String):JsonArray(N fields: elements):JsonObject(2N fields: key1 val1 key2 val2 …)
Control Flow
Conditionals
condition if
# then-branch
else
# else-branch
then
The condition must be a Bool value (produced by comparisons, logical operations, or true/false literals). The type checker enforces this — passing an Int where a Bool is expected is a compile error.
Both branches must have the same stack effect.
Recursion
Words can call themselves:
: factorial ( Int -- Int )
dup 1 i.<= if
drop 1
else
dup 1 i.- factorial i.*
then
;
Tail calls are optimized via LLVM’s musttail - deep recursion won’t overflow.
See docs/TCO_GUIDE.md for details.
Concurrency (Strands)
Seq uses May coroutines for cooperative concurrency:
# Spawn a strand (green thread)
[ ... code ... ] strand.spawn # ( Quotation -- Int ) returns strand ID
# Channels for communication
chan.make # ( -- Int ) returns channel ID
value chan-id chan.send # ( Value Int -- )
chan-id chan.receive # ( Int -- Value )
# Cooperative yield
chan.yield # Let other strands run
Note: Current implementation has known issues with heavy concurrent workloads.
Why May (Not Tokio)
Seq uses the may crate for stackful coroutines (fibers) rather than Rust’s
async/await ecosystem (Tokio, async-std). Key reasons:
-
No async coloring - With may, a Seq
strand.spawncreates a fiber that can call blocking operations (channel send/receive, I/O) and implicitly yield. Noasync/awaitsyntax pollution spreading through the call stack. -
Erlang/Go mental model - Fits Seq’s concatenative style naturally.
[ code ] strand.spawncreates a lightweight fiber. Thousands can run concurrently with message passing via channels. This matches how Go goroutines and Erlang processes work - simple synchronous-looking code that yields cooperatively. -
Simpler FFI - LLVM-generated code calls synchronous Rust functions. No async runtime ceremony or
Futureplumbing required. -
M:N scheduling - Like Tokio, may multiplexes many fibers across a small thread pool. We get lightweight concurrency without one OS thread per fiber.
M:N Threading: Best of Both Worlds
Early concurrency implementations had to choose between two models:
| Model | Mapping | Pros | Cons |
|---|---|---|---|
| Green threads (early Java) | M:1 | Cheap, fast switch | Single CPU only |
| Native OS threads | 1:1 | Multi-CPU | Expensive (~1MB stack), slow switch |
May provides M:N scheduling - many lightweight coroutines distributed across all CPU cores:
- Lightweight - Strands use a fixed 128KB stack (configurable via
SEQ_STACK_SIZE), not 1MB - Multi-core - Work-stealing scheduler spreads load across all CPUs
- Fast context switch - Cooperative yield, no kernel involvement
- No blocking - When one strand waits on I/O, others run on that core
This means Seq programs get the programming simplicity of green threads (spawn thousands of concurrent tasks cheaply) with the performance of native threads (utilizing all available CPUs). Write sequential code that scales.
Tradeoff: libc for stdout
May’s implicit yields can occur inside any function call. Rust’s stdout()
uses an internal RefCell that panics if one coroutine holds a borrow, yields,
and another coroutine on the same OS thread tries to borrow. This is because
RefCell tracks borrows per-thread, not per-coroutine.
We bypass this by calling libc::write(1, ...) directly, protected by
may::sync::Mutex (which yields the coroutine when contended rather than
blocking the OS thread). This is a small price for may’s cleaner programming
model.
See runtime/src/io.rs for the implementation.
Runtime Configuration
The scheduler can be tuned via environment variables:
| Variable | Default | Description |
|---|---|---|
SEQ_STACK_SIZE | 131072 (128KB) | Coroutine stack size in bytes |
SEQ_POOL_CAPACITY | 10000 | Cached coroutine pool size |
SEQ_WATCHDOG_SECS | 0 (disabled) | Threshold for “stuck strand” detection |
SEQ_WATCHDOG_INTERVAL | 5 | Watchdog check frequency (seconds) |
SEQ_WATCHDOG_ACTION | warn | Action on stuck strand: warn or exit |
SEQ_REPORT | unset (disabled) | At-exit KPI report: 1 (human/stderr), json (JSON/stderr), json:/path (JSON to file), words (human + per-word counts) |
Diagnostics Feature
The runtime includes optional diagnostics for production debugging:
- Strand registry - Tracks active strands with spawn timestamps
- SIGQUIT handler - Dumps runtime stats on
kill -3 <pid> - Watchdog - Detects strands running longer than threshold
- At-exit report -
SEQ_REPORTenv var dumps KPIs (wall clock, strands, memory, channels) when the program exits. Compile withseqc build --instrumentto include per-word call counts
These are controlled by the diagnostics Cargo feature (enabled by default):
# In Cargo.toml - disable for minimal overhead
seq-runtime = { version = "...", default-features = false }
When disabled, the runtime skips strand registry operations and signal handler
setup, eliminating ~O(1024) scans and SystemTime::now() syscalls per spawn.
Note: Benchmarking shows the diagnostics overhead is negligible compared to
May’s coroutine spawn syscalls. The feature is primarily useful for production
deployments where kill -3 debugging is needed.
Memory Management
The tagged pointer stack eliminates per-operation allocation overhead for integers (inline in the 8-byte slot) and provides O(1) push/pop for all types. The stack is a single contiguous array that grows/shrinks by adjusting the stack pointer. Heap types (String, Variant, Closure) use reference counting for correct cleanup.
Arena Allocation
Problem: String operations (concatenation, substring, parsing) create many short-lived intermediate strings. Reference counting each one adds overhead.
Solution: Thread-local bump allocator (via bumpalo crate).
- Allocation is a pointer bump (~5ns vs ~100ns for malloc)
- No individual deallocation - entire arena reset at once
- Reset when strand exits or when arena exceeds 10MB threshold
- 20x faster than global allocator for allocation-heavy workloads
Thread-local vs strand-local: The arena is per-OS-thread, not per-strand. If may migrates a strand between threads (rare), some memory stays in the old arena until another strand on that thread exits. This is acceptable - the common case (strand stays on one thread) is fast, and the 10MB auto-reset prevents unbounded growth in the rare migration case.
See core/src/arena.rs for implementation.
Reference Counting
SeqString uses atomic reference counting for strings that escape the arena:
- Strings passed through channels are cloned to the global allocator
- Strings stored in closures use reference counting
- Arena strings are fast for local computation; refcounted strings are safe for sharing across strands
This hybrid approach gives us arena speed for the common case (local string manipulation) and correctness for cross-strand communication.
Inline LLVM IR vs FFI
The tagged stack design enables inline code generation for performance-critical operations. Integer arithmetic, comparisons, and boolean operations execute directly in generated LLVM IR without FFI calls to the runtime:
; Example: inline integer add
%a = load i64, ptr %slot1_ptr
%b = load i64, ptr %slot1_ptr.1
%result = add i64 %a, %b
store i64 %result, ptr %slot1_ptr
Complex operations (string handling, variants, closures) still call into the Rust runtime for memory safety and code maintainability.
The main Word and Exit Codes
Every executable Seq program defines a main word with one of two
allowed signatures:
: main ( -- ) # void main: process exits with code 0
: main ( -- Int ) # int main: returned Int becomes the process exit code
The compiler rejects any other shape (extra inputs, multiple outputs, non-Int output) at type-check time.
For main ( -- Int ), the top-of-stack Int at the end of main is
written to a runtime global by the generated seq_main, and the C-level
main function returns it as the process exit code (truncated to i32).
This means Seq programs compose naturally with shell tooling: &&,
||, $?, set -e, CI gates, and test harnesses all work as
expected.
: main ( -- Int )
do-work
0 # success
;
./myprog && echo "succeeded" || echo "failed with $?"
Script mode (seqc script.seq) inherits this behavior automatically
because it just execs the compiled binary.
Compilation Pipeline
- Parse - Tokenize and build AST (
parser.rs) - Type Check - Infer and verify stack effects (
typechecker.rs) - Codegen - Emit LLVM IR (
codegen.rs) - Link - LLVM compiles IR, links with
libseq_runtime.a
# Compile a .seq file
./target/release/seqc build myprogram.seq -o myprogram
# Keep IR for inspection
./target/release/seqc build myprogram.seq -o myprogram --keep-ir
cat myprogram.ll
Standard Library
Include System
include std:json # Loads stdlib/json.seq
include foo # Loads ./foo.seq
JSON (stdlib/json.seq)
Parsing:
include std:json
"[1, 2, 3]" json-parse # ( String -- JsonValue Bool )
Serialization:
json-value json-serialize # ( JsonValue -- String )
Functional builders:
json-empty-array 1 int->float json-number array-with 2 int->float json-number array-with
# Result: [1, 2]
json-empty-object "name" json-string "John" json-string obj-with
# Result: {"name": "John"}
Current Limitations
- No loop keywords - Use recursion with TCO (tail call optimization is guaranteed)
- Serialization size limits - Arrays > 3 elements, objects > 2 pairs show as
[...]/{...} - roll type checking -
3 rollworks at runtime but type checker can’t fully verify
Building and CI
The justfile is the source of truth for all build, test, and lint
operations. Forgejo Actions calls these recipes directly — there is no
duplication between local development and CI.
just build # build runtime, compiler, LSP
just test # run all unit tests
just lint # clippy with warnings as errors
just ci # everything CI runs: fmt-check, lint, test, build,
# examples, integration tests, seq lint
Run just ci before pushing — it’s the same pipeline that runs in
Forgejo Actions and will catch formatting, clippy, test, and lint
failures locally.
Toolchain pinning
The Rust toolchain is pinned in two places that must always agree:
rust-toolchain.toml— used byrustupfor local development.forgejo/workflows/*.yml— every workflow that callsdtolnay/rust-toolchain@masterdeclares the same explicittoolchain:input (ci-linux.yml,release.yml,bench.yml,bench-calibrate.yml)
All cargo commands in the ci pipeline use --locked so a stale
Cargo.lock is a build failure rather than a silent re-resolve.
Running Programs
# Compile and run
./target/release/seqc build myprogram.seq -o /tmp/prog
/tmp/prog
# With arguments
/tmp/prog arg1 arg2