About
Aether is a minimal, fully transparent transformer LLM built from first principles: 0.57M parameters, a C99 inference engine (350 LOC, zero dependencies), trained on code and knowledge corpora.
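Where a 0.57M-parameter count comes from is easy to sanity-check by hand. A minimal sketch, assuming a hypothetical small config (the actual Aether hyperparameters aren't listed here; these dims are illustrative and chosen to land near the 0.5M scale):

```python
# Rough transformer parameter count for a hypothetical small config.
# These dimensions are NOT Aether's actual hyperparameters -- they are
# illustrative values that land in the same half-million range.
def param_count(vocab=512, d_model=96, n_layers=4, d_ff=384, seq_len=256):
    embed = vocab * d_model + seq_len * d_model      # token + positional embeddings
    per_layer = (
        4 * d_model * d_model                        # Q, K, V, output projections
        + 2 * d_model * d_ff                         # MLP up + down projections
        + 4 * d_model                                # two layer norms (scale + bias)
    )
    return embed + n_layers * per_layer

print(f"{param_count():,}")  # 517,632 -- same order as Aether's 0.57M
```

At this scale the embeddings alone are a sizable fraction of the model, which is typical for tiny LLMs.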
"What I cannot create, I do not understand." — Feynman
Training Phases
✅ v0.1: Jot
Syntax corpus. 200 epochs. Foundation.
✅ v0.2: Jung
JIT compiler. 100 epochs. Specialization.
✅ v0.3: Multilang
27 MB code corpus. 500 epochs. Loss 0.0947.
✅ v0.4: Knowledge
Balanced corpus. 200 epochs. Loss 0.1233.
🚀 v1.0: Mini
3.5M params. Full scale. Coming next.
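The loss figures above are easiest to read as per-token quantities. Assuming they are mean token-level cross-entropy in nats (the usual convention, though the README doesn't say), `exp(loss)` gives perplexity and `exp(-loss)` the average probability assigned to the correct token:

```python
import math

# Interpret a cross-entropy loss (in nats) as perplexity and as the
# average probability the model assigns to the correct next token.
# Assumes the v0.3/v0.4 losses above are mean token-level cross-entropy.
def interpret_loss(loss):
    return {"perplexity": math.exp(loss), "avg_token_prob": math.exp(-loss)}

for version, loss in [("v0.3", 0.0947), ("v0.4", 0.1233)]:
    stats = interpret_loss(loss)
    print(f"{version}: ppl={stats['perplexity']:.3f}, p(correct token)~{stats['avg_token_prob']:.3f}")
```

A loss near 0.1 means the model assigns roughly 90% probability to the right token on its training distribution, which is expected when a small model memorizes a fixed corpus over hundreds of epochs.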
Quick Start
Clone & setup:
git clone https://github.com/nulljosh/aether.git && cd aether
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
Train:
python src/train.py --corpus tiny --epochs 200
Inference (C engine, 50K tok/s):
cd inference && make && ./aether ../models/aether.bin "fn " --temp 0.3
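The `--temp 0.3` flag controls sampling temperature. A pure-Python sketch of temperature-scaled sampling (the C engine's actual implementation may differ in detail): logits are divided by the temperature before the softmax, so lower values sharpen the distribution toward the highest-logit token.

```python
import math, random

# Temperature-scaled sampling from raw logits (sketch).
# rng is injectable so the function can be tested deterministically.
def sample(logits, temperature=0.3, rng=random.random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):                    # inverse-CDF sampling
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

At temperature 0.3 a logit gap of 1.5 becomes a gap of 5 after scaling, so the top token dominates; at temperature 1.0 the original softmax probabilities are used unchanged.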
Web UI:
python index.py # http://localhost:5001
The Stack
PyTorch Trainer
AdamW, cosine LR, gradient clipping. Full control.
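The "cosine LR" and "gradient clipping" pieces reduce to a few lines of math. A pure-Python sketch (the real trainer uses PyTorch's optimizer machinery; the warmup length and learning rates here are illustrative assumptions, not Aether's actual values):

```python
import math

# Cosine learning-rate schedule with linear warmup.
# base_lr, warmup, and min_lr are ASSUMED values for illustration.
def cosine_lr(step, max_steps, base_lr=3e-4, warmup=100, min_lr=3e-5):
    if step < warmup:
        return base_lr * (step + 1) / warmup         # linear ramp up
    progress = (step - warmup) / max(1, max_steps - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Global-norm gradient clipping, the same rule as
# torch.nn.utils.clip_grad_norm_: rescale all gradients together
# when their combined L2 norm exceeds max_norm.
def clip_grad_norm(grads, max_norm=1.0):
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads
```

The schedule peaks at `base_lr` right after warmup and decays smoothly to `min_lr` at the final step; clipping by global norm preserves the gradient's direction while bounding its magnitude.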
C99 Inference
350 LOC, mmap weights, zero deps. Runs anywhere.
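Memory-mapping the weight file means pages load lazily on first access instead of being read up front. The same idea in Python's stdlib (the flat little-endian float32 layout here is an assumption for illustration; Aether's actual .bin format isn't documented in this README):

```python
import mmap, os, struct, tempfile

# Write a tiny "weight file" -- a flat array of little-endian float32s
# (an ASSUMED layout for illustration) -- then read it back via mmap.
weights = [0.1, -0.5, 2.0]
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"<{len(weights)}f", *weights))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # pages fault in on demand
    loaded = struct.unpack_from(f"<{len(weights)}f", mm, 0)
    mm.close()

print(loaded)
```

This is the Python analogue of the C engine's `mmap()` call: the OS handles paging, so startup cost is near zero regardless of model size.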
Flask Web UI
Chat, quiz, status. Port 5001. Real-time inference.
Aether Daemon
Continuous training, checkpoints, iMessage notifications.
Benchmarks
| Model | Params | Speed | Capability |
|---|---|---|---|
| aether (v0.4) | 0.57M | 50K tok/s | code + knowledge |
| GPT-2 | 124M | — | coherent paragraphs |
| Claude | ??? | 80 tok/s | reasoning, tools |
Why Aether
- Full stack from scratch: tokenizer → attention → training → C inference
- No black boxes. Every byte visible and understandable
- Learning tool first, production model later
- C99 engine runs anywhere with a C compiler
- Progressive training shows how LLMs learn from data