v1 — Baseline Joint Training

GPT-2 Small (124M) · 50,000 steps · March 1–3, 2026 · $27.62 total

Target Achievement — 3 of 5 Met

Metric           Value    Target    Status
AR Perplexity    26.9     < 40      Met
AUROC            0.854    > 0.75    Met
ECE              0.010    < 0.05    Met
Diffusion Loss   4.13     < 4.0     Near (97% of target)
S1 Accuracy      28.7%    40%       Missed (72% of target)

Percentages indicate progress toward the unmet target.
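
AUROC and ECE track the confidence signal's discrimination and calibration, respectively. Below is a minimal sketch of how they could be computed; the variable names, equal-width binning, and use of scikit-learn are assumptions, not the run's actual evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score  # assumed dependency

def expected_calibration_error(confidences, labels, n_bins=10):
    """Equal-width-bin ECE: bin-weighted mean |empirical accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = labels[in_bin].mean()        # fraction of positives in the bin
            conf = confidences[in_bin].mean()  # mean predicted confidence in the bin
            ece += in_bin.mean() * abs(acc - conf)
    return ece

# AUROC is the standard ranking metric over the same scores:
# auroc = roc_auc_score(labels, confidences)
```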

Training Trajectory

Evaluation Metrics by Step

Step     AR PPL   Diff Loss   S1 Acc   AUROC   ECE
50       22.4     7.91         3.7%    0.502   0.048
1,000    21.2     6.86         4.8%    0.559   0.003
5,000    24.3     6.11         9.3%    0.695   0.011
10,000   26.5     5.41        14.7%    0.791   0.011
20,000   28.8     4.61        22.3%    0.847   0.005
30,000   28.5     4.33        23.3%    0.860   0.009
40,000   27.4     4.21        27.0%    0.870   0.010
50,000   26.9     4.13        28.7%    0.854   0.010
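
For orientation, AR PPL is the usual exponentiated mean per-token negative log-likelihood, so the final row corresponds to an AR loss of roughly 3.29 nats/token. A one-liner, assuming the eval already reports mean NLL in nats:

```python
import math

def perplexity(mean_nll_nats: float) -> float:
    """AR perplexity: exp of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll_nats)

print(perplexity(3.29))  # ~26.8, matching the final-row AR PPL of 26.9
```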

Key Observations

- AR perplexity rose from an early low of 21.2 to 28.8 by step 20k, then recovered to 26.9 by the end of the run, comfortably under the < 40 target.
- Diffusion loss fell monotonically from 7.91 to 4.13 and was still trending down at step 50k, just short of the < 4.0 target.
- S1 accuracy was still climbing at the end of training (27.0% → 28.7% over the final 10k steps), suggesting the 40% target may simply need a longer run.
- AUROC peaked at 0.870 at step 40k before settling at 0.854; ECE stayed at or below 0.011 from step 1,000 onward.

Spot Instance Cost History

#   AZ           Steps           Boot (UTC)         Event       Cost
1   us-east-1a   2.3k → 28.7k    2026-03-01 02:33   Reclaimed   $14.66
2   us-east-1f   28.8k → 31.8k   2026-03-02 12:25   Reclaimed   $2.14
3   us-east-1b   31.8k → 35k     2026-03-02 17:17   Reclaimed   $2.25
4   us-east-1b   35k → 50k       2026-03-02 22:24   Completed   $8.57
Total across 4 instances (3 reclamation recoveries): $27.62
g5.2xlarge spot at ~$0.43/hr (~63% savings vs. on-demand) · Checkpoints every 1,000 steps · Autonomous bootstrap recovery
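
The recovery pattern behind those four instances is simple: persist a checkpoint every 1,000 steps, and on every boot resume from the newest one. A minimal sketch of that idea, assuming PyTorch; the paths, file naming, and function names are illustrative, not the run's actual bootstrap script.

```python
import glob
import os
import torch

CKPT_DIR = "checkpoints"   # assumed location, not from the run
CKPT_EVERY = 1_000         # matches the cadence described above

def latest_checkpoint():
    """Return the path of the highest-step checkpoint, or None if none exist."""
    paths = glob.glob(os.path.join(CKPT_DIR, "step_*.pt"))
    if not paths:
        return None
    return max(paths, key=lambda p: int(p.split("_")[-1].split(".")[0]))

def save(step, model, optimizer):
    """Write checkpoint to a temp file, then atomically rename into place."""
    tmp = os.path.join(CKPT_DIR, f"step_{step}.pt.tmp")
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, tmp)
    os.replace(tmp, os.path.join(CKPT_DIR, f"step_{step}.pt"))

def resume(model, optimizer):
    """Restore the newest checkpoint; return the step to continue from."""
    path = latest_checkpoint()
    if path is None:
        return 0  # fresh start
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1  # continue from the step after the save
```

The atomic rename matters on spot instances: a reclamation mid-write leaves only a stale `.tmp` file behind, so the newest complete checkpoint is always loadable.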

Run Configuration

Model         GPT-2 Small (124M)
Config        tiny.yaml
Total Steps   50,000
Precision     bfloat16
GPU           NVIDIA A10G (24GB)
Instance      g5.2xlarge (spot)
λ AR / Diff   1.0 / 1.0 (combined as sketched below)
Checkpoints   50 (every 1k steps)
Data          OpenWebText
Tokenizer     GPT-2 (50,257 vocab)
Duration      ~48 hours
Total Cost    $27.62
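
The λ weights above imply a joint objective that is a weighted sum of the autoregressive and diffusion losses. A minimal sketch of one training step under that assumption, using PyTorch with bfloat16 autocast as configured; the model interface and names are illustrative, not the actual training code.

```python
import torch

LAMBDA_AR, LAMBDA_DIFF = 1.0, 1.0  # λ AR / Diff from the table above

def training_step(model, batch, optimizer):
    # `model` is assumed to return both objectives' losses for a batch.
    with torch.autocast("cuda", dtype=torch.bfloat16):  # Precision: bfloat16
        ar_loss, diff_loss = model(batch)
        loss = LAMBDA_AR * ar_loss + LAMBDA_DIFF * diff_loss
    optimizer.zero_grad(set_to_none=True)
    loss.backward()   # backward runs outside autocast, as recommended
    optimizer.step()
    return loss.item()
```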