v2 — λ-Rebalanced Joint Training

GPT-2 Small (124M) · 50,000 steps · March 4–7, 2026 · $31.44 total

Target Achievement — 3 of 5 Met

Metric          v2      Target       Status  v1 (v2 change vs v1)
AR Perplexity   29.65   < 40         Met     26.9 (+10%)
AUROC           0.863   > 0.75       Met     0.854 (+1%)
ECE             0.009   < 0.05       Met     0.010 (-10%)
Diffusion Loss  4.70    < 4.0 (83%)  Missed  4.13 (+14%)
S1 Accuracy     22.0%   40% (55%)    Missed  28.7% (-23%)
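
The percentages shown beside each v1 value appear to be the relative change of v2 against v1, computed as (v2 - v1) / v1. The sketch below reproduces them from the summary values. The function names and the dictionary layout are illustrative only and are not taken from the project code.

```python
# Illustrative sketch: reproduce the "vs v1" percentages from the summary.
# All names here are hypothetical; only the numbers come from the report.

def relative_change(v2: float, v1: float) -> float:
    """Relative change of v2 against v1, in percent."""
    return (v2 - v1) / v1 * 100.0

# (v2, v1) pairs as reported in the summary.
METRICS = {
    "AR Perplexity": (29.65, 26.9),
    "AUROC": (0.863, 0.854),
    "ECE": (0.009, 0.010),
    "Diffusion Loss": (4.70, 4.13),
    "S1 Accuracy": (22.0, 28.7),
}

def summarize() -> dict:
    """Round each relative change to a whole percent, as the report does."""
    return {
        name: round(relative_change(v2, v1))
        for name, (v2, v1) in METRICS.items()
    }

if __name__ == "__main__":
    for name, pct in summarize().items():
        print(f"{name}: {pct:+d}%")
```

The sign is the raw direction of change, not a judgment of better or worse. AR perplexity, ECE, and diffusion loss all have "less than" targets, so a positive change on those rows is a regression relative to v1.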

Training Trajectory (v2 vs v1)

Evaluation Metrics by Step

Step    AR PPL  Diff Loss  S1 Acc  AUROC  ECE
1,000   21.4    6.74       5.0%    0.557  0.002
5,000   25.6    4.97       18.0%   0.824  0.003
10,000  28.4    4.37       25.1%   0.850  0.004
20,000  30.9    4.75       21.3%   0.851  0.007
30,000  31.7    4.08       28.1%   0.864  0.023
40,000  30.2    4.56       23.2%   0.857  0.007
50,000  29.65   4.70       22.0%   0.863  0.009

Key Observations

Spot Instance Cost History

#   Type        AZ          Steps            Cost   Finalized
1   g5.2xlarge  us-east-1b  1 – 9,300        $5.11  Yes
2   g5.2xlarge  us-east-1b  9,100 – 10,600   $0.16  Yes
3   g5.2xlarge  us-east-1b  10,600 – 11,200  $1.34  Yes
4   g5.xlarge   us-east-1b  11,300           $0.00  Yes
5   g5.xlarge   us-east-1b  11,300           $0.17  Yes
6   g5.2xlarge  us-east-1a  11,100 – 12,500  $0.97  Yes
7   g5.2xlarge  us-east-1a  12,200 – 12,500  $0.31  Yes
8   g5.2xlarge  us-east-1f  12,300 – 13,200  $0.81  Yes
9   g5.xlarge   us-east-1f  13,200 – 14,500  $1.11  Yes
10  g5.2xlarge  us-east-1f  14,100 – 21,900  $4.39  Yes
11  g5.2xlarge  us-east-1b  21,000 – 24,100  $2.12  Yes
12  g5.2xlarge  us-east-1a  24,000 – 28,500  $2.54  Yes
13  g5.2xlarge  us-east-1b  28,600 – 41,000  $7.23  Yes
14  g5.xlarge   us-east-1b  41,000           $0.09  Yes
15  g5.2xlarge  us-east-1b  41,000 – 50,000  $5.10  Yes
Total (15 sessions, 31 reclamations, 3 AZs)  $31.44 (62% spot savings)
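
As a sanity check on the total, the per-session costs can be summed and the quoted savings turned into an implied on-demand figure. This is a sketch under one stated assumption: that "62% spot savings" means the spot bill is 62% lower than on-demand pricing for the same usage. The report does not define the term, and the function names are illustrative.

```python
from decimal import Decimal

# Per-session costs in USD, copied from the table (sessions 1 to 15).
SESSION_COSTS = [
    "5.11", "0.16", "1.34", "0.00", "0.17",
    "0.97", "0.31", "0.81", "1.11", "4.39",
    "2.12", "2.54", "7.23", "0.09", "5.10",
]

def total_cost(costs):
    """Sum the rounded per-session costs exactly, avoiding float error."""
    return sum((Decimal(c) for c in costs), Decimal("0"))

def implied_on_demand(spot_total, savings):
    """On-demand cost implied by a spot total and a fractional savings rate.

    ASSUMPTION: savings = 1 - spot / on_demand. The report does not say so.
    """
    return spot_total / (Decimal("1") - savings)

if __name__ == "__main__":
    print("Sum of rows:", total_cost(SESSION_COSTS))
    estimate = implied_on_demand(Decimal("31.44"), Decimal("0.62"))
    print("Implied on-demand:", estimate.quantize(Decimal("0.01")))
```

The rounded rows sum to $31.45, one cent above the reported $31.44, which is consistent with each row having been rounded independently. Under the stated assumption, the implied on-demand cost is about $82.74.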

Configuration

Model: GPT-2 Small (124M)
Config: tiny.yaml
Total Steps: 50,000
Precision: bfloat16
GPU: NVIDIA A10G (24 GB)
Instance: g5.2xlarge / g5.xlarge (spot)
λ AR / Diff: 1.0 / 2.0
Checkpoints: 50 (every 1k steps)
Data: OpenWebText
Tokenizer: GPT-2 (50,257 vocab)
Duration: ~72 hours
Total Cost: $31.44
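
The "λ AR / Diff" setting suggests that the autoregressive and diffusion objectives are combined as a weighted sum. The report does not show the loss code, so the sketch below is an assumption about the general form, not the project's implementation. The name `joint_loss` and its signature are hypothetical.

```python
# Hypothetical sketch of a lambda-weighted joint objective.
# ASSUMPTION: total = lambda_ar * L_AR + lambda_diff * L_diff.
# Only the weights 1.0 and 2.0 come from the report's configuration.

LAMBDA_AR = 1.0
LAMBDA_DIFF = 2.0

def joint_loss(ar_loss, diff_loss, lambda_ar=LAMBDA_AR, lambda_diff=LAMBDA_DIFF):
    """Weighted sum of the autoregressive and diffusion losses."""
    return lambda_ar * ar_loss + lambda_diff * diff_loss

if __name__ == "__main__":
    # Made-up loss values, used only to show the weighting.
    print(joint_loss(1.0, 2.0))  # 1.0 * 1.0 + 2.0 * 2.0 = 5.0
```

Because the function uses only multiplication and addition, the same expression would also work on tensor-valued losses in a framework such as PyTorch. Under this form, a diffusion weight of 2.0 makes each unit of diffusion loss count twice as much as a unit of AR loss.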