CLUSTER · TIER 2
Nous Research releases Token Superposition Training for 2-3x LLM pretraining speedup
Nous Research released Token Superposition Training (TST), a modification to the standard LLM pretraining loop that achieves a 2–3× wall-clock speedup at matched FLOPs without changing model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens using averaged embeddings, then switches to standard next-token prediction for the remainder of the run.
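The two-phase scheme described above can be sketched as follows. This is a hypothetical illustration, not Nous Research's implementation: the function names, the bag size, and the multi-hot target construction are all assumptions; the source specifies only "contiguous bags of tokens", "averaged embeddings", and the one-third phase switch.

```python
import numpy as np

def superposed_inputs(token_ids, embed, bag_size):
    """Phase 1 input: average the embeddings of each contiguous bag of tokens.

    Instead of one embedding per position, the model sees one averaged
    ("superposed") embedding per bag of `bag_size` consecutive tokens.
    """
    n = len(token_ids) // bag_size * bag_size          # drop the ragged tail
    ids = np.asarray(token_ids[:n]).reshape(-1, bag_size)
    return embed[ids].mean(axis=1)                     # (num_bags, d_model)

def bag_targets(token_ids, bag_size, vocab_size):
    """Phase 1 target (assumed): multi-hot vector over each bag's tokens,
    since the model predicts a bag rather than a single next token."""
    n = len(token_ids) // bag_size * bag_size
    ids = np.asarray(token_ids[:n]).reshape(-1, bag_size)
    targets = np.zeros((ids.shape[0], vocab_size))
    np.put_along_axis(targets, ids, 1.0, axis=1)
    return targets

def training_phase(step, total_steps):
    """Superposition for the first third of training, then standard NTP."""
    return "superposition" if step < total_steps // 3 else "next_token"

# Toy example: vocabulary of 10 tokens, 4-dim embeddings, bags of 3.
rng = np.random.default_rng(0)
embed = rng.standard_normal((10, 4))
seq = [1, 2, 3, 4, 5, 6, 7]                            # last token dropped
x = superposed_inputs(seq, embed, bag_size=3)
y = bag_targets(seq, bag_size=3, vocab_size=10)
```

With `bag_size=3`, a sequence of 7 tokens yields two superposed inputs (bags `1,2,3` and `4,5,6`), so the model processes roughly a third as many positions during phase 1, which is where the claimed wall-clock savings would come from.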
Sources: 1
X mentions: 11k ▲
First seen: 4d ago
Velocity: +2%/6h