CLUSTER · TIER 2
Nous Research releases Lighthouse Attention for faster long-context pre-training
Nous Research has open-sourced Lighthouse Attention, a selection-based hierarchical attention mechanism for long-context pre-training. It delivers a 1.4–1.7× wall-clock speedup at 98K-token context and runs roughly 17× faster than standard attention at 512K-token context on a single NVIDIA B200 GPU. The approach uses a multi-resolution pyramid with top-k cascade selection: queries score coarse, pooled summaries of the context first, then descend only into the highest-scoring regions. It requires no custom sparse attention kernel, no straight-through estimator, and no auxiliary loss.
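The release itself isn't reproduced here, so the following is only a minimal PyTorch sketch of the general idea behind a coarse-to-fine top-k cascade over a pooled key pyramid. All specifics (block sizes, strides, top-k values, function names) are illustrative assumptions, not the released implementation, and causal masking is omitted for brevity. Note that the selected tokens are attended with ordinary dense attention, so gradients flow to the chosen keys/values through the final softmax without a straight-through estimator.

```python
# Illustrative sketch of selection-based hierarchical attention.
# Assumed, not from the release: block sizes, top-k values, pooling choice.
import torch
import torch.nn.functional as F

def pooled_keys(k, block):
    # Mean-pool keys over non-overlapping blocks of size `block` to form
    # one coarse level of the pyramid (zero-padding the last block).
    T, d = k.shape
    pad = (-T) % block
    k = F.pad(k, (0, 0, 0, pad))
    return k.view(-1, block, d).mean(dim=1)            # (T/block, d)

def cascade_select_attend(q, k, v, coarse_block=64, fine_block=16,
                          top_coarse=4, top_fine=2):
    """One query attends over tokens chosen by a coarse-to-fine cascade."""
    d = q.shape[-1]
    # Level 1: score coarse pooled blocks, keep the top-k regions.
    kc = pooled_keys(k, coarse_block)                  # (Nc, d)
    coarse_scores = kc @ q / d**0.5                    # (Nc,)
    top_c = coarse_scores.topk(min(top_coarse, kc.shape[0])).indices
    # Level 2: expand surviving coarse blocks into their fine sub-blocks,
    # score those, and keep the top candidates across the cascade.
    kf = pooled_keys(k, fine_block)                    # (Nf, d)
    per = coarse_block // fine_block
    fine_ids = (top_c[:, None] * per +
                torch.arange(per, device=q.device)).flatten()
    fine_ids = fine_ids[fine_ids < kf.shape[0]]
    fine_scores = kf[fine_ids] @ q / d**0.5
    keep = fine_scores.topk(min(top_fine * top_coarse,
                                fine_ids.numel())).indices
    sel_blocks = fine_ids[keep]
    # Gather raw tokens inside the selected fine blocks and run ordinary
    # dense attention over just that subset -- no sparse kernel needed,
    # and gradients reach the selected k/v through this softmax.
    tok_ids = (sel_blocks[:, None] * fine_block +
               torch.arange(fine_block, device=q.device)).flatten()
    tok_ids = tok_ids[tok_ids < k.shape[0]]
    attn = F.softmax(k[tok_ids] @ q / d**0.5, dim=0)
    return attn @ v[tok_ids]

# Toy usage: one query over a 4096-token context; only 128 of the
# 4096 tokens are ever touched by the final attention.
T, d = 4096, 64
q, k, v = torch.randn(d), torch.randn(T, d), torch.randn(T, d)
out = cascade_select_attend(q, k, v)
print(out.shape)  # torch.Size([64])
```

Cost intuition under these assumptions: the query scores Nc coarse blocks plus a handful of fine blocks plus the gathered tokens, so work scales with the number of selected tokens rather than the full context length.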
Sources: 1
X mentions: 84k ▲
First seen: 2d ago
Velocity: +2%/6h