NEWSFERENCE
MON, 11 May 2026 04:00:00
LIVE
$ today --liveF1TodayF2YesterdayF3ArchiveF4About
NEXT SCAN
← BACK TO TODAY/CLUSTER · ARXIV · RESEARCH
CLUSTER · TIER 3
FIRST SEEN 3D AGO
ARXIVRESEARCH

SOCKET: SOft Collision Kernel EsTimator for Sparse Attention

arXiv:2602.06283v2 Announce Type: replace Abstract: Exploiting sparsity during long-context inference is key to scaling large language models, as attention dominates the cost of autoregressive decoding. Sparse attention reduces this cost by restricting computation to a subset of tokens, but its effectiveness depends on efficient scoring and selection at inference time. We revisit Locality-Sensitive Hashing (LSH) and introduce SOCKET, a SOft Collision Kernel EsTimator that replaces hard bucket matches with probabilistic, similarity-aware aggregation. Traditional LSH yields binary collision signals that limit ranking quality and require substantial memory to perform well. In contrast, soft LSH accumulates graded collision evidence across hash tables, preserving top-k ordering with significantly less memory. This reframes LSH from a candidate generator into a principled scoring kernel for sparse attention. Leveraging this property, SOCKET enables efficient token selection without ad hoc voting and matches or surpasses prior sparse attention methods across multiple long-context benchmarks. With a custom CUDA scoring kernel and a Flash Decode Triton backend, SOCKET achieves up to 1.5$\times$ higher throughput than FlashAttention.

Sources
1
X mentions
First seen
3Dago
Velocity
+2%/6h
CONTRIBUTING SOURCES
1 ARTICLES
  1. arXiv: Machine Learning3D AGO
    arxiv.org/abs/2602.06283
X DISCOURSE
AWAITING X SIGNAL
No notable English-language X chatter on this entity yet.