CLUSTER · TIER 3
Apple researchers introduce VideoFlexTok for flexible-length coarse-to-fine video tokenization.
VideoFlexTok is a flexible-length video tokenizer that adapts to a video's inherent complexity. It moves beyond fixed spatiotemporal 3D grids to better preserve salient visual features for downstream text-to-video models.
Sources
1
X mentions
—
First seen
4h ago
Velocity
+33%/6h