Walsh-Hadamard Spectral Bridging-Civilis.AI

Abstract

Scaling video predictive models to higher resolutions typically requires retraining from scratch, incurring substantial computational costs. We present Walsh-Hadamard Spectral Bridging (WHSB), a zero-parameter method that transfers learned video dynamics from a low-resolution predictive model to a high-resolution decoder without any additional training of the predictor. WHSB is grounded in the mathematical observation that the latent space of video world models exhibits a nested group structure under the Walsh-Hadamard transform over $\mathbb{Z}_2^n$: the dynamics learned at coarse resolution correspond to the low-frequency Walsh spectrum, which is a strict subset of the spectrum at finer resolutions. We construct a bridge operator that performs forward and inverse Walsh-Hadamard transforms with spectral truncation (downsampling) and zero-padding (upsampling), requiring no learnable parameters. Experiments on the KTH and UCF101 video datasets demonstrate that WHSB surpasses the full high-resolution training baseline at a 4× resolution span (64 → 256 px), achieving a cross-resolution prediction ratio C/B = 1.10–1.17. The zero-parameter Walsh bridge consistently outperforms a learnable linear bridge with 8,352 parameters on UCF101 (1.04× advantage), and achieves competitive performance on KTH. Our results suggest that the latent representations of video predictive models possess an intrinsic Walsh spectral nesting structure, enabling zero-cost cross-resolution transfer and challenging the prevailing assumption that resolution-specific retraining is necessary.
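The bridge operator described above (forward transform, spectral truncation or zero-padding, inverse transform) can be illustrated with a minimal one-dimensional sketch. Several things here are assumptions, not details from the abstract: the function names `walsh_matrix` and `walsh_bridge` are hypothetical, the transform is applied along the last axis only, and "low-frequency" is taken to mean the leading coefficients in Paley (dyadic) order. The authors' implementation may use a different ordering, a fast transform, or multi-dimensional latents.

```python
import numpy as np


def walsh_matrix(n: int) -> np.ndarray:
    """Return the n x n Walsh matrix in Paley (dyadic) order; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])  # Sylvester (natural-order) construction
    bits = n.bit_length() - 1
    # Bit-reversing the row index turns natural order into Paley order.
    rev = [int(format(i, f"0{bits}b")[::-1], 2) if bits else 0 for i in range(n)]
    return h[rev]


def walsh_bridge(x: np.ndarray, n_out: int) -> np.ndarray:
    """Resample the last axis of x to length n_out with no learnable parameters.

    Assumption: both the input length and n_out are powers of two.
    """
    n_in = x.shape[-1]
    # Forward Walsh-Hadamard transform along the last axis (normalized by n_in).
    coeffs = x @ walsh_matrix(n_in).T / n_in
    if n_out <= n_in:
        # Spectral truncation (downsampling): keep the leading coefficients.
        coeffs = coeffs[..., :n_out]
    else:
        # Zero-padding (upsampling): append zeros for the missing coefficients.
        pad = [(0, 0)] * (coeffs.ndim - 1) + [(0, n_out - n_in)]
        coeffs = np.pad(coeffs, pad)
    # Inverse transform at the target length.
    return coeffs @ walsh_matrix(n_out)


if __name__ == "__main__":
    low = np.arange(8.0)
    high = walsh_bridge(low, 32)   # 4x upsampling of a toy signal
    back = walsh_bridge(high, 8)   # truncation recovers the coarse signal
    print(np.allclose(back, low))
```

Under the Paley-order assumption, zero-padding reduces to repeating each sample and truncation reduces to block averaging, so a coarse signal survives an up-then-down round trip unchanged. This is one concrete way to read the nesting of the coarse spectrum inside the fine one.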
