Abstract: Temporal modeling plays an important role in the effective adaption of the powerful pretrained text–image foundation model into text–video retrieval. However, existing methods often rely on ...