Diffusion Models for Video Generation: Imagen Video
Transformers
Transfer Learning
MERLOT: Multimodal Neural Script Knowledge Models
Learning Temporal Video-Language Grounding for Egocentric Videos