Learning Long-Term Motion Embeddings for Efficient Kinematics Generation

Understanding and predicting motion is a fundamental component of visual intelligence. Although modern video models exhibit strong comprehension of scene dynamics, exploring multiple possible futures through full video synthesis remains prohibitively inefficient. We model scene dynamics orders of magnitude more efficiently by directly operating on a long-term motion embedding that is learned from large-scale trajectories obtained from tracker models. This enables efficient generation of long, realistic motions that fulfill goals specified via text prompts or spatial pokes. To achieve this, we...

Learning Long-Term Motion Embeddings for Efficient Kinematics Generation

Affected Roles

Time Horizon

What Changes

Recommended Action

Ready to dive deeper?

Discussion

Related Stories

7 ways to travel smarter this summer, with help from Google

New ways to create personalized images in the Gemini app

A new way to explore the web with AI Mode in Chrome