Inputs Camera: Static Camera: Dolly out Camera: Orbit right + Pedestal up
Object Motions

 


Abstract

This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation. Shot design, a critical aspect of filmmaking, involves meticulously planning both camera movements and object motions in a scene. However, enabling intuitive shot design in modern image-to-video generation systems presents two main challenges: first, effectively capturing user intentions on the motion design, where both camera movements and scene-space object motions must be specified jointly; and second, representing motion information that can be effectively utilized by a video diffusion model to synthesize the image animations.

To address these challenges, we introduce MotionCanvas, a method that integrates user-driven controls into image-to-video (I2V) generation models, allowing users to control both object and camera motions in a scene-aware manner. By connecting insights from classical computer graphics and contemporary video generation techniques, we demonstrate the ability to achieve 3D-aware motion control in I2V synthesis without requiring costly 3D-related training data. MotionCanvas enables users to intuitively depict scene-space motion intentions, and translates them into spatiotemporal motion-conditioning signals for video diffusion models. We demonstrate the effectiveness of our method on a wide range of real-world image content and shot-design scenarios, highlighting its potential to enhance the creative workflows in digital content creation and adapt to various image and video editing applications.
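To make the idea of turning a scene-space camera move into a 2D conditioning signal concrete, here is a minimal sketch that unprojects a sparse pixel grid with a depth map, moves a pinhole camera, and reprojects the points to obtain per-frame point tracks. It is an illustration under stated assumptions, not the MotionCanvas implementation: the function name `camera_point_tracks`, the translation-only camera path, the centred principal point, and the constant depth map in the usage lines are all hypothetical.

```python
import numpy as np

def camera_point_tracks(depth, focal, cam_positions, step=8):
    """Project a sparse pixel grid through a moving pinhole camera.

    depth:         (H, W) depth of the input image (assumed to be given).
    focal:         focal length in pixels (assumed; principal point at centre).
    cam_positions: (T, 3) camera centres in the first camera's frame
                   (translation-only motion, to keep the sketch short).
    Returns an array of shape (T, N, 2) with pixel coordinates per frame.
    """
    h, w = depth.shape
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    xs, ys = xs.ravel(), ys.ravel()
    z = depth[ys, xs]
    # Unproject the sampled pixels to 3D points in the first camera's frame.
    pts = np.stack([(xs - cx) * z / focal, (ys - cy) * z / focal, z], axis=1)
    tracks = []
    for centre in np.asarray(cam_positions, dtype=float):
        p = pts - centre  # the same points, seen from the moved camera
        u = focal * p[:, 0] / p[:, 2] + cx
        v = focal * p[:, 1] / p[:, 2] + cy
        tracks.append(np.stack([u, v], axis=1))
    return np.stack(tracks)

# Hypothetical usage: a flat scene 4 units away and a 5-frame "dolly in".
depth = np.full((64, 64), 4.0)
path = np.stack([np.zeros(5), np.zeros(5), np.linspace(0.0, 1.0, 5)], axis=1)
tracks = camera_point_tracks(depth, focal=50.0, cam_positions=path)
```

In this toy setup the tracks spread away from the image centre as the camera moves forward, which is the 2D footprint of a dolly in. Rotational moves such as panning or orbiting would additionally need a per-frame rotation, which this sketch omits.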

 


Comparison & Showcases

Effectiveness in Cinematic Shot Design (Joint camera and object motion control in a 3D-scene-aware manner).

Camera motion Object global motion Object local motion
[ Pedestal up + Dolly in ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Static ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Roll clockwise ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Tilting up ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Dolly in ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Trucking right + Pedestal up ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Orbit right ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Panning left + Tilting up ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Dolly in ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Pedestal down + Tilting up ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Panning left + Dolly in ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Panning left ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Trucking right ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Static ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Trucking left + Dolly out ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Panning left + Pedestal up + Orbit right ]
DragAnything MOFA-Video MotionCanvas (Ours)

 


Applications

Shot Design with Joint Camera and Object Control.

Inputs Camera: Trucking right Camera: Zoom in Camera: Roll clockwise
Inputs Camera: Static Camera: Dolly in Camera: Diagonal bottom-right
Camera: Dolly out Camera: Orbit left Camera: Pedestal up
Camera: Orbit left Camera: [Trucking left + Pedestal up] Camera: Dolly in
Camera: Dolly out Camera: Dolly in Camera: Trucking left

 


Long Videos with Complex Trajectories.

Input image Motion control signal Result sample #1 Result sample #2

Input image Result sample #1 Result sample #2

 


Object Local Motion Control.

Inputs
Results

Inputs
Results

Inputs
Results

Inputs
Results

 


Additional Applications

Motion Transfer.

Input source video
Transfer results

Video Editing.

Input video
Editing results

 


Comparisons with Baseline Methods

Camera Motion Control.

Input camera control MotionCtrl CameraCtrl Ours
[ Dolly in + Zoom out ]
(Dolly zoom)
[ Trucking right ]
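The dolly zoom in the first row combines a dolly in with a zoom out. Under a pinhole camera model the on-screen size of an object is proportional to focal length divided by distance, so the subject keeps its size when the focal length is scaled by the same ratio as the camera-to-subject distance. The sketch below works through that arithmetic; the function names and numbers are illustrative assumptions, not values from the comparison.

```python
def projected_size(size, focal, distance):
    """On-screen size of an object under a pinhole model."""
    return focal * size / distance

def dolly_zoom_focal(focal0, dist0, dist1):
    """Focal length that keeps the subject's on-screen size constant
    when the camera-to-subject distance changes from dist0 to dist1."""
    if dist0 <= 0 or dist1 <= 0:
        raise ValueError("distances must be positive")
    return focal0 * dist1 / dist0

# Hypothetical shot: subject at 4 units, camera dollies in to 2 units.
f0, d0, d1 = 50.0, 4.0, 2.0
f1 = dolly_zoom_focal(f0, d0, d1)           # shorter focal length (zoom out)
subject_before = projected_size(1.0, f0, d0)
subject_after = projected_size(1.0, f1, d1)
# A background object 6 units behind the subject does change size.
background_before = projected_size(1.0, f0, d0 + 6.0)
background_after = projected_size(1.0, f1, d1 + 6.0)
```

The subject size is unchanged while the background shrinks, which produces the characteristic perspective stretch of the effect.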

Object Motion Control.

Inputs DragAnything MOFA-Video
Camera TrackDiffusion Ours
[ Static ]

Inputs DragAnything MOFA-Video
Camera TrackDiffusion Ours
[ Trucking right ]

 


Ablation Study

Camera Motion Representation.

Input Gauss. Map Plücker Point Traj Coeff. (Ours)
[ Dolly out + Panning right ]
Input Gauss. Map Plücker Point Traj Coeff. (Ours)
[ Roll clockwise + Zoom out ]

Bounding Box Conditioning.

Input Ours (coord) Ours
Input Ours (coord) Ours

Additional Analysis

Effect of Point Track Density on Camera Motion Control.

Density=0.1 Density=0.4 Density=0.7 Density=1.0
Input point track
Results

Effect of Text Prompt.
- We show "dolly in" camera motion control with text prompts of varying detail.

"A man." "A man crossing a stream." "A man with a red backpack steps over a stream in a mountain valley."
"A man wearing a blue flannel shirt, hiking boots, and a red backpack carefully steps across a rocky stream in a picturesque valley surrounded by rugged mountains." "A man crossing a stream. It is raining." "A man crossing a stream and turning around."

Essentiality of Camera-aware and Camera-object-aware Transformations.

Inputs Preview w/o transform w/ transform (Ours)
Camera-aware transformation
Camera-object-aware transformation

 


Legend of Camera Motions
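As a text companion to the legend, the sketch below records the conventional filmmaking meaning of each camera term used in the shot labels on this page and parses a bracketed label into its components. The table `CAMERA_MOTIONS`, the alias map, and `parse_shot` are hypothetical helpers written for illustration; they follow common cinematography usage and are not taken from the MotionCanvas code.

```python
# Conventional meaning of each term: (what changes, how).
CAMERA_MOTIONS = {
    "static":   ("none", "camera held fixed"),
    "dolly":    ("translation", "forward/backward along the optical axis"),
    "truck":    ("translation", "left/right"),
    "pedestal": ("translation", "up/down"),
    "pan":      ("rotation", "about the vertical axis"),
    "tilt":     ("rotation", "about the horizontal axis"),
    "roll":     ("rotation", "about the optical axis"),
    "orbit":    ("translation + rotation", "circling the subject while facing it"),
    "zoom":     ("intrinsics", "focal length change, camera position fixed"),
}

# Gerund forms that appear in the labels, mapped to the base term.
ALIASES = {"panning": "pan", "tilting": "tilt", "trucking": "truck"}

def parse_shot(label):
    """Split a label such as '[ Trucking right + Pedestal up ]' into
    (motion, direction, kind) tuples. Unknown terms get kind 'unknown'."""
    parsed = []
    for part in label.strip("[] ").split("+"):
        words = part.lower().split()
        if not words:
            continue
        name = ALIASES.get(words[0], words[0])
        direction = " ".join(words[1:])
        kind = CAMERA_MOTIONS.get(name, ("unknown", ""))[0]
        parsed.append((name, direction, kind))
    return parsed

shot = parse_shot("[ Panning left + Pedestal up + Orbit right ]")
```

Labels that do not start with one of the listed terms, such as "Diagonal bottom-right", come back with kind "unknown" rather than a guessed interpretation.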