# 3D Animation with Video Guidance
This repository provides a complete pipeline for generating 3D object animations with video guidance. The system includes data processing and optimization algorithms for rigging-based animation.

## Overview
The pipeline takes a rigged 3D model and a reference video, then optimizes the object's motion to match the video guidance while maintaining realistic skeletal constraints.

## Prerequisites

### Model Downloads
Download the required pre-trained models:

- [Video-Depth-Anything](https://huggingface.co/depth-anything/Video-Depth-Anything-Large) - For depth estimation
- [CoTracker3](https://huggingface.co/facebook/cotracker3) - For point tracking

```
python download.py
```
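
`download.py` is expected to fetch both checkpoints for you. If you prefer to download the models manually, the Hugging Face CLI can pull the same repositories; the `checkpoints/` target directories below are only an assumption, so place the files wherever `download.py` would put them:

```
# Manual alternative (target directories are assumptions; match download.py's layout)
huggingface-cli download depth-anything/Video-Depth-Anything-Large --local-dir checkpoints/video_depth_anything
huggingface-cli download facebook/cotracker3 --local-dir checkpoints/cotracker3
```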

### Input Data Structure

Organize your input data as follows:
```
inputs/
└── {seq_name}/
    ├── objs/
    │   ├── mesh.obj          # 3D mesh geometry
    │   ├── rig.txt           # Rigging definition
    │   ├── material.mtl      # Material properties (optional)
    │   └── texture.png       # Texture map (optional)
    ├── first_frames/         # Rendered initial frames
    ├── imgs/                 # Extracted video frames
    ├── flow/                 # Optical flow data
    ├── flow_vis/             # Visualized optical flow
    ├── depth/                # Estimated depth data
    ├── track/                # Tracked joints/vertices
    └── input.mp4             # Source video
```
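
Only the rigged model under `objs/` (and, later, `input.mp4`) needs to be supplied by hand; the other folders are populated by the processing and optimization steps below. For example:

```
# Create a new sequence and copy in your rigged model (source paths are placeholders)
mkdir -p inputs/{seq_name}/objs
cp /path/to/mesh.obj /path/to/rig.txt inputs/{seq_name}/objs/
```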

## Data Processing

Given a 3D model with rigging under `inputs/{seq_name}/objs` (`mesh.obj`, `rig.txt`, plus optional `.mtl` and texture `.png` files), we first render the object from a specified viewpoint. This image is used as the input (first frame) to the video generation model (e.g., [Jimeng AI](https://jimeng.jianying.com/ai-tool/home?type=video)).

```
python utils/render_first_frame.py --input_path inputs --seq_name {seq_name}
```
Replace `{seq_name}` with your sequence name. The script renders reference images from 4 different viewpoints (you can add more) and saves them to `inputs/{seq_name}/first_frames`. Choose the viewpoint that best shows the object's joints and key parts for optimal animation results, feed it to the video generation model, and save the generated video to `inputs/{seq_name}/input.mp4`.
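
Once the video has been generated, place it where the pipeline expects it (the source file name below is just an example):

```
# The downloaded file name is hypothetical; only the destination path matters
mv ~/Downloads/generated_video.mp4 inputs/{seq_name}/input.mp4
```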

Then we extract the frames from the video by running:

```
cd inputs/{seq_name}; mkdir imgs
ffmpeg -i input.mp4 -vf fps=10 imgs/frame_%04d.png
cd ../../
```

Estimate optical flows by running:

```
python utils/save_flow.py --input_path inputs --seq_name {seq_name}
```
The flow `.flo` files are saved to `inputs/{seq_name}/flow`, and the flow visualizations to `inputs/{seq_name}/flow_vis`. Depth and tracking information are saved to `inputs/{seq_name}/depth` and `inputs/{seq_name}/track` during optimization.
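
To quickly inspect the estimated flow, you can stitch the visualizations into a short clip (this assumes `flow_vis/` contains per-frame PNGs; adjust the pattern if the filenames differ):

```
# Optional preview of the flow visualizations (glob pattern is an assumption)
ffmpeg -framerate 10 -pattern_type glob -i 'inputs/{seq_name}/flow_vis/*.png' -pix_fmt yuv420p flow_preview.mp4
```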

## Optimization

To optimize the animation, run:

```
bash demo.sh
```

The results are saved to `results/{seq_name}/{save_name}`. Modify `--main_renderer` and `--additional_renderers` to change rendering viewpoints. If animations exhibit jitter or instability, increase the root/joint smoothing weights for better temporal consistency.
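
`demo.sh` wraps the optimization entry point. The sketch below shows the general shape of such a call; the script name `optimize.py`, the `--save_name` flag, and the viewpoint values are hypothetical placeholders (only `--main_renderer` and `--additional_renderers` come from the description above), so check `demo.sh` for the actual command and arguments:

```
# Hypothetical sketch only -- see demo.sh for the real entry point and flags
python optimize.py \
    --input_path inputs \
    --seq_name {seq_name} \
    --save_name {save_name} \
    --main_renderer <viewpoint> \
    --additional_renderers <viewpoint> <viewpoint>
```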


## TODO

- [ ] Add multi-view supervision.