MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

Abstract

MagicTime learns real-world physical knowledge from time-lapse videos, which enables it to generate metamorphic videos.

[Project Page]

[Github]

[arXiv](2024/04/07 version v1)

Methodology

ChronoMagic Dataset

Data Curation and Filter

Time-lapse videos were collected from YouTube and processed through the steps described below, producing the final ChronoMagic dataset of 2,265 time-lapse videos.

Cascade Preprocessing

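Pixel-intensity differences between frames are computed with OpenCV, and frame-to-frame cosine similarity is measured with CLIP. Together these signals detect complex scene transitions, and each video is split into separate segments at those points.

Incorrectly cut segments are then corrected and filtered manually.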
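The automated part of this step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes frames are already decoded into NumPy arrays (for example with OpenCV) and that one CLIP image embedding per frame is already available. The thresholds are hypothetical placeholders, and flagging a cut when either signal crosses its threshold is an assumption about how the two signals are combined.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def detect_cuts(frames, embeddings, diff_threshold=30.0, sim_threshold=0.85):
    """Return indices i where a new segment starts at frame i.

    frames: list of uint8 image arrays of identical shape.
    embeddings: list of 1-D arrays, one per frame (e.g. CLIP image features).
    Both thresholds are hypothetical, not values from the paper.
    """
    cuts = []
    for i in range(1, len(frames)):
        # Low-level change: mean absolute pixel-intensity difference.
        prev = frames[i - 1].astype(np.float32)
        curr = frames[i].astype(np.float32)
        pixel_diff = float(np.mean(np.abs(curr - prev)))
        # Semantic change: cosine similarity of consecutive frame embeddings.
        sim = cosine_similarity(embeddings[i], embeddings[i - 1])
        if pixel_diff > diff_threshold or sim < sim_threshold:
            cuts.append(i)
    return cuts


def split_segments(num_frames, cuts):
    """Turn cut indices into half-open (start, end) frame ranges."""
    bounds = [0] + list(cuts) + [num_frames]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
```

With three dark frames followed by three bright frames whose embeddings also differ, `detect_cuts` reports a single cut at index 3 and `split_segments(6, [3])` yields `[(0, 3), (3, 6)]`. The manual correction pass described above would then operate on these ranges.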

Multi-view Text Fusion

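Captions are generated using metadata such as the video's title and hashtags.

However, exaggerated titles are common on the internet and can lower caption quality. A caption is therefore first generated for each keyframe, and the final video caption is derived from these keyframe captions.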
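The post does not say which model produces the captions, so the sketch below only assembles the input text for whatever captioning model is used. The function name `build_fusion_prompt` and the prompt wording are hypothetical. The design idea it illustrates is the one described above: keyframe captions, listed in temporal order, are the primary evidence, while the title and hashtags are passed along only as possibly exaggerated hints.

```python
from typing import List


def build_fusion_prompt(title: str, hashtags: List[str], keyframe_captions: List[str]) -> str:
    """Assemble a hypothetical prompt that fuses keyframe captions with metadata.

    Keyframe captions are treated as primary evidence; title and hashtags
    are included only as secondary, possibly exaggerated hints.
    """
    lines = [
        "Write one caption describing the whole time-lapse video.",
        "Rely on the keyframe captions; treat the title and hashtags as possibly exaggerated hints.",
        f"Title: {title}",
        # Normalize hashtags so each appears exactly once with a leading '#'.
        "Hashtags: " + " ".join("#" + h.lstrip("#") for h in hashtags),
    ]
    # List keyframe captions in temporal order so the fused caption can describe the change.
    for i, caption in enumerate(keyframe_captions, start=1):
        lines.append(f"Keyframe {i}: {caption}")
    return "\n".join(lines)
```

For example, `build_fusion_prompt("AMAZING flower!!!", ["timelapse", "#flower"], ["a closed bud", "a fully open flower"])` produces a prompt whose last line is `Keyframe 2: a fully open flower`.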

MagicTime Model

MagicAdapter

Spatial training is performed first, followed by temporal training.

Pretrained model: Stable Diffusion + AnimateDiff

Spatial training:

The temporal layers are removed from the pretrained model, Q-LoRA adapters are integrated into the spatial layers, and the adapters are trained on keyframe-text pairs.
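The adapter idea can be illustrated with the standard LoRA formulation: a frozen weight `W` plus a trainable low-rank update `B @ A`, scaled by `alpha / rank`. This NumPy sketch is an assumption-laden illustration. It omits the 4-bit quantization of `W` that distinguishes Q-LoRA from plain LoRA, and the `rank`, `alpha` and initialization values are hypothetical. Which projections inside the spatial layers MagicTime wraps is not stated in the post.

```python
import numpy as np


class LoRALinear:
    """Frozen linear weight W plus a trainable low-rank update B @ A.

    Forward pass: y = x W^T + (alpha / rank) * (x A^T) B^T.
    Q-LoRA's quantized storage of W is deliberately omitted here.
    """

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 4.0, seed: int = 0):
        out_dim, in_dim = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))  # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Because B starts at zero, the layer initially reproduces the pretrained output.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into one dense weight, e.g. for inference."""
        return self.weight + self.scale * self.B @ self.A
```

Zero-initializing `B` means training starts exactly from the pretrained model's behavior, and only the small `A` and `B` matrices receive updates, which is what keeps the adapter cheap to train on the keyframe-text pairs.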

Temporal training:

The temporal layers are added back, Q-LoRA is integrated in the same way, and the model is trained on the ChronoMagic dataset.
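The two-stage schedule can be written as a small configuration that decides, per stage, which parameters exist and which receive gradients. The parameter-name prefixes (`spatial_lora.`, `temporal_lora.`, names starting with `temporal`) are hypothetical labels, not the real checkpoint naming. Keeping the stage-1 spatial adapters frozen during stage 2 is also an assumption, since the post does not say.

```python
from typing import Dict, List, Tuple

# Hypothetical stage configuration for the MagicAdapter schedule.
STAGES: Dict[str, Dict[str, object]] = {
    # Stage 1: temporal layers removed; spatial adapters learn from keyframe-text pairs.
    "spatial": {"use_temporal_layers": False, "train_prefixes": ["spatial_lora."]},
    # Stage 2: temporal layers restored; temporal adapters learn from ChronoMagic clips.
    # Freezing the stage-1 spatial adapters here is an assumption.
    "temporal": {"use_temporal_layers": True, "train_prefixes": ["temporal_lora."]},
}


def stage_view(param_names: List[str], stage: str) -> Tuple[List[str], List[str]]:
    """Return (active, trainable) parameter names for a training stage.

    active: parameters present in the network during this stage.
    trainable: the subset that is updated; pretrained weights stay frozen.
    """
    cfg = STAGES[stage]
    # Drop every temporal parameter when the stage removes the temporal layers.
    active = [n for n in param_names if cfg["use_temporal_layers"] or not n.startswith("temporal")]
    # Only adapter parameters matching this stage's prefixes are trained.
    trainable = [n for n in active if any(n.startswith(p) for p in cfg["train_prefixes"])]
    return active, trainable
```

With names `["unet.spatial.weight", "spatial_lora.A", "temporal.weight", "temporal_lora.B"]`, the spatial stage sees only the first two and trains `spatial_lora.A`, while the temporal stage sees all four and trains `temporal_lora.B`.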

Dynamic Frames Extraction

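Because objects in a time-lapse video change continuously over the whole clip, a random run of consecutive frames covers only a short part of the transformation and cannot model a metamorphic video.

Frames are therefore sampled uniformly across the video. Uniform sampling, however, can degrade the model's ability to generate smooth video, so the sampling strategy is chosen dynamically based on the T measured during Cascade Preprocessing.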
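The two sampling strategies can be contrasted in a few lines. Uniform sampling spreads indices from the first to the last frame, so the whole transformation is covered. Consecutive sampling takes a short adjacent window, which preserves smooth motion but covers little change. The post does not specify how T maps to a choice, so in this sketch the decision is a caller-supplied `use_uniform` flag; the function names are hypothetical.

```python
import numpy as np


def uniform_indices(num_frames: int, n: int) -> np.ndarray:
    """n frame indices spread evenly from the first to the last frame."""
    return np.linspace(0, num_frames - 1, n).round().astype(int)


def consecutive_indices(num_frames: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """n adjacent frame indices starting at a random offset."""
    start = int(rng.integers(0, num_frames - n + 1))
    return np.arange(start, start + n)


def sample_indices(num_frames: int, n: int, use_uniform: bool, rng: np.random.Generator) -> np.ndarray:
    """Pick a sampling strategy; how T decides use_uniform is left to the caller."""
    if n > num_frames:
        raise ValueError("video has fewer frames than requested")
    if use_uniform:
        return uniform_indices(num_frames, n)
    return consecutive_indices(num_frames, n, rng)
```

For a 100-frame clip, `uniform_indices(100, 5)` starts at frame 0 and ends at frame 99, whereas `consecutive_indices` returns a window such as eight adjacent frames somewhere inside the clip.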

Magic Text-Encoder

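Descriptions of metamorphic videos contain more temporal and state-change information than descriptions of general videos.

To handle this, after all other networks have finished training, LoRA is also added to the CLIP text encoder and trained.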

Experiment

Project Page: pku-yuangroup.github.io



Ostin X
