Paper Reviews

(452)

StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis. Catching up on text-to-image with StyleGAN, though there is still a long way to go. Project Page (sites.google.com), by Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila. Abstract: StyleGAN-T is a new GAN model for text-to-image synthesis. Introduction: The main advantages of GANs are the controllability of the synthesis result and inference that is much faster than with diffusion models. Starting from StyleGAN-XL, diverse datasets, controllable..

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. Video generation with a pretrained T2I model. Project Page (tuneavideo.github.io): A new method for text-to-video generation using one text-video pair. Abstract: Proposes Tune-A-Video (TAV), which uses Sparse-Causal Attention and generates video from a text prompt by tuning a pretrained Text-to-Image (T2I) diffusion model. Introduction: Just by extending the T2I model's self-attention to multiple images, the content across frames..

InstructPix2Pix: Learning to Follow Image Editing Instructions. Project Page (www.timothybrooks.com): We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the.. Abstract: Uses a language model and a text-to-image diffusion model to edit images without finetuning or inversion: InstructPix2Pi..

GLIGEN: Open-Set Grounded Text-to-Image Generation. Specifying object position and pose in a text-to-image diffusion model. Project Page (gligen.github.io): Large-scale text-to-image diffusion models have made amazing advances. However, the status quo is to use text input alone, which can impede controllability. In this work, we propose GLIGEN, Grounded-Language-to-Image Generation, a novel approach that build.. Abstract: Grounding input..

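DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization. Related posts: DiffStyler code review, trying out DiffStyler. Text-guided stylization + dual architecture. arXiv, Github. Abstract: DiffStyler is a diffusion model for text-guided stylization. It uses a dual diffusion architecture to control the balance between content and style. It preserves the structure of the content with learnable noise derived from the content image (the model itself is not trained; instead, the learnable noise that serves as the input to the sampling process is optimized). Introduction: Stylization from an example image requires separating content from style, whereas with text, the meaning of the style itself..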

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. ConvNeXt + MAE, achieved with pure convolutions only. arXiv, Github. Abstract: When self-supervised learning such as the masked autoencoder (MAE) was applied to ConvNets, performance dropped. This paper introduces a fully convolutional masked autoencoder framework into the ConvNeXt architecture and adds a Global Response Normalization (GRN) layer to strengthen feature competition between channels. Introduction: MAE was optimized for transformer sequence processing in the first place, so it performs poorly when applied to ConvNets; the goal of this paper is therefore to design an MAE for convolutional networks...
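
As a rough illustration of the GRN layer mentioned in the excerpt above, here is a minimal pure-Python sketch. It assumes the commonly cited three-step form (per-channel global L2 aggregation, division by the channel mean, then a learnable scale and shift with a residual connection). The function name `grn`, the `x[h][w][c]` list layout, and the `eps` value are illustrative assumptions, not taken from the post.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Sketch of Global Response Normalization (assumed form).

    x: feature map as nested lists x[h][w][c] (hypothetical layout).
    gamma, beta: per-channel learnable scale and shift (lists of length C).
    """
    height, width, channels = len(x), len(x[0]), len(x[0][0])

    # 1) Global aggregation: L2 norm of each channel over all spatial positions.
    g = [
        math.sqrt(sum(x[i][j][c] ** 2 for i in range(height) for j in range(width)))
        for c in range(channels)
    ]

    # 2) Normalization: each channel's norm relative to the mean norm over channels.
    mean_g = sum(g) / channels
    n = [g_c / (mean_g + eps) for g_c in g]

    # 3) Calibration: rescale the input per channel, then add the shift and a residual.
    return [
        [
            [gamma[c] * x[i][j][c] * n[c] + beta[c] + x[i][j][c] for c in range(channels)]
            for j in range(width)
        ]
        for i in range(height)
    ]
```

With `gamma` and `beta` set to zero the layer reduces to the identity, so under this assumed form it can be dropped into a block without changing its initial behavior.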

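GAN Paper Index. Architecture, Improving: FastGAN: an ultra-lightweight GAN that converges within a few hours on fewer than 100 samples; Projected GAN: proposes randomly projected multi-discriminators that improve GAN performance; Alias-Free GAN (StyleGAN3): brings signal-processing concepts into StyleGAN2 to completely remove aliasing (concepts part, architecture part); StyleGAN-XL: StyleGAN3 + Projected GAN; EqGAN-SA: improves the GAN training equilibrium using heatmaps and the attention map of D. Inversion: StyleSpace: demonstrates the efficiency of S space and proposes a method for finding attributes in S space; ReStyle: an inversion encoder model that combines encoder-based inversion with direct-optimization inversion; SAM Inversi..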

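Vision Transformer Paper Index. Attention, Improving: NAT: proposes Neighborhood Attention, which makes attention operate like a convolution; FAN: analyzes the relationship between self-attention and model robustness; ViT-Adapter: a simple, efficient adapter that is easy to add to a vanilla ViT and greatly improves performance; Inception Transformer: a model that draws on the strengths of both ViT and CNN, notable for computing pooling, attention, and convolution in completely separate branches; EfficientFormer: dramatically reduces ViT latency in mobile environments; MobileViTv2: MobileViT with a new separable self-attention added..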

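Muse: Text-To-Image Generation via Masked Generative Transformers. A VQ masked transformer for text-to-image. Project Page, arXiv, Github. Abstract: Muse is a text-to-image transformer model that achieves SOTA in image generation while being far more efficient than diffusion or autoregressive (AR) models. Muse is trained to predict randomly masked image tokens given text embeddings extracted from a pretrained large language model. Because it needs fewer sampling iterations and decodes in parallel, it is faster and more efficient than diffusion or AR models. Introduction: Submodules: a pair of VQGAN tokenizers at 256 and 512; from a partially masked low-resolution token sequence, conditioned on the unmasked tokens and the text embeddings, the masked..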

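Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer (MiDaS). Develops a loss function that applies uniformly across diverse depth-estimation datasets. Proposes a 3D movies dataset. Summary: align datasets that follow different conventions and train on the mixture; a loss term that captures gradients along the x and y axes at multiple log scales. arXiv, Github. Abstract: Proposes a new data source for zero-shot monocular depth estimation. Develops a loss function that applies uniformly across diverse depth-estimation datasets. Introduction: The goal of this paper is to investigate how to train a monocular depth estimation model that is robust across diverse environments. To that end, it develops a loss function that allows training on data obtained with different sensing modalities, and diverse existing..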
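
To make the idea of aligning datasets with different conventions more concrete, the sketch below fits a scale and a shift by least squares before measuring the error, so that a prediction is not penalized for differing from the ground truth only by an unknown scale and offset. This is one plausible reading of the alignment step, not necessarily the exact loss reviewed in the post; the function names and the flat-list input are illustrative assumptions.

```python
def align_scale_shift(pred, target):
    """Least-squares scale s and shift t minimizing sum((s * p + t - y) ** 2).

    pred, target: flat lists of per-pixel values (hypothetical layout).
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (y - mean_t) for p, y in zip(pred, target))
    # A constant prediction carries no scale information; fall back to shift only.
    scale = cov / var_p if var_p > 0 else 0.0
    shift = mean_t - scale * mean_p
    return scale, shift


def aligned_mse(pred, target):
    """Mean squared error after aligning pred to target in scale and shift."""
    scale, shift = align_scale_shift(pred, target)
    return sum((scale * p + shift - y) ** 2 for p, y in zip(pred, target)) / len(pred)
```

For example, `aligned_mse([1.0, 2.0, 3.0], [5.0, 7.0, 9.0])` is zero up to floating-point error, because the target is exactly `2 * pred + 3`; rescaling the prediction by any positive factor leaves the loss unchanged.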

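What the DAAM: Interpreting Stable Diffusion Using Cross Attention. Related posts: DAAM code review, trying out DAAM. Shows the influence of each word on the generated image as a heatmap. arXiv, Github. Abstract: A text-image attribution analysis of Stable Diffusion. Introduction: The cross-attention maps in the model are combined to produce a 2D attribution map for each word. These are called Diffusion Attentive Attribution Maps (DAAM). The paper compares DAAM with semantic segmentation, characterizes how relations in the syntactic space of the prompt relate to relations in the pixel space of the image, and studies semantic phenomena through the lens of DAAM, especially those that affect generation quality, to further probe the syntactic findings. For example, 'a giraffe and a zebra'..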

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models. Improves on the speed of DPM-solver and makes it work with guided sampling. arXiv, Github. Abstract: A DPM-solver for guided sampling. Also proposes a thresholding method and a multistep variant of DPM-solver++. Introduction: DPM-solver did not properly investigate guided sampling. For guided sampling it reportedly performed worse than even DDIM, a simple first-order solver. The paper derives a solver for the ODE under a data-prediction parameterization and adopts dynamic thresholding to mitigate the train-test mismatch problem. It also develops a multistep solver to address instability. Background: DPM-solver Paramete..
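
The dynamic thresholding mentioned above can be sketched as follows, assuming the usual formulation: take a high percentile of the absolute values of the predicted clean sample, use it as the clipping bound when it exceeds 1, and rescale back into [-1, 1]. The function name, the default percentile of 0.995, and the nearest-rank percentile are illustrative assumptions, not details confirmed by the post.

```python
def dynamic_threshold(x0_pred, percentile=0.995):
    """Clip a predicted clean sample to [-s, s] and rescale it to [-1, 1].

    x0_pred: flat list of predicted data values (hypothetical layout).
    percentile: fraction in (0, 1]; the default here is an assumed value.
    """
    magnitudes = sorted(abs(v) for v in x0_pred)
    # Nearest-rank percentile of the absolute values (a simplifying assumption).
    index = min(len(magnitudes) - 1, int(percentile * len(magnitudes)))
    # Never shrink the bound below 1, so in-range predictions pass through unchanged.
    s = max(magnitudes[index], 1.0)
    return [max(-s, min(s, v)) / s for v in x0_pred]
```

Because data-prediction solvers work directly with the predicted clean sample, a step like this can be applied at each iteration; values that guidance pushes far outside the training range are pulled back rather than saturating.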
