Mixtures of Experts Unlock Parameter Scaling for Deep RL

[arXiv](2024/02/13 version v1)

Abstract

In reinforcement learning, scaling up the parameter count with MoE improves performance.

Where to place the MoEs?

The MoE module is placed at the second FFN layer.

What is a token?

The authors report that the PerConv tokenization performed best in their experiments.

A linear projection is also added to the expert output so that it keeps the input size.
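A minimal sketch of the idea, assuming PerConv means that each spatial position of an `[h][w][d]` convolutional feature map becomes one token of dimension `d`. The helper names (`perconv_tokens`, `linear`), the toy weights, and the ReLU expert are hypothetical and chosen only for illustration. Plain Python lists are used to keep the example dependency-free.

```python
def perconv_tokens(feature_map):
    """Flatten an [h][w][d] feature map into h*w tokens of dimension d."""
    return [list(cell) for row in feature_map for cell in row]


def linear(tokens, weight):
    """Multiply each token (length d_in) by a weight of shape [d_in][d_out]."""
    return [
        [sum(x * w for x, w in zip(token, column)) for column in zip(*weight)]
        for token in tokens
    ]


# Toy encoder output with h=2, w=2, d=3.
feature_map = [
    [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]],
    [[0.0, 3.0, 1.0], [2.0, 2.0, 2.0]],
]
tokens = perconv_tokens(feature_map)  # 4 tokens, each of dimension 3

# Hypothetical expert: widens each token from d=3 to 4 hidden units, then ReLU.
w_expert = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
hidden = [[max(0.0, v) for v in row] for row in linear(tokens, w_expert)]

# Linear projection back to d=3 so the expert output matches the token size.
w_proj = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
projected = linear(hidden, w_proj)
```

With the projection in place, `projected` has the same `h*w` by `d` shape as `tokens`, so it can be reshaped back and passed to the rest of the network unchanged.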

What flavour of MoE to use?

Top1-MoE, SoftMoE
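To make the contrast concrete, here is a rough sketch of the two routing styles. It assumes Top-1 routing scales the chosen expert's output by its gate probability (a common convention), and that Soft MoE uses one slot per expert, with dispatch weights normalised over tokens and combine weights normalised over slots. The function names, the toy experts, and the router logits are hypothetical; in practice the logits come from a learned router.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top1_moe(tokens, logits, experts):
    """Hard routing: each token is processed only by its highest-scoring expert."""
    outputs = []
    for token, row in zip(tokens, logits):
        gates = softmax(row)
        best = max(range(len(gates)), key=gates.__getitem__)
        outputs.append([gates[best] * v for v in experts[best](token)])
    return outputs


def soft_moe(tokens, logits, experts):
    """Soft routing with one slot per expert (an assumption made for brevity)."""
    n, dim = len(tokens), len(tokens[0])
    # Dispatch weights: softmax over tokens, computed separately for each slot.
    dispatch = [
        softmax([logits[i][j] for i in range(n)]) for j in range(len(experts))
    ]
    # Each slot is a weighted average of all tokens.
    slots = [
        [sum(d[i] * tokens[i][k] for i in range(n)) for k in range(dim)]
        for d in dispatch
    ]
    slot_outputs = [expert(slot) for expert, slot in zip(experts, slots)]
    # Combine weights: softmax over slots, computed separately for each token.
    outputs = []
    for row in logits:
        combine = softmax(row)
        outputs.append(
            [sum(c * out[k] for c, out in zip(combine, slot_outputs))
             for k in range(dim)]
        )
    return outputs


# Two toy experts and three 2-dimensional tokens.
experts = [lambda x: [2.0 * v for v in x], lambda x: [-v for v in x]]
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # [token][expert]

hard_out = top1_moe(tokens, logits, experts)
soft_out = soft_moe(tokens, logits, experts)
```

In the hard variant each token touches exactly one expert, whereas in the soft variant every expert sees a weighted mixture of all tokens, which keeps the whole operation differentiable.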

Evaluation covers 20 games from the Arcade Learning Environment, a diverse and challenging collection of pixel-based environments.

Of the two, SoftMoE performs better.
