Chain-of-Thought Reasoning Without Prompting (CoT-decoding)

[arXiv](2024/02/15 version v1)

Abstract

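Introduces CoT-decoding, which elicits CoT reasoning paths without any prompting, simply by changing the decoding process.

The input takes the form "Q:[question]\nA:".

Instead of greedy decoding, the method explores alternative decoding paths. When a CoT is present in a decoding path, the model tends to be more confident in its final answer.

Under greedy decoding, the model tends to answer the problem immediately, and such direct answers are generally less accurate.

In contrast, exploring the top-k tokens at the first decoding step and then continuing each branch with greedy decoding reveals natural CoT reasoning.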
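The branching step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `model` callable (prefix in, next-token probabilities out), the `<eos>` marker, and the toy probability table are all assumptions made up for the example.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: token prefix -> next-token probabilities.
NextTokenProbs = Callable[[List[str]], Dict[str, float]]


def greedy_continue(model: NextTokenProbs, prefix: List[str],
                    eos: str = "<eos>", max_steps: int = 32) -> List[str]:
    """Extend `prefix` by always taking the most probable next token."""
    tokens = list(prefix)
    for _ in range(max_steps):
        probs = model(tokens)
        best = max(probs, key=probs.get)
        if best == eos:
            break
        tokens.append(best)
    return tokens


def branch_then_greedy(model: NextTokenProbs, prompt: List[str], k: int,
                       eos: str = "<eos>", max_steps: int = 32) -> List[List[str]]:
    """Branch on the top-k tokens at the first step, then decode greedily."""
    first = model(prompt)
    top_k = sorted(first, key=first.get, reverse=True)[:k]
    paths = []
    for tok in top_k:
        if tok == eos:
            paths.append([])  # this branch ends immediately
            continue
        full = greedy_continue(model, prompt + [tok], eos, max_steps)
        paths.append(full[len(prompt):])  # keep only the generated tokens
    return paths


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Made-up probability table; assumes the prompt is a single token."""
    table = {
        (): {"5": 0.6, "I": 0.4},
        ("5",): {"<eos>": 1.0},
        ("I",): {"have": 0.9, "<eos>": 0.1},
        ("I", "have"): {"8": 0.95, "<eos>": 0.05},
        ("I", "have", "8"): {"<eos>": 1.0},
    }
    return table[tuple(prefix[1:])]


paths = branch_then_greedy(toy_model, ["Q:...\nA:"], k=2)
```

In this toy table the greedy branch answers immediately with a single token, while the second-ranked first token leads to a longer continuation, mirroring the behavior described above.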

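Let n be the total number of tokens in the answer, and let x_t¹ and x_t² be the tokens with the highest and second-highest probability at step t. The confidence of an answer is the average, over those n tokens, of the gap between the probabilities of x_t¹ and x_t².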
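Written out as an equation, the confidence would look roughly as follows. The symbol Δ, the path index k, and the conditioning notation x_{<t} are assumptions introduced here for readability; n, x_t¹ and x_t² are as defined above.

```latex
\Delta_{k,\mathrm{answer}}
  = \frac{1}{n} \sum_{x_t \in \mathrm{answer}}
    \Bigl[ p\bigl(x_t^{1} \mid x_{<t}\bigr) - p\bigl(x_t^{2} \mid x_{<t}\bigr) \Bigr]
```

A value near 1 means the model strongly preferred its top choice at every answer token; a value near 0 means the top two candidates were almost tied.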

Inspecting the model's logits showed that answers whose decoding path contains a CoT tend to have high confidence. (In the table above, blue indicates the confidence.)
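As a rough sketch of how such a confidence score could be read off the logits, the helper below applies a softmax at each answer position and averages the gap between the two largest probabilities. The function names and the input layout (one list of logits per answer token) are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_from_logits(answer_logits: Sequence[Sequence[float]]) -> float:
    """Average top-1 minus top-2 probability over the answer positions."""
    if not answer_logits:
        raise ValueError("need logits for at least one answer token")
    gaps = []
    for logits in answer_logits:
        probs = sorted(softmax(logits), reverse=True)
        second = probs[1] if len(probs) > 1 else 0.0
        gaps.append(probs[0] - second)
    return sum(gaps) / len(gaps)


# One clearly preferred token versus two nearly tied candidates.
confident = confidence_from_logits([[5.0, 0.0, 0.0]])
unsure = confidence_from_logits([[1.0, 0.9, 0.0]])
```

Only the positions belonging to the final answer are passed in, so the score reflects how decisively the model commits to the answer rather than to the intermediate reasoning tokens.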

An additional heuristic is to select the answer by length: intuitively, the longer the response, the more likely it is to contain a CoT.

Branching at other decoding steps

The optimal branching point may vary by task.

Aggregation of the decoding paths

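Similar to Self-Consistency, the decoding paths can be aggregated by answer: for each candidate answer a, sum the confidence of the k-th decoding paths whose answer is a, and select the answer that maximizes this sum. This improves the stability of the result.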
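A minimal sketch of this aggregation, assuming each decoding path has already been reduced to an (answer, confidence) pair. The function name and the example numbers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_paths(paths: Iterable[Tuple[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Sum confidence per answer and return the answer with the largest total."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, confidence in paths:
        totals[answer] += confidence
    if not totals:
        raise ValueError("need at least one decoding path")
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Made-up example: "5" has the single most confident path,
# but two paths agree on "8" and together outweigh it.
example_paths = [("5", 0.5), ("8", 0.3), ("8", 0.4), ("3", 0.1)]
best_answer, totals = aggregate_paths(example_paths)
```

Compared with picking the single most confident path, summing per answer is less sensitive to one path that happens to score high.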

Mathematical Reasoning Tasks

Natural Language Reasoning Tasks

Symbolic Reasoning Tasks
