Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Improving the performance of large language models using a series of prompts that mimic how humans solve complex problems
Introduction
The authors [1] observed that scaling up model size alone does not necessarily enable a model to achieve high performance on arithmetic, symbolic reasoning, and commonsense reasoning tasks. To improve performance on these tasks, they introduced the concept of chain-of-thought prompting.
Intuition
Humans often solve complex problems by breaking them down into a series of systematic intermediate steps. This same approach inspired the authors at the Google Brain research team to develop chain-of-thought prompting.
Advantages of Chain-of-Thought Prompting
According to the authors [1], "Chain-of-thought prompting has several attractive properties as an approach for facilitating reasoning in language models."
It allows a model to decompose multi-step problems into intermediate steps, meaning that problems requiring more reasoning steps can be allocated additional computation.
It makes the series of reasoning steps behind a model's answer easier to debug, since the approach provides an "interpretable window into the behavior of the model".
“Chain-of-thought reasoning can be used for tasks such as math word problems, commonsense reasoning, and symbolic manipulation, and is potentially applicable (at least in principle) to any task that humans can solve via language.”
Chain-of-thought reasoning is easy to elicit: it can be induced simply by including worked examples in a few-shot prompt.
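To make the last point concrete, here is a minimal sketch of how a few-shot chain-of-thought prompt can be assembled. The worked exemplar is the tennis-ball problem from the paper [1]; the prompt format and the helper function `build_cot_prompt` are illustrative assumptions, not the authors' exact code.

```python
# Exemplar from the paper [1]: a question followed by step-by-step
# reasoning that ends in the final answer.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked exemplar so the model imitates the step-by-step
    reasoning before producing its final answer (hypothetical helper)."""
    return f"{COT_EXEMPLAR}\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)
print(prompt)
```

The resulting string ends with `A:`, so a model completing the prompt is steered toward emitting intermediate reasoning steps before its answer, rather than the answer alone.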
Results
According to the authors [1], "Chain-of-thought prompting does not positively impact performance for small models, and only yields performance gains when used with models of ∼100B parameters."
"Chain-of-thought prompting has larger performance gains for more-complicated problems," whereas for simpler problems the effect is marginal or negative.