Author
Sarah Welsh
Word count
5517
Language
English

Summary

The Skeleton-of-Thought (SoT) approach aims to reduce large language model latency while improving answer quality by guiding LLMs to first construct an answer skeleton and then elaborate each point in parallel, achieving speed-ups of up to 2.39x across 11 models. The method resembles writing an outline on a given topic and builds on the chain-of-thought idea, which encourages generative AI to lay out its reasoning steps when answering a question or solving a problem. SoT is data-centric: it relies entirely on prompt engineering to accelerate off-the-shelf LLMs, with no changes to the model or hardware.

Tested across 11 models, SoT shows significant speed-up potential for common-sense knowledge generation, and for some question types it also achieves higher relevance and diversity in answer quality. However, the approach struggles with math questions, which demand step-by-step reasoning in which each step depends on the result of the previous one; that sequential context is exactly what parallel elaboration cannot provide. Future work aims to explore trigger mechanisms for specific question types, develop a graph-of-thought architecture that more closely mimics human thought processes, and potentially replace the attention mechanism with alternative architectures. The approach has potential applications in general chatbot systems, improving user experience and lowering system costs by parallelizing content elaboration across segments of a question or across multiple questions.
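The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `call_llm` is a hypothetical stand-in that returns canned text so the sketch runs without a model, and the prompt wording is invented for demonstration. The key idea it shows is that stage 2's expansion requests are independent of one another, so they can be issued concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt):
    # Hypothetical stand-in for a real LLM API call; returns canned
    # text so the example is runnable without a model backend.
    if "skeleton" in prompt.lower():
        return "1. Define the term\n2. Give an example\n3. Summarize"
    return f"Elaborated: {prompt}"

def skeleton_of_thought(question):
    # Stage 1: ask the model for a short numbered outline (the skeleton).
    skeleton = call_llm(
        f"Write a short numbered skeleton (3-5 points) answering: {question}"
    )
    points = [line.strip() for line in skeleton.splitlines() if line.strip()]

    # Stage 2: expand every skeleton point in parallel. Each expansion
    # request is independent, which is where the latency win comes from.
    with ThreadPoolExecutor() as pool:
        expansions = list(pool.map(
            lambda p: call_llm(f"Expand this point for '{question}': {p}"),
            points,
        ))
    return "\n\n".join(expansions)

print(skeleton_of_thought("What is photosynthesis?"))
```

With a real API, `call_llm` would be replaced by a request to the model endpoint, and the thread pool (or batched decoding) would overlap the per-point generations, trading one long sequential decode for several shorter parallel ones.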