SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
Company
Together AI
Date published
June 18, 2024
Author(s)
Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin
Word count
1308
Hacker News points
None found.
Language
English