Speculative decoding for high-throughput long-context inference has been reevaluated, showing that it can significantly improve both throughput and latency. The analysis finds that as sequence lengths grow, decoding shifts from compute-bound to memory-bound, with KV-cache loading dominating the cost of each step, which makes speculative decoding increasingly effective. Two algorithmic innovations exploit this shift: MagicDec uses a draft model with a fixed context window (a fixed-size KV cache), so drafting stays cheap even at long contexts, while adaptive Sequoia trees select the speculation-tree size that maximizes speedup. Together these techniques achieve speedups of up to 2x for LLaMA-2-7B-32K and 1.84x for LLaMA-3.1-8B on 8 A100 GPUs, making speculative decoding an essential component of throughput-optimization systems for long-context workloads.
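To see why the memory-bound regime favors speculative decoding, a simple back-of-the-envelope cost model is helpful: when KV-cache loading dominates, verifying several drafted tokens in one target forward pass costs roughly as much as decoding a single token, so the speedup is governed by how many tokens are accepted per round versus how cheap drafting is. The sketch below is a standard simplified analysis, not the exact model from the work summarized above; the acceptance rate `alpha`, draft length `gamma`, and draft-to-target cost ratio are illustrative assumptions.

```python
# Minimal back-of-the-envelope model of speculative-decoding speedup in the
# memory-bound regime. Simplifying assumptions: i.i.d. per-token acceptance
# rate alpha, and verification of gamma+1 tokens costing about one target
# decode step (KV-cache loading dominates). Parameters are illustrative.

def expected_tokens_per_round(alpha: float, gamma: int) -> float:
    """Expected tokens produced per speculation round (accepted drafts plus
    the bonus token from the target's verification pass)."""
    if alpha >= 1.0:
        return gamma + 1
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

def speedup(alpha: float, gamma: int, draft_cost_ratio: float) -> float:
    """Speedup over vanilla decoding, with the target's per-step cost
    normalized to 1. One round costs ~ gamma * draft_cost_ratio + 1."""
    round_cost = gamma * draft_cost_ratio + 1.0
    return expected_tokens_per_round(alpha, gamma) / round_cost

if __name__ == "__main__":
    # A draft with a fixed-size KV cache (the MagicDec idea) keeps
    # draft_cost_ratio small even at very long contexts, preserving the gain.
    for gamma in (2, 4, 8):
        print(f"gamma={gamma}: "
              f"{speedup(alpha=0.8, gamma=gamma, draft_cost_ratio=0.1):.2f}x")
```

Under these assumed numbers the model predicts roughly 2-2.4x speedups, in line with the reported gains; the key design choice it highlights is keeping the draft's per-token cost low (e.g., via a fixed context window) so that longer contexts do not erode the advantage.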