Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding

Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee