StackBytes
My notebook on LLM systems

Written by Nancy Bethala-Frounjian - systems engineer

Most Recent

DeepSeek on vLLM V1: The Bottleneck Moved from KV Cache to Burst Admission

Jun 7, 2026

A systems-level look at MoE + MLA serving and low KV-cache pressure

Why your GPU is idle but your requests are queued

May 30, 2026

Why LLM Inference Needs Two Different GPUs

Mar 13, 2026

The case for splitting prefill and decode

© 2026 Nancy Bethala-Frounjian.