StackBytes
My notebook on LLM systems

Written by Nancy Bethala-Frounjian - systems engineer

Most Recent

DeepSeek on vLLM V1: The Bottleneck Moved from KV Cache to Burst Admission

Jun 7, 2026

A systems-level look at MoE + MLA serving and low KV-cache pressure

Why your GPU is idle but your requests are queued

May 30, 2026

Why LLM Inference Needs Two Different GPUs

Mar 13, 2026

The case for splitting prefill and decode