Nancy Bethala-Frounjian

Deconstructing complexity—Cloud, AI, Ops

DeepSeek on vLLM V1: The Bottleneck Moved from KV Cache to Burst Admission

Jun 7, 2026 · 6 min read

A systems-level look at MoE + MLA serving and low KV-cache pressure

Why your GPU is idle but your requests are queued

May 30, 2026 · 3 min read

Why LLM Inference Needs Two Different GPUs

Mar 13, 2026 · 5 min read

The case for splitting prefill and decode

I Built an Evidence-First Technical Due Diligence Agent

Feb 16, 2026 · 3 min read

First‐Principles Security at the Edge: Designing the Gateway Layer

Jan 22, 2026 · 5 min read

The Security Spine of a Multi‐Tenant LLM Platform

Building an LLM inference platform with intelligent routing

Jan 7, 2026 · 5 min read

Reducing inference cost by up to ~60% using deterministic routing and model fallback

Beyond the Black Box: Insights from nvidia-smi

Dec 28, 2025 · 3 min read

How hardware-level monitoring drives real-world MLOps performance

MLOps at Scale: Implementing Service Mesh & Triton for Enterprise Inference

Dec 26, 2025 · 4 min read

Beyond the Pod: Designing a Modern Inference Ecosystem

Beyond the ML Model: Why I Built a Complete Inference Ecosystem

Dec 19, 2025 · 2 min read

The “Last Mile” of MLOps: Building a Production-Ready Inference Stack with NVIDIA Triton

Inside a GPU Node: How Modern AI Infrastructure Really Works

Nov 22, 2025 · 3 min read

What actually happens inside the machines that train and run AI models at scale.

Nov 17, 2025 · 2 min read

Must-Read for AI Engineers

Book: AI Systems Performance Engineering by Chris Fregly

Nov 14, 2025 · 2 min read

NVIDIA GPU for AI Systems

What's Inside the Engine
