Modern LLM inference platforms are no longer single‑tenant, single‑model playgrounds. They are multi‑tenant, multi‑model, high‑throughput systems where thousands of organizations send sensitive prompts that must be isolated, validated, shaped, and routed with absolute precision.
The edge layer of such a platform is not just a traffic entry point. It is the blast‑radius boundary, the zero‑trust enforcement point, and the contractual guardian of every tenant’s data.
While I was building the edge gateway for a multi‑tenant LLM inference platform, the network layer became the most thought‑provoking piece of the entire architecture.
I didn’t start with Envoy. I started with a question: “Where does trust begin in a multi‑tenant system?”
Not in the model. Not in the GPU. Not even in the Router.
It begins the moment a tenant (customer) sends a prompt. And that’s why Layer 1, the Edge Gateway, became the most critical layer to get right.
This deep dive walks through the architecture, the reasoning behind each decision, and the exact path a tenant’s (customer’s) prompt takes before it ever reaches your Router or the backend GPU workers.
I needed something that could:
Terminate TLS
Extract identity
Enforce RBAC
Apply quotas
Validate schema
Shape traffic
Forward trusted context
Scale horizontally
Fail fast
Emit metrics
I chose Envoy, not because it’s popular, but because it gave me the tools to enforce identity, policy, schema, and isolation, all before a single token hits the inference pipeline.
Why an Edge Gateway Exists at All
Before we talk further about Envoy, we need to understand the job of Layer 1.
The Edge Gateway must:
Authenticate who is sending the request
Authorize what they’re allowed to do
Validate the shape and schema of the request
Rate‑limit and quota‑enforce per tenant
Normalize and sanitize the payload
Forward a clean, tenant‑scoped request to the Router
And it must do all of this before any expensive inference work happens.
This is why Envoy fits the job: it is programmable, filter‑driven, zero‑trust‑aligned, and built for high‑throughput, low‑latency traffic.
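To make the "everything happens before inference" idea concrete, here is a minimal Python sketch of a fail-fast stage chain. It is not Envoy's filter API. The `Request` type, the stage functions, and the hard-coded tenant value are all hypothetical names chosen for illustration. The only point is the ordering: the first stage that rejects ends the request, and later stages never run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class Request:
    headers: dict
    body: dict
    context: dict = field(default_factory=dict)  # trusted data added by stages

# A stage returns None to continue, or (HTTP status, reason) to reject.
Rejection = Optional[Tuple[int, str]]
Stage = Callable[[Request], Rejection]

def run_pipeline(request: Request, stages: List[Stage]) -> Tuple[int, str]:
    for stage in stages:
        rejection = stage(request)
        if rejection is not None:
            return rejection  # fail fast: later stages never run
    return (200, "forward to router")

def authenticate(request: Request) -> Rejection:
    if "authorization" not in request.headers:
        return (401, "missing credentials")
    # Placeholder: a real stage would verify the credential first.
    request.context["tenant_id"] = "tenant-a"
    return None

def validate(request: Request) -> Rejection:
    if "prompt" not in request.body:
        return (400, "missing prompt")
    return None
```

A request with no credentials is rejected by `authenticate`, so `validate` never inspects the body. That is the same property the gateway needs with respect to the GPU workers.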
End‑to‑End Deep Dive Through the Gateway

TLS/mTLS: Protecting the Prompt
The tenant sends an HTTPS request. Envoy terminates TLS or mTLS.
Why it matters:
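The prompt is encrypted in transit from the tenant to the gateway. Because Envoy terminates TLS, the prompt is decrypted at the edge rather than encrypted end‑to‑end.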
No plaintext on the wire.
No middleman can read or tamper with it.
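Envoy expresses TLS termination in its own configuration, which is not shown here. As a rough analogue, this sketch builds a server-side context with Python's standard `ssl` module that refuses clients without a certificate, which is the essence of mTLS. The certificate paths in the comments are hypothetical, and the TLS 1.2 floor is an assumption rather than a setting taken from the platform.

```python
import ssl

def build_mtls_server_context() -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed floor
    # mTLS: the handshake fails unless the client presents a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # In a real deployment (hypothetical file names):
    # ctx.load_cert_chain("server.crt", "server.key")
    # ctx.load_verify_locations("tenant-ca.pem")
    return ctx
```

With `verify_mode` left at its default for a server context, the same listener would do plain TLS and accept any client.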
HTTP Connection Manager: Structuring the Request
Envoy parses:
Headers
Body
Method
Protocol
Why it matters: This is where the request becomes structured data Envoy can reason about.
Identity Extraction: Who Is This Customer?
Envoy extracts identity from:
JWT tokens
mTLS certificates
It produces trusted metadata, not raw client headers.
Why it matters: Identity is the foundation of multi‑tenant isolation. If identity is wrong, everything downstream is wrong.
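The difference between trusted metadata and raw client headers can be sketched as follows: identity is derived only from a token whose signature checks out. This is an illustration using the Python standard library, not Envoy's JWT filter. It assumes HS256 with a shared secret to stay self-contained (production setups typically use asymmetric keys), it skips expiry and audience checks, and the claim names `tenant_id` and `plan` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token (only here so the example can be exercised)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"

def extract_identity(token: str, secret: bytes) -> Optional[dict]:
    """Return trusted tenant metadata, or None if the token cannot be trusted."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        # Constant-time comparison of the expected and presented signatures.
        if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
            return None
        claims = json.loads(b64url_decode(payload_b64))
        return {"tenant_id": claims["tenant_id"], "plan": claims.get("plan", "free")}
    except (ValueError, KeyError, AttributeError, TypeError):
        return None  # malformed token: treat as unauthenticated
```

Anything that fails verification yields `None`, so no downstream stage ever sees a tenant identity that the gateway did not establish itself.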
Policy Engine: What Are They Allowed to Do?
Envoy calls an external or internal policy engine:
RBAC
OPA
ext_authz
It evaluates:
Allowed models
Allowed routes
Allowed regions
Allowed features
Plan limits
Suspension status
Why it matters: This is where you enforce contractual boundaries.
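The checks above boil down to a deny-by-default decision per request. The sketch below shows that shape in plain Python. It is not OPA policy or an ext_authz service, and the `TenantPolicy` fields are hypothetical stand-ins for whatever the policy store actually holds.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TenantPolicy:
    allowed_models: frozenset
    allowed_regions: frozenset
    suspended: bool = False

def authorize(policy: TenantPolicy, model: str, region: str) -> Tuple[bool, str]:
    # Suspension is checked first: a suspended tenant gets nothing.
    if policy.suspended:
        return (False, "tenant suspended")
    # Deny by default: only explicitly listed models and regions pass.
    if model not in policy.allowed_models:
        return (False, f"model not allowed: {model}")
    if region not in policy.allowed_regions:
        return (False, f"region not allowed: {region}")
    return (True, "ok")
```

Returning a reason alongside the decision makes it possible to give the tenant a useful 403 and to emit a metric per denial cause.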
Rate Limits & Quotas: Fairness and Safety
Envoy enforces:
QPS per customer
Burst limits
Token quotas
Max prompt size
Max concurrency
Why it matters: Prevents noisy neighbors from starving the platform. Protects GPU capacity. Protects your cloud bill.
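Per-tenant QPS and burst limits are commonly modeled as a token bucket per tenant. The sketch below illustrates that idea in Python. It is an in-memory, single-process illustration rather than Envoy's rate limiting, and the class names and the `now` parameter (used to make the behavior deterministic) are assumptions of this example.

```python
import time
from typing import Dict, Optional

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int, now: float):
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so a burst is allowed immediately
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

class TenantRateLimiter:
    """One bucket per tenant, so one tenant's burst cannot drain another's."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate_per_sec = rate_per_sec
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(tenant_id)
        if bucket is None:
            bucket = self.buckets[tenant_id] = TokenBucket(self.rate_per_sec, self.burst, now)
        return bucket.allow(now)
```

Because the buckets are keyed by tenant, a tenant that exhausts its burst is throttled while every other tenant keeps its full allowance. The same structure extends to token quotas by passing the prompt's token count as `cost`.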
Schema Validation: The Firewall for Prompt Safety
Envoy validates:
JSON structure
Required fields
Allowed fields
Forbidden fields
Type correctness
Why it matters: Prevents malformed or malicious requests from reaching the Router.
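As an illustration of those five checks, here is a small validator in Python. The field names (`model`, `prompt`, `max_tokens`, `temperature`) and the forbidden set are hypothetical, since the actual request schema is not shown here, and this is a sketch of the logic rather than the mechanism Envoy uses to run it.

```python
import json
from typing import List

REQUIRED = {"model": (str,), "prompt": (str,)}
OPTIONAL = {"max_tokens": (int,), "temperature": (int, float)}
# Fields a client must never set, e.g. anything that claims an identity.
FORBIDDEN = {"tenant_id", "system_override"}

def validate_request(raw: bytes) -> List[str]:
    """Return a list of problems; an empty list means the body is acceptable."""
    try:
        body = json.loads(raw)
    except ValueError:
        return ["body is not valid JSON"]
    if not isinstance(body, dict):
        return ["body must be a JSON object"]

    allowed = {**REQUIRED, **OPTIONAL}
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in body]
    for name, value in body.items():
        if name in FORBIDDEN:
            errors.append(f"forbidden field: {name}")
        elif name not in allowed:
            errors.append(f"unknown field: {name}")
        elif isinstance(value, bool) or not isinstance(value, allowed[name]):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"wrong type for field: {name}")
    return errors
```

Rejecting unknown fields, rather than ignoring them, keeps the Router's input surface limited to what the schema names.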
Forwarding with Trusted Tenant Context
Envoy forwards the sanitized request to the Router using mTLS.
It attaches:
x-tenant-id
x-tenant-plan
x-tenant-org
x-tenant-limits
Why it matters: The Router never parses JWTs. It never trusts client headers. It only trusts Envoy.
This keeps security at the edge and routing in the core.
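The key property is that a client can never smuggle in its own tenant headers. A minimal sketch of that rule, assuming the verified identity arrives as a dict with hypothetical keys `tenant_id`, `plan`, `org`, and `limits`, and assuming the original `Authorization` header is dropped because the Router has no use for it:

```python
TRUSTED_PREFIX = "x-tenant-"

def build_upstream_headers(client_headers: dict, identity: dict) -> dict:
    # Drop anything the client sent in the trusted namespace, plus the raw
    # credential (an assumption of this sketch).
    upstream = {
        name.lower(): value
        for name, value in client_headers.items()
        if not name.lower().startswith(TRUSTED_PREFIX)
        and name.lower() != "authorization"
    }
    # Re-create the tenant headers from verified identity only.
    upstream["x-tenant-id"] = identity["tenant_id"]
    upstream["x-tenant-plan"] = identity["plan"]
    upstream["x-tenant-org"] = identity["org"]
    upstream["x-tenant-limits"] = identity["limits"]
    return upstream
```

Stripping first and attaching second means a spoofed `X-Tenant-Id` from the client is overwritten by construction, not by a check that could be forgotten.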
What I Learned Building
The network is not plumbing — it’s the first trust boundary.
Envoy is not “just a proxy” — it’s a policy engine with routing capabilities.
Multi‑tenant correctness is not a feature — it’s a discipline.
Isolation is not optional — it’s the contract you make with every customer.
The edge is where safety begins — not the model or the GPU.
This layer is the guardian of the entire platform.
Closing Thoughts
If you get the Edge Gateway (Layer 1) right:
Your Router becomes simpler
Your Kubernetes worker pools become safer
Your GPU scheduling becomes predictable
Your customers trust your platform
If you get it wrong, nothing downstream can save you.
This is why the Edge Gateway — and Envoy — became the most important architectural decision in the entire LLM platform build.
Keep Building - Keep Evolving
What I don’t build, I don’t truly know — it’s just theory on paper.
