Modern LLM inference platforms are no longer single‑tenant, single‑model playgrounds. They are multi‑tenant, multi‑model, high‑throughput systems where thousands of organizations send sensitive prompts that must be isolated, validated, shaped, and routed with absolute precision.

The edge gateway is not just a traffic entry point. It is the blast‑radius boundary, the zero‑trust enforcement point, and the contractual guardian of every tenant’s data.

While building the edge gateway for a multi‑tenant LLM inference platform, I found the network layer to be the most thought‑provoking piece of the entire architecture.

I didn’t start with Envoy. I started with a question: “Where does trust begin in a multi‑tenant system?”

Not in the model. Not in the GPU. Not even in the Router.

It begins the moment a tenant (customer) sends a prompt. That is why Layer 1, the Edge Gateway, became the most critical layer to get right.

This deep dive walks through the architecture, the reasoning behind each decision, and the exact path a tenant’s prompt takes before it ever reaches your Router or the backend GPU workers.

I needed something that could:

  • Terminate TLS

  • Extract identity

  • Enforce RBAC

  • Apply quotas

  • Validate schema

  • Shape traffic

  • Forward trusted context

  • Scale horizontally

  • Fail fast

  • Emit metrics

I chose Envoy, not because it’s popular, but because it gave me the tools to enforce identity, policy, schema, and isolation, all before a single token hits the inference pipeline.

Why an Edge Gateway Exists at All

Before we talk further about Envoy, we need to understand the job of Layer 1.

The Edge Gateway must:

  • Authenticate who is sending the request

  • Authorize what they’re allowed to do

  • Validate the shape and schema of the request

  • Rate‑limit and quota‑enforce per tenant

  • Normalize and sanitize the payload

  • Forward a clean, tenant‑scoped request to the Router

And it must do all of this before any expensive inference work happens.

This is why Envoy is the perfect fit: It is programmable, filter‑driven, zero‑trust‑aligned, and built for high‑throughput, low‑latency traffic.
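To make the "filter‑driven, fail fast" idea concrete, here is a minimal Python sketch of a filter chain. It is not Envoy's implementation, only an illustration of the control flow: each filter either passes the request along or rejects it, and the first rejection short‑circuits everything after it. The names `run_filter_chain`, `require_identity`, and `require_model` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Request = Dict[str, object]
Rejection = Tuple[int, str]  # (HTTP status, reason)
Filter = Callable[[Request], Optional[Rejection]]


def run_filter_chain(request: Request, filters: List[Filter]) -> Tuple[int, str]:
    """Run filters in order; stop at the first one that rejects."""
    for check in filters:
        rejection = check(request)
        if rejection is not None:
            # Fail fast: later (more expensive) filters never run.
            return rejection
    return 200, "forwarded to router"


def require_identity(request: Request) -> Optional[Rejection]:
    # Hypothetical filter: reject requests with no resolved tenant.
    return None if request.get("tenant_id") else (401, "missing identity")


def require_model(request: Request) -> Optional[Rejection]:
    # Hypothetical filter: reject requests that name no model.
    return None if request.get("model") else (400, "missing model")
```

A request with no tenant is rejected by the first filter, so the second filter is never evaluated for it.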

End‑to‑End Deep Dive Through the Gateway

TLS/mTLS: Protecting the Prompt

The Tenant sends an HTTPS request. Envoy terminates TLS or mTLS.

Why it matters:

  • The prompt is encrypted in transit from the tenant to the gateway, where Envoy terminates the session.

  • No plaintext on the wire.

  • No middleman can read or tamper with it.
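As a rough illustration of what "terminating mTLS" means, the sketch below builds a server‑side TLS context with Python's standard `ssl` module. This is not how Envoy is configured; it only shows the two settings that matter conceptually: a minimum protocol version and a required client certificate. The function name and the file‑path parameters are hypothetical.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what turns plain TLS into mutual TLS: the handshake
    # fails unless the client presents a certificate the server can verify.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The gateway's own certificate and private key (paths are assumptions).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA bundle used to verify tenant certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

With plain TLS only the server proves its identity; with `CERT_REQUIRED` the tenant must prove theirs as well, before any HTTP bytes are parsed.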

HTTP Connection Manager: Structuring the Request

Envoy parses:

  • Headers

  • Body

  • Method

  • Protocol

Why it matters: This is where the request becomes structured data Envoy can reason about.

Identity Extraction: Who Is This Customer?

Envoy extracts identity from:

  • JWT tokens

  • mTLS certificates

It produces trusted metadata, not raw client headers.

Why it matters: Identity is the foundation of multi‑tenant isolation. If identity is wrong, everything downstream is wrong.
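The sketch below shows, in plain Python, the kind of work identity extraction involves: verify the token's signature, check expiry, and only then emit trusted metadata. It is a simplified stand‑in, not the gateway's actual filter. It assumes an HS256 (shared‑secret) JWT for brevity, whereas a production gateway would more likely verify asymmetric signatures against published keys. The claim names `tenant_id` and `plan` and the helper `sign_hs256` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Demo helper: mint an HS256 token so the example is self-contained."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def extract_identity(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return trusted tenant metadata, or None if the token is not acceptable."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Pin the algorithm; never let the token choose how it is verified.
        if header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison of the signature.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError, AttributeError):
        return None  # malformed token
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None  # missing or expired
    tenant_id = claims.get("tenant_id")  # hypothetical claim name
    if not isinstance(tenant_id, str) or not tenant_id:
        return None
    return {"tenant_id": tenant_id, "plan": claims.get("plan", "free")}
```

The return value is the important part: downstream code receives a small dictionary the gateway vouches for, never the raw token or client‑supplied headers.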

Policy Engine: What Are They Allowed to Do?

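Envoy evaluates policy through a built‑in filter or an external policy engine:

  • RBAC (Envoy’s built‑in filter)

  • ext_authz (the filter that calls an external authorization service)

  • OPA (an external engine that can sit behind ext_authz)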

It evaluates:

  • Allowed models

  • Allowed routes

  • Allowed regions

  • Allowed features

  • Plan limits

  • Suspension status

Why it matters: This is where you enforce contractual boundaries.
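A policy decision of this kind can be sketched as a pure function over the tenant's identity and the requested resource. The example below is illustrative only; the `TenantPolicy` fields and the `authorize` function are hypothetical and cover just three of the checks listed above (suspension, allowed models, allowed regions).

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class TenantPolicy:
    """Hypothetical per-tenant policy record."""
    allowed_models: FrozenSet[str]
    allowed_regions: FrozenSet[str]
    suspended: bool = False


def authorize(policy: TenantPolicy, model: str, region: str) -> Tuple[bool, str]:
    """Return (allowed, reason). Deny by default on any failed check."""
    if policy.suspended:
        return False, "tenant suspended"
    if model not in policy.allowed_models:
        return False, f"model '{model}' not allowed"
    if region not in policy.allowed_regions:
        return False, f"region '{region}' not allowed"
    return True, "ok"
```

Keeping the decision a pure function of (policy, request) makes it easy to test and easy to move between an in‑process check and an external authorizer.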

Rate Limits & Quotas: Fairness and Safety

Envoy enforces:

  • QPS per customer

  • Burst limits

  • Token quotas

  • Max prompt size

  • Max concurrency

Why it matters: Prevents noisy neighbors from starving the platform. Protects GPU capacity. Protects your cloud bill.
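One common way to implement per‑customer QPS with a burst allowance is a token bucket per tenant. The sketch below is a minimal in‑memory version, assuming a single process; it is not the gateway's actual limiter, and the class names `TokenBucket` and `TenantRateLimiter` are hypothetical.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Refills at `rate_per_sec` up to `burst`; each request spends one token."""

    def __init__(self, rate_per_sec: float, burst: float, now: float) -> None:
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst  # start full so a new tenant can burst immediately
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Credit tokens for the time elapsed since the last call.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """One bucket per tenant, so one tenant cannot drain another's budget."""

    def __init__(self, rate_per_sec: float, burst: float) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = self._buckets[tenant_id] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

With `rate_per_sec=1.0` and `burst=2.0`, a tenant can send two requests back to back, is rejected on the third, and regains one request per second afterwards. Other tenants are unaffected.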

Schema Validation: The Firewall for Prompt Safety

Envoy validates:

  • JSON structure

  • Required fields

  • Allowed fields

  • Forbidden fields

  • Type correctness

Why it matters: Prevents malformed or malicious requests from reaching the Router.
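The checks above can be sketched as a small validator. This is an illustration under assumed field names, not the platform's real schema: `model`, `prompt`, `max_tokens`, `temperature`, and the forbidden fields `tenant_id` and `internal_route` are all hypothetical.

```python
import json
from typing import List

# Hypothetical schema for an inference request.
REQUIRED_FIELDS = {"model": str, "prompt": str}
OPTIONAL_FIELDS = {"max_tokens": int, "temperature": (int, float)}
# Fields a client must never set, because the gateway owns them.
FORBIDDEN_FIELDS = {"tenant_id", "internal_route"}


def validate_request(raw_body: bytes) -> List[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    try:
        body = json.loads(raw_body)
    except ValueError:
        return ["body is not valid JSON"]
    if not isinstance(body, dict):
        return ["body must be a JSON object"]

    errors: List[str] = []
    # Allowed vs. forbidden vs. unknown fields.
    for name in sorted(body):
        if name in FORBIDDEN_FIELDS:
            errors.append(f"forbidden field: {name}")
        elif name not in REQUIRED_FIELDS and name not in OPTIONAL_FIELDS:
            errors.append(f"unknown field: {name}")
    # Required fields and type correctness.
    for name, expected in {**REQUIRED_FIELDS, **OPTIONAL_FIELDS}.items():
        if name not in body:
            if name in REQUIRED_FIELDS:
                errors.append(f"missing required field: {name}")
            continue
        value = body[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for field: {name}")
    return errors
```

Rejecting unknown fields, rather than silently ignoring them, is a deliberate choice here: it keeps clients from smuggling data past the gateway in fields the Router might later start reading.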

Forwarding with Trusted Tenant Context

Envoy forwards the sanitized request to the Router using mTLS.

It attaches:

  • x-tenant-id

  • x-tenant-plan

  • x-tenant-org

  • x-tenant-limits

Why it matters: Router never parses JWTs. Router never trusts client headers. Router only trusts Envoy.

This keeps security at the edge and routing in the core.
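For the Router to trust these headers, the gateway has to remove any client‑supplied copies before attaching its own. The sketch below shows that step in Python. The function name and the shape of the `identity` mapping (keys `tenant_id`, `plan`, `org`, `limits`) are assumptions; only the four header names come from the list above.

```python
from typing import Dict, Mapping

TRUSTED_PREFIX = "x-tenant-"


def build_upstream_headers(
    client_headers: Mapping[str, str], identity: Mapping[str, str]
) -> Dict[str, str]:
    """Strip spoofable tenant headers, then attach gateway-derived ones."""
    # Header names are case-insensitive, so normalize before comparing.
    upstream = {
        name.lower(): value
        for name, value in client_headers.items()
        if not name.lower().startswith(TRUSTED_PREFIX)
    }
    # These values come from verified identity, never from the client.
    upstream["x-tenant-id"] = identity["tenant_id"]
    upstream["x-tenant-plan"] = identity["plan"]
    upstream["x-tenant-org"] = identity["org"]
    upstream["x-tenant-limits"] = identity["limits"]
    return upstream
```

A client that sends its own `X-Tenant-Id` header has it dropped, so the value the Router sees is always the one derived from the verified token or certificate.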

What I Learned Building This Layer

  • The network is not plumbing — it’s the first trust boundary.

  • Envoy is not “just a proxy” — it’s a policy engine with routing capabilities.

  • Multi‑tenant correctness is not a feature — it’s a discipline.

  • Isolation is not optional — it’s the contract you make with every customer.

  • The edge is where safety begins — not the model or the GPU.

This layer is the guardian of the entire platform.

Closing Thoughts

If you get the Edge Gateway (Layer 1) right:

  • Your Router becomes simpler

  • Your Kubernetes Worker pools become safer

  • Your GPU scheduling becomes predictable

  • Your customers trust your platform

If you get it wrong, nothing downstream can save you.

This is why the Edge Gateway — and Envoy — became the most important architectural decision in the entire LLM platform build.

Keep Building - Keep Evolving

What I don’t build, I don’t truly know — it’s just theory on paper.
