Right-Sizing Kubernetes Requests and Limits: How to Avoid OOMKills and Waste

September 10, 2025

Introduction: The Hidden Cost of Wrong Requests & Limits

Picture this: Your team just launched a major promotion campaign. Traffic surges exactly as marketing hoped, but minutes later your flagship service crashes.

Pods are in a CrashLoopBackOff state, restarts are piling up, and engineers are scrambling. The culprit? A single container hits its memory limit, triggering an OOMKill.

This isn't an uncommon story. Every Kubernetes engineer knows resource configuration matters, but few realize just how impossible it is to get right manually.

Overprovision, and you're burning money. Underprovision, and you risk outages. The stakes are high, yet the tooling and processes most teams rely on make it nearly impossible to hit the sweet spot.

What Are Requests and Limits?

Kubernetes schedules workloads based on two critical values you define in Pod specs:

resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"

  • Request: The guaranteed amount of CPU or memory for a container. The scheduler uses these numbers to decide where to place the Pod.

  • Limit: The hard cap on what the container can consume at runtime. Exceeding a memory limit triggers an OOMKill; exceeding a CPU limit results in throttling.

Key behavior difference:

  • Requests affect scheduling.
  • Limits affect runtime enforcement.

When requests are too high, nodes look "full", leading to poor bin-packing efficiency and unnecessary node scaling. When limits are too low, workloads crash.
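
When a limit does bite, the evidence is recorded on the Pod itself. For example, this checks the last termination reason of the first container (the Pod name is a placeholder):

kubectl get pod my-api-7d4f9 -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

If the output is OOMKilled, the container crossed its memory limit and was killed, exactly the scenario from the introduction.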

The Common Pitfalls in Resource Configuration

Even experienced teams often fall into these traps:

1. Guesswork

Developers set arbitrary numbers, or worse, leave defaults in place. These numbers stick around for months, silently driving waste or risk.

2. Equal Request and Limit

Setting request == limit seems safe but leaves no burst capacity. Memory spikes instantly result in OOMKills.
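
As a hypothetical illustration, a spec like this looks maximally safe (if every container in the Pod does the same, it even earns the Guaranteed QoS class), yet it leaves zero room between normal usage and the kill threshold:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"        # identical to the request: no burst headroom
    memory: "512Mi"    # any spike past 512Mi is an immediate OOMKill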

3. No Limits

Containers without limits can consume unlimited memory, turning one bad deployment into a node-wide outage—a noisy neighbor problem.
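
A common guardrail is a namespace-level LimitRange, so containers deployed without explicit values still receive defaults; the numbers below are placeholders, not recommendations:

apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
spec:
  limits:
    - type: Container
      defaultRequest:      # applied when a container omits requests
        cpu: "100m"
        memory: "128Mi"
      default:             # applied when a container omits limits
        cpu: "500m"
        memory: "512Mi"

This caps the blast radius of a forgotten limit, but the defaults themselves are still static numbers that age like any other manual setting.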

4. Overly Conservative Estimates

SREs, burned by outages, often over-allocate. A service needing 300Mi may get a 1Gi request, bloating costs by 3x.

5. Static Configs in Dynamic Environments

Resource profiles change with every release. Static settings quickly become outdated.

Why Manual Right-Sizing Fails

On paper, right-sizing sounds easy:

"Just gather metrics, analyze them, and adjust numbers."

But anyone running Kubernetes at scale knows this is a fantasy. Let's break down why.

Metrics Are Misleading

Metrics dashboards often show averages or 95th percentile values:

kubectl top pod

or via Prometheus queries like:

quantile_over_time(0.95, sum by(pod)(container_memory_usage_bytes)[5m:])

But:

  • Short-lived memory spikes often don't appear in sampled data.
  • The spike you miss is the one that triggers OOMKill.
  • To avoid this, teams over-allocate “just in case,” inflating costs (a query that surfaces those peaks is sketched below).
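
One way to surface those short-lived peaks is to query for the maximum rather than a percentile, and to use the working-set metric, which is generally the closest proxy for OOMKill risk. A rough sketch, assuming standard cAdvisor metrics scraped by Prometheus:

max by (pod) (max_over_time(container_memory_working_set_bytes{container!=""}[7d]))

Even this only sees what the scrape interval captures; a spike that lives for a few seconds between scrapes can still slip through, which is why padding limits by hand remains guesswork.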

Workloads Don't Stay Still

Modern microservices are dynamic by design:

  • Traffic fluctuates daily, weekly, seasonally.
  • Feature releases change memory profiles overnight.
  • Yesterday's "perfect" numbers are tomorrow's liability.

Too Many Services to Tune

In a cluster with 100+ services, even spending 30 minutes per service means days of tuning work. Repeat that every sprint, and your SRE team is just firefighting.

Dashboards Don't Tell You What to Do

Grafana or Datadog dashboards look impressive but don't answer the core question:

“What should I set my requests and limits to?”

Most engineers guess, run a deploy, and hope for the best.

VPA Isn't a Silver Bullet

The Vertical Pod Autoscaler (VPA) was designed to solve this (a minimal manifest is sketched below), but:

  • It restarts Pods to apply new values, which is unacceptable for many production systems.
  • Its recommendations lag behind real-world traffic changes.
  • Bursty or unpredictable workloads often get inaccurate values.
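
For reference, wiring a Deployment up to VPA looks roughly like this (the target name is hypothetical); updateMode: "Auto" is what triggers the restarts mentioned above:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout          # hypothetical workload
  updatePolicy:
    updateMode: "Auto"      # VPA evicts Pods to apply new requests

Switching to updateMode: "Off" avoids the evictions, but then someone still has to read the recommendations and roll them out by hand.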

Bottom line: Manual right-sizing is like playing darts blindfolded—you might hit the target occasionally, but you’ll waste enormous time and money doing it.

Where to Go From Here

If this resonates, you're not alone. Industry data shows Kubernetes clusters often use only 10–25% of CPU and 18–35% of memory.

Manual right-sizing is unsustainable at scale. The future lies in continuous, automated resource optimization. Tools like VPA paved the way, but we now need solutions that:

  • Continuously adapt to changing workloads.
  • Eliminate Pod restarts when applying changes.
  • Optimize for both cost and reliability.

💡 Exciting news: This month, we're releasing an intelligent Workload Autoscaler that automatically right-sizes your Pods without restarts, helping your cluster run efficiently and reliably.

We've already opened an early access beta, and if you'd like to try it, feel free to contact us; your SRE team will thank you!
