Hermes EKS Benchmark: 85% Faster Image Pull and 34% Faster First HTTP 200, No Image Rebuilds

June 22, 2026

CloudPilot AI

Engineering Team

Background

For Kubernetes workloads with large images, cold start has two parts: the node prepares the image, then the application starts. Before a service can load code, plugins, configuration, models, or indexes, containerd first has to make the image filesystem available.

With the traditional containerd overlayfs path, the node downloads and unpacks the image before the container can start. Lazy loading changes that sequence. Instead of requiring the whole filesystem to be fully local before process start, the image is mounted through an index and files are fetched when they are accessed.
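
To make the change in sequence concrete, here is a toy model in Python. It is not how Hermes or containerd is implemented: the class names, the in-memory REMOTE store, and the file paths are all hypothetical. It only illustrates that an eager path fetches every file before start, while a lazy path keeps an index and fetches a file the first time it is read.

```python
# Toy model only: names and data are hypothetical, not Hermes internals.
REMOTE = {  # stands in for image content held in a registry
    "/app/bin/server": b"binary",
    "/app/lib/core.jar": b"core",
    "/app/docs/manual.pdf": b"docs",
}


class EagerImage:
    """Fetches every file up front, like pull-and-unpack before start."""

    def __init__(self, remote):
        self.fetched = []
        self.local = {}
        for path, data in remote.items():
            self.local[path] = data
            self.fetched.append(path)

    def read(self, path):
        return self.local[path]


class LazyImage:
    """Keeps only an index of paths; fetches content on first read."""

    def __init__(self, remote):
        self.remote = remote
        self.index = set(remote)  # known paths, no content yet
        self.fetched = []
        self.local = {}

    def read(self, path):
        if path not in self.index:
            raise FileNotFoundError(path)
        if path not in self.local:  # first access triggers the fetch
            self.local[path] = self.remote[path]
            self.fetched.append(path)
        return self.local[path]


eager = EagerImage(REMOTE)
lazy = LazyImage(REMOTE)
# The application touches only two of the three files at startup.
for image in (eager, lazy):
    image.read("/app/bin/server")
    image.read("/app/lib/core.jar")

print(len(eager.fetched), len(lazy.fetched))
```

In this sketch the lazy path never fetches the file the process does not read, but each first read pays a fetch cost at access time rather than before start.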

Hermes makes this workflow policy-driven. Application teams keep using their original OCI images. They do not need to rebuild images, publish converted tags, change Dockerfiles, or update Pod image references. The platform team defines a HermesPolicy, and Hermes prepares the lazy-loading artifacts in the cluster.

The previous Hermes benchmark focused mostly on Pod Ready and image pull behavior. This test goes one step further: it measures the path from a Pod being scheduled on a fresh EKS node to the first successful HTTP 200 response from the service.

This keeps the measurement close to the part Hermes can affect: image pull, mount, container start, and the application startup that follows. First HTTP 200 still includes runtime initialization, library and configuration reads, plugin loading, model or index loading, and service bootstrap, so the percentage gain is smaller than the raw image-pull gain.

What We Tested

We tested three HTTP workloads with large public images. They happen to be Java-based, but Hermes runs below the application runtime: it changes how the image filesystem is prepared and read, not how Java starts.

  • Solr: docker.io/library/solr:10.0.0
  • OpenSearch: docker.io/opensearchproject/opensearch:2.19.1
  • Spark master: docker.io/apache/spark:python3-java17

The Pods used the upstream public images, pinned by digest. No Hermes-specific converted image tag was introduced.

We compared two paths:

  • overlay: standard containerd overlayfs image pull and unpack.
  • Hermes: the same workload image, with the Hermes lazy-loading path on the target node.

Each service and variant ran three times, for a total of 18 target runs.
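
The run matrix can be enumerated with a short Python sketch. The lowercase labels and the run-name format are assumptions made for illustration (the name loosely follows the opensearch-overlay-r1 pattern that appears in the manifests later in this post); the benchmark harness itself is not shown in this article.

```python
from itertools import product

services = ["solr", "opensearch", "spark"]
variants = ["overlay", "hermes"]  # "hermes" as a label value is an assumption
repeats = [1, 2, 3]

# One entry per target run: every service, on every path, three times.
runs = [
    {"service": s, "variant": v, "name": f"{s}-{v}-r{r}"}
    for s, v, r in product(services, variants, repeats)
]

print(len(runs))
print(runs[0]["name"], runs[-1]["name"])
```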

Test Environment

The benchmark ran on EKS:

  • Kubernetes: v1.34.9-eks-93b80c6
  • Node OS: Amazon Linux 2023
  • Runtime: containerd 2.2.4
  • Target instance type: m6i.large
  • Platform: linux/amd64
  • Node isolation: each target run used a fresh target node

For every run, the benchmark removed the previous target node before starting the next run. This avoids measuring a local image cache left behind by a previous Pod.

Experiment Steps

Step 1: Install the Hermes Controller and CRD

The controller watches HermesPolicy resources and Pods. When it observes a matching image, it builds and caches the lazy-loading artifacts.

kubectl apply -f deploy/hermespolicy-crd.yaml
kubectl apply -f deploy/hermes-controller-eks.yaml
kubectl -n hermes-system rollout status deploy/hermes-controller

Step 2: Configure Two Target NodePools

The test used two NodePool/NodeClass pairs:

  • an overlay NodePool: standard AL2023/containerd overlay path.
  • a Hermes NodePool: AL2023/containerd with the Hermes daemon installed through EC2NodeClass userData.

The Hermes-enabled NodeClass installed the daemon during node bootstrap:

export HERMES_INSTALLER_URL="..."
export HERMES_DAEMON_URL="..."
export HERMES_DAEMON_SHA256="..."

curl -fsSL "${HERMES_INSTALLER_URL}" | \
  HERMES_DAEMON_URL="${HERMES_DAEMON_URL}" \
  HERMES_DAEMON_SHA256="${HERMES_DAEMON_SHA256}" \
  bash -s --

Both NodePools used the same instance type and capacity type.

Step 3: Create the HermesPolicy

The policy selected the three benchmark workload images:

apiVersion: hermes.cloudpilot.ai/v1alpha1
kind: HermesPolicy
metadata:
  name: benchmark-images
spec:
  imageSelectors:
    - imageRegex: '.*(solr:10\.0\.0|opensearchproject/opensearch:2\.19\.1|apache/spark:python3-java17).*'
  platforms:
    - linux/amd64
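
As a quick sanity check, the selector pattern can be exercised against image references before the policy is applied. The sketch below uses Python's re module purely for illustration. Which regex engine Hermes uses, and whether it anchors the match, is not stated here, so treat the matching semantics as an assumption. The pattern uses only basic constructs and already starts and ends with `.*`, so full-match and search behave the same for these inputs. The two non-matching references are hypothetical.

```python
import re

# Pattern copied from the HermesPolicy above.
IMAGE_REGEX = (
    r".*(solr:10\.0\.0|opensearchproject/opensearch:2\.19\.1"
    r"|apache/spark:python3-java17).*"
)

candidates = [
    "docker.io/library/solr:10.0.0",
    "docker.io/opensearchproject/opensearch:2.19.1",
    "docker.io/apache/spark:python3-java17",
    # Digest-pinned form, as used by the benchmark Pods.
    "docker.io/library/solr:10.0.0@sha256:"
    "c5d3e51740f81612dac200c91908c253bf8302dca330874d6dcef23dacafc723",
    # Hypothetical references that should not be selected.
    "docker.io/library/solr:10.0.1",
    "docker.io/library/nginx:1.27",
]

matched = [image for image in candidates if re.fullmatch(IMAGE_REGEX, image)]
for image in candidates:
    print("match  " if image in matched else "no match", image)
```

Because the pattern is unanchored on both sides, the digest suffix does not prevent a match, which is consistent with the Pods being pinned by digest.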

Step 4: Wait for Artifacts to Be Ready

Hermes artifact preparation is outside the target Pod startup timing window. It is a controller-side preparation step for matching images.

In this run, preparation took:

  • Solr: 58.918s
  • OpenSearch: 2m44.999s
  • Spark: 1m4.493s

The target workload benchmark started after the artifacts reached Ready.
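
The preparation times above mix two notations (seconds only, and minutes plus seconds). A small Python helper, offered only as a convenience sketch, converts them to plain seconds so they can be compared with the per-run timings reported later. The helper name and the accepted format are assumptions based on the three values listed.

```python
import re

# Preparation times as reported above.
PREP = {"Solr": "58.918s", "OpenSearch": "2m44.999s", "Spark": "1m4.493s"}

_DURATION = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def to_seconds(text):
    """Convert '2m44.999s' or '58.918s' to seconds as a float."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)


prep_seconds = {name: to_seconds(value) for name, value in PREP.items()}
for name, secs in prep_seconds.items():
    print(f"{name}: {secs:.3f}s")
```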

Step 5: Run Fresh-Node Target Pods

For each service and variant, the benchmark:

  1. Deleted any previous target Pod and Service.
  2. Removed the previous target node.
  3. Waited for the target node to be fully gone.
  4. Created a watcher Pod on a stable node.
  5. Created the target workload Pod on a fresh target node.
  6. Recorded image pull, container start, Pod Ready, and first HTTP 200 timing.

The first HTTP 200 timestamp was recorded after the watcher received a successful HTTP response, not before sending the request.

Step 6: Workload YAML and HTTP 200 Probe

The same application YAML was used for both variants. The only scheduling difference was the target node pool: hermes-overlay for the overlay run and hermes-startuplocal for the Hermes run.

Each target Pod was exposed by a headless Service. The Service selected one run-specific Pod through app, variant, and e2e-run labels:

apiVersion: v1
kind: Service
metadata:
  name: opensearch-overlay-r1-20260622025902-ope
  namespace: hermes-e2e
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: opensearch-e2e
    variant: overlay
    e2e-run: 20260622025902-opensearch-overlay-r1
  ports:
    - name: http
      port: 9200
      targetPort: http

publishNotReadyAddresses: true lets the watcher resolve and call the Pod through Service DNS before the Pod is marked Ready. Without this, the Service would usually hide the endpoint until readiness succeeds, which would make it harder to measure the first successful HTTP 200.

Here is the OpenSearch target Pod used in the benchmark. The Hermes run used the same container spec and changed only the run labels/name and karpenter.sh/nodepool selector:

apiVersion: v1
kind: Pod
metadata:
  name: opensearch-overlay-r1-20260622025902-ope
  namespace: hermes-e2e
  labels:
    app: opensearch-e2e
    variant: overlay
    e2e-run: 20260622025902-opensearch-overlay-r1
spec:
  restartPolicy: Never
  terminationGracePeriodSeconds: 1
  nodeSelector:
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    karpenter.sh/nodepool: hermes-overlay
  containers:
    - name: opensearch
      image: docker.io/opensearchproject/opensearch:2.19.1@sha256:7ad3c515e43fb1642ddf2181dfd03402e42e85a16030e098ed1f3fc1404d7e89
      imagePullPolicy: IfNotPresent
      args:
        - opensearch
        - -Ediscovery.type=single-node
        - -Ehttp.host=0.0.0.0
        - -Etransport.host=127.0.0.1
      env:
        - name: OPENSEARCH_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        - name: DISABLE_SECURITY_PLUGIN
          value: "true"
      ports:
        - name: http
          containerPort: 9200
      readinessProbe:
        httpGet:
          path: /_cluster/health?local=true
          port: http
        periodSeconds: 1
        timeoutSeconds: 1
        failureThreshold: 900
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
          ephemeral-storage: "8Gi"
        limits:
          memory: "3Gi"

Solr and Spark used the same Pod shape. Their workload container blocks were:

# Solr
- name: solr
  image: docker.io/library/solr:10.0.0@sha256:c5d3e51740f81612dac200c91908c253bf8302dca330874d6dcef23dacafc723
  imagePullPolicy: IfNotPresent
  env:
    - name: SOLR_HEAP
      value: "512m"
  ports:
    - name: http
      containerPort: 8983
  readinessProbe:
    httpGet:
      path: /solr/admin/info/system
      port: http
    periodSeconds: 1
    timeoutSeconds: 1
    failureThreshold: 900
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
      ephemeral-storage: "4Gi"
    limits:
      memory: "3Gi"

# Spark master
- name: spark-master
  image: docker.io/apache/spark:python3-java17@sha256:6fb854a580e552290a21d7b9c6214f2d8840733e63a90ff687a2ffce80f45ef9
  imagePullPolicy: IfNotPresent
  command:
    - /opt/spark/bin/spark-class
  args:
    - org.apache.spark.deploy.master.Master
    - --host
    - 0.0.0.0
    - --port
    - "7077"
    - --webui-port
    - "8080"
  ports:
    - name: web
      containerPort: 8080
    - name: master
      containerPort: 7077
  readinessProbe:
    httpGet:
      path: /
      port: web
    periodSeconds: 1
    timeoutSeconds: 1
    failureThreshold: 900
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
      ephemeral-storage: "4Gi"
    limits:
      memory: "4Gi"

The HTTP 200 timestamp came from a separate watcher Pod running on a stable managed node. It repeatedly called the Service DNS name and printed first_http_200_ns only after receiving HTTP status 200:

apiVersion: v1
kind: Pod
metadata:
  name: watch-opensearch-overlay-r1-20260622025902-ope
  namespace: hermes-e2e
  labels:
    app: opensearch-first-200
    variant: overlay
    e2e-run: 20260622025902-opensearch-overlay-r1
spec:
  restartPolicy: Never
  nodeSelector:
    eks.amazonaws.com/nodegroup: eks-spot-20260620152041180300000015
  containers:
    - name: watcher
      image: python:3.12-alpine
      imagePullPolicy: IfNotPresent
      env:
        - name: TARGET_URL
          value: http://opensearch-overlay-r1-20260622025902-ope.hermes-e2e.svc.cluster.local:9200/_cluster/health?local=true
      command:
        - sh
        - -lc
        - |
          python - <<'PY'
          import os
          import time
          import urllib.request

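          url = os.environ["TARGET_URL"]
          # Log when polling began so the run can be sanity-checked later.
          print(f"watcher_start_ns={time.time_ns()}", flush=True)
          while True:
              try:
                  # Short timeout: the target may not be listening yet.
                  with urllib.request.urlopen(url, timeout=0.5) as response:
                      if response.status == 200:
                          # The timestamp is taken only after a 200 response arrives.
                          print(f"first_http_200_ns={time.time_ns()}", flush=True)
                          break
              except Exception:
                  # Connection errors, timeouts, and HTTP error statuses all mean "not up yet".
                  pass
              time.sleep(0.25)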
          PY
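
The watcher's log lines are simple key=value markers, so turning them into a duration takes only a few lines. The sketch below shows one way to do it. How the benchmark obtained the Pod-scheduled timestamp is not described in this post, so scheduled_ns is a hypothetical input, and the sample log values are made up rather than taken from a real run. It also assumes the clocks involved are synchronized closely enough for the subtraction to be meaningful.

```python
def parse_watcher_log(text):
    """Return the key=value integer markers printed by the watcher."""
    markers = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key in ("watcher_start_ns", "first_http_200_ns"):
            markers[key] = int(value)
    return markers


def seconds_between(start_ns, end_ns):
    """Convert a nanosecond interval to seconds."""
    return (end_ns - start_ns) / 1_000_000_000


# Made-up sample values, not taken from the benchmark runs.
sample_log = """watcher_start_ns=1000000000000000000
first_http_200_ns=1000000012500000000
"""
# Hypothetical: assumed to come from wherever the Pod's scheduling time is recorded.
scheduled_ns = 1_000_000_000_000_000_000

markers = parse_watcher_log(sample_log)
elapsed = seconds_between(scheduled_ns, markers["first_http_200_ns"])
print(f"scheduled -> first HTTP 200: {elapsed:.3f}s")
```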

Results

| Service | Overlay pull | Hermes pull | Pull reduction | Pull delta | Overlay scheduled -> HTTP 200 | Hermes scheduled -> HTTP 200 | HTTP 200 reduction | HTTP 200 delta |
|---|---|---|---|---|---|---|---|---|
| OpenSearch | 20.371s | 2.998s | 85.28% | -17.373s | 37.943s | 30.375s | 19.95% | -7.568s |
| Solr | 9.053s | 2.584s | 71.46% | -6.469s | 18.572s | 14.919s | 19.67% | -3.653s |
| Spark | 15.763s | 3.811s | 75.82% | -11.952s | 20.191s | 13.304s | 34.11% | -6.887s |

OpenSearch showed the biggest pull-time change, from 20.371s to 2.998s. Spark showed the biggest first-HTTP-200 change, from 20.191s to 13.304s.
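
The reduction and delta columns follow directly from the overlay and Hermes timings. The Python sketch below recomputes them from the values in the results table, assuming reduction is defined as (overlay - Hermes) / overlay and delta as Hermes - overlay, which is the reading that reproduces the published figures.

```python
# (overlay_seconds, hermes_seconds) copied from the results table.
PULL = {
    "OpenSearch": (20.371, 2.998),
    "Solr": (9.053, 2.584),
    "Spark": (15.763, 3.811),
}
HTTP_200 = {
    "OpenSearch": (37.943, 30.375),
    "Solr": (18.572, 14.919),
    "Spark": (20.191, 13.304),
}


def reduction(overlay, hermes):
    """Return (delta_seconds, percent_reduction) relative to overlay."""
    delta = hermes - overlay
    percent = (overlay - hermes) / overlay * 100
    return round(delta, 3), round(percent, 2)


for name in PULL:
    pull_delta, pull_pct = reduction(*PULL[name])
    http_delta, http_pct = reduction(*HTTP_200[name])
    print(f"{name}: pull {pull_delta:+.3f}s ({pull_pct:.2f}%), "
          f"HTTP 200 {http_delta:+.3f}s ({http_pct:.2f}%)")
```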

What These Numbers Mean

Hermes had the clearest effect on image pull. Across the three workloads, image pull time dropped by 71% to 85%.

First HTTP 200 improved by 20% to 34%. That number is lower than the pull-time improvement because first HTTP 200 includes more than image pull:

  • container creation
  • process start
  • runtime initialization
  • code, library, configuration, or model reads
  • plugin loading
  • service readiness checks

For OpenSearch, the overlay baseline from Pod scheduled to first HTTP 200 was about 38 seconds. Hermes reduced pull time from about 20 seconds to about 3 seconds, and the service reached first HTTP 200 about 7.6 seconds sooner. The remaining time is OpenSearch startup work after the container has already started.

That is why the table shows both image pull and first HTTP 200. Pull time shows the direct image-path effect; first HTTP 200 shows what reached the service boundary.
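
One way to see the gap between the two columns is to subtract pull time from the scheduled-to-HTTP-200 time on each path. The Python sketch below does that with the values from the results table. It assumes the pull interval lies entirely inside the scheduled-to-HTTP-200 window and that the reported figures can be subtracted from each other. If they are aggregates over three runs, the result is an approximation rather than a measured quantity.

```python
# Seconds copied from the results table: (pull, scheduled -> first HTTP 200).
RESULTS = {
    "OpenSearch": {"overlay": (20.371, 37.943), "hermes": (2.998, 30.375)},
    "Solr": {"overlay": (9.053, 18.572), "hermes": (2.584, 14.919)},
    "Spark": {"overlay": (15.763, 20.191), "hermes": (3.811, 13.304)},
}


def outside_pull(pull, total):
    """Time in the scheduled -> HTTP 200 window not spent in image pull."""
    return round(total - pull, 3)


rest = {}
for name, row in RESULTS.items():
    overlay_rest = outside_pull(*row["overlay"])
    hermes_rest = outside_pull(*row["hermes"])
    rest[name] = (overlay_rest, hermes_rest)
    print(f"{name}: overlay {overlay_rest:.3f}s, Hermes {hermes_rest:.3f}s, "
          f"change {hermes_rest - overlay_rest:+.3f}s")
```

Under these assumptions, the time outside image pull is longer on the Hermes path for all three services. One plausible reading is that some file reads move from the pull phase into application startup when files are fetched on access.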

Conclusion

The benchmark showed a clear image-pull reduction without changing application image references or rebuilding images.

In this EKS test, OpenSearch pull time dropped from 20.371s to 2.998s. Spark first HTTP 200 dropped from 20.191s to 13.304s.

The mechanism is not Java-specific. Hermes works at the image filesystem layer, so the same approach applies to any OCI image; only the measured numbers are specific to these workloads. The Java-based services here are a concrete test set with real HTTP readiness behavior.

Operationally, the workflow stayed simple: the platform prepared the lazy-loading artifacts, while the Pods kept their normal images and specs. The next place to improve is the post-container-start path, where application-critical files should behave closer to a local overlay filesystem.

Follow the project here: https://github.com/cloudpilot-ai/hermes
