Hermes EKS Benchmark: 85% Faster Image Pull and 34% Faster First HTTP 200, No Image Rebuilds

June 22, 2026

Hermes EKS Benchmark: 85% Faster Image Pull and 34% Faster First HTTP 200, No Image Rebuilds

CloudPilot AI

Engineering Team

Publish Date

June 22, 2026

Background

For Kubernetes workloads with large images, cold start has two parts: the node prepares the image, then the application starts. Before a service can load code, plugins, configuration, models, or indexes, containerd first has to make the image filesystem available.

With the traditional containerd overlayfs path, the node downloads and unpacks the image before the container can start. Lazy loading changes that sequence. Instead of requiring the whole filesystem to be fully local before process start, the image is mounted through an index and files are fetched when they are accessed.

Hermes makes this workflow policy-driven. Application teams keep using their original OCI images. They do not need to rebuild images, publish converted tags, change Dockerfiles, or update Pod image references. The platform team defines a HermesPolicy, and Hermes prepares the lazy-loading artifacts in the cluster.

The previous Hermes benchmark focused mostly on Pod Ready and image pull behavior. This test goes one step further: it measures the path from a Pod being scheduled on a fresh EKS node to the first successful HTTP 200 response from the service.

This keeps the measurement close to the part Hermes can affect: image pull, mount, container start, and the application startup that follows. First HTTP 200 still includes runtime initialization, library and configuration reads, plugin loading, model or index loading, and service bootstrap, so the percentage gain is smaller than the raw image-pull gain.

What We Tested

We tested three HTTP workloads with large public images. They happen to be Java-based, but Hermes runs below the application runtime: it changes how the image filesystem is prepared and read, not how Java starts.

Solr: docker.io/library/solr:10.0.0
OpenSearch: docker.io/opensearchproject/opensearch:2.19.1
Spark master: docker.io/apache/spark:python3-java17

The Pods used the upstream public images, pinned by digest. No Hermes-specific converted image tag was introduced.

We compared two paths:

overlay: standard containerd overlayfs image pull and unpack.
Hermes: the same workload image, with the Hermes lazy-loading path on the target node.

Each service and variant ran three times, for a total of 18 target runs.

Test Environment

The benchmark ran on EKS:

Kubernetes: v1.34.9-eks-93b80c6
Node OS: Amazon Linux 2023
Runtime: containerd 2.2.4
Target instance type: m6i.large
Platform: linux/amd64
Node isolation: each target run used a fresh target node

For every run, the benchmark removed the previous target node before starting the next run. This avoids measuring a local image cache left behind by a previous Pod.

Experiment Steps

Step 1: Install the Hermes Controller and CRD

The controller watches HermesPolicy resources and Pods. When it observes a matching image, it builds and caches the lazy-loading artifacts.

kubectl apply -f deploy/hermespolicy-crd.yaml
kubectl apply -f deploy/hermes-controller-eks.yaml
kubectl -n hermes-system rollout status deploy/hermes-controller

Step 2: Configure Two Target NodePools

The test used two NodePool/NodeClass pairs:

an overlay NodePool: standard AL2023/containerd overlay path.
a Hermes NodePool: AL2023/containerd with the Hermes daemon installed through EC2NodeClass userData.

The Hermes-enabled NodeClass installed the daemon during node bootstrap:

export HERMES_INSTALLER_URL="..."
export HERMES_DAEMON_URL="..."
export HERMES_DAEMON_SHA256="..."

curl -fsSL "${HERMES_INSTALLER_URL}" | \
  HERMES_DAEMON_URL="${HERMES_DAEMON_URL}" \
  HERMES_DAEMON_SHA256="${HERMES_DAEMON_SHA256}" \
  bash -s --

Both NodePools used the same instance type and capacity type.

Step 3: Create the HermesPolicy

The policy selected the three benchmark workload images:

apiVersion: hermes.cloudpilot.ai/v1alpha1
kind: HermesPolicy
metadata:
  name: benchmark-images
spec:
  imageSelectors:
    - imageRegex: '.*(solr:10\.0\.0|opensearchproject/opensearch:2\.19\.1|apache/spark:python3-java17).*'
  platforms:
    - linux/amd64

Step 4: Wait for Artifacts to Be Ready

Hermes artifact preparation is outside the target Pod startup timing window. It is a controller-side preparation step for matching images.

In this run, preparation took:

Solr: 58.918s
OpenSearch: 2m44.999s
Spark: 1m4.493s

The target workload benchmark started after the artifacts reached Ready.

Step 5: Run Fresh-Node Target Pods

For each service and variant, the benchmark:

Deleted any previous target Pod and Service.
Removed the previous target node.
Waited for the target node to be fully gone.
Created a watcher Pod on a stable node.
Created the target workload Pod on a fresh target node.
Recorded image pull, container start, Pod Ready, and first HTTP 200 timing.

The first HTTP 200 timestamp was recorded after the watcher received a successful HTTP response, not before sending the request.

Step 6: Workload YAML and HTTP 200 Probe

The same application YAML was used for both variants. The only scheduling difference was the target node pool: hermes-overlay for the overlay run and hermes-startuplocal for the Hermes run.

Each target Pod was exposed by a headless Service. The Service selected one run-specific Pod through app, variant, and e2e-run labels:

apiVersion: v1
kind: Service
metadata:
  name: opensearch-overlay-r1-20260622025902-ope
  namespace: hermes-e2e
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: opensearch-e2e
    variant: overlay
    e2e-run: 20260622025902-opensearch-overlay-r1
  ports:
    - name: http
      port: 9200
      targetPort: http

publishNotReadyAddresses: true lets the watcher resolve and call the Pod through Service DNS before the Pod is marked Ready. Without this, the Service would usually hide the endpoint until readiness succeeds, which would make it harder to measure the first successful HTTP 200.

Here is the OpenSearch target Pod used in the benchmark. The Hermes run used the same container spec and changed only the run labels/name and karpenter.sh/nodepool selector:

apiVersion: v1
kind: Pod
metadata:
  name: opensearch-overlay-r1-20260622025902-ope
  namespace: hermes-e2e
  labels:
    app: opensearch-e2e
    variant: overlay
    e2e-run: 20260622025902-opensearch-overlay-r1
spec:
  restartPolicy: Never
  terminationGracePeriodSeconds: 1
  nodeSelector:
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    karpenter.sh/nodepool: hermes-overlay
  containers:
    - name: opensearch
      image: docker.io/opensearchproject/opensearch:2.19.1@sha256:7ad3c515e43fb1642ddf2181dfd03402e42e85a16030e098ed1f3fc1404d7e89
      imagePullPolicy: IfNotPresent
      args:
        - opensearch
        - -Ediscovery.type=single-node
        - -Ehttp.host=0.0.0.0
        - -Etransport.host=127.0.0.1
      env:
        - name: OPENSEARCH_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        - name: DISABLE_SECURITY_PLUGIN
          value: "true"
      ports:
        - name: http
          containerPort: 9200
      readinessProbe:
        httpGet:
          path: /_cluster/health?local=true
          port: http
        periodSeconds: 1
        timeoutSeconds: 1
        failureThreshold: 900
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
          ephemeral-storage: "8Gi"
        limits:
          memory: "3Gi"

Solr and Spark used the same Pod shape. Their workload container blocks were:

# Solr
- name: solr
  image: docker.io/library/solr:10.0.0@sha256:c5d3e51740f81612dac200c91908c253bf8302dca330874d6dcef23dacafc723
  imagePullPolicy: IfNotPresent
  env:
    - name: SOLR_HEAP
      value: "512m"
  ports:
    - name: http
      containerPort: 8983
  readinessProbe:
    httpGet:
      path: /solr/admin/info/system
      port: http
    periodSeconds: 1
    timeoutSeconds: 1
    failureThreshold: 900
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
      ephemeral-storage: "4Gi"
    limits:
      memory: "3Gi"

# Spark master
- name: spark-master
  image: docker.io/apache/spark:python3-java17@sha256:6fb854a580e552290a21d7b9c6214f2d8840733e63a90ff687a2ffce80f45ef9
  imagePullPolicy: IfNotPresent
  command:
    - /opt/spark/bin/spark-class
  args:
    - org.apache.spark.deploy.master.Master
    - --host
    - 0.0.0.0
    - --port
    - "7077"
    - --webui-port
    - "8080"
  ports:
    - name: web
      containerPort: 8080
    - name: master
      containerPort: 7077
  readinessProbe:
    httpGet:
      path: /
      port: web
    periodSeconds: 1
    timeoutSeconds: 1
    failureThreshold: 900
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
      ephemeral-storage: "4Gi"
    limits:
      memory: "4Gi"

The HTTP 200 timestamp came from a separate watcher Pod running on a stable managed node. It repeatedly called the Service DNS name and printed first_http_200_ns only after receiving HTTP status 200:

apiVersion: v1
kind: Pod
metadata:
  name: watch-opensearch-overlay-r1-20260622025902-ope
  namespace: hermes-e2e
  labels:
    app: opensearch-first-200
    variant: overlay
    e2e-run: 20260622025902-opensearch-overlay-r1
spec:
  restartPolicy: Never
  nodeSelector:
    eks.amazonaws.com/nodegroup: eks-spot-20260620152041180300000015
  containers:
    - name: watcher
      image: python:3.12-alpine
      imagePullPolicy: IfNotPresent
      env:
        - name: TARGET_URL
          value: http://opensearch-overlay-r1-20260622025902-ope.hermes-e2e.svc.cluster.local:9200/_cluster/health?local=true
      command:
        - sh
        - -lc
        - |
          python - <<'PY'
          import os
          import time
          import urllib.request

          url = os.environ["TARGET_URL"]
          print(f"watcher_start_ns={time.time_ns()}", flush=True)
          while True:
              try:
                  with urllib.request.urlopen(url, timeout=0.5) as response:
                      if response.status == 200:
                          print(f"first_http_200_ns={time.time_ns()}", flush=True)
                          break
              except Exception:
                  pass
              time.sleep(0.25)
          PY

Results

Service	Overlay pull	Hermes pull	Pull reduction	Pull delta	Overlay scheduled -> HTTP 200	Hermes scheduled -> HTTP 200	HTTP 200 reduction	HTTP 200 delta
OpenSearch	20.371s	2.998s	85.28%	-17.373s	37.943s	30.375s	19.95%	-7.568s
Solr	9.053s	2.584s	71.46%	-6.469s	18.572s	14.919s	19.67%	-3.653s
Spark	15.763s	3.811s	75.82%	-11.952s	20.191s	13.304s	34.11%	-6.887s

OpenSearch showed the biggest pull-time change, from 20.371s to 2.998s. Spark showed the biggest first-HTTP-200 change, from 20.191s to 13.304s.

What These Numbers Mean

Hermes had the clearest effect on image pull. Across the three workloads, image pull time dropped by 71% to 85%.

First HTTP 200 improved by 20% to 34%. That number is lower than the pull-time improvement because first HTTP 200 includes more than image pull:

container creation
process start
runtime initialization
code, library, configuration, or model reads
plugin loading
service readiness checks

For OpenSearch, the overlay baseline from Pod scheduled to first HTTP 200 was about 38 seconds. Hermes reduced pull time from about 20 seconds to about 3 seconds, and the service reached first HTTP 200 about 7.6 seconds sooner. The remaining time is OpenSearch startup work after the container has already started.

That is why the table shows both image pull and first HTTP 200. Pull time shows the direct image-path effect; first HTTP 200 shows what reached the service boundary.

Conclusion

The benchmark showed a clear image-pull reduction without changing application image references or rebuilding images.

In this EKS test, OpenSearch pull time dropped from 20.371s to 2.998s. Spark first HTTP 200 dropped from 20.191s to 13.304s.

The result is not Java-specific. Hermes works at the image filesystem layer, so the same mechanism applies to any OCI image. These Java-based workloads are just a concrete test set with real HTTP readiness behavior.

Operationally, the workflow stayed simple: the platform prepared the lazy-loading artifacts, while the Pods kept their normal images and specs. The next place to improve is the post-container-start path, where application-critical files should behave closer to a local overlay filesystem.

Follow the project here: https://github.com/cloudpilot-ai/hermes