cki_tools.rpm_cache

S3-backed RPM download cache for Fedora mirrors

The rpm-cache consists of two AWS Lambda functions that provide a transparent caching proxy for RPM downloads from Fedora mirrors. Requests for .rpm files are redirected to the origin while it is available and copied into S3 in the background; when the origin cannot serve a file, a cached copy is served via a presigned S3 URL. All other requests (repodata, etc.) are redirected to the origin without caching.

Environment variable        Secret  Required  Description
ORIGIN_HOST                 no      no        upstream mirror hostname, defaults to dl.fedoraproject.org
S3_BUCKET_NAME              no      yes       S3 bucket for cached RPMs
UPLOADER_LAMBDA_ARN         no      yes       ARN of the uploader Lambda function (handler only)
CACHE_EXTENSIONS            no      no        space-separated cacheable extensions, defaults to .rpm
PRESIGNED_URL_EXPIRATION    no      no        presigned URL lifetime in seconds, defaults to 3600
CKI_DEPLOYMENT_ENVIRONMENT  no      no        deployment environment for Sentry tagging
ORIGIN_HEAD_TIMEOUT         no      no        HEAD request timeout in seconds, defaults to 5
SENTRY_DSN                  yes     no        Sentry DSN for error reporting
LAMBDA_HANDLER              no      yes       container image Lambda handler function name
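
As an illustration of how these variables and their defaults fit together, the following minimal sketch loads them into a small settings object. The names CacheConfig and load_config are hypothetical and are not taken from cki_tools/rpm_cache.py; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for the settings listed in the table."""

    origin_host: str
    s3_bucket_name: str
    cache_extensions: tuple
    presigned_url_expiration: int
    origin_head_timeout: float


def load_config(env=None):
    """Build a CacheConfig from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return CacheConfig(
        origin_host=env.get("ORIGIN_HOST", "dl.fedoraproject.org"),
        # Required variable: a missing value raises KeyError.
        s3_bucket_name=env["S3_BUCKET_NAME"],
        # Space-separated list, e.g. ".rpm .drpm" -> (".rpm", ".drpm").
        cache_extensions=tuple(env.get("CACHE_EXTENSIONS", ".rpm").split()),
        presigned_url_expiration=int(env.get("PRESIGNED_URL_EXPIRATION", "3600")),
        origin_head_timeout=float(env.get("ORIGIN_HEAD_TIMEOUT", "5")),
    )
```

Passing a plain dict instead of os.environ, for example load_config({"S3_BUCKET_NAME": "example-bucket"}), keeps the sketch easy to exercise in unit tests.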

Lambda functions

The rpm-cache image provides two Lambda handlers selected via LAMBDA_HANDLER:

  • cki_tools.rpm_cache.handler_lambda: API Gateway entry point that probes the origin with a HEAD request, redirects to the origin when it is available (triggering the uploader asynchronously if the RPM is not yet cached), and redirects to a presigned S3 URL when the origin is unavailable and a cached copy exists.
  • cki_tools.rpm_cache.uploader_lambda: async worker invoked by the handler on cache miss; downloads the RPM from origin and uploads it to S3.
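
The asynchronous hand-off from the handler to the uploader could look roughly like the sketch below. It assumes a boto3 Lambda client is passed in as lambda_client; the function name trigger_uploader and the {"path": ...} payload shape are hypothetical and not confirmed by the module.

```python
import json


def trigger_uploader(lambda_client, uploader_arn, path):
    """Fire-and-forget invocation of the uploader for one RPM path."""
    # InvocationType="Event" asks Lambda to queue the call and return
    # immediately instead of waiting for the uploader to finish.
    response = lambda_client.invoke(
        FunctionName=uploader_arn,
        InvocationType="Event",
        # Payload shape is an assumption made for this sketch.
        Payload=json.dumps({"path": path}).encode("utf-8"),
    )
    # Asynchronous invocations are acknowledged with status code 202.
    return response.get("StatusCode") == 202
```

In a real deployment the client would typically come from boto3.client("lambda") and the ARN from UPLOADER_LAMBDA_ARN; injecting the client keeps the function testable without AWS access.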

Cache behavior

The handler uses a HEAD-first strategy to minimise S3 egress costs:

  • .rpm requests (origin available): handler sends a HEAD request to origin first; if the origin returns 200, the client is redirected there directly (no S3 egress). If the RPM is not yet cached, the uploader is triggered asynchronously to populate the cache for future fallback.
  • .rpm requests (origin unavailable, cached): when the HEAD returns 404 or times out and the RPM exists in S3, the handler redirects to a time-limited presigned S3 URL.
  • .rpm requests (origin unavailable, not cached): handler redirects to origin as a last resort (the client will see the origin’s error).
  • Non-.rpm requests: handler returns 302 to origin (no caching).
  • Errors: any S3/Lambda error falls back to a 302 redirect to origin, so the cache is never in the critical path.
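
The cases above can be summarised as a small pure function. This is a sketch of the decision table only, not the module's actual code; the names Action and decide are hypothetical, and origin_available stands for "the HEAD request returned 200 within the timeout".

```python
from enum import Enum


class Action(Enum):
    REDIRECT_ORIGIN = "redirect to origin"
    REDIRECT_ORIGIN_AND_UPLOAD = "redirect to origin, trigger uploader"
    REDIRECT_PRESIGNED = "redirect to presigned S3 URL"


def decide(path, origin_available, cached, extensions=(".rpm",)):
    """Map one request onto the behaviour described in the list above."""
    # Non-cacheable paths (repodata, etc.) always go straight to origin.
    if not path.endswith(tuple(extensions)):
        return Action.REDIRECT_ORIGIN
    if origin_available:
        # Origin serves the file; populate the cache if it is missing.
        if cached:
            return Action.REDIRECT_ORIGIN
        return Action.REDIRECT_ORIGIN_AND_UPLOAD
    if cached:
        # Origin is unavailable but a cached copy exists.
        return Action.REDIRECT_PRESIGNED
    # Last resort: the client will see the origin's own error.
    return Action.REDIRECT_ORIGIN
```

Keeping the decision separate from the S3 and HTTP calls makes each branch easy to cover in unit tests.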

Request and response format

The handler receives API Gateway v2 (HTTP API) events with a {path+} catch-all route parameter. The path mirrors the origin URL structure:

  • Input: event["pathParameters"]["path"] – everything after the hostname, e.g. pub/fedora/linux/updates/44/Everything/x86_64/Packages/f/foo-1.0.fc44.x86_64.rpm
  • 302 redirect: every response to a well-formed request is a redirect via the Location header, pointing to either the origin URL or a presigned S3 URL
  • 400: returned only when pathParameters or path is missing

All S3 or Lambda invocation errors fall back to a 302 redirect to origin, so consumers (dnf, rpm-lockfile-prototype) never see an error from the cache itself.
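
A minimal sketch of this request and response contract follows. The function handle and the resolve_location callback are hypothetical; the https scheme for the origin URL and the 400 response body are likewise assumptions, since the text above specifies only the status codes and the Location header.

```python
def handle(event, resolve_location, origin_host="dl.fedoraproject.org"):
    """Turn an API Gateway v2 event into a 302 or 400 response dict."""
    # pathParameters may be absent or None when the route does not match.
    path = (event.get("pathParameters") or {}).get("path")
    if not path:
        return {"statusCode": 400, "body": "missing path"}

    origin_url = f"https://{origin_host}/{path}"
    try:
        # resolve_location stands in for the S3 / HEAD / uploader logic
        # and returns the URL the client should be redirected to.
        location = resolve_location(path, origin_url)
    except Exception:
        # Any backend failure degrades to a plain origin redirect.
        location = origin_url
    return {"statusCode": 302, "headers": {"Location": location}}
```

Catching every exception is deliberate in this sketch: it mirrors the stated goal that consumers never see an error produced by the cache itself.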

Deployment

  • Container image: quay.io/cki/rpm-cache
  • S3 key prefix: cached objects are stored under cache/<path> in the configured bucket
  • Deployment config: lives in the deployment-all repo as an Ansible playbook using the cki_aws_lambda role (same pattern as receiver)
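
To illustrate the cache/<path> key layout, the sketch below derives an object key from a request path and asks an injected S3 client for a presigned URL. The helper names are hypothetical, stripping a leading slash is an assumption, and s3_client is expected to behave like a boto3 S3 client.

```python
def cache_key(path):
    """Return the S3 object key for a request path."""
    # Stripping a leading slash is an assumption made for this sketch.
    return "cache/" + path.lstrip("/")


def presigned_url(s3_client, bucket, path, expires_in=3600):
    """Ask the S3 client for a time-limited GET URL for a cached RPM."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": cache_key(path)},
        # Lifetime in seconds, matching PRESIGNED_URL_EXPIRATION.
        ExpiresIn=expires_in,
    )
```

For example, the request path pub/fedora/foo.rpm maps to the key cache/pub/fedora/foo.rpm in the configured bucket.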

Testing

  • Unit tests: python -m pytest tests/test_rpm_cache.py
  • Integration tests: inttests/images/rpm-cache/ (runs via CI image inttest job, or locally with tox -e image -- inttests/images/rpm-cache)

Design rationale

This module intentionally avoids depending on cki-lib to keep the Lambda image small (12 pip packages vs 41+). See the module docstring in cki_tools/rpm_cache.py for details.