Air-Gapped Deployment (Phase 11)
Dual-server architecture for managing hosts on isolated networks that have no internet access. One server (Collector) lives on the public side, mirrors upstream package and CVE feeds, and burns the result to optical media. A second server (Repository) lives on the air-gapped side, ingests the media, serves it as a local mirror, and lets managed agents update normally without ever touching the internet.
Architecture
Two servers, both running the same SysManage binary. The role: key in sysmanage.yaml picks which half of the air-gap pair this server runs as. License is the same Enterprise SKU on both — one license includes both engines.
- Collector: public-side. Loads airgap_collector_engine. Captures package mirrors (apt-mirror, dnf reposync, pkg fetch), CVE / NVD snapshots, and CIS / STIG compliance feeds. Bundles the result into one or more ISO images, signs the manifest with ed25519, and optionally burns it to physical media.
- Repository: private-side. Loads airgap_repository_engine. Ingests optical media, verifies the signature against the configured public key, verifies per-file SHA-256 hashes, copies the payload into a local mirror tree, and generates native repo metadata so each managed agent's package manager can consume it. Also rewrites each agent's APT / DNF / pkg config to point at the local mirror.
- Standard: the default. No air-gap; manages hosts directly online. This is what every pre-Phase-11 deployment was, and what new deployments default to when role: is omitted from sysmanage.yaml.
The two halves never share a database. Collector and Repository communicate exclusively through the disc — there is no network connection between them by design. The Repository records the manifest's signer_fingerprint + collector_iso_label so audit trails can correlate runs across the gap by inspecting both halves' databases after the fact.
The role: config setting
Add one key to /etc/sysmanage.yaml on each server:
# Collector — public side
role: collector
# Repository — private side
role: repository
# Standard (default)
role: standard
# (or omit the key entirely)
The startup config loader validates the value at boot; an unknown role aborts startup with a clear error. The role is also exposed via the unauthenticated GET /api/v1/server-info endpoint, which the web UI uses to render a chip in the header bar so operators can see at a glance which half of an air-gap pair they're looking at.
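The boot-time check described above can be pictured with a minimal Python sketch. It assumes the YAML file has already been parsed into a dict; the names resolve_role, ConfigError, and VALID_ROLES are hypothetical and are not taken from the SysManage code base.

```python
# Hypothetical sketch of role validation at startup; not the actual loader.
VALID_ROLES = ("standard", "collector", "repository")


class ConfigError(ValueError):
    """Raised when the parsed sysmanage.yaml carries an unusable value."""


def resolve_role(config: dict) -> str:
    """Return the configured role, defaulting to 'standard' when the key is omitted."""
    raw = config.get("role")
    if raw is None:
        return "standard"
    if not isinstance(raw, str):
        raise ConfigError(f"role must be a string, got {type(raw).__name__}")
    if raw not in VALID_ROLES:
        raise ConfigError(
            f"unknown role {raw!r}; expected one of: {', '.join(VALID_ROLES)}"
        )
    return raw
```

Raising at load time, rather than falling back silently to standard, mirrors the documented behavior that an unknown role aborts startup with a clear error.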
If role: is set to collector or repository but the corresponding Pro+ engine isn't loaded (typically a license problem), the chip turns red and a tooltip explains what's missing. The server keeps running in degraded mode so you can fix the license without losing access to the rest of the UI.
Collection cycle (Collector side)
A collection cycle is triggered by an operator (UI, API, or cron) on the public-side server. The engine emits a deployment plan that runs locally on the collector itself; it does not dispatch to managed agents.
- Request: POST /api/v1/airgap/collector/collection/runs with a body listing each distro/version/repo to capture, plus flags for CVE and compliance snapshot inclusion and target media size (one DVD, BD-R, etc.).
- Validation: distros are checked against the supported list (13 families in v0.1.0); version strings are constrained to a tight regex so they cannot smuggle shell metacharacters into the dispatch templates; iso_label is constrained the same way.
- Mirror: per-distro dispatch templates run apt-mirror (Debian / Ubuntu), dnf reposync (RHEL family), zypper download (openSUSE / SLES), pkg fetch (FreeBSD), apk fetch (Alpine), or distribution-specific equivalents. Each command has a 4-hour timeout ceiling; the agent's in-flight journal (Phase 11.6) covers the multi-hour run across WebSocket reconnects.
- CVE / compliance snapshot: if requested, NVD and vendor-advisory feeds are captured at the same instant so the resulting media set has a coherent point-in-time view of public vulnerability data.
- ISO build: xorriso -as mkisofs bundles the staging tree into one ISO with the manifest embedded at /manifest.json. The post-build step computes the ISO's SHA-256 for the transfer log.
- Sign: the manifest is canonicalized (sorted keys, no whitespace) and signed with the operator's ed25519 private key. The envelope carries the signature, the SHA-256 of the public key as a fingerprint, the algorithm identifier, and the manifest format version.
- Burn: optional. POST /api/v1/airgap/collector/iso/burn wraps growisofs against the configured optical-drive device. The burn-step timeout ceiling is 4 hours so slow drives don't trip a watchdog mid-write.
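The Sign step can be illustrated with a short Python sketch of the canonicalization and envelope assembly. This is an illustration under stated assumptions, not the engine's code: the envelope field names algorithm and signature are hypothetical (signer_fingerprint and format_version appear in this document), and the signing primitive is passed in as a callable so the sketch stays standard-library only.

```python
import hashlib
import json
from typing import Callable


def canonicalize(manifest: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def key_fingerprint(public_key_bytes: bytes) -> str:
    """SHA-256 of the raw public key bytes, hex-encoded."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def build_envelope(
    manifest: dict,
    public_key_bytes: bytes,
    sign: Callable[[bytes], bytes],
) -> dict:
    """Assemble a signature envelope; key names here are assumptions."""
    payload = canonicalize(manifest)
    return {
        "algorithm": "ed25519",
        "format_version": manifest["format_version"],
        "signer_fingerprint": key_fingerprint(public_key_bytes),
        "signature": sign(payload).hex(),
    }
```

With the Python cryptography package installed, the sign argument could be the bound sign method of an Ed25519PrivateKey. Canonicalizing before signing matters because the repository must reproduce the exact same bytes to verify the signature, regardless of the key order in which the manifest was originally written.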
Note on multi-disc spanning. v0.1.0 ships the single-disc happy path — if the staging tree exceeds the configured media size, the burn step fails at runtime. Multi-disc bin-packing is tracked as a Phase 11.1 follow-up. The schema already supports it (the manifest table carries disc_index + disc_count) so it lands additively.
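Because an oversized staging tree only fails at burn time in v0.1.0, an operator might want a pre-flight size check before starting a build. The following Python sketch is one way to do that outside the product; the capacity table uses nominal marketing sizes, and the 2% headroom for filesystem overhead is an assumption, not a figure from SysManage.

```python
import os

# Nominal capacities in bytes (assumption); real usable space is somewhat lower.
MEDIA_CAPACITY = {"dvd": 4_700_000_000, "bd-r": 25_000_000_000}


def tree_size(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_on_media(root: str, media: str, headroom: float = 0.02) -> bool:
    """Return True when the staging tree fits, leaving headroom for ISO overhead."""
    usable = int(MEDIA_CAPACITY[media] * (1 - headroom))
    return tree_size(root) <= usable
```

Running a check like this before the ISO build avoids spending hours on a mirror run only to discover at burn time that the result does not fit on one disc.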
Ingestion cycle (Repository side)
The operator inserts the optical media into the air-gapped server, then triggers ingestion via the UI or API. The repository engine refuses to serve any byte until the signature and file hashes have verified.
- Mount: the engine emits a deployment plan that mounts the ISO read-only at /mnt/sysmanage-airgap-ingest.
- Verify signature: the manifest envelope is decoded; the engine refuses to ingest a manifest whose format_version is newer than its own software version (forward-compat fail-safe). The signature is verified against the public key configured in the repository's sysmanage.yaml. Strict mode (the default) rejects the HMAC-SHA-256 fallback envelope that the collector produces only when cryptography isn't installed.
- Verify file hashes: every file listed in the manifest is streamed through SHA-256 and compared against the manifest's expected hash. The first mismatch aborts the entire ingestion; the partial copy is rolled back.
- Copy: once verification passes, rsync -a --delete-after copies the payload from the mount point to the local repository root at /var/lib/sysmanage/airgap-repo.
- Generate native repo metadata: per distro family, apt-ftparchive packages | gzip > Packages.gz for APT, createrepo_c --update for DNF / zypper, pkg repo for FreeBSD, apk index for Alpine.
- Unmount: the ISO is unmounted; the operator can eject the disc.
- Repoint agents: the engine emits a per-host plan that rewrites each managed agent's package-manager configuration: /etc/apt/sources.list.d/sysmanage-airgap.list for Debian / Ubuntu, /etc/yum.repos.d/sysmanage-airgap.repo for RHEL family / openSUSE, /usr/local/etc/pkg/repos/sysmanage-airgap.conf for FreeBSD. Existing agents pick up the change on their next package operation.
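The file-hash verification step in the list above can be sketched in Python as follows. This is a minimal illustration, assuming the manifest exposes a mapping of relative path to expected hex digest; the names IngestError and verify_files are hypothetical, and the path-containment guard is an extra precaution not described in this document.

```python
import hashlib
from pathlib import Path
from typing import Mapping


class IngestError(Exception):
    """Raised on the first verification failure; the caller aborts ingestion."""


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(mount_root: str, files: Mapping[str, str]) -> int:
    """Check every listed file; raise on the first mismatch. Returns files checked."""
    root = Path(mount_root).resolve()
    for rel_path, expected in files.items():
        target = (root / rel_path).resolve()
        # Extra guard (assumption): reject entries that escape the mount point.
        if root not in target.parents:
            raise IngestError(f"path escapes mount root: {rel_path}")
        if not target.is_file():
            raise IngestError(f"listed file is missing: {rel_path}")
        if sha256_file(target) != expected.lower():
            raise IngestError(f"SHA-256 mismatch: {rel_path}")
    return len(files)
```

Streaming in fixed-size chunks keeps memory use flat even for multi-gigabyte package files, and raising on the first mismatch matches the documented abort-on-first-failure behavior.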
Per-distro install channels
For new child hosts created on the air-gapped network, sysmanage-agent itself needs to install from a substitutable channel — direct GitHub-release downloads can't be substituted with a private mirror. Phase 11.8 switched the per-distro agent_install_commands to consume the published upstream channels instead:
- Ubuntu / Debian: Launchpad PPA (ppa:bceverly/sysmanage-agent). The Repository's build_agent_repoint_plan rewrites /etc/apt/sources.list.d/ to point at the local mirror before the install runs.
- Fedora / RHEL / Rocky / Alma / Oracle Linux / CentOS Stream: Fedora Copr (bceverly/sysmanage-agent). Substitutable via /etc/yum.repos.d/sysmanage-airgap.repo.
- openSUSE / SLES: Open Build Service (home:bceverly/sysmanage-agent). Substitutable via the same yum/zypper repo file pattern.
- FreeBSD / OpenBSD / NetBSD: currently direct downloads. The repository engine can still rewrite pkg.conf to point at the local mirror, so the agent comes up via a different path than its first-boot script expects but still without internet access. A formal upstream ports / pkgsrc submission is tracked as a follow-up.
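To make the substitution concrete, here is a hedged Python sketch that renders the two most common repoint files. The function names, the mirror URLs, and the suite and component values are all hypothetical; the actual content written by build_agent_repoint_plan is not shown in this document, and signing-related options (for example signed-by or gpgcheck) are deliberately left out.

```python
from typing import Sequence


def render_apt_source(
    base_url: str, suite: str, components: Sequence[str] = ("main",)
) -> str:
    """One-line-style APT source entry for sysmanage-airgap.list."""
    return f"deb {base_url} {suite} {' '.join(components)}\n"


def render_dnf_repo(base_url: str, repo_id: str = "sysmanage-airgap") -> str:
    """INI-style repo stanza for sysmanage-airgap.repo."""
    lines = [
        f"[{repo_id}]",
        "name=SysManage air-gap mirror",
        f"baseurl={base_url}",
        "enabled=1",
    ]
    return "\n".join(lines) + "\n"
```

For example, render_apt_source("http://mirror.example/ubuntu", "noble") yields a single deb line that points at a local mirror instead of the Launchpad PPA.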
Compliance context across the gap
Phase 11.3 wires the repository's freshness state into compliance reports so operators can tell apart two failure modes that look identical without context (not_applied and not_transferred), and distinguish both from the healthy current state:
- not_applied — a newer version is on the local mirror but hasn't been installed on this host yet. Fix locally; cheap.
- not_transferred — the public CVE feed (captured by the collector at last media transfer) lists a fix that isn't on the local mirror at all. Fix requires a new collection / burn / ingest cycle; expensive.
- current — the host's installed version matches the mirror's latest, and no public-side CVE has surfaced since last transfer.
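The three states above can be expressed as a small decision function. The Python sketch below is illustrative only: it assumes versions have already been parsed into comparable integer tuples (real package version comparison is distro-specific), and it assumes not_transferred takes precedence when both conditions hold, which this document does not specify.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]  # hypothetical pre-parsed version, e.g. (1, 2, 3)


def patch_state(
    installed: Version,
    mirror_latest: Version,
    public_fixed: Optional[Version],
) -> str:
    """Classify a package on one host against the mirror and the captured CVE feed."""
    # The captured public feed lists a fix that the local mirror does not carry.
    if public_fixed is not None and public_fixed > mirror_latest:
        return "not_transferred"
    # The mirror has something newer than what this host runs.
    if mirror_latest > installed:
        return "not_applied"
    return "current"
```

Checking not_transferred first means the report surfaces the more expensive problem, the one that needs a new collection, burn, and ingest cycle, ahead of one that a local update would fix.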
Each repository tracks the number of days since its last ingest. The compliance UI surfaces a freshness label: current (≤ 7 days), stale (8–30 days), very_stale (> 30 days), or never. Use this to size the air-gap transfer cadence against your organization's tolerance for delayed patch availability.
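The freshness thresholds translate directly into code. The Python sketch below assumes elapsed time is counted in whole days and that a repository with no recorded ingest maps to never; the function name freshness_label is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def freshness_label(
    last_ingest: Optional[datetime], now: Optional[datetime] = None
) -> str:
    """Map the time since the last ingest to the documented freshness labels."""
    if last_ingest is None:
        return "never"
    now = now or datetime.now(timezone.utc)
    days = (now - last_ingest).days  # whole days elapsed (assumption)
    if days <= 7:
        return "current"
    if days <= 30:
        return "stale"
    return "very_stale"
```

A team that wants its mirrors to stay labeled current would therefore need a transfer cadence of at most one disc per week.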
Troubleshooting
Ingestion fails with "signature does not match manifest payload"
The repository's configured public key doesn't match the private key that signed this disc. Either the disc was tampered with in transit, or the operator regenerated the keypair on the collector without distributing the new public key to the repository. Re-export the public key from the collector and replace the value in the repository's sysmanage.yaml.
Ingestion fails with "manifest format_version exceeds repository's max"
The collector has been upgraded to a new SysManage version that changed the manifest format, but the repository hasn't been upgraded yet. Upgrade the repository server to the same version, then re-ingest. The format-version check is a deliberate forward-compat fail-safe — older repositories cannot be tricked into ingesting media with new fields they don't understand.
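The fail-safe behind this error can be sketched in a few lines of Python. The sketch assumes format_version is an integer carried in the envelope and that the repository knows the highest version it understands; the constant MAX_SUPPORTED_FORMAT_VERSION and its value are hypothetical.

```python
MAX_SUPPORTED_FORMAT_VERSION = 1  # hypothetical value for illustration


def check_format_version(
    envelope: dict, max_supported: int = MAX_SUPPORTED_FORMAT_VERSION
) -> None:
    """Refuse manifests newer than this repository understands."""
    version = envelope.get("format_version")
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError("manifest format_version is missing or not an integer")
    if version > max_supported:
        raise ValueError("manifest format_version exceeds repository's max")
```

Rejecting rather than best-effort parsing is the safer default here: an older repository cannot know whether a field it does not recognize changes how the rest of the manifest should be interpreted.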
Role chip in the header bar is red
The role: in sysmanage.yaml is set to collector or repository, but the corresponding Pro+ engine isn't loaded. Most common cause: the Enterprise license isn't valid (expired, missing, or doesn't include the air-gap engines). Check the License page in Settings; if the license is fine, check /var/log/sysmanage/server.log for module-loader errors.
Multi-hour collection cycle gets killed mid-run
This should not happen as of Phase 11.6: the agent now writes a per-plan in-flight journal at ~/.sysmanage-agent/inflight/<message_id>.json before each subprocess starts, with a 30-second heartbeat. If the agent process restarts (or the WebSocket bounces), the journal carries enough state for the agent to either re-attach to the still-live subprocess or emit a synthetic command_result so the server's mirror row clears. If you see a hung mirror anyway, file an issue with the contents of the in-flight journal.
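When inspecting a suspected hang, it can help to picture what a journal with a heartbeat looks like. The Python sketch below is a rough model, not the agent's implementation: the record fields (message_id, pid, heartbeat_at) and the rule that three missed beats means stale are assumptions; only the per-message file name and the 30-second heartbeat come from this document.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional

HEARTBEAT_SECONDS = 30


def write_journal(
    journal_dir: str, message_id: str, pid: int, now: Optional[float] = None
) -> Path:
    """Write or refresh the journal atomically so readers never see a partial file."""
    directory = Path(journal_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{message_id}.json"
    record = {
        "message_id": message_id,
        "pid": pid,
        "heartbeat_at": time.time() if now is None else now,
    }
    tmp = directory / (path.name + ".tmp")
    tmp.write_text(json.dumps(record))
    os.replace(tmp, path)  # atomic rename on the same filesystem
    return path


def is_stale(path: Path, now: Optional[float] = None, missed_beats: int = 3) -> bool:
    """Treat the journal as stale after several missed heartbeats (assumption)."""
    record = json.loads(Path(path).read_text())
    current = time.time() if now is None else now
    return current - record["heartbeat_at"] > HEARTBEAT_SECONDS * missed_beats
```

A journal whose heartbeat keeps advancing indicates a live subprocess, while one that has stopped advancing is the kind of evidence worth attaching to an issue report.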