Background
A regional telecom operator was building out multi-site edge infrastructure for latency-sensitive network functions and adjacent application workloads. The platform team needed an OpenShift posture that worked on bare metal at the edge, was managed from a central hub, and could be upgraded across a fleet of small clusters without on-site staff at each one.
Challenge
- Bare-metal OpenShift with SR-IOV and Multus for the network-function workloads.
- Multi-site lifecycle: every site gets its software state from a central control plane, with drift detection and self-healing.
- Low-touch operations: sites are remote and minimally staffed, so the operating model has to assume nobody is in the room.
- Unified observability: operations needed a single pane for fleet health, with per-site detail one click away.
Approach
A centrally hosted hub cluster runs Red Hat Advanced Cluster Management (ACM) and the fleet-level GitOps control plane. Each edge site is a small OpenShift cluster registered to ACM that pulls its desired state from an internal GitLab instance via OpenShift GitOps in pull mode. Red Hat Advanced Cluster Security (RHACS) posture is uniform across sites and centrally tuned.
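The drift detection and self-healing idea behind the pull-mode GitOps model can be illustrated with a small sketch. This is not how OpenShift GitOps is implemented internally; it is a simplified model that assumes desired and live state are plain nested dictionaries. The function names, the image reference, and the sample manifests are hypothetical.

```python
from typing import Any


def find_drift(desired: dict[str, Any], live: dict[str, Any], path: str = "") -> list[str]:
    """Return dotted paths where live state differs from desired state.

    Only keys present in `desired` are checked, so fields the cluster adds
    on its own (status, defaulted values) are not reported as drift.
    """
    drifted: list[str] = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drifted.extend(find_drift(want, have, here))
        elif want != have:
            drifted.append(here)
    return drifted


def reconcile(desired: dict[str, Any], live: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `live` with every desired value re-applied (self-heal)."""
    healed = dict(live)
    for key, want in desired.items():
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            healed[key] = reconcile(want, have)
        else:
            healed[key] = want
    return healed


# Hypothetical example: someone scaled a workload down by hand at one site.
desired = {"spec": {"replicas": 3, "image": "registry.example/nf:1.4"}}
live = {
    "spec": {"replicas": 1, "image": "registry.example/nf:1.4"},
    "status": {"ready": 1},
}

drift = find_drift(desired, live)
healed = reconcile(desired, live)
```

In this sketch the manual change to `spec.replicas` is the only reported drift, and reconciling restores the value from Git while leaving cluster-owned fields such as `status` untouched.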
Network-function workloads use SR-IOV for line-rate networking, with Multus providing the secondary-interface attachment patterns the workloads expect. Cluster topology is designed around hardware refresh cycles, with control-plane separation, hardware-independent machine configs, and a documented re-image flow.
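As a rough illustration of the secondary-interface pattern, the sketch below adds the Multus network-selection annotation (`k8s.v1.cni.cncf.io/networks`) to a pod manifest held as a Python dictionary. The pod name, namespace, image, and network attachment names are hypothetical, and the sketch assumes the referenced network attachments already exist on the cluster.

```python
import copy
import json

# Annotation key that Multus reads to attach secondary interfaces to a pod.
MULTUS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def attach_networks(pod: dict, networks: list[dict]) -> dict:
    """Return a copy of `pod` with the Multus annotation set.

    Each entry in `networks` names an existing network attachment and,
    optionally, the namespace it lives in and the in-pod interface name.
    """
    patched = copy.deepcopy(pod)
    annotations = patched.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[MULTUS_ANNOTATION] = json.dumps(networks)
    return patched


# Hypothetical pod and attachment names, for illustration only.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "nf-example", "namespace": "nf-demo"},
    "spec": {"containers": [{"name": "nf", "image": "registry.example/nf:1.4"}]},
}
networks = [
    {"name": "sriov-net-a", "namespace": "nf-demo", "interface": "net1"},
    {"name": "sriov-net-b", "namespace": "nf-demo", "interface": "net2"},
]

patched = attach_networks(pod, networks)
```

The pod keeps its default cluster network interface; the annotation only requests the additional interfaces, here named `net1` and `net2` inside the pod.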
Observability stitches the fleet together: Datadog serves as the upper-layer pane, with OpenTelemetry instrumenting both platform and application workloads. ACM observability feeds cluster-level metrics into the same surface. OADP (OpenShift API for Data Protection) runs scheduled backups to a central S3-compatible store.
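OADP builds on Velero, so a scheduled backup is typically expressed as a Velero `Schedule` resource. The sketch below builds such a manifest as a Python dictionary. The schedule name, cron expression, namespaces, and retention period are assumptions made for illustration, not the values used in this engagement, and `openshift-adp` is assumed as the operator namespace.

```python
def backup_schedule(
    name: str,
    cron: str,
    namespaces: list[str],
    ttl_hours: int = 168,
    operator_namespace: str = "openshift-adp",  # assumed OADP install namespace
) -> dict:
    """Build a Velero Schedule manifest for a recurring namespace backup."""
    if len(cron.split()) != 5:
        raise ValueError(f"expected a five-field cron expression, got {cron!r}")
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": operator_namespace},
        "spec": {
            "schedule": cron,
            "template": {
                "includedNamespaces": list(namespaces),
                # Retention expressed as a duration string, e.g. "168h0m0s".
                "ttl": f"{ttl_hours}h0m0s",
            },
        },
    }


# Hypothetical nightly backup of one application namespace at 02:00.
manifest = backup_schedule("nf-nightly", "0 2 * * *", ["nf-demo"])
```

Keeping manifests like this in Git alongside the rest of the site configuration would let backups follow the same drift-detection path as everything else.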
Outcome
- Edge platform live across multiple sites under a single ACM-managed fleet.
- GitOps drift detection and auto-remediation operational.
- SLO-driven operations practice in place, with a documented upgrade orchestration runbook.
- Per-site mean-time-to-recover materially below the prior bare-metal baseline.
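For readers who want to track the same recovery metric, per-site mean-time-to-recover can be computed as the average of (recovery time minus detection time) over a site's incidents. The sketch below shows that arithmetic. The incident records are made-up sample data, not figures from this engagement, and the record layout is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mttr_minutes_by_site(incidents: list[tuple[str, datetime, datetime]]) -> dict[str, float]:
    """Average recovery duration in minutes, grouped by site.

    Each incident is (site, detected_at, recovered_at).
    """
    durations: dict[str, list[float]] = defaultdict(list)
    for site, detected_at, recovered_at in incidents:
        minutes = (recovered_at - detected_at).total_seconds() / 60
        durations[site].append(minutes)
    return {site: mean(values) for site, values in durations.items()}


# Made-up sample incidents, for illustration only.
incidents = [
    ("site-a", datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),
    ("site-a", datetime(2024, 1, 9, 2, 0), datetime(2024, 1, 9, 3, 30)),
    ("site-b", datetime(2024, 1, 7, 14, 0), datetime(2024, 1, 7, 14, 45)),
]

mttr = mttr_minutes_by_site(incidents)
# site-a: (30 + 90) / 2 = 60.0 minutes; site-b: 45.0 minutes
```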
Engagement shape
Initial bring-up: approximately 22 weeks, sequenced across hub stand-up, first-site template, multi-site rollout, observability, and operations-practice handover. Ongoing quarterly upgrade orchestration is available as a bounded managed service.