Operations & Managed Services

Day-2 platform operations, SRE practice bring-up, observability, incident response — and optional managed services where you need them.

Most platforms fail not at install but in the boring months that follow — when upgrades slip, alerts go stale, capacity quietly fills, and the runbook nobody updated stops matching the system. CompTech Lab brings up the operations practice that survives those months.

What we deliver

  • Day-2 operations bring-up. Operating model, on-call structure, escalation paths, alert hygiene, runbook patterns, incident postmortem culture. Tailored to your team’s size and shape — not copy-pasted from another customer.
  • Observability. Three-pillar stacks (metrics, logs, traces) wired through the platform and applications. Datadog if you have it; OpenTelemetry + Prometheus + Loki + Grafana + Tempo when you don’t. Kiali for service mesh visibility where Istio is in play.
  • SRE practice. SLO definitions that match the business reality (not 99.999% on everything), error budgets that mean something, blameless postmortems that produce action items that close.
  • Capacity and cost optimisation. Right-sizing, request/limit tuning, autoscaling baselines, cluster autoscaler posture, cloud-cost guardrails, FinOps reporting aligned with how your finance team actually budgets and reports.
  • Upgrade orchestration. OpenShift, operator catalog, cluster add-ons, application workloads — sequenced, rehearsed in staging, and executed in production with a documented rollback plan.
  • Backup, DR, and replication drills. OADP (which is built on Velero) for namespace-level backup on OpenShift, plain Velero for other Kubernetes clusters, snapshot replication, full DR drills with documented RPO/RTO numbers.
  • Optional managed services. When your team is not yet ready to take on operations long-term, we can run a defined scope (specific clusters, specific workloads, specific hours) as a bridge engagement. Not as a permanent dependency.
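
To make the SRE practice item above concrete, here is a minimal sketch of the arithmetic behind an availability error budget. The function names and the 30-day window are illustrative assumptions, not part of any CompTech Lab tooling; a real SLO would usually be computed from request-level data in your observability stack.

```python
# Illustrative sketch only: names and the 30-day default window are assumptions.

def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # Compare how much downtime each target tolerates over 30 days.
    for target in (0.999, 0.9999, 0.99999):
        minutes = error_budget_minutes(target)
        print(f"target={target}: {minutes:.2f} minutes of budget")
```

Over a 30-day window, a 99.9% target leaves about 43 minutes of budget, while 99.999% leaves roughly 26 seconds. That gap is why a blanket five-nines target rarely matches business reality.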

How we work

Operations engagements are explicit about what the platform team owns at the end. The default is a clean handover with runbooks and tooling, not a long-running services contract. Managed services are bounded and time-boxed.

Engagement shape

Operations practice bring-up: 8–12 weeks, alongside or following a platform engagement. Observability stand-up: 4–8 weeks. Bridge managed services: 3–12 months, with the end date fixed in the contract.