DigitalOcean – Droplet Compute / AI Infrastructure
- Own the compute control plane for the Droplet platform: ~80 HTTP and gRPC endpoints across 11 services that orchestrate the VM lifecycle for a multi-tenant fleet of 2M+ active instances in ~20 global regions, including the product-layer APIs behind resize, snapshot, rebuild, and teardown. Serves 500K+ users at 650 RPS (56M requests/day).
- Built capacity controls that keep the fleet ahead of demand: proactive quota and capacity-limit signals for enterprise customers (including GPU / AI-infrastructure capacity) that surface risk before it affects workloads.
- Led the migration of Droplet infrastructure from direct PostgreSQL/CockroachDB access to a centralized gRPC control plane on Kubernetes, decoupling the control-plane services and improving fleet-wide agility, isolation, and fault tolerance.
- Lead P0/P1 incident response across the compute estate: root-cause analysis, postmortems, and metrics-based alerting (Prometheus, Grafana, OpenSearch) that measurably reduced MTTR.