Posted Mar 31, 2026

Senior Site Reliability Engineer

Role Overview

We're hiring a Senior Site Reliability Engineer to own and scale the infrastructure behind our courtroom transcription platform. This is not a routine ops role: you'll work on high-availability Kubernetes clusters, manage complex deployments with ArgoCD, and ensure reliability for a system processing sensitive, real-time data. You'll collaborate with a small team of elite builders and be the go-to expert for keeping our platform robust, secure, and fast.

Key Responsibilities

- Deploy, manage, and optimize Kubernetes clusters in production environments.
- Operate and maintain ArgoCD for GitOps-based deployments.
- Troubleshoot and resolve performance, reliability, and scaling issues across our clusters.
- Build and maintain observability (metrics, logging, alerting) to catch and resolve issues proactively.
- Collaborate with backend and product teams to ensure smooth, reliable releases.
- Define and enforce infrastructure best practices, focusing on security, scalability, and resilience.

Qualifications

- 10+ years of experience in production infrastructure, reliability, or DevOps roles.
- Proven experience deploying and managing Kubernetes clusters at scale.
- Experience maintaining CI/CD with GitHub Actions.
- Hands-on expertise with ArgoCD (setup, tuning, troubleshooting).
- Solid foundation in Linux systems, networking, and container internals.
- Experience with monitoring/alerting stacks (Prometheus, Grafana, Loki, etc.).
- Comfortable diving into complex problems and quickly stabilizing systems.

Bonus:

- Experience with GCP.
- Contributions to open-source infrastructure or reliability tooling.
