Devopie
Senior Site Reliability Engineer (Regina)
Description
What You’ll Do
Reliability Engineering
Define and manage SLIs, SLOs, and error budgets
Reduce MTTD, MTTA, and MTTR through structured incident response
Conduct blameless postmortems and drive preventative improvements
Champion reliability in architectural reviews and production readiness
Observability & Monitoring
Design actionable, symptom-based alerts (not noise)
Build dashboards and tracing systems using tools like CloudWatch, Prometheus, Grafana, New Relic, X-Ray, and ADOT
Implement synthetic monitoring to simulate real user journeys (URLs, clickpaths, APIs)
Ensure full observability coverage across critical paths
Cloud & Infrastructure
Operate and optimize AWS environments (EC2, EKS/ECS, Lambda, VPC, RDS, IAM, S3, ALB/NLB, CloudTrail)
Build resilient, multi-AZ and regionally replicated systems
Implement autoscaling and fault-tolerant architecture
Leverage Infrastructure as Code (Terraform, CDK, CloudFormation)