Site Reliability Engineer (SRE)
Riyadh, Riyadh Province, Saudi Arabia (Full-time)
- Experience: 5+ yrs
- Salary: —
- Vacancies: 1
- Posted: 4 قطع
- Work mode: On-site
- Resume: Required to apply
Role overview
This position focuses on keeping production environments highly available, fast, and dependable through cloud operations, automation, monitoring, and disciplined incident handling.
What you'll do
- Design, build, and maintain scalable AWS-based infrastructure using Terraform or CloudFormation.
- Set up and operate observability and monitoring platforms such as Prometheus, Grafana, Splunk, or Datadog.
- Respond to incidents, perform root cause analysis, participate in on-call rotations, and work with SLIs, SLOs, and error budgets.
- Automate recurring operational work to increase reliability, efficiency, and recovery speed.
- Support Kubernetes, Docker, CI/CD pipelines, runbooks, and ITIL-based operational processes.
Skills and experience needed
- Hands-on background in SRE, DevOps, production support, or cloud operations with AWS exposure.
- Working knowledge of Kubernetes, Docker, Linux, and core networking concepts.
- Ability to script in Python, Bash, or Go.
- Experience with monitoring platforms and incident resolution / RCA workflows.
- Familiarity with infrastructure as code, CI/CD tools, and enterprise support systems is preferred.
Experience
A minimum of 5 years of experience in SRE, DevOps, cloud, or production support roles is required.