- Experience: Any
- Salary: Not specified
- Vacancies: 1
- Posted: 3 hours ago
- Work mode: In office
- Resume: Required to apply
Job Description
Role Overview
Infosys is hiring a Site Reliability Engineer for its Mangaluru, Karnataka, India location. This is a full-time, on-site role focused on keeping systems reliable, scalable, and efficient across cloud and operational environments.
Core Responsibilities
- Build and maintain CI/CD/CT pipelines using tools such as Jenkins, Bamboo, Azure DevOps, or AWS CodePipeline.
- Work with infrastructure-as-code solutions like Terraform, CloudFormation, and Azure Resource Manager (ARM) templates to automate cloud infrastructure management.
- Create, operate, and refine monitoring and log-analysis systems using platforms such as AppDynamics, Datadog, Splunk, Kibana, Prometheus, Grafana, and Elasticsearch.
- Use AIOps platforms such as Dynatrace, Splunk, and ServiceNow to support incident handling, observability, and alert-noise reduction.
- Oversee infrastructure capacity and performance to support growth across public and private cloud environments.
- Set and enforce standards for system architecture, deployment practices, metrics, and operational activities.
- Track service availability and system health, and participate in incident response activities.
- Improve delivery speed, performance, and efficiency through automation, process improvements, postmortem analysis, and configuration reviews.
- Coordinate and communicate effectively across teams and functions within the organization.
- Troubleshoot and monitor production systems to maintain strong uptime and stability.
- Strengthen and evolve high-availability architecture and day-to-day operational processes.
- Integrate GenAI and AIOps capabilities to automate incident detection, root cause analysis, and resolution workflows, including self-healing scripts and intelligent runbooks.
- Use prompt engineering to improve the quality and usefulness of responses from AI-driven observability and automation tools.
- Apply cloud AI services such as AWS Bedrock, Azure OpenAI, and GCP Vertex AI to design smarter SRE solutions for cloud platforms.
- Maintain working knowledge of agentic AI approaches applicable to operations and support environments.
Required Background
- Hands-on experience with at least one high-level programming language such as Python, Ruby, or Go.
- Good understanding of object-oriented programming concepts.
- Practical experience with CI/CD/CT pipeline design and implementation.
- Strong familiarity with infrastructure automation and cloud operations tools.
- Experience using monitoring, observability, and log-management solutions for production support.
- Exposure to AIOps, GenAI, and AI/ML platforms used in operations workflows.
- Ability to manage production systems, support availability targets, and respond to incidents effectively.
Additional Information
This position emphasizes reliability engineering, cloud-scale operations, automation, and the adoption of AI-driven support practices. The role also calls for collaboration across the organization and a focus on continuous improvement in system uptime, performance, and operational efficiency.