Site Reliability Engineer III
Aumni
Job Information
- Job Identification: 210663092
- Job Category: Software Engineering
- Business Unit: Corporate Sector
- Posting Date: 09/03/2025, 01:20 PM
- Locations: MAGMA,UNIT-1,PHASE-IV,SY NO.83/1,PLOT NO 2, GR Floor TO 2 Floor and 5 Floor TO 16 Floor,Basement 1,2, Hyderabad, IN-TG, 500081, IN
- Apply Before: 10/03/2025, 06:00 AM
- Job Schedule: Full time
Job Description
As a Site Reliability Engineer III at JPMorgan Chase within the Chief Technology Office, you will collaborate with engineering, support, and operations teams to maintain and improve the reliability of mission-critical applications. You’ll participate in incident management, troubleshooting, and continuous improvement, and help implement automation and monitoring solutions. On-call rotation is part of the role, requiring effective action during production incidents and a commitment to operational excellence. You’ll share knowledge, follow best practices, and contribute to a culture of learning and innovation. We value team players who communicate clearly, solve problems proactively, and focus on customer needs.
Job responsibilities
- Design, develop, and operate solutions for application reliability, monitoring, and automation.
- Execute incident response, troubleshooting, and root cause analysis to resolve production issues and improve system stability.
- Build and maintain CI/CD pipelines using Jenkins (including global libraries), and implement infrastructure as code with Terraform.
- Develop and support containerized applications using Docker and Kubernetes, ensuring robust deployments and scalability.
- Implement and maintain observability solutions using tools such as Grafana, Prometheus, Splunk, and OpenTelemetry.
- Collaborate with engineering and support teams to drive continuous improvement and operational excellence.
- Participate in on-call rotation, responding to production incidents and ensuring timely resolution.
Required qualifications, capabilities, and skills
- Formal training or certification in Site Reliability Engineering concepts and 3+ years of applied experience.
- Experience in SRE, DevOps, or application support roles, with knowledge of SLIs/SLOs, incident response, and troubleshooting.
- Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Splunk, OpenTelemetry).
- Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).
- Exposure to cloud platforms (AWS, GCP, or Azure) and automating infrastructure and deployments.
- Willingness to participate in on-call rotation and respond to production incidents.
- Ability to break down issues, document solutions, and communicate effectively with team members and customers.
Preferred qualifications, capabilities, and skills
- Familiarity with banking, fintech, or regulated environments.
- Participation in game days or chaos engineering.
- Interest in sharing knowledge and best practices with peers.