Lead Site Reliability Engineer
Aumni
Job Description
Out of the successful launch of Chase in 2021, we’re a new team, with a new mission. We’re creating products that solve real-world problems and put customers at the center, all in an environment that nurtures skills and helps you realize your potential. Our team is key to our success. We’re people-first. We value collaboration, curiosity and commitment.
As a Lead Site Reliability Engineer at JPMorgan Chase within the Accelerators Engineering team, you are the heart of this venture, focused on getting smart ideas into the hands of our customers. You have a curious mindset, thrive in collaborative squads, and are passionate about new technology. By nature, you are also solution-oriented, commercially savvy and have a head for fintech. You thrive working in tribes and squads that focus on specific products and projects – and depending on your strengths and interests, you'll have the opportunity to move between them.
While we’re looking for professional skills, culture is just as important to us. We understand that everyone's unique – and that diversity of thought, experience and background is what makes a good team great. By bringing people with different points of view together, we can represent everyone and truly reflect the communities we serve. This way, there's scope for you to make a huge difference – to us as a company, and to our clients and business partners around the world.
Job responsibilities
- Creates high-quality designs, roadmaps, and program charters that are delivered by you or the engineers under your guidance
- Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues
- Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team
- Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
- Uses infrastructure as code: applies Terraform and GitLab CI/CD for automation, containerizes our environments (Kubernetes, Helm charts), and leverages cloud technologies to meet our goals
- Manages, configures, and troubleshoots operating systems, storage (block and object), and networking (VPCs, proxies, and CDNs), and administers high-availability CockroachDB, PostgreSQL, and Redis clusters
- Implements monitoring and instrumentation: metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations
- Evolves and debugs critical components of applications and platforms
- Provides comprehensive and ongoing guidance, tools, and solutions to support the firm’s growth
- Makes significant contributions to JPMorgan Chase’s site reliability community via internal forums, communities of practice, guilds, and conferences
Required qualifications, capabilities, and skills
- Formal training or certification in site reliability concepts, culture, and principles, and advanced experience implementing site reliability within an application or platform
- Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
- Proven public or private cloud experience (GCP is our priority)
- Fluency in at least one programming language (e.g., Python, Java, Go)
- Extensive Kubernetes operational experience (ideally including Istio, ArgoCD)
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitHub, Terraform)
- Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
- Experience with troubleshooting common networking technologies and issues
- Advanced knowledge and experience in observability, including white-box and black-box monitoring, service level objectives, alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines
- Ability to communicate data-based solutions with complex reporting and visualization methods
Preferred qualifications, capabilities, and skills
- Recognized as an active contributor to the engineering community