Lead Site Reliability Engineer - Infrastructure Platforms
Aumni
This job is no longer accepting applications
Job Description
Take on a critical role in shaping the future of a globally recognized firm, with direct and significant impact in a space built for top performers in site reliability.
Job responsibilities
- Participate in the engineering design and review process for the JPMC Wide Area Network (WAN), define its nonfunctional requirements, and work with the relevant teams to implement the solution with clear documentation
- Improve network products with respect to reliability-related nonfunctional requirements such as logging, monitoring, observability, performance, scalability, capacity, and resiliency
- Participate in the engineering design/review of nonfunctional requirements for Quantum Key Distribution, SD-WAN and SASE architectures
- Participate in the engineering design to support network segmentation practices
- Collaborate with technical experts, key stakeholders, and team members to resolve complex problems, automate processes, reduce toil, and modernize operations
- Understand service level indicators and use service level objectives to proactively resolve issues before they impact customers
- Research and evaluate industry tools and lead build-versus-buy decisions
- Participate in on-call rotation as an escalation contact for production issues
Required qualifications, capabilities, and skills
- Formal training or certification on Site Reliability Engineering concepts and 5+ years of applied experience
- Professional-level engineering knowledge of IP routing protocols such as OSPF, BGP, and IS-IS
- Professional-level engineering knowledge of IPv4/IPv6, MPLS, and segment routing
- Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
- Ability to proactively recognize roadblocks, and a demonstrated interest in learning technology that facilitates innovation
- Experience in observability, including white-box and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Experience with one or more application performance management technologies (AppDynamics, Dynatrace, Riverbed SteelCentral, Prometheus)
- Ability to initiate and implement ideas to solve business problems
- Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
- Experience triaging and diagnosing issues in complex distributed architectures leveraging infrastructure and application telemetry
- CCIE with service provider experience is preferred