Lead Infrastructure Engineer - Network Operation
Aumni
Job Description
Assume a vital position as a key member of a high-performing team that delivers infrastructure and performance excellence. Your role will be instrumental in shaping the future at one of the world's largest and most influential companies.
Job responsibilities
- Guides and assists others in building designs at the appropriate level of detail and in gaining consensus from peers where appropriate
- Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
- Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
- Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
- Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
- Improves network products against reliability-related nonfunctional requirements such as logging, monitoring, observability, performance, scalability, capacity, and resiliency
- Researches and evaluates industry tools and leads build-versus-buy decisions
- Collaborates with other network and software engineering teams to automate processes, reduce toil, and modernize operations
- Participates in the on-call rotation as an escalation contact for production issues
- Turns theory into practice and navigates ambiguity to build a plan
- Accomplishes common goals using Scrum practices
Required qualifications, capabilities, and skills
- Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related discipline
- At least 5 years of site reliability engineering or related experience
- At least 3 years of network engineering or related experience
- Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
- Ability to proactively recognize roadblocks, and demonstrated interest in learning technology that facilitates innovation
- Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Familiarity with troubleshooting common networking technologies and issues
- Experience with one or more application performance management technologies (AppDynamics, Dynatrace, Riverbed SteelCentral, Prometheus)
- Ability to initiate and implement ideas to solve business problems
- Experience triaging and diagnosing issues in complex distributed architectures leveraging infrastructure and application telemetry
- Experience with one or more infrastructure automation technologies (Ansible, Terraform, Puppet, or building APIs and services using REST, SOAP, or similar)