Sr Lead Site Reliability Engineer - Software Defined Infrastructure
Aumni
Job Description
We are seeking a highly skilled Sr. Lead Site Reliability Engineer to join our dynamic team. The ideal candidate will have deep technical expertise, a passion for problem-solving, a drive to innovate and to enable operational excellence across the product and technology lifecycle, and, ideally, an interest in advancing modern engineering practices and communities. You will play a critical role in developing, exploring, and innovating around software-defined infrastructure that is resilient, fault-tolerant, secure, and compliant, yet friendly to technology and product teams, and oriented around the relevant principles of zero trust. You will collaborate with cross-functional teams to design, implement, and support systems that are resilient, secure, and scalable. Your expertise will help drive our business strategies and operational excellence.
Key Responsibilities:
- Technical Expertise: Apply deep technical knowledge to analyze complex systems, anticipate issues, and mitigate risks. Design, engineer, and test reliability in the platform and infrastructure. Develop secure, high-quality code, and review and debug code written by others.
- Collaboration: Work with other engineering teams to architect and iterate on designs, patterns, or changes required to resolve issues and modernize technology processes.
- Development: Create and deliver secure, high-quality prototype and production tests, monitors, contracts, and other pertinent capabilities.
- Leadership: Provide technical guidance and strategic direction to support the business and its technical teams, contractors, and vendors.
- Mentorship: Advise junior engineers and technologists, fostering a culture of continuous learning and development.
- Innovation: Drive decisions that influence product design, application functionality, and technical operations. Implement site reliability principles and practices.
- Compliance: Execute work in accordance with compliance standards, risk and security requirements, and business objectives.
- Diversity and Inclusion: Contribute to a team culture of diversity, equity, inclusion, and respect.
Qualifications:
- Technical Skills: Advanced knowledge in Reliability Engineering, testing automation, service contracts and operations, telemetry, observability, programming languages (e.g., Python, Java), Infrastructure as Code (Terraform, Pulumi, Ansible), and other infrastructure technologies (e.g., hardware, databases, storage, identity, cloud infrastructure).
- Experience: 5+ years of hands-on experience in reliability engineering, distributed system design, Kubernetes, general infrastructure platform management and operation, and operational stability.
- Cloud Expertise: Practical experience with cloud-native technologies and virtualization, with the ability to operate in and migrate across public and private clouds.
- Problem-Solving: Ability to tackle design and functionality problems independently with little to no oversight.
- Security: Experience with secure coding, AppSec, threat modeling, code testing, third-party cybersecurity controls/testing, threat/risk assessment, vulnerabilities/weaknesses, and penetration testing.
- Communication: Strong communication skills to work effectively with technical stakeholders and senior technology leaders.
Preferred Qualifications:
- Network Domain: Experience with network engineering, underlay and overlay topologies at scale, and Policy as Code frameworks/tooling (OPA, Kyverno).
- Certifications: Formal training or certification in software engineering or infrastructure engineering concepts.
- Start-up Experience: Experience in rapidly evolving start-up style organizations within a large regulated firm.
- Cross-Functional Knowledge: Drive to continue developing technical and cross-functional knowledge outside of the product.
Why Join Us:
- Innovative Environment: Be part of a team that values innovation and continuous improvement.
- Career Growth: Opportunities for professional development and career progression.
- Global Impact: Work on projects that have a global impact and contribute to the growth of a leading financial institution.
- Diverse Culture: Join a team that values diversity, equity, inclusion, and respect.