Lead Software Engineer - Data Engineer
Aumni
Software Engineering, Data Science
Bengaluru, Karnataka, India
We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.
As a Lead Software Engineer at JPMorganChase within the Marketing Automation Platforms Team, you are an integral part of an agile team that works to enhance, build, and deliver trusted, market-leading technology products in a secure, stable, and scalable way. As a core technical contributor, you are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm's business objectives.
Job responsibilities
Lead, mentor, and grow a high-performing team of 5-7 engineers across multiple workstreams, fostering a culture of innovation, ownership, and technical excellence.
Set the technical vision and engineering roadmap for the Data Products platform, aligning with firmwide priorities.
Operate as a player-coach: provide hands-on architectural guidance while empowering the team to own and deliver independently.
Drive cross-functional collaboration with platform teams, domain Data Product Owners, AI/ML teams and governance teams.
Architect and own the end-to-end technical design of the Data Products Studio, a scalable, enterprise-grade platform that orchestrates the discovery, design, build, and productionization of data products from the CCB Data Lake and Snowflake.
Design the platform's AI/Agentic AI layer, leveraging intent agents, NLP Text-to-SQL, Knowledge Graphs, RAG, Vector Databases, and Agent-to-Agent (A2A) communication to enable intelligent, automated data product creation and natural language interaction with the data estate.
Establish and enforce architectural standards, design patterns, and engineering best practices across the team, ensuring scalability, security, resilience, and maintainability.
Lead the design and development of the Agentic AI capabilities that power the Data Products Framework, including: autonomous discovery agents that profile data and recommend data product candidates; design agents that auto-generate data contracts and schema recommendations; build agents that generate and optimize data pipelines; governance agents that auto-apply entitlements based on data classification; and quality agents that detect anomalies and drift and trigger self-healing remediation.
Architect the Agent-to-Agent communication layer that enables multi-agent orchestration across the data product lifecycle, from discovery through productionization.
Leverage RAG (Retrieval Augmented Generation) and Vector Databases to enable contextual, knowledge-grounded AI interactions with metadata, lineage, and data catalog information.
Implement NLP Text-to-SQL capabilities allowing business users to explore the CCB Data Lake and Snowflake using natural language, lowering the barrier to data product discovery.
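As a rough illustration of the RAG and Text-to-SQL pattern described in the responsibilities above, the sketch below retrieves the most relevant catalog entry for a natural-language question and grounds a generation prompt in it. Everything here is a hedged assumption: the catalog contents, the bag-of-words similarity (standing in for an embedding model plus vector database), and the prompt template are hypothetical, not the platform's actual design.

```python
import math
import re
from collections import Counter

# Toy metadata catalog; table names and descriptions are illustrative only.
CATALOG = {
    "customer_accounts": "customer account balance open date status branch",
    "card_transactions": "card transaction amount merchant timestamp customer id",
    "marketing_campaigns": "campaign name channel start date response rate",
}

def embed(text: str) -> Counter:
    """Toy 'embedding': token counts stand in for a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_table(question: str) -> str:
    """Retrieval step: pick the catalog entry most similar to the question."""
    q = embed(question)
    return max(CATALOG, key=lambda t: cosine(q, embed(CATALOG[t])))

def build_prompt(question: str) -> str:
    """Generation step: ground the Text-to-SQL prompt in the retrieved schema."""
    table = retrieve_table(question)
    return (
        f"Schema: {table}({CATALOG[table]})\n"
        f"Question: {question}\n"
        "Write a SQL query that answers the question using only this schema."
    )
```

In a production build, `embed` would call an embedding model, the catalog lookup would be a vector-database query over metadata and lineage, and the assembled prompt would be sent to an LLM to produce the SQL.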
Required qualifications, capabilities, and skills
Proven track record of architecting and delivering large-scale, enterprise-grade data platforms or frameworks from concept through production in a large corporate environment.
Deep hands-on expertise in Python, SQL, and at least one additional language or framework (e.g., Java 17+ with Spring Boot), with strong system design and distributed systems knowledge.
Extensive experience designing, building, and optimizing ETL/ELT pipelines at scale, including batch and real-time data processing.
Strong proficiency in PySpark for distributed data processing, including DataFrame and Dataset APIs and Spark SQL.
Experience working with UI frameworks such as React or Angular.
Extensive experience with AWS cloud services including S3, Athena, Glue, Lambda, Step Functions, IAM, KMS, and Terraform.
Basic knowledge of Snowflake (architecture, performance optimization, Tasks, Streams, Stored Procedures, Materialized Views, and the security model).
Experience designing and building AI/ML-powered platforms or applications, with working knowledge of LLMs, RAG architectures, Vector Databases, NLP, and agentic frameworks.
Deep understanding of data governance principles including metadata management, data lineage, access control (RBAC/ABAC), data classification, and policy enforcement.
Experience with Grafana or equivalent observability platforms for custom dashboards, APM, SLA monitoring, and alerting.
Ability to carry out critical technology solutions across multiple technical areas as an integral part of an agile team.
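The ETL/ELT qualification above can be sketched in miniature. In the role described, this would typically be a PySpark job over the data lake; the plain-Python version below keeps the example self-contained and mirrors the shape of a Spark `filter` followed by `groupBy().sum()`. All record fields and values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Txn:
    """Hypothetical transaction record; fields are illustrative."""
    customer_id: str
    amount: float
    status: str

def extract() -> list[Txn]:
    # Stand-in for reading raw records from S3 / the data lake.
    return [
        Txn("c1", 120.0, "settled"),
        Txn("c1", -5.0, "reversed"),
        Txn("c2", 80.0, "settled"),
    ]

def transform(rows: Iterable[Txn]) -> dict[str, float]:
    # Filter out reversed transactions, then aggregate spend per customer
    # (the batch equivalent of a Spark filter + groupBy().sum()).
    totals: dict[str, float] = {}
    for r in rows:
        if r.status == "settled":
            totals[r.customer_id] = totals.get(r.customer_id, 0.0) + r.amount
    return totals

def load(totals: dict[str, float]) -> list[tuple[str, float]]:
    # Stand-in for writing a curated data product table.
    return sorted(totals.items())

print(load(transform(extract())))  # [('c1', 120.0), ('c2', 80.0)]
```

At scale, `extract` and `load` become Spark reads and writes against S3 or Snowflake, and `transform` becomes DataFrame operations or Spark SQL, with orchestration handled by services such as Glue or Step Functions.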