Data Engineer (Entity Resolution)

SquarePeg

Data Science
United States
Posted on Jul 2, 2025
About SquarePeg

SquarePeg uses AI to screen and score tens of thousands of job applicants quickly, fairly, and at scale. Our platform ingests messy resume and job data from multiple systems and applies advanced ranking models to help recruiters get to inbox zero with confidence. Clean, deduplicated, and well-resolved data is core to everything we do.

We’re hiring a Data Engineer with deep entity resolution experience to help us improve how we match people to jobs—especially when the inputs are ambiguous, inconsistent, or incomplete.

What you'll do:

  • Build and scale data pipelines that ingest, clean, and resolve person, company, and job entities across disparate sources (ATS exports, resumes, job descriptions, datasets, APIs)
  • Own our entity resolution layer: design logic for deduplication, disambiguation, and canonicalization of candidates and companies
  • Improve our internal identity graphs for people, companies, and job titles by integrating open and proprietary data sources
  • Implement and refine blocking strategies, fuzzy matching, and ML-based similarity scoring to improve match precision and recall
  • Work closely with product and engineering to test resolution accuracy and continuously tune performance for production workloads
  • Monitor data integrity and build systems to surface issues before they affect scoring or UX

What we're looking for:

  • 4+ years of experience in data engineering or applied data science, ideally working with large-scale B2B or recruiting datasets
  • Hands-on experience with entity resolution, including rule-based and ML-based approaches (e.g., record linkage, string similarity, embeddings, supervised matching models)
  • Proficiency in Python and SQL; experience with Spark, DuckDB, or similar frameworks is a plus
  • Strong understanding of data quality, normalization, and the challenges of real-world messy input data
  • Thoughtful engineering mindset: you write testable, maintainable code and think about edge cases before they bite

Nice to have:

  • Experience with recruiting/talent data (e.g., resumes, job postings, ATS data)
  • Familiarity with open-source tools like Splink, dedupe, scikit-learn, or Faiss for similarity matching
  • Experience working with skills taxonomies or job-title ontologies
  • Prior experience in a high-velocity startup environment

Why SquarePeg?

  • We’re solving one of the most painful and high-volume problems in hiring: figuring out which applicants are actually worth reading
  • You’ll work on a small, senior team with a bias for shipping, pragmatism, and deep tech
  • Your work will directly improve the quality of our applicant scoring, customer trust, and platform performance
  • Competitive compensation, early equity, and a remote-first culture that respects your time