Principal Data Engineer
Dun & Bradstreet
Job Description
Job Summary: We are seeking an experienced Principal Data Engineer to help build the next generation of our identity graph and data platform. This role is focused on designing, developing, and optimizing large-scale data pipelines and systems that ingest, process, and unify complex datasets from diverse sources (web, mobile, AdTech, government, and proprietary data). This is a highly hands-on, technical role for someone who can quickly understand existing systems, operate independently, and deliver high-quality solutions at scale.
The ideal candidate is deeply analytical, detail-oriented, and experienced with building performant data pipelines and systems handling billions of records.

Key Responsibilities:
• Design, build, and optimize scalable data pipelines and ETL/ELT workflows for large, complex datasets
• Design and implement foundational data architecture supporting identity resolution and ID graph systems
• Develop and enhance systems supporting identity resolution and ID graph construction (data ingestion, normalization, matching, and deduplication)
• Process and unify multi-source datasets (cookies, device IDs, behavioral data, third-party and proprietary data)
• Write efficient, testable, and maintainable code using Python and SQL for large-scale data processing
• Optimize data models, queries, and storage strategies for performance, scalability, and cost efficiency
• Build and maintain data validation, monitoring, and alerting systems to ensure data quality and reliability
• Troubleshoot, debug, and improve existing data pipelines and infrastructure
• Own and drive complex data problems end-to-end, from initial design through production deployment
• Make and influence key technical decisions related to data architecture, scalability, and system design
• Collaborate with data, platform, DevOps, and product teams to deliver scalable, production-ready solutions
• Translate business and product requirements into practical, performant data solutions
• Document data pipelines, systems, and workflows clearly
• Continuously improve system performance, data quality, and pipeline resilience
• Contribute to building new capabilities that improve how customers understand and leverage data insights

Key Skills:
• 8–12+ years of hands-on experience in data engineering or large-scale data processing
• Proven experience building and maintaining production-grade data pipelines and distributed systems
• Demonstrated experience architecting and delivering large-scale data platforms or mission-critical data systems
• Strong expertise in:
  o SQL and relational databases (Postgres, BigQuery, Redshift, etc.)
  o Python for data processing and analysis
• Experience with Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Storage, Cloud Functions) and/or AWS (S3, Redshift, EMR, RDS)
• Experience working with large-scale datasets (hundreds of millions to billions of records)
• Strong understanding of data modeling, partitioning, indexing, and query optimization
• Experience with distributed data processing and parallelization techniques
• Experience moving large volumes of data across systems and architectures
• Familiarity with CI/CD, containerization, and orchestration tools (Docker, Kubernetes, GitHub Actions, etc.)
• Strong debugging and troubleshooting skills in complex data environments
• Experience with version control (Git) and Agile tools (Jira, Confluence, etc.)
• Highly analytical with strong attention to detail and a data-driven mindset
• Ability to hit the ground running, quickly understand systems, and deliver independently
• Comfortable working in a remote, fast-paced, and collaborative environment
• Proven ability to drive system design and implementation

Preferred:
• Experience with identity graphs, entity resolution, or record linkage systems
• Background in AdTech, digital identity, cookies, or audience data platforms
• Experience with real-time or streaming data systems
• Familiarity with data quality, observability, and monitoring frameworks
• Experience with data visualization tools (Looker, Tableau, Power BI)
• Knowledge of data privacy, compliance, and governance considerations
• Experience with modern data platforms such as Snowflake and Databricks
• Exposure to AI/ML technologies, including experience working with or integrating agentic frameworks