This role focuses on building and maintaining scalable data pipelines in Azure Databricks, transforming large volumes of automotive and marketing data into governed, analytics-ready Delta tables. The ideal candidate is highly skilled in PySpark, SQL, and Azure data services, with strong attention to detail and a passion for clean, reliable data. This position plays a key role in powering our MDM platform and in maintaining the pipelines that feed the CRM application, AI initiatives, and business intelligence solutions across Latcha’s enterprise data environment.

Key Responsibilities

• Design, build, and maintain scalable data pipelines in Azure Databricks to process structured and unstructured marketing and automotive data across Bronze, Silver, and Gold layers.
• Develop and optimize PySpark ETL workflows that ingest data from external vendors (Experian, OEM, Dealer Tire, Meta, Basis, etc.) using Azure Blob Storage, Volumes, and Delta tables.
• Implement robust data quality frameworks using Great Expectations and custom validation scripts to ensure data completeness, consistency, and accuracy.
• Collaborate with data architects and analysts to model dealer-centric and customer-centric data for reporting, analytics, and machine learning use cases.
• Automate and monitor pipeline executions via Databricks Jobs and Azure Data Factory; manage schema evolution, partitioning, and performance tuning.
• Contribute to the development of internal Python utilities and libraries for schema alignment, transformations, and reusable ETL logic.
• Work closely with the integrations and AI/ML engineering teams to operationalize gold-layer datasets for APIs, dashboards, and machine learning models.

Required Skills

• Advanced proficiency in PySpark and SQL (Databricks SQL, Delta Lake).
• Strong understanding of the Azure data ecosystem: Databricks, Data Factory, Blob Storage, Volumes, Key Vault, and Unity Catalog.
• Hands-on experience building ETL pipelines using the medallion architecture on Delta Lake (Bronze → Silver → Gold).
• Proficiency with Git, CI/CD pipelines, and version control best practices.
• Ability to design efficient data models with partitioning, clustering, and schema enforcement.
• Experience working with JSON, Parquet, CSV, and other structured or semi-structured file formats.
• Strong understanding of data governance, schema alignment, and error handling in distributed systems.

Nice-to-Have Skills

• Experience with Great Expectations, Soda, or similar data quality frameworks.
• Familiarity with FastAPI and exposing Delta tables via REST APIs.
• Knowledge of MLflow, feature stores, and model lifecycle management in Databricks.
• Experience with Power BI and Fabric Mirroring for analytics layer integration.
• Exposure to AI/LLM-based automation and RAG pipelines.
• Understanding of Delta MERGE logic, schema evolution, and optimization (Z-ordering, caching).
• Experience with Azure DevOps or GitHub Actions for CI/CD automation.
• Working knowledge of Docker and containerized deployments.

Experience & Qualifications

• 3–6 years of experience in data engineering or analytics engineering, ideally within a Databricks + Azure environment.
• Bachelor’s degree in Computer Science, Information Systems, Data Engineering, or a related field.
• Prior experience in marketing data, CRM, or automotive datasets is highly desirable.
• Strong communication skills and ability to collaborate in cross-functional teams.

Data Services Engineer