Data Engineer
Latinem Private Limited
Job Description
Position: Data Engineer - Agentic AI Data Pipelines & Automation
Location: Hyderabad (On-site)
Experience: 2-4 years
Must-Have Skills: Data Engineering, ETL/ELT Pipelines, Python, SQL, Data Modeling, Data Warehousing, Process Automation, Workflow Orchestration, API Integration, Real-Time and Batch Processing, Azure Data Services, Data Quality and Governance
About the Role
We are looking for a skilled Data Engineer to build and manage the data foundation required for agentic AI, intelligent automation, and software-led process transformation. The role focuses on turning fragmented, manual, and poor-quality data sources into reliable pipelines and reusable data assets that support AI agents, workflow automation, analytics, and operational decision-making. You will work closely with backend and AI teams to ensure that business processes have access to trusted, timely, and well-structured data in an Azure-first environment.
Key Responsibilities
- Design, build, and maintain scalable ETL/ELT pipelines that ingest, transform, validate, and publish data required for analytics, AI, automation, and operational workflows.
- Develop data pipelines for structured, semi-structured, and messy data drawn from sources such as ERP and CRM systems, support platforms, spreadsheets, flat files, APIs, logs, and internal repositories.
- Create reliable data foundations for agentic AI use cases by preparing clean, context-rich, governed, and accessible datasets for retrieval, workflow execution, and decision support.
- Build batch and near-real-time data processing solutions to support business-critical applications, operational dashboards, automation engines, and AI-driven process flows.
- Design and optimize data models, data marts, data lakes, and warehouse structures for performance, usability, and business traceability.
- Implement data quality checks, reconciliation logic, schema validation, monitoring, lineage tracking, and exception handling across pipelines.
- Integrate multiple enterprise systems and third-party sources so that fragmented business data can be consolidated into usable, trustworthy outputs.
- Work closely with AI engineers and backend developers to ensure downstream systems receive consistent, timely, and production-ready data.
- Support reporting, root cause analysis, and process improvement by exposing meaningful datasets and metadata to business and technology stakeholders.
- Troubleshoot pipeline issues, optimize performance, reduce failure points, and continuously improve data engineering standards and practices.
What You Should Bring
- Bachelor’s degree in Computer Science, Engineering, Information Systems, Data Engineering, or a related field.
- 2-4 years of experience in data engineering, pipeline development, and data platform implementation.
- Strong proficiency in Python and SQL, including data transformation, query optimization, and handling of large datasets.
- Hands-on experience designing ETL/ELT workflows and using orchestration tools such as Airflow, Azure Data Factory, or similar platforms.
- Experience with Azure data services such as Azure Data Factory, Azure Data Lake, Azure Synapse, Microsoft Fabric, Azure Event Hubs, or related tools.
- Strong understanding of data modeling, data warehousing concepts, partitioning, schema design, and pipeline performance optimization.
- Ability to work with low-quality, inconsistent, and fragmented enterprise data and convert it into structured and reliable information assets.
- Experience with API-based ingestion, file-based ingestion, incremental loads, data reconciliation, and production support for data pipelines.
- Good understanding of data governance, data quality, security, access management, and operational monitoring in enterprise environments.
- Strong analytical mindset, attention to detail, and ability to collaborate across business, AI, and engineering teams.
Preferred Skills
- Experience with streaming or event-based data processing using Kafka, Spark Streaming, or Azure Event Hubs.
- Exposure to AI/ML data pipelines, RAG data-preparation flows, embedding and vectorization processes, or data layers for agentic AI systems.
- Understanding of master data, reference data, metadata management, and enterprise integration patterns.
- Experience with BI and analytics tools such as Power BI, Tableau, or semantic models for business consumption.
- Familiarity with DevOps, CI/CD, testing practices, and version-controlled pipeline deployment.
- Exposure to financial, billing, customer operations, or service workflow data domains is an advantage.