About the job
Position: Senior Data Engineer (Databricks, PySpark, AWS, ML/AI, LLM)
Role Overview:
We are looking for a Senior Data Engineer with expertise in Databricks, PySpark, and AWS, and a deep understanding of machine learning (ML), artificial intelligence (AI), and Large Language Models (LLMs). This role will be responsible for building and managing data pipelines, developing ML- and AI-driven solutions, and optimizing our data infrastructure to support scalable analytics and intelligent applications.
Responsibilities:
Data Pipeline Development: Design, develop, and maintain complex data pipelines using Databricks and PySpark to ingest, process, and transform large datasets from multiple sources.
Data Warehousing and Modeling: Develop and maintain data models and warehouse architecture that support data analytics and business intelligence.
Machine Learning & AI Integration: Collaborate with Data Scientists and ML Engineers to design, deploy, and optimize ML/AI models, including LLMs, to enhance business processes and drive decision-making.
Cloud Infrastructure Management: Manage and optimize data infrastructure components on AWS, utilizing services such as S3 and Kinesis.
Automation & Optimization: Implement automated data quality checks using Databricks.
Collaboration & Mentoring: Work closely with cross-functional teams, including Data Science, Product, and Software Engineering, to meet business requirements and mentor junior team members.
Technical Innovation: Stay current on new technologies and methodologies in data engineering, ML, AI, and LLMs, and drive initiatives to incorporate them into our technology stack where relevant.
Requirements:
Educational Background: Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
Experience: 5+ years of experience in data engineering, with a strong focus on Databricks, PySpark, and AWS.
Technical Skills:
Advanced proficiency in Databricks and PySpark for ETL processes and data transformation.
In-depth experience with AWS services, particularly S3 and Kinesis.
Proficiency in SQL and relational database design.
Experience working with Machine Learning frameworks (e.g., TensorFlow, PyTorch) and deployment of models in a production environment.
Familiarity with AI and LLM technologies, including practical experience in fine-tuning, deploying, and optimizing LLMs.
Strong programming skills in Python; experience with Scala is a plus.
Soft Skills:
Excellent communication and collaboration skills, with the ability to convey complex technical concepts to non-technical stakeholders.
Proactive problem-solving abilities and a strong attention to detail.
Ability to mentor and guide junior engineers, fostering a collaborative team environment.
Preferred Qualifications:
Exposure to real-time data processing frameworks (e.g., Apache Kafka).