Job Description

5+ years of experience in data engineering, with at least 2 years of experience in machine learning engineering or deploying ML models in production.
Proven experience in building and maintaining scalable data pipelines, data warehouses, and infrastructure to support ML workflows.

Technical Skills

Proficiency in big data frameworks and tools such as Apache Spark, Hadoop, Kafka, and Airflow.
Advanced skills in data modeling, ETL processes, and data pipeline automation, with a focus on performance and scalability.
Experience with cloud platforms (AWS, GCP, Azure) and their data services, such as AWS Glue, Google BigQuery, or Azure Data Lake.
Strong programming skills in Python and SQL, with experience in query optimization.
Familiarity with ML frameworks and libraries (e.g., TensorFlow, PyTorch, scikit-learn) for building and testing machine learning models.
Knowledge of containerization and orchestration tools (Docker, Kubernetes) for deploying and managing ML models in production.

Machine Learning Engineering Skills

Experience in feature engineering, data preprocessing, and building data pipelines to support ML training and inference.
Knowledge of MLOps best practices for continuous integration, deployment, and monitoring of ML models in production.
Familiarity with model lifecycle management tools such as MLflow, TFX, or Databricks to streamline ML workflows.
Strong understanding of data versioning, reproducibility, and monitoring of ML models to ensure model integrity over time.
Ability to work with structured and unstructured data, with hands-on experience in NLP, computer vision, or time-series data for machine learning applications.

Data Engineering Skills

Proficiency in data storage and warehousing solutions (e.g., Snowflake, Redshift, BigQuery) for scalable data architecture.
Understanding of data governance, quality, and security best practices, including data lineage and compliance with regulations.
Experience with data lake architecture and data partitioning strategies to support large-scale data analysis.
Ability to optimize data infrastructure for low-latency access and high throughput, especially for real-time ML applications.

Education

Graduate in any discipline