Job Summary: We are looking for a talented and motivated Data Engineer to join our team. The ideal candidate will have expertise in building, maintaining, and optimizing data pipelines and workflows. You will collaborate with cross-functional teams to drive the implementation of high-quality data solutions, working in an environment that leverages modern cloud architectures, big data technologies, and containerized environments.
Roles & Responsibilities:
• Design, develop, and maintain scalable ETL data pipelines using Dagster, NiFi, and Apache Spark to ensure reliable and timely data delivery.
• Implement data transformation workflows using DBT to ensure efficient data modelling and high-performance query execution.
• Leverage Python and libraries such as Pandas and NumPy for data manipulation, analysis, and building custom data transformations.
• Manage and monitor infrastructure with Prometheus and Grafana to ensure high availability and performance of data pipelines.
• Containerize applications and manage them using Docker and Kubernetes in a production environment.
• Architect, design, and manage cloud infrastructure on Google Cloud Platform (GCP), ensuring security, scalability, and cost optimization.
• Write complex SQL/T-SQL queries, stored procedures, and database transformation jobs to support business needs.
• Conduct complex ad-hoc data analysis for clients, ensuring data accuracy and actionable insights.
• Collaborate with data scientists, analysts, and business stakeholders to understand data needs and deliver solutions that drive decision-making.
Skills & Qualifications:
• Proficiency in Python and libraries such as Pandas and NumPy for data analysis and manipulation
• Experience with Dagster and DBT for orchestrating and managing data workflows
• Familiarity with Apache NiFi and Apache Spark for ETL processes and big data transformations
• Experience with Python optimization techniques such as threading and multiprocessing is preferred
• Strong knowledge of Prometheus and Grafana for infrastructure monitoring and performance tuning
• Hands-on experience with Docker and Kubernetes for container orchestration in a production environment
• Cloud architecture experience, preferably with Google Cloud Platform (GCP), including services such as BigQuery, Dataflow, and Cloud Storage
• Expertise in database transformation, stored procedures, and creating/managing schedules in production environments
• Experience writing and optimizing complex SQL queries for relational databases (e.g., SQL Server, PostgreSQL, MySQL)
• Ability to handle ad-hoc client analysis and provide meaningful insights from data
• Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field
• 2-3 years of experience as a Data Engineer or in a similar role