Primary Skills (Must Have)
- Programming Skills: Proficiency in Python, PySpark, and SQL is essential.
- Big Data Experience: Hands-on experience with Spark and relational databases such as Postgres is required.
- Orchestration Tools: Familiarity with orchestration technologies such as Airflow, Kubeflow, and Microsoft Fabric is a plus.
- Data Engineering: Ability to build and support scalable data engineering pipelines, including ETL processes, for both batch and streaming data.
- Analytics and Monitoring: Strong analytical skills for data exploration, and experience building real-time monitoring dashboards and alerting systems.
- Distributed Systems: Experience developing software for distributed systems.
- Data Modeling: Demonstrated experience in data modeling for big data infrastructure.
- Performance Optimization: Experience optimizing data loading and ingestion processes for efficiency is critical.
- Developer Tools and Cloud Platforms: Familiarity with tools such as GitHub, Docker, and VS Code, and experience working on public clouds such as AWS and Azure.
Education: Any graduate.