Description

Automate data tasks on Google Cloud Platform (GCP). In this role, you will:
•       Work with data domain owners, data scientists, and other stakeholders to ensure that data is consumed effectively on GCP.
•       Design, build, secure and maintain data infrastructure, including data pipelines, databases, data warehouses, and data processing platforms on GCP.
•       Measure and monitor the quality of data on GCP data platforms (a minimal quality-check sketch follows this list).
•       Implement robust monitoring and alerting systems to proactively identify and resolve issues in data systems. Respond to incidents promptly to minimize downtime and data loss.
•       Develop automation scripts and tools to streamline data operations and make them scalable enough to accommodate growing data volumes and user traffic.
•       Optimize data systems to ensure efficient data processing, reduce latency, and improve overall system performance.
•       Collaborate with data and infrastructure teams to forecast data growth and plan for future capacity requirements.
•       Ensure data security and compliance with data protection regulations. Implement best practices for data access controls and encryption.
•       Collaborate with data engineers, data scientists, and software engineers to understand data requirements, troubleshoot issues, and support data-driven initiatives.
•       Continuously assess and improve data infrastructure and data processes to enhance reliability, efficiency, and performance.
•       Maintain clear and up-to-date documentation related to data systems, configurations, and standard operating procedures.
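
As a purely illustrative sketch of the kind of automated data-quality check this role owns, the snippet below uses the google-cloud-bigquery Python client to flag an elevated NULL rate in a key column. The project, table, column, and alert threshold are hypothetical placeholders, not part of this posting.

from google.cloud import bigquery

PROJECT = "my-project"           # hypothetical project id
TABLE = "my_dataset.orders"      # hypothetical dataset.table
KEY_COLUMN = "order_id"          # hypothetical key column

def null_rate(client: bigquery.Client, table: str, column: str) -> float:
    """Return the fraction of rows where `column` is NULL."""
    query = f"""
        SELECT SAFE_DIVIDE(COUNTIF({column} IS NULL), COUNT(*)) AS null_rate
        FROM `{table}`
    """
    row = next(iter(client.query(query).result()))
    return row.null_rate or 0.0  # SAFE_DIVIDE yields NULL on an empty table

if __name__ == "__main__":
    client = bigquery.Client(project=PROJECT)
    rate = null_rate(client, TABLE, KEY_COLUMN)
    # A production job would page an on-call channel (e.g. via Cloud
    # Monitoring); printing keeps the sketch minimal.
    if rate > 0.01:  # hypothetical 1% threshold
        print(f"ALERT: {KEY_COLUMN} null rate {rate:.2%} exceeds threshold")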

Qualifications we seek in you!
Minimum Qualifications / Skills
•       Bachelor's or master's degree in Computer Science, Software Engineering, Data Science, or a related field, or equivalent practical experience

Preferred Qualifications / Skills
•       Proficiency in data technologies, such as relational databases, data warehousing, big data platforms (e.g., Hadoop, Spark), data streaming (e.g., Kafka), and cloud services (e.g., AWS, GCP, Azure).
•       Strong programming skills in languages like Python (NumPy, pandas, PySpark), Java (core Java, Spark with Java, functional interfaces, lambdas, Java collections), or Scala, with experience in automation and scripting (an illustrative PySpark sketch follows this list).
•       Experience with containerization and orchestration tools like Docker and Kubernetes is a plus.
•       Experience with data governance (Dataplex), data security, and compliance best practices on GCP.
•       Solid understanding of software development methodologies and best practices, including version control (e.g., Git) and CI/CD pipelines.
•       Strong background in cloud computing and data-intensive applications and services, with a focus on Google Cloud Platform.
•       Experience with data quality assurance and testing on GCP.
•       Proficiency with GCP data services (BigQuery, Dataflow, Data Fusion, Dataproc, Cloud Composer, Pub/Sub, Google Cloud Storage).
•       Strong understanding of logging and monitoring using tools such as Cloud Logging, ELK Stack, AppDynamics, New Relic, Splunk, etc.
•       Knowledge of AI and ML tools is a plus.
•       Google Cloud Associate Cloud Engineer or Professional Data Engineer certification is a plus.
•       Experience in data engineering or data science on GCP.
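
As a second purely illustrative sketch, the PySpark job below shows the kind of batch pipeline this role builds on Dataproc: read from BigQuery, transform, and write Parquet to Cloud Storage. All table and bucket names are hypothetical, and the job assumes the spark-bigquery connector is available on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-extract").getOrCreate()

# Read from a (hypothetical) BigQuery table via the spark-bigquery connector.
orders = (
    spark.read.format("bigquery")
    .option("table", "my-project.my_dataset.orders")
    .load()
)

# Keep completed orders and aggregate revenue per customer.
daily = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("revenue"))
)

# Write the result to a (hypothetical) Cloud Storage bucket as Parquet.
daily.write.mode("overwrite").parquet("gs://my-bucket/exports/daily_revenue/")

spark.stop()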

Education

Any Graduate