Functions/Responsibilities Performed
Design & Develop Data Pipelines:
Orchestrate complex data pipelines using data integration tools such as Informatica, together with Python, ensuring seamless data flow from various sources.
Leverage GCP Dataflow, Cloud Functions, and other cloud technologies to build scalable and resilient data ingestion and processing pipelines.
Implement robust CI/CD workflows using GitHub Actions and Argo CD to automate pipeline deployments and ensure consistency.
Monitor and manage production solutions. Optimize and fine-tune models for performance, accuracy, and scalability.
Document best practices and quality standards to be adhered to during the development of data science solutions.
Conduct reviews of data science work and applications and provide feedback.
Manage & Analyze Data:
Work with diverse data sources, including relational databases (Oracle, SQL Server, MySQL, PostgreSQL, Snowflake), big data platforms (Hadoop, Parquet files, BigQuery, BigLake-managed Iceberg tables), and streaming data (Kafka, GCP Dataflow/Dataproc).
Employ powerful compute engines like Hive, Impala, and Spark to analyze massive datasets and derive valuable insights.
Deliver Actionable Insights:
Collaborate with business stakeholders to understand their challenges and requirements.
Translate business problems into analytical frameworks and identify opportunities to address complex problems.
Build APIs and user-friendly interfaces to present data results and empower informed decision-making.
Drive Machine Learning Innovation:
Explore and implement Vertex AI models to generate quick insights and support business requirements.
Stay at the Forefront:
Continuously learn and adapt to emerging data technologies and best practices.
Contribute to the ongoing improvement of data infrastructure and processes.
Bachelor's degree