Job Description:
We are seeking a highly skilled Principal Data Engineer to join our team and lead the design, development, and implementation of our data infrastructure and solutions. In this pivotal role, you will build and maintain scalable, reliable, and efficient data pipelines, data warehouses, and data lakes. Collaborating closely with data architects, scientists, and analysts, you will ensure that our data is accessible, secure, and aligned with business objectives.
Key Responsibilities:
- Design and implement Kafka connectors to sync updates from source data stores.
- Create partitioned Kafka topics for syncing updates to destination data marts.
- Develop multiplexed data analytics workloads using Apache Flink for real-time data transformations and monitoring.
- Build dashboards with Datadog and CloudWatch to monitor system health and support users.
- Establish schema registries that promote data governance.
- Work collaboratively with your West Coast-based scrum team to manage documentation, backlogs, and code reviews.
- Design efficient database schemas with a focus on query access patterns.
- Maintain CI/CD pipelines using infrastructure-as-code methodologies.
- Migrate on-prem ETL jobs written in PHP to Apache Flink and AWS Glue jobs running on AWS.
- Collaborate with QA Engineers to build automated test suites.
- Partner with end-users to resolve service disruptions and promote our data products.
- Oversee data quality and notify upstream data producers of any issues.
- Develop and maintain the overall data platform architecture strategy and implementation plans to support company initiatives.
- Lead the development of real-time data streaming solutions and establish data governance policies.
Qualifications:
- Basic understanding of genomic concepts and terminology.
- Experience with PyFlink and AWS Kinesis.
- Willingness to work Pacific Time hours (8:00 AM - 5:00 PM or 9:00 AM - 6:00 PM).
- Familiarity with key technologies, including Apache Kafka, Debezium, Python, Apache Flink, MySQL, AWS services (Athena, Glue, Lambda), infrastructure-as-code tools (AWS CDK, Terraform), Docker, and JavaScript.
- Experience building data APIs and providing Data as a Service.
- Experience integrating with SaaS platforms such as SAP and Salesforce.
- Knowledge of PHP MVC frameworks (e.g., Symfony) is a plus.
- Experience with Atlassian products (Jira, Confluence, Bamboo) and system diagramming tools (Miro, Lucidchart, Visio).
- 6+ years of experience working with professional scrum teams or equivalent education.
- 4+ years of experience using Git version control.
- 3+ years of experience designing and indexing relational databases.
- 2+ years of experience building and operationalizing real-time data streams.
- Bachelor's or master's degree in computer science, data science, mathematics, life sciences, or equivalent experience.
Preferred Qualifications:
- AWS Certified Solutions Architect - Associate certification.
- AWS Certified Data Engineer - Associate certification.