As a Sr. Machine Learning Engineer on our GenAI applications team, you will help lead development of innovative generative AI products that address the needs of our constituents (students, alumni, faculty, researchers, staff, and the community at large). This key technical leadership role requires hands-on expertise across the full machine learning lifecycle. In this role, you will collaborate with data scientists, product managers, and data engineers to operationalize machine learning models in production and manage the lifecycle of artificial intelligence algorithms across a variety of domains. You will develop and deploy novel approaches to optimize existing machine learning systems and maximize their business value.
You will also help us build and scale our GenAI application platform, where GenAI application developers can share their data and code. As custodians of this platform, we intend to use best practices in the field, along with existing repositories, to expedite the path from prototype to production for GenAI applications and unlock economies of scale. You will be highly influential in advancing our GenAI applications and in guiding teams toward impactful and ethical AI. We seek an expert who is eager to grow and disseminate GenAI model expertise across the organization.
Duties and Responsibilities:
• Architect, build, maintain, and improve our suite of new and existing GenAI applications and their underlying systems.
• Automate machine learning pipelines, monitor performance and costs, and optimize models by using techniques such as LoRA/QLoRA.
• Establish reusable frameworks to streamline model building, deployment and monitoring. Incorporate comprehensive monitoring, logging, tracing, and alerting mechanisms.
• Build guardrails, compliance rules, and oversight workflows into the GenAI application platform, such as approval chains for model updates and staged rollouts for production releases.
• Develop templates, guides, and sandbox environments for easy onboarding of new contributors and experimentation with new techniques.
• Ensure that developing user-facing applications in the GenAI application platform is easy and safe by enforcing rigorous validation testing before user-generated models are published and by implementing a clear peer review process for applications.
• Use your entrepreneurial spirit to identify new opportunities to optimize business processes, improve consumer experiences, and prototype solutions to demonstrate value.
• Work closely with data scientists and analysts to create and deploy new product features online and in mobile apps.
• Contribute to and promote good software engineering practices across the team.
• Mentor and educate team members to adopt best practices in writing and maintaining production machine learning code.
• Actively contribute to and re-use community best practices.
• Monitor, debug, track, and resolve production issues.
• Work with project managers to ensure that projects proceed on time and on budget.
• Collaborate with Technical Product Managers to ensure proper tracking of algorithmic performance KPIs and prioritize performance improvements based on effort and impact.
• Complete other responsibilities as assigned.
Required Skills and Qualifications:
• Minimum of seven years’ post-secondary education or relevant work experience.
• Bachelor's degree in mathematics, physics, computer science, engineering, statistics, or an equivalent technical discipline.
• Minimum of five years’ software development experience with Python and SQL.
• Minimum of three years’ experience building pipelines to deploy NLP and deep learning models into production in a cloud environment.
• Minimum of three years’ experience using PyTorch, TensorFlow, or MXNet, along with optimizing code for GPU clusters.
• Experience building advanced workflows such as retrieval-augmented generation, model chaining, dynamic prompting, and PEFT/SFT using LangChain and similar tools.
• Experience establishing model guardrails and developing bias detection and mitigation techniques for AI applications using tools such as NeMo.
• Experience with various embedding models and with setting up and tuning vector databases to improve the performance of semantic search and retrieval systems.
• Understanding of the underlying fundamentals, such as Transformers and self-attention mechanisms, that form the theoretical foundation of LLMs.
• Experience working with a variety of relational (SQL) and NoSQL databases; big data tools such as Hadoop, Spark, and Kafka; a Linux environment; and cloud services (AWS).
• Knowledge of data pipeline and workflow management tools.
• Expertise in standard software engineering methodology, e.g., unit testing, test automation, continuous integration, code reviews, design documentation.