You Are
As a Web Scraping-focused Data Engineer, you will be responsible for extracting and ingesting data from websites using web crawling tools. In this role you will own the creation of these tools, services, and workflows to improve crawl/scrape analysis, reports, and data management.
The Opportunity
- Mitigate reputational risk through AI-driven data quality, ensuring the highest-quality data and services are offered to clients
- Generate revenue through new business for Alternative Data
- Communicate with third-party vendors on specific data requirements for web scraping
- Develop custom scripts and workflows using Python, SQL, and C# to automate data processing tasks
- Transform and manipulate raw, complex data into structured, consumable formats
- Apply machine learning and quantitative modeling:
  - Build anomaly detection models leveraging packages such as Prophet or similar (a sketch follows this list)
  - Build anomaly detection models for geospatial and other practices based on domain requirements
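For illustration, an anomaly detection model of the kind described above could be sketched with Prophet by flagging observations that fall outside the model's uncertainty interval. The column names `ds` and `y` are Prophet's required input schema; the interval width and the flagging rule below are assumptions, not a prescribed implementation:

```python
import pandas as pd
from prophet import Prophet  # assumes the `prophet` package is installed

def flag_anomalies(df: pd.DataFrame, interval_width: float = 0.99) -> pd.DataFrame:
    """Fit Prophet to a time series and flag points that fall outside
    the model's uncertainty interval. `df` must use Prophet's expected
    columns: `ds` (timestamp) and `y` (observed value)."""
    model = Prophet(interval_width=interval_width)
    model.fit(df)
    forecast = model.predict(df[["ds"]])
    result = df.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
    # Treat a point as anomalous when the observed value is outside the interval.
    result["anomaly"] = (result["y"] < result["yhat_lower"]) | (result["y"] > result["yhat_upper"])
    return result
```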
This position description identifies the responsibilities and tasks typically associated with the performance of the position. Other relevant essential functions may be required.
What You Need
- Experience running large-scale web scrapes.
- Experience analyzing web scraping requirements.
- Familiarity with techniques and tools for crawling, extracting, and processing data (e.g., Scrapy, pandas, MapReduce, SQL, BeautifulSoup); a minimal Scrapy example follows this list.
- Strong grasp of data modeling concepts to design and develop efficient data storage and retrieval systems.
- 4+ years of experience as a Data Engineer with a Master’s degree, or 5+ years of relevant experience with a Bachelor’s degree.
- 2-3 years of financial industry experience.
- Experience working as a Data Engineer in a production environment.
- Experience working with a modern, scalable data lake or data warehouse such as Snowflake.
- 5+ years of proficiency with programming languages such as Python, PySpark, SQL, Scala, and shell scripting.
- Understanding of the Spark architecture is preferred.
- Experience with one or more databases (MySQL, Microsoft SQL Server, MongoDB, PostgreSQL) is preferred.
- Experience working with containers and orchestration tools such as Docker, Kubernetes, and Apache Airflow (a sketch of an orchestration DAG follows this list).
- Experience promoting data ingestion pipelines using CI/CD tooling, e.g., Jenkins.
- Excellent written and verbal communication and presentation skills.
- Experience working with one or more cloud platforms (Azure, AWS, or GCP); Azure preferred.
- Experience working with distributed notebook environments such as Databricks or Azure Synapse.
- Experience working with Git and Azure DevOps.
- Understanding of machine learning algorithms, e.g., anomaly detection.
- Ability to work in an Agile methodology.
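As a rough illustration of the crawling and extraction tooling named above, a minimal Scrapy spider might look like the sketch below. The target site is Scrapy's public practice sandbox and the CSS selectors match that site; for real engagements both would differ:

```python
import scrapy

class QuotesSpider(scrapy.Spider):
    """Minimal spider: crawl pages, yield structured records,
    and follow pagination links."""
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]  # public practice site, used as a placeholder

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "next page" link until pagination runs out.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```

Saved as `quotes_spider.py`, this can be run without a full project scaffold via `scrapy runspider quotes_spider.py -o quotes.json`.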
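Similarly, the Airflow and CI/CD items above could be tied together by a small orchestration DAG that runs a scrape and then loads the output downstream. This is only a sketch, assuming a recent Airflow 2.x release; the DAG id, schedule, and both callables are hypothetical placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_scrape(**context):
    # Hypothetical placeholder: launch the spider and stage its raw output.
    ...

def load_to_warehouse(**context):
    # Hypothetical placeholder: transform staged data and load it downstream.
    ...

with DAG(
    dag_id="web_scrape_ingestion",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    scrape = PythonOperator(task_id="scrape", python_callable=run_scrape)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)
    scrape >> load  # load runs only after a successful scrape
```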