You Are
As a Web Scraping-focused Data Engineer, you will be responsible for extracting and ingesting data from websites using web crawling tools. In this role you will own the creation of these tools, services, and workflows to improve crawl/scrape analysis, reports, and data management.
The Opportunity
- Mitigate reputational risk through AI-driven data quality, ensuring the highest-quality data and services are offered to clients
- Generate revenue through new business for Alternative Data
- Communicate with third-party vendors on specific data requirements for web scraping
- Develop custom scripts and workflows using Python, SQL, and C# to automate data processing tasks
- Transform and manipulate raw, complex data into structured, consumable formats
- Apply machine learning and quantitative modeling:
  - Build anomaly detection models leveraging packages such as Prophet or similar (a sketch follows this list)
  - Build anomaly detection models for geospatial and other practices based on domain requirements
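For illustration, an anomaly detection model of the kind described above could be sketched with Prophet by flagging observations that fall outside the model's uncertainty interval. The column names `ds` and `y` are Prophet's required input schema; the interval width and the flagging rule below are assumptions, not a prescribed implementation:

```python
import pandas as pd
from prophet import Prophet  # assumes the `prophet` package is installed

def flag_anomalies(df: pd.DataFrame, interval_width: float = 0.99) -> pd.DataFrame:
    """Fit Prophet to a time series and flag points that fall outside
    the model's uncertainty interval. `df` must use Prophet's expected
    columns: `ds` (timestamp) and `y` (observed value)."""
    model = Prophet(interval_width=interval_width)
    model.fit(df)
    forecast = model.predict(df[["ds"]])
    result = df.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
    # Treat a point as anomalous when the observed value is outside the interval.
    result["anomaly"] = (result["y"] < result["yhat_lower"]) | (result["y"] > result["yhat_upper"])
    return result
```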
This position description identifies the responsibilities and tasks typically associated with the performance of the position. Other relevant essential functions may be required.
What You Need
- Experience running large-scale web scrapes.
- Experience analyzing web scraping requirements.
- Familiarity with techniques and tools for crawling, extracting, and processing data (e.g., Scrapy, pandas, MapReduce, SQL, BeautifulSoup); a minimal Scrapy example follows this list.
- Strong grasp of data modeling concepts to design and develop efficient data storage and retrieval systems.
- 4+ years of experience as a Data Engineer with a Master’s degree, or 5+ years of relevant experience with a Bachelor’s degree.
- 2-3 years of financial industry experience.
- Experience working as a Data Engineer in a production environment.
- Experience working with a modern, scalable data lake or data warehouse such as Snowflake.
- 5+ years of proficiency with programming languages such as Python, PySpark, SQL, Scala, and shell scripting.
- Understanding of the Spark architecture is preferred.
- Experience with one or more databases (MySQL, Microsoft SQL Server, MongoDB, PostgreSQL) is preferred.
- Experience working with containers and orchestration tools such as Docker, Kubernetes, and Apache Airflow (a sketch of an orchestration DAG follows this list).
- Experience promoting data ingestion pipelines using CI/CD tooling, e.g., Jenkins.
- Excellent written and verbal communication and presentation skills.
- Experience working with one or more cloud platforms (Azure, AWS, or GCP); Azure preferred.
- Experience working with distributed notebook environments such as Databricks or Azure Synapse.
- Experience working with Git and Azure DevOps.
- Understanding of machine learning algorithms, e.g., anomaly detection.
- Ability to work in an Agile methodology.
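As a rough illustration of the crawling and extraction tooling named above, a minimal Scrapy spider might look like the sketch below. The target site is Scrapy's public practice sandbox and the CSS selectors match that site; for real engagements both would differ:

```python
import scrapy

class QuotesSpider(scrapy.Spider):
    """Minimal spider: crawl pages, yield structured records,
    and follow pagination links."""
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]  # public practice site, used as a placeholder

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "next page" link until pagination runs out.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```

Saved as `quotes_spider.py`, this can be run without a full project scaffold via `scrapy runspider quotes_spider.py -o quotes.json`.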
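Similarly, the Airflow and CI/CD items above could be tied together by a small orchestration DAG that runs a scrape and then loads the output downstream. This is only a sketch, assuming a recent Airflow 2.x release; the DAG id, schedule, and both callables are hypothetical placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_scrape(**context):
    # Hypothetical placeholder: launch the spider and stage its raw output.
    ...

def load_to_warehouse(**context):
    # Hypothetical placeholder: transform staged data and load it downstream.
    ...

with DAG(
    dag_id="web_scrape_ingestion",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    scrape = PythonOperator(task_id="scrape", python_callable=run_scrape)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)
    scrape >> load  # load runs only after a successful scrape
```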