Job Description:

Common data archetypes, writing functions, algorithms, logic development, control flow, object-oriented programming languages, external libraries, and how to collect data from different sources.

This includes knowledge of web scraping, application programming interfaces (APIs), databases, and publicly available repositories (an API collection sketch follows this list).
Structured data, such as from relational database management systems and spreadsheets; semi-structured data, such as log files, Extensible Markup Language (XML), and JavaScript Object Notation (JSON); and unstructured data, such as text, video, audio, and images (a parsing sketch for the semi-structured formats follows this list).
Relational and NoSQL databases, as well as distributed processing frameworks such as Apache Hadoop and Apache Spark, and other massively parallel processing (MPP) systems.
SQL-based querying of databases using joins, aggregations, and subqueries (a query sketch follows this list).
Open-source tools, including real-time data processing products, such as Apache Beam, Apache Kafka, and Spark Structured Streaming; time series databases, such as InfluxDB; relational databases, such as PostgreSQL; graph databases, such as Neo4j; and version control and collaboration tools, such as Git and GitHub.
Container orchestration tools, such as Kubernetes.
Mastery of programming and scripting languages, such as Scala, Java, or Python, as well as the ability to create programming and processing logic.
Experience with machine learning algorithms and automated machine learning (AutoML) to build continuously learning data processing streams and pipelines (a pipeline sketch follows this list).
Data warehousing tools and techniques, such as Apache Hive.
Knowledge of cloud platforms, particularly AWS, is also required.
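
For illustration, a minimal sketch of collecting data from an HTTP API in Python. The endpoint URL is a placeholder, and the third-party requests library is assumed to be installed:

import requests

def fetch_records(url):
    """Fetch a JSON payload from an HTTP API and return the decoded objects."""
    response = requests.get(url, timeout=10)   # bounded wait for the server
    response.raise_for_status()                # fail loudly on HTTP errors
    return response.json()

records = fetch_records("https://api.example.com/v1/records")  # placeholder URL
print("fetched", len(records), "records")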
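
A minimal sketch of parsing the semi-structured formats named above (JSON and XML), using only the Python standard library; the sample payloads are invented for illustration:

import json
import xml.etree.ElementTree as ET

# JSON: decode a string into native Python dicts and lists.
json_blob = '{"user": "ada", "events": [{"type": "login"}, {"type": "purchase"}]}'
record = json.loads(json_blob)
print(record["user"], "had", len(record["events"]), "events")

# XML: walk elements and read attributes and text.
xml_blob = "<log><entry level='INFO'>started</entry><entry level='WARN'>retrying</entry></log>"
for entry in ET.fromstring(xml_blob).findall("entry"):
    print(entry.get("level"), entry.text)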
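
A minimal sketch of SQL querying with a join, an aggregation, and a subquery, run against an in-memory SQLite database so it is self-contained; the schema and rows are invented for illustration:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 200.0);
""")

# Join orders to customers, aggregate per customer, and keep only customers
# whose total spend exceeds the overall average order amount (the subquery).
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders);
"""
for name, total in conn.execute(query):
    print(name, total)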
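
Finally, a minimal sketch of the kind of training step a continuously retrained pipeline might wrap, using scikit-learn (assumed installed) on synthetic data; this illustrates the general pattern rather than any specific production stack:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a batch arriving from an upstream data pipeline.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),      # preprocessing step
    ("model", LogisticRegression()),  # learning step
])
pipeline.fit(X_train, y_train)        # retrain on the new batch
print("held-out accuracy:", pipeline.score(X_test, y_test))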

Education

Any Graduate