Job Vacancy for Data Engineer
Salary undisclosed
RESPONSIBILITIES
- Create and maintain optimal data pipeline architecture using scheduling and authoring tools such as Airflow and dbt.
- Wrangle large, complex data sets that meet functional and non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
- Build the data infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of sources using SQL, including Google, Microsoft, and AWS cloud data/serverless technologies.
- Build analytics models that use the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Work with stakeholders, including the Executive, Product, Data, and Design teams, to assist with data-related technical issues and support their data infrastructure needs.

REQUIREMENTS
- Advanced working SQL knowledge and experience with relational databases, including query authoring and query optimization.
- Strong analytical skills and working familiarity with a variety of databases: MSSQL, MySQL, PostgreSQL, MongoDB.
- Experience building processes that support data transformation, data structures, metadata, dependency management, and workload management.
- A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
- Knowledge of message queuing, stream processing, and highly scalable 'big data' data stores.
- Strong project management and organizational skills.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Experience with one or more of the following software/tools:
  - Big data storage and transformation tools: Hadoop, Spark, Kafka, Snowflake, BigQuery.
  - Relational SQL and NoSQL databases.
  - Data pipeline and workflow management tools: Luigi, Airflow.
  - AWS cloud services: EC2, Lambda, EMR, RDS, Redshift.
  - Stream-processing systems: Kafka, Spark Streaming, etc.
  - Object-oriented/functional scripting languages: Python, R.
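The extraction, transformation, and loading work described in the responsibilities can be sketched as a minimal ETL step. This is an illustrative example only, not part of the posting: the table names, schema, and in-memory SQLite database are assumptions chosen to keep the sketch self-contained.

```python
import sqlite3

# Minimal ETL sketch: extract raw rows, transform them, load a clean table.
# Schema and data are hypothetical, chosen only to illustrate the pattern.
conn = sqlite3.connect(":memory:")

# Extract: a raw staging table as it might arrive from a source system.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "10.50", "us"), (2, "7.25", "us"), (3, "3.00", "eu")],
)

# Transform + load: cast amounts to numeric and normalize region codes
# into a typed table that downstream analytics can query.
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
conn.execute(
    "INSERT INTO orders "
    "SELECT id, CAST(amount AS REAL), UPPER(region) FROM raw_orders"
)

# A downstream analytics query, e.g. revenue per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 3.0), ('US', 17.75)]
```

In production this kind of step would typically run as a task inside a scheduler such as Airflow or as a dbt model rather than as a standalone script.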