Cloud Engineer
Job Description
Data Engineer
Years of Experience – 8-12 years
Key competencies:
- Data Engineer / Data Engineering Tech Lead
- Data Engineer with Cloudera and Azure Cloud experience
- Expertise in PySpark, Azure Synapse, Azure Data Factory, Hadoop, and Hive
- Experienced in batch and real-time data integration using Azure Cloud technologies
- Tertiary qualifications in a relevant discipline with relevant certifications in Microsoft Azure.
- Comprehensive knowledge of public cloud environments and industry trends.
- Significant experience supporting, designing, and developing public cloud solutions via
Infrastructure as Code, including Terraform and ARM.
- Extensive DevOps experience.
- The ability to communicate effectively and work collaboratively with diverse team members.
- Demonstrated experience in security hardening and testing.
- Proven ability in creating and updating accurate documentation.
- Excellent verbal and written communication skills.
- Willingness and flexibility to work outside of standard office hours, and on weekends as required.
Specific activities required:
- Lead the implementation of infrastructure via code and provide strategic
advice/recommendations for the development and advancement of Microsoft Azure technologies
based on previous research on trends in public cloud environments.
- Integrate and automate the delivery of standardized Azure deployments, in conjunction with
orchestration products such as Azure DevOps with Terraform, Azure ARM templates and other
modern deployment technologies.
- Act as the escalation point for level-three Azure-related issues, providing technical support and
fault resolution, as well as guidance and mentoring for operational run teams, both locally and
remotely throughout the organization.
- Ensure the appropriate gathering of business requirements and their translation into
appropriate solutions.
- Maintain and deliver all related documentation for the design, development, build, and
deployment methods used, ensuring the source control of all applicable code is stored and
managed properly.
- Provide guidance and assistance to all support teams.
- Provide complementary support and leadership in hardening and security testing.
ETL - medallion architecture; common data pipeline development using a framework or metadata;
how to handle data quality (lookup, check); data warehousing concepts (star vs snowflake schema;
fact/dimension tables; surrogate key vs primary key; Slowly Changing Dimensions (SCD), including
the difference between SCD Type I and Type II; normalization types, e.g. 2NF vs 3NF); how to
identify delta records/files
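The SCD Type I vs Type II distinction above can be sketched in plain Python (illustrative only; in practice this would be a PySpark or SQL MERGE, and the column names here are hypothetical): Type I overwrites the changed attribute and loses history, while Type II closes out the current row and appends a new versioned row.

```python
# Illustrative sketch of SCD Type I vs Type II on a tiny dimension table.
# Column names (customer_id, city, is_current, ...) are hypothetical examples.

def scd_type1(dim, key, updates):
    """Type I: overwrite the attribute in place -- history is lost."""
    for row in dim:
        if row[key] in updates:
            row.update(updates[row[key]])
    return dim

def scd_type2(dim, key, updates, load_date):
    """Type II: expire the current row and append a new version."""
    new_rows = []
    for row in dim:
        if row["is_current"] and row[key] in updates:
            row["is_current"] = False          # close out the old version
            row["end_date"] = load_date
            new_rows.append({**row, **updates[row[key]],
                             "is_current": True,
                             "start_date": load_date, "end_date": None})
    dim.extend(new_rows)
    return dim

dim = [{"customer_id": 1, "city": "Perth",
        "is_current": True, "start_date": "2023-01-01", "end_date": None}]
scd_type2(dim, "customer_id", {1: {"city": "Sydney"}}, "2024-06-01")
# dim now holds two rows: the expired Perth row and the current Sydney row
```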
PySpark - Spark architecture; create a DataFrame from a collection of data; remove duplicate values
from a DataFrame; PySpark vs T-SQL differences, including select, aggregate functions, union, limit,
adding a new column to a DataFrame, and filtering; window functions (lead, lag, string replace,
substring index); joins; group by with having count greater than/lower than a threshold; selecting
data from multiple tables; CASE WHEN with multiple conditions; performance optimization (how to
mitigate shuffle and data skew)
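Two of the DataFrame topics above, deduplication and the lag window function, can be demonstrated without a Spark cluster. The pure-Python sketch below mirrors the behaviour of PySpark's `df.dropDuplicates(subset)` and `F.lag(col).over(Window.orderBy(...))` (those are the real PySpark API names; the sample data is invented):

```python
# Pure-Python mirror of two common PySpark interview tasks:
# dropDuplicates() and the lag() window function.

def drop_duplicates(rows, subset):
    """Keep one row per distinct value of the `subset` columns,
    analogous to df.dropDuplicates(subset) in PySpark."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in subset)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def lag_over(rows, order_by, col, default=None):
    """Add a prev_<col> column holding the previous row's value of `col`
    when ordered by `order_by` -- like F.lag(col).over(Window.orderBy(...))."""
    ordered = sorted(rows, key=lambda r: r[order_by])
    prev = default
    for row in ordered:
        row[f"prev_{col}"] = prev
        prev = row[col]
    return ordered

sales = [{"day": 2, "amount": 30}, {"day": 1, "amount": 10},
         {"day": 2, "amount": 30}]                 # one duplicate row
deduped = drop_duplicates(sales, ["day", "amount"])
with_lag = lag_over(deduped, "day", "amount")
```

In real PySpark the same result would come from a `Window` partitioned and ordered appropriately; the point of the sketch is the semantics, not the distributed execution.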
Hive - managed vs external tables; create DDL for an external table; change settings within a Hive
session; validate functions like trim, replace, concat, etc.; how to establish a JDBC connection;
what the three primary complex datatypes in Hive are and how they differ
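For the complex-datatype question: Hive's three primary complex types are ARRAY, MAP, and STRUCT (standard Hive, not stated in the notes above). Their closest Python analogues make the differences easy to show in a screen-share; the column names below are invented examples:

```python
# Hive complex types and their Python analogues:
#   ARRAY<T>       -> list  (ordered collection, one element type)
#   MAP<K, V>      -> dict  (key/value pairs, keys of one type)
#   STRUCT<f1, f2> -> fixed set of named fields, each with its own type
# In Hive DDL a column might read:
#   phones ARRAY<STRING>, attrs MAP<STRING, STRING>,
#   address STRUCT<street: STRING, city: STRING>

row = {
    "phones": ["0400 000 000", "08 9000 0000"],              # ARRAY<STRING>
    "attrs": {"segment": "retail", "tier": "gold"},          # MAP<STRING,STRING>
    "address": {"street": "1 Example St", "city": "Perth"},  # STRUCT<...>
}
# Access mirrors HiveQL: phones[0], attrs['tier'], address.city
first_phone = row["phones"][0]
tier = row["attrs"]["tier"]
city = row["address"]["city"]
```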
Azure Synapse - Azure subscription / service principal / tenant ID; Azure Synapse pipeline
development; Azure Synapse dedicated SQL pool DDL/DML development; Azure Synapse Spark
notebook development
ADLS / Communication - crisp and confident; video call without background animation; validate
PySpark or SQL experience using the candidate's notebook via screen sharing
Scheduler - Oozie workflow for high-level orchestration and scheduling; how/where to create a
Spark application and how to invoke it; validate event-based triggers, re-run from the failure
point, notifications, and exception handling
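The "re-run from failure point" topic can be illustrated with a minimal checkpointing sketch (hypothetical task names; real Oozie or ADF pipelines track completed actions in their own metadata store): each completed step is recorded, and a re-run skips anything already marked done.

```python
# Minimal re-run-from-failure-point sketch: a checkpoint set records
# completed steps so a retry resumes instead of starting over.

def run_pipeline(steps, checkpoint):
    """Run (name, callable) steps in order, skipping any already in
    `checkpoint`. A step that raises leaves the checkpoint reflecting
    everything completed before the failure."""
    for name, fn in steps:
        if name in checkpoint:
            continue                  # already done in a previous attempt
        fn()
        checkpoint.add(name)

log = []
def ok(name):
    return lambda: log.append(name)

fail_once = {"flag": True}
def flaky():
    if fail_once["flag"]:
        fail_once["flag"] = False     # fail only on the first attempt
        raise RuntimeError("transient failure")
    log.append("load")

steps = [("extract", ok("extract")), ("load", flaky), ("report", ok("report"))]
checkpoint = set()
try:
    run_pipeline(steps, checkpoint)   # first attempt fails at "load"
except RuntimeError:
    pass
run_pipeline(steps, checkpoint)       # re-run resumes from the failure point
# log == ["extract", "load", "report"]: "extract" was not repeated
```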