Data Architect

Salary undisclosed

The role is responsible for planning, spearheading, and implementing the data blueprint and all design-related work for structured and unstructured data across Boost's use cases, primarily supporting Boost in incorporating unified data modeling across AI, Machine Learning, and Analytics projects (Data Architect). In addition, the role supports the Data Engineers, who are responsible for designing, implementing, and managing data pipelines and ensuring optimal performance.

Responsibilities:

  • Plan, develop, and implement data blueprints and designs for Boost projects, incorporating unified data modeling across AI, Machine Learning, and Analytics projects.
  • Support the ongoing Centralized Data Platform initiative between Boost entities, aiming to create a single view of Boost users and merchants.
  • Ensure the cloud infrastructure is cost-effective and perform cost optimization so that annual costs stay below BP targets.
  • Work closely with data analysts and data scientists to build aggregated data model tables that facilitate efficient dashboarding.
  • Collaborate with Boost’s Analytics team to ensure structured and unstructured data capturing and ingestion as per design requirements.
  • Manage and maintain data dictionary definitions, relationships, metadata, and dimensioning of big data servers/technologies, as well as governance practices.
  • Work with Boost’s IT and security architects to ensure compliance with data security and privacy policies.
  • Manage and lead end-to-end data lifecycle management activities and ensure consistency, quality, and availability in data management.
  • Research new technologies related to big data and accelerated processing, proposing updates to data management, data retrieval, backups, and storage technologies and standards.
  • Ensure enterprise storage and all other technical system components meet current and future projections.
  • Recommend and provide solutions to develop and enhance data frameworks.
  • Design, plan, manage, and implement data pipelines to incorporate structured and unstructured data sources used for AI, Machine Learning, and Analytics projects.
  • Automate manual steps in the data pipeline to reduce effort and improve efficiency.
  • Manage ETL processes and ensure optimal performance by monitoring and advising on necessary infrastructure changes. Design ETL/ELT processes and pipelines from various big data sources into Data Lakes or Data Warehouses (a minimal sketch follows this list).
  • Work with the IT team, engineers, and Data Scientists to plan, extract, transform, and load large volumes of data. Liaise with various subject matter experts and team members to recommend ongoing improvements to methods and algorithms. Present findings and the rationale behind them in simple terms for the business.
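To make the pipeline and aggregation duties above concrete, the following is a minimal PySpark ETL sketch, assuming hypothetical S3 paths, column names, and a daily-grain aggregate table; none of these reflect Boost's actual schema or infrastructure.

```python
# Minimal PySpark ETL sketch: structured + unstructured sources -> aggregated
# model table in a data lake. All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: a structured transactions table and semi-structured app event logs.
transactions = spark.read.parquet("s3://example-lake/raw/transactions/")  # hypothetical path
events = spark.read.json("s3://example-lake/raw/app_events/")             # hypothetical path

# Transform: join on a shared user key and aggregate to a daily grain,
# which keeps the dashboard-facing table small and fast to scan.
daily_user_summary = (
    transactions
    .join(events, on="user_id", how="left")
    .groupBy("user_id", F.to_date("txn_ts").alias("txn_date"))
    .agg(
        F.sum("amount").alias("total_spend"),
        F.countDistinct("merchant_id").alias("merchants_visited"),
        F.count("event_id").alias("app_events"),
    )
)

# Load: write a partitioned aggregate table for analysts and dashboards.
(daily_user_summary
    .write
    .mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3://example-lake/curated/daily_user_summary/"))
```

Pre-aggregating to a daily grain before loading is one common way to serve the dashboarding and data lake performance goals listed under the Key Result Areas below.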

Key Result Areas:

1. Reduction in monthly cloud infrastructure costs.

2. Improvement in SLAs for new data pipeline and Data Warehouse/Data Mart developments.

3. Completion of the Centralized Data Platform within the required project timeline.

4. Building aggregated data model tables to improve data lake performance and efficiency.

Requirements:

  • Bachelor’s Degree in engineering or computer science with a minimum of 5 years of experience in designing, developing, and maintaining large-scale Data Warehouse and analytics projects.
  • A minimum of 5 years of experience with data warehouse design for RDBMSs such as Oracle, MS SQL Server, PostgreSQL, and MySQL.
  • 4 years or more of industry experience in software development, data engineering, business intelligence, data warehousing, or a related field, with a track record of manipulating, processing, and extracting value from large multidimensional datasets from multiple sources.
  • 3 years or more of experience with ETL tools, developing Spark ETL jobs, cloud technologies, and data modeling, as well as working with Business Intelligence systems and writing SQL scripts.
  • Minimum 2 years of experience working with cloud platforms (e.g., AWS, Azure, Google Cloud) in a data engineering capacity.
  • Experience with Service Oriented Architecture (SOA), web services, enterprise data management, information security, applications development, and cloud-based architectures.
  • Experience with Hadoop clusters, in-memory processing, GPGPU processing, and parallel distributed computing systems.
  • Experience building data pipelines using Kafka, Flume, and accelerated stream processing (see the streaming sketch after this list).
  • Experience in migrating data from on-premises to the cloud.
  • Experience in designing NoSQL, HDFS, Hive, and HBase data marts and creating data lakes.
  • Experience in languages such as Java, Python, and/or R on Linux.
  • Strong proficiency in PySpark for processing large datasets and optimizing job performance.
  • Experience with telecommunications, financial/fintech, IoT, data visualization, and GIS projects.
  • Advanced knowledge of big data analysis and management with Apache tools such as ZooKeeper, Pig, Hive, HBase, and Solr, as well as Cloudera and Elasticsearch.
  • Experience with Terraform or AWS CloudFormation is beneficial.
  • Experience with GitHub, Bitbucket, or similar Git-based technologies is beneficial.
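As one hypothetical illustration of the Kafka pipeline experience listed above, the sketch below uses PySpark Structured Streaming to read a topic and land micro-batches in a data lake. The broker address, topic name, payload schema, and paths are invented placeholders, and running it requires the spark-sql-kafka connector package on the classpath.

```python
# Minimal PySpark Structured Streaming sketch: Kafka -> data lake.
# Broker, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Schema of the JSON payload carried in the Kafka message value.
payload_schema = StructType([
    StructField("user_id", StringType()),
    StructField("merchant_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw stream; Kafka delivers key/value as binary columns.
raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "payments")                   # hypothetical topic
    .load())

# Parse the JSON message value into typed columns.
parsed = (raw
    .select(F.from_json(F.col("value").cast("string"), payload_schema).alias("p"))
    .select("p.*"))

# Write micro-batches to the lake; the checkpoint directory lets the file
# sink recover and avoid duplicate output after restarts.
query = (parsed.writeStream
    .format("parquet")
    .option("path", "s3://example-lake/raw/payments_stream/")
    .option("checkpointLocation", "s3://example-lake/checkpoints/payments/")
    .trigger(processingTime="1 minute")
    .start())

query.awaitTermination()
```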