Chaos Engineering Specialist

Full Time, onsite
Payment Network Malaysia
Kuala Lumpur, Malaysia | Engineering - Software (Information & Communication Technology) | Full time

Salary undisclosed

A) SUMMARY OF RESPONSIBILITIES

We are looking for a skilled and passionate Platform Engineer with expertise in Chaos Engineering and resiliency testing. The ideal candidate will have a strong background in distributed systems, cloud infrastructure, and container orchestration. You will be responsible for designing, implementing, and managing chaos experiments to test the resilience of our platform. Your work will directly contribute to our platform's ability to withstand and recover from unexpected failures, ensuring continuous and reliable service for our clients.

B) KEY AREAS OF RESPONSIBILITIES

Develop and implement chaos engineering strategies to test the resilience of our platform infrastructure.

Design, execute, and automate chaos experiments using tools such as Gremlin, Chaos Mesh, Litmus, or similar.

Collaborate with platform engineering and DevOps teams to identify critical systems and components for testing.

Build and maintain a robust monitoring and observability framework to analyze the impact of chaos experiments.

Identify weaknesses in the current infrastructure and provide recommendations for improvement.

Integrate chaos engineering practices into CI/CD pipelines using GitOps tools like ArgoCD and Atlantis.

Contribute to the development and maintenance of Kubernetes clusters, AWS EMR, AWS MSK Kafka, and vSphere environments.

Utilize Terraform for infrastructure as code (IaC) to manage cloud resources.

Participate in the on-call rotation and assist with incident management and root cause analysis.

Stay up to date with the latest trends and best practices in chaos engineering, resiliency testing, and cloud infrastructure.

C) FUNCTIONAL COMPETENCIES

Strong understanding of Kubernetes, Docker, and container orchestration.

Proficiency in AWS services, including EMR and MSK Kafka, plus experience with vSphere.

Experience with infrastructure as code (IaC) tools, particularly Terraform.

Familiarity with GitOps practices and tools such as ArgoCD and Atlantis.

Hands-on experience with chaos engineering tools (e.g., Gremlin, Chaos Mesh, Litmus).

Solid understanding of distributed systems, microservices architecture, and cloud-native technologies.

Excellent problem-solving skills and a proactive approach to identifying and addressing potential issues.

Strong communication skills and the ability to work effectively in a collaborative team environment.

D) QUALIFICATIONS & EXPERIENCE

Minimum Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

3+ years of experience in platform engineering, site reliability engineering (SRE), or DevOps roles, with a focus on chaos engineering.
