
Site Reliability Engineer

Salary undisclosed


Permanent Position

Job Description: Site Reliability Engineer (SRE)

Experience: 10+ years

Job Location: Kuala Lumpur (KL), Malaysia

Position Overview

We are seeking a skilled Site Reliability Engineer (SRE) to support and enhance the availability, reliability, and performance of applications on Azure Cloud and associated platforms. This role will involve close collaboration with service delivery teams to ensure operational excellence and continuous improvement in application reliability.

Key Responsibilities

- **Application Support**: Oversee the availability, reliability, and performance of applications on Azure Cloud, ensuring business operational SLAs are met.

- **Incident Management**: Work with the service delivery function to manage incidents, operational costs, service improvements, and ongoing application health monitoring.

- **DevOps Integration**: Bridge development and operations by applying a software engineering mindset to application reliability, and instill this culture in agile development teams.

- **Continuous Improvement**: Maintain and improve the resiliency of core applications and infrastructure platforms through a continuous improvement backlog.

- **Automation & Standardization**: Enhance platform infrastructure through automation and standardization.

- **Operational Excellence**: Drive best practices for secure, high-performing, resilient, and efficient infrastructure, as well as cost-optimized applications and workloads.

Key Tasks

- **Risk Management**: Maintain existing automation infrastructure to identify risks related to performance, reliability, capability, and scalability.

- **DevSecOps Automation**: Manage deployment, maintenance, and enhancements of DevSecOps automation workflows, collaborating closely with developers.

- **Root Cause Analysis**: Conduct root cause analyses on incidents and implement code fixes for permanent remediation.

- **Documentation**: Document automation processes, runbooks, and technical administration tasks across all environments.

- **Culture Change**: Champion the adoption of continuous improvement in application reliability and embed these practices in day-to-day development.

Who We Are Looking For

- **Broad Skillset**: A well-rounded engineer with an interest in service reliability, automation, monitoring, scalability, and high-availability systems.

- **Analytical Mindset**: Natural curiosity, initiative, and a willingness to think outside the box to solve problems using engineering approaches.

- **DevSecOps Experience**: Proven track record in the end-to-end design and implementation of DevSecOps practices, and in cloud infrastructure management using Infrastructure as Code (IaC).

- **Support Function Experience**: Experience running support functions for customer-facing products and services, including handling incidents within a service management framework and working with agile methodologies.

- **Modern Architectures**: Understanding of modern DevSecOps architectures and the practical use of containers and Docker.

- **Configuration Management**: Proficiency with tools such as Chef, Ansible, Terraform, and Kubernetes, along with experience in monitoring and metrics platforms like Prometheus and Grafana.

- **Coding Skills**: Proficiency in various programming languages and a pragmatic approach to solving problems.

- **Technology and Culture Change**: Experience in driving technology, process, and cultural changes to instill site reliability in development practices.

Preferred Technologies

- **DevSecOps**: Prometheus, Grafana, Azure Cloud Management, Azure Monitor, Azure Application Insights, Splunk, SolarWinds, Azure DevOps, Docker, Kubernetes, Ansible, YAML, Atlassian toolchain, SonarCloud, Microsoft SQL Server, Networking (DNS, HTTP, WAF, Load Balancing, Reverse Proxy), Web Servers (Nginx, Apache/Tomcat).

- **Development**: Angular.js, Node.js, SQL, Apache Kafka.

What We Offer

- A dynamic and collaborative work environment.

- Opportunities for professional development and continuous learning.

- Competitive salary and benefits package.

If you are passionate about site reliability and eager to make a significant impact, we would love to hear from you!

If you are interested in this position, please send your updated resume to [email protected]

Please also provide the following details:

Current Salary:

Expected Salary:

Notice period:

Total Years of Experience:

Visa:

Nationality:
