Site Reliability Engineer (HYBRID)
RM 13,000 - RM 15,999 per month
Key Responsibilities
- Design robust system architectures to ensure high availability and scalability.
- Create automation tools to improve efficiency and reduce manual work.
- Monitor and analyze system reliability using SLOs and SLIs.
- Perform root cause analysis after incidents and implement improvements.
- Work with development and operations teams to establish reliability best practices.
- Troubleshoot and resolve issues with databases, networks, and deployments, including platform-level problems (e.g., Kubernetes).
- Ensure timely issue resolution while meeting Service Level Agreements (SLAs).
- Identify system performance bottlenecks and recommend improvements.
- Document processes and incident responses to support knowledge sharing.

Qualifications
- Skilled in programming (e.g., Python, Golang, Java) for operational tasks.
- Experienced in designing scalable and reliable systems.
- Strong understanding of SRE principles, including SLOs, SLIs, and incident management.
- Proficient in managing cloud platforms like AWS, Azure, or Google Cloud.
- Expertise in Linux system administration.
- Proven ability to troubleshoot performance and connectivity issues.
- Knowledgeable in networking concepts and problem-solving techniques.
- Strong analytical skills and a proactive mindset.
- Comfortable working both independently and as part of a team.

Preferred Skills
- Experience with monitoring tools and performance tuning.
- Proficiency in scripting for system administration.
- Knowledge of networking and troubleshooting methods.
- Hands-on experience with cloud services (e.g., AWS, Azure, Google Cloud).
- Familiarity with DevOps practices like CI/CD, infrastructure as code, and containerization.