Site Reliability Engineer (DevOps, Linux)

RM 6,000 - RM 7,999 / month

Site Reliability Engineer

Must Have:
• Experience in observability, capacity planning, issue analysis, and troubleshooting of large-scale, massively distributed, fault-tolerant systems running cloud-native applications in a microservices architecture;
• Ability to debug scripts and automate routine tasks on OS, network, database, or application servers. Coding experience beyond simple scripts;
• Experience with DevOps processes and programming knowledge in at least one of the following languages: Java, Python, or Go;
• Scripting skills in at least one of the following: Shell, Terraform, Ansible, Chef, or Puppet;
• Deep understanding of Unix/Linux operating systems, virtual machines, containers, container management systems, enterprise cloud platforms, and data structures;
• Engage in and improve the lifecycle of services, from launch through deployment, operation, and optimization of reliability and user experience;
• Ensure services are reliable once they are live by measuring and monitoring availability, latency, and overall system health. Practice sustainable incident response;
• Gather and analyze metrics from the tech stack to assist in performance tuning and fault handling for P0/P1/P2/P3 issues;
• Contribute system design suggestions and participate in platform management; balance feature development speed and reliability with well-defined service level objectives;
• Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.

Good to Have:
• Automation framework design using popular frameworks such as SaltStack, Spinnaker, and StackStorm;
• Experience managing large-scale big data clusters;
• Experience in chaos engineering design and implementation;
• Experience with large-scale container management platforms with auto scaling and intelligent scheduling;
• Experience in data analysis, data science, or data development on petabyte-scale (PB) data;
• Experience with SIEM, threat modelling, and vulnerability detection design, deployment, and optimization;
• Experience in cloud services network design, rules/policy creation, deployment, and performance tuning;
• Experience in DB consistency detection, slow query tooling, and performance tuning of middleware including RDBMS, NoSQL, and distributed caches.

Professional Knowledge Requirements:
• Bachelor's degree or above in Computer Science or Electronics & Communication;
• In-depth knowledge of the SRE role and DevOps processes;
• Strong observation and critical-thinking skills to handle business emergencies;
• Ability to adapt to a dynamic environment and apply problem-solving skills to resolve issues;
• Excellent written and verbal communication skills;
• Deep exposure to decision-making scenarios based on data analysis;
• Established record of continuous learning, upskilling, and tracking industry trends.