Senior Site Reliability Engineer

Full Time, onsite
BigPay Malaysia Sdn Bhd
Kuala Lumpur, Malaysia

Salary undisclosed

We’re looking for a site reliability engineer. Your job will be to look after the stability and performance of our production systems as a whole and ensure that they continue to run without incident. When incidents occur, you will be on the front line of investigation and remediation, and you will direct action from our backend and other teams.

Responsibilities

- Continuously monitoring our distributed automated systems, and responding to and reporting incidents.
- Having a wide-ranging understanding of our operational products as a whole (especially around financial services such as real-time payments, remittances and lending), sufficient to intervene in any one of them to correct issues and tune performance.
- Planning and building alerting, monitoring and other operational systems to detect and correct problems.
- Owning the system monitoring and observability stack and enhancing the alerting mechanisms.
- Jointly owning the system deployment lifecycle with Backend Engineering, acting as the DevOps bridge for engineers.
- Writing code to reduce the risk of incidents and to make them easier to diagnose when they do occur.
- Writing code to perform performance and functionality checks at all stages of the software development life cycle.
- Providing education and awareness to other teams of the impact of their work on yours, and encouraging working practices that improve reliability.
- Proposing and leading changes to our architecture, and giving feedback to management about reliability experiences in practice and what we can do to improve our uptime.
- Capturing and providing metrics with sufficient detail for audit purposes.

This is a wide-ranging brief that covers a range of complex software engineering topics, and you will be expected to become familiar with a large, multi-product codebase which responds in sometimes complex ways to varying conditions.

This role will obviously require some out-of-hours work; however, we operate a rota and a follow-the-sun model, which should minimise the amount of time you need to spend working outside normal hours.

Tech stack

Our backend consists of services (microservices or domain-separated monoliths, depending on product) written in a variety of languages (Rust, Java, Kotlin), communicating via Kafka and REST APIs and running on Kubernetes in Google Cloud. Most modern code is in Rust. Apps in BigPay are mostly written in Dart using Flutter. Web interfaces are built in Angular or React, and we use Python for tooling and data manipulation. Our data is in various databases, from PostgreSQL to BigQuery. We are technology-agnostic and will adopt the best tool for a job. All our code goes through PR and ships to production via a continuous delivery pipeline.

To be successful

You should have:

- At least 3 years' experience in software development or SRE.
- A good first degree in Computer Science or a related discipline.
- Solid demonstrated experience running and managing high-performance systems on the cloud (GCP preferred).
- Knowledge of modern DevOps and SRE practices.
- An enthusiasm for technology and an ability to learn new things quickly.
- A self-starter's mindset: willing to take responsibility for deliverables and able to organise to deliver.

Experience with the following is highly preferred:

- Linux, K8s, Docker, Terraform
- Rust, Kotlin, or Java Spring Boot
- Kafka
- React or Angular
- Python

We would particularly welcome applications from people who can work across boundaries between development, DevOps, mobile and web frontend.

Why BigPay?

- Join a fearless adventure, where your opinion and input are highly valued.
- Work in a fast-paced, growing company where you will be empowered to succeed.
- An environment where you can challenge and be challenged.
- You will be surrounded by a multidisciplinary group of experts.
- Competitive salary and benefits.