Software Engineer

  • Full-time, on-site
  • MANPOWER STAFFING SERVICES (M) SDN BHD
  • Petaling Jaya, Malaysia
Salary undisclosed

Job Description

The Day-to-Day Activities:
  • Deliver high-quality AI infrastructure solutions: You will work with the Machine Learning Platform team to design and develop the infrastructure that supports distributed data processing and model training. You will use GitOps to ensure the system's cloud infrastructure is reproducible across different Kubernetes clusters.
  • Develop observability solutions for Machine Learning pipelines: You will develop and integrate monitoring and alerting within Grab’s monitoring stack, which is powered by Datadog, Prometheus, and Grafana. You will also contribute to runbooks and DevOps guides.

The Must-Haves:
  • Understanding of Terraform and popular modules such as EKS.
  • Ability to understand complex code bases and analyze dependencies.
  • Understanding of Kubernetes and experience managing large clusters.
  • Understanding of core AWS cloud concepts such as EC2, Auto Scaling groups, launch templates, and subnets.
  • Understanding of core cluster components such as CoreDNS, the cluster autoscaler, CSI drivers, load balancer controllers, and service mesh.
  • Ability to perform zero-downtime cluster upgrades for clusters serving critical online traffic and batch jobs with tight SLAs.