AI Infrastructure & Systems Specialist

Full-time, on-site
SNS Network (M) Sdn Bhd
Petaling Jaya, Malaysia

RM 5,000 - RM 8,000 / month

We are seeking a highly skilled and versatile AI Infrastructure & Systems Specialist to drive the deployment, orchestration, and management of our AI computing infrastructure. This role is critical in ensuring the smooth operation of our GPUaaS platform, optimizing AI workloads, and supporting customers in their AI adoption journey. The ideal candidate should have expertise in GPU systems, AI orchestration platforms, software engineering for AI, and technical leadership.

Key Responsibilities

1. GPU Systems Specialist

· Deploy, configure, and manage NVIDIA H100-powered GPU servers and networking infrastructure (InfiniBand).

· Optimize GPU performance for AI/ML workloads and troubleshoot hardware/software issues.

· Collaborate with NVIDIA and Dell teams for system optimization and technical support.

· Validate GPU fractions and perform the necessary checks and upgrades on GPU device drivers and NVIDIA libraries.

2. Orchestration & Virtualization Specialist

· Implement and manage Run:ai, Kubernetes, and NVIDIA MIG to optimize GPU resource allocation.

· Ensure seamless multi-tenant AI workload management and scaling strategies.

· Automate AI/ML pipeline orchestration for efficient resource utilization.

· Monitor the services exposed through Kubernetes, enable a GitOps model for AI/ML workflows, and ensure the K8s cluster runs optimally.

· Monitor cluster usage, GPU quotas, and storage utilization, and build performance reports on a regular basis to assess how well the K8s GPU stack is functioning.

3. AI Software Engineer

· Work with AI teams to enable model training and fine-tuning using PyTorch, TensorFlow, and RAPIDS.

· Develop and optimize AI/ML workflows in high-performance computing (HPC) environments.

· Integrate AI frameworks with cloud and on-prem GPU clusters.

4. Technical Manager

· Serve as the primary technical advisor to customers, ensuring seamless AI deployment.

· Conduct technical workshops, bootcamps, and onboarding sessions for users.

· Collaborate with universities, enterprises, and startups to drive AI adoption.

Requirements

Bachelor’s/Master’s degree in Computer Science, AI, Data Science, or related fields.
3-5 years of hands-on experience in AI infrastructure, GPU computing, or cloud platforms.
Strong expertise in NVIDIA GPUs, Kubernetes, and workload orchestration.
Experience with AI model training, MLOps, and software engineering.
Excellent troubleshooting, automation, and scripting skills (Python, Bash, Terraform, etc.).
Strong communication and leadership skills to work with both technical and non-technical stakeholders.
Applicants must be willing to work on-site at 3 Two Square, Petaling Jaya.

Nice-to-Have

Experience with Dell PowerEdge servers, the NVIDIA AI Enterprise (NVAIE) suite, and InfiniBand networking.
Knowledge of data engineering, storage solutions (Dell PowerScale), and AI analytics.
Prior experience in a tech leadership role or customer-facing AI solutioning.

Benefits:

ESOS
Overtime
EPF, SOCSO and EIS
Annual Leave (up to 30 days), Medical Leave, Medical Claims
Staff Purchase Scheme
Product Training
Overseas Trips: Asia, America, Australia, Europe
Salary Review - Twice per year

Job Types: Full-time, Permanent

Benefits:

Maternity leave
Opportunities for promotion
Professional development

Schedule:

Monday to Friday

Supplemental Pay:

Overtime pay

Application Question(s):

Are you able to work from Monday to Friday, 8.30am to 6.30pm?

Experience:

AI infrastructure and GPU computing: 3 years (Preferred)

Work Location: In person