System Engineer (CPU/GPU)

Full Time, onsite
Eps Ventures Sdn Bhd
Kuala Lumpur, Malaysia

RM 5,000 - RM 8,000 / month

The ideal candidate has a strong background in system architecture, cloud infrastructure, and GPU technologies. The role involves designing, implementing, and maintaining robust systems that support AI, machine learning, and high-performance computing workloads. It is instrumental in ensuring the reliability, scalability, and efficiency of our GPU services, enabling our customers to achieve exceptional performance for their AI projects.

Key Responsibilities:

Design and deploy scalable, high-performance systems that leverage GPU resources for AI applications. Ensure optimal configuration for various workloads.
Manage and maintain the underlying infrastructure, including servers, networking, and storage solutions. Monitor system performance and implement necessary optimizations.
Work closely with software engineers and customers to understand their requirements and provide the necessary infrastructure support for AI model training and deployment.
Implement, manage, monitor, and maintain the platform to ensure optimal performance and high reliability.
Provide technical guidance across complex infrastructure projects.
Diagnose and resolve system issues related to hardware, software, and network performance. Provide technical support for internal teams and customers as needed.
Develop automation scripts to streamline system deployment, monitoring, and maintenance tasks.
Identify enhancements to daily processes and to troubleshooting and ticketing procedures.
Create and maintain comprehensive documentation for system configurations, processes, and procedures to ensure knowledge sharing within the team.

Qualifications:

Bachelor’s degree in Computer Science or a related technical field
Proven experience (3+ years) as a System Engineer or in a similar role within IT infrastructure or cloud services.
Ability to introduce technology and software that improve the performance, resiliency, and quality of service of IT infrastructure.
Strong experience in managing bare metal servers, GPU infrastructure, or high-performance computing systems.
Familiarity with monitoring tools (e.g., Prometheus, Grafana) and logging frameworks.
Deep understanding of Linux fundamentals.
Understanding of Kubernetes environments and the ability to debug issues in them.

Job Type: Permanent

Pay: RM5,000.00 - RM8,000.00 per month

Benefits:

Opportunities for promotion
Professional development

Schedule:

Monday to Friday

Application Question(s):

How many years of experience do you have working with GPU clusters?
Are you familiar with the Linux environment?
How long is your notice period?
What is your expected salary?

Work Location: In person