Site Reliability Engineer
Salary undisclosed
Responsibilities
- Ensure all our infrastructure runs in optimal condition.
- Deploy, patch, and update all services running on public cloud and on premises.
- Identify and resolve support tickets related to our infrastructure and services.
- Work closely with developers to provide complete, up-to-date, and readable documentation.
- Write documentation for SRE tasks for future reference and better traceability.
- Monitor our services using Grafana and identify any bottlenecks. Take immediate action and troubleshoot when necessary.
- Maintain and enhance our monitoring system, including but not limited to Grafana, VictoriaMetrics, and Alertmanager.
- Work closely with other departments to update and patch our services using our CI/CD tools.
- Analyze system logs to better understand service outages and issues.
- Perform preventive maintenance on our systems and infrastructure.
- Always be willing to learn new technologies and tools.
Requirements
- At least 1 year of experience in DevOps, network engineering, SRE, or a related field is required.
- Familiarity with Linux and networking.
- Able to work and solve problems independently when required.
- Hands-on experience with Bash scripting.
- Basic understanding of how cloud infrastructure (Alicloud, AWS, GCP, and others) works.
- Able to work on call.
- Willing to learn new technologies such as Grafana, Terraform, GitLab CI/CD, ArgoCD, and Ansible.
Nice to have
- Understanding of how Docker and Kubernetes work.
- Programming experience (Python and Go).
- Basic understanding of Terraform, Ansible, and Packer is a plus.
- Hands-on knowledge of cloud computing, Kubernetes, GitLab, etc.
- Hands-on knowledge of Terraform and Ansible.