Site Reliability Engineer

Salary undisclosed

Responsibilities

Ensure all our infrastructure is running in optimal condition.
Deploy, patch, and update all services running on public cloud and on premises.
Identify and resolve support tickets related to our infrastructure and services.
Work closely with developers to produce complete, up-to-date, and readable documentation.
Write documentation for SRE tasks for future reference and better traceability.
Monitor our services using Grafana and identify any bottlenecks. Take immediate action and troubleshoot when necessary.
Maintain and enhance our monitoring system, including but not limited to Grafana, VictoriaMetrics, and Alertmanager.
Work closely with other departments to update and patch our services using our CI/CD tools.
Analyze system logs to better understand service outages and issues.
Perform preventive maintenance on our systems and infrastructure.
Always be willing to learn new technologies and tools.

Requirements

At least 1 year of experience in DevOps, network engineering, SRE, or a related field is required.
Familiar with Linux and networking.
Able to work and solve problems independently when required.
Hands-on experience with Bash scripting.
Basic understanding of how cloud infrastructure (Alicloud, AWS, GCP, and more) works.
Able to work on call.
Willing to learn new technologies such as Grafana, Terraform, GitLab CI/CD, ArgoCD, and Ansible.

Nice to have
