Epicareer Might not Working Properly
Learn More

Site Reliability Engineer

Salary undisclosed

Apply on


Original
Simplified
  • System Administration: Manage and maintain server environments, ensuring high availability and performance. Troubleshoot and resolve system issues promptly
  • Coding and Automation: Develop and maintain automation scripts and tools in Python to improve operational efficiency and reduce manual intervention
  • UNIX Scripting: Write and optimize UNIX shell scripts for monitoring, alerting, and automation of routine tasks
  • Collaboration: Work closely with development teams to implement reliable systems and applications, providing insights on best practices for reliability and performance
  • Incident Response: Participate in on-call rotations and respond to incidents, performing root cause analysis and implementing preventive measures
  • Documentation: Create and maintain documentation for system architecture, processes, and troubleshooting procedures
  • Continuous Improvement: Identify opportunities for process improvement and implement solutions to enhance system reliability and performance


Requirements

  • Bachelor's degree in Computer Science, Information Technology, or a related field
  • Proven experience in system administration and managing Linux/UNIX environments
  • Strong coding skills in Python and experience with scripting languages
  • Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack)
  • Excellent problem-solving skills and ability to troubleshoot complex systems
  • Strong communication and collaboration skills