SRE / Observability Engineer

Salary undisclosed

Job Description:

  • Configure monitoring, alerting, dashboards, and reporting for application performance monitoring using the Dynatrace APM tool

  • Analyze application performance issues by performing root cause analysis within Dynatrace

  • Make technical recommendations on monitoring improvements (by creating technical PoCs to demonstrate performance improvements), and partner with the Engineering team on new standards for logs, business events, and tags

  • Build out new Dynatrace monitoring solutions, including the testing and implementation of new features and metrics

  • Monitor frontend application performance with Real User Monitoring (RUM), and create and monitor synthetic transactions

  • Perform assessments to identify the scope of problems and escalate recurring issues to management

  • Train application teams on using Dynatrace for root cause analysis to resolve issues as well as provide guidance to address their needs

  • Maintain application and product expertise

  • Keep abreast of new Dynatrace features and how they affect licensing and monitoring opportunities

  • Communicate effectively with all levels, including customers, technical personnel, and management

  • Thoroughly document processes and standard operating procedures


Job Qualifications:

  • Bachelor's degree in computer science or a related discipline, or equivalent work experience is required.

  • 2-3 years of experience with APM tools such as Dynatrace

  • Expertise in Python and shell scripting and automation

  • Expertise in application instrumentation and monitoring

  • Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)

  • Ability to collaborate with development and support teams to resolve performance-related issues

  • Able to function rationally and methodically in a high-productivity environment

  • Proven understanding of web technologies and distributed application architecture is required

  • Proven understanding of full life cycle software development methodologies is required

Preferred Skills:

  • Must have experience with APM tools such as Dynatrace

  • Must have excellent debugging and troubleshooting skills

  • Strong communication skills

  • Good to have: knowledge of log analysis tools such as Splunk or ELK

  • Should have knowledge of Windows Server 2008-2019, Linux, Solaris, and AIX

  • Experience working with the Service Reliability Engineering team to design SLIs and SLOs for respective applications

  • Experience designing and building Service Level Indicator (SLI) metrics, including but not limited to Service Level Objectives (SLOs), error budgets, and burn rate alerts
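
For candidates less familiar with the SLO terms in the last item, the arithmetic behind an error budget and a burn rate alert can be sketched roughly as follows. This is an illustration only, not this employer's method and not a Dynatrace API: the function names, the two-window check, and the 14.4 threshold are assumptions chosen for the example.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to fail under the SLO (about 0.001 for 99.9%)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 means the budget is being spent exactly at the rate
    that would exhaust it at the end of the SLO period.
    """
    if total_events == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    observed_error_rate = bad_events / total_events
    return observed_error_rate / error_budget(slo_target)


def should_alert(short_window_rate: float, long_window_rate: float,
                 threshold: float = 14.4) -> bool:
    """Hypothetical two-window check: fire only if both windows burn fast.

    The default of 14.4 is an assumed example value; it corresponds to
    spending 2% of a 30-day budget within one hour.
    """
    return short_window_rate >= threshold and long_window_rate >= threshold
```

For example, 20 failed requests out of 1,000 against a 99% SLO gives an observed error rate of 2% against a 1% budget, a burn rate of roughly 2.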