Monitoring Specialist

Salary undisclosed

Hiring for an Infra Monitoring role based in Kuala Lumpur, Malaysia.

Roles & Responsibilities:

  1. Every morning, prepare the Daily Health Check for all servers, network devices, and storage devices by 7:30 AM MYT/SGT using the BMC Impact Explorer tool.
  1. Once the health check is prepared, escalate all existing alerts (Critical, Major, Warning, and Minor) to the respective Infra teams, post each team's critical alert count in the management WhatsApp group, and email management and tower leads so that every team manager is aware of their existing alerts.
  1. After escalating and posting in WhatsApp, coordinate with the on-call persons on critical alerts only and obtain resolution updates from them before 9:30 AM MYT/SGT. After receiving updates from all teams, clear the alerts manually in the monitoring tool, then post another WhatsApp update confirming that all critical alerts are resolved and cleared.
  1. For the remaining Major, Warning, and Minor alerts, obtain updates by email from the respective team members by end of day.
  1. Critical, Major, Warning, and Minor alerts are configured to be sent to the monitoring team DL every day, and the associate on shift must escalate each alert to the respective team within 15 minutes.
  1. Escalations of all critical alerts should also be posted in the Microsoft Teams tower groups, and it is essential for the shift analysts on duty to get updates from the team members. If the analysts on duty do not get an update from the respective tower engineers, they need to call via Microsoft Teams or the mobile hotline number, as per the IOC roster, to obtain one.
  1. If critical alerts (e.g. server, network, process, storage array, or DB down; URL down) are received at night (Malaysia time), the associate on duty must call and inform the on-call person. If the on-call person does not respond, the associate must call the respective tower leads and escalate.
  1. The monitoring tool currently has an issue, and alerts are not being received via the monitoring team DL. Under the current process, analysts on duty must monitor manually by checking and validating all alerts in the monitoring tool and escalating them to the respective teams. We anticipate that more than 1000 alerts will exist in the tool.
  1. Certificate management: once a month, generate a report of certificates expiring in the current month from the DigiCert portal and escalate by email to the respective teams to get updates on certificate renewal.
  1. The monitoring team receives a CSR file to renew a certificate. Once this file is received, the analysts on duty need to renew the certificate through the DigiCert portal and share it with the respective team or owner.
  1. By month end, once all certificates have been renewed, decommissioned, or marked as not required, update management on the certificate counts and statistics.
  1. For every certificate renewal, the respective team should raise an NSR ticket; the monitoring team associate renews the certificate based on that ticket.

The monitoring team also performs domain creation and validation through the DigiCert portal and coordinates with the team that manages the DNS servers.

  1. For any certificate-related troubleshooting issues, raise a case with the DigiCert support team and coordinate with them by email or chat until the issue is resolved.
  1. Actively attend to the monitoring team mailbox during the shift to address queries, requests, escalations, issues, etc. from clients or internal teams.
  1. Analysts on duty are also required to take appropriate action on script-based configuration alerts, as per the escalation process.