Senior Site Reliability Engineer

Bengaluru, KA 560002 Work Remotely
Job Description

JD Site Reliability Engineer

Purpose of Position

As a Site Reliability Engineer / NOC Engineer, you will contribute to a global team responsible for performing 1st and 2nd level Event, Incident, and Problem management activities in a complex and highly technical environment. This position also helps lead continuous product improvements, assists with non-negotiable projects that support the Department Goals, performs system-wide upgrades, and occasionally acts as an individual contributor or SME on special Ops projects. The Site Reliability Engineer will be accountable for monitoring multiple applications and network infrastructure across a hybrid environment of physical datacenters and cloud environments, at a local and worldwide level, in a 24/7/365 production environment. This position is responsible for starting and driving the RCA/RFO/PIR process for any P1 issues that occur during their shift. This position will receive general direction on new assignments and has work reviewed by the Operations Management team for soundness of technical judgment, quality, and business sense.

Principal Responsibilities

NOC

* Monitor alerts from various tools and internal reports

* Work on troubleshooting the alerts and escalate based on the priority and the impact of the alerts

* Ability to analyze the alert and engage the required teams in a timely manner to avoid a potential outage

* Follow the documented SOPs and work on resolving the alerts as per the SLA

* Verifies that all incoming operations incidents are in the ticketing system

* Responsible for escalating and prioritizing any unresolved issues to the appropriate on-call staff so the ticket can be closed in a timely manner and reports any violations to Ops management

* Work on issues related to our datacenters, cloud environments, network infrastructure, hardware and/or applications.

* Responsible for executing operational objectives and ensuring that the teams meet or exceed service level expectations by following defined resolution and escalation procedures and pre-defined intra-company outage communications and updates

* Meets regularly with operations teams, development, and other Site Reliability staff to prioritize future stage and live application, deployment, or project tasks.

* Proofing and recommending updates, patches, replacements or upgrades to current Site Reliability Software tools and Monitoring systems

Incident Management:

* Ability to run bridge calls for all P1/P2 outages that cause impact to customers

* Escalate and do the needed communications across various teams and update the higher management on the progress

* Ensure customers are notified in a timely manner of the ongoing progress during the outage, as per the process

* Ability to connect the dots and arrive at the big picture and confidence to run the incident bridge efficiently

* Ability to summarize the incident at any point and engage the required application and/or engineering teams in a timely manner to maintain product availability

* After the outage, able to conduct the post-mortem/root cause analysis meeting with all the involved stakeholders as per the SLA

* Able to comprehend the application and/or product details and convey the same in layman's terms to upper management and customers as required

Problem Management:

* Identify trends and their causes, and work with the respective teams to arrive at workable solutions such as enhancing monitoring, process adherence, etc.

* Create/Update incident and problem management procedures to be used by the 1st Level and 2nd Level Site Reliability Technicians

Other Duties and Responsibilities

* Regularly participates in the Shift Handover process with previous and incoming shift teams to help sync and transfer any ongoing issues or outages

* Available for on-call and emergency response rotation as needed

* Maintains the Escalation contact matrix and processes to ensure that all levels of the Support Organization are listed, audits this list frequently, and works with other staff and team members to maintain the on-call status of other Operations and Development personnel

* Responds to any additional needs coming from his/her Direct Management

* Ensures that the other members of the team follow and enforce the Ops Change Control procedures and immediately escalate any violations to Ops management

Knowledge and Skills

* Bachelor's degree or equivalent experience required

* 6 to 8 years' experience in a technical or network operations support environment.

* Strong written and verbal communication skills are a must

* Basic understanding of TCP/IP networking, SNMP, UNIX/Linux/Windows Server Operating Systems, HTTP/HTTPS, SSH, etc.

* Linux Certification or equivalent experience required, with demonstrated understanding of basic command line tools to investigate application alerts.

* Basic cloud knowledge of AWS core services.

* Experience in Change Management & Problem Management domains

* ITIL certification is a plus


At LogMeIn, Inc., we build category-defining products that unlock the potential of the modern workforce, making it possible for millions of people and businesses around the globe to do their best work, whenever, however, and most importantly, wherever. We're a pioneer in remote work technology and a driving force behind today's work-from-anywhere movement, and have become one of the world's largest SaaS companies with tens of millions of active users, more than 3,500 global employees, over $1.2 billion in annual revenue and more than 2 million customers worldwide who use our software as an essential part of their daily lives. We're headquartered in Boston, Massachusetts with additional locations in North America, South America, Europe, Asia and Australia.

Industry

  • Telecommunications
Posted: 2021-06-11 Expires: 2021-07-12

For bold and creative individuals, LogMeIn provides limitless growth opportunities. We hire extraordinary talent who continually seek opportunities to tackle challenges. We pride ourselves on an inclusive culture and collaborative spirit. Speaking up and listening to others is not just encouraged here, but expected.

We thrive together and champion each other’s successes, providing our employees with rich experiences to help them develop resiliency and skills; positioning them to grow into future roles either inside or outside LogMeIn.

If you are interested in bringing your curiosity and courage to challenge the status quo, start your journey by applying below.

A position at LogMeIn will reward you with the opportunity to grow, innovate, have fun and do the best work of your career.
