Site Reliability Engineer

Location Singapore
Discipline
Job Reference BBBH84145_1573808215
Salary Negotiable
Consultant Email james.andalan@manpower.com.sg
EA License No. 02C3423


The Site Reliability Engineer (SRE) mentors the development team in writing code that is optimized for scale, automated operations, and graceful degradation. The SRE is responsible for operational support of the product, while making sure this support does not take up more than 50% of their time, and works with the team to implement new features in the time remaining after operations tasks. The SRE assists the team in implementing the DevOps strategy. The role requires expertise in all systems used for monitoring and troubleshooting, and the SRE assists the team in troubleshooting.

Key Responsibilities:

  • Develop the initial on-call playbook and keep it updated as the product evolves and as features are added, removed or changed.
  • Own change management for the team and develop actionable plans to minimize outage risks tied to changes, focusing on:
    o Progressive rollouts
    o Quick and accurate troubleshooting
    o Efficient and reliable rollback of changes when required
  • Forecast demand and plan for adequate (optimal) capacity to satisfy natural product usage cycles and any surge demand, such as that from marketing or promotional campaigns, without exceeding the computational budget agreed for the team.
  • Ensure load-shifting is completed as required to address usage variations and scheduled maintenance windows.

    Target Deliverables:
    1. Define and standardize the non-functional requirement (NFR) needs of the business, with precise NFR definitions for the microservices and APIs.
    2. Establish a process for production incident reports and post-mortem reports, and conduct retrospective sessions. Guide ProdOps in resolving critical incidents.
    3. Define critical performance KPIs, set alert rules and roll out monitoring dashboards for production, with timely reporting to the stakeholders.
    4. Review the performance certification report before the application goes live and ensure the performance recommendations are part of the change request process.
    5. Create coding best practices for application development from a performance perspective. Actively participate in design and code reviews.
    6. Establish an application performance benchmark for the given infrastructure specification and derive the tunable parameters.
    7. Engage with the Infra/ProdOps team to forecast capacity requirements.
    8. Propose new ideas to create sustainable efficiency.
       a. Action items with appropriate timelines (short term vs. long term)

    Skills and Qualifications:

    At least 10 years of hands-on experience in the banking domain covering application development, DevOps, and ProdOps processes with end-to-end visibility of the system, with 4 years of hands-on experience as an SRE.

Rvin James Murillo Andalan EA License No. 02C3423 Personnel Registration No. R1331697

Please note that your response to this advertisement and communications with us pursuant to this advertisement will constitute informed consent to the collection, use and/or disclosure of personal data by ManpowerGroup Singapore for the purpose of carrying out its business, in compliance with the relevant provisions of the Personal Data Protection Act 2012.