Job Title: Site Reliability Engineer
Work Mode: 3 days per week in the office (mandatory)
Location: 5 Broadgate, London EC2M 2QS, United Kingdom
Contract Duration: 12 months
We’re looking for a Site Reliability Engineer to:
· determine the reliability of our digital products, technology services, and the infrastructure that underpins them
· minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto-scaling, or self-healing
· respond to production incidents to gain first-hand experience of operational hotspots and to identify the root causes of problems
· collect and analyze operational data, define and monitor key metrics to identify and communicate areas for improvement
· apply a broad range of engineering practices with a focus on reliability, from instrumentation, performance analysis, and log analytics to automated testing, deployment, and operations
· ensure the quality, security, reliability, and compliance of our solutions by applying our digital principles and implementing both functional and non-functional requirements
Your expertise
ideally 5+ years of experience in an Application Support role within the financial services industry
excellent verbal and written communication skills along with strong collaboration skills
experience with one or more scripting languages
experience with Application Performance Monitoring (APM) tools
knowledge of Linux OS
knowledge of IP networking
knowledge of troubleshooting Java applications
knowledge of application and web servers (NGINX, Apache)
knowledge of containerization (Docker, Kubernetes)
knowledge of provisioning cloud infrastructure using Terraform
knowledge of cloud computing and managing cloud environments (Azure preferred)
knowledge of CI/CD fundamentals
ability to drive automation to eliminate toil