Site Reliability Engineer

Spectrum IT Recruitment
Posted a day ago, valid for 5 hours
Location

Southampton, Hampshire SO19 1BQ

Contract type

Full Time

Life Insurance
Employee Assistance

In order to submit this application, a Reed account will be created for you. As such, in addition to applying for this job, you will be signed up to all Reed’s services as part of the process. By submitting this application, you agree to Reed’s Terms and Conditions and acknowledge that your personal data will be transferred to Reed and processed by them in accordance with their Privacy Policy.

Sonic Summary

  • The Site Reliability Engineer position is based at the Southampton HQ, requiring in-office attendance two days a week and security clearance.
  • Candidates should have 3-6 years of hands-on experience in systems engineering, automation, and service reliability.
  • The role offers a competitive salary along with benefits including life insurance, private medical insurance, and a hybrid working model.
  • Key responsibilities include monitoring system performance, collaborating with developers, and implementing automated solutions for scalable systems.
  • Preferred qualifications include experience with Kubernetes, cloud platforms like AWS, and proficiency in programming languages such as Python or Go.

Site Reliability Engineer

Southampton HQ - 2 Days a Week in the Office

Cloud, SaaS, AWS

Please note that security clearance is required for this position.

We are working with one of our longstanding clients to help them recruit a Site Reliability Engineer. The company delivers cutting-edge enterprise software solutions across both cloud and on-premises environments, empowering organisations to enhance customer experiences, maintain regulatory compliance, and proactively fight fraud. It is trusted by businesses worldwide to drive seamless, intelligent customer interactions.

In this role, you'll oversee the production environment by ensuring system availability and maintaining a comprehensive perspective on overall health. You'll develop tools and software to support and streamline the management of platform infrastructure and key applications. A major focus will be enhancing the dependability, performance, and delivery speed of our software products. You'll also be responsible for analysing and fine-tuning system performance to anticipate user demands and drive innovation. Additionally, you'll take the lead in providing operational support and technical oversight for several large-scale distributed applications.

How You'll Contribute:

  • Monitor and interpret system and application metrics to fine-tune performance and troubleshoot issues effectively
  • Collaborate closely with developers to enhance service quality through thorough testing and structured release practices
  • Engage in architectural discussions, manage platform operations, and contribute to capacity forecasting
  • Design and implement automated solutions to build resilient, scalable systems
  • Maintain a strong focus on delivering new features while ensuring stability and adherence to service level goals

You'll Stand Out If You Have:

  • Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus
  • Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo
  • Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck
  • Experience using configuration management platforms like Ansible, Puppet, or Chef
  • Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer, or similar credentials

Do You Have What It Takes?

  • 3-6 years of hands-on experience in a similar role, with a strong emphasis on systems engineering, automation, and service reliability
  • Proficient in at least one programming language such as Python, Go, Java, or C#, along with scripting skills in Bash or PowerShell
  • Solid grasp of cloud platforms like AWS, including an understanding of how core services like EC2, ECS, Lambda, and DynamoDB operate under reliability constraints
  • Practical experience using infrastructure-as-code tools like CloudFormation or Terraform
  • In-depth knowledge of CI/CD principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI
  • Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture
  • Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch
  • Excellent analytical and troubleshooting abilities, especially within complex distributed systems
  • Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during critical outages

Benefits

  • Life Insurance - 4 x Annual Salary
  • Private Medical Insurance
  • Employee Assistance Programme
  • Hybrid Working - 3 Days from Home
  • GP Online Assistance Portal
  • + Much More

Please click the "Apply" button to state your interest in this position.

Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.
