Site Reliability Engineer / SRE - Experienced Software Engineer / Developer - Hedge Fund - Market Leading Compensation

Location: London

This organization's engineers own a varied technology stack, end-to-end, and are in constant search of incremental improvements, new technologies, and ways of working to evolve our platform and give us a competitive edge.

They seek people who want to find unique solutions for optimizing efficiency and performance in a context where they are key enablers. The ideal candidate will have deep knowledge of Kubernetes, as our Kubernetes platform is a growing presence and is critical to many parts of the business.

Responsibilities

  • Collaboratively architect a rock-solid and secure Kubernetes platform that can handle the huge volumes of data and load of our diverse technology estate
  • Accelerate the migration strategy to more cloud-native, distributed applications
  • Enhance and simplify the on-prem stack and its integrations with our hybrid Kubernetes setup
  • Create, implement, and evangelize the "Infrastructure as Code" mind-set and best practices across the environment
  • Eliminate the toil that emerges with large, distributed systems. Identify it, own it, automate it (where possible)
  • Work both as an individual contributor and collaboratively to find new ways of improving the reliability, availability, and performance of the infrastructure

Key Skills and Technologies

  • Expert-level scripting / coding skills in one or more languages (Python / Golang / Shell)
  • Expert in cloud-native and containerisation technologies (Kubernetes / Docker)
  • Excellent Linux systems knowledge (experience with RHEL desirable)
  • Experience with configuration management tools (Ansible / Puppet / Kapitan / Terraform)
  • Broad knowledge across network technologies, server virtualisation, storage
  • Experience with observability systems (Prometheus / ELK / Jaeger)
  • Experience with distributed data platforms (Kafka / Flink / Airflow)
  • Self-starter, able to quickly pick up concepts, implement new ideas and think outside the box
  • Focused on improving system availability, security and resilience through testing, standardisation and automation
  • Ability to simply articulate the "why" behind best practices and to build positive and collaborative relationships with colleagues across teams and geographies