Site Reliability Engineer / SRE - Experienced Software Engineer / Developer - Hedge Fund - Market Leading Compensation

Location: London

This organization's engineers own a varied technology stack end-to-end and are constantly searching for incremental improvements, new technologies, and ways of working that evolve the platform and sharpen its competitive edge.

They seek people who want to find unique solutions for optimizing efficiency and performance, in a context where those people are key enablers. The ideal candidate will have deep knowledge of Kubernetes, as the Kubernetes platform is a growing presence and is critical to many parts of the business.

Responsibilities

Collaboratively architect a rock-solid and secure Kubernetes platform that can handle the huge data volumes and load of our diverse technology estate
Accelerate the migration strategy to more cloud-native, distributed applications
Enhance and simplify the on-prem stack and its integrations with our hybrid Kubernetes setup
Create, implement, and evangelize the "Infrastructure as Code" mind-set and best practices across the environment
Eliminate the toil that emerges with large, distributed systems. Identify it, own it, automate it (where possible)
Work both as an individual contributor and collaboratively to find new ways of improving the reliability, availability, and performance of the infrastructure

Key Skills and Technologies

Expert-level scripting / coding skills in one or more languages (Python / Golang / Shell)
Expert in cloud-native and containerisation technologies (Kubernetes / Docker)
Excellent Linux systems knowledge (experience with RHEL desirable)
Experience with configuration management tools (Ansible / Puppet / Kapitan / Terraform)
Broad knowledge across network technologies, server virtualisation, and storage
Experience with observability systems (Prometheus / ELK / Jaeger)
Experience with distributed data platforms (Kafka / Flink / Airflow)
Self-starter, able to quickly pick up concepts, implement new ideas and think outside the box
Focused on improving system availability, security and resilience through testing, standardisation and automation
Ability to simply articulate the "why" behind best practices and to build positive and collaborative relationships with colleagues across teams and geographies