Who we are
TK Elevator (TKE) is a global leader in vertical transportation and urban mobility. We provide engineering that keeps the world moving, including design, installation, and maintenance of elevators, escalators, walkways, lifts, passenger boarding bridges, stairlifts, platform lifts and home elevators – including multi-brand modernization and service any place, any time. With TK Elevator’s AI and digital solutions there are no longer any limits to urban mobility. TK Elevator became independent following its separation from the thyssenkrupp group in 2020. The company achieved sales of €9.2 billion in fiscal year 2024/2025. With around 50,000 employees, 25,000 service technicians and over 1,000 support centers globally, we are moved by what moves people. TKE – Move Beyond.
To strengthen our global Digital Technology organization, we are looking for an experienced Senior Site Reliability Engineer (d/f/m) who will take ownership of System Health Monitoring across our digital ecosystem and the MAX IoT Platform. In this highly visible role, you will help establish reliability as a core capability across the organization by defining observability standards, driving monitoring excellence, and enabling engineering teams to build resilient, reliable solutions.
Drive Monitoring & Observability Excellence
- Own and continuously evolve System Health Monitoring across products and platforms.
- Define and govern observability standards, monitoring requirements, health models, and alerting strategies.
- Establish a unified view of platform health utilizing Azure observability solutions, Central Log Analytics, Grafana, and DevOps monitoring tools.
- Design, implement, and optimize Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budget frameworks.
Enhance Platform Reliability
- Promote reliability engineering best practices, including health checks, error-budget management, and post-incident learning processes.
- Analyze incident trends and identify opportunities to improve monitoring, alerting, synthetic testing, and operational resilience.
- Transform monitoring data into actionable insights and predictive analytics that proactively identify risks and performance issues.
Enable Teams & Foster Collaboration
- Partner closely with Product, Architecture, DevOps, and Incident Operations teams to improve monitoring coverage and alert quality.
- Provide technical guidance, coaching, and mentorship to engineering teams across the organization.
- Develop and maintain high-quality documentation, including runbooks, monitoring standards, and incident-response procedures.
- Contribute actively to the global DevOps community by sharing knowledge, best practices, and lessons learned.
- Minimum 5 years of experience in Site Reliability Engineering, DevOps, Cloud Operations, or a related discipline.
- Strong expertise in Microsoft Azure and cloud-native technologies.
- Deep knowledge of:
  - Azure Monitor
  - Log Analytics / Kusto Query Language (KQL)
  - Application Insights
  - Grafana
- Proven experience defining and managing SLIs, SLOs, error budgets, and reliability frameworks for large-scale distributed systems.
- Strong understanding of distributed architectures, cloud platforms, and Azure PaaS services.
- Experience with incident management, post-mortem processes, and continuous reliability improvement.
- Solid scripting and automation skills using technologies such as C#/.NET, PowerShell, or Python.
- Excellent analytical, problem-solving, and communication skills.
- Fluency in English (written and spoken) is required.
Nice to Have
- Experience with Power BI, Databricks, or AI-driven observability and analytics solutions.
- Knowledge of Azure DevOps, CI/CD pipelines, Git, and Agile methodologies.
- Bachelor's degree in Computer Engineering or a related technical discipline.
- Health and Safety – Highest standards and a wide range of health promotion and healthcare activities
- Flexibility – We support you, for example, with flexible yet regulated working hours and remote working options
- Collaboration & diversity – Collegiality is of huge importance: we treat everyone with respect and appreciation
- Development – Individual support to help you get started in your new job, as well as training and education programs to help you develop professionally and personally
- Creative leeway – We offer an environment in which you can try out new solutions in a no-blame culture
- Sustainability – We act with responsibility and environmental awareness
- Work environment – We have modern workplaces and IT equipment, subsidized lunchtime meals in the canteen, free parking and discounted public transport tickets
Please apply online in English, including your notice period and salary expectation.
Talent Acquisition
Uwe Hüsken
Permanent | Engineering & urban mobility | Information Technology | Experienced professionals