Job Description:
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Maintain services once they are live by measuring and monitoring.
- Troubleshoot and remediate issues with the services you manage.
Requirements:
- Experience with DevOps/SRE.
- Expertise in key SRE skills (scalability, reliability, and observability).
- Experience with containerization and orchestration tools such as Docker and Kubernetes.
- Experience with centralized configuration management tools such as Ansible.
- Experience with CI/CD tools and deployment processes, such as GitLab or Azure DevOps.
- Strong understanding of Linux system administration.
- Familiarity with scripting or programming languages, e.g. Bash, Python.
- Experience with web servers such as Nginx.
- Expertise with monitoring and log aggregation tools (Prometheus, Grafana, ELK Stack).
- Strong communication skills and ability to work effectively across multiple technical teams.
- Good self-learning and research skills (ability to find an answer to a question or a solution to a problem).
- Good teamwork skills.
- Strong documentation and reporting skills.