Site Reliability Engineer
1XHub (1XHub.com) is now seeking a new employee in the Czech Republic to support their multinational Cloud Operations Project.
The ideal candidate will have hands-on experience developing operations tools and managing and supporting highly available, large-scale web applications in production.
In this role you will:
– Be part of the tectonic shift of the TV industry to over-the-top (OTT) Cloud TV.
– Own production deployment, maintenance, and enhancement processes and procedures, as well as availability, scalability, and operability, and ensure top-notch SLA tracking.
– Be part of an SRE team focused on introducing new technologies and systems, deploying services to multiple cloud environments and regions, and pushing our production excellence and offering to the next level.
– Solve technical problems, provide guidance to various teams (internal & external), and continually improve our systems, deployments, operations, and overall cloud activities and costs.
– Work closely with DevOps, R&D, support, and product teams.
Requirements:
- Experience with cloud computing, virtualization, and containers: Docker, K8s, and more
- Networking knowledge: load balancers, firewalls, VPNs, and TCP/IP, including troubleshooting and performance tuning
- Experience with hardware and storage architecture and with web/application servers such as Apache and Nginx
- Hands-on experience administering and supporting high-scale production workloads
- Everything-as-code approach: 3 or more years of relevant work experience, including Linux systems and programming in at least one language such as PowerShell, Python, or Bash
- English language skills; Czech or Slovak is also welcome
- Experience with monitoring systems and SLA tracking
- Willingness to participate in 24/7 on-call shifts if needed (a bonus system is in place when such shifts are called for)
Advantages (but not required to start work) are:
- Experience with OTT Cloud TV
- Experience with monitoring production workloads using cloud and open source tools (Grafana, Prometheus, Kibana)
- Experience with and understanding of security and networking in production environments
- Experience supporting open source tools such as RabbitMQ, Elasticsearch, and Couchbase
- Understanding of current web and internet technologies such as Apache, Tomcat, Nginx, DNS, and databases
- Experience managing large-scale infrastructure as code with tools such as Ansible, Terraform, and CloudFormation
- Ability to read, understand, and debug code (.NET, Lua)
- Experience setting up and maintaining a CDN
This is a long-term contract position, so we are looking for people willing and able to work under a freelance agreement. The position is available immediately, but we can be flexible with start dates if needed. You will need to be based in the Czech Republic or very close by in order to attend occasional on-site meetings, but most of the work will be done from home.