This is a direct hire opportunity for a global O&G company.
Responsible for ensuring that the DELFI cloud solutions and services deliver the reliability and uptime that users need. Site Reliability Engineering is a discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Site Reliability Engineers are also responsible for engaging in and improving the whole lifecycle of services, from inception and design through deployment, operation and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain and improve services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Gauge the effectiveness and efficiency of existing systems and infrastructure, and implement strategies for improving or further leveraging these systems within a geoscience workflow.
- Use the service management systems effectively, ensuring that best practices and lessons learned are made available to the wider technical community.
- Engage in incident response and blameless postmortems.
- Kubernetes and Docker: 1+ years of experience with Kubernetes and Docker infrastructure management and deployment, with strong knowledge of container and virtualization technologies.
- Google Cloud Platform: experience with GCP services (GAE, GCE, GCS and, above all, Google Cloud Deployment Manager) to develop and maintain a Google-based cloud solution.
- If you do not have GCP experience but have strong experience with another cloud platform such as AWS or Azure, your Kubernetes (K8s) experience will be a strong indicator of your ability to adapt to GCP.
- Understanding of network topologies and common network protocols and services (DNS, HTTP(S), SSH, FTP, SMTP).
- Experience as a DevOps engineer in a cloud environment, including automation with configuration management tools.
- Good experience with tools such as Jenkins, Puppet, Chef and Ansible.
- Strong scripting (e.g. Python, bash) and automation skills.
- Linux system administration and experience with system monitoring tools (Stackdriver preferred).
- Programming experience in a high-level language.
Previous Experience and Competencies:
- Bachelor’s degree in an IT-related discipline.
- 10+ years of experience preferred.