We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) with expertise in Microsoft Azure, Azure DevOps, and Terraform to join our dynamic IT team. The Senior SRE will play a key role in ensuring the reliability, scalability, and performance of our cloud infrastructure and applications hosted on the Azure platform. The ideal candidate will have a deep understanding of cloud technologies, infrastructure as code principles, and DevOps practices.
Infrastructure Automation and Orchestration: Design, implement, and maintain infrastructure as code (IaC) solutions using Terraform for provisioning and managing Azure resources. Develop automation scripts and workflows to streamline deployment, configuration, and monitoring tasks.
Azure DevOps Implementation: Establish and optimize CI/CD pipelines using Azure DevOps to automate build, test, and deployment processes for cloud-native applications and services. Collaborate with development teams to integrate DevOps practices into the software development lifecycle.
Monitoring and Alerting: Implement robust monitoring and alerting solutions using Azure monitoring tools and third-party solutions to proactively identify and address performance issues, system failures, and security threats. Configure dashboards, alerts, and metrics to ensure high availability and performance.
Incident Response and Troubleshooting: Lead incident response efforts to quickly identify and resolve system outages, performance degradation, and other operational issues. Conduct post-incident reviews to identify root causes and implement preventive measures.
Capacity Planning and Optimization: Monitor resource utilization, performance metrics, and cost trends to forecast capacity requirements and optimize resource allocation in Azure environments. Implement scaling strategies to accommodate growth and maintain performance.
Security and Compliance: Implement security best practices and compliance standards for Azure environments, including identity and access management, network security, encryption, and data protection. Conduct regular security assessments and audits to ensure compliance with regulatory requirements.
Continuous Growth: Stay up to date on emerging technologies and industry trends related to cloud computing and infrastructure automation. HGV provides several options for e-learning, live instruction, and certifications to support your career growth.
Technical Leadership and Mentorship: Provide technical leadership and guidance to junior members of the SRE team. Mentor team members on best practices, tools, and technologies related to Azure, DevOps, and infrastructure as code.
- 5 years of proven experience as a Site Reliability Engineer or in a similar role, with a focus on Azure (preferred), AWS, and/or GCP cloud environments.
- Expertise in Azure cloud services, including compute, storage, networking, and security.
- Proficiency in infrastructure as code (IaC) tools such as Terraform for provisioning and managing cloud resources.
- Hands-on experience with Azure DevOps for CI/CD pipelines, release management, and version control.
- Strong understanding of monitoring and alerting tools such as Azure Monitor, Datadog, PagerDuty, or similar.
- Experience with incident response, troubleshooting, and performance optimization in distributed systems.
- Solid grasp of cloud security best practices, compliance standards, and data protection mechanisms.
- Strong scripting and automation skills, preferably in PowerShell, Terraform, Ansible, or Python.
- Strong knowledge of software development and programming or scripting languages such as Python, Ruby, PowerShell, or Bash.
- Proven experience with containerization technologies (e.g., Docker, Kubernetes).
- Excellent problem-solving skills and the ability to troubleshoot complex technical issues.
- Excellent communication, collaboration, and leadership skills, with the ability to work effectively in a cross-functional team environment.
- Experience with DevOps and Agile practices as part of a Scrum team.