ShipDevs supports fast-moving teams by connecting them with vetted remote professionals across technical and operational roles.
We are looking for a skilled Senior DevOps Engineer who can take ownership of complex SaaS infrastructure and keep critical products running reliably at scale. You will work across a large portfolio of SaaS products, many of which have inherited architecture, incomplete documentation, and production systems that need stronger reliability, automation, and monitoring.
This role is for someone who can quickly understand unfamiliar AWS environments, identify issues, fix urgent production problems, and build long-term solutions that prevent repeat failures. You should be comfortable working with live systems, making safe production changes, writing clear runbooks, and creating automation that works reliably in real-world conditions.
You will do more than respond to alerts. You will improve how incidents are handled, write thorough root cause analyses, design safer rollout plans, define rollback conditions, and build monitoring that catches problems before customers are affected. You will also use modern AI tools to speed up investigation, documentation, and automation work.
This is a hands-on engineering role, not a basic support position. You will be expected to make sound technical decisions, challenge risky changes, separate infrastructure issues from application bugs, and work with the right teams to deliver permanent fixes.
The ideal candidate has deep AWS experience, strong scripting skills, production database experience, and a proven ability to manage infrastructure for production SaaS products at scale.
Key Responsibilities
- Drive reliability and standardization of cloud infrastructure across a growing SaaS product portfolio.
- Implement robust monitoring, automation, and AWS best practices.
- Own infrastructure projects, incident response, RCAs, and production change requests.
- Build safe deployment and rollback processes for production environments.
- Identify infrastructure risks before they become incidents.
- Separate infrastructure-owned issues from application defects and route fixes correctly.
- Improve operational efficiency through automation, documentation, and AI-assisted workflows.
Candidate Requirements
- Deep AWS infrastructure expertise. AWS is the primary platform, so other cloud experience alone is not enough.
- Experience managing production infrastructure at the scale of hundreds of containers.
- Strong scripting experience with Python and Bash for day-to-day administration and automation.
- Experience managing and migrating production databases across multiple engines, including MySQL, PostgreSQL, Oracle, and MS SQL.
- Experience with infrastructure automation tools such as Terraform, Ansible, or CloudFormation.
- Ability to troubleshoot live production issues with confidence and structure.
- Strong understanding of monitoring, incident response, rollout safety, and root cause analysis.
- Ability to break complex infrastructure work into clear, manageable daily tasks.
- Practical experience using AI tools to accelerate investigation, documentation, and automation.
Experience Level
Senior level, typically 5+ years in DevOps, SRE, or platform infrastructure roles.
Work Type
Full-time, remote
Benefits / Why Join Us
- Work remotely with global teams and international clients.
- Long-term contract opportunities with stable workloads.
- Competitive compensation based on skills and experience.
- Flexible, remote-first work environment.
- Opportunity to work on real products and business operations.
- Grow your experience with fast-moving companies and modern teams.
Ready to apply for this position?
Complete the application form and share your resume. Our team will review your profile and reach out if there is a strong fit.