Sr. Site Reliability Engineer - Cloud GRC Automation - Series A Start-Up - Remote
We are looking for a Sr. Site Reliability Engineer who will be responsible for ensuring system integrity, availability, and other non-functional requirements, including performance, and for designing and implementing the defined tactics for CI/CD platform operations.
The company is a hyper-growth Series A firm with strong recent funding. Its enterprise cloud security solution automates cloud governance, protecting enterprise data, controlling risk, and accelerating success in the cloud.
The business is backed by a VC group with a prolific track record of success in the cybersecurity market.
● Ensure holistic system health across the web frontend, API services, and backend services
● Lead processes to promote reliability on Kubernetes and AWS infrastructure
● Create monitoring and observability platforms to identify SLA/SLO metrics for platform health and customer usage
● Collaborate with development to design the aggregation of user and session logs and infrastructure health logs
● Design and implement application and infrastructure logging for monitoring, operations, and debugging
● Collaborate with engineering to identify necessary logging workflows, log levels, and end-to-end user interaction monitoring (real user monitoring, RUM)
● Define a disaster recovery and replication strategy that provides zero-downtime failovers
● Establish authentication and authorization using IAM principals and OIDC
● Define an RBAC security model based on the principle of least privilege
● Support multi-AZ deployment, disaster recovery, and replication strategy
● Maintain security posture: container scanning, vulnerability remediation, and WAF and SIEM configuration
This is an exciting opportunity to join a growing start-up in a hyper-growth market.