Ashwini Gaddagi
DevOps | SRE | Platform Engineering
ashwinigaddagiwork@gmail.com | ashwiniag.com
+91 8095259645
LinkedIn | GitHub
Creator & maintainer of gokakashi
Experience
Site Reliability Engineer at Hasura.io (Feb 2023 - Present)
Platform Reliability & On-Call Ownership
- Led on-call for multi-tenant, multi-cloud infrastructure; owned incident response, debugging, and RCA, and drove long-term fixes.
- Reduced alert noise from ~400/day to single digits, shifting to actionable, cause-based alerting.
- Defined and improved SLIs/SLOs for tenant-facing systems, improving uptime and reducing operator fatigue.
Release Engineering & CI/CD
- Owned end-to-end release management for microservices across AWS/GCP/Azure multi-tenant tiers (free/paid/enterprise).
- Rebuilt CI/CD pipelines (ArgoCD / GitHub Actions), improving deployment reliability and enabling fully self-serve releases for teams.
- Automated repetitive release and infra tasks (cert renewals, cluster setup scripts, migration tooling), cutting hours of manual work to minutes.
Security Engineering & Automation
Built container vulnerability management from scratch and open-sourced it as goKakashi (something I'm proud of):
- Automated image scans across multiple registry providers, with Linear ticketing integration and more.
- Delivered over 10x cost reduction (from $10K/year to $420/year) and cut developer security setup time from ~1 month to 10 minutes.
- Reduced MTTD to 8 hours, resolving 60+ vulnerabilities/month.
Implemented secure release workflows, improving compliance posture for enterprise onboarding (SOC-2 focused).
Infrastructure, Performance & Cost Optimization
- Improved autoscaling reliability (connection-based ASG tuning, CPU-signal scaling, tenant isolation), reducing multi-tenant performance issues.
- Enhanced observability (Grafana/Honeycomb dashboards) for gateway traffic, eventing, and region-wise tenant health.
Optimized cloud infra costs:
- 31% savings on RDS, 33% on Redis via reserved instances and right-sizing.
- Removed unused limits and reclaimed orphaned infrastructure across clusters.
Engineering Productivity & Internal Tooling
- Built internal automation: jumpbox setup tooling, dotfiles system, multi-cluster K8s configuration CLI, and mass tenant-migration automation.
- Enabled faster onboarding and reduced tribal knowledge through SOPs, runbooks, and internal tooling.
Customer & Cross-Functional Collaboration
- Supported enterprise customers with onboarding, debugging of complex infra issues, feature fixes, and support.
- Collaborated across infra, backend, product, and support teams to drive release quality and operational stability.
Conducted culture-fit interviews.
Site Reliability Engineer at Last9.io (Aug 2021 - Sept 2022):
- Restructured and automated customer onboarding by launching the relevant Kubernetes and AWS resources and handling deployments via Rundeck.
- Re-architected major components of the system to make it more reliable, reducing operational costs by 40-50%.
- Set up and maintained the deployment process (CI/CD) with GitHub workflows and self-hosted runners.
- Set up a pipeline to build and publish Docker images using GitHub Actions.
- Set up cloudstream + Kinesis to send data from CloudWatch in OpenTelemetry format.
- Contributed ingester modules for querying data from CloudWatch.
- Made system observability intuitive and effective by setting up availability and latency SLOs on critical components of our system and the client's system, and by determining the right SLOs, SLIs, and the right level and type of alerting.
- Maintained the reliability of the overall system by exposing custom performance metrics and setting up observability and alerting on Grafana and the Last9 tool.
- Responsible for ensuring uptime, health, performance and reliability of the Last9 systems.
- Responsible for managing cloud infrastructure provisioning and setup via Terraform and Ansible on the Kubernetes (EKS) platform.
- Handled AWS cost optimisation.
- Shared responsibility for SOC 2.
DevOps | Systems and Network Engineering at CleverTap.com (May 2020):
As an SNE, my responsibilities broadly included team communication, identifying bottlenecks, and solving problems for the business and developers while maintaining the security and uptime of the system.
- Team lead:
- Led a team of 13 interns writing Terraform providers in Go for SaaS applications such as Zoom, Expensify, and MongoDB. The team was new to industry, so I was heavily involved in mentoring them on Git, documentation, worksheets for audit and internal knowledge, code reviews, linting and standard practices, authentication, and unit and acceptance testing.
- Mentored a team of 8 interns to clear AWS certifications including Cloud Practitioner, SysOps, and DevOps Engineer Professional. Built a learning curriculum and designed a strategy to keep the team on track.
- Agile/Scrum: Planned sprints, automated Jira workflows including change management, time tracking and sprint reports for review.
- CI/CD: Set up Bamboo pipelines for infra tasks such as AMI provisioning, Docker deployments, Prometheus, and Sensu.
- Provisioning: CloudFormation templates using GoFormation, Terraform, Packer for AMIs (AMD and ARM), Docker for services, and shell scripts for bootstrapping EC2.
- Security:
- Owned the SOC 2 Type 2 audit and assisted the information security team with audit-related queries.
- Partially automated security compliance of the overall platform.
- Role-based access control (RBAC) on 60+ services such as Atlassian Crowd, IAM on AWS, Expensify, Outreach, and Slack, via Red Hat SSO (Keycloak) using SAML 2.0.
- Cloudflare login, firewall rules and authentication for access control to applications.
- AWS SSM for controlling access to EC2 instances in multiple regions.
- Automated provisioning and de-provisioning of users/employees via Terraform providers with Terraform Cloud.
- AMI lifecycle, including vulnerability management using Tenable, the CIS Amazon Linux Benchmark, and audit-friendly documentation of relevant changes.
- Operations:
- Sharding of the internal database engine built on top of MongoDB, data migration of terabytes, and ad-hoc tasks involving EC2, Auto Scaling, ECS, Fargate, SSM Automation documents, Run Command, etc.
- Implemented organisation-wide processes, documented Standard Operating Procedures on Confluence, and automated Jira workflows.
- Monitoring and alerting:
- Worked extensively on PagerDuty and Opsgenie alert setup from Sensu.
- Worked on Docker-based setups for Sensu and Prometheus.
- On-call: Troubleshooting production issues, building the relevant toolset and runbooks for debugging, writing structured RCAs, and communicating promptly with teams.
DevOps at Vayana Network (Aug 2019):
- Worked extensively with Terraform to provision infrastructure and used shell scripts to auto-generate some Terraform files, bootstrapping, etc.
- Production-grade container deployments using ECS Fargate for multiple microservices.
- Deep dive into AWS IAM for granular control aligned with Indian fintech compliance rules.
- Cross-region VPC peering, creating new VPCs, subnets, NAT, etc. for the required microservices.
- CodeBuild to build artefacts from Bitbucket (Git), build Docker images to store in ECR, and deploy them to Fargate.
- Automated RDS creation using Terraform, with a deep understanding of the networking layer involved.
Network engineer at NESS Technologies (Feb 2019):
- Solved technical issues and architected ITNM (IBM Tivoli Network Manager) solutions for clients.
Network engineer at Trimax (Jan 2018):
- Configured, maintained, and troubleshot static and default routing.
- Handled issues related to Cisco switches (2960, 3550), Cisco routers (1800, 2800), and WAN links.
- Developed bangaloreruralnic.in (uses WordPress and a custom theme).
- Developed a feedback module for digital NIC using CakePHP and Postgres.
- Software ownership of the entire network system of NIC for the Karnataka Elections '18.
- Designed and maintained smooth video conferencing for officials.
- Managed support issues and delegated them to the team.
- Owned planning of the network layer and digital security of the elections.
- Design, maintenance, and management of the election process at a polling booth.
- Responsible for the SLO and SLA of the end-to-end election process at the polling booth. The intense process involved training voters, reading ballot papers, counting votes manually and digitally, keeping the results safe and error-free, and releasing them to the media after confirmation of the election result.
Education:
- Computer Science and Engineering, KNS Institute of Technology, 2017.
Further Activities:
- Ex-theatre artist | Poet | Avid reader | Travel | Board games