Amazon - Software Development Engineer
- Present
- Architected a unified notification platform from scratch on AWS Step Functions, consolidating two legacy services; cut onboarding time for new notification types from 1 month to 1 week.
- Root-caused 23+ recurring Sev-2 incidents over 4 months in a distributed stream-processing system, then migrated it to an event-driven serverless architecture; zero incidents post-migration.
- Diagnosed an org-wide caching race condition causing 9,000+ errors/hour and fixed it in one day, improving availability from 97.45% to 99.999% across all regions.
- Reduced the outage window for a cross-region compliance migration from 6 hours to 15 minutes by designing a pre-staged deployment strategy across 3 teams in 3 time zones.
- During a service migration, discovered years of silent message loss in unmonitored dead-letter queues, which triggered a team-wide observability audit.
- Investigated a 50% metric drop, tracing it through the full analytics pipeline to bot accounts generating 87% of traffic; proved there was no customer regression and delivered a permanent fix.