Work Experience

Notion

Staff Software Engineer, Infrastructure & Product Security · - Now
  • Product Security since 2024-11: Led the design and implementation of critical encryption infrastructure that enabled product adoption by world-class customers. (reference)
  • Multi-region infra: Unblocked millions of dollars in ARR growth by designing and implementing the core infrastructure that enables multi-region scaling and GDPR compliance support. (reference)
  • Datastore SLA: Established and met a 99.99% reliability target across sharded PostgreSQL through DB connection proxy isolation, dynamic connection pool tuning, and Postgres query/index optimization.
  • Data Migration: Built and deployed a data migration system that enabled 31 engineering projects to safely and reliably transform billions of Postgres records in its first year.

Peloton Interactive

Site Reliability Engineering Manager · -
  • Hands-on TLM: Managed an enterprise SRE team of 7 direct reports and 2 offshore teams supporting AWS cloud engineering.
  • Assisted with the Amazon B2B platform launch, which boosted the stock price by 20.36%, and with SAP ERP system integration. Gained experience with SOX and GDPR compliance in a pre-IPO organization.
  • Established company-wide standards for S3/SFTP data ingestion and custom HTTP headers.
  • Established an org-wide prod/dev data separation practice and a team SLA with an intake process.
Senior Data Engineer (Data Infra Pod Lead) · -
  • Built an ETL and streaming data lake solution using Kubernetes, EMR, Airflow, Hudi, Spark, dbt, and a data catalog.
  • Airflow Kubernetes Migration: Migrated 200+ ETL jobs and data pipelines to self-hosted Airflow on Kubernetes and onboarded 34 developers without business interruption.
Senior Software Engineer, Infrastructure (Datastore & Observability) · -
  • Kubernetes Migration Observability Leader: Migrated all backend services from EC2/ECS to Kubernetes without downtime. Worked with 50+ engineers to ensure consistency across metrics, traces, Sentry, the logging pipeline, and the Datadog mcrouter agent.
  • ORM Storage Migration: Led a team of 4 to migrate the ORM's underlying storage from Redis to 66 Postgres tables without downtime.

Facebook

Production Engineer (Messaging Infrastructure) · -

Messenger Messaging Infrastructure: Responsible for monitoring and live-troubleshooting the world-class, highly scalable, and reliable pub/sub system powering Messenger, spanning storage and caching, real-time message routing and delivery, and messaging endpoints for browsers and mobile devices.

Uber

Software Engineer, Infrastructure (Observability) · -
  • Member of Uber's observability infrastructure group, which powered metrics ingestion, storage, monitoring, and alerting.
  • Built Uber's internal "Datadog/PagerDuty," supporting 100k alerts in a multi-region setup.
  • Migrated Uber's first microservice to AWS (DNS, load balancer, tcpdump/ngrep/Wireshark debugging, CSRF attack protection, OneLogin OAuth SSO integration).

LinkedIn

Site Reliability Engineer Intern · -

Streamed Linux CPU performance profiles as FlameGraphs to a frontend UI.

Jiepang.com / Guohe Ad

Software Engineer, Infrastructure / Lead Database Admin · -
  • Migrated metadata for 50M photos from MongoDB to MySQL with zero downtime.
  • Migrated the content of 50M photos using a high-performance file replication service written in C (built on the Linux inotify and epoll APIs).

Nomura Research Institute

System Engineer · -

Managed 7-Eleven's IT infrastructure: Apache, DNS, DHCP, Squid, mail, and Sphinx.