Multi-Agent Chatbot for Cloud Infrastructure
Product Manager · Tencent · Dec 2025 – May 2026
166 → 5 min · SRE diagnosis time (97%↓)
75% · Scope cut to ship
50% · Faster pilot launch
Problem Brief
Site reliability engineers spent hours manually tracing errors across distributed cloud systems, a bottleneck confirmed in visits with 10+ enterprise customers. Diagnosis was slow and dependent on tribal knowledge.
Solution Brief
A multi-agent AI assistant for root-cause diagnosis, unifying 10+ agent entry points behind a single chat interface with progressive context loading. Went from concept to pilot under a tight deadline by cutting scope to the top 3 validated pain points.
Scope — What I Owned
- Led 0→1 product definition as one of two PMs; secured client buy-in before any build, and validated the definition through customer discovery and competitive benchmarking across 8+ AIOps products.
- Defined the multi-agent architecture and made the prioritization call to cut 75% of planned scope, shipping the pilot 50% faster.
- Coordinated cross-team dependencies across 5 private-cloud teams; replaced long spec reviews with rapid prototyping to shorten alignment cycles.
- Shipped a self-evolution mechanism (guided authoring, LLM-judge scoring) so non-technical SREs could improve agent accuracy independently.
Client identity and internal architecture omitted for confidentiality. Metrics are pilot-trial results.