Projects | Kim Shi Tong

Multi-Agent Chatbot for Cloud Infrastructure

Product Manager · Tencent · Dec 2025 – May 2026

166 → 5 min · SRE diagnosis time (97%↓)

75% · Scope cut to ship

50% · Faster pilot launch

Problem Brief

Site reliability engineers spent hours manually tracing errors across distributed cloud systems — a bottleneck confirmed across 10+ enterprise customer visits. The diagnosis process was manual, slow, and dependent on tribal knowledge.

Solution Brief

A multi-agent AI assistant for root-cause diagnosis, unifying 10+ agent entry points behind a single chat interface with progressive context loading. Went from concept to pilot under a tight deadline by cutting scope to the top 3 validated pain points.

Scope — What I Owned

0→1 product definition as one of two PMs; secured client buy-in before any build, validated via customer discovery + competitive benchmarking across 8+ AIOps products.
Defined the multi-agent architecture and made the prioritization call to cut 75% of planned scope, which shipped the pilot 50% faster.
Coordinated cross-dependencies across 5 private-cloud teams; replaced long spec reviews with rapid prototyping to compress alignment.
Shipped a self-evolution mechanism (guided authoring, LLM-judge scoring) so non-technical SREs could improve agent accuracy independently.

Client identity and internal architecture omitted for confidentiality. Metrics are pilot-trial results.

AI Article Generator + IoT Dashboard

Technical Product Lead · Smart Air (B-Corp, 17+ countries) · Jan 2024 – Dec 2025

80% · Faster content production

300 → 1,800 · Cities covered

$1M+ · Contract value

67% · Client energy savings

Problem Brief

A lean B2B startup couldn't scale content or product surface area: the marketing team spent 40% of its time on manual article updates, and B2B clients had unmet needs (mould-risk detection, system automation) no competitor addressed.

Solution Brief

A 0→1 RAG article generator with hallucination guardrails, plus client-driven IoT dashboard features. Built the content pipeline and guardrails myself before mature AI tooling existed, then scaled the product portfolio as sole technical product lead.

Scope — What I Owned

Scaled the portfolio from 5 to 8 software products for 20+ B2B clients (1M+ monthly users) as the sole technical product lead.
Shipped the 0→1 RAG generator: built scrapers, pipeline, and double-check verification + inline citation guardrails for production quality.
Turned B2B customer discovery into shipped features — mould-risk detection and on/off automation — driving adoption and contract value.
Took over full-stack technical ownership after the tech lead left — managed frontend, backend, VMs, DNS, and on-call for 8 products.
Rebuilt infrastructure: migrated from systemd to Docker + Traefik, set up Restic backup pipelines, hardened with Cloudflare WAF + fail2ban.
Raised production uptime from ~80% to 99% through monitoring, alerting, and incident response.
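The inline-citation guardrail mentioned in the RAG generator bullet above can be pictured with a minimal sketch. The production checks are not described here, so everything below is an assumption for illustration: the `[n]` citation format, the function name `uncited_sentences`, and the rule that every sentence must cite at least one retrieved source.

```python
import re
from typing import List

# Assumed citation format: a bracketed source index such as [1] or [2].
CITATION = re.compile(r"\[(\d+)\]")


def uncited_sentences(text: str, num_sources: int) -> List[str]:
    """Return sentences with no citation, or with a citation outside 1..num_sources."""
    # Naive sentence split on ., ! or ? followed by whitespace (illustrative only).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for sentence in sentences:
        refs = [int(n) for n in CITATION.findall(sentence)]
        if not refs or any(n < 1 or n > num_sources for n in refs):
            flagged.append(sentence)
    return flagged


draft = "Claim one [1]. Claim two. Claim three [3]."
flagged = uncited_sentences(draft, num_sources=2)
```

With two retrieved sources, the second sentence is flagged for having no citation and the third for citing a source that does not exist. A draft that returns any flagged sentence could be held back for a second verification pass rather than published.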

Client names and proprietary internals omitted for confidentiality.

Agent-Native Business Simulation Platform

Product Engineer · NUS ISEM (Faculty-Led Project) · Oct 2025 – May 2026

2 · Game engines shipped

MCP · Agent-native protocol

Problem Brief

Traditional business simulations required web UIs that added friction for AI agent interaction. Faculty needed a platform where students' AI agents could compete directly in economic games without navigating browser interfaces.

Solution Brief

An agent-native simulation platform built on MCP (Model Context Protocol), enabling AI agents to interact with game engines through tool calls rather than web interfaces. Shipped with pluggable game engines.

Scope — What I Owned

Built FastAPI backend with pluggable game engine architecture — new games require only 3 files (params, actions, engine).
Integrated MCP for native AI agent interaction; agents call tools directly instead of parsing web pages.
Shipped 2 game engines: Retailer (pricing strategy under uncertainty) and 2048 (grid puzzle with stochastic spawns).
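One way to picture the pluggable-engine idea above is a small registry keyed by game name. This is a sketch under stated assumptions, not the platform's code: `GameEngine`, `ENGINES`, `register`, `call_tool`, and every field of `RetailerParams` are hypothetical names, the demand model is invented for illustration, and the FastAPI and MCP layers are left out so the block stays self-contained.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


class GameEngine(Protocol):
    """Hypothetical interface every pluggable engine would satisfy."""

    def step(self, action: dict) -> dict: ...


# Hypothetical registry: game name -> factory that builds an engine.
ENGINES: Dict[str, Callable[..., GameEngine]] = {}


def register(name: str):
    """Class decorator that adds an engine to the registry under `name`."""
    def wrap(cls):
        ENGINES[name] = cls
        return cls
    return wrap


@dataclass
class RetailerParams:
    """Illustrative parameters only; the real game's parameters are not public."""
    unit_cost: float = 4.0
    base_demand: float = 100.0
    price_sensitivity: float = 8.0
    noise: float = 5.0


@register("retailer")
class RetailerEngine:
    def __init__(self, params: Optional[RetailerParams] = None, seed: int = 0):
        self.params = params or RetailerParams()
        self.rng = random.Random(seed)
        self.profit = 0.0

    def step(self, action: dict) -> dict:
        # Linear demand with Gaussian noise: pricing under uncertainty.
        p = self.params
        price = float(action["price"])
        shock = self.rng.gauss(0.0, p.noise)
        demand = max(0.0, p.base_demand - p.price_sensitivity * price + shock)
        self.profit += (price - p.unit_cost) * demand
        return {"demand": demand, "profit": self.profit}


def call_tool(game: str, action: dict, engine_cache: dict) -> dict:
    """Stand-in for an agent tool call: find or create the engine, apply one action."""
    if game not in engine_cache:
        engine_cache[game] = ENGINES[game]()
    return engine_cache[game].step(action)
```

In this sketch, adding a game means writing a params class, an action shape, and an engine class, then registering it. The dispatch function never changes, which is the property the three-file convention is after.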

Architecture — Agent-Native Game Platform

AI Grading System

AI Product Engineer · NUS Computing · May 2025 – Dec 2025

70% · Grading time reduced

82% · Accuracy (w/ human review)

0 · Eng dependency for users

Problem Brief

Teaching assistants for a core module spent excessive time grading 6,600+ submissions a year across 600 students. The goal was to reduce grading time without removing human judgment.

Solution Brief

A RAG grading pipeline with a human-in-the-loop review-and-override workflow. The first version was over-scoped and needed an engineer for any tuning; I made the call to rebuild around a simple prompt-engineering dashboard that non-technical professors and TAs could operate themselves.

Scope — What I Owned

Led 0→1 product definition and the RAG pipeline build with a small team.
Made the hard call to rebuild after months of work once the first version proved unusable for non-technical staff.
Designed the key trade-off: 82% accuracy + TA override, rather than chasing 95% — because the goal was saving time, not replacing TAs.
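The review-and-override trade-off above can be sketched in a few lines. This is an illustration under assumptions, not the deployed system: `GradedSubmission`, `final_score`, and `agreement_rate` are hypothetical names, and how the 82% figure was actually measured is not stated here, so the agreement metric below is only one plausible definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GradedSubmission:
    submission_id: str
    ai_score: float                   # score proposed by the pipeline
    ta_score: Optional[float] = None  # score entered by a TA during review, if any

    @property
    def final_score(self) -> float:
        # The human decision always wins over the AI proposal.
        return self.ai_score if self.ta_score is None else self.ta_score


def agreement_rate(items: List[GradedSubmission], tolerance: float = 0.0) -> float:
    """Share of TA-reviewed items where the AI score is within `tolerance` of the TA score."""
    reviewed = [s for s in items if s.ta_score is not None]
    if not reviewed:
        raise ValueError("no TA-reviewed submissions")
    agree = sum(abs(s.ai_score - s.ta_score) <= tolerance for s in reviewed)
    return agree / len(reviewed)


batch = [
    GradedSubmission("a", ai_score=8, ta_score=8),  # TA confirms the AI score
    GradedSubmission("b", ai_score=6, ta_score=9),  # TA overrides
    GradedSubmission("c", ai_score=7),              # not reviewed yet
]
```

Here `batch[1].final_score` is the TA's 9, and `agreement_rate(batch)` is 0.5 because one of the two reviewed items matches. Keeping the override as a separate field means the AI proposal is never lost, so agreement can be tracked over time.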

Architecture — Human-in-the-Loop Grading Pipeline

AI Data Processing App

Software Engineer · Trivi Data (Data Consultancy) · May 2023 – Aug 2023

Automated data cleaning for consultants, scoped through user interviews before any code was written.

Problem Brief

Data consultants spent hours on manual data cleaning — handling missing values and duplicates by hand — a bottleneck surfaced through direct user interviews.

Solution Brief

An AI-assisted data processing app that automated the most repetitive cleaning steps, identified and prioritized after interviewing the consultants who lived the problem daily.

Scope — What I Owned

Ran user interviews with data consultants to locate the real bottleneck before writing code.
Built the data-cleaning automation that handled missing values and duplicate detection.
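The two cleaning steps named above, duplicate detection and missing-value handling, can be illustrated with a standard-library sketch. The app's actual methods are not described here, so the choices below are assumptions: exact-match duplicates, median fill for a numeric column, and the hypothetical function names `drop_duplicates` and `fill_missing`.

```python
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Remove exact duplicate rows, keeping the first occurrence and the input order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing(rows: List[Row], column: str) -> List[Row]:
    """Replace None in `column` with the median of the values that are present."""
    present = [r[column] for r in rows if r.get(column) is not None]
    if not present:
        return [dict(r) for r in rows]  # nothing to infer a fill value from
    fill = median(present)
    return [{**r, column: fill if r.get(column) is None else r[column]} for r in rows]


raw = [
    {"id": 1, "value": 10.0},
    {"id": 1, "value": 10.0},   # exact duplicate
    {"id": 2, "value": None},   # missing value
    {"id": 3, "value": 30.0},
]
cleaned = fill_missing(drop_duplicates(raw), "value")
```

Deduplicating before filling matters in this sketch: a repeated row would otherwise be counted twice when the median is computed.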

Early-career engineering role. Client details omitted for confidentiality.