APPLIED AI ENGINEER // LLM SYSTEMS · RAG · EVALUATION

AI agents built with
production discipline.

Most agents just sound right. I build the ones that are right — guardrails in the tool layer, human approval where it matters, and evals that grade the database, not just the wording.

6+YRS // PRODUCTION AI

AGENTSTOOLS · GUARDRAILS · HITL

EVALSDETERMINISTIC + JUDGE

RAGHYBRID RETRIEVAL · NL2SQL

View Work

// Profile

Six years building AI that ships — banking NL2SQL, then GenAI for regulated finance, now agents I run end to end. The model is the easy part. I sweat the rest: grounding, tool boundaries, evals, and owning the system after launch.

Selected work

// click any card for the brief

One project shows the reliability pattern companies need. One proves I can ship a real product end to end.

Production Patterns · Google ADK

RetailOps — a support agent that obeys policy and proves it

Live on Cloud Run. It won't leak internal data, grounds policy answers in Vertex AI RAG, and pauses large refunds for human approval. Don't take my word for it — ask it something. →

Guardrailsenforced in tools, not prompts

HITLhuman approval for large refunds

Evalsgraded on database state + LLM judge

Google ADKGCPVertex AI RAGHITL approvalTool guardrailsDB-state evals

Shared sandbox — data may reflect other visitors and resets periodically.

live · talking to the deployed agent

Live Product

Jupiter Career Agent

A live career agent over 10,000+ jobs — intent routing, hybrid retrieval, tiered model routing. Built as a real product, not a notebook.

LangGraphFastAPIMilvusBM25Cohere RerankRedis

Live · jupiterpath.dev ↗

Teaching · RAG & Fine-tuning

RAG & Fine-tuning, taught hands-on

My YouTube channel — code-first walkthroughs of RAG, fine-tuning, and evaluation. Topics include prompt distillation from a 120B teacher into a 30B student model, local RAG stacks, and practical ways to assess whether an LLM works.

RAGFine-tuningUnsloth · TinkerLoRA · DPODistillationLLM evaluation

Watch ↗

Track record

// the depth behind the work

// Background

Six years, two phases — ~4.5 in industry, ~2 building independently. Today: live AI products, most recently Jupiter, alongside hands-on technical education and collaboration on fine-tuning and agent workflows.

Before that, ~4.5 years as NLP lead at Aunalytics, a data & analytics company — GenAI for regulated community banking: Text-to-SQL, domain fine-tuning, tool routing, retrieval, shipped where compliance wasn't optional. M.A. Statistics, Columbia · ECML PKDD 2021 (Springer LNCS).

What I build

// for product teams

Agent Reliability Audit

I stress-test your agent for the ways they actually fail — tool misuse, policy leaks, fabricated confirmations — and hand back a prioritized fix plan.

Workflow Agent Build

A focused agent around one real business workflow — support, retail, internal ops, analytics — with tool boundaries and human approval where it matters.

Eval Harness + Guardrails

Test cases, verifier facts, tool-path checks, LLM-judge rubrics, and state-based outcomes — so agent failures become measurable instead of vibes.

I teach this too

// @LLMImplementation

Hands-on LLM engineering — RAG, agents, fine-tuning, evaluation

Fine-tuning

Prompt Distillation: Fine-tuning a 30B Model

Fine-tuning walkthrough

Deployment

Deploy Google ADK Agents to Vertex AI & Cloud Run

Deployment walkthrough

Agents

Local AI Agent with LangGraph + Ollama (Full Tutorial)

Agent workflow walkthrough

LET'S TALK

Let's build something reliable.

Whether you're shipping an agent, auditing one, or just want to compare notes on doing it right — I'm happy to talk.

shane.xia.work@gmail.com LinkedIn ↗