Senior AI Engineer · LLM Agents · RAG · NLP

I build production AI
for enterprise.

Senior AI Engineer with 6+ years building production AI systems from zero to one. At Aunalytics, created the Text-to-SQL training pipeline, Support RAG system, and Elasticsearch autocomplete from scratch; led BERT → GPT-4 migration (~20% accuracy gain, zero production hotfixes). Now building agentic systems independently — Jupiter is live. YouTube @LLMImplementation (2K+ subscribers, 35+ videos). Columbia University, M.A. Statistics.

YouTube LinkedIn GitHub Email Jupiter

01 — About

Engineer, not evangelist.

I'm an AI engineer who builds production AI systems from zero to one — not demos, not wrappers. My career has been spent at the intersection of NLP, data science, and enterprise software, building tools that real users depend on daily.

At Aunalytics, I created the Text-to-SQL training pipeline, the Support RAG system, and Elasticsearch autocomplete from scratch, and led the BERT → GPT-4 migration (~20% accuracy gain, zero production hotfixes). I also mentored junior data scientists and partnered cross-functionally with Product, DevOps, and QA.

Now building agentic systems independently — Jupiter (jupiterpath.dev) is live. I also publish hands-on walkthroughs on YouTube @LLMImplementation (2K+ subscribers, 35+ videos) and fine-tune/align LLMs using LoRA, DPO, and prompt distillation.

Years NLP experience

35+ Tutorials published

2K+ YouTube subscribers

98% Intent classification accuracy

02 — Experience

Where I've built.

Jul 2024 – Present

Senior AI Engineer | Remote

Independent · YouTube @LLMImplementation

Jupiter Career Agent (jupiterpath.dev) — Live production app: Built LangGraph agentic system with intent-based routing and hybrid RAG retrieval (Milvus + BM25 + Cohere Rerank). Stack: FastAPI, PostgreSQL, Milvus, Redis.
Tiered model routing: lightweight models for intent classification, reasoning models for synthesis. Indexed 5,300+ jobs with semantic + keyword hybrid search. 98% intent classification accuracy (N=120).
Rolling out premium features: Text-to-SQL, web search (Tavily), data analysis dashboards, multi-step planning (Planner → Executor → Reflector → Composer), and artifact generation pipeline.
Technical Content & Consulting:
Fine-tuned and aligned LLMs using LoRA, DPO, and prompt distillation (120B → 30B for $0.62). Published hands-on walkthroughs (2K+ subs, 35+ videos; top tutorial: 16K+ views).
Built the channel using an AI-native content pipeline: LLMs for transcript generation and video structure, TTS voice cloning from a single voice sample, and human-in-the-loop review for technical accuracy.
Delivered paid industry collaboration building end-to-end SFT + RL fine-tuning demos using partner SDK.

Jan 2022 – Jul 2024

Data Scientist, NLP Tech Lead

Aunalytics · South Bend, Indiana

NLP Tech Lead for a Generative AI product serving community banks. Led migration from BERT/LSTM to LLM APIs (Codex → GPT-3.5 → GPT-4), improving SQL accuracy ~20% and reducing client onboarding from ~4 weeks to 1 sprint. Owned release lifecycle with zero hotfixes; mentored junior data scientists; partnered cross-functionally with Product/DevOps/QA.
Text-to-SQL: Vector-based schema pruning cutting token costs ~60%; business term disambiguation; compliance-first SQL validator with strict whitelisting; self-improving few-shot example store.
Financial Agent: Multi-tool agent with context-aware routing achieving >95% precision/recall, eliminating tool-hallucination blockers. Validation & retry for JSON payloads (+20% execution success).
Support RAG: Identified bottleneck via stakeholder interviews (100+ issues/week). Indexed Jira tickets + logs into Elasticsearch vector search with LLM summarization — triage reduced from hours to <1 min.

Feb 2020 – Dec 2021

Data Scientist, NLP / NL2SQL

Aunalytics · South Bend, Indiana

Co-authored "An Optimized NL2SQL System for Enterprise Data Mart" (ECML PKDD 2021, Springer LNCS). Created synthetic training data from scratch when no banking NL-to-SQL datasets existed — grew to 150K pairs.
Added semantic value-matching, raising exact-match accuracy from ~1% to ~45%.
Re-architected autoregressive Transformer prototype into low-latency Elasticsearch autocomplete (seconds → milliseconds) for production banking clients.

03 — Selected Projects

What I've shipped.

Production · Banking

Enterprise NL2SQL Engine

Context-aware LLM-based natural language to SQL engine for community bank analysts. Vector-based schema pruning (60% token savings), business term disambiguation, compliance-first SQL validation with whitelisting, and a self-improving few-shot example store.

GPT-4 · Python · Elasticsearch · FastAPI · Docker · Sentence-BERT
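
As a rough illustration of what whitelist-based SQL validation can look like, the sketch below accepts a query only if it is a single SELECT statement and every table it reads is on an allow-list. The table names, the `validate_sql` helper, and the regex-based checks are assumptions made for illustration, not the production validator, which would more plausibly sit on top of a real SQL parser.

```python
import re

# Hypothetical allow-list; the real table names are not given on this page.
ALLOWED_TABLES = {"accounts", "transactions", "branches"}

# Keywords that should never appear in a read-only, analyst-facing query.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)
# Simple table identifiers that follow FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def validate_sql(query: str) -> tuple[bool, str]:
    """Return (ok, reason) for a candidate query under a strict whitelist."""
    stripped = query.strip().rstrip(";").strip()
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    if not re.match(r"(?i)^select\b", stripped):
        return False, "only SELECT statements are allowed"
    if FORBIDDEN.search(stripped):
        return False, "forbidden keyword found"
    tables = {t.lower() for t in TABLE_REF.findall(stripped)}
    if not tables:
        return False, "no table reference found"
    unknown = tables - ALLOWED_TABLES
    if unknown:
        return False, f"tables not on whitelist: {sorted(unknown)}"
    return True, "ok"
```

The checks are deliberately conservative: a forbidden keyword inside a string literal is also rejected, which errs on the side of refusing a query rather than running an unsafe one.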

Production · Banking

Financial Agent Engine

Multi-tool analytics agent with intelligent routing for banking workflows. Context-aware tool selection (>95% precision/recall), JSON payload validation with retry, and dynamic context injection eliminated the tool-hallucination issues that had been blocking client adoption.

LLM Agents · Tool Routing · Python · JSON Validation · FastAPI
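
The validate-and-retry idea can be sketched in a few lines. The version below assumes a hypothetical payload shape (`tool` plus `arguments`) and a hypothetical `generate` callable that wraps whatever model API is in use; neither is taken from the production agent.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"tool", "arguments"}  # hypothetical payload schema


def parse_tool_call(raw: str) -> dict:
    """Parse and validate one model reply; raise ValueError on any problem."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(payload["arguments"], dict):
        raise ValueError("'arguments' must be an object")
    return payload


def call_with_retry(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call `generate`, feeding each validation error back into the next attempt."""
    current_prompt = prompt
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            return parse_tool_call(raw)
        except ValueError as exc:
            last_error = str(exc)
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected ({last_error}). "
                "Reply with a single valid JSON object."
            )
    raise RuntimeError(f"no valid payload after {max_attempts} attempts: {last_error}")
```

Passing the specific validation error back to the model, rather than repeating the same prompt, is one design choice among several; a stricter variant could validate `arguments` against a per-tool schema.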

Live Product

Jupiter Career Agent — jupiterpath.dev

Live production agentic system with intent-based routing and hybrid RAG retrieval (Milvus + BM25 + Cohere Rerank). Tiered model routing: lightweight models for intent classification, reasoning models for synthesis. 5,300+ jobs indexed with semantic + keyword hybrid search. 98% intent classification accuracy (N=120). Rolling out premium features: Text-to-SQL, web search (Tavily), data analysis dashboards, multi-step planning, and artifact generation.

LangGraph · FastAPI · Next.js · PostgreSQL · Milvus · Redis · BM25 · Cohere Rerank · Tavily
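
This page does not say how the vector and BM25 result lists are merged before reranking. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of Jupiter's implementation; the job IDs and the constant `k = 60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; each list is ordered best-first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense (vector) search and a BM25 keyword search.
dense_hits = ["job_12", "job_7", "job_3"]
bm25_hits = ["job_7", "job_44", "job_12"]
candidates = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Here `candidates` comes out as `["job_7", "job_12", "job_44", "job_3"]`: documents that both retrievers rank highly rise to the top, and the fused list would then be handed to a reranker for final ordering.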

Alignment

DPO Safety — Llama 3.2

Compared PPO-RLHF vs DPO for safety alignment. DPO achieved 100% jailbreak refusal at ~72% lower cost. Custom eval harness with LLM-as-a-judge rubric and adversarial testing.

DPO · RLHF · LoRA · Unsloth · Evaluation

RAG

Local RAG Agent — Zero Cost

Fully local RAG with LangGraph + Ollama and self-correcting retrieval. EmbeddingGemma with dimension truncation (768→256). 1.3s latency — 60% faster than GPT-4o.

LangGraph · Ollama · sqlite-vec · EmbeddingGemma
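
Dimension truncation (768→256) amounts to keeping the leading components of each embedding and re-normalizing. The sketch below uses plain Python lists and hypothetical helper names; it assumes the embedding model was trained so that its leading dimensions carry most of the signal, which is the condition under which truncation preserves retrieval quality.

```python
import math


def truncate_embedding(vec: list[float], dim: int = 256) -> list[float]:
    """Keep the first `dim` components and L2-normalize the result."""
    if dim <= 0 or dim > len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))
```

Re-normalizing after truncation matters because the shortened vector is no longer unit length, so dot-product similarity would otherwise be skewed. Storing 256 instead of 768 floats per chunk also cuts index size to a third.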

Fine-tuning

$0.62 Prompt Distillation

Knowledge distillation from 120B teacher to Qwen 30B MoE student via LoRA on Tinker. Synthetic data generation + distributed training for under $1. Tutorial reached 16K+ views.

Tinker · LoRA · Qwen 30B MoE · Distillation

Evaluation

LLM Evaluation Pipeline

Perplexity diagnostics, Functional Correctness (pass@k), Semantic Similarity (embeddings), and LLM-as-a-Judge with position bias testing. Three-pillar prompt design for robust scoring.

Python · sentence-transformers · Hugging Face · OpenRouter
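
For the functional-correctness pillar, pass@k is usually computed with the unbiased estimator 1 − C(n−c, k) / C(n, k), where n is the number of samples generated per problem and c is how many of them passed the tests. The function below is a minimal sketch of that estimator; whether the pipeline described here computes it this way is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, of which c passed the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples and 3 passes, `pass_at_k(10, 3, 1)` reduces to the plain pass rate of 0.3, while larger k gives the chance that at least one of k drawn samples passes.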

05 — Technical Stack

What I work with.

Agents & RAG

LangGraph / LangChain
Google ADK / MCP
Tool Routing
RAG Design
LLM Evaluation
Prompt Engineering

Training

PyTorch / Hugging Face
Unsloth / Tinker
LoRA / QLoRA
SFT / DPO
4-bit Quantization
Weights & Biases

Engineering

Python / FastAPI
Docker / CI/CD
SQL / Elasticsearch
Milvus / sqlite-vec
GCP (Vertex AI, Cloud Run)
LangSmith / Langfuse

Models

GPT-4 / GPT-OSS
Gemini 2.5 / 3
Claude
Llama 3 / Qwen / DeepSeek
IBM Granite
Sentence-BERT / EmbeddingGemma

Data Science

NL2SQL / Semantic Parsing
Statistical Modeling
scikit-learn / XGBoost
Pandas / NumPy
Experiment Design
Semantic Search

Workflow

Git / GitHub
uv / pip
Google Colab
VS Code / Cline
LangGraph Studio
macOS / Linux
