Building real-time intelligent systems day to day: voice agents that hold a coherent interview turn by turn with responses under 800 ms, LLM evaluation engines with grounded rubrics, and the guardrail layers that keep them safe in production.
This quarter the focus is evaluation infrastructure: moving past single-prompt scoring into multi-pass rubric pipelines, then measuring the consistency gap against human raters. In parallel, rebuilding the semantic caching layer to push LLM inference costs down without bleeding quality.
On the side, I am writing more about the unglamorous parts of shipping AI: guardrails, embedding precision, voice latency budgets, and the multi-tenant SaaS plumbing that lets these systems run for thousands of paying users. New posts land on the blog most weeks.
Reading: Designing Data-Intensive Applications (re-read, this time with vector databases in mind) and the OpenAI Realtime API release notes — voice latency is the next ceiling worth pushing.
Open to senior AI / product engineering conversations, deep technical collaborations, talks, and writing opportunities.