Hybrid search without hand-waving
How I would structure Azure AI Search for real products: BM25, vector retrieval, semantic ranker, filters, scoring profiles, and evaluation loops that keep relevance honest.
Read the search blueprint
Founding engineer building AI products across voice agents, evaluation systems, semantic search, and multi-tenant SaaS. I care about the part where models become reliable products. Currently leading engineering at HyrecruitAI.
A tighter set of technical essays and case-study notes: search, recommendations, self-hosted systems, and relevance loops that show how I think through products end to end.
How I would structure Azure AI Search for real products: BM25, vector retrieval, semantic ranker, filters, scoring profiles, and evaluation loops that keep relevance honest.
Read the search blueprint
A practical map of a self-hosted media automation stack: Jellyfin, Jellyseerr, Sonarr, Radarr, indexers, download clients, naming rules, and failure boundaries.
Read the stack notes
A case-study style breakdown of Flash, a talent search and recommendation platform built around candidate signals, recruiter workflows, ranking, and scale.
Open the case study
Why search quality is not just a model choice. The product needs feedback capture, judgment labels, exploration, offline evals, and a way to improve without surprising users.
Read the relevance essay
Posts on the parts of shipping AI that don't fit in a tweet: guardrails, voice pipelines, semantic caching, LLM evals, embeddings, and the multi-tenant SaaS plumbing.
Building real-time intelligent systems day to day: voice agents that hold a coherent interview turn by turn in under 800 ms, LLM evaluation engines with grounded rubrics, and the guardrail layers that keep them safe in production.
This quarter the focus is evaluation infrastructure: moving past single-prompt scoring into multi-pass rubric pipelines, then measuring the consistency gap against human raters. In parallel, rebuilding the semantic caching layer to push LLM inference costs down without bleeding quality.
On the side I am writing more about the unglamorous parts of shipping AI: guardrails, embedding precision, voice latency budgets, and the multi-tenant SaaS plumbing that lets these systems run for thousands of paying users. New posts land on the blog most weeks.
Reading: Designing Data-Intensive Applications (re-read, this time with vector databases in mind) and the OpenAI Realtime API release notes — voice latency is the next ceiling worth pushing.
Open to senior AI / product engineering conversations, deep technical collaborations, talks, and writing opportunities.
Strongest fit: teams building real-time agents, LLM evaluation systems, AI guardrails, or developer-facing AI infrastructure where product quality and systems judgment both matter.