AI engineer building
reliable AI products.

Founding engineer building AI products across voice agents, evaluation systems, semantic search, and multi-tenant SaaS. I care about the part where models become reliable products. Currently leading engineering at HyrecruitAI.

See the systems I've shipped

Talk shop

03 · Engineering notes

Systems worth reading into

Writing archive

A tighter set of technical essays and case-study notes: search, recommendations, self-hosted systems, and relevance loops that show how I think through products end to end.

Hybrid search without hand-waving

How I would structure Azure AI Search for real products: BM25, vector retrieval, semantic ranker, filters, scoring profiles, and evaluation loops that keep relevance honest.

Read the search blueprint

The home ARR stack

A practical map of a self-hosted media automation stack: Jellyfin, Jellyseerr, Sonarr, Radarr, indexers, download clients, naming rules, and failure boundaries.

Read the stack notes

Flash: talent search at scale

A case-study-style breakdown of Flash, a talent search and recommendation platform built around candidate signals, recruiter workflows, ranking, and scale.

Open the case study

Ranking loops that learn

Why search quality is not just a model choice. The product needs feedback capture, judgment labels, exploration, offline evals, and a way to improve without surprising users.

Read the relevance essay

04 · Writing

Selected writing

Full archive

Posts on the parts of shipping AI that don't fit in a tweet — guardrails, voice pipelines, semantic caching, LLM evals, embeddings, and the multi-tenant SaaS plumbing.

VaaniFlow: The Open-Source Dictation Platform for Windows

How VaaniFlow turns a global shortcut, chunked speech recognition, deterministic text processing, personal vocabulary, and Windows text injection into a system-wide dictation workflow.

14m

Hybrid Search on Azure AI Search: Retrieval, RRF, and Relevance Debugging

A practical Azure AI Search implementation guide covering index design, lexical and vector retrieval, filters, reciprocal rank fusion, semantic ranking, and evaluation.

12m

The Home ARR Lab: Operating Jellyfin, Seerr, Sonarr, and Radarr

A systems guide to a lawful self-hosted media workflow: request state, container path identity, atomic imports, permissions, transcoding capacity, observability, and recovery.

11m

Flash: Designing Talent Search as a Decision System

A domain-first architecture for talent search: evidence-backed candidate profiles, hard eligibility rules, multi-stage retrieval, explainable ranking, and recruiter feedback.

11m

Ranking Loops That Learn Without Fooling Themselves

How to improve search and recommendations with judged queries, impression logging, debiased behavior signals, exploration, online experiments, and rollback.

11m

Building a Real-Time AI Interviewer: Turn-Taking, Barge-In, and Latency

A systems design for browser voice agents: media transport, voice activity detection, revisable transcripts, conversational state, interruption, latency budgets, and graceful recovery.

12m

Designing LLM Evaluation Systems: Datasets, Graders, and Release Gates

A rigorous blueprint for evaluating LLM behavior with versioned datasets, explicit rubrics, layered graders, calibration, fairness checks, and auditable release decisions.

11m

Semantic Caching for LLMs: Reuse Without Crossed Contexts

How to reduce repeated LLM work with exact and semantic cache layers, strict eligibility partitions, calibrated thresholds, dependency-aware invalidation, and quality monitoring.

11m

Operating Multi-Tenant AI SaaS: Isolation, Quotas, and Usage Ledgers

A Next.js architecture for carrying trusted tenant context through requests, PostgreSQL, object storage, queues, rate limits, and AI usage accounting.

12m

05 · Now

Current focus

Building real-time intelligent systems day to day: voice agents that hold a coherent interview turn by turn with under 800 ms of response latency, LLM evaluation engines with grounded rubrics, and the guardrail layers that keep them safe in production.

This quarter the focus is evaluation infrastructure: moving past single-prompt scoring into multi-pass rubric pipelines, then measuring the consistency gap against human raters. In parallel, rebuilding the semantic caching layer to push LLM inference costs down without bleeding quality.

On the side I am writing more about the unglamorous parts of shipping AI: guardrails, embedding precision, voice latency budgets, and the multi-tenant SaaS plumbing that lets these systems run for thousands of paying users. New posts land on the blog most weeks.

Reading: Designing Data-Intensive Applications (re-read, this time with vector databases in mind) and the OpenAI Realtime API release notes — voice latency is the next ceiling worth pushing.

Open to senior AI / product engineering conversations, deep technical collaborations, talks, and writing opportunities.

06 · Talk shop

Open to senior AI / product engineer roles.

Strongest fit: teams building real-time agents, LLM evaluation systems, AI guardrails, or developer-facing AI infrastructure where product quality and systems judgment both matter.

Email about a role

View work proof