Anton Glenbovitch

Senior AI Engineer • LLM Systems • RAG • AWS Architecture

I design and deploy production-grade AI systems using LLMs and retrieval architectures. I focus on reliability, performance, cost optimization, and real-world deployment.


20+ Years Building Enterprise Systems

Backend architecture, data systems, and AI applications. Experience at Yale University, Health Net, and independent consulting.

Core Expertise

  • LLM Systems: Production RAG, prompt engineering, agent orchestration
  • Retrieval Architecture: Vector databases, hybrid search, evaluation frameworks
  • AWS Deployment: Lambda, Bedrock, OpenSearch, cost optimization
  • System Design: Reliability, observability, guardrails against hallucination

Engineering Philosophy

In production AI systems, the main challenges lie not in model selection but in:

  • Data quality and preparation
  • Retrieval accuracy and ranking
  • System design and integration
  • Evaluation and continuous monitoring

Reliable systems require guardrails, evaluation metrics, and rigorous testing to control hallucinations and maintain consistency at scale.

Featured Projects

Production-grade AI systems showcasing RAG architecture, retrieval optimization, and scalable deployment.

Enterprise Claim AI Platform

📅 2024-2025 🏢 Health Insurance Domain 📊 50K+ claims/day
RAG • LangChain • Pinecone • GPT-4 • AWS Lambda • Python

Overview: Production AI system for analyzing insurance claims using retrieval-augmented generation (RAG). Processes structured and unstructured claim data, extracts key information, and generates summaries with 94% accuracy.

Accuracy: 94%
Latency: <2s per claim
Cost Optimization: 66% reduction ($0.12 → $0.04/claim)
Scale: 50,000+ claims daily

Key Technical Decisions

  • Hybrid Retrieval: Combined semantic search with BM25 ranking for an 8% accuracy improvement over a semantic-only approach
  • Evaluation Framework: Automated metrics (ROUGE, BERTScore) + human QA labels for ground truth validation
  • Cost Optimization: Prompt caching (30% reduction), batch processing, cheaper embedding models
  • Guardrails: Hallucination detection, conservative "I don't know" responses for out-of-domain queries
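
The hybrid retrieval decision above merges a semantic ranking with a BM25 ranking, but this page does not say how the two were combined. One common option is reciprocal rank fusion (RRF), sketched below under that assumption. The function name, the document IDs, and the constant k=60 are illustrative and not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from each retriever for one query.
semantic_hits = ["claim_17", "claim_03", "claim_42"]
bm25_hits = ["claim_03", "claim_99", "claim_17"]

fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
print(fused)
```

RRF uses only rank positions, so it avoids having to normalize cosine similarities against BM25 scores, which live on different scales. A weighted sum of normalized scores is another plausible choice.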

Impact

  • Reduced claim analysis time: 2 hours → 10 minutes per claim
  • Cost savings: $1.2M annually (processing cost reduction)
  • Team capacity: Handle 3× more claims with same headcount
  • Quality: 94% accuracy on hold claim classifications

RAG Evaluation Framework

📅 2024 🔬 Research & Production ⭐ Open Source
Evaluation • LLM Metrics • Python • Pytest

Overview: Comprehensive evaluation framework for assessing RAG pipeline quality. Combines automated metrics with human-in-the-loop validation to measure retrieval accuracy, generation quality, and hallucination rates.

Metrics Supported: 12+ (ROUGE, BERTScore, custom)
Hallucination Detection: Automated + human labeling
Benchmark Datasets: SQuAD, Natural Questions, custom
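
As an illustration of the kind of automated metric listed above, here is a simplified ROUGE-1 F1 score computed from unigram overlap. It is a sketch, not the framework's code: it uses lowercase whitespace tokenization and no stemming, so its scores can differ from standard ROUGE implementations. The function name and the example strings are hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the claim was approved", "the claim was denied")
print(round(score, 2))
```

The example also shows why overlap metrics alone are not enough: the two sentences share three of four tokens yet state opposite outcomes, which is the kind of case human labeling is meant to catch.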

Health Insurance Member Q&A Chatbot

📅 2025 💬 Full-Stack 🏥 Healthcare Domain
React • Node.js • RAG • AWS • TypeScript

Overview: Full-stack conversational AI system that answers health insurance members' questions about coverage, claims, and benefits. Combines a React frontend, a Node.js backend, and a RAG pipeline for accurate, compliant responses.

Response Accuracy: 92%
User Satisfaction: 4.2/5.0
Hallucination Rate: <2%
Cost: $0.04/query

Recent Articles

Technical deep dives on RAG, LLM systems, and production AI architecture.

Building Production RAG: Cost Optimization Strategies

How to reduce RAG pipeline costs by 66% without sacrificing quality. Covers prompt caching, embedding model selection, batch processing, and cost-per-query optimization.

Read on Medium →

Evaluating RAG Systems: Beyond Automated Metrics

Why automated metrics alone fail for RAG evaluation. The case for human-in-the-loop validation, building labeled datasets, and continuous monitoring in production.

Read on Dev.to →

Conversational AI in Healthcare: Domain-Specific Challenges

Lessons from building health insurance chatbots. PII handling, regulatory compliance, conservative response strategies, and maintaining accuracy in regulated domains.

Read on Medium →

Architecture Lessons from 20 Years: Systems Thinking at Scale

Lessons from building systems as they evolved from monoliths to microservices to serverless. Why governance matters, cost is architecture, and observability is a first-class concern.

Read on Dev.to →

About

Experience

20+ years building enterprise systems across backend platforms, data systems, and AI applications.

  • ITA Consulting (2025–present): Strategic consultant on technology and operations for healthcare, manufacturing, and defense clients
  • Yale University (2013–2025): Led architecture decisions and enterprise transformation initiatives serving 15K+ users
  • Health Net (1999–2010): Architected enterprise backend systems for claims processing, member/provider workflows

Core Strengths

  • Technical: 20+ years of systems architecture, Java/Python, AWS, databases, microservices
  • AI/LLM: 1+ year of production RAG, LangChain, vector DBs, evaluation frameworks
  • Domain: 10 years in health insurance (claims, compliance, enterprise integration)
  • Leadership: Cross-functional team leadership, stakeholder communication, technical strategy

Let's Work Together

Interested in building production AI systems? Have a RAG or LLM architecture question? Let's talk.

Email: a.glenbovitch@gmail.com

Phone: (203) 540-7348