Health Insurance Member Q&A Chatbot

Quick Stats

Response Accuracy: 92%
User Satisfaction: 4.2/5.0 (200+ users)
Hallucination Rate: <2%
Cost/Query: $0.04
Response Time: <2 seconds

Overview

A full-stack conversational AI system that answers health insurance members' questions about coverage, claims, benefits, and enrollment. It combines a React frontend, a Node.js backend, and a RAG pipeline to deliver accurate, compliant responses in real time.

Problem Statement

Support Volume: 50K+ member questions per month; the current support team is overloaded.
Response Time: The average wait is 24 hours; members need instant answers.
Cost: The fully loaded cost is $75K/year per support agent; per-query cost must come down.
Compliance: The healthcare domain requires PII handling, regulatory accuracy, and clear disclaimers.
Consistency: Agents give different answers to the same question; the system must answer consistently.

Solution: Conversational AI on AWS

Frontend: React app with real-time chat UI, typing indicators, message history
Backend: Node.js + Express, REST API, WebSocket support for real-time updates
AI Layer: RAG pipeline (LangChain + Pinecone + GPT-4) for member question answering
Infrastructure: AWS Lambda, API Gateway, DynamoDB for scalability

Architecture

System Design

┌─────────────────┐
│   React Frontend│ (Chat UI, message history, typing indicators)
└────────┬────────┘
         │ WebSocket
         ▼
┌─────────────────────────────────────┐
│  Node.js/Express Backend API        │
│ (Request validation, auth, logging) │
└────────┬────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  RAG Pipeline (LangChain)           │
│  1. Vector retrieval (Pinecone)     │
│  2. Generate response (GPT-4)       │
│  3. Guardrails (PII detection)      │
└────────┬────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  AWS Infrastructure                 │
│  • Lambda (compute)                 │
│  • DynamoDB (conversation history)  │
│  • CloudWatch (logging/monitoring)  │
└─────────────────────────────────────┘

Tech Stack

React 18, TypeScript, Node.js, Express, LangChain, Pinecone, GPT-4, AWS Lambda, DynamoDB, WebSocket

Key Technical Decisions

1. Full-Stack Implementation (React + Node.js)

Why: Building the chatbot UI in-house avoids $100K+ in third-party service fees (Intercom, Drift, etc.). Using TypeScript across React and Node.js provides type safety and lets the frontend and backend share code.

2. WebSocket for Real-Time Responses

Why: HTTP polling causes delays and high server load. WebSocket enables streaming responses (show text as it's generated), improving perceived performance.
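As a rough illustration of the streaming idea, the sketch below forwards generated text chunks to the client one frame at a time and finishes with a "done" frame. The frame shapes (`StreamFrame`), the `FrameSink` interface, and `streamAnswer` are hypothetical names chosen for this example; the original does not specify a wire format. The transport is injected, so any object with a `send(string)` method (such as a WebSocket connection) could be passed in, and a real model stream would likely be asynchronous rather than a plain iterable.

```typescript
// Hypothetical frame shapes; the project's actual wire format is not documented here.
type StreamFrame =
  | { type: "token"; conversationId: string; seq: number; text: string }
  | { type: "done"; conversationId: string; seq: number };

// Minimal transport abstraction: anything that can send a string.
interface FrameSink {
  send(data: string): void;
}

// Forward text chunks as they are produced, then signal completion so the
// UI can stop its typing indicator. Returns the number of token frames sent.
function streamAnswer(
  sink: FrameSink,
  conversationId: string,
  chunks: Iterable<string>,
): number {
  let seq = 0;
  for (const text of chunks) {
    if (text.length === 0) continue; // skip empty chunks
    const frame: StreamFrame = { type: "token", conversationId, seq, text };
    sink.send(JSON.stringify(frame));
    seq += 1;
  }
  const done: StreamFrame = { type: "done", conversationId, seq };
  sink.send(JSON.stringify(done));
  return seq;
}
```

The sequence number lets the client detect dropped or reordered frames before appending text to the chat bubble.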

3. RAG for Domain-Specific Knowledge

Why: GPT-4 alone hallucinates about plan benefits. RAG retrieves the relevant plan documents and grounds each answer in them, reducing the hallucination rate from 15% to <2%.
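The retrieve-then-generate flow can be sketched without any external services. The example below is a minimal in-memory stand-in, not the project's LangChain or Pinecone code: `PlanChunk`, `retrieveTopK`, and `buildGroundedPrompt` are hypothetical names, embeddings are assumed to be precomputed, and the prompt wording is illustrative only.

```typescript
// Hypothetical shape of an indexed plan-document chunk.
interface PlanChunk {
  id: string;
  text: string;
  embedding: number[]; // assumed to be precomputed by an embedding model
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for a vector store query: rank chunks by similarity.
function retrieveTopK(query: number[], chunks: PlanChunk[], k: number): PlanChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk);
}

// Build a prompt that restricts the model to the retrieved excerpts.
function buildGroundedPrompt(question: string, context: PlanChunk[]): string {
  const excerpts = context.map((c) => `[${c.id}] ${c.text}`).join("\n");
  return [
    "Answer using only the plan excerpts below.",
    "If the excerpts do not contain the answer, say you do not know.",
    "",
    excerpts,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Tagging each excerpt with its chunk id makes it possible to ask the model to cite sources, which in turn gives a guardrail something concrete to check.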

4. PII Detection & Redaction

Why: Members ask about personal claims data. The system detects PII (member ID, SSN, dates) and redacts it from responses to support HIPAA compliance.
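A simple pattern-based pass illustrates the redaction step. This is a sketch under assumptions: the `MBR-` member ID format is invented for the example, the date pattern covers only `MM/DD/YYYY`, and `redactPii` is a hypothetical helper. Regular expressions alone will miss free-text identifiers such as names and addresses, so a production system would likely pair this with a dedicated PII detection service.

```typescript
interface PiiRule {
  label: string;
  pattern: RegExp;
}

// Illustrative rules only. The member ID format is an assumption.
const PII_RULES: PiiRule[] = [
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "MEMBER_ID", pattern: /\bMBR-\d{8}\b/g },
  { label: "DATE", pattern: /\b\d{1,2}\/\d{1,2}\/\d{4}\b/g },
];

// Replace each match with a labeled placeholder and record what was found,
// so the audit log can note the redaction without storing the value.
function redactPii(text: string): { redacted: string; found: string[] } {
  const found: string[] = [];
  let redacted = text;
  for (const rule of PII_RULES) {
    redacted = redacted.replace(rule.pattern, () => {
      found.push(rule.label);
      return `[${rule.label} REDACTED]`;
    });
  }
  return { redacted, found };
}

const result = redactPii("SSN 123-45-6789, ID MBR-12345678, DOB 01/02/1990");
console.log(result.redacted);
// prints: SSN [SSN REDACTED], ID [MEMBER_ID REDACTED], DOB [DATE REDACTED]
```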

5. Guardrails: Conservative Response Strategy

Why: In healthcare, saying "I don't know" is better than guessing. The system flags uncertain responses and offers escalation to a human agent.
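One way to express this strategy is a small decision function that runs after generation. The signals used here (a retrieval similarity score and a count of cited plan excerpts), the `0.75` threshold, and the names `DraftAnswer` and `decide` are all assumptions for illustration; the original does not say how uncertainty is measured.

```typescript
type Decision =
  | { action: "answer"; text: string }
  | { action: "escalate"; reason: string };

// Hypothetical signals attached to a generated draft.
interface DraftAnswer {
  text: string;
  retrievalScore: number; // best similarity score from retrieval, 0 to 1
  citedChunks: number; // how many plan excerpts the answer cites
}

// Assumed threshold; in practice it would be tuned on labeled data.
const MIN_RETRIEVAL_SCORE = 0.75;

// Prefer escalation whenever the draft is not clearly supported.
function decide(draft: DraftAnswer): Decision {
  if (draft.citedChunks === 0) {
    return { action: "escalate", reason: "no supporting plan document" };
  }
  if (draft.retrievalScore < MIN_RETRIEVAL_SCORE) {
    return { action: "escalate", reason: "low retrieval confidence" };
  }
  return { action: "answer", text: draft.text };
}
```

Returning a reason with every escalation also feeds the operations view of which topics confuse members.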

Key Features

For Members

24/7 Availability: Instant answers anytime, no wait times
Clear Answers: Plain English explanations of coverage, benefits, claims
Escalation: Questions the system cannot answer confidently are escalated to a human agent with the conversation context
Privacy: PII is handled securely; responses don't leak personal data
Conversation History: Members can revisit past questions

For Operations

Reduced Support Load: System handles 60% of routine questions (coverage, claims, enrollment)
Cost Reduction: $0.04/query vs. $24 for agent handling (~600x cheaper)
Escalation Insights: Track which topics confuse members; update plan docs
Compliance Audit Trail: All queries logged for regulatory review
Monitoring Dashboard: Accuracy, satisfaction, response time metrics
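The cost comparison is simple arithmetic, worked here in integer cents to avoid floating-point rounding. The per-query costs come from the figures quoted in this write-up; the 30,000 queries in the usage line is an assumption (60% of 50K monthly questions, treating every question as routine), so the resulting savings figure is an upper-bound illustration rather than a reported result.

```typescript
const AI_COST_CENTS = 4; // $0.04 per AI-handled query
const AGENT_COST_CENTS = 2400; // $24 per agent-handled query

// How many times cheaper an AI-handled query is.
function costRatio(): number {
  return AGENT_COST_CENTS / AI_COST_CENTS;
}

// Dollars saved per month for a given number of AI-handled queries.
function monthlySavingsDollars(aiHandledQueries: number): number {
  return (aiHandledQueries * (AGENT_COST_CENTS - AI_COST_CENTS)) / 100;
}

console.log(costRatio()); // prints 600
// Assumed volume: 60% of 50,000 questions, if all were routine.
console.log(monthlySavingsDollars(30000)); // prints 718800
```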

Results & Impact

Quantified Metrics

Metric	Value
Response Accuracy	92% (on gold standard test set)
User Satisfaction	4.2/5.0 (200+ user ratings)
Hallucination Rate	<2% (continuous monitoring)
Query Cost	$0.04 (vs. $24 agent cost)
Response Time	<2 seconds
Questions Handled	60% of routine questions

Business Impact

Support Cost Reduction: AI handles 60% of routine questions at $0.04/query instead of $24/query for an agent
Response Time: 24 hours → 2 seconds
Member Satisfaction: 4.2/5.0 rating; 87% prefer AI for quick answers
Support Team Capacity: Agents focus on complex cases and escalations, handling 3x more nuanced issues

Lessons Learned

What Worked

Conservative response strategy: Better to say "I'm uncertain" than hallucinate. Builds trust.
Domain-specific evaluation: Generic chatbot metrics (BLEU, ROUGE) don't capture insurance accuracy. Used domain expert labels.
Escalation workflow: System hands off to human agent gracefully. Members appreciate the option.
Monitoring from day 1: Caught hallucinations early with continuous evaluation.
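Accuracy and hallucination rate from expert labels reduce to counting verdicts. The sketch below assumes each reviewed response receives exactly one of three verdicts; the `Verdict` categories, `LabeledResponse`, and `summarize` are hypothetical and may not match the labeling scheme actually used.

```typescript
// Assumed labeling scheme: one verdict per reviewed response.
type Verdict = "correct" | "incorrect" | "hallucinated";

interface LabeledResponse {
  questionId: string;
  verdict: Verdict;
}

// Compute headline metrics as fractions of all labeled responses.
function summarize(labels: LabeledResponse[]): {
  accuracy: number;
  hallucinationRate: number;
} {
  if (labels.length === 0) throw new Error("no labeled responses");
  const correct = labels.filter((l) => l.verdict === "correct").length;
  const hallucinated = labels.filter((l) => l.verdict === "hallucinated").length;
  return {
    accuracy: correct / labels.length,
    hallucinationRate: hallucinated / labels.length,
  };
}
```

Running the same function over the gold standard test set and over sampled production queries keeps the two numbers comparable.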

Key Takeaways

Healthcare chatbots need PII handling + compliance from day 1, not as an afterthought
Conversational AI + domain expertise = better outcomes than AI alone
Users prefer honest "I don't know" over confident hallucination
Continuous human review (1% of queries) catches drift early
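Selecting roughly 1% of queries for human review could be done deterministically by hashing the query ID, so the same query is always either in or out of the sample. This is one possible approach, not necessarily the one used; `selectedForReview` is a hypothetical helper, and the hash is a 32-bit FNV-1a over UTF-16 code units, which matches byte-wise FNV-1a only for ASCII IDs.

```typescript
// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Deterministically pick about ratePercent% of query IDs for human review.
function selectedForReview(queryId: string, ratePercent: number = 1): boolean {
  return fnv1a(queryId) % 100 < ratePercent;
}
```

Because the decision depends only on the ID, reviewers and dashboards can recompute sample membership without storing a separate flag.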

Code & Resources

GitHub Repository: github.com/AntonGlenbovitch/health-insurance-qa

Includes:

React chat UI component
Node.js/Express backend
RAG pipeline setup (LangChain + Pinecone)
PII detection and redaction
Evaluation framework integration
AWS Lambda deployment
Unit and integration tests
