ARIA
Team: a Dell AI Engineer, a Health AI Founder, and a UT Austin MS Data Engineer, with expertise in LLMs, computer vision, and production ETL/AWS.
Project Description
ARIA - Adaptive Retail Intelligence Agent
ARIA is a stable, multimodal retail agent that speaks naturally, detects emotional triggers in real time, and delivers personalized product recommendations. It demonstrates how voice-powered AI can transform retail interactions through empathetic, context-aware conversations that boost customer satisfaction and sales.
Core Functionality
- Voice-first conversational AI powered by Groq LLM with natural cashier-style dialogue
- Trigger detection engine analyzing 16+ emotional/behavioral cues (fatigue, stress, hunger, cravings)
- Smart recommendation system mapping triggers + customer history to relevant products
- Adaptive customer profiles that learn from acceptance patterns over time
- Real-time analytics showing live trigger weights, recommendations, and conversation flow
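The trigger detection and recommendation steps listed above could look roughly like the sketch below. It is a minimal illustration, not ARIA's actual implementation: the `CUES` lexicon, the `CATALOG` mapping, the weights, the history bonus of 0.25, and the function names `detect_triggers` and `recommend` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical cue lexicon: phrase -> (trigger, weight). The real engine
# tracks 16+ cues; only a few are shown here for illustration.
CUES = {
    "exhausted": ("fatigue", 0.9),
    "long day": ("fatigue", 0.6),
    "deadline": ("stress", 0.8),
    "starving": ("hunger", 0.9),
    "something sweet": ("cravings", 0.7),
}

# Hypothetical catalog: trigger -> candidate products.
CATALOG = {
    "fatigue": ["cold brew", "energy bar"],
    "stress": ["herbal tea", "dark chocolate"],
    "hunger": ["sandwich", "trail mix"],
    "cravings": ["cookie", "dark chocolate"],
}


def detect_triggers(utterance: str) -> dict[str, float]:
    """Return the strongest matching weight per trigger for one utterance."""
    text = utterance.lower()
    weights: dict[str, float] = defaultdict(float)
    for phrase, (trigger, weight) in CUES.items():
        if phrase in text:
            weights[trigger] = max(weights[trigger], weight)
    return dict(weights)


def recommend(triggers: dict[str, float], history: list[str], top_n: int = 2) -> list[str]:
    """Score products by trigger weight, with a small bonus for past purchases."""
    scores: dict[str, float] = defaultdict(float)
    for trigger, weight in triggers.items():
        for product in CATALOG.get(trigger, []):
            scores[product] += weight
    for product in scores:
        if product in history:
            scores[product] += 0.25  # assumed bonus for items bought before
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:top_n]]
```

Keeping the cue lexicon and catalog as plain dictionaries makes the symbolic layer easy to inspect and tune separately from the LLM prompt.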
Working Prototype Stability
- Fully functional FastAPI backend with WebSocket-based real-time conversations
- Voice I/O through ElevenLabs TTS + browser speech-to-text transcription
- Tested across three demo personas with stable multi-turn dialogues
- Graceful error handling and fallback logic throughout the pipeline
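One common way to get the fallback behavior described in the last bullet is to wrap each external call in a timeout and return a canned reply on failure. The sketch below shows that pattern with the standard library only. The helper `with_fallback`, the `FALLBACK_REPLY` text, the 3-second default timeout, and the stub coroutines are assumptions for illustration, not code from the ARIA repository.

```python
import asyncio
from typing import Awaitable, Callable

FALLBACK_REPLY = "Sorry, I missed that. Could you say it again?"


async def with_fallback(
    call: Callable[[], Awaitable[str]],
    fallback: str = FALLBACK_REPLY,
    timeout_s: float = 3.0,
) -> str:
    """Await call(); on timeout or any error, return the fallback text."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout_s)
    except Exception:
        # asyncio.TimeoutError is an Exception subclass, so timeouts land here too.
        return fallback


# Stub coroutines standing in for a real LLM or TTS request.
async def healthy_llm() -> str:
    return "Would you like a coffee with that?"


async def flaky_llm() -> str:
    raise RuntimeError("upstream unavailable")


async def slow_llm() -> str:
    await asyncio.sleep(1.0)
    return "late reply"
```

With this shape, a slow or failing service degrades a single turn to a polite retry prompt instead of dropping the WebSocket conversation.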
Technical Complexity & Multimodal Orchestration
ARIA orchestrates a complete agentic loop:
Browser mic → Speech transcription → Trigger analysis → LLM reasoning → Product recommendation → Voice synthesis → Email automation
This pipeline combines symbolic rules, cloud LLM inference, real-time speech processing, and async orchestration, coordinating multiple AI services into one cohesive agent.
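The loop above can be pictured as a chain of async stages that pass a shared turn object along. The following is a sketch under stated assumptions: the `Turn` dataclass, the stage names, and the stub logic inside each stage are hypothetical placeholders for the real trigger analyzer, Groq call, ElevenLabs synthesis, and n8n email workflow.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Turn:
    """State carried through one conversational turn."""
    transcript: str
    triggers: dict[str, float] = field(default_factory=dict)
    reply: str = ""
    product: str = ""
    audio: bytes = b""
    email_queued: bool = False


# Each stage is a stub standing in for a real service call.
async def analyze_triggers(turn: Turn) -> Turn:
    if "tired" in turn.transcript.lower():
        turn.triggers["fatigue"] = 0.8
    return turn


async def llm_reason(turn: Turn) -> Turn:
    turn.reply = "Sounds like a long day." if turn.triggers else "Anything else today?"
    return turn


async def pick_product(turn: Turn) -> Turn:
    turn.product = "cold brew" if "fatigue" in turn.triggers else ""
    return turn


async def synthesize_voice(turn: Turn) -> Turn:
    turn.audio = turn.reply.encode("utf-8")  # placeholder for TTS audio bytes
    return turn


async def queue_email(turn: Turn) -> Turn:
    turn.email_queued = bool(turn.product)  # only follow up when something was offered
    return turn


STAGES = [analyze_triggers, llm_reason, pick_product, synthesize_voice, queue_email]


async def run_turn(transcript: str) -> Turn:
    """Run the stages in order, mirroring the pipeline arrows."""
    turn = Turn(transcript=transcript)
    for stage in STAGES:
        turn = await stage(turn)
    return turn
```

For example, `asyncio.run(run_turn("I'm so tired today"))` walks the transcript through every stage and returns the populated `Turn`.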
Innovation & Real-World Impact
- Psychology-aware conversations that detect subtle emotional states beyond explicit requests
- Dynamic personalization adapting recommendation strategy based on individual acceptance patterns
- Low-latency voice interaction making AI feel natural and human-like
- Real-world applications: Smart retail kiosks, lightly staffed stores, drive-through automation, accessibility assistance
ARIA shows how voice agents can reduce decision fatigue, increase conversions, and enhance customer experience in retail environments.
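The dynamic personalization mentioned above, where strategy adapts to individual acceptance patterns, could be modeled with a simple exponential moving average per trigger. This is one plausible technique chosen for illustration, not necessarily the method ARIA uses. The `CustomerProfile` class, the `alpha` learning rate of 0.3, and the neutral prior of 0.5 are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:
    """Tracks, per trigger, how often this customer accepts suggestions."""
    alpha: float = 0.3  # learning rate (assumed value)
    acceptance: dict[str, float] = field(default_factory=dict)

    def record(self, trigger: str, accepted: bool) -> None:
        """Move the acceptance estimate toward 1.0 (accepted) or 0.0 (declined)."""
        prior = self.acceptance.get(trigger, 0.5)  # neutral starting point
        target = 1.0 if accepted else 0.0
        self.acceptance[trigger] = prior + self.alpha * (target - prior)

    def adjusted_weight(self, trigger: str, raw_weight: float) -> float:
        """Scale a raw trigger weight by the learned acceptance rate."""
        return raw_weight * self.acceptance.get(trigger, 0.5)
```

Because the estimate is a small dictionary of floats, it serializes directly into a JSON-based profile and recent behavior counts for more than older interactions.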
Theme Alignment: Browsers + Voices + Cloud + Tools = Cohesive Agent
- 🌐 Browsers: Web Speech API + real-time WebSocket UI displaying live triggers, recommendations, and conversation state
- 🗣️ Voices: ElevenLabs TTS + STT create natural bidirectional voice conversations; Anam.ai provides the conversational agent's persona face
- ☁️ Cloud: Groq LLM for conversational reasoning + n8n workflows for email automation
- 🛠️ Tools: Custom trigger analyzer, recommendation engine, customer profiler, and product catalog form the agent’s decision layer
These components work as one unified multimodal agent that listens, thinks, speaks, and acts autonomously.
Technologies Used
Backend: FastAPI, WebSockets, Python asyncio, httpx
AI/ML: Groq API (Llama 3.3 70B), ElevenLabs API (TTS + STT), Anam.ai (conversational persona)
Frontend: Vanilla JavaScript, HTML/CSS, Web Speech API
Automation: n8n workflow engine, Docker
Data: python-dotenv, JSON-based profiles
Audio: MediaRecorder API, WebM/Opus codec