🎓 LinguaAI System Architecture

Language Learning Platform with Conversational AI Tutor and Real-Time Pronunciation Feedback

📌 How to Use These Diagrams

1️⃣ Complete System Architecture
Overview: This diagram shows the complete LinguaAI system across all five architectural layers:
  • Presentation Layer: React Web App + Browser Extension
  • Application Layer: FastAPI Gateway + Socket.io Server
  • AI/ML Layer: Whisper (ASR) + Fine-tuned Whisper (MDD) + PER Calculator + LLM
  • Processing Modules: OCR, Profanity Filter, Waveform Generator, Gamification Engine
  • Data Layer: PostgreSQL (Supabase)
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 150, 'rankSpacing': 120}}}%%
graph TB
    subgraph PresentationLayer["PRESENTATION LAYER<br/>(Client-Side)"]
        ReactWeb["React Web Application<br/>• Pronunciation feedback UI<br/>• Contextual scenarios<br/>• Gamification (badges, streaks)<br/>• Analytics dashboard (D3.js)<br/>• Flashcards, OCR scanning"]
        BrowserExt["Browser Extension<br/>• Word selection from web<br/>• One-click dictionary add<br/>• Context capture"]
    end
    subgraph ApplicationLayer["APPLICATION LAYER<br/>(Backend Services)"]
        FastAPI["FastAPI Gateway<br/>• Authentication & Authorization (JWT)<br/>• Request routing & validation<br/>• OWASP compliance<br/>• Rate limiting & load balancing<br/><br/>Endpoints:<br/>/api/auth/* | /api/user/*<br/>/api/dictionary/* | /api/scenarios/*<br/>/api/flashcards/* | /api/progress/*<br/>/api/ocr/scan"]
        SocketIO["Socket.io Server<br/>• Real-time WebSocket<br/>• Live pronunciation feedback<br/>• Waveform streaming<br/>• Session management"]
    end
    subgraph AIMLLayer["AI/ML LAYER<br/>(Machine Learning Services)"]
        ASR["ASR Service<br/>(Whisper - OpenAI)<br/>• Speech-to-text transcription<br/>• Open conversation handling<br/>• 95% accuracy with noise cancellation<br/><br/>DSAI: Collection Phase<br/>• Audio data collection<br/>• Transcription storage"]
        MDD["MDD Service<br/>(Fine-tuned Whisper)<br/>• Phoneme-level analysis<br/>• Mispronunciation detection<br/>• Native language adaptation<br/>(e.g., Arabic B/P confusion)<br/><br/>DSAI: Analysis & Modeling<br/>• Error categorization<br/>• Pattern recognition"]
        PER["PER Calculator<br/>(Custom Algorithm)<br/>• Phone Error Rate calculation<br/>• Phoneme comparison<br/>• Pronunciation score (0-100%)<br/>• Error identification<br/><br/>DSAI: Evaluation Phase<br/>• F1-score ≥ 0.90"]
        LLM["Flashcard Generation<br/>(LLM)<br/>• Generate from conversation<br/>• Extract key vocabulary<br/>• Contextual examples<br/>• Spaced repetition logic<br/><br/>DSAI: Analysis Phase<br/>• Text analysis<br/>• Vocabulary contextualization"]
    end
    subgraph ProcessingLayer["PROCESSING MODULES LAYER<br/>(Specialized Services)"]
        Profanity["Profanity Filter<br/>• Content filtering<br/>• Safe learning environment<br/>• Text processing"]
        OCR["OCR Module<br/>(Tesseract.js)<br/>• Text extraction from images<br/>• Server-side processing<br/>• Vocabulary extraction"]
        Waveform["Waveform Generator<br/>(FFmpeg)<br/>• Audio visualization<br/>• Server-side rendering<br/>• Real-time waveform"]
        Gamification["Gamification Engine<br/>(Gamify.js)<br/>• Points, badges, streaks<br/>• Engagement tracking<br/>• Reward system<br/>• Observer pattern<br/>(Separate Microservice)"]
    end
    subgraph DataLayer["DATA LAYER<br/>(Persistent Storage)"]
        PostgreSQL["PostgreSQL Database<br/>(Supabase)<br/>• ACID-compliant relational DB<br/>• GDPR-compliant anonymization<br/>• Encrypted storage with RBAC<br/><br/>Tables:<br/>• users (profiles, native language)<br/>• dictionaries (vocabulary, context)<br/>• conversations (transcripts, scenarios)<br/>• pronunciation_logs (PER scores, errors)<br/>• flashcards (vocab, repetition schedule)<br/>• progress (session, accuracy trends)<br/>• gamification (points, badges, streaks)<br/>• mistake_tracking (error patterns)<br/><br/>DSAI: Data Storage & Retrieval<br/>• Time-series analytics<br/>• Mistake pattern insights"]
    end

    ReactWeb -->|"HTTPS/REST"| FastAPI
    ReactWeb -->|"WebSocket"| SocketIO
    BrowserExt -->|"HTTPS/REST"| FastAPI
    FastAPI -->|"HTTP"| ASR
    FastAPI -->|"HTTP"| MDD
    SocketIO -->|"HTTP"| MDD
    SocketIO -->|"HTTP"| Waveform
    ASR -->|"HTTP"| MDD
    MDD -->|"HTTP"| PER
    PER -->|"HTTP"| MDD
    MDD -->|"HTTP"| SocketIO
    MDD -->|"HTTP"| FastAPI
    FastAPI -->|"HTTP"| LLM
    LLM -->|"HTTP"| FastAPI
    FastAPI -->|"HTTP"| Profanity
    Profanity -->|"HTTP"| FastAPI
    FastAPI -->|"HTTP"| OCR
    OCR -->|"HTTP"| FastAPI
    FastAPI -->|"HTTP"| Gamification
    Gamification -->|"HTTP"| FastAPI
    Waveform -->|"HTTP"| SocketIO
    FastAPI -->|"PostgreSQL"| PostgreSQL
    ASR -->|"PostgreSQL"| PostgreSQL
    MDD -->|"PostgreSQL"| PostgreSQL
    LLM -->|"PostgreSQL"| PostgreSQL
    OCR -->|"PostgreSQL"| PostgreSQL
    Gamification -->|"PostgreSQL"| PostgreSQL
    PostgreSQL -->|"PostgreSQL"| FastAPI

    classDef presentationStyle fill:#E3F2FD,stroke:#1976D2,stroke-width:3px,color:#000
    classDef applicationStyle fill:#F3E5F5,stroke:#7B1FA2,stroke-width:3px,color:#000
    classDef aimlStyle fill:#E8F5E9,stroke:#388E3C,stroke-width:3px,color:#000
    classDef processingStyle fill:#FFF3E0,stroke:#F57C00,stroke-width:3px,color:#000
    classDef dataStyle fill:#FCE4EC,stroke:#C2185B,stroke-width:3px,color:#000
    class ReactWeb,BrowserExt presentationStyle
    class FastAPI,SocketIO applicationStyle
    class ASR,MDD,PER,LLM aimlStyle
    class Profanity,OCR,Waveform,Gamification processingStyle
    class PostgreSQL dataStyle
```
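The Gamification Engine node above notes an observer pattern. A minimal Python sketch of that design follows; the event name `session_completed`, the flat 10-point reward, and the `consistent_learner` badge threshold are illustrative assumptions, not the production rules:

```python
# Observer-pattern sketch for the Gamification Engine.
# Event names, point values, and badge thresholds are assumptions
# made for illustration, not the service's actual configuration.

class GamificationEngine:
    """Subject: publishes learning events to registered observers."""

    def __init__(self):
        self._observers = []

    def subscribe(self, observer):
        self._observers.append(observer)

    def publish(self, event, payload):
        for observer in self._observers:
            observer.on_event(event, payload)


class PointsTracker:
    """Observer: awards points for each completed session."""

    def __init__(self):
        self.points = 0

    def on_event(self, event, payload):
        if event == "session_completed":
            self.points += 10  # assumed flat reward per session


class BadgeTracker:
    """Observer: grants a badge once a points threshold is crossed."""

    def __init__(self, points_tracker, threshold=30):
        self.points_tracker = points_tracker
        self.threshold = threshold
        self.badges = []

    def on_event(self, event, payload):
        if (event == "session_completed"
                and self.points_tracker.points >= self.threshold
                and "consistent_learner" not in self.badges):
            self.badges.append("consistent_learner")


engine = GamificationEngine()
points = PointsTracker()
badges = BadgeTracker(points, threshold=30)
engine.subscribe(points)   # subscribed first, so points update before badge checks
engine.subscribe(badges)

for _ in range(3):
    engine.publish("session_completed", {"user_id": 1})

print(points.points)  # 30
print(badges.badges)  # ['consistent_learner']
```

Because the engine runs as a separate microservice, new reward rules can be added as observers without touching the FastAPI gateway.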
2️⃣ Pronunciation Feedback Flows
Two Different Processing Paths:
  • Flow 1 (Predefined Text): For practice exercises where users read given text. Direct phoneme analysis for faster, focused feedback.
  • Flow 2 (Open Conversation): For contextual scenarios (restaurant, travel). Includes ASR transcription followed by pronunciation analysis.
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 150, 'rankSpacing': 100, 'curve': 'basis'}}}%%
flowchart TB
    subgraph Flow1["FLOW 1: Predefined Text Practice<br/>(Pronunciation Exercises)"]
        direction LR
        User1["👤 User<br/>Reads given text"]
        MDD1["MDD Service<br/>(Fine-tuned Whisper)<br/>Phoneme Detection"]
        PER1["PER Calculator<br/>Calculate Phone<br/>Error Rate"]
        Feedback1["✅ Feedback<br/>• Pronunciation score<br/>• Phoneme errors<br/>• Corrections"]
        User1 -->|"🎤 Audio"| MDD1
        MDD1 -->|"Phoneme<br/>predictions"| PER1
        PER1 -->|"PER score<br/>Error details"| Feedback1
    end
    spacer1[" "]:::spacer
    subgraph Flow2["FLOW 2: Open Conversation<br/>(Contextual Scenarios)"]
        direction LR
        User2["👤 User<br/>Speaks freely"]
        ASR2["ASR Service<br/>(Whisper)<br/>Speech Recognition"]
        Text2["📝 Text<br/>Transcription"]
        MDD2["MDD Service<br/>(Fine-tuned Whisper)<br/>Phoneme Detection"]
        PER2["PER Calculator<br/>Calculate Phone<br/>Error Rate"]
        Feedback2["✅ Feedback<br/>• Transcription<br/>• Pronunciation score<br/>• Phoneme errors<br/>• Corrections"]
        User2 -->|"🎤 Audio"| ASR2
        ASR2 -->|"Text"| Text2
        Text2 -->|"Transcription<br/>+ Audio"| MDD2
        MDD2 -->|"Phoneme<br/>predictions"| PER2
        PER2 -->|"PER score<br/>Error details"| Feedback2
    end
    Flow1 ~~~ spacer1
    spacer1 ~~~ Flow2

    classDef userStyle fill:#BBDEFB,stroke:#1976D2,stroke-width:2px,color:#000
    classDef asrStyle fill:#C8E6C9,stroke:#388E3C,stroke-width:2px,color:#000
    classDef mddStyle fill:#C8E6C9,stroke:#388E3C,stroke-width:2px,color:#000
    classDef perStyle fill:#FFF9C4,stroke:#F57F17,stroke-width:2px,color:#000
    classDef feedbackStyle fill:#C5E1A5,stroke:#689F38,stroke-width:2px,color:#000
    classDef textStyle fill:#E1BEE7,stroke:#7B1FA2,stroke-width:2px,color:#000
    classDef spacer fill:transparent,stroke:transparent,color:transparent
    class User1,User2 userStyle
    class ASR2 asrStyle
    class MDD1,MDD2 mddStyle
    class PER1,PER2 perStyle
    class Feedback1,Feedback2 feedbackStyle
    class Text2 textStyle
    class spacer1 spacer
```
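The PER Calculator step shared by both flows reduces to an edit-distance comparison between the reference and predicted phoneme sequences. A minimal sketch follows; the ARPAbet-style phoneme symbols in the example are illustrative, not the MDD service's actual phoneme inventory:

```python
# Phone Error Rate (PER) sketch: Levenshtein edit distance over phoneme
# sequences, normalized by the reference length. Example phonemes are
# illustrative ARPAbet-style symbols, not the service's real inventory.

def phone_error_rate(reference, predicted):
    """PER = (substitutions + deletions + insertions) / len(reference)."""
    m, n = len(reference), len(predicted)
    # dp[i][j] = edit distance between reference[:i] and predicted[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == predicted[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[m][n] / m if m else 0.0


def pronunciation_score(reference, predicted):
    """Map PER onto the 0-100% score shown in the feedback UI."""
    return round(max(0.0, 1.0 - phone_error_rate(reference, predicted)) * 100)


# "think" /TH IH NG K/ pronounced /S IH NG K/ (a common TH -> S substitution)
ref = ["TH", "IH", "NG", "K"]
pred = ["S", "IH", "NG", "K"]
print(phone_error_rate(ref, pred))    # 0.25
print(pronunciation_score(ref, pred))  # 75
```

Normalizing by the reference length is what lets the same score scale apply to both the short predefined-text exercises and longer open-conversation utterances.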
3️⃣ DSAI Data Lifecycle
Complete Data Science Workflow: This diagram demonstrates the end-to-end data lifecycle aligned with DSAI program requirements, showing how data flows through collection, analysis, modeling, evaluation, and deployment phases.
```mermaid
graph TD
    subgraph Phase1["PHASE 1: COLLECTION"]
        C1["ASR Service<br/>📊 Data Sources:<br/>• Audio recordings<br/>• User speech samples<br/>• Conversation transcripts<br/>• User interaction logs"]
    end
    subgraph Phase2["PHASE 2: ANALYSIS"]
        A1["MDD Service<br/>🔍 Analysis:<br/>• Phoneme-level error categorization<br/>• Native language influence patterns<br/>• Mistake pattern clustering"]
        A2["Flashcard Service<br/>📚 Analysis:<br/>• Vocabulary extraction<br/>• Contextualization<br/>• Text analysis from transcripts"]
    end
    subgraph Phase3["PHASE 3: MODELING"]
        M1["Fine-tuned Whisper<br/>🤖 Machine Learning:<br/>• Fine-tuning for phoneme detection<br/>• Native language adaptation<br/>• Error pattern recognition models<br/><br/>📈 Validation:<br/>• 10-fold cross-validation<br/>• 10K audio samples"]
    end
    subgraph Phase4["PHASE 4: EVALUATION"]
        E1["PER Calculator<br/>📊 Metrics:<br/>• Phone Error Rate (PER)<br/>• Model accuracy ≥ 90%<br/>• F1-score ≥ 0.90<br/>• Pronunciation accuracy ≥ 85%<br/>• Vocabulary retention ≥ 75%<br/><br/>🛠️ Tools:<br/>• Confusion matrices<br/>• ROC curves"]
    end
    subgraph Phase5["PHASE 5: DEPLOYMENT & MONITORING"]
        D1["Production System<br/>🚀 Deployment:<br/>• Real-time pronunciation feedback<br/>• Progress analytics dashboards<br/>• Continuous performance tracking<br/><br/>📡 Monitoring:<br/>• 99% uptime<br/>• Latency < 2s"]
        D2["PostgreSQL Database<br/>💾 Storage:<br/>• Time-series analytics<br/>• Structured data for insights<br/>• Aggregation pipelines<br/>• Mistake pattern insights"]
    end
    C1 -->|"Raw audio data<br/>Transcripts"| A1
    C1 -->|"Conversation<br/>transcripts"| A2
    A1 -->|"Error patterns<br/>Categorized data"| M1
    A2 -->|"Extracted<br/>vocabulary"| M1
    M1 -->|"Model predictions<br/>Phoneme detections"| E1
    E1 -->|"Performance metrics<br/>Validated models"| D1
    E1 -->|"Evaluation results"| D2
    D1 -->|"Production data"| D2
    D2 -.->|"Feedback loop<br/>Continuous improvement"| C1

    classDef collectionStyle fill:#E3F2FD,stroke:#1976D2,stroke-width:3px,color:#000
    classDef analysisStyle fill:#E8F5E9,stroke:#388E3C,stroke-width:3px,color:#000
    classDef modelingStyle fill:#FFF3E0,stroke:#F57C00,stroke-width:3px,color:#000
    classDef evaluationStyle fill:#FCE4EC,stroke:#C2185B,stroke-width:3px,color:#000
    classDef deploymentStyle fill:#F3E5F5,stroke:#7B1FA2,stroke-width:3px,color:#000
    class C1 collectionStyle
    class A1,A2 analysisStyle
    class M1 modelingStyle
    class E1 evaluationStyle
    class D1,D2 deploymentStyle
```
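The evaluation targets in Phase 4 (e.g., F1-score ≥ 0.90) come down to standard precision/recall arithmetic over the mispronunciation-detection confusion matrix. A short sketch, with confusion-matrix counts invented purely for illustration:

```python
# Precision, recall, and F1 for mispronunciation detection.
# The confusion-matrix counts below are hypothetical examples,
# not measured results from the LinguaAI models.

def f1_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical validation run: 920 mispronunciations correctly flagged,
# 60 false alarms, 80 missed errors.
precision, recall, f1 = f1_metrics(tp=920, fp=60, fn=80)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

A run like this one would fall just short of the 0.90 F1 target on recall-heavy misses, which is exactly the kind of gap the feedback loop back to the Collection phase is meant to close.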

🎯 Key Architecture Highlights

💻 Technology Stack

Frontend

  • React (Web)
  • Chrome Extension API
  • D3.js (Visualization)

Backend

  • FastAPI (Python)
  • Socket.io

AI/ML

  • Whisper (ASR)
  • Fine-tuned Whisper (MDD)
  • LLM (Flashcard Gen)

Processing

  • Tesseract.js (OCR)
  • FFmpeg (Audio/Waveform)
  • Gamify.js

Database

  • Supabase (PostgreSQL)