🎓 LinguaAI System Architecture
Language Learning Platform with Conversational AI Tutor and Real-Time Pronunciation Feedback
📌 How to Use These Diagrams
- View in Browser: Open this HTML file in any modern browser (Chrome, Firefox, Safari)
- Export to PNG/SVG:
  1. Visit mermaid.live
  2. Copy the Mermaid code from the .mermaid files
  3. Paste it into the editor
  4. Click "Actions" → "Export as PNG/SVG"
- Edit Diagrams: Modify the .mermaid files to customize colors, labels, or structure
- For Presentation: Export as high-resolution PNG and insert into PowerPoint/Google Slides
1️⃣ Complete System Architecture
Overview: This diagram shows the complete LinguaAI system with all 5 architectural layers:
- Presentation Layer: React Web App + Browser Extension
- Application Layer: FastAPI Gateway + Socket.io Server
- AI/ML Layer: Whisper (ASR) + Fine-tuned Whisper (MDD) + PER Calculator + LLM
- Processing Modules: OCR, Profanity Filter, Waveform Generator, Gamification Engine
- Data Layer: PostgreSQL (Supabase)
%%{init: {'flowchart': {'nodeSpacing': 150, 'rankSpacing': 120}}}%%
graph TB
subgraph PresentationLayer["PRESENTATION LAYER
(Client-Side)"]
ReactWeb["React Web Application
• Pronunciation feedback UI
• Contextual scenarios
• Gamification (badges, streaks)
• Analytics dashboard (D3.js)
• Flashcards, OCR scanning"]
BrowserExt["Browser Extension
• Word selection from web
• One-click dictionary add
• Context capture"]
end
subgraph ApplicationLayer["APPLICATION LAYER
(Backend Services)"]
FastAPI["FastAPI Gateway
• Authentication & Authorization (JWT)
• Request routing & validation
• OWASP compliance
• Rate limiting & load balancing
Endpoints:
/api/auth/* | /api/user/*
/api/dictionary/* | /api/scenarios/*
/api/flashcards/* | /api/progress/*
/api/ocr/scan"]
SocketIO["Socket.io Server
• Real-time WebSocket
• Live pronunciation feedback
• Waveform streaming
• Session management"]
end
subgraph AIMLLayer["AI/ML LAYER
(Machine Learning Services)"]
ASR["ASR Service
(Whisper - OpenAI)
• Speech-to-text transcription
• Open conversation handling
• 95% accuracy with noise cancellation
DSAI: Collection Phase
• Audio data collection
• Transcription storage"]
MDD["MDD Service
(Fine-tuned Whisper)
• Phoneme-level analysis
• Mispronunciation detection
• Native language adaptation
(e.g., Arabic B/P confusion)
DSAI: Analysis & Modeling
• Error categorization
• Pattern recognition"]
PER["PER Calculator
(Custom Algorithm)
• Phone Error Rate calculation
• Phoneme comparison
• Pronunciation score (0-100%)
• Error identification
DSAI: Evaluation Phase
• F1-score ≥ 0.90"]
LLM["Flashcard Generation
(LLM)
• Generate from conversation
• Extract key vocabulary
• Contextual examples
• Spaced repetition logic
DSAI: Analysis Phase
• Text analysis
• Vocabulary contextualization"]
end
subgraph ProcessingLayer["PROCESSING MODULES LAYER
(Specialized Services)"]
Profanity["Profanity Filter
• Content filtering
• Safe learning environment
• Text processing"]
OCR["OCR Module
(Tesseract.js)
• Text extraction from images
• Server-side processing
• Vocabulary extraction"]
Waveform["Waveform Generator
(FFmpeg)
• Audio visualization
• Server-side rendering
• Real-time waveform"]
Gamification["Gamification Engine
(Gamify.js)
• Points, badges, streaks
• Engagement tracking
• Reward system
• Observer pattern
(Separate Microservice)"]
end
subgraph DataLayer["DATA LAYER
(Persistent Storage)"]
PostgreSQL["PostgreSQL Database
(Supabase)
• ACID-compliant relational DB
• GDPR-compliant anonymization
• Encrypted storage with RBAC
Tables:
• users (profiles, native language)
• dictionaries (vocabulary, context)
• conversations (transcripts, scenarios)
• pronunciation_logs (PER scores, errors)
• flashcards (vocab, repetition schedule)
• progress (session, accuracy trends)
• gamification (points, badges, streaks)
• mistake_tracking (error patterns)
DSAI: Data Storage & Retrieval
• Time-series analytics
• Mistake pattern insights"]
end
ReactWeb -->|"HTTPS/REST"| FastAPI
ReactWeb -->|"WebSocket"| SocketIO
BrowserExt -->|"HTTPS/REST"| FastAPI
FastAPI -->|"HTTP"| ASR
FastAPI -->|"HTTP"| MDD
SocketIO -->|"HTTP"| MDD
SocketIO -->|"HTTP"| Waveform
ASR -->|"HTTP"| MDD
MDD -->|"HTTP"| PER
PER -->|"HTTP"| MDD
MDD -->|"HTTP"| SocketIO
MDD -->|"HTTP"| FastAPI
FastAPI -->|"HTTP"| LLM
LLM -->|"HTTP"| FastAPI
FastAPI -->|"HTTP"| Profanity
Profanity -->|"HTTP"| FastAPI
FastAPI -->|"HTTP"| OCR
OCR -->|"HTTP"| FastAPI
FastAPI -->|"HTTP"| Gamification
Gamification -->|"HTTP"| FastAPI
Waveform -->|"HTTP"| SocketIO
FastAPI -->|"SQL"| PostgreSQL
ASR -->|"SQL"| PostgreSQL
MDD -->|"SQL"| PostgreSQL
LLM -->|"SQL"| PostgreSQL
OCR -->|"SQL"| PostgreSQL
Gamification -->|"SQL"| PostgreSQL
PostgreSQL -->|"Query results"| FastAPI
classDef presentationStyle fill:#E3F2FD,stroke:#1976D2,stroke-width:3px,color:#000
classDef applicationStyle fill:#F3E5F5,stroke:#7B1FA2,stroke-width:3px,color:#000
classDef aimlStyle fill:#E8F5E9,stroke:#388E3C,stroke-width:3px,color:#000
classDef processingStyle fill:#FFF3E0,stroke:#F57C00,stroke-width:3px,color:#000
classDef dataStyle fill:#FCE4EC,stroke:#C2185B,stroke-width:3px,color:#000
class ReactWeb,BrowserExt presentationStyle
class FastAPI,SocketIO applicationStyle
class ASR,MDD,PER,LLM aimlStyle
class Profanity,OCR,Waveform,Gamification processingStyle
class PostgreSQL dataStyle
2️⃣ Pronunciation Feedback Flows
Two Different Processing Paths:
- Flow 1 (Predefined Text): For practice exercises where users read given text. Direct phoneme analysis for faster, focused feedback.
- Flow 2 (Open Conversation): For contextual scenarios (restaurant, travel). Includes ASR transcription followed by pronunciation analysis.
%%{init: {'flowchart': {'nodeSpacing': 150, 'rankSpacing': 100, 'curve': 'basis'}}}%%
flowchart TB
subgraph Flow1["FLOW 1: Predefined Text Practice
(Pronunciation Exercises)"]
direction LR
User1["👤 User
Reads given text"]
MDD1["MDD Service
(Fine-tuned Whisper)
Phoneme Detection"]
PER1["PER Calculator
Calculate Phone
Error Rate"]
Feedback1["✅ Feedback
• Pronunciation score
• Phoneme errors
• Corrections"]
User1 -->|"🎤 Audio"| MDD1
MDD1 -->|"Phoneme
predictions"| PER1
PER1 -->|"PER score
Error details"| Feedback1
end
spacer1[" "]:::spacer
subgraph Flow2["FLOW 2: Open Conversation
(Contextual Scenarios)"]
direction LR
User2["👤 User
Speaks freely"]
ASR2["ASR Service
(Whisper)
Speech Recognition"]
Text2["📝 Text
Transcription"]
MDD2["MDD Service
(Fine-tuned Whisper)
Phoneme Detection"]
PER2["PER Calculator
Calculate Phone
Error Rate"]
Feedback2["✅ Feedback
• Transcription
• Pronunciation score
• Phoneme errors
• Corrections"]
User2 -->|"🎤 Audio"| ASR2
ASR2 -->|"Text"| Text2
Text2 -->|"Transcription
+ Audio"| MDD2
MDD2 -->|"Phoneme
predictions"| PER2
PER2 -->|"PER score
Error details"| Feedback2
end
Flow1 ~~~ spacer1
spacer1 ~~~ Flow2
classDef userStyle fill:#BBDEFB,stroke:#1976D2,stroke-width:2px,color:#000
classDef asrStyle fill:#C8E6C9,stroke:#388E3C,stroke-width:2px,color:#000
classDef mddStyle fill:#C8E6C9,stroke:#388E3C,stroke-width:2px,color:#000
classDef perStyle fill:#FFF9C4,stroke:#F57F17,stroke-width:2px,color:#000
classDef feedbackStyle fill:#C5E1A5,stroke:#689F38,stroke-width:2px,color:#000
classDef textStyle fill:#E1BEE7,stroke:#7B1FA2,stroke-width:2px,color:#000
classDef spacer fill:transparent,stroke:transparent,color:transparent
class User1,User2 userStyle
class ASR2 asrStyle
class MDD1,MDD2 mddStyle
class PER1,PER2 perStyle
class Feedback1,Feedback2 feedbackStyle
class Text2 textStyle
class spacer1 spacer
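The PER Calculator stage shared by both flows reduces to an edit distance over phoneme sequences. The sketch below shows the standard PER formula (substitutions + insertions + deletions over reference length); the mapping from PER to the 0-100% score shown to learners is an assumption, since the exact scoring curve is not specified here:

```python
def phone_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """PER = (substitutions + insertions + deletions) / len(reference)."""
    m, n = len(reference), len(hypothesis)
    # Classic dynamic-programming edit distance over phoneme symbols.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n] / m if m else 0.0


def pronunciation_score(reference: list[str], hypothesis: list[str]) -> int:
    """Assumed mapping from PER to the 0-100% learner-facing score."""
    return round(max(0.0, 1.0 - phone_error_rate(reference, hypothesis)) * 100)
```

For example, an Arabic-speaking learner saying "park" as "bark" substitutes one of four phonemes: PER = 0.25, score 75.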
3️⃣ DSAI Data Lifecycle
Complete Data Science Workflow: This diagram demonstrates the end-to-end data lifecycle aligned with DSAI program requirements, showing how data flows through collection, analysis, modeling, evaluation, and deployment phases.
graph TD
subgraph Phase1["PHASE 1: COLLECTION"]
C1["ASR Service
📊 Data Sources:
• Audio recordings
• User speech samples
• Conversation transcripts
• User interaction logs"]
end
subgraph Phase2["PHASE 2: ANALYSIS"]
A1["MDD Service
🔍 Analysis:
• Phoneme-level error categorization
• Native language influence patterns
• Mistake pattern clustering"]
A2["Flashcard Service
📚 Analysis:
• Vocabulary extraction
• Contextualization
• Text analysis from transcripts"]
end
subgraph Phase3["PHASE 3: MODELING"]
M1["Fine-tuned Whisper
🤖 Machine Learning:
• Fine-tuning for phoneme detection
• Native language adaptation
• Error pattern recognition models
📈 Validation:
• 10-fold cross-validation
• 10K audio samples"]
end
subgraph Phase4["PHASE 4: EVALUATION"]
E1["PER Calculator
📊 Metrics:
• Phone Error Rate (PER)
• Model accuracy ≥ 90%
• F1-score ≥ 0.90
• Pronunciation accuracy ≥ 85%
• Vocabulary retention ≥ 75%
🛠️ Tools:
• Confusion matrices
• ROC curves"]
end
subgraph Phase5["PHASE 5: DEPLOYMENT & MONITORING"]
D1["Production System
🚀 Deployment:
• Real-time pronunciation feedback
• Progress analytics dashboards
• Continuous performance tracking
📡 Monitoring:
• 99% uptime
• Latency < 2s"]
D2["PostgreSQL Database
💾 Storage:
• Time-series analytics
• Structured data for insights
• Aggregation pipelines
• Mistake pattern insights"]
end
C1 -->|"Raw audio data
Transcripts"| A1
C1 -->|"Conversation
transcripts"| A2
A1 -->|"Error patterns
Categorized data"| M1
A2 -->|"Extracted
vocabulary"| M1
M1 -->|"Model predictions
Phoneme detections"| E1
E1 -->|"Performance metrics
Validated models"| D1
E1 -->|"Evaluation results"| D2
D1 -->|"Production data"| D2
D2 -.->|"Feedback loop
Continuous improvement"| C1
classDef collectionStyle fill:#E3F2FD,stroke:#1976D2,stroke-width:3px,color:#000
classDef analysisStyle fill:#E8F5E9,stroke:#388E3C,stroke-width:3px,color:#000
classDef modelingStyle fill:#FFF3E0,stroke:#F57C00,stroke-width:3px,color:#000
classDef evaluationStyle fill:#FCE4EC,stroke:#C2185B,stroke-width:3px,color:#000
classDef deploymentStyle fill:#F3E5F5,stroke:#7B1FA2,stroke-width:3px,color:#000
class C1 collectionStyle
class A1,A2 analysisStyle
class M1 modelingStyle
class E1 evaluationStyle
class D1,D2 deploymentStyle
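The F1-score ≥ 0.90 gate in the evaluation phase can be computed in a few lines. This sketch assumes MDD is framed as a binary per-phoneme classifier (1 = mispronounced), which is one plausible reading of the metric, not a confirmed detail of the LinguaAI pipeline:

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Binary F1 over per-phoneme 'mispronounced?' labels (1 = error)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

During 10-fold cross-validation, a model version would only be promoted to deployment if this value stays at or above 0.90 on held-out folds.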
🎯 Key Architecture Highlights
- Microservices Architecture: Gamification Engine runs as a separate microservice so it can scale independently
- Real-time Communication: Socket.io WebSocket for live pronunciation feedback
- AI/ML Pipeline: Whisper (ASR) + Fine-tuned Whisper (MDD) + PER Calculator
- Native Language Adaptation: Fine-tuned model handles language-specific challenges (e.g., the /b/ vs. /p/ confusion common among Arabic speakers)
- Security First: JWT authentication, OWASP compliance, GDPR data handling
- Performance Targets: < 2s latency, 99% uptime, 95% ASR accuracy
💻 Technology Stack
Frontend
- React (Web)
- Chrome Extension API
- D3.js (Visualization)
Backend
- FastAPI (Python)
- Socket.io
AI/ML
- Whisper (ASR)
- Fine-tuned Whisper (MDD)
- LLM (Flashcard Gen)
Processing
- Tesseract.js (OCR)
- FFmpeg (Audio/Waveform)
- Gamify.js