AlphaWave TTS - Production-Ready Offline Text-to-Speech

A complete, production-grade offline Text-to-Speech web application built with FastAPI and pyttsx3.

🚀 Features

⚡ Ultra Fast Processing - Optimized TTS engine with singleton pattern
🔒 100% Offline & Secure - No internet required, complete privacy
🎙 Studio Quality Output - 192kbps MP3 audio files via pydub
🎯 Smart Voice Selection - Automatic male/female voice detection
⚙️ Speed Control - Adjustable speech rate (100-200 WPM)
🧹 Auto Cleanup - Background task removes files older than 24 hours
📊 Production Ready - Async operations, logging, error handling
🔄 Non-Blocking - Async-friendly audio generation
📝 Request Logging - Detailed timing and status tracking

🛠 Tech Stack

Backend

FastAPI - Modern async web framework
pyttsx3 - Offline TTS engine (singleton pattern)
pydub - WAV → MP3 conversion (192kbps)
Uvicorn - ASGI production server
Pydantic - Request/response validation
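
As a rough sketch of how these pieces might fit together (not the project's actual code): pyttsx3 renders the text to a WAV file, and pydub re-encodes it as a 192 kbps MP3. The function names `audio_paths` and `synthesize_to_mp3`, the UUID-based file naming, and the default rate of 170 are assumptions made for illustration.

```python
import uuid
from pathlib import Path
from typing import Optional, Tuple


def audio_paths(out_dir: Path, stem: Optional[str] = None) -> Tuple[Path, Path]:
    """Return matching (wav, mp3) paths; the random UUID stem is an assumption."""
    stem = stem or uuid.uuid4().hex
    return out_dir / f"{stem}.wav", out_dir / f"{stem}.mp3"


def synthesize_to_mp3(text: str, out_dir: Path, rate: int = 170,
                      voice_id: Optional[str] = None) -> Path:
    """Render text to WAV with pyttsx3, then convert it to MP3 with pydub."""
    # Imported lazily so audio_paths() works without the audio stack installed.
    import pyttsx3
    from pydub import AudioSegment

    out_dir.mkdir(parents=True, exist_ok=True)
    wav_path, mp3_path = audio_paths(out_dir)

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    engine.save_to_file(text, str(wav_path))
    engine.runAndWait()  # blocks until the queued save has been processed

    # pydub delegates MP3 encoding to FFmpeg, which is why FFmpeg is required.
    AudioSegment.from_wav(str(wav_path)).export(
        str(mp3_path), format="mp3", bitrate="192k"
    )
    wav_path.unlink()  # the intermediate WAV is no longer needed
    return mp3_path
```

Calling `synthesize_to_mp3("Hello", Path("static/audio"))` would return the path of the new MP3, provided pyttsx3, pydub, and FFmpeg are installed.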

Frontend

Pure HTML/CSS/JavaScript
Modern dark theme UI
Responsive design
Fetch API integration

📋 Prerequisites

Python 3.9+ (the backend relies on asyncio.to_thread, which was added in Python 3.9)
FFmpeg (required for pydub MP3 conversion)

Install FFmpeg

Windows:

Download from https://ffmpeg.org/download.html
Extract to C:\ffmpeg
Add C:\ffmpeg\bin to PATH environment variable
Restart terminal

Linux:

sudo apt-get update
sudo apt-get install ffmpeg

macOS:

brew install ffmpeg

🔧 Installation

Step 1: Navigate to project

cd AlphaWave

Step 2: Set up backend

cd backend
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate

pip install -r requirements.txt

Step 3: Run backend (Production Mode)

python run.py

Backend will run on: http://localhost:8000

Step 4: Open frontend

Simply open frontend/index.html in your browser, or serve it:

cd frontend
python -m http.server 3000

Then visit: http://localhost:3000

📡 API Documentation

Once backend is running, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

API Endpoints

`GET /`

Root endpoint with API info

`GET /health`

Health check endpoint

{
  "status": "healthy",
  "version": "1.0.0"
}

`GET /voices`

Returns all available system voices with gender detection

Response:

{
  "voices": [
    {
      "id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
      "name": "Microsoft David Desktop",
      "gender": "male"
    },
    {
      "id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
      "name": "Microsoft Zira Desktop",
      "gender": "female"
    }
  ]
}
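
This README does not say how the gender field is derived. One plausible approach is keyword matching on the voice's display name, sketched below. The hint sets and the function names `detect_gender` and `describe_voices` are illustrative assumptions, not the project's actual lookup.

```python
import re
from typing import Dict, List

# Illustrative name hints only; a real deployment would extend these sets.
MALE_HINTS = {"david", "mark", "george", "male"}
FEMALE_HINTS = {"zira", "hazel", "susan", "female"}


def detect_gender(voice_name: str) -> str:
    """Guess 'male', 'female', or 'other' from a voice's display name."""
    # Match whole words so that "female" is never mistaken for "male".
    words = set(re.findall(r"[a-z]+", voice_name.lower()))
    if words & FEMALE_HINTS:
        return "female"
    if words & MALE_HINTS:
        return "male"
    return "other"


def describe_voices(voices: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Attach a 'gender' field to each {'id', 'name'} voice record."""
    return [{**voice, "gender": detect_gender(voice["name"])} for voice in voices]
```

Names that match neither set fall into "other", which lines up with the Male/Female/Other grouping used by the frontend.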

`POST /generate`

Generates MP3 audio from text

Request:

{
  "text": "Hello, this is AlphaWave TTS",
  "voice_id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
  "rate": 170
}

Response:

{
  "success": true,
  "audio_url": "/static/audio/abc123-def456.mp3"
}
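
A minimal client sketch for this endpoint, using only the Python standard library. It assumes the backend is reachable at http://localhost:8000 and that `voice_id` is an `id` value returned by `GET /voices`. The helper names and the 60-second timeout are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # backend address from the installation steps


def build_generate_request(text: str, voice_id: str, rate: int = 170,
                           base_url: str = API_BASE) -> urllib.request.Request:
    """Build the POST /generate request without sending it."""
    payload = json.dumps({"text": text, "voice_id": voice_id, "rate": rate})
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_audio_url(text: str, voice_id: str, rate: int = 170,
                       base_url: str = API_BASE) -> str:
    """Send the request and return the absolute URL of the generated MP3."""
    request = build_generate_request(text, voice_id, rate, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(f"generation failed: {body}")
    # audio_url is a server-relative path such as /static/audio/<name>.mp3
    return base_url + body["audio_url"]
```

The returned URL can then be downloaded with `urllib.request.urlretrieve` or opened in a browser.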

⚙️ Configuration

Edit backend/app/config.py:

MAX_TEXT_LENGTH = 300000            # Maximum input length in characters
AUDIO_BITRATE = "192k"              # MP3 bitrate used for conversion
FILE_RETENTION_HOURS = 24           # Delete generated files after this many hours
CLEANUP_INTERVAL_SECONDS = 3600     # Seconds between cleanup passes
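
To illustrate how the two cleanup settings interact, here is a hedged sketch of a retention-based cleanup task. The function names, the `*.mp3` glob, and the use of file modification time are assumptions, not necessarily how the backend implements it.

```python
import asyncio
import time
from pathlib import Path
from typing import List, Optional

FILE_RETENTION_HOURS = 24
CLEANUP_INTERVAL_SECONDS = 3600


def remove_expired_files(audio_dir: Path,
                         retention_hours: float = FILE_RETENTION_HOURS,
                         now: Optional[float] = None) -> List[Path]:
    """Delete MP3 files last modified before the retention window began."""
    now = time.time() if now is None else now
    cutoff = now - retention_hours * 3600
    removed = []
    for path in audio_dir.glob("*.mp3"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


async def cleanup_loop(audio_dir: Path) -> None:
    """Run a cleanup pass, sleep for the configured interval, and repeat."""
    while True:
        # File I/O runs in a worker thread so the event loop stays responsive.
        await asyncio.to_thread(remove_expired_files, audio_dir)
        await asyncio.sleep(CLEANUP_INTERVAL_SECONDS)
```

With these defaults a file can survive for up to 25 hours: 24 hours of retention plus up to one interval before the next pass notices it.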

📁 Project Structure

AlphaWave/
├── backend/
│   ├── app/
│   │   ├── main.py           # FastAPI app with endpoints
│   │   ├── tts_engine.py     # Singleton TTS engine
│   │   ├── models.py         # Pydantic models
│   │   └── config.py         # Configuration
│   ├── static/
│   │   └── audio/            # Generated MP3 files
│   ├── requirements.txt      # Python dependencies
│   └── run.py                # Production runner
├── frontend/
│   ├── index.html            # Main UI
│   ├── style.css             # Dark theme styles
│   └── script.js             # API integration
└── README.md

🎯 Usage

Enter text in the textarea (max 300000 characters)
Select voice from dropdown (grouped by gender)
Adjust speed using slider (100-200 WPM)
Click Generate button
Listen to audio preview
Download MP3 file

Keyboard Shortcut: Ctrl + Enter to generate

🏗 Architecture Highlights

Backend Features

✅ Singleton Pattern - TTS engine reused across requests
✅ Async Operations - Non-blocking audio generation with asyncio.to_thread
✅ Background Tasks - Automatic file cleanup every hour
✅ Request Logging - Timing and status for every request
✅ Error Handling - Graceful failures with proper messages
✅ CORS Enabled - Cross-origin requests supported
✅ Input Validation - Pydantic models with sanitization
✅ WAV → MP3 - High-quality conversion via pydub
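
The singleton and asyncio.to_thread items above can be sketched together as follows. This is an illustration under stated assumptions, not the contents of tts_engine.py: `EngineHolder` and `run_in_thread` are hypothetical names, and the factory argument stands in for a call such as `pyttsx3.init`.

```python
import asyncio
import threading
from typing import Any, Callable, Optional


class EngineHolder:
    """Lazily creates one engine and serializes access to it."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory  # stand-in for e.g. pyttsx3.init
        self._engine: Optional[Any] = None
        self._lock = threading.Lock()

    def run(self, job: Callable[[Any], Any]) -> Any:
        """Call job(engine) while holding the lock, creating the engine once."""
        with self._lock:
            if self._engine is None:
                self._engine = self._factory()
            return job(self._engine)


async def run_in_thread(holder: EngineHolder, job: Callable[[Any], Any]) -> Any:
    """Run a blocking engine job in a worker thread (Python 3.9+)."""
    return await asyncio.to_thread(holder.run, job)
```

Because a single lock guards the engine, synthesis jobs in this sketch run one at a time, while the event loop stays free to accept requests and serve files.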

Frontend Features

✅ Character Counter - Real-time text length tracking
✅ Voice Grouping - Organized by Male/Female/Other
✅ Loading States - Visual feedback during generation
✅ Error Display - User-friendly error messages
✅ Audio Preview - Built-in player
✅ Download - One-click MP3 download
✅ Responsive - Works on all screen sizes

🐛 Troubleshooting

No voices available

Windows: Check Settings → Time & Language → Speech
Linux: Install espeak, the speech driver pyttsx3 uses on Linux
```
sudo apt-get install espeak
```

FFmpeg not found

# Verify installation
ffmpeg -version

# If not found, reinstall and add to PATH

Port already in use

Change port in run.py:

uvicorn.run("app.main:app", port=8001)

CORS errors

Ensure backend is running on port 8000
Check API_BASE in frontend/script.js

Module not found

cd backend
pip install -r requirements.txt

🚀 Production Deployment

Using Gunicorn (Linux)

pip install gunicorn
gunicorn app.main:app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Using Docker

FROM python:3.10-slim

# espeak provides the voices pyttsx3 needs on Linux (see Troubleshooting)
RUN apt-get update && apt-get install -y ffmpeg espeak

WORKDIR /app

COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY backend/ .

EXPOSE 8000

CMD ["python", "run.py"]

Build and run:

docker build -t alphawave-tts .
docker run -p 8000:8000 alphawave-tts

🔒 Security Features

✅ Text length limited to 300000 characters
✅ Input sanitization via Pydantic validators
✅ Invalid voice IDs rejected
✅ Files auto-deleted after 24 hours
✅ No external API calls (fully offline)
✅ No sensitive data stored
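
The first three checks can be approximated in plain Python as shown below. This is a standard-library stand-in for the Pydantic models, not the contents of models.py: the class and function names, the whitespace stripping, and the default rate of 170 are assumptions. The limits are taken from the Configuration and Usage sections.

```python
from dataclasses import dataclass
from typing import Iterable

MAX_TEXT_LENGTH = 300000
MIN_RATE, MAX_RATE = 100, 200


@dataclass(frozen=True)
class GenerateRequest:
    text: str
    voice_id: str
    rate: int = 170


def validate_request(req: GenerateRequest,
                     known_voice_ids: Iterable[str]) -> GenerateRequest:
    """Return a cleaned copy of req, or raise ValueError naming the problem."""
    text = req.text.strip()
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if not MIN_RATE <= req.rate <= MAX_RATE:
        raise ValueError(f"rate must be between {MIN_RATE} and {MAX_RATE}")
    if req.voice_id not in set(known_voice_ids):
        raise ValueError("unknown voice_id")
    return GenerateRequest(text=text, voice_id=req.voice_id, rate=req.rate)
```

In a FastAPI handler, a failed check like these would typically be reported to the client as a 4xx response rather than an unhandled exception.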

📊 Performance

Generation Time: 2-5 seconds average
Concurrent Requests: Accepted without blocking the event loop (generation runs via asyncio.to_thread)
Memory Footprint: Minimal (singleton pattern)
File Cleanup: Automatic background task

📝 Development Mode

For development with auto-reload:

cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

🤝 Contributing

Feel free to submit issues and enhancement requests!

📄 License

MIT License - Free to use in your projects

💡 Tips

Use shorter text for faster generation
Select appropriate voice for your use case
Adjust speed for natural-sounding speech
When debugging, check the application logs in the backend/ directory

Built with ❤️ using FastAPI, pyttsx3, and pydub

For support, check API docs at /docs or application logs.
