SadiqCodex/AlphaWave

AlphaWave TTS - Production-Ready Offline Text-to-Speech

A complete, production-grade offline Text-to-Speech web application built with FastAPI and pyttsx3.

🚀 Features

  • ⚡ Ultra Fast Processing - Optimized TTS engine with singleton pattern
  • 🔒 100% Offline & Secure - No internet required, complete privacy
  • 🎙 Studio Quality Output - 192kbps MP3 audio files via pydub
  • 🎯 Smart Voice Selection - Automatic male/female voice detection
  • ⚙️ Speed Control - Adjustable speech rate (100-200 WPM)
  • 🧹 Auto Cleanup - Background task removes files older than 24 hours
  • 📊 Production Ready - Async operations, logging, error handling
  • 🔄 Non-Blocking - Async-friendly audio generation
  • 📝 Request Logging - Detailed timing and status tracking

🛠 Tech Stack

Backend

  • FastAPI - Modern async web framework
  • pyttsx3 - Offline TTS engine (singleton pattern)
  • pydub - WAV → MP3 conversion (192kbps)
  • Uvicorn - ASGI production server
  • Pydantic - Request/response validation

Frontend

  • Pure HTML/CSS/JavaScript
  • Modern dark theme UI
  • Responsive design
  • Fetch API integration

📋 Prerequisites

  • Python 3.9+ (asyncio.to_thread, used for non-blocking generation, requires 3.9 or later)
  • FFmpeg (required for pydub MP3 conversion)

Install FFmpeg

Windows:

  1. Download from https://ffmpeg.org/download.html
  2. Extract to C:\ffmpeg
  3. Add C:\ffmpeg\bin to PATH environment variable
  4. Restart terminal

Linux:

sudo apt-get update
sudo apt-get install ffmpeg

macOS:

brew install ffmpeg

🔧 Installation

Step 1: Navigate to project

cd AlphaWave

Step 2: Set up backend

cd backend
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate

pip install -r requirements.txt

Step 3: Run backend (Production Mode)

python run.py

Backend will run on: http://localhost:8000

Step 4: Open frontend

Simply open frontend/index.html in your browser, or serve it:

cd frontend
python -m http.server 3000

Then visit: http://localhost:3000

📡 API Documentation

Once the backend is running, interactive API documentation is available at http://localhost:8000/docs

API Endpoints

GET /

Root endpoint with API info

GET /health

Health check endpoint

{
  "status": "healthy",
  "version": "1.0.0"
}

GET /voices

Returns all available system voices with gender detection

Response:

{
  "voices": [
    {
      "id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
      "name": "Microsoft David Desktop",
      "gender": "male"
    },
    {
      "id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
      "name": "Microsoft Zira Desktop",
      "gender": "female"
    }
  ]
}
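
As an illustration of how a client might consume this response, the sketch below groups voices by their gender field, similar in spirit to the Male/Female/Other grouping the frontend dropdown uses. The function name and the sample voice IDs are hypothetical, not taken from the project.

```python
from collections import defaultdict


def group_voices_by_gender(payload):
    """Group voice names from a /voices-style payload by gender label."""
    groups = defaultdict(list)
    for voice in payload.get("voices", []):
        gender = str(voice.get("gender", "")).lower()
        # Anything that is not clearly male or female lands in "other".
        key = gender if gender in ("male", "female") else "other"
        groups[key].append(voice["name"])
    return dict(groups)


# Hypothetical sample data shaped like the response above.
sample = {
    "voices": [
        {"id": "voice-1", "name": "Microsoft David Desktop", "gender": "male"},
        {"id": "voice-2", "name": "Microsoft Zira Desktop", "gender": "female"},
        {"id": "voice-3", "name": "Example Voice", "gender": "unknown"},
    ]
}
grouped = group_voices_by_gender(sample)
```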

POST /generate

Generates MP3 audio from text

Request:

{
  "text": "Hello, this is AlphaWave TTS",
  "voice_id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
  "rate": 170
}

Response:

{
  "success": true,
  "audio_url": "/static/audio/abc123-def456.mp3"
}
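
A minimal client sketch for this endpoint, using only the Python standard library. It assumes the backend is reachable at http://localhost:8000 and that the response has the shape shown above. The helper names and the example voice ID are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumed local backend address


def build_generate_request(text, voice_id, rate=170, api_base=API_BASE):
    """Build (but do not send) a POST /generate request."""
    body = json.dumps({"text": text, "voice_id": voice_id, "rate": rate})
    return urllib.request.Request(
        f"{api_base}/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_audio(text, voice_id, rate=170, api_base=API_BASE):
    """Send the request and return the absolute URL of the generated MP3."""
    request = build_generate_request(text, voice_id, rate, api_base)
    with urllib.request.urlopen(request, timeout=60) as response:
        result = json.load(response)
    if not result.get("success"):
        raise RuntimeError("audio generation failed")
    return api_base + result["audio_url"]


# Building the request needs no running server; sending it does.
example_request = build_generate_request(
    "Hello, this is AlphaWave TTS", "example-voice-id"
)
```

With the backend running, calling generate_audio with a voice ID taken from GET /voices should return a URL under /static/audio/.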

⚙️ Configuration

Edit backend/app/config.py:

MAX_TEXT_LENGTH = 300000            # Maximum characters
AUDIO_BITRATE = "192k"              # MP3 quality
FILE_RETENTION_HOURS = 24           # Auto-delete after
CLEANUP_INTERVAL_SECONDS = 3600     # Cleanup frequency
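
To show how the retention and interval settings could work together, here is a rough sketch of a cleanup sweep. It is not the project's actual implementation: the function names, the use of file modification time, and the *.mp3 filter are all assumptions made for illustration.

```python
import asyncio
import os
import tempfile
import time
from pathlib import Path

FILE_RETENTION_HOURS = 24        # mirrors the config value above
CLEANUP_INTERVAL_SECONDS = 3600  # mirrors the config value above


def remove_expired_files(directory, retention_hours=FILE_RETENTION_HOURS):
    """Delete .mp3 files whose modification time is past the retention window."""
    cutoff = time.time() - retention_hours * 3600
    removed = []
    for path in Path(directory).glob("*.mp3"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


async def cleanup_loop(directory):
    """Hypothetical background task: sweep, then sleep until the next pass."""
    while True:
        # Run the blocking filesystem work off the event loop.
        await asyncio.to_thread(remove_expired_files, directory)
        await asyncio.sleep(CLEANUP_INTERVAL_SECONDS)


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp, "old.mp3")
    new_file = Path(tmp, "new.mp3")
    old_file.write_bytes(b"")
    new_file.write_bytes(b"")
    stale = time.time() - 25 * 3600  # pretend the file is 25 hours old
    os.utime(old_file, (stale, stale))
    removed = remove_expired_files(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```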

📁 Project Structure

AlphaWave/
├── backend/
│   ├── app/
│   │   ├── main.py           # FastAPI app with endpoints
│   │   ├── tts_engine.py     # Singleton TTS engine
│   │   ├── models.py         # Pydantic models
│   │   └── config.py         # Configuration
│   ├── static/
│   │   └── audio/            # Generated MP3 files
│   ├── requirements.txt      # Python dependencies
│   └── run.py                # Production runner
├── frontend/
│   ├── index.html            # Main UI
│   ├── style.css             # Dark theme styles
│   └── script.js             # API integration
└── README.md

🎯 Usage

  1. Enter text in the textarea (max 300000 characters)
  2. Select voice from dropdown (grouped by gender)
  3. Adjust speed using slider (100-200 WPM)
  4. Click Generate button
  5. Listen to audio preview
  6. Download MP3 file

Keyboard Shortcut: Ctrl + Enter to generate

🏗 Architecture Highlights

Backend Features

  • ✅ Singleton Pattern - TTS engine reused across requests
  • ✅ Async Operations - Non-blocking audio generation with asyncio.to_thread
  • ✅ Background Tasks - Automatic file cleanup every hour
  • ✅ Request Logging - Timing and status for every request
  • ✅ Error Handling - Graceful failures with proper messages
  • ✅ CORS Enabled - Cross-origin requests supported
  • ✅ Input Validation - Pydantic models with sanitization
  • ✅ WAV → MP3 - High-quality conversion via pydub
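
The singleton and asyncio.to_thread items above can be sketched roughly as follows. This is an illustration, not the contents of tts_engine.py: EngineHolder and FakeEngine are hypothetical names, and FakeEngine stands in for a real engine so the example runs without pyttsx3 installed. It mirrors only the save_to_file and runAndWait calls that pyttsx3 engines provide.

```python
import asyncio
import threading


class EngineHolder:
    """Create one engine lazily and serialize access to it with a lock."""

    def __init__(self, factory):
        self._factory = factory
        self._engine = None
        self._lock = threading.Lock()

    def synthesize(self, text, out_path):
        # Blocking call: only one thread drives the shared engine at a time.
        with self._lock:
            if self._engine is None:
                self._engine = self._factory()
            self._engine.save_to_file(text, out_path)
            self._engine.runAndWait()
        return out_path


class FakeEngine:
    """Stand-in engine that records what it was asked to synthesize."""

    instances = 0

    def __init__(self):
        FakeEngine.instances += 1
        self.queued = []
        self.spoken = []

    def save_to_file(self, text, filename):
        self.queued.append((text, filename))

    def runAndWait(self):
        self.spoken.extend(self.queued)
        self.queued.clear()


# With pyttsx3 installed, the factory could be pyttsx3.init instead.
holder = EngineHolder(FakeEngine)


async def main():
    # to_thread keeps the event loop free while the blocking work runs.
    return await asyncio.gather(
        asyncio.to_thread(holder.synthesize, "first", "a.wav"),
        asyncio.to_thread(holder.synthesize, "second", "b.wav"),
    )


results = asyncio.run(main())
```

Because a single engine is shared, the lock means concurrent requests are queued rather than synthesized in parallel. The event loop itself stays responsive while they wait.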

Frontend Features

  • ✅ Character Counter - Real-time text length tracking
  • ✅ Voice Grouping - Organized by Male/Female/Other
  • ✅ Loading States - Visual feedback during generation
  • ✅ Error Display - User-friendly error messages
  • ✅ Audio Preview - Built-in player
  • ✅ Download - One-click MP3 download
  • ✅ Responsive - Works on all screen sizes

🐛 Troubleshooting

No voices available

  • Windows: Check Settings → Time & Language → Speech
  • Linux: Install espeak or festival
    sudo apt-get install espeak

FFmpeg not found

# Verify installation
ffmpeg -version

# If not found, reinstall and add to PATH

Port already in use

Change port in run.py:

uvicorn.run("app.main:app", port=8001)

CORS errors

  • Ensure backend is running on port 8000
  • Check API_BASE in frontend/script.js

Module not found

cd backend
pip install -r requirements.txt

🚀 Production Deployment

Using Gunicorn (Linux)

pip install gunicorn
gunicorn app.main:app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Using Docker

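FROM python:3.10-slim

# ffmpeg is required by pydub; espeak provides the voices pyttsx3 uses on Linux
RUN apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg espeak \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY backend/ .

EXPOSE 8000

CMD ["python", "run.py"]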

Build and run:

docker build -t alphawave-tts .
docker run -p 8000:8000 alphawave-tts

🔒 Security Features

  • ✅ Text length limited to 300000 characters
  • ✅ Input sanitization via Pydantic validators
  • ✅ Invalid voice IDs rejected
  • ✅ Files auto-deleted after 24 hours
  • ✅ No external API calls (fully offline)
  • ✅ No sensitive data stored
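
The first three checks above can be illustrated with a plain-Python stand-in. The project itself uses Pydantic validators. This sketch avoids that dependency, and the function name, error messages, and server-side enforcement of the 100-200 rate range are assumptions.

```python
MAX_TEXT_LENGTH = 300000       # matches the documented limit
MIN_RATE, MAX_RATE = 100, 200  # matches the documented slider range


def validate_request(text, voice_id, rate, known_voice_ids):
    """Return a cleaned request dict, or raise ValueError describing the problem."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    if len(cleaned) > MAX_TEXT_LENGTH:
        raise ValueError("text exceeds maximum length")
    if voice_id not in known_voice_ids:
        raise ValueError("unknown voice_id")
    if not MIN_RATE <= rate <= MAX_RATE:
        raise ValueError("rate out of range")
    return {"text": cleaned, "voice_id": voice_id, "rate": rate}


def rejection_reason(text, voice_id, rate, known_voice_ids):
    """Helper for the demo: return the error message, or None if valid."""
    try:
        validate_request(text, voice_id, rate, known_voice_ids)
    except ValueError as exc:
        return str(exc)
    return None


known = {"voice-a"}  # hypothetical voice IDs reported by the engine
accepted = validate_request("  Hello  ", "voice-a", 170, known)
```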

📊 Performance

  • Generation Time: 2-5 seconds average
  • Concurrent Requests: Supported via async operations
  • Memory Footprint: Minimal (singleton pattern)
  • File Cleanup: Automatic background task

📝 Development Mode

For development with auto-reload:

cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

🀝 Contributing

Feel free to submit issues and enhancement requests!

📄 License

MIT License - Free to use in your projects

💡 Tips

  • Use shorter text for faster generation
  • Select an appropriate voice for your use case
  • Adjust speed for natural-sounding speech
  • Check the logs in the backend/ directory when debugging

Built with ❤️ using FastAPI, pyttsx3, and pydub

For support, check API docs at /docs or application logs.
