A complete, production-grade offline Text-to-Speech web application built with FastAPI and pyttsx3.
- Ultra Fast Processing - Optimized TTS engine with singleton pattern
- 100% Offline & Secure - No internet required, complete privacy
- Studio Quality Output - 192kbps MP3 audio files via pydub
- Smart Voice Selection - Automatic male/female voice detection
- Speed Control - Adjustable speech rate (100-200 WPM)
- Auto Cleanup - Background task removes files older than 24 hours
- Production Ready - Async operations, logging, error handling
- Non-Blocking - Async-friendly audio generation
- Request Logging - Detailed timing and status tracking
- FastAPI - Modern async web framework
- pyttsx3 - Offline TTS engine (singleton pattern)
- pydub - WAV → MP3 conversion (192kbps)
- Uvicorn - ASGI production server
- Pydantic - Request/response validation
- Pure HTML/CSS/JavaScript
- Modern dark theme UI
- Responsive design
- Fetch API integration
- Python 3.9+ (asyncio.to_thread, used for non-blocking audio generation, was added in Python 3.9)
- FFmpeg (required for pydub MP3 conversion)
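Because pydub hands MP3 encoding to FFmpeg, a missing FFmpeg binary usually surfaces only when the first conversion fails. The sketch below checks for FFmpeg up front and then converts a WAV file at the 192k bitrate mentioned above. The function names `ffmpeg_available`, `mp3_path_for`, and `wav_to_mp3` are illustrative assumptions, not the project's actual code.

```python
import shutil
from pathlib import Path


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def mp3_path_for(wav_path: str) -> Path:
    """Hypothetical helper: same directory and stem, with an .mp3 suffix."""
    return Path(wav_path).with_suffix(".mp3")


def wav_to_mp3(wav_path: str, bitrate: str = "192k") -> Path:
    """Convert a WAV file to MP3 with pydub, which invokes FFmpeg."""
    if not ffmpeg_available():
        raise RuntimeError("FFmpeg not found on PATH; install it before converting")

    # Imported lazily so the FFmpeg check above works without pydub installed.
    from pydub import AudioSegment

    out_path = mp3_path_for(wav_path)
    AudioSegment.from_wav(wav_path).export(str(out_path), format="mp3", bitrate=bitrate)
    return out_path
```

Calling `ffmpeg_available()` once at startup gives a clear error message before any request is served.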
Windows:
- Download from https://ffmpeg.org/download.html
- Extract to C:\ffmpeg
- Add C:\ffmpeg\bin to PATH environment variable
- Restart terminal
Linux:
sudo apt-get update
sudo apt-get install ffmpeg
macOS:
brew install ffmpeg
cd AlphaWave
cd backend
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate
pip install -r requirements.txt
python run.py
Backend will run on: http://localhost:8000
Simply open frontend/index.html in your browser, or serve it:
cd frontend
python -m http.server 3000
Then visit: http://localhost:3000
Once backend is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Root endpoint with API info
Health check endpoint
{
  "status": "healthy",
  "version": "1.0.0"
}
Returns all available system voices with gender detection
Response:
{
  "voices": [
    {
      "id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
      "name": "Microsoft David Desktop",
      "gender": "male"
    },
    {
      "id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
      "name": "Microsoft Zira Desktop",
      "gender": "female"
    }
  ]
}
Generates MP3 audio from text
Request:
{
  "text": "Hello, this is AlphaWave TTS",
  "voice_id": "HKEY_LOCAL_MACHINE\\SOFTWARE\\...",
  "rate": 170
}
Response:
{
  "success": true,
  "audio_url": "/static/audio/abc123-def456.mp3"
}
Edit backend/app/config.py:
MAX_TEXT_LENGTH = 300000          # Maximum characters
AUDIO_BITRATE = "192k"            # MP3 quality
FILE_RETENTION_HOURS = 24         # Auto-delete after
CLEANUP_INTERVAL_SECONDS = 3600   # Cleanup frequency
AlphaWave/
├── backend/
│   ├── app/
│   │   ├── main.py           # FastAPI app with endpoints
│   │   ├── tts_engine.py     # Singleton TTS engine
│   │   ├── models.py         # Pydantic models
│   │   └── config.py         # Configuration
│   ├── static/
│   │   └── audio/            # Generated MP3 files
│   ├── requirements.txt      # Python dependencies
│   └── run.py                # Production runner
├── frontend/
│   ├── index.html            # Main UI
│   ├── style.css             # Dark theme styles
│   └── script.js             # API integration
└── README.md
- Enter text in the textarea (max 300000 characters)
- Select voice from dropdown (grouped by gender)
- Adjust speed using slider (100-200 WPM)
- Click Generate button
- Listen to audio preview
- Download MP3 file
Keyboard Shortcut: Ctrl + Enter to generate
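The voice dropdown in the steps above is grouped by gender. One way such grouping could work is a name-based heuristic, sketched below. The keyword lists and the function names `detect_gender` and `group_voices` are assumptions for illustration. pyttsx3 voice objects also expose a `gender` attribute, but it is often unset, which is why a name-based fallback is shown.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed keyword lists; tune them to the voices installed on the host.
FEMALE_HINTS = ("female", "zira")
MALE_HINTS = ("male", "david")


def detect_gender(voice_name: str) -> str:
    """Guess 'male', 'female', or 'other' from a voice's display name."""
    name = voice_name.lower()
    # Check female hints first: the substring "male" also occurs in "female".
    if any(hint in name for hint in FEMALE_HINTS):
        return "female"
    if any(hint in name for hint in MALE_HINTS):
        return "male"
    return "other"


def group_voices(voices: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group voice dicts (each with a 'name' key) by detected gender."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for voice in voices:
        groups[detect_gender(voice["name"])].append(voice)
    return dict(groups)
```

Voices that match neither list fall into the "other" group, so an unrecognized voice is still selectable.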
- ✅ Singleton Pattern - TTS engine reused across requests
- ✅ Async Operations - Non-blocking audio generation with asyncio.to_thread
- ✅ Background Tasks - Automatic file cleanup every hour
- ✅ Request Logging - Timing and status for every request
- ✅ Error Handling - Graceful failures with proper messages
- ✅ CORS Enabled - Cross-origin requests supported
- ✅ Input Validation - Pydantic models with sanitization
- ✅ WAV → MP3 - High-quality conversion via pydub
- ✅ Character Counter - Real-time text length tracking
- ✅ Voice Grouping - Organized by Male/Female/Other
- ✅ Loading States - Visual feedback during generation
- ✅ Error Display - User-friendly error messages
- ✅ Audio Preview - Built-in player
- ✅ Download - One-click MP3 download
- ✅ Responsive - Works on all screen sizes
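The singleton engine and the asyncio.to_thread generation listed above fit together roughly as in this sketch. It is not the contents of tts_engine.py: `EngineHolder`, `pyttsx3_factory`, and `generate` are hypothetical names, and the engine is created through an injectable factory so the pattern can be exercised without a speech backend. Only the pyttsx3 calls `init`, `setProperty`, `save_to_file`, and `runAndWait` are real library API.

```python
import asyncio
import threading
from typing import Any, Callable, Optional


class EngineHolder:
    """Creates one engine lazily and serializes access to it."""

    _instance: Optional["EngineHolder"] = None
    _instance_lock = threading.Lock()

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._engine = factory()
        self._use_lock = threading.Lock()

    @classmethod
    def get(cls, factory: Callable[[], Any]) -> "EngineHolder":
        # The lock prevents two threads from each creating an engine.
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls(factory)
            return cls._instance

    def synthesize(self, text: str, out_path: str, rate: int = 170) -> str:
        # One synthesis at a time: the shared engine is not assumed thread-safe.
        with self._use_lock:
            self._engine.setProperty("rate", rate)
            self._engine.save_to_file(text, out_path)
            self._engine.runAndWait()
        return out_path


def pyttsx3_factory() -> Any:
    """Default factory; imported lazily so the sketch loads without pyttsx3."""
    import pyttsx3

    return pyttsx3.init()


async def generate(text: str, out_path: str,
                   factory: Callable[[], Any] = pyttsx3_factory) -> str:
    """Run the blocking synthesis in a worker thread (Python 3.9+)."""
    holder = EngineHolder.get(factory)
    return await asyncio.to_thread(holder.synthesize, text, out_path)
```

The event loop stays free to accept other requests while a worker thread blocks in `runAndWait`. The per-engine lock means generation itself is still serialized, a trade-off that comes with sharing one engine.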
- Windows: Check Settings → Time & Language → Speech
- Linux: Install espeak or festival
sudo apt-get install espeak
# Verify installation
ffmpeg -version
# If not found, reinstall and add to PATH
Change port in run.py:
uvicorn.run("app.main:app", port=8001)
- Ensure backend is running on port 8000
- Check API_BASE in frontend/script.js
cd backend
pip install -r requirements.txt
pip install gunicorn
gunicorn app.main:app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
FROM python:3.10-slim
# espeak provides the speech backend pyttsx3 uses on Linux
RUN apt-get update && apt-get install -y ffmpeg espeak libespeak1
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend/ .
EXPOSE 8000
CMD ["python", "run.py"]
Build and run:
docker build -t alphawave-tts .
docker run -p 8000:8000 alphawave-tts
- ✅ Text length limited to 300000 characters
- ✅ Input sanitization via Pydantic validators
- ✅ Invalid voice IDs rejected
- ✅ Files auto-deleted after 24 hours
- ✅ No external API calls (fully offline)
- ✅ No sensitive data stored
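The input checks in this list can be illustrated without any dependencies. The project itself uses Pydantic models, so the dataclass below is only a stand-in that shows which rules are enforced: `TTSRequest` and `validate_request` are hypothetical names, the limits come from the configuration and the 100-200 WPM range stated earlier, and the default rate of 170 is taken from the example request.

```python
from dataclasses import dataclass
from typing import Collection

MAX_TEXT_LENGTH = 300000       # matches MAX_TEXT_LENGTH in config.py
MIN_RATE, MAX_RATE = 100, 200  # WPM range of the speed control


@dataclass(frozen=True)
class TTSRequest:
    text: str
    voice_id: str
    rate: int = 170  # assumed default, taken from the example request


def validate_request(req: TTSRequest, known_voice_ids: Collection[str]) -> TTSRequest:
    """Return a sanitized copy of the request or raise ValueError."""
    text = req.text.strip()
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if not MIN_RATE <= req.rate <= MAX_RATE:
        raise ValueError(f"rate must be between {MIN_RATE} and {MAX_RATE}")
    if req.voice_id not in known_voice_ids:
        raise ValueError("unknown voice_id")
    return TTSRequest(text=text, voice_id=req.voice_id, rate=req.rate)
```

Checking the voice ID against the list reported by the engine rejects arbitrary strings before they reach the TTS driver.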
- Generation Time: 2-5 seconds average
- Concurrent Requests: Supported via async operations
- Memory Footprint: Minimal (singleton pattern)
- File Cleanup: Automatic background task
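The automatic cleanup mentioned above could be structured as a plain function that deletes stale files plus an async loop that runs it periodically. This is a minimal sketch under assumptions: `remove_old_audio` and `cleanup_loop` are hypothetical names, and file age is judged by modification time. The 24-hour retention and the 3600-second interval come from the configuration values shown earlier.

```python
import asyncio
import time
from pathlib import Path
from typing import List, Optional

FILE_RETENTION_HOURS = 24
CLEANUP_INTERVAL_SECONDS = 3600


def remove_old_audio(audio_dir: str,
                     retention_hours: float = FILE_RETENTION_HOURS,
                     now: Optional[float] = None) -> List[str]:
    """Delete .mp3 files older than the retention window; return their names."""
    current = time.time() if now is None else now
    cutoff = current - retention_hours * 3600
    removed = []
    for path in Path(audio_dir).glob("*.mp3"):
        try:
            if path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path.name)
        except FileNotFoundError:
            # The file vanished between listing and deletion; nothing to do.
            continue
    return sorted(removed)


async def cleanup_loop(audio_dir: str,
                       interval_seconds: int = CLEANUP_INTERVAL_SECONDS) -> None:
    """Run the cleanup forever, off the event loop thread (Python 3.9+)."""
    while True:
        await asyncio.to_thread(remove_old_audio, audio_dir)
        await asyncio.sleep(interval_seconds)
```

In a FastAPI app, a loop like this would typically be started with `asyncio.create_task` during startup and cancelled on shutdown.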
For development with auto-reload:
cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Feel free to submit issues and enhancement requests!
MIT License - Free to use in your projects
- Use shorter text for faster generation
- Select appropriate voice for your use case
- Adjust speed for natural-sounding speech
- Check logs for debugging: backend/ directory
Built with ❤️ using FastAPI, pyttsx3, and pydub
For support, check API docs at /docs or application logs.