videowipe

Video inpainting library powered by STTN.
Remove hardcoded subtitles, watermarks, and text overlays. pip install videowipe and go.

Chinese README: README_CN.md

What it does

videowipe uses a Spatial-Temporal Transformer Network to erase hardcoded subtitles from video. You provide a video and a mask image marking the region to erase, or let the built-in detector generate one. The model fills in the background using temporal information from surrounding frames.
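To make the idea of a mask image concrete, the sketch below writes an 8-bit grayscale PNG with a white band across the bottom of the frame, using only the standard library. The white-means-erase polarity, the 20% band height, and the helper names are assumptions for illustration; this README does not specify the mask convention, so check it before relying on such a file.

```python
import struct
import zlib


def bottom_band_mask(width, height, band_fraction=0.2):
    """Return one bytes row per scanline; the bottom band is 255, the rest 0."""
    band_start = height - int(height * band_fraction)
    return [
        bytes([255 if y >= band_start else 0] * width)
        for y in range(height)
    ]


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray_png(rows, width, height):
    """Encode 8-bit grayscale rows as PNG bytes (filter type 0 per scanline)."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + row for row in rows)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


def write_mask(path, width, height, band_fraction=0.2):
    """Write the bottom-band mask to `path` as a PNG file."""
    rows = bottom_band_mask(width, height, band_fraction)
    with open(path, "wb") as fh:
        fh.write(encode_gray_png(rows, width, height))
```

Calling `write_mask("mask.png", 852, 480)` would produce a mask sized for an 852x480 clip. The mask dimensions are assumed to match the video frame size.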

Install

Requires Python 3.8+ and either ONNX Runtime or PyTorch.

# If you already have PyTorch:
pip install videowipe

# Lightweight ONNX Runtime backend:
pip install videowipe[onnx]

# Or the PyTorch backend:
pip install videowipe[torch]

# Optional: OCR text recognition for better detection accuracy
pip install videowipe[ocr]

Model weights download automatically on first run to ~/.videowipe/weights/. No manual setup needed.

Usage

Python API

from videowipe import remove_text

# Mask is optional — subtitle regions are auto-detected if omitted
remove_text(
    video="input.mp4",
    output="result/",
)

# Or provide your own mask for full control
remove_text(
    video="input.mp4",
    mask="mask.png",
    output="result/",
)

Clean command

Use task="clean" for the full detection pipeline with target selection, intent parsing, and OCR:

from videowipe import WipeEngine

engine = WipeEngine(task="clean", detect_mode="balanced", ocr="auto")
engine.process(
    video="input.mp4",
    targets=["subtitle", "watermark"],
    regions=["bottom"],
    intent="remove Chinese subtitles and logo watermark",
    output="result/",
)
engine.cleanup()

Batch processing

Reuse the engine to avoid reloading the model:

from videowipe import WipeEngine

engine = WipeEngine(task="detext")
engine.process(video="clip1.mp4", output="result/")
engine.process(video="clip2.mp4", mask="mask.png", output="result/")
engine.cleanup()

CLI

# Auto-detect and remove all text overlays (recommended)
videowipe clean input.mp4 -o result/

# Legacy command: auto-detect subtitles only
videowipe detext -v input.mp4 -o result/

# With manual mask
videowipe detext -v input.mp4 -m mask.png -o result/

`clean` command options

# Only remove specific target types
videowipe clean input.mp4 --target subtitle
videowipe clean input.mp4 --target watermark

# Target a specific screen region
videowipe clean input.mp4 --region bottom
videowipe clean input.mp4 --region top-right

# Natural language intent
videowipe clean input.mp4 --intent "remove bottom Chinese subtitles"

# Preview detection results without processing
videowipe clean input.mp4 --preview -o result/

# Interactively confirm detected targets
videowipe clean input.mp4 --confirm
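
When driving the CLI from a script, it can help to assemble the argument list programmatically. The sketch below uses only flags shown in this README (`--target`, `--region`, `--preview`, `-o`); the helper name `build_clean_command` is hypothetical and not part of videowipe.

```python
import shlex


def build_clean_command(video, output=None, targets=(), regions=(), preview=False):
    """Assemble an argv list for `videowipe clean` from documented flags."""
    argv = ["videowipe", "clean", video]
    for target in targets:
        argv += ["--target", target]  # one flag per target type
    for region in regions:
        argv += ["--region", region]  # one flag per screen region
    if preview:
        argv.append("--preview")
    if output is not None:
        argv += ["-o", output]
    return argv


argv = build_clean_command(
    "input.mp4",
    output="result/",
    targets=["subtitle", "watermark"],
    regions=["bottom"],
)
# Prints: videowipe clean input.mp4 --target subtitle --target watermark --region bottom -o result/
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with paths that contain spaces.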

Flag	Description	Default
`--target`	Target type to clean (can repeat): `subtitle`, `timestamp`, `watermark`, `logo`	auto-detect all
`--region`	Screen region (can repeat): `top`, `bottom`, `top-left`, `top-right`, `bottom-left`, `bottom-right`, `center`	all regions
`--intent`	Natural-language cleanup intent	—
`--preview`	Write detection artifacts only (no inpainting)	off
`--confirm`	Show detected targets and confirm before processing	off
`--detect-mode`	Detection preset: `fast` (24 frames), `balanced` (50), `sensitive` (80)	`balanced`
`--ocr`	OCR text recognition: `auto`, `off`, `rapidocr`	`auto`
`--agent`	Local LLM CLI for intent-based selection (e.g., `claude`, `codex`)	—
`--external-command`	External inpainting command (bypasses built-in STTN)	—
`-g, --gap`	Segment length per pass; higher = better quality, slower	`200`
`-d, --dual`	Show original video side-by-side in output	off
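
One way to picture `--gap` is as a chunk size for splitting the video into consecutive segments. The sketch below is only an illustration under that assumption: the function name is hypothetical, and this README does not say whether videowipe overlaps segments or treats boundaries specially.

```python
def split_into_segments(total_frames, gap=200):
    """Yield (start, end) frame ranges, end exclusive, each at most `gap` long."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    for start in range(0, total_frames, gap):
        yield start, min(start + gap, total_frames)


# A 450-frame clip with the default gap is covered in three passes.
print(list(split_into_segments(450)))  # [(0, 200), (200, 400), (400, 450)]
```

Under this reading, a larger gap gives each pass more frames to work with, at the cost of more memory and time per pass.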

`detext` command arguments

Flag	Description	Default
`-v, --video`	Input video path	required
`-m, --mask`	Mask image path (auto-detect if omitted)	auto
`-o, --output`	Output directory	`result/`
`-w, --weight`	Model weight path. PyTorch accepts `.pth`/`.pt`; ONNX expects a prefix path ending in `.onnx` with matching `_encoder`, `_transformer`, and `_decoder` files.	auto
`-g, --gap`	Segment length per pass; higher = better quality, slower	`200`
`-d, --dual`	Show original video side-by-side in output	off
`--external-command`	External inpainting command (bypasses built-in STTN)	—
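
The ONNX naming rule for `--weight` can be checked before a run. The sketch below assumes one plausible reading of that rule, in which a prefix such as `weights/sttn.onnx` maps to `weights/sttn_encoder.onnx`, `weights/sttn_transformer.onnx`, and `weights/sttn_decoder.onnx`. Both that mapping and the file name `sttn.onnx` are assumptions for illustration, not confirmed videowipe behavior.

```python
from pathlib import Path

ONNX_PARTS = ("encoder", "transformer", "decoder")


def expected_onnx_files(prefix):
    """Derive the three per-stage file paths from a prefix ending in .onnx."""
    prefix = Path(prefix)
    if prefix.suffix != ".onnx":
        raise ValueError("expected a prefix path ending in .onnx")
    return [prefix.with_name(f"{prefix.stem}_{part}.onnx") for part in ONNX_PARTS]


def missing_onnx_files(prefix):
    """Return the expected files that do not exist on disk."""
    return [path for path in expected_onnx_files(prefix) if not path.is_file()]
```

An empty list from `missing_onnx_files("weights/sttn.onnx")` would mean all three files are present under the assumed naming scheme.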

External models

Pass --external-command to use any third-party inpainting model instead of the built-in STTN. The command receives <video> <mask> <output_dir> and must produce an output video in the output directory.
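
A minimal skeleton for such a command might look like the sketch below. It parses the three positional arguments and writes a video into the output directory. The copy step is a placeholder for a real model call, and the choice to reuse the input file name for the output is an assumption, since this README does not state which output name videowipe expects.

```python
import argparse
import shutil
from pathlib import Path


def parse_args(argv=None):
    """Parse the <video> <mask> <output_dir> arguments."""
    parser = argparse.ArgumentParser(description="Example external inpainting command")
    parser.add_argument("video", type=Path)
    parser.add_argument("mask", type=Path)
    parser.add_argument("output_dir", type=Path)
    return parser.parse_args(argv)


def run(video, mask, output_dir):
    """Placeholder 'model': copies the input video into the output directory."""
    if not video.is_file() or not mask.is_file():
        raise FileNotFoundError("video and mask must both exist")
    output_dir.mkdir(parents=True, exist_ok=True)
    result = output_dir / video.name  # output file name is an assumption
    shutil.copyfile(video, result)  # replace with a real inpainting call
    return result


def main(argv=None):
    """Entry point: with argv=None, argparse reads the paths from sys.argv."""
    args = parse_args(argv)
    print(run(args.video, args.mask, args.output_dir))
```

Saved as a script with `main()` called at the bottom, a file like this could be passed as `--external-command "python my_wrapper.py"`, where `my_wrapper.py` is a hypothetical name.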

ProPainter has been validated as a higher-quality alternative. A ready-to-use wrapper is included:

# Clone ProPainter outside this repo first
git clone https://github.com/sczhou/ProPainter.git ../models/ProPainter

# Use via the wrapper (requires CUDA PyTorch + fp16)
videowipe clean input.mp4 --external-command "python scripts/propainter_wipe.py"

Note: ProPainter needs a GPU with roughly 16 GB of VRAM for 480p video. It is licensed under the NTU S-Lab License 1.0, which permits non-commercial use only.

Quality comparison: ProPainter vs STTN

Tested on a multilingual music video (Korean + Burmese subtitles, 852x480, 10s clip). Both models used the same mask.

[Comparison images: Original, ProPainter (GPU fp16), STTN (CPU ONNX)]

ProPainter removes all text, including text overlaid on moving objects. STTN misses text on moving objects and leaves visible blur in restored regions. Full evaluation details are in plans/candidate-eval-propainter.md.

Preview

Subtitle removal

[Before and after comparison images]

Auto-detection accuracy

The built-in detector locates text regions across multilingual content without manual masks:

Video	Candidates	Selected	Types
Chinese drama	4	2	top subtitle, bottom subtitle
English clip	2	2	bottom subtitle
Music video (Korean + Burmese)	7	5	top watermark, bottom multilingual subtitles

Tested with --detect-mode balanced (50 sampled frames). Green boxes show selected regions for inpainting.
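
To illustrate what a frame-count preset means in practice, the sketch below picks evenly spaced frame indices for each detection mode. The frame counts come from the `--detect-mode` description in this README; uniform spacing and the function name are assumptions, since the detector's actual sampling strategy is not described here.

```python
DETECT_MODE_FRAMES = {"fast": 24, "balanced": 50, "sensitive": 80}


def sample_frame_indices(total_frames, detect_mode="balanced"):
    """Pick evenly spaced frame indices, at most the preset's frame count."""
    count = min(DETECT_MODE_FRAMES[detect_mode], total_frames)
    if count <= 0:
        return []
    if count == 1:
        return [0]
    # Spread samples from the first frame to the last frame inclusive.
    step = (total_frames - 1) / (count - 1)
    return [round(i * step) for i in range(count)]


indices = sample_frame_indices(600, "fast")
print(len(indices), indices[0], indices[-1])  # 24 0 599
```

Clips shorter than the preset simply use every frame, so a 10-frame clip yields indices 0 through 9 in any mode.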

How it works

The model is an STTN (Spatial-Temporal Transformer Network) with 8 stacked transformer blocks operating on multi-scale patches. It encodes video frames with a CNN backbone, runs temporal attention across neighboring and reference frames, then decodes the inpainted result.
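
The distinction between neighboring and reference frames can be sketched in a few lines: neighbors are a dense window around the frame being restored, and references are sparse samples from the rest of the clip. The stride and spacing values below are illustrative assumptions and are not taken from videowipe.

```python
def neighbor_and_reference_ids(center, total_frames, neighbor_stride=5, ref_spacing=10):
    """Return (neighbor_ids, reference_ids) for the frame at index `center`."""
    lo = max(0, center - neighbor_stride)
    hi = min(total_frames, center + neighbor_stride + 1)
    neighbor_ids = list(range(lo, hi))
    # References: every `ref_spacing`-th frame that is not already a neighbor.
    reference_ids = [
        i for i in range(0, total_frames, ref_spacing) if i not in neighbor_ids
    ]
    return neighbor_ids, reference_ids


neighbors, references = neighbor_and_reference_ids(12, 40)
print(neighbors)   # [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
print(references)  # [0, 20, 30]
```

Attending over both sets lets a model borrow detail from adjacent frames while still seeing background that is only uncovered elsewhere in the clip.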

This fork adds two key optimizations: AMP (automatic mixed precision) inference and the channels_last memory layout. With them, a 23-second test clip processes in 125 s, down from 200 s in the original implementation, a speedup of about 1.6x.

Docker

No Python? No problem. Run videowipe directly with Docker.

CPU:

docker pull ghcr.io/kkenny0/videowipe:latest
docker run --rm -v "$(pwd)":/data ghcr.io/kkenny0/videowipe clean /data/input.mp4 -o /data/result/

# Legacy detext command
docker run --rm -v "$(pwd)":/data ghcr.io/kkenny0/videowipe detext -v /data/input.mp4 -o /data/result/

GPU (requires NVIDIA Container Toolkit):

docker pull ghcr.io/kkenny0/videowipe:gpu
docker run --rm --gpus all -v "$(pwd)":/data ghcr.io/kkenny0/videowipe:gpu clean /data/input.mp4 -o /data/result/

Or use the included wrapper script (auto-detects GPU):

./scripts/docker-videowipe.sh detext -v input.mp4 -o result/

Image	Size	GPU	Notes
`videowipe:latest`	~480 MB	No	CPU only, smallest image
`videowipe:gpu`	~1.4 GB	Yes	ONNX Runtime with CUDA

Build from source

Use --target to select the image variant:

# CPU
docker build --target runtime-cpu -t videowipe:latest .

# GPU (requires NVIDIA Container Toolkit at build time for base image)
docker build --target runtime-gpu --build-arg VARIANT=gpu -t videowipe:gpu .

Note: CUDA execution in the GPU image can only be verified on a machine with the NVIDIA runtime. Without it, ONNX Runtime silently falls back to CPU.
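
Because the fallback is silent, a quick check of the execution providers can confirm whether CUDA is usable. The sketch below relies on `onnxruntime.get_available_providers()`, which is part of the ONNX Runtime Python API; the helper names are hypothetical, and how videowipe itself selects providers is not covered here.

```python
def cuda_available(providers):
    """True if the CUDA execution provider appears in the provider list."""
    return "CUDAExecutionProvider" in providers


def describe_runtime():
    """Report whether ONNX Runtime in this environment can use CUDA."""
    try:
        import onnxruntime as ort  # only needed for the live check
    except ImportError:
        return "onnxruntime is not installed"
    available = ort.get_available_providers()
    if cuda_available(available):
        return "CUDA provider available"
    return f"CPU fallback likely; available providers: {available}"


print(describe_runtime())
```

Availability only shows that the build supports CUDA. For an existing `InferenceSession`, `session.get_providers()` reports the providers that session is actually using.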

Run after building:

# CPU
docker run --rm -v "$(pwd)":/data videowipe:latest detext -v /data/input.mp4 -o /data/result/

# GPU
docker run --rm --gpus all -v "$(pwd)":/data videowipe:gpu detext -v /data/input.mp4 -o /data/result/

Credits

This project builds on STTN and the original Video-Auto-Wipe implementation. The built-in text detection model is from OnnxOCR.

License

MIT
