Zir (a diacritic cleaner)

Zir is a lightweight command-line utility for identifying and cleaning up unwanted Persian diacritics (such as Kasra, U+0650, and the Kasra form of Tanvin, U+064D) and other junk characters in your text files. It is particularly useful for cleaning up text generated by Large Language Models (LLMs).

🚀 Why Zir?

AI models often introduce hidden characters or incorrect diacritics into Persian text. Zir helps you keep documentation and source code files consistent by locating these characters quickly.

🛠 Usage

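Run the following commands from the root of your project. They rely on Bash 4.2 or later (for brace expansion and the \u escapes in printf), a grep that supports --exclude-dir (such as GNU grep), and a UTF-8 locale.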

1. Identify files with diacritics and count occurrences

This command lists all affected files and shows the number of matching lines in each. File names are passed NUL-separated, so paths containing spaces are handled correctly:

grep -rIE "$(printf '[\u0650\u064D]')" . --exclude-dir={_build,build,logs,venv,.venv,.git} --files-with-matches --null | xargs -0 -r grep -cHE "$(printf '[\u0650\u064D]')"

2. Detailed scan with line numbers

This command prints every matching line with its file name and line number, followed by the total number of matching lines:

grep --color=always -rnIE "$(printf '[\u0650\u064D]')" . --exclude-dir={_build,build,logs,venv,.venv,.git} | tee /dev/tty | wc -l
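
The two scans above only report where the diacritics occur. For the removal step, one possible approach is a small sed filter. The sketch below is an assumption about how cleaning could be done, not a command that Zir ships: the function name strip_diacritics is hypothetical, and the characters are written as octal escapes of their UTF-8 bytes (U+0650 is \331\220, U+064D is \331\215) so that it does not depend on printf supporting \u.

```shell
# Hypothetical helper (not part of Zir): delete Kasra (U+0650) and
# Kasratan (U+064D) from standard input and write the result to stdout.
# The body runs in a subshell, so its variables do not leak to the caller.
strip_diacritics() (
    # Octal escapes spell out the UTF-8 encoding of each character.
    kasra=$(printf '\331\220')      # U+0650
    kasratan=$(printf '\331\215')   # U+064D
    sed -e "s/${kasra}//g" -e "s/${kasratan}//g"
)

# Usage (file names are placeholders): write to a new file and review it
# before replacing the original.
# strip_diacritics < notes.md > notes.cleaned.md
```

Writing to a separate file instead of editing in place keeps the original available for a diff, which matters because some Kasra marks in Persian text are intentional.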

⚙️ Configuration

The commands skip the directories listed in --exclude-dir (_build, build, logs, venv, .venv, and .git). To skip more directories, add them to that list. To search for other characters, add their code points to the bracket expression [\u0650\u064D] in the commands.
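
One way to make that customization less error-prone is to keep the directory list and the pattern in variables. The sketch below is illustrative only: zir_scan, ZIR_EXCLUDES, and ZIR_PATTERN are hypothetical names that Zir does not define, and it assumes a grep that supports --exclude-dir. The pattern is written as an alternation of UTF-8 byte sequences rather than a bracket expression, so the match should not depend on grep running in a UTF-8 locale.

```shell
# Hypothetical settings (these names are assumptions, not defined by Zir).
# Directories to skip, separated by spaces.
ZIR_EXCLUDES="_build build logs venv .venv .git"
# ERE alternation of the characters to find, as UTF-8 octal escapes:
# U+0650 (Kasra) | U+064D (Kasratan). Append "|<bytes>" to add more.
ZIR_PATTERN=$(printf '\331\220|\331\215')

# Print "path:count" for every file under the given directory (default: .)
# that has at least one matching line.
zir_scan() (
    root=${1:-.}
    # Turn the space-separated list into one --exclude-dir option per entry.
    set --
    for dir in $ZIR_EXCLUDES; do
        set -- "$@" "--exclude-dir=$dir"
    done
    # -r recurse, -I skip binary files, -c count matching lines per file;
    # the second grep drops files whose count is 0.
    grep -rIcE "$@" -e "$ZIR_PATTERN" "$root" | grep -v ':0$'
)

# Usage: zir_scan            # scan the current directory
#        zir_scan docs       # scan a subdirectory
```

Like grep -c in the first command, the count is the number of matching lines in each file, not the number of individual characters.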

🤝 Contributing

Feel free to open an issue or submit a pull request if you want to add support for more character types or improve the scanning logic.
