Sensei Knows is an Akinator engine that powers our official Anime Akinator. If you've ever wondered how Akinator works or wanted to build your own 20 Questions character guessing game, this repository is for you.
- Why We Built It
- Current Implementation
- The Architecture
- Dataset Mathematical Blueprint
- Usage & Installation
- License
It all started with a specific goal in mind: we wanted to build an Anime Akinator.
At first, we engineered the core guessing logic in C++ because of its performance. However, when the time came to hook this engine up to a live website as a backend service, we ran into many problems. Building a C++ web server turned into an absolute nightmare: we had no experience with it and could not find a framework that provided everything we needed.
We decided to ditch the C++ backend and switch entirely to Golang. Go's built-in concurrency, simple HTTP standard library, and ease of deployment made it the absolute perfect choice for us.
Right now, we have a live instance of this exact engine running for our original project, Sensei Knows. You can play around with it and try to beat the engine here: 👉 anime-akinator.vercel.app
Here's a sneak peek of the frontend we hooked up to our Go engine:
But here is the best part: this engine is completely dataset-agnostic.
You can literally just swap out the dataset to make it guess Hollywood actors, cars, programming languages, or even your own friends.
If you end up using this engine for a different dataset or project, please let us know! (Do share your project link with us :) ) We'd absolutely love to see what kind of crazy things you build with it.
While decision trees and binary search algorithms are foundational to computer science for exact matching, building a robust guessing engine requires dealing with human uncertainty. Standard binary trees do not fail gracefully; a single incorrect answer or a "Don't Know" response can lead to an unrecoverable collapse of the search space.
Drawing inspiration from existing literature on probabilistic classifiers (such as Naive Bayes), our engine is built as a dynamic statistical classification system. This allows the engine to:
- Operate on a continuous probability space (Yes = 1.0, Probably = 0.75, Don't Know = 0.50, Probably Not = 0.25, No = 0.0).
- Handle noise and human error. A contradictory answer does not eliminate a candidate; it merely applies a mathematical penalty to their posterior probability.
- Dynamically compute the optimal sequence of questions based on the active state of the knowledge base, rather than relying on a static, pre-computed graph.
The engine's predictions are powered by several core mathematical concepts:
1. Probabilistic State Representation
Instead of boolean values, the knowledge base stores two parameters for every question associated with a character: a continuous weight
2. Posterior Probability & Beta-Prior Smoothing
When a user provides an answer
We compute a dynamic lower bound (epsilon) based on these Beta parameters. The final match score is bounded by this epsilon, ensuring the engine can always recover from anomalies.
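The idea of a bounded match score can be sketched in Go as follows. This is a minimal illustration, not the engine's actual code: the `Candidate` type, the `1 - |weight - answer|` agreement score, and the `1 / (alpha + beta + 2)` form of epsilon are all assumptions chosen to show how a floor keeps a contradicted candidate alive.

```go
package main

import "fmt"

// Candidate is a hypothetical record for one character and one question:
// a stored weight in [0,1] plus assumed Beta pseudo-counts.
type Candidate struct {
	Name        string
	Weight      float64 // expected answer for the current question
	Alpha, Beta float64 // assumed Beta-prior pseudo-counts
	Posterior   float64
}

// epsilon is an assumed lower bound that shrinks as evidence accumulates.
func epsilon(alpha, beta float64) float64 {
	return 1.0 / (alpha + beta + 2.0)
}

// likelihood scores agreement between the stored weight and the answer,
// never dropping below the epsilon floor.
func likelihood(c Candidate, answer float64) float64 {
	diff := c.Weight - answer
	if diff < 0 {
		diff = -diff
	}
	score := 1 - diff
	if eps := epsilon(c.Alpha, c.Beta); score < eps {
		return eps
	}
	return score
}

// update multiplies each posterior by its likelihood and renormalises.
func update(cands []Candidate, answer float64) {
	z := 0.0
	for i := range cands {
		cands[i].Posterior *= likelihood(cands[i], answer)
		z += cands[i].Posterior
	}
	if z == 0 {
		return // nothing to normalise
	}
	for i := range cands {
		cands[i].Posterior /= z
	}
}

func main() {
	cands := []Candidate{
		{"A", 1.0, 3, 1, 0.5}, // stored as "Yes"
		{"B", 0.0, 1, 3, 0.5}, // stored as "No"
	}
	update(cands, 1.0) // the user answers "Yes" (1.0)
	for _, c := range cands {
		fmt.Printf("%s %.3f\n", c.Name, c.Posterior)
	}
}
```

In this sketch candidate B contradicts the answer completely, yet its posterior only shrinks to the floor instead of dropping to zero, so later answers can still bring it back.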
3. Shannon Entropy & Information Gain
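To determine which question to ask next, the engine does not look at a tree. It computes the current Shannon entropy ($H = -\sum_i p_i \log p_i$) of the posterior distribution over the candidates.
For every eligible question, the engine simulates the expected entropy after each possible answer and favors the question with the largest expected reduction, i.e. the highest information gain.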
4. UCB (Upper Confidence Bound) Exploration
To balance exploiting known good questions and exploring new ones, the engine applies a UCB bonus. Questions that have a low selection count receive a slight mathematical boost, ensuring the engine tests new pathways and doesn't get stuck in local optima.
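Entropy-driven question selection with an exploration bonus can be sketched as below. This is a simplified illustration under stated assumptions: only yes/no outcomes are simulated, stored weights are read as P(yes | candidate), and the bonus uses a smoothed UCB1-style term `c * sqrt(ln(N+1) / (n+1))`. The function names and the constant `c` are hypothetical, not taken from the engine.

```go
package main

import (
	"fmt"
	"math"
)

// entropy returns the Shannon entropy (in bits) of a distribution.
func entropy(p []float64) float64 {
	h := 0.0
	for _, v := range p {
		if v > 0 {
			h -= v * math.Log2(v)
		}
	}
	return h
}

// conditional returns P(answer) and the normalised posterior obtained by
// weighting each candidate's probability by like[i].
func conditional(p, like []float64) (float64, []float64) {
	z := 0.0
	post := make([]float64, len(p))
	for i := range p {
		post[i] = p[i] * like[i]
		z += post[i]
	}
	if z > 0 {
		for i := range post {
			post[i] /= z
		}
	}
	return z, post
}

// infoGain is the expected entropy reduction from asking one question.
func infoGain(p, w []float64) float64 {
	notW := make([]float64, len(w))
	for i, v := range w {
		notW[i] = 1 - v
	}
	pYes, postYes := conditional(p, w)
	pNo, postNo := conditional(p, notW)
	return entropy(p) - (pYes*entropy(postYes) + pNo*entropy(postNo))
}

// ucbBonus grows for questions that have rarely been selected.
func ucbBonus(c float64, total, asked int) float64 {
	return c * math.Sqrt(math.Log(float64(total)+1)/(float64(asked)+1))
}

// pickQuestion returns the index with the best gain-plus-bonus score.
func pickQuestion(p []float64, weights [][]float64, asked []int, c float64) int {
	total := 0
	for _, n := range asked {
		total += n
	}
	best, bestScore := -1, math.Inf(-1)
	for q := range weights {
		score := infoGain(p, weights[q]) + ucbBonus(c, total, asked[q])
		if score > bestScore {
			best, bestScore = q, score
		}
	}
	return best
}

func main() {
	p := []float64{0.25, 0.25, 0.25, 0.25}
	weights := [][]float64{
		{1, 1, 0, 0}, // splits the pool in half
		{1, 0, 0, 0}, // isolates one candidate
		{1, 1, 1, 1}, // tells us nothing
	}
	asked := []int{10, 10, 10} // equal counts, so the bonus is a tie
	fmt.Println(pickQuestion(p, weights, asked, 0.1)) // prints 0
}
```

With equal selection counts the even split wins on information gain alone; raising `c` or lowering a question's count tilts the choice toward less-tested questions.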
Output Format
When the engine has exhausted its question cap or hit its guess threshold, it calculates and outputs the final Top Candidates List along with their exact computed percentage confidence scores directly in the terminal.
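A rough sketch of that final step, assuming the posteriors are available as a name-to-probability map. The `Guess` type, `shouldStop`, `topCandidates`, the placeholder character names, and the printed layout are illustrative assumptions; the engine's real terminal output may look different.

```go
package main

import (
	"fmt"
	"sort"
)

// Guess pairs a candidate with its posterior probability in [0,1].
type Guess struct {
	Name       string
	Confidence float64
}

// shouldStop reports whether the question cap is reached or the leading
// candidate has passed the guess threshold (both values are assumptions).
func shouldStop(asked, maxQuestions int, best, threshold float64) bool {
	return asked >= maxQuestions || best >= threshold
}

// topCandidates returns the k most probable guesses, highest first.
func topCandidates(posterior map[string]float64, k int) []Guess {
	guesses := make([]Guess, 0, len(posterior))
	for name, p := range posterior {
		guesses = append(guesses, Guess{name, p})
	}
	sort.Slice(guesses, func(i, j int) bool {
		if guesses[i].Confidence != guesses[j].Confidence {
			return guesses[i].Confidence > guesses[j].Confidence
		}
		return guesses[i].Name < guesses[j].Name // deterministic tie-break
	})
	if k < len(guesses) {
		guesses = guesses[:k]
	}
	return guesses
}

func main() {
	// Placeholder posteriors for illustration only.
	posterior := map[string]float64{
		"Character A": 0.72,
		"Character B": 0.19,
		"Character C": 0.06,
		"Character D": 0.03,
	}
	top := topCandidates(posterior, 3)
	if shouldStop(12, 20, top[0].Confidence, 0.70) {
		for i, g := range top {
			fmt.Printf("%d. %-12s %5.1f%%\n", i+1, g.Name, g.Confidence*100)
		}
	}
}
```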
A static database quickly becomes outdated. To solve this, the engine employs a continuous reinforcement learning mechanism. Every time a game concludes, whether the engine guessed correctly or was corrected by the user, the system learns from the session.
For the correct character, the engine updates its stored weights (
One of the biggest challenges in crowdsourced learning is malicious data poisoning (trolls). If users intentionally give wrong answers, they could corrupt the dataset. We built two layers of defense to prevent this:
1. Adversarial Flagging (Z-Collapse Detection)
During a session, the engine monitors the sum of probabilities (
2. Staging & Review System
When a user adds a new character or corrects the engine, the change does not go live immediately. Instead, it is pushed to a PendingCorrections queue. A correction must gather a minimum number of community consensus votes (MinVotesToPromote) from independent sessions before it is mathematically averaged and officially promoted into the live dataset.
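The staging flow could look roughly like the sketch below. Only the names `PendingCorrections` and `MinVotesToPromote` come from the text; the value of 3, the per-session vote map, the method names, and the sample question are assumptions made for illustration.

```go
package main

import "fmt"

// MinVotesToPromote is named in the README; the value here is illustrative.
const MinVotesToPromote = 3

type correctionKey struct{ Character, Question string }

// PendingCorrections holds proposed weights per (character, question),
// counting at most one vote per session.
type PendingCorrections struct {
	votes map[correctionKey]map[string]float64 // session ID -> proposed weight
}

func NewPendingCorrections() *PendingCorrections {
	return &PendingCorrections{votes: make(map[correctionKey]map[string]float64)}
}

// Submit records a vote; a repeat vote from the same session overwrites
// the earlier one instead of adding weight.
func (p *PendingCorrections) Submit(session, character, question string, weight float64) {
	k := correctionKey{character, question}
	if p.votes[k] == nil {
		p.votes[k] = make(map[string]float64)
	}
	p.votes[k][session] = weight
}

// Promote returns the averaged weight and true once enough independent
// sessions have voted, and removes the entry from the queue.
func (p *PendingCorrections) Promote(character, question string) (float64, bool) {
	k := correctionKey{character, question}
	v := p.votes[k]
	if len(v) < MinVotesToPromote {
		return 0, false
	}
	sum := 0.0
	for _, w := range v {
		sum += w
	}
	delete(p.votes, k)
	return sum / float64(len(v)), true
}

func main() {
	queue := NewPendingCorrections()
	queue.Submit("s1", "Character A", "Wears a hat?", 1.0)
	queue.Submit("s1", "Character A", "Wears a hat?", 1.0) // same session: still one vote
	queue.Submit("s2", "Character A", "Wears a hat?", 0.75)
	_, ok := queue.Promote("Character A", "Wears a hat?")
	fmt.Println(ok) // false: only two independent sessions so far

	queue.Submit("s3", "Character A", "Wears a hat?", 0.5)
	w, ok := queue.Promote("Character A", "Wears a hat?")
	fmt.Println(w, ok) // 0.75 true
}
```

Keying votes by session ID is one simple way to approximate "independent sessions": a single troll repeating the same correction never counts more than once.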
To make the engine guess accurately in under 20 questions, your custom dataset needs a highly efficient structure. Based on our experience, here is what we followed for our own dataset:
1. The "Golden Ratio" (Questions to Characters)
For optimal convergence, you should aim for a 1:2 or 1:3 question-to-character ratio. For example, if your dataset has 100 characters, aiming for about 45-50 well-framed questions is the sweet spot.
2. Semantic vs. Specific Questions
The secret to a fast guessing engine is maximizing the Expected Information Gain (EIG) early on.
- Broad/Semantic Questions: Roughly 40-50% of your questions should be broad semantic questions designed to split the candidate pool by at least 20%. These are mathematically vital for early-game variance.
- Specific/Niche Questions: Keep highly specific questions to under 10% of your questions. Reserve these exclusively for distinguishing between extremely similar characters deep in the search tree. The remaining questions should be category-based.
- Avoid Too Many Niche Questions: Keeping a niche question for each (or most) of your characters will eventually make it impossible to guess a character within 20 questions.
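The guidelines above can be checked against a dataset with a small audit like the one sketched here. It assumes each question is given as a slice of per-character weights and treats a weight of 0.5 or more as "yes". It also reads "split the pool by at least 20%" as "the smaller side holds at least 20% of the characters". The function names and the thresholds passed in are hypothetical.

```go
package main

import "fmt"

// splitFraction returns the share of characters on the smaller side of a
// yes/no split, treating weights >= 0.5 as "yes".
func splitFraction(weights []float64) float64 {
	if len(weights) == 0 {
		return 0
	}
	yes := 0
	for _, w := range weights {
		if w >= 0.5 {
			yes++
		}
	}
	minority := yes
	if no := len(weights) - yes; no < yes {
		minority = no
	}
	return float64(minority) / float64(len(weights))
}

// Report summarises how a question set compares with the guidelines.
type Report struct {
	CharactersPerQuestion float64
	BroadShare            float64 // questions whose smaller side >= broadMin
	NicheShare            float64 // questions whose smaller side <= nicheMax
}

// audit classifies every question; rows must all have the same length.
func audit(questions [][]float64, broadMin, nicheMax float64) Report {
	if len(questions) == 0 {
		return Report{}
	}
	broad, niche := 0, 0
	for _, w := range questions {
		switch f := splitFraction(w); {
		case f >= broadMin:
			broad++
		case f <= nicheMax:
			niche++
		}
	}
	n := float64(len(questions))
	return Report{
		CharactersPerQuestion: float64(len(questions[0])) / n,
		BroadShare:            float64(broad) / n,
		NicheShare:            float64(niche) / n,
	}
}

func main() {
	// Three questions over ten characters (illustrative weights).
	questions := [][]float64{
		{1, 1, 1, 1, 1, 0, 0, 0, 0, 0}, // even split
		{1, 1, 1, 0, 0, 0, 0, 0, 0, 0}, // 30/70 split
		{1, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // singles out one character
	}
	r := audit(questions, 0.20, 0.10)
	fmt.Printf("chars/question=%.2f broad=%.0f%% niche=%.0f%%\n",
		r.CharactersPerQuestion, r.BroadShare*100, r.NicheShare*100)
}
```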
If you want to run the engine locally in your terminal, the process is incredibly simple. Make sure you have Go (1.21+) installed.
To compile the akinator-go binary, simply run:
    make build
To run a standard game session where the engine tries to guess your character based on your custom knowledge.json, pass the file path as an argument to the binary:
    ./akinator-go path/to/your/knowledge.json
Remember the anti-troll Staging & Review System described above? As people play, their corrections get stored in the queue. If you want to review the pending queue and manually promote corrections to the live database, pass the --review flag:
    ./akinator-go --review path/to/your/knowledge.json
This project is completely open-sourced under the AGPL 3.0 License.
Koro will meet you soon in any of his other forms! Don't forget to try and beat him at: https://anime-akinator.vercel.app/




















