fuzzy_nucleo: Add a tail proximity bonus to fix the double hits not sorted higher bonus. #59224
Open
feitreim wants to merge 4 commits into
9d5608a to bf0dd5e
Okay I have realized that this doesn't work for the |
Objective
Fixes #55195
The issue is that if a path contains some part of the query multiple times, like `x/x.py`, it may not outrank results like `y/x.py` (for the query `x`), despite the fact that it should.
Solution
I added a very small bonus that rewards matches close to the tail end of the path. This is easy because nucleo naturally favors the tail, so the tail-most match for a given query atom is the one that is found. The bonus needs to be small, though not too small, so that it does not disrupt any other scoring orders.
The main benefit of this over a different approach, like counting occurrences of an atom, is speed: this approach is constant time with very low overhead, whereas performance would be much more of a concern with other approaches.
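To make the idea concrete, here is a minimal sketch of what a constant-time tail proximity bonus could look like. It is not the code from this PR: the function name `tail_proximity_bonus`, the constant `MAX_TAIL_BONUS`, its value, and the linear falloff are all assumptions chosen for illustration. It only assumes the matcher already reports the index of the last matched character.

```rust
/// Hypothetical cap on the bonus. The real value is not stated in this PR;
/// it only needs to be small relative to the other scoring bonuses.
const MAX_TAIL_BONUS: f64 = 0.5;

/// Illustrative helper (not the PR's implementation).
///
/// Returns a bonus in `[0, MAX_TAIL_BONUS]` that grows linearly as the last
/// matched character sits closer to the end of the path. The work is a few
/// arithmetic operations, so the cost is O(1) regardless of path length.
fn tail_proximity_bonus(path_len: usize, last_match_index: usize) -> f64 {
    // Guard against empty paths and out-of-range indices.
    if path_len == 0 || last_match_index >= path_len {
        return 0.0;
    }
    // Number of characters between the last match and the end of the path.
    let distance_to_tail = path_len - 1 - last_match_index;
    MAX_TAIL_BONUS * (1.0 - distance_to_tail as f64 / path_len as f64)
}
```

Under these assumptions, a path of length 6 whose last match is at index 2 gets a bonus of 0.25, while a match on the final character gets the full 0.5. Counting occurrences of an atom would instead require scanning the whole path for every candidate.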
Testing
All the file finder tests still pass.
Directly showing off the difference on the case from the original issue:

This case shows that it doesn't overpower the other bonuses, like the length bonus.

Self-Review Checklist:
Release Notes: