Account for multiple filters matching on the same token #105

Open
stevepolitodesign wants to merge 3 commits into main from sp-51
Conversation

Contributor

@stevepolitodesign stevepolitodesign commented May 14, 2026

Relates to: #51

Prior to this commit, if multiple filters matched the same token, filter
precedence alone decided the label, so every occurrence received the same one:

result = TopSecret::Text.filter("My name is Austin, and I live in Austin TX.")

result.output
# => "My name is [PERSON_1], and I live in [PERSON_1] TX."

This commit ensures each filter is labeled correctly.

result = TopSecret::Text.filter("My name is Austin, and I live in Austin TX.")

result.output
# => "My name is [PERSON_1], and I live in [LOCATION_1] TX."
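
As a rough illustration of the intended behavior, here is a minimal sketch that hands out labels per occurrence. It is not the gem's implementation: `substitute_in_order` and the `labels_by_value` hash are hypothetical names used only for this example, and it assumes labels are assigned left to right in the order given.

```ruby
# Hypothetical sketch, not the TopSecret implementation.
# labels_by_value maps a matched value to the labels it should receive,
# in the order the occurrences appear in the text.
def substitute_in_order(text, labels_by_value)
  # Copy the label arrays so the caller's hash is not mutated.
  queues = labels_by_value.transform_values(&:dup)
  pattern = Regexp.union(labels_by_value.keys)

  text.gsub(pattern) do |match|
    queue = queues.fetch(match)
    # Consume one label per occurrence; reuse the last label once
    # there are more occurrences than labels.
    label = queue.size > 1 ? queue.shift : queue.first
    "[#{label}]"
  end
end

puts substitute_in_order(
  "My name is Austin, and I live in Austin TX.",
  {"Austin" => ["PERSON_1", "LOCATION_1"]}
)
# => "My name is [PERSON_1], and I live in [LOCATION_1] TX."
```

The sketch deliberately ignores how labels are numbered and how filter order is resolved; it only shows one label being consumed per occurrence.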

Copilot AI review requested due to automatic review settings May 14, 2026 13:26
let(:austin_location) { build_entity(text: "Austin", tag: :location) }

before do
stub_ner_entities(austin_person, austin_location)
Contributor Author

I should check to see what happens if we swap the order:

Suggested change
- stub_ner_entities(austin_person, austin_location)
+ stub_ner_entities(austin_location, austin_person)


What does _ner_ mean in the method name?

Contributor Author

Named entity recognition. I learned about it from using MITIE Ruby.


Got it! You've probably explained it before, which makes me think, could this method use the full name instead of the acronym so it's more accessible to the reader?

Contributor Author

In this case I wanted to be consistent with how MITIE Ruby names things, since it has a Mitie::NER class.

Comment on lines +103 to +106
result = TopSecret::Text.filter(
"Primary 192.168.1.1, backup 192.168.1.1.",
custom_filters: [ip_filter, server_filter]
)
Contributor Author

Same here.

Suggested change
- result = TopSecret::Text.filter(
- "Primary 192.168.1.1, backup 192.168.1.1.",
- custom_filters: [ip_filter, server_filter]
- )
+ result = TopSecret::Text.filter(
+ "Primary 192.168.1.1, backup 192.168.1.1.",
+ custom_filters: [server_filter, ip_filter]
+ )


Copilot AI left a comment

Pull request overview

This PR updates TopSecret::Text redaction so that when the same token is matched by multiple filters, each occurrence can be labeled according to the filter that matched it (rather than having the last-applied filter label overwrite all occurrences).

Changes:

  • Added new specs covering “same value matched by multiple filters” for both NER and regex filters.
  • Replaced the previous one-pass substitution logic with a new TopSecret::Text::Substitution helper.
  • Documented the fix in the Unreleased changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| spec/top_secret/text_spec.rb | Adds regression tests for multi-filter / same-token matching cases. |
| lib/top_secret/text/substitution.rb | Introduces new substitution strategy for handling multiple labels per value. |
| lib/top_secret/text.rb | Routes substitution through the new Substitution class. |
| CHANGELOG.md | Notes the behavioral fix under Unreleased. |

Comment thread lib/top_secret/text/substitution.rb
Comment thread lib/top_secret/text/substitution.rb Outdated
Comment on lines +59 to +61
return labels.last(1) if occurrences.zero?

labels.last(occurrences)
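
For readers unfamiliar with `Array#last(n)`, the sketch below shows why a guard for zero occurrences is plausible here: `last(0)` returns an empty array, so without the guard no label would come back at all. The `labels_for` wrapper is a hypothetical name used only to make the excerpt runnable; this thread is marked outdated, so the merged code may differ.

```ruby
# Hypothetical wrapper around the excerpt above; only the two
# Array#last calls come from the diff.
def labels_for(labels, occurrences)
  # Array#last(0) returns [], so fall back to the final label.
  return labels.last(1) if occurrences.zero?

  # Array#last(n) returns at most n trailing elements.
  labels.last(occurrences)
end

labels = ["PERSON_1", "LOCATION_1"]

p labels.last(0)        # => []
p labels_for(labels, 0) # => ["LOCATION_1"]
p labels_for(labels, 1) # => ["LOCATION_1"]
p labels_for(labels, 2) # => ["PERSON_1", "LOCATION_1"]
p labels_for(labels, 5) # => ["PERSON_1", "LOCATION_1"]
```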
Comment thread spec/top_secret/text_spec.rb
This edge case was uncovered when running the `/review` command.
Comment on lines +113 to +120
ip_filter = TopSecret::Filters::Regex.new(
label: "IP_ADDRESS",
regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
)
server_filter = TopSecret::Filters::Regex.new(
label: "SERVER",
regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
)

To make it clearer that these are identical expressions, what do you think of this?

Suggested change
- ip_filter = TopSecret::Filters::Regex.new(
- label: "IP_ADDRESS",
- regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
- )
- server_filter = TopSecret::Filters::Regex.new(
- label: "SERVER",
- regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
- )
+ regex = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
+ ip_filter = TopSecret::Filters::Regex.new(
+ label: "IP_ADDRESS",
+ regex:
+ )
+ server_filter = TopSecret::Filters::Regex.new(
+ label: "SERVER",
+ regex:
+ )

3 participants