Skip to content

InsensitiveSet: reimplement without OrderedSet#1369

Merged
risicle merged 4 commits into
mainfrom
ris-new-insensitive-set
Jun 16, 2026
Merged

InsensitiveSet: reimplement without OrderedSet#1369
risicle merged 4 commits into
mainfrom
ris-new-insensitive-set

Conversation

@risicle

@risicle risicle commented Jun 10, 2026

Copy link
Copy Markdown
Member

This should allow a significantly faster implementation, using modern python's order-preserving dict, storing normalized items as keys and the original items as their corresponding values.

Behaviour is a compromise between what I deem Sensible, yet following the original InsensitiveSet's behaviour closely enough to pass the unmodified test suite. For example, this implements equality comparison against plain Iterables that requires ordering to match, even though I think it's a bit silly.

Operations and reverse-operations should work, even against a plain Iterable.

The existing implementation of InsensitiveSet is a little bit odd/incomplete and has quite an inefficient implementation of __contains__ (which may be the most heavily used of its set operations). A more robust implementation of InsensitiveSet will allow us to confidently make more use of it within Template & RecipientCSV for performance improvements there.

Note the expected results in the expanded tests are flexible in the accepted intersecting un-normalised values for __or__, __ror__ and __ior__ (i.e. do they come from the left side or right side?). This is because I'd really prefer to be free to choose whatever algorithm is fastest for these operations and not have users depend heavily on the exact behaviour. Instead of this, I'm strictly declaring that un-normalised values will come from the right-hand-side of an operation and the last occurrence in a value stream containing duplicates. This means I had to make an alteration to Field.placeholder to make it prioritise the first occurrence of duplicated placeholders.

A very rudimentary test based on an expanded version of test_markdown_in_templates called in a loop showed a modest ~9% speedup in template processing. Given InsensitiveSet isn't currently very heavily used in template processing, I think this is a decent result.

The new InsensitiveSet is also heavily type-hinted.

@risicle risicle marked this pull request as ready for review June 10, 2026 14:53
@risicle risicle requested a review from quis June 10, 2026 14:53
Comment thread tests/test_insensitive_dict.py
Comment thread notifications_utils/insensitive_dict.py
Comment thread notifications_utils/insensitive_dict.py
Comment thread notifications_utils/insensitive_dict.py
@risicle

risicle commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

🤔 I'm perhaps going to reconsider the "not wanting to guarantee the un-normalized results of a union" thing because the admin tests won't pass without me ensuring the left-hand-side wins.

@risicle

risicle commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Thinking further, it feels like the "right" thing to do is actually to always let the later value "win", because this is how .update() always works for duplicates within an .update() item-pair stream. The only way to make it consistently let the first value win would be to stick a generator into each .update() item-pair stream to do a weird reach-around check..

@quis

quis commented Jun 11, 2026

Copy link
Copy Markdown
Member

The reason we care (and the reason why that admin test fails) is because we want the form label to match the first instance of the placeholder in the template exactly:

image

@risicle

risicle commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

We could always feed the placeholders into the InsensitiveSet in reverse.

@quis

quis commented Jun 11, 2026

Copy link
Copy Markdown
Member

I looked into it further and the problem isn’t just with unions:

InsensitiveSet(('NAME', 'name'))
>>> InsensitiveSet(['name'])

I guess Field.placeholders could do a double-reverse but I don’t love it…

Comment thread notifications_utils/insensitive_dict.py Outdated
@risicle

risicle commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Yep, the workaround I've put in is a double-reverse, which I don't love either, but remember previously we were constructing a new InsensitiveDict for each __contains__ call. This is only one additional dict construction.

@risicle risicle force-pushed the ris-new-insensitive-set branch 3 times, most recently from 0667710 to 31f1e1b Compare June 11, 2026 13:56
@risicle risicle requested a review from quis June 11, 2026 14:00
@risicle risicle force-pushed the ris-new-insensitive-set branch from 31f1e1b to 7398c09 Compare June 11, 2026 14:13
@risicle risicle marked this pull request as draft June 12, 2026 10:33
@risicle

risicle commented Jun 12, 2026

Copy link
Copy Markdown
Member Author

We need to agree on exact desired behaviour.

@risicle

risicle commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

Chris sez he would ideally like it to work with first and left-most un-normalised values take precedence over later and right-most ones. This makes sense in some ways but it's awkward to make a fast implementation.

risicle added 4 commits June 15, 2026 16:22
this should allow a significantly faster implementation,
using modern python's order-preserving dict, storing
normalized items as keys and the original items as their
corresponding values.

behaviour is a compromise between what i deem Sensible, yet
following the original InsensitiveSet's behaviour closely
enough to pass the unmodified test suite. for example, this
implements equality comparison against plain Iterables that
requires ordering to match, even though i think it's a bit
silly.

operations and reverse-operations (other on LHS, other on
RHS) should work, even againsta plain Iterable.

non-normalised values should always be taken from the LHS
in operations where duplicates are included. this is
contrary to the behaviour of dict.update(...), which always
prefers later values when there are duplicate keys in an
item-pair iterable. so instead of using dict.update(...)
we have to create our own _add_inner_pairs(...) method
which will entirely skip already-present values.
@risicle risicle force-pushed the ris-new-insensitive-set branch from 7398c09 to 577d4b0 Compare June 15, 2026 15:28
@risicle

risicle commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

Have pushed a version that is left-prioritising.

@risicle risicle marked this pull request as ready for review June 15, 2026 15:30
@risicle risicle merged commit 9a6342b into main Jun 16, 2026
6 checks passed
@risicle risicle deleted the ris-new-insensitive-set branch June 16, 2026 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants