InsensitiveSet: reimplement without OrderedSet #1369
|
🤔 I'm perhaps going to reconsider the "not wanting to guarantee the un-normalized results of a union" thing because the admin tests won't pass without me ensuring the left-hand-side wins. |
|
Thinking further, it feels like the "right" thing to do is actually to always let the later value "win", because this is how |
|
We could always feed the placeholders into the |
|
I looked into it further and the problem isn’t just with unions:
InsensitiveSet(('NAME', 'name'))
>>> InsensitiveSet(['name'])
I guess |
|
Yep, the workaround I've put in is a double-reverse, which I don't love either, but remember previously we were constructing a new |
0667710 to 31f1e1b
31f1e1b to 7398c09
|
We need to agree on exact desired behaviour. |
|
Chris sez he would ideally like it to work with first and left-most un-normalised values take precedence over later and right-most ones. This makes sense in some ways but it's awkward to make a fast implementation. |
this should allow a significantly faster implementation, using modern python's order-preserving dict, storing normalized items as keys and the original items as their corresponding values.

behaviour is a compromise between what i deem Sensible, yet following the original InsensitiveSet's behaviour closely enough to pass the unmodified test suite. for example, this implements equality comparison against plain Iterables that requires ordering to match, even though i think it's a bit silly.

operations and reverse-operations (other on LHS, other on RHS) should work, even against a plain Iterable.

non-normalised values should always be taken from the LHS in operations where duplicates are included. this is contrary to the behaviour of dict.update(...), which always prefers later values when there are duplicate keys in an item-pair iterable. so instead of using dict.update(...) we have to create our own _add_inner_pairs(...) method which will entirely skip already-present values.
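A rough sketch of the idea described in the commit message, for illustration only. It is not the code from this PR: the class name SketchInsensitiveSet and the normalise() helper are made up here (the real normalisation may well do more than strip and lowercase), and only the _add_inner_pairs name is taken from the message above. It shows a dict keyed by normalised value, with setdefault used so that already-present values are skipped rather than overwritten.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def normalise(value: str) -> str:
    # Hypothetical normaliser (assumption): the real one may do more.
    return value.strip().lower()


class SketchInsensitiveSet:
    """Illustrative only; not the InsensitiveSet from this PR."""

    def __init__(self, iterable: Iterable[str] = ()) -> None:
        # normalised key -> first-seen original value, in insertion order
        self._inner: dict[str, str] = {}
        self._add_inner_pairs((normalise(v), v) for v in iterable)

    def _add_inner_pairs(self, pairs: Iterable[tuple[str, str]]) -> None:
        for key, original in pairs:
            # setdefault leaves an existing entry untouched, so the earliest
            # original wins; dict.update(...) would keep the latest instead.
            self._inner.setdefault(key, original)

    def __contains__(self, value: object) -> bool:
        # A single dict lookup on the normalised key.
        return isinstance(value, str) and normalise(value) in self._inner

    def __iter__(self) -> Iterator[str]:
        return iter(self._inner.values())

    def __len__(self) -> int:
        return len(self._inner)

    def __or__(self, other: Iterable[str]) -> SketchInsensitiveSet:
        # self is on the LHS, so its un-normalised values are added first.
        result = SketchInsensitiveSet(self)
        result._add_inner_pairs((normalise(v), v) for v in other)
        return result

    def __ror__(self, other: Iterable[str]) -> SketchInsensitiveSet:
        # other is on the LHS here, so its values take priority.
        result = SketchInsensitiveSet(other)
        result._add_inner_pairs(self._inner.items())
        return result


print(list(SketchInsensitiveSet(["NAME", "name"])))             # ['NAME']
print(list(SketchInsensitiveSet(["Name"]) | ["name", "Date"]))  # ['Name', 'Date']
print(list(["name", "Date"] | SketchInsensitiveSet(["NAME"])))  # ['name', 'Date']
```

The last line works because a plain list has no `__or__`, so Python falls back to the reflected `__ror__` on the right-hand operand.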
7398c09 to 577d4b0
|
Have pushed a version that is left-prioritising. |

This should allow a significantly faster implementation, using modern Python's order-preserving dict, storing normalized items as keys and the original items as their corresponding values.

Behaviour is a compromise between what I deem Sensible and following the original InsensitiveSet's behaviour closely enough to pass the unmodified test suite. For example, this implements equality comparison against plain Iterables that requires ordering to match, even though I think it's a bit silly.

Operations and reverse-operations should work, even against a plain Iterable.

The existing implementation of InsensitiveSet is a little bit odd/incomplete and has quite an inefficient implementation of __contains__ (which may be the most heavily used of its set operations). A more robust implementation of InsensitiveSet will allow us to confidently make more use of it within Template & RecipientCSV for performance improvements there.

Note the expected results in the expanded tests are flexible in the accepted intersecting un-normalised values for __or__, __ror__ and __ior__ (i.e. do they come from the left side or right side?). This is because I'd really prefer to be free to choose whatever algorithm is fastest for these operations and not have users depend heavily on the exact behaviour.

Instead of this, I'm strictly declaring that un-normalised values will come from the right-hand-side of an operation and the last occurrence in a value stream containing duplicates. This means I had to make an alteration to Field.placeholder to make it prioritise the first occurrence of duplicated placeholders.

A very rudimentary test based on an expanded version of test_markdown_in_templates called in a loop showed a modest ~9% speedup in template processing. Given InsensitiveSet isn't currently very heavily used in template processing, I think this is a decent result.

The new InsensitiveSet is also heavily type-hinted.
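To make the "ordering must match" equality against plain Iterables concrete, here is a small hedged sketch. The normalise() helper and matches_in_order() function are hypothetical names for illustration, and whether the real comparison de-duplicates the other iterable first is not something this sketch claims to know; it simply compares normalised values position by position.

```python
from __future__ import annotations

from collections.abc import Iterable


def normalise(value: str) -> str:
    # Hypothetical normaliser (assumption): the real one may do more.
    return value.strip().lower()


def matches_in_order(inner: dict[str, str], other: Iterable[str]) -> bool:
    """Insensitive comparison against a plain iterable, order included.

    `inner` maps normalised keys to original values, relying on dict
    insertion order to remember the sequence of items.
    """
    return list(inner) == [normalise(item) for item in other]


inner = {"name": "Name", "date": "DATE"}
print(matches_in_order(inner, ["NAME", "date"]))  # True: same order
print(matches_in_order(inner, ["date", "name"]))  # False: order differs
```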