Add canEncode validator for use in callback urls and bearer tokens #5920

Open
whpearson wants to merge 2 commits into main from add_error_for_non_latin_url

Contributor

@whpearson whpearson commented May 12, 2026

Adds a CanEncode validator that checks a value can be encoded as latin-1.

This is needed to stop the Unicode errors we are seeing in notifications-api when it tries to send to these callback URLs.

Comment thread app/main/forms.py Outdated
r"(?:#[\w\-._~%!$&'()*+,;=:@/?]*)?$",
message="Must be a valid https URL",
),
CanEncode(message="Urls have to be capable of being encoded in latin-1"),
Contributor

Suggested change
- CanEncode(message="Urls have to be capable of being encoded in latin-1"),
+ CanEncode(message="URL cannot include Unicode characters"),

Member

I don’t know how much we care about technical correctness here, but URLs can contain Unicode characters. Unicode contains basically all characters, including boring ones like a or 1.

When people try to use emoji in text messages we don’t get into the weeds about character sets. Instead we tell them which characters are the problem:

raise ValidationError(
    "You cannot use {} in text messages. {} will not display properly on some phones.".format(
        formatted_list(non_sms_characters, conjunction="or", before_each="", after_each=""),
        ("It" if len(non_sms_characters) == 1 else "These characters"),
    )
)

Contributor Author

Unicode characters have to be percent-encoded, so perhaps the error message should say that? Happy to highlight the characters that need encoding.
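For reference, a minimal sketch of what percent-encoding does to the characters under discussion. It assumes the standard RFC 3986 approach of encoding each non-ASCII character as its UTF-8 bytes; `percent_encode_non_ascii` and the example URL are hypothetical and not part of this PR.

```python
from urllib.parse import quote


def percent_encode_non_ascii(url: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 bytes.

    ASCII characters, including any existing %XX escapes, are left untouched.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch) for ch in url)


raw_url = "https://example.com/callback/ŵ"  # hypothetical callback URL
encoded_url = percent_encode_non_ascii(raw_url)

print(encoded_url)  # https://example.com/callback/%C5%B5

# The encoded form is plain ASCII, so encoding it as latin-1 no longer fails.
encoded_url.encode("latin-1")
```

A URL written this way would pass a latin-1 check, which is presumably why pointing users at percent encoding is a workable error message.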

Member

Yeah, if the error message was something like

🤪 and ŵ must be percent-encoded in URLs

that would be better

Contributor Author

"You cannot use the following characters in URIs. These characters, ∆ or 📲, might be mis-encoded."

I kept close to the SMS error. What do you think, @quis @karlchillmaid?

Contributor

@karlchillmaid karlchillmaid Jun 3, 2026

How about this?

You cannot use ∆ in a web address. You must use percent encoding if you want to include this character in a URL.

You cannot use ∆ or 📲 in a web address. You must use percent encoding if you want to include these characters in a URL.
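A minimal sketch of how the singular and plural wording above could be built from a list of offending characters. The function name `percent_encoding_message` is hypothetical, and the joining logic is a plain-Python stand-in rather than the `formatted_list` helper used in the SMS message; the wording is the one proposed in this comment, not necessarily what was merged.

```python
def percent_encoding_message(characters: list[str]) -> str:
    """Build the error message for one or more characters that cannot be encoded."""
    if not characters:
        raise ValueError("at least one character is required")
    if len(characters) == 1:
        listed, noun = characters[0], "this character"
    else:
        # Put "or" before the last item, as in the two-character example.
        listed = ", ".join(characters[:-1]) + " or " + characters[-1]
        noun = "these characters"
    return (
        f"You cannot use {listed} in a web address. "
        f"You must use percent encoding if you want to include {noun} in a URL."
    )


print(percent_encoding_message(["∆"]))
print(percent_encoding_message(["∆", "📲"]))
```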

Contributor Author

I went with calling it a web address in both places (for ease of coding/consistency).

Contributor Author

Removed the repetition now, so it is consistent with @karlchillmaid's advice (in Slack).

@whpearson whpearson force-pushed the add_error_for_non_latin_url branch from 705b164 to a7a9d8e Compare May 29, 2026 11:34
@whpearson whpearson changed the title WIP on Error for non-latin in URLs Add canEncode validator for use in callback urls and bearer tokens May 29, 2026
@whpearson whpearson marked this pull request as ready for review May 29, 2026 11:41
@whpearson whpearson force-pushed the add_error_for_non_latin_url branch 3 times, most recently from 0609e2a to be3a3b9 Compare June 2, 2026 12:29
@whpearson whpearson requested review from karlchillmaid and quis June 2, 2026 15:10
@whpearson whpearson force-pushed the add_error_for_non_latin_url branch from be3a3b9 to 0df34e1 Compare June 4, 2026 11:27
Adds a CanEncode validator that checks a value can be encoded as latin-1.

This is needed to stop the Unicode errors we are seeing in notifications-api when it tries to send to these callback URLs.

Edited the validator to return a list of the characters that cannot be encoded.

With @karlchillmaid for content work
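A rough sketch of a validator along the lines this commit message describes, assuming the usual WTForms convention that a validator is a callable taking `(form, field)`. `CanEncodeSketch`, `unencodable_characters`, the local `ValidationError` and the placeholder message are all hypothetical stand-ins so the example runs without WTForms; the actual `CanEncode` in the PR may differ.

```python
from types import SimpleNamespace


class ValidationError(ValueError):
    """Stand-in for wtforms.validators.ValidationError, so the sketch has no dependencies."""


def unencodable_characters(value: str, encoding: str = "latin-1") -> list[str]:
    """Return each distinct character of `value` that `encoding` cannot represent, in order."""
    found = []
    for character in value:
        try:
            character.encode(encoding)
        except UnicodeEncodeError:
            if character not in found:
                found.append(character)
    return found


class CanEncodeSketch:
    """Hypothetical validator: rejects field data that cannot be encoded."""

    def __init__(self, encoding: str = "latin-1"):
        self.encoding = encoding

    def __call__(self, form, field):
        characters = unencodable_characters(field.data or "", self.encoding)
        if characters:
            # Placeholder message; the real wording was agreed in the review thread.
            raise ValidationError(f"Cannot encode: {' '.join(characters)}")


field = SimpleNamespace(data="https://example.com/∆/📲/∆")  # stand-in for a form field
try:
    CanEncodeSketch()(None, field)
except ValidationError as error:
    print(error)  # Cannot encode: ∆ 📲
```

Note that latin-1 covers U+0000 to U+00FF, so a character such as é passes this check while ŵ, ∆ and 📲 do not.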
@whpearson whpearson force-pushed the add_error_for_non_latin_url branch from 0df34e1 to efb4252 Compare June 5, 2026 11:47
assert mock_field.error_summary_messages == ["No sequences in %s please"]


@pytest.mark.parametrize(
Member

These tests are good, but they only check that the CanEncode validator works as expected. They don’t test that it’s being used on the bearer token field.

Contributor Author

@whpearson whpearson Jun 5, 2026

Pushed up a version that checks whether CallbackForms have the CanEncode validators attached to the fields. I'd rather not test the functionality twice (to avoid duplication of effort if it changed).

I couldn't find examples of this kind of code (our forms tests are quite small), so pointers on how to do it properly would be appreciated.
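One possible shape for such a check, sketched with stand-in classes so it runs on its own. It assumes, as with bound WTForms fields, that each field exposes its validators as a `validators` list; `CallbackFormSketch`, the field names `url` and `bearer_token`, and `has_validator` are hypothetical and not taken from the PR.

```python
class CanEncode:
    """Stand-in for the validator added in this PR."""


class Field:
    """Stand-in for a bound form field that exposes its validators as a list."""

    def __init__(self, validators):
        self.validators = validators


class CallbackFormSketch:
    """Hypothetical form with the two fields discussed in this thread."""

    def __init__(self):
        self.url = Field([CanEncode()])
        self.bearer_token = Field([CanEncode()])


def has_validator(field, validator_type) -> bool:
    """True if any validator attached to `field` is an instance of `validator_type`."""
    return any(isinstance(validator, validator_type) for validator in field.validators)


def test_callback_form_fields_use_can_encode():
    form = CallbackFormSketch()
    for field_name in ("url", "bearer_token"):
        assert has_validator(getattr(form, field_name), CanEncode), field_name


test_callback_form_fields_use_can_encode()
```

This checks only that the validator is attached, leaving its behaviour to the validator's own tests, which matches the intent of not testing the functionality twice.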


3 participants