Add canEncode validator for use in callback urls and bearer tokens by whpearson · Pull Request #5920 · alphagov/notifications-admin

whpearson · 2026-05-12T15:02:23Z

To validate that we can encode with latin-1.

This is needed to stop the Unicode errors we are seeing in notifications-api when attempting
to send to these callback URLs.
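
A minimal sketch of the kind of check described above, assuming a WTForms-style validator (a callable taking `form` and `field`). The `ValidationError` class here is a stand-in so the example runs without WTForms, and `unencodable_characters` is a hypothetical helper name; neither is taken from the actual diff.

```python
class ValidationError(ValueError):
    """Stand-in for wtforms.validators.ValidationError so the sketch runs on its own."""


def unencodable_characters(value, encoding="latin-1"):
    """Return the distinct characters in `value` that `encoding` cannot represent."""
    bad = []
    for character in value:
        try:
            character.encode(encoding)
        except UnicodeEncodeError:
            # Keep first-seen order and skip repeats so each character is reported once.
            if character not in bad:
                bad.append(character)
    return bad


class CanEncode:
    """WTForms-style validator (hypothetical shape): fails if the field cannot be encoded."""

    def __init__(self, encoding="latin-1", message=None):
        self.encoding = encoding
        self.message = message

    def __call__(self, form, field):
        bad = unencodable_characters(field.data or "", self.encoding)
        if bad:
            default = "Cannot encode " + " ".join(bad) + " as " + self.encoding
            raise ValidationError(self.message or default)
```

Collecting the offending characters, rather than only catching the first `UnicodeEncodeError`, is what would let an error message name them.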

karlchillmaid · 2026-05-13T09:28:34Z

                r"(?:#[\w\-._~%!$&'()*+,;=:@/?]*)?$",
                message="Must be a valid https URL",
            ),
+            CanEncode(message="Urls have to be capable of being encoded in latin-1"),


Suggested change

-            CanEncode(message="Urls have to be capable of being encoded in latin-1"),
+            CanEncode(message="URL cannot include Unicode characters"),

I don’t know how much we care about technical correctness here, but URLs can contain Unicode characters. Unicode contains basically all characters, including boring ones like a or 1.

When people try to use emoji in text messages we don’t get into the weeds about character sets. Instead we tell them which characters are the problem:

notifications-admin/app/main/validators.py

Lines 166 to 171 in 06fe9c3

raise ValidationError(
    "You cannot use {} in text messages. {} will not display properly on some phones.".format(
        formatted_list(non_sms_characters, conjunction="or", before_each="", after_each=""),
        ("It" if len(non_sms_characters) == 1 else "These characters"),
    )
)

Unicode characters have to be percent encoded, so perhaps that should be the correct error message? Happy to highlight the characters that should be.

Yeah, if the error message was something like

🤪 and ŵ must be percent-encoded in URLs

that would be better
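
For illustration, here is roughly what percent-encoding does to such a character, using `urllib.parse.quote` from the standard library. The URL is a made-up example, and quoting a whole URL this way is only reasonable when it has no query string or fragment, since `?`, `=` and `#` would be escaped too.

```python
from urllib.parse import quote

# Hypothetical callback URL containing a character outside latin-1.
url = "https://example.com/callback/ŵ"

# quote() encodes non-ASCII characters as UTF-8 and escapes each byte as %XX.
# ":" and "/" are kept literal so the scheme and path separators survive.
encoded = quote(url, safe=":/")
print(encoded)  # https://example.com/callback/%C5%B5

# The percent-encoded form is plain ASCII, so encoding it as latin-1 now succeeds.
encoded.encode("latin-1")
```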

"You cannot use the following characters in URIs. These characters, ∆ or 📲, might be mis-encoded."

I kept close to the sms error. What do you think @quis @karlchillmaid ?

How about this?

You cannot use ∆ in a web address. You must use percent encoding if you want to include this character in a URL.

You cannot use ∆ or 📲 in a web address. You must use percent encoding if you want to include these characters in a URL.
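
The singular and plural wording proposed above could be produced along these lines. This is a sketch only: `join_with_or` and `web_address_error` are hypothetical names, and the real code may well reuse the existing `formatted_list` helper instead.

```python
def join_with_or(items):
    """Join items as 'a', 'a or b', or 'a, b or c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " or " + items[-1]


def web_address_error(bad_characters):
    """Build the proposed error message for one or more offending characters."""
    noun = "this character" if len(bad_characters) == 1 else "these characters"
    return (
        f"You cannot use {join_with_or(bad_characters)} in a web address. "
        f"You must use percent encoding if you want to include {noun} in a URL."
    )
```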

I went with calling it a web address in both places (for ease of coding/consistency)

Removed the repetition now. So it is consistent with @karlchillmaid's advice (in Slack)

@karlchillmaid

To validate that we can encode with latin-1. This is needed to stop the Unicode errors we are seeing in notifications-api when attempting to send to these callback URLs. Edited the validator to return a list of characters that cannot be validated. With @karlchillmaid for content work.

quis · 2026-06-05T12:52:13Z

    assert mock_field.error_summary_messages == ["No sequences in %s please"]
+
+
+@pytest.mark.parametrize(


These tests are good, but they only check that the CanEncode validator works as expected. They don’t test that it’s being used on the bearer token field.

Pushed up a version that checks whether CallbackForms have the CanEncode validators attached to the fields. I'd rather not test the functionality twice (to avoid duplication of effort if it changed).

I couldn't find examples of this kind of code (our forms tests are quite small), so pointers on how to do it properly appreciated.
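
One possible shape for such a test is sketched below. WTForms bound fields expose their validators as a `validators` attribute, so the test can check that attribute rather than re-testing the validator's behaviour. Everything here is a stand-in so the example runs on its own: `FakeField`, `FakeCallbackForm`, the `url` and `bearer_token` field names, and `has_validator` are all assumptions, not code from this pull request.

```python
class CanEncode:
    """Stand-in for the validator added in this pull request."""


class FakeField:
    """Stand-in for a bound form field, which carries a list of validators."""

    def __init__(self, validators):
        self.validators = validators


class FakeCallbackForm:
    """Stand-in for a callback form; the field names are assumed."""

    def __init__(self):
        self.url = FakeField([CanEncode()])
        self.bearer_token = FakeField([CanEncode()])


def has_validator(field, validator_class):
    """True if any validator attached to the field is an instance of validator_class."""
    return any(isinstance(v, validator_class) for v in field.validators)


def test_callback_form_fields_use_can_encode():
    form = FakeCallbackForm()
    for field_name in ("url", "bearer_token"):
        assert has_validator(getattr(form, field_name), CanEncode)
```

Checking attachment only keeps the behavioural tests in one place, which matches the aim of not testing the functionality twice.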

To avoid it getting left off.

karlchillmaid reviewed May 13, 2026


whpearson force-pushed the add_error_for_non_latin_url branch from 705b164 to a7a9d8e Compare May 29, 2026 11:34

whpearson changed the title ~~WIP on Error for non-latin in URLs~~ Add canEncode validator for use in callback urls and bearer tokens May 29, 2026

whpearson marked this pull request as ready for review May 29, 2026 11:41

whpearson force-pushed the add_error_for_non_latin_url branch 3 times, most recently from 0609e2a to be3a3b9 Compare June 2, 2026 12:29

whpearson requested review from karlchillmaid and quis June 2, 2026 15:10

whpearson force-pushed the add_error_for_non_latin_url branch from be3a3b9 to 0df34e1 Compare June 4, 2026 11:27

whpearson force-pushed the add_error_for_non_latin_url branch from 0df34e1 to efb4252 Compare June 5, 2026 11:47

quis reviewed Jun 5, 2026


Testing that CanEncode is applied to CallbackForms

e4ae7d9

To avoid it getting left off.

whpearson wants to merge 2 commits into main from add_error_for_non_latin_url

Comments, in order:

whpearson, May 12, 2026 (edited)
karlchillmaid, May 13, 2026
quis, Jun 1, 2026
whpearson, Jun 2, 2026
quis, Jun 2, 2026
whpearson, Jun 2, 2026
karlchillmaid, Jun 3, 2026 (edited)
whpearson, Jun 4, 2026
whpearson, Jun 5, 2026
quis, Jun 5, 2026
whpearson, Jun 5, 2026 (edited)
