fix: never silently emit or overwrite a file with truncated XML #230
gronke wants to merge 1 commit into
3b059e1 to 3b9af28
ale-rt left a comment
Thanks a lot again for your contributions!
Please have a look at my comment and do not forget to add a line in the history file.
def test_malformed_xml_raises_instead_of_truncating(self):
    """Malformed XML must raise rather than be silently truncated."""
    text = (
Currently zpretty would fix this XML:
$ cat tmp.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
<a>text</b>
</root>
$ .venv/bin/zpretty tmp.xml
<?xml version="1.0" encoding="utf-8"?>
<root>
<a>text</a>
</root>
I do not think it is a good idea to lose this feature.
Can you come up with a better test?
Even better if you can provide a file that would cause such an error when running zpretty path/to/file.xml.
Thanks, fixed. The guard now refuses only genuine truncation (content after the root element, which recover mode silently drops), so `<a>text</b>` is still repaired to `<a>text</a>` as before. Added a fixture that trips it via `zpretty path/to/file.xml`, plus the HISTORY line ^
zpretty parses XML in lxml's recover mode, which drops nodes from malformed input and emits the truncated result with exit code 0. Guard the silent-truncation class of bug:
- Refuse on content loss: re-check the input and raise `ContentLossError` when lxml reports `ERR_DOCUMENT_END`, i.e. content after the root element that recover mode would drop. Recoverable input is still repaired as before, so `<a>x</b>` keeps becoming `<a>x</a>`. Scoped to standalone XML documents; fragments and non-XML are untouched.
- Atomic `--inplace`: replace `open(path, "w")`, which truncates before writing, with a temp file plus `os.replace()`, so a failing or interrupted write cannot leave a half-written file.
The CLI then exits non-zero and writes nothing when the guard fires.
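A minimal sketch of the "refuse only on content after the root element" idea, for illustration. It is not the code in this PR: the commit message describes checking for lxml's `ERR_DOCUMENT_END`, while this sketch swaps in the standard library's expat parser and its "junk after document element" error so that it runs without lxml. The function name `refuse_on_content_loss` and the `ContentLossError` class defined here are hypothetical stand-ins.

```python
from xml.parsers import expat
from xml.parsers.expat import errors


class ContentLossError(ValueError):
    """Hypothetical stand-in for the exception named in the commit message."""


# Numeric expat code for the "junk after document element" error.
_JUNK_AFTER_ROOT = errors.codes[errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT]


def refuse_on_content_loss(text):
    """Raise ContentLossError only when content follows the root element."""
    parser = expat.ParserCreate()
    try:
        # expat is a strict (non-recovering) parser, so any problem raises.
        parser.Parse(text, True)
    except expat.ExpatError as exc:
        if exc.code == _JUNK_AFTER_ROOT:
            raise ContentLossError(
                f"content after the root element: {exc}"
            ) from exc
        # Any other error (for example a mismatched closing tag) is
        # deliberately ignored here and left to the recovering parser.
```

Under these assumptions, `<root><a>text</b></root>` passes the check and can still be repaired, while `<root><a>text</a></root><extra/>` is refused.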
3b9af28 to 2a9ba3a
ale-rt left a comment
As mentioned this PR does 2 things:
- validates the file before running zpretty
- performs an atomic operation when replacing the file contents
I think we should have two issues to discuss if we want them, as both these features have implications and there is no urgency to have these fixed.
If we want to have these features, they should be implemented in the right way.
To name a few of the implications:
- This PR will not display the formatted file even when no write operation is involved.
- What happens when the path is a symlink?
- What happens to permissions and file attributes?
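To make the symlink and permission questions concrete, here is a hedged sketch of one way a temp-file write could handle them on a POSIX system. It is not the `_atomic_write` from this PR; the function name `atomic_write` and the `.zpretty-` temp prefix are hypothetical. The sketch relies on two standard-library behaviours: `os.replace()` on a symlink path replaces the link itself unless the path is resolved first, and `tempfile.mkstemp()` creates the file with mode 0600.

```python
import os
import shutil
import tempfile


def atomic_write(path, content):
    """Write content to path via a temp file and os.replace()."""
    # Resolve symlinks so the link's target is replaced, not the link itself.
    target = os.path.realpath(path)
    directory = os.path.dirname(target)
    # Keep the temp file in the same directory so os.replace() stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=".zpretty-", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        if os.path.exists(target):
            # Copy the original permission bits onto the temp file.
            shutil.copymode(target, tmp_path)
        os.replace(tmp_path, target)
    except BaseException:
        # On any failure remove the temp file; the original is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Even this version does not carry over ownership, extended attributes, or hard links, which is part of why the trade-offs may deserve their own discussion.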
    return sorted(good_paths)

@staticmethod
def _atomic_write(path, content):
I am failing to see the practical need to write the file to a temporary location.
Defense-in-depth for the silent-truncation class of bugs; complements the prolog-blank-line fix #229.
0. It now re-checks the input with a non-recovering parser and raises `ContentLossError` instead: the CLI exits non-zero and writes nothing. Scoped to standalone XML documents (fragments and plain text are unaffected).
`--inplace`: replace `open(path, "w")` (which truncates before writing) with a temp file + `os.replace()`, so an interrupted or failing write can't leave a half-written file.
Recommend merging #229 first; with it in place the guard stays quiet for valid documents and only fires on genuinely malformed input.