Skip to content

fix: never silently emit or overwrite a file with truncated XML#230

Open
gronke wants to merge 1 commit into
collective:masterfrom
gronke:fix/xml-data-loss-guard-and-atomic-write
Open

fix: never silently emit or overwrite a file with truncated XML#230
gronke wants to merge 1 commit into
collective:masterfrom
gronke:fix/xml-data-loss-guard-and-atomic-write

Conversation

@gronke

@gronke gronke commented Jun 10, 2026

Copy link
Copy Markdown

Defense-in-depth for the silent-truncation class of bugs; complements the prolog-blank-line fix #229.

  • Refuse on content loss — zpretty parses in lxml recover mode, which drops nodes from malformed input and emits the truncated result with exit 0. It now re-checks the input with a non-recovering parser and raises ContentLossError instead: the CLI exits non-zero and writes nothing. Scoped to standalone XML documents (fragments and plain text are unaffected).
  • Atomic --inplace — replace open(path, "w") (which truncates before writing) with a temp file + os.replace(), so an interrupted or failing write can't leave a half-written file.

Recommend merging #229 first; with it in place the guard stays quiet for valid documents and only fires on genuinely malformed input.

@coveralls

coveralls commented Jun 10, 2026

Copy link
Copy Markdown

Coverage Status

Coverage is 96.613%gronke:fix/xml-data-loss-guard-and-atomic-write into collective:master. No base build found for collective:master.

@gronke gronke force-pushed the fix/xml-data-loss-guard-and-atomic-write branch from 3b059e1 to 3b9af28 Compare June 10, 2026 10:35

@ale-rt ale-rt left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot again for your contributions!
Please have a look at my comment and do not forget to add a line in the history file.

Comment thread zpretty/tests/test_xml.py Outdated

def test_malformed_xml_raises_instead_of_truncating(self):
"""Malformed XML must raise rather than be silently truncated."""
text = (

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently zpretty would fix this XML:

$ cat tmp.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <a>text</b>
</root>
$ .venv/bin/zpretty tmp.xml 
<?xml version="1.0" encoding="utf-8"?>
<root>
  <a>text</a>
</root>

I do not think it is a good idea to lose this feature.
Can you come up with a better test?
Even better if you can provide a file that would cause such an error when running zpretty path/to/file.xml.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed. The guard now refuses only genuine truncation (content after the root element, which recover mode silently drops), so <a>text</b> is still repaired to <a>text</a> as before. Added a fixture that trips it via zpretty path/to/file.xml, plus the HISTORY line ^

zpretty parses XML in lxml's recover mode, which drops nodes from
malformed input and emits the truncated result with exit code 0. Guard
the silent-truncation class of bug:

- Refuse on content loss: re-check the input and raise `ContentLossError`
  when lxml reports `ERR_DOCUMENT_END`, i.e. content after the root
  element that recover mode would drop. Recoverable input is still
  repaired as before, so `<a>x</b>` keeps becoming `<a>x</a>`. Scoped to
  standalone XML documents; fragments and non-XML are untouched.
- Atomic `--inplace`: replace `open(path, "w")`, which truncates before
  writing, with a temp file plus `os.replace()`, so a failing or
  interrupted write cannot leave a half-written file.

The CLI then exits non-zero and writes nothing when the guard fires.
@gronke gronke force-pushed the fix/xml-data-loss-guard-and-atomic-write branch from 3b9af28 to 2a9ba3a Compare June 11, 2026 13:09

@ale-rt ale-rt left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned this PR does 2 things:

  1. validates the file before running zpretty
  2. performs an atomic operation when replacing the file contents

I think we should have two issues to discuss if we want them, as both these features have implications and there is no urgency to have these fixed.

If we want to have these features, they should be implemented in the right way.

To name a few:

  1. this PR will not display a formatted file even if there is no write operation.
  2. what is going to happen when we have a symlink
  3. what is going to happen to permissions and file attributes

Comment thread zpretty/cli.py
return sorted(good_paths)

@staticmethod
def _atomic_write(path, content):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am failing to see the practical need to write the file to a temporary location.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants