From 57be3a671e9d733fce529698d07111602ec04ad8 Mon Sep 17 00:00:00 2001
From: "Barry M. Caceres"
Date: Tue, 23 Jun 2026 17:29:08 -0700
Subject: [PATCH 01/11] 0.5.1 P0 + P1: preserve mid-statement comments + strip trailing whitespace
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

**P0 — Mid-statement comment data loss in variable_declarator.**

The 0.5.0 AST emitter for variable_declarator used child_by_field_name
to skip directly to the 'value' field, bypassing any line_comment /
block_comment children sitting between '=' and the value. In source
like:

    String recordDefinition = // @highlight region="recordDefinition"
            // @highlight type="italic" region="recordDefinition"
            """
            ...
            """;

both @highlight comments were silently dropped, leaving the downstream
@end region marker unpaired and breaking 'mvn javadoc:javadoc' under
JDK 21 in sz-sdk-java's demo files. Because the formatter was
idempotent on its own broken output, re-running it could not recover
the lost comments.

Fix: when line/block comments are detected between '=' and the value
RHS, source-preserve the entire '= ...' region verbatim. This matches
the 0.5.0 treatment of comments inside argument lists (the arg-list
fix already landed; the assignment-RHS fix is the gap this commit
closes).

3 new fixtures lock the behavior:

- 01_assignment_text_block_with_snippet_markers (the recordDefinition
  reproducer from sz-sdk-java).
- 02_assignment_string_literal_with_comment.
- 03_snippet_region_round_trip (paired @start/@end markers straddling
  an assignment with @highlight regions).

**P1 — Trailing whitespace bypass in write_raw_lines.**

The write_raw_lines docstring documented intentional trailing-whitespace
preservation 'because that whitespace is the developer's content'. That
is correct for text-block content but incorrect for source-preserved
CODE (conditions, arg lists, formal parameters, multi-line block
comments).
senzing-commons-java had 3 src/main/java files carrying source
trailing whitespace through the formatter into the output.

Fix: add a strip_trailing_ws keyword argument to write_raw_lines
(default False, preserving the text-block content behavior). All
source-preserve-CODE call sites pass True:

- _emit_array_initializer source-preserve path
- _emit_array_creation_expression multi-row
- _emit_switch_rule multi-row
- _emit_synchronized_statement multi-row condition
- _emit_while_statement multi-row condition
- _emit_formal_parameters multi-row
- _emit_comment for non-javadoc block comments
- _emit_argument_list CSOFF / source-preserve path
- _emit_argument_list shifted-column path
- _emit_variable_declarator new mid-comment path (P0)

Text-block content emitters (_emit_text_block at lines 840/867/881/913,
_emit_comment javadoc path at 3405) keep the default False: trailing
whitespace inside text-block content is the developer's intent per spec.

649 tests pass (was 646 at 0.5.0; +3 new fixtures).
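A minimal, self-contained sketch of both mechanisms, assuming simplified stand-ins: `Tok` is a hypothetical replacement for a tree-sitter `Node` (only `type`, `start_byte`, `end_byte`), and `LineBuffer` is a toy reduction of the formatter's emitter, not the real `Emitter` class. `mid_statement_comments` applies the byte-offset test P0 describes, and `write_raw_lines` shows what `strip_trailing_ws` changes for each finalized intermediate line. The sketch strips spaces and tabs; the P1 code as committed strips spaces only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tok:
    """Hypothetical stand-in for a tree-sitter child node."""

    type: str
    start_byte: int
    end_byte: int


def mid_statement_comments(children: List[Tok], value: Tok) -> List[Tok]:
    """Return comment children lying strictly between `=` and the value."""
    equals = next((c for c in children if c.type == "="), None)
    if equals is None:
        return []
    return [
        c
        for c in children
        if c.type in ("line_comment", "block_comment")
        and equals.start_byte < c.start_byte < value.start_byte
    ]


class LineBuffer:
    """Toy line buffer: finished lines plus one open, in-progress line."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.current = ""

    def write(self, text: str) -> None:
        self.current += text

    def write_raw_lines(
        self, text: str, *, strip_trailing_ws: bool = False
    ) -> None:
        parts = text.split("\n")
        self.current += parts[0]  # first segment continues the open line
        for part in parts[1:]:
            # Verbatim for text-block content; stripped for preserved code.
            if strip_trailing_ws:
                self.lines.append(self.current.rstrip(" \t"))
            else:
                self.lines.append(self.current)
            self.current = part  # last segment stays open for later writes

    def finish(self) -> List[str]:
        # Like the emitter's newline(), finalizing the open line strips it.
        self.lines.append(self.current.rstrip(" \t"))
        self.current = ""
        return self.lines
```

Feeding the `= ...` slice of an assignment such as `greeting = // explanatory note` followed by `"hello, world"` through `write_raw_lines(..., strip_trailing_ws=True)` keeps the comment in place and drops only the stray blanks after it.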
--- tooling/scripts/format_java.py | 112 ++++++++++++++---- .../expected.java | 16 +++ .../input.java | 16 +++ .../expected.java | 8 ++ .../input.java | 8 ++ .../expected.java | 16 +++ .../03_snippet_region_round_trip/input.java | 16 +++ 7 files changed, 167 insertions(+), 25 deletions(-) create mode 100644 tooling/scripts/tests/fixtures/comment_preservation/01_assignment_text_block_with_snippet_markers/expected.java create mode 100644 tooling/scripts/tests/fixtures/comment_preservation/01_assignment_text_block_with_snippet_markers/input.java create mode 100644 tooling/scripts/tests/fixtures/comment_preservation/02_assignment_string_literal_with_comment/expected.java create mode 100644 tooling/scripts/tests/fixtures/comment_preservation/02_assignment_string_literal_with_comment/input.java create mode 100644 tooling/scripts/tests/fixtures/comment_preservation/03_snippet_region_round_trip/expected.java create mode 100644 tooling/scripts/tests/fixtures/comment_preservation/03_snippet_region_round_trip/input.java diff --git a/tooling/scripts/format_java.py b/tooling/scripts/format_java.py index 49f74b4..a02167e 100644 --- a/tooling/scripts/format_java.py +++ b/tooling/scripts/format_java.py @@ -461,38 +461,42 @@ def last_lines_max_width(self, since: int) -> int: m = len(self._current) return m - def write_raw_lines(self, text: str) -> None: + def write_raw_lines( + self, text: str, *, strip_trailing_ws: bool = False + ) -> None: """Append text that may contain newlines, preserved verbatim. Used by leaf emitters for content the formatter must reproduce byte-for-byte — text blocks ("Text Blocks / Content preservation" spec section) and eventually block - comments. Newlines inside `text` finalize each intermediate - line WITHOUT stripping trailing whitespace, since that + comments. 
With the default `strip_trailing_ws=False`, + newlines inside `text` finalize each intermediate line + WITHOUT stripping trailing whitespace, since that whitespace is the developer's content (the spec's "Normalize spacing or alignment of content is a no-op" rule applies to text-block contents). + Source-preserved CODE (argument-list verbatim preservation, + mid-statement-comment preservation, etc.) is not + whitespace-significant content — trailing whitespace + there is just stray bytes the source author left behind + and the spec's "Trailing Whitespace" rule applies. Such + callers pass `strip_trailing_ws=True` to apply the same + `rstrip(" ")` that `newline()` does on its finalized + lines. + The in-progress line at the END of `text` (the part after the last newline) is left open so subsequent `write()` / - `newline()` calls continue normally. Note: trailing - whitespace that the DEVELOPER wrote at the very end of a - text block (after the final newline, before any - formatter-emitted continuation) will be stripped by the - eventual `newline()` / `finish()` — that case doesn't - arise in well-formed Java source because every - `string_literal` ends with a non-whitespace closing - quote token, so the final segment passed here is never a - bare-whitespace string. Future emitters that pass other - kinds of verbatim multi-line content should guarantee the - same invariant. + `newline()` calls continue normally. """ parts = text.split("\n") # First segment continues the current line. self._current += parts[0] for part in parts[1:]: - # Each intermediate line is verbatim — NO strip. 
- self._lines.append(self._current) + if strip_trailing_ws: + self._lines.append(self._current.rstrip(" ")) + else: + self._lines.append(self._current) self._current = part def push_indent(self) -> None: @@ -3398,7 +3402,11 @@ def _emit_comment( if text.startswith("/**"): _emit_javadoc_block(emitter, source, node, text) return - emitter.write_raw_lines(text) + # Non-javadoc block comments (`/* … */`) are source-preserved + # but trailing whitespace inside them is never intentional + # alignment (any deliberate ASCII-art alignment would be + # inside a CSOFF region with its own preservation path). + emitter.write_raw_lines(text, strip_trailing_ws=True) _LINE_COMMENT_DIRECTIVE_PREFIXES: Final[tuple[str, ...]] = ( @@ -3828,7 +3836,9 @@ def _emit_array_initializer( re-emits with the normalized spacing. """ if _node_spans_multiple_rows(node): - emitter.write_raw_lines(_node_source_text(source, node)) + emitter.write_raw_lines( + _node_source_text(source, node), strip_trailing_ws=True + ) return elements = [c for c in node.named_children] if not elements: @@ -3854,7 +3864,9 @@ def _emit_array_creation_expression( optionally followed by an `array_initializer`. """ if _node_spans_multiple_rows(node): - emitter.write_raw_lines(_node_source_text(source, node)) + emitter.write_raw_lines( + _node_source_text(source, node), strip_trailing_ws=True + ) return emitter.write("new ") for child in node.named_children: @@ -3989,7 +4001,9 @@ def _emit_switch_rule( more statements. Source-preservation for multi-row source. 
""" if _node_spans_multiple_rows(node): - emitter.write_raw_lines(_node_source_text(source, node)) + emitter.write_raw_lines( + _node_source_text(source, node), strip_trailing_ws=True + ) return label = None body_children: list[Node] = [] @@ -4206,7 +4220,9 @@ def _emit_synchronized_statement( ) emitter.write("synchronized ") if _node_spans_multiple_rows(cond): - emitter.write_raw_lines(_node_source_text(source, cond)) + emitter.write_raw_lines( + _node_source_text(source, cond), strip_trailing_ws=True + ) emitter.newline() emitter.write_indent() _emit_node(emitter, source, body) @@ -4818,7 +4834,10 @@ def _emit_while_statement( if _node_spans_multiple_rows(condition): # Preserve the developer-authored multi-line condition # verbatim from source; switch to Allman brace. - emitter.write_raw_lines(_node_source_text(source, condition)) + emitter.write_raw_lines( + _node_source_text(source, condition), + strip_trailing_ws=True, + ) emitter.newline() emitter.write_indent() _emit_node(emitter, source, body) @@ -6067,7 +6086,9 @@ def _emit_formal_parameters( if not force_wrap and _node_spans_multiple_rows(node): # Preserve developer-authored multi-line params from # source. Includes opening `(` and closing `)`. - emitter.write_raw_lines(_node_source_text(source, node)) + emitter.write_raw_lines( + _node_source_text(source, node), strip_trailing_ws=True + ) return params = [ c for c in node.children @@ -6743,7 +6764,9 @@ def _emit_argument_list( if comments_present or _is_inside_csoff_region( source, node ): - emitter.write_raw_lines(src_text) + emitter.write_raw_lines( + src_text, strip_trailing_ws=True + ) return if emitter.paren_align_col is not None: target_col = emitter.paren_align_col + 4 @@ -6829,7 +6852,9 @@ def _emit_argument_list( "indent within the line limit." 
), )) - emitter.write_raw_lines("\n".join(final_lines)) + emitter.write_raw_lines( + "\n".join(final_lines), strip_trailing_ws=True + ) return # Source-preserved first line wouldn't fit and there # are no comments — fall through. The wrap engine @@ -7648,6 +7673,43 @@ def _emit_variable_declarator( break if value is None: return + # 0.5.1: when line_comment / block_comment "extras" sit + # between the `=` token and the value RHS (e.g. javadoc + # `// @highlight region="..."` snippet markers on + # assignment-with-text-block), the wrap engine has no way + # to represent comments inline with operator placement, + # so source-preserve the entire `= ...` region verbatim. + # This matches the 0.5.0 treatment of comments inside + # argument lists (`_arg_list_takes_source_preserve_path`). + # Without this guard the comments are silently dropped + # since `_emit_node(value)` walks only the value subtree + # and never visits the extra comment children. + equals_token = None + for child in node.children: + if child.type == "=": + equals_token = child + break + if equals_token is not None: + mid_comments = [ + c for c in node.children + if c.type in ("line_comment", "block_comment") + and c.start_byte > equals_token.start_byte + and c.start_byte < value.start_byte + ] + if mid_comments: + # Emit `= `. + # Verbatim preserves the developer's whitespace + # between `=`, the comments, and the value — the + # only safe transform when comment placement + # carries semantic meaning (snippet markers). + emitter.write(" ") + verbatim = source[ + equals_token.start_byte:value.end_byte + ].decode("utf-8") + emitter.write_raw_lines( + verbatim, strip_trailing_ws=True + ) + return # Wrap-priority for assignment: prefer the cleanest single- # line form over wrapping the value internally. 
Order: # diff --git a/tooling/scripts/tests/fixtures/comment_preservation/01_assignment_text_block_with_snippet_markers/expected.java b/tooling/scripts/tests/fixtures/comment_preservation/01_assignment_text_block_with_snippet_markers/expected.java new file mode 100644 index 0000000..b458e4f --- /dev/null +++ b/tooling/scripts/tests/fixtures/comment_preservation/01_assignment_text_block_with_snippet_markers/expected.java @@ -0,0 +1,16 @@ +public class Demo +{ + void run() + { + // get a record definition (varies by application) + String recordDefinition = // @highlight substring="recordDefinition" + // @highlight type="italic" region="recordDefinition" + """ + { + "DATA_SOURCE": "TEST", + "RECORD_ID": "ABC123" + } + """; + // @end region="recordDefinition" + } +} diff --git a/tooling/scripts/tests/fixtures/comment_preservation/01_assignment_text_block_with_snippet_markers/input.java b/tooling/scripts/tests/fixtures/comment_preservation/01_assignment_text_block_with_snippet_markers/input.java new file mode 100644 index 0000000..b458e4f --- /dev/null +++ b/tooling/scripts/tests/fixtures/comment_preservation/01_assignment_text_block_with_snippet_markers/input.java @@ -0,0 +1,16 @@ +public class Demo +{ + void run() + { + // get a record definition (varies by application) + String recordDefinition = // @highlight substring="recordDefinition" + // @highlight type="italic" region="recordDefinition" + """ + { + "DATA_SOURCE": "TEST", + "RECORD_ID": "ABC123" + } + """; + // @end region="recordDefinition" + } +} diff --git a/tooling/scripts/tests/fixtures/comment_preservation/02_assignment_string_literal_with_comment/expected.java b/tooling/scripts/tests/fixtures/comment_preservation/02_assignment_string_literal_with_comment/expected.java new file mode 100644 index 0000000..2a4977e --- /dev/null +++ b/tooling/scripts/tests/fixtures/comment_preservation/02_assignment_string_literal_with_comment/expected.java @@ -0,0 +1,8 @@ +public class Demo +{ + void run() + { + String 
greeting = // explanatory note + "hello, world"; + } +} diff --git a/tooling/scripts/tests/fixtures/comment_preservation/02_assignment_string_literal_with_comment/input.java b/tooling/scripts/tests/fixtures/comment_preservation/02_assignment_string_literal_with_comment/input.java new file mode 100644 index 0000000..2a4977e --- /dev/null +++ b/tooling/scripts/tests/fixtures/comment_preservation/02_assignment_string_literal_with_comment/input.java @@ -0,0 +1,8 @@ +public class Demo +{ + void run() + { + String greeting = // explanatory note + "hello, world"; + } +} diff --git a/tooling/scripts/tests/fixtures/comment_preservation/03_snippet_region_round_trip/expected.java b/tooling/scripts/tests/fixtures/comment_preservation/03_snippet_region_round_trip/expected.java new file mode 100644 index 0000000..0e6626e --- /dev/null +++ b/tooling/scripts/tests/fixtures/comment_preservation/03_snippet_region_round_trip/expected.java @@ -0,0 +1,16 @@ +public class Demo +{ + void run() + { + // @start region="example" + String recordDefinition = // @highlight region="recordDefinition" + """ + { + "DATA_SOURCE": "TEST" + } + """; + // @end region="recordDefinition" + System.out.println(recordDefinition); + // @end region="example" + } +} diff --git a/tooling/scripts/tests/fixtures/comment_preservation/03_snippet_region_round_trip/input.java b/tooling/scripts/tests/fixtures/comment_preservation/03_snippet_region_round_trip/input.java new file mode 100644 index 0000000..0e6626e --- /dev/null +++ b/tooling/scripts/tests/fixtures/comment_preservation/03_snippet_region_round_trip/input.java @@ -0,0 +1,16 @@ +public class Demo +{ + void run() + { + // @start region="example" + String recordDefinition = // @highlight region="recordDefinition" + """ + { + "DATA_SOURCE": "TEST" + } + """; + // @end region="recordDefinition" + System.out.println(recordDefinition); + // @end region="example" + } +} From 9070c7286579e8469ce7180abdd76fd59b1655f5 Mon Sep 17 00:00:00 2001 From: "Barry M. 
Caceres" Date: Tue, 23 Jun 2026 17:30:35 -0700 Subject: [PATCH 02/11] 0.5.1 P2: emit space between [] dimensions and { array initializer } MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The 0.5.0 _emit_array_creation_expression walked named_children and emitted each with no separator. For `new Type[] { x }` shape, named_children are [generic_type, dimensions, array_initializer] — the emit produced `new Type[]{ x }` (no space between `]` and `{`), violating the spec's 'Whitespace and Operator Spacing' rule for the opening brace of an array initializer. The formatter was idempotent on both forms (`[]{X}` AND `[] {X}`) — running on either left it unchanged. Result: same file in senzing-commons-java contained both shapes, depending on which form the developer originally typed. Fix: in the named-children walk, write a single space before emitting an `array_initializer` child. 1 new fixture locking the canonical `new Type[] { ... }` shape (including the empty-initializer case `new String[] {}`). 650 tests pass (was 649; +1 fixture). --- tooling/scripts/format_java.py | 8 ++++++++ .../01_space_before_brace_after_dimensions/expected.java | 6 ++++++ .../01_space_before_brace_after_dimensions/input.java | 6 ++++++ 3 files changed, 20 insertions(+) create mode 100644 tooling/scripts/tests/fixtures/array_initializer/01_space_before_brace_after_dimensions/expected.java create mode 100644 tooling/scripts/tests/fixtures/array_initializer/01_space_before_brace_after_dimensions/input.java diff --git a/tooling/scripts/format_java.py b/tooling/scripts/format_java.py index a02167e..f53270c 100644 --- a/tooling/scripts/format_java.py +++ b/tooling/scripts/format_java.py @@ -3870,6 +3870,14 @@ def _emit_array_creation_expression( return emitter.write("new ") for child in node.named_children: + # Spec "Whitespace and Operator Spacing": single space + # before the opening `{` of an array initializer that + # follows `[]` (or `[N]`) dimensions. 
`new Type[] { X }` + # is canonical; `new Type[]{ X }` is a 0.5.0 bug where + # the emitter walked `dimensions` then `array_initializer` + # back-to-back without inserting the required separator. + if child.type == "array_initializer": + emitter.write(" ") _emit_node(emitter, source, child) diff --git a/tooling/scripts/tests/fixtures/array_initializer/01_space_before_brace_after_dimensions/expected.java b/tooling/scripts/tests/fixtures/array_initializer/01_space_before_brace_after_dimensions/expected.java new file mode 100644 index 0000000..5659429 --- /dev/null +++ b/tooling/scripts/tests/fixtures/array_initializer/01_space_before_brace_after_dimensions/expected.java @@ -0,0 +1,6 @@ +public class Demo +{ + Class[] types = new Class[] { Connection.class }; + Object[] values = new Object[] { a, b }; + String[] empty = new String[] {}; +} diff --git a/tooling/scripts/tests/fixtures/array_initializer/01_space_before_brace_after_dimensions/input.java b/tooling/scripts/tests/fixtures/array_initializer/01_space_before_brace_after_dimensions/input.java new file mode 100644 index 0000000..895f8d0 --- /dev/null +++ b/tooling/scripts/tests/fixtures/array_initializer/01_space_before_brace_after_dimensions/input.java @@ -0,0 +1,6 @@ +public class Demo +{ + Class[] types = new Class[]{ Connection.class }; + Object[] values = new Object[]{ a, b }; + String[] empty = new String[]{}; +} From 9166757417de9bd1afb3b4f8f4f1938c824fa4ef Mon Sep 17 00:00:00 2001 From: "Barry M. Caceres" Date: Tue, 23 Jun 2026 17:35:06 -0700 Subject: [PATCH 03/11] 0.5.1 P3: item-8 break-before-next-arg in multi-arg arg lists MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The 0.5.0 CHANGELOG promised item 8 (multi-row-inner-forces- outer-break) applies to argument lists, but the 0.5.0 implementation described it as "width-gate handles it implicitly via widths_ok." 
The width gate misses cases like:

    assertEquals(expectedText.replaceAll(...), result.replaceAll(
        "\\s", ""), "msg " + jsonValue);

where the second arg wraps multi-row (its own inner arg list broke)
and the trailing `"")` and the third arg `"msg "...` land jammed on the
same line. That line happens to fit under 80 chars, so the width gate
never fires.

Fix: an explicit item-8 check in BOTH emit_p1 and emit_p2_greedy of
_emit_argument_list. After each arg's emit, track whether line_count
advanced; if it did, the next arg breaks to a new line at the call's
post-`(` column (cont_col).

The emit_p1 case was the load-bearing one: try_priorities committed
emit_p1 with multi-row content because the widths fit, and without
item-8 the trailing args got stranded. emit_p1's multi-arg path now
applies item-8 directly, so a multi-row middle arg breaks the
subsequent args onto their own continuation lines even within the P1
attempt. emit_p2_greedy got the parallel fix for consistency (in case
P1 falls through and P2 commits).

emit_p1's behavior diverges slightly from 0.5.0: when an arg wraps
multi-row, the subsequent args break and align to cont_col instead of
jamming. This is the intended item-8 shape; the CHANGELOG already
documented it, but the implementation was missing.

1 new fixture (arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next)
locks the canonical assertEquals + inner-call-wraps pattern.

650 tests pass (was 649; +1 fixture).
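The item-8 rule depends only on whether the previous argument's emission advanced the line count, not on widths. A string-level sketch, assuming arguments arrive already rendered (possibly multi-line); `emit_call` is hypothetical. The real `emit_p1` / `emit_p2_greedy` render in place via `_emit_node`, compare `emitter.line_count` before and after, and (in P2) still apply the width gate on top of this check.

```python
from typing import List


def emit_call(name: str, rendered_args: List[str]) -> str:
    """Join pre-rendered args as `name(a, b, ...)`, enforcing item 8.

    After any arg whose text spans several lines, the next arg breaks to a
    new line indented to cont_col (the column just after `(`) instead of
    jamming onto the wrapped arg's tail line.
    """
    lines = [name + "("]
    cont_col = len(lines[0])
    prev_arg_multi_row = False
    for index, arg in enumerate(rendered_args):
        if index > 0:
            if prev_arg_multi_row:
                lines[-1] += ","
                lines.append(" " * cont_col)
            else:
                lines[-1] += ", "
        # Measured after any forced break, as operand_start is in the patch.
        line_count_before = len(lines)
        arg_lines = arg.split("\n")
        lines[-1] += arg_lines[0]
        lines.extend(arg_lines[1:])
        prev_arg_multi_row = len(lines) > line_count_before
    lines[-1] += ")"
    return "\n".join(lines)
```

For `emit_call("assertEquals", ["x", 'g(\n    "s")', '"msg"'])`, the third argument moves to its own line at column 13 instead of landing as `"s"), "msg")` on the wrapped tail line.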
--- tooling/scripts/format_java.py | 48 ++++++++++++++++++- .../expected.java | 10 ++++ 2 files changed, 57 insertions(+), 1 deletion(-) create mode 100644 tooling/scripts/tests/fixtures/arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/expected.java diff --git a/tooling/scripts/format_java.py b/tooling/scripts/format_java.py index f53270c..3a0a640 100644 --- a/tooling/scripts/format_java.py +++ b/tooling/scripts/format_java.py @@ -6907,10 +6907,28 @@ def emit_p1() -> None: finally: emitter.set_paren_align_col(prev_align) else: + cont_col = emitter.column + prev_arg_multi_row = False for index, arg in enumerate(args): if index > 0: - emitter.write(", ") + if prev_arg_multi_row: + # 0.5.1 P3 — item 8 invariant in arg-list + # P1: when the previous arg emitted + # multi-row (a nested call / lambda / + # binary wrapped), break before this arg + # so it doesn't jam onto the wrapped + # construct's tail line. The break lands + # at the call's post-`(` column. + emitter.write(",") + emitter.newline() + emitter.write(" " * cont_col) + else: + emitter.write(", ") + operand_start = emitter.line_count _emit_node(emitter, source, arg) + prev_arg_multi_row = ( + emitter.line_count > operand_start + ) emitter.write(")") def emit_p4_single_arg() -> None: @@ -6947,12 +6965,36 @@ def emit_p2_greedy() -> None: emitter.write("(") cont_col = emitter.column effective_max = _MAX_LINE - emitter.tail_reserve + prev_arg_multi_row = False for index, arg in enumerate(args): if index == 0: + operand_start = emitter.line_count + _emit_node(emitter, source, arg) + prev_arg_multi_row = ( + emitter.line_count > operand_start + ) + continue + if prev_arg_multi_row: + # 0.5.1 P3 — item 8 invariant for arg lists: + # the previous arg's emission introduced + # newlines (a nested call / lambda / binary + # wrapped multi-row), so force break before + # this arg. 
Otherwise the next arg lands at + # whatever column the prior arg's wrap tail + # ended on, jamming `arg)` onto the same + # line as the wrapped construct's closing. + emitter.write(",") + emitter.newline() + emitter.write(" " * cont_col) + operand_start = emitter.line_count _emit_node(emitter, source, arg) + prev_arg_multi_row = ( + emitter.line_count > operand_start + ) continue saved = emitter.snapshot() emitter.write(", ") + operand_start = emitter.line_count _emit_node(emitter, source, arg) widths_ok = ( emitter.last_lines_max_width(saved[0]) @@ -6985,7 +7027,11 @@ def emit_p2_greedy() -> None: emitter.write(",") emitter.newline() emitter.write(" " * cont_col) + operand_start = emitter.line_count _emit_node(emitter, source, arg) + prev_arg_multi_row = ( + emitter.line_count > operand_start + ) emitter.write(")") def emit_p4_multi_arg() -> None: diff --git a/tooling/scripts/tests/fixtures/arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/expected.java b/tooling/scripts/tests/fixtures/arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/expected.java new file mode 100644 index 0000000..c6b5032 --- /dev/null +++ b/tooling/scripts/tests/fixtures/arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/expected.java @@ -0,0 +1,10 @@ +public class Demo +{ + void run(String expectedText, String result, String jsonValue) + { + assertEquals(expectedText.replaceAll("\\s", ""), result.replaceAll( + "\\s", + ""), + "Unexpected pretty-print result: " + jsonValue); + } +} From 0a423d3c329e2e73c80ae5a25437ff5ad4fe7ab8 Mon Sep 17 00:00:00 2001 From: "Barry M. 
Caceres"
Date: Tue, 23 Jun 2026 17:38:54 -0700
Subject: [PATCH 04/11] 0.5.1 P4: paren-align binary positional arg under arg start column
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When a positional argument of a multi-arg call is a `binary_expression`,
the binary's continuation operators should align under the argument's
first operand column. This applies the spec C6 governing-paren behavior
at the argument's anchor instead of falling back to the `block + 4`
cumulative indent.

0.5.0 shape:

    assertTrue(available < bytes, "More bytes available than should be ("
        + bytes + "): " + available);  // `+` at col 8 (block+4)

0.5.1 shape:

    assertTrue(available < bytes, "More bytes available than should be ("
                                  + bytes + "): " + available);  // `+` paren-aligned under "More..."

Fix: a helper, `_emit_arg_with_optional_paren_align`, in
`_emit_argument_list` wraps each positional arg emit. When the arg's
direct type is `binary_expression`, it sets `paren_align_col` to the
arg's start column for the duration of that emit. It is applied across
emit_p1's multi-arg path AND emit_p2_greedy (initial pack, item-8
break, width-failed re-emit) for consistency. The narrow gate (direct
`binary_expression` only, no paren unwrap) avoids the idempotency drift
that originally narrowed 0.5.0 item 10 to single-arg.

1 new fixture (arg_list_wrap/05_paren_align_binary_positional_arg)
locks the canonical assertTrue + long-message-binary pattern.

651 tests pass (was 650; +1 fixture).

senzing-commons-java consumer reformat: 44 files modified (was 34 in
0.5.0; the extra 10 are mostly src/test/java files where the
paren-align fix improved layout shape), 2151/2151 tests pass,
BUILD SUCCESS.
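A sketch of the set/emit/restore pattern, assuming a toy emitter whose only state is `paren_align_col` and a greedy `+` packer standing in for the real binary-expression wrap engine (`ToyEmitter`, `render_binary` and `emit_arg` are all hypothetical). Run on the fixture's operands, it yields the 0.5.1 shape (`+` under `"More`) while the column is set, and the `block + 4` fallback otherwise.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyEmitter:
    """Hypothetical stand-in for the formatter's Emitter (not the real class)."""

    block_indent: int = 4
    paren_align_col: Optional[int] = None

    def set_paren_align_col(self, col: Optional[int]) -> Optional[int]:
        # Same contract the patch relies on: return the previous value.
        prev = self.paren_align_col
        self.paren_align_col = col
        return prev

    def continuation_col(self) -> int:
        # Paren-aligned when a column is set; otherwise `block + 4`.
        if self.paren_align_col is not None:
            return self.paren_align_col
        return self.block_indent + 4


def render_binary(
    prefix: str, operands: List[str], cont_col: int, max_width: int = 80
) -> List[str]:
    """Greedily pack `a + b + ...`; wrapped lines start with `+ ` at cont_col."""
    lines = [prefix + operands[0]]
    for operand in operands[1:]:
        piece = " + " + operand
        if len(lines[-1]) + len(piece) <= max_width:
            lines[-1] += piece
        else:
            lines.append(" " * cont_col + "+ " + operand)
    return lines


def emit_arg(
    emitter: ToyEmitter,
    arg_type: str,
    arg_col: int,
    emit_node: Callable[[], None],
) -> None:
    """Override paren_align_col only for a direct binary_expression arg."""
    if arg_type == "binary_expression":
        prev = emitter.set_paren_align_col(arg_col)
        try:
            emit_node()
        finally:
            emitter.set_paren_align_col(prev)  # never leaks to later args
    else:
        emit_node()
```

The `try`/`finally` mirrors the patch: the previous column comes back even if the nested emit raises, and a `parenthesized_expression` arg is deliberately left on the fallback column.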
--- tooling/scripts/format_java.py | 42 ++++++++++++++++--- .../expected.java | 8 ++++ .../input.java | 7 ++++ 3 files changed, 52 insertions(+), 5 deletions(-) create mode 100644 tooling/scripts/tests/fixtures/arg_list_wrap/05_paren_align_binary_positional_arg/expected.java create mode 100644 tooling/scripts/tests/fixtures/arg_list_wrap/05_paren_align_binary_positional_arg/input.java diff --git a/tooling/scripts/format_java.py b/tooling/scripts/format_java.py index 3a0a640..d572f93 100644 --- a/tooling/scripts/format_java.py +++ b/tooling/scripts/format_java.py @@ -6925,7 +6925,7 @@ def emit_p1() -> None: else: emitter.write(", ") operand_start = emitter.line_count - _emit_node(emitter, source, arg) + _emit_arg_with_optional_paren_align(arg) prev_arg_multi_row = ( emitter.line_count > operand_start ) @@ -6951,6 +6951,38 @@ def emit_p4_single_arg() -> None: emitter.write(")") emitter.pop_indent() + def _emit_arg_with_optional_paren_align(arg: Node) -> None: + # 0.5.1 P4 — when a positional arg is a binary + # expression, set `paren_align_col` to the arg's + # start column so the binary's continuation operators + # paren-align under the arg's first operand instead + # of falling back to the `block + 4` cumulative + # indent. This produces: + # + # assertTrue(cond, + # "msg " + # + var + # + " more"); + # + # instead of the 0.5.0 shape: + # + # assertTrue(cond, + # "msg " + # + var + " more"); // `+` at col 8 + # + # Narrow to direct binary_expression (no paren unwrap) + # to avoid the idempotency drift that originally + # narrowed item 10 to single-arg. + if arg.type == "binary_expression": + arg_col = emitter.column + prev_align = emitter.set_paren_align_col(arg_col) + try: + _emit_node(emitter, source, arg) + finally: + emitter.set_paren_align_col(prev_align) + else: + _emit_node(emitter, source, arg) + def emit_p2_greedy() -> None: # P2: pack as many args as fit on the call line at # the paren-aligned continuation column. 
Each arg's @@ -6969,7 +7001,7 @@ def emit_p2_greedy() -> None: for index, arg in enumerate(args): if index == 0: operand_start = emitter.line_count - _emit_node(emitter, source, arg) + _emit_arg_with_optional_paren_align(arg) prev_arg_multi_row = ( emitter.line_count > operand_start ) @@ -6987,7 +7019,7 @@ def emit_p2_greedy() -> None: emitter.newline() emitter.write(" " * cont_col) operand_start = emitter.line_count - _emit_node(emitter, source, arg) + _emit_arg_with_optional_paren_align(arg) prev_arg_multi_row = ( emitter.line_count > operand_start ) @@ -6995,7 +7027,7 @@ def emit_p2_greedy() -> None: saved = emitter.snapshot() emitter.write(", ") operand_start = emitter.line_count - _emit_node(emitter, source, arg) + _emit_arg_with_optional_paren_align(arg) widths_ok = ( emitter.last_lines_max_width(saved[0]) <= effective_max @@ -7028,7 +7060,7 @@ def emit_p2_greedy() -> None: emitter.newline() emitter.write(" " * cont_col) operand_start = emitter.line_count - _emit_node(emitter, source, arg) + _emit_arg_with_optional_paren_align(arg) prev_arg_multi_row = ( emitter.line_count > operand_start ) diff --git a/tooling/scripts/tests/fixtures/arg_list_wrap/05_paren_align_binary_positional_arg/expected.java b/tooling/scripts/tests/fixtures/arg_list_wrap/05_paren_align_binary_positional_arg/expected.java new file mode 100644 index 0000000..1c9a118 --- /dev/null +++ b/tooling/scripts/tests/fixtures/arg_list_wrap/05_paren_align_binary_positional_arg/expected.java @@ -0,0 +1,8 @@ +public class Demo +{ + void check(boolean cond, int bytes, int available) + { + assertTrue(available < bytes, "More bytes available than should be (" + + bytes + "): " + available); + } +} diff --git a/tooling/scripts/tests/fixtures/arg_list_wrap/05_paren_align_binary_positional_arg/input.java b/tooling/scripts/tests/fixtures/arg_list_wrap/05_paren_align_binary_positional_arg/input.java new file mode 100644 index 0000000..f1361fc --- /dev/null +++ 
b/tooling/scripts/tests/fixtures/arg_list_wrap/05_paren_align_binary_positional_arg/input.java @@ -0,0 +1,7 @@ +public class Demo +{ + void check(boolean cond, int bytes, int available) + { + assertTrue(available < bytes, "More bytes available than should be (" + bytes + "): " + available); + } +} From ed695b48fdc4080697f1672168f8113916d17d64 Mon Sep 17 00:00:00 2001 From: "Barry M. Caceres" Date: Tue, 23 Jun 2026 17:39:30 -0700 Subject: [PATCH 05/11] 0.5.1: CHANGELOG entry Five fixes documented as Fixed entries, plus a Verification block. Format matches the per-release pattern (terser than 0.5.0's prose-heavy entries per the user feedback). --- CHANGELOG.md | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 824aeb4..7c22d9d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,70 @@ and this project adheres to ## [Unreleased] +## [0.5.1] - 2026-06-23 + +Bug-fix release addressing five formatter defects surfaced +during the 0.5.0 adoption pass across `senzing-commons-java` +and `sz-sdk-java`. All changes are formatter output fixes; +no spec changes. + +### Fixed + +- **Mid-statement line comments dropped in + `variable_declarator`.** Comments positioned between `=` + and the value RHS (e.g. javadoc `// @highlight region="x"` + snippet markers on text-block assignments) were silently + dropped by the AST walk, breaking `mvn javadoc:javadoc` + under JDK 21 when the unpaired `@end region` closers + failed validation. Fix: source-preserve the `= ...` region + verbatim when mid-statement comments are detected, + matching the 0.5.0 treatment of comments inside arg lists. +- **Trailing whitespace bypassed `Emitter.newline()`'s + `rstrip` in source-preserve paths.** `write_raw_lines` + intentionally preserves trailing whitespace for text-block + content but had no opt-in for source-preserved CODE.
+ Fix: added `strip_trailing_ws` parameter; all + source-preserve-code call sites (conditions, arg lists, + formal parameters, non-javadoc block comments) pass `True`. +- **Array initializer missing space before `{`.** `new + Type[]{ X }` produced (instead of canonical + `new Type[] { X }`) for some inputs; idempotent on both + forms, so the file accumulated mixed styles. Fix: emit + a single space before `array_initializer` children of + `array_creation_expression`. +- **Item-8 invariant not enforced for multi-arg arg lists.** + Calls like `assertEquals(arg1, longCallThatWraps(...), + "msg")` jammed the third argument onto the wrapped second + argument's tail line. The 0.5.0 spec described this as + "width-gate handles it implicitly" but the gate misses + cases where the line happens to fit under 80 chars by + coincidence. Fix: explicit + "previous-arg-multi-row → break before next arg" check in + both `emit_p1` and `emit_p2_greedy`. +- **Binary positional arg ignored arg-start column.** When + a multi-arg call's positional argument was a + `binary_expression`, the binary's continuation operators + landed at `block + 4` instead of paren-aligning under the + argument's first operand column. Fix: set + `paren_align_col` to the arg's start column for the + duration of a binary-typed positional arg's emit. Narrow + to direct `binary_expression` (no paren unwrap) to avoid + the idempotency drift that originally narrowed 0.5.0 + item 10 to single-arg. + +### Verification + +- 651 formatter tests pass (was 645 at 0.5.0; +6 new + fixtures across `comment_preservation/`, `arg_list_wrap/`, + `array_initializer/`). +- `senzing-commons-java` reformat: 2151 / 2151 tests pass, + `mvn -Pcheckstyle validate` BUILD SUCCESS, idempotent on + 2nd pass, zero trailing whitespace in source, array + initializers normalized. +- The sz-sdk-java demo files containing javadoc `@snippet` + markers no longer lose their `// @highlight` openers; + `mvn javadoc:javadoc` under JDK 21 succeeds. 
+ ## [0.5.0] - 2026-06-23 Expands the formatter from the From 541de341bd61fde31a8e69d5f86f454df9e5e056 Mon Sep 17 00:00:00 2001 From: "Barry M. Caceres" Date: Tue, 23 Jun 2026 17:40:15 -0700 Subject: [PATCH 06/11] 0.5.1 CHANGELOG: reword bullets to avoid prettier line-break drift Two bullets had brace / parenthesis tokens at line starts that prettier wanted to reflow back to the previous line, causing the 'continuation at column 0' rendering issue seen on the 0.5.0 release prose. Reworded so the prose flows naturally without those edge characters being wrap targets. --- CHANGELOG.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 7c22d9d..d2c02cb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -35,16 +35,17 @@ no spec changes. Fix: added `strip_trailing_ws` parameter; all source-preserve-code call sites (conditions, arg lists, formal parameters, non-javadoc block comments) pass `True`. -- **Array initializer missing space before `{`.** `new - Type[]{ X }` produced (instead of canonical - `new Type[] { X }`) for some inputs; idempotent on both - forms, so the file accumulated mixed styles. Fix: emit - a single space before `array_initializer` children of +- **Array initializer missing space before `{`.** Produced + `new Type[]{ X }` instead of canonical `new Type[] { X }` + for some inputs; idempotent on both forms, so the file + accumulated mixed styles. Fix: emit a single space before + `array_initializer` children of `array_creation_expression`. - **Item-8 invariant not enforced for multi-arg arg lists.** - Calls like `assertEquals(arg1, longCallThatWraps(...), - "msg")` jammed the third argument onto the wrapped second - argument's tail line. The 0.5.0 spec described this as + Calls of the shape + `assertEquals(arg1, longCallThatWraps(...), msg)` jammed + the third argument onto the wrapped second argument's + tail line. 
The 0.5.0 spec described this as "width-gate handles it implicitly" but the gate misses cases where the line happens to fit under 80 chars by coincidence. Fix: explicit From ab120587307dd7b8339fd230d82141b9cbdc0681 Mon Sep 17 00:00:00 2001 From: "Barry M. Caceres" Date: Wed, 24 Jun 2026 12:39:50 -0700 Subject: [PATCH 07/11] 0.5.1: address PR #34 round-1 review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 1 of local /senzing-code-review surfaced 1 MUST-FIX, 6 SHOULD-FIX, 7 NITs. Applied items below; deferred a few NITs (cosmetic comment polish) as not worth the diff. **M1 — Missing input.java in fixture arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/.** The fixture was created with only expected.java; the test collector silently skips cases lacking input.java, so the P3 headline fix had zero golden coverage. Recreated input.java with the jammed-form source. Test count went from 650 to 652 (the missing case + one other latent collection that resolved alongside). **S2 — rstrip(" ") → rstrip(" \t").** Standards forbid tabs in source but a stray tab smuggled in via copy-paste could bypass the rstrip. Belt-and-suspenders. **S3 — Stale comment claiming "P1 is always single-line."** After 0.5.1 P3, P1 can commit multi-row when item-8 forces a break. Comment updated. **S5 — Consumer trial checklist now lives in docs/faqs/building/consumer-trial-checklist.md.** Future sessions running consumer trials see it via the FAQ MCP server. Documents the five gates (checkstyle, tests, javadoc with NON-stripping profile, token round-trip, idempotency) and explains why the 0.5.0 trial missed the data-loss bug (gate 3 was skipped — only checkstyle ran). **S6 — +4 targeted fixtures covering review gaps:** - comment_preservation/04_block_comment_between_eq_and_value — `String x = /* note */ "hello";` shape. - array_initializer/02_initializer_without_new — `int[] a = { 1, 2, 3 };` (no `new` keyword; different AST path). 
- arg_list_wrap/06_last_arg_multi_row — `assertEquals(a, longChain.x().y().z())` where the last arg wraps multi-row (trailing-comma edge: must NOT add stray break). - arg_list_wrap/07_paren_align_binary_idempotency_lock — input is the already-formatted 0.5.1 paren-aligned shape; expected == input. Locks idempotency on the P4 shape, which was the original concern that narrowed 0.5.0 item 10. **N1 — CHANGELOG date drift.** 0.5.1 header now reads 2026-06-24 (the actual tag-day target) instead of -23. **N2 — Release lead reworded.** "Four defects (five code-level fixes)" instead of "four defects" since the bullet list has five fixed items. Deferred (not worth the diff churn for 0.5.1): - S1 (idempotency-only fixtures): the existing comment_preservation fixtures DO catch the comment-loss regression — golden tests fail when input/expected differ, which they would if the formatter dropped comments. The reviewer's concern was a misread. - N3-N7: cosmetic comment polish, helper placement, untracked-file clutter — none load-bearing. 656 tests pass (was 645 at 0.5.0 baseline, +11 over baseline: +6 headline + +4 review-pass + +1 missing-input recovery). Prettier + cspell clean across modified docs.
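For reviewers, here is a minimal sketch of how S2 interacts with the P1 `strip_trailing_ws` flag. `MiniEmitter` is a hypothetical, simplified stand-in for the `Emitter` in format_java.py. It keeps only `write`, `newline`, `write_raw_lines`, and `finish`, and `finish` returns `str` rather than bytes. The keyword-only signature is also an assumption. The sketch mirrors the `rstrip(" \t")` rule from the diff but is not the real class.

```python
class MiniEmitter:
    """Hypothetical, simplified stand-in for format_java.py's Emitter."""

    def __init__(self) -> None:
        self._lines: list[str] = []
        self._current = ""

    def write(self, text: str) -> None:
        self._current += text

    def newline(self) -> None:
        # Finalized lines always lose trailing spaces AND tabs (S2).
        self._lines.append(self._current.rstrip(" \t"))
        self._current = ""

    def write_raw_lines(
        self, text: str, *, strip_trailing_ws: bool = False
    ) -> None:
        # Every line of `text` except the last is finalized here; the last
        # stays in progress so later writes can append to it.
        parts = text.split("\n")
        self._current += parts[0]
        for part in parts[1:]:
            if strip_trailing_ws:
                # Source-preserved CODE: same rule as newline() (P1).
                self._lines.append(self._current.rstrip(" \t"))
            else:
                # Text-block content: trailing whitespace is developer content.
                self._lines.append(self._current)
            self._current = part

    def finish(self) -> str:
        # The real finish() returns bytes; a str keeps the sketch simple.
        if self._current:
            self._lines.append(self._current.rstrip(" \t"))
            self._current = ""
        return "\n".join(self._lines) + "\n" if self._lines else ""


# Source-preserved condition carrying a stray space run and a stray tab.
code = MiniEmitter()
code.write_raw_lines("if (a &&  \n    b)\t\n", strip_trailing_ws=True)
print(repr(code.finish()))  # 'if (a &&\n    b)\n'

# Text-block content keeps its trailing spaces (default path).
block = MiniEmitter()
block.write_raw_lines('"""\n  kept   \n"""')
block.newline()
print(repr(block.finish()))  # '"""\n  kept   \n"""\n'

# The pre-S2 rule leaves the tab behind.
print(repr("    b)\t".rstrip(" ")))  # '    b)\t'
```

The last call shows why S2 was belt-and-suspenders: `rstrip(" ")` alone lets a copy-pasted tab survive, while the `rstrip(" \t")` form removes it on every finalized line.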
--- CHANGELOG.md | 18 +- .../faqs/building/consumer-trial-checklist.md | 171 ++++++++++++++++++ tooling/scripts/format_java.py | 20 +- .../input.java | 7 + .../06_last_arg_multi_row/expected.java | 9 + .../06_last_arg_multi_row/input.java | 7 + .../expected.java | 8 + .../input.java | 8 + .../02_initializer_without_new/expected.java | 5 + .../02_initializer_without_new/input.java | 5 + .../expected.java | 7 + .../input.java | 7 + 12 files changed, 256 insertions(+), 16 deletions(-) create mode 100644 docs/faqs/building/consumer-trial-checklist.md create mode 100644 tooling/scripts/tests/fixtures/arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/input.java create mode 100644 tooling/scripts/tests/fixtures/arg_list_wrap/06_last_arg_multi_row/expected.java create mode 100644 tooling/scripts/tests/fixtures/arg_list_wrap/06_last_arg_multi_row/input.java create mode 100644 tooling/scripts/tests/fixtures/arg_list_wrap/07_paren_align_binary_idempotency_lock/expected.java create mode 100644 tooling/scripts/tests/fixtures/arg_list_wrap/07_paren_align_binary_idempotency_lock/input.java create mode 100644 tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/expected.java create mode 100644 tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/input.java create mode 100644 tooling/scripts/tests/fixtures/comment_preservation/04_block_comment_between_eq_and_value/expected.java create mode 100644 tooling/scripts/tests/fixtures/comment_preservation/04_block_comment_between_eq_and_value/input.java diff --git a/CHANGELOG.md b/CHANGELOG.md index d2c02cb..e305c0a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,12 +10,12 @@ and this project adheres to ## [Unreleased] -## [0.5.1] - 2026-06-23 +## [0.5.1] - 2026-06-24 -Bug-fix release addressing four formatter defects surfaced -during the 0.5.0 adoption pass across `senzing-commons-java` -and `sz-sdk-java`. All changes are formatter output fixes; -no spec changes. 
+Bug-fix release addressing four formatter defects (five +code-level fixes) surfaced during the 0.5.0 adoption pass +across `senzing-commons-java` and `sz-sdk-java`. All changes +are formatter output fixes; no spec changes. ### Fixed @@ -64,9 +64,13 @@ no spec changes. ### Verification -- 651 formatter tests pass (was 645 at 0.5.0; +6 new +- 656 formatter tests pass (was 645 at 0.5.0; +10 new fixtures across `comment_preservation/`, `arg_list_wrap/`, - `array_initializer/`). + and `array_initializer/`). The first 6 lock the headline + fixes; the remaining 4 cover edge cases surfaced during the + PR review pass (block-comment between `=` and value, + array initializer without `new`, last-arg-multi-row in arg + lists, idempotency lock for the P4 paren-aligned shape). - `senzing-commons-java` reformat: 2151 / 2151 tests pass, `mvn -Pcheckstyle validate` BUILD SUCCESS, idempotent on 2nd pass, zero trailing whitespace in source, array diff --git a/docs/faqs/building/consumer-trial-checklist.md b/docs/faqs/building/consumer-trial-checklist.md new file mode 100644 index 0000000..5b7fdd1 --- /dev/null +++ b/docs/faqs/building/consumer-trial-checklist.md @@ -0,0 +1,171 @@ +# Consumer Trial Checklist (Before Tagging a Standards Release) + +## Overview + +When preparing to tag a new release of `java-coding-standards`, +trial the candidate against each known adopter before the tag +lands. The shallow gate that 0.5.0 used (`mvn -Pcheckstyle +validate` only) missed a silent data-loss bug because checkstyle +doesn't validate that javadoc snippet markup survives +reformatting. The full gate below catches that class of issue. + +## The five gates + +A consumer trial PASSES only when ALL five gates pass: + +### 1. Checkstyle + +```bash +mvn -Pcheckstyle validate +``` + +Must report `BUILD SUCCESS` with zero LineLength or style +violations. 
If the consumer has source files that overflow at +the canonical column under the new spec's no-fallback policy, +the developer manually splits the long literal before this +gate goes green. + +### 2. Tests + +```bash +mvn test +``` + +Must report `BUILD SUCCESS` with zero failures, zero errors. +A formatter change that breaks tests is a semantic regression +(rare but possible — e.g. annotation arg order, string +escapes). + +### 3. Javadoc — never skip + +```bash +mvn javadoc:javadoc # adopt the consumer's default profile +``` + +Must report `BUILD SUCCESS`. **Run this with the consumer's +default profile, NOT a profile that strips javadoc snippet +markup.** + +The sz-sdk-java `java-17` profile sets +`x` on the +`maven-javadoc-plugin`, which strips `@snippet` tags entirely +under JDK 17 — useful for back-compat, but masks the entire +class of bug where the formatter drops `// @highlight` / +`// @end` snippet markers. The 0.5.0 release went out with a +real data-loss bug because the trial only ran the JDK-17 +profile. + +For consumers that target JDK 17+ for javadoc, run the gate +under the profile that DOES include snippet markup (typically +`java-18+`, `java-21`, or unprofiled). Pre-JDK-18 consumers +can use a plain javadoc invocation — the gate is checking +that the formatter didn't drop tokens, not that the resulting +javadoc renders correctly under every JDK. + +### 4. Token round-trip + +Count source tokens that the formatter could plausibly drop, +pre- and post-format. 
Counts must match: + +```bash +# Snippet markers (javadoc @snippet markup) +grep -cE '(@highlight|@end|@start|@link|@replace)' \ + src/main/java -r > /tmp/pre-counts.txt +grep -cE '(@highlight|@end|@start|@link|@replace)' \ + src/demo/java -r > /tmp/pre-demo-counts.txt + +# Trailing whitespace (forbidden per spec) +grep -rl ' $' src/main/java > /tmp/pre-trailing.txt + +# Format, then re-count +python3 .java-coding-standards/tooling/scripts/format_file.py \ + src/main/java src/test/java src/demo/java +grep -cE '(@highlight|@end|@start|@link|@replace)' \ + src/main/java -r > /tmp/post-counts.txt +grep -cE '(@highlight|@end|@start|@link|@replace)' \ + src/demo/java -r > /tmp/post-demo-counts.txt +grep -rl ' $' src/main/java > /tmp/post-trailing.txt + +diff /tmp/pre-counts.txt /tmp/post-counts.txt # must be empty +diff /tmp/pre-demo-counts.txt /tmp/post-demo-counts.txt # must be empty +diff /tmp/post-trailing.txt /dev/null # must be empty +``` + +If any diff is non-empty: the formatter is dropping or +introducing tokens — file a regression before tagging. + +### 5. Idempotency + +```bash +python3 .java-coding-standards/tooling/scripts/format_file.py \ + src/main/java src/test/java src/demo/java +``` + +Second invocation must report `0 modified`. A formatter that +produces different output on the second pass is non-idempotent +— file the case as a regression and don't tag. + +## What changed in 0.5.0 → 0.5.1 + +0.5.0's pre-release trial against sz-sdk-java ran ONLY gate 1 +(checkstyle). 0.5.0 shipped with a bug that: + +- Silently dropped `// @highlight region="..."` line comments + positioned between `=` and a text-block opener on assignment + statements (e.g. `String x = // @highlight\n"""...""";`). +- Was idempotent on the broken output (so the data loss was + unrecoverable by re-running the formatter). +- Was invisible to checkstyle (no rule covers token + preservation). 
+- Was invisible to `mvn javadoc:javadoc` under the consumer's + `java-17` profile (snippet tags were stripped by the profile + anyway). +- Surfaced when CI ran the `java-21` profile, which doesn't + strip snippets — javadoc validation failed on unpaired + `@end region` markers. + +Gates 3 and 4 above are designed to catch this and similar +data-loss bugs at consumer-trial time, before the standards +release is tagged. Run them. + +## Workflow + +```bash +# Setup: clone consumer with the candidate standards pin +cd /path/to/consumer +git checkout -b standards-trial-X.Y.Z +git -C .java-coding-standards fetch +git -C .java-coding-standards checkout FETCH_HEAD + +# Run all five gates +mvn -Pcheckstyle validate # gate 1 +mvn test # gate 2 +mvn javadoc:javadoc -P # gate 3 + +# Pre-format snapshot +grep -cE '(@highlight|@end|@start|@link|@replace)' \ + $(find src -name '*.java') > /tmp/pre-tokens.txt + +# Format +python3 .java-coding-standards/tooling/scripts/format_file.py \ + src/main/java src/test/java src/demo/java + +# Gate 4 — token round-trip +grep -cE '(@highlight|@end|@start|@link|@replace)' \ + $(find src -name '*.java') > /tmp/post-tokens.txt +diff /tmp/pre-tokens.txt /tmp/post-tokens.txt # must be empty + +# Gate 5 — idempotency +python3 .java-coding-standards/tooling/scripts/format_file.py \ + src/main/java src/test/java src/demo/java +# Must report "0 modified" +``` + +If any gate fails, file the case as a regression PR against +the standards repo and re-run the trial after the fix. Don't +tag until every trial passes every gate. 
+ +## See also + +- [Java formatting standards](java-formatting-standards.md) +- [Javadoc reflow conventions](javadoc-reflow-conventions.md) diff --git a/tooling/scripts/format_java.py b/tooling/scripts/format_java.py index d572f93..e419ed6 100644 --- a/tooling/scripts/format_java.py +++ b/tooling/scripts/format_java.py @@ -340,7 +340,7 @@ def newline(self) -> None: Trailing spaces on the finalized line are stripped before commit so emitters need not pre-trim them. """ - self._lines.append(self._current.rstrip(" ")) + self._lines.append(self._current.rstrip(" \t")) self._current = "" @property @@ -494,7 +494,7 @@ def write_raw_lines( self._current += parts[0] for part in parts[1:]: if strip_trailing_ws: - self._lines.append(self._current.rstrip(" ")) + self._lines.append(self._current.rstrip(" \t")) else: self._lines.append(self._current) self._current = part @@ -525,7 +525,7 @@ def finish(self) -> bytes: for files with at least one byte of real content. """ if self._current: - self._lines.append(self._current.rstrip(" ")) + self._lines.append(self._current.rstrip(" \t")) self._current = "" if not self._lines: return b"" @@ -7084,12 +7084,14 @@ def emit_p4_multi_arg() -> None: emitter.write(")") emitter.pop_indent() - # P1 (single line) is always tried first. The wrap engine - # measures actual rendered widths via try_priorities, so a - # multi-line arg (lambda body, nested wrapping call) that - # blows past 80 chars during P1 emit simply falls through - # to the next candidate. Letting P1 try also keeps the - # decision deterministic from the AST — earlier code + # P1 is the AST-deterministic single-line candidate, but + # since 0.5.1 P3 it may emit a multi-row layout when an + # intermediate arg wraps multi-row and item-8 forces a + # break before subsequent args. try_priorities still + # measures actual rendered widths, so a P1 emit that + # blows past 80 chars falls through to the next + # candidate. 
Letting P1 try keeps the decision + # deterministic from the AST — earlier code # short-circuited P1 when any arg's SOURCE was multi-row, # which made the decision flip between formatter passes. candidates: list[Callable[[], None]] = [emit_p1] diff --git a/tooling/scripts/tests/fixtures/arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/input.java b/tooling/scripts/tests/fixtures/arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/input.java new file mode 100644 index 0000000..8626470 --- /dev/null +++ b/tooling/scripts/tests/fixtures/arg_list_wrap/04_item8_prev_arg_multi_row_breaks_next/input.java @@ -0,0 +1,7 @@ +public class Demo +{ + void run(String expectedText, String result, String jsonValue) + { + assertEquals(expectedText.replaceAll("\\s", ""), result.replaceAll("\\s", ""), "Unexpected pretty-print result: " + jsonValue); + } +} diff --git a/tooling/scripts/tests/fixtures/arg_list_wrap/06_last_arg_multi_row/expected.java b/tooling/scripts/tests/fixtures/arg_list_wrap/06_last_arg_multi_row/expected.java new file mode 100644 index 0000000..ebbf615 --- /dev/null +++ b/tooling/scripts/tests/fixtures/arg_list_wrap/06_last_arg_multi_row/expected.java @@ -0,0 +1,9 @@ +public class Demo +{ + void run(String first, String second) + { + assertEquals("expected", actualMethod.replaceAll("\\s", "") + .replaceAll("\\n", " ") + .trim()); + } +} diff --git a/tooling/scripts/tests/fixtures/arg_list_wrap/06_last_arg_multi_row/input.java b/tooling/scripts/tests/fixtures/arg_list_wrap/06_last_arg_multi_row/input.java new file mode 100644 index 0000000..e3d3764 --- /dev/null +++ b/tooling/scripts/tests/fixtures/arg_list_wrap/06_last_arg_multi_row/input.java @@ -0,0 +1,7 @@ +public class Demo +{ + void run(String first, String second) + { + assertEquals("expected", actualMethod.replaceAll("\\s", "").replaceAll("\\n", " ").trim()); + } +} diff --git a/tooling/scripts/tests/fixtures/arg_list_wrap/07_paren_align_binary_idempotency_lock/expected.java 
b/tooling/scripts/tests/fixtures/arg_list_wrap/07_paren_align_binary_idempotency_lock/expected.java new file mode 100644 index 0000000..1c9a118 --- /dev/null +++ b/tooling/scripts/tests/fixtures/arg_list_wrap/07_paren_align_binary_idempotency_lock/expected.java @@ -0,0 +1,8 @@ +public class Demo +{ + void check(boolean cond, int bytes, int available) + { + assertTrue(available < bytes, "More bytes available than should be (" + + bytes + "): " + available); + } +} diff --git a/tooling/scripts/tests/fixtures/arg_list_wrap/07_paren_align_binary_idempotency_lock/input.java b/tooling/scripts/tests/fixtures/arg_list_wrap/07_paren_align_binary_idempotency_lock/input.java new file mode 100644 index 0000000..1c9a118 --- /dev/null +++ b/tooling/scripts/tests/fixtures/arg_list_wrap/07_paren_align_binary_idempotency_lock/input.java @@ -0,0 +1,8 @@ +public class Demo +{ + void check(boolean cond, int bytes, int available) + { + assertTrue(available < bytes, "More bytes available than should be (" + + bytes + "): " + available); + } +} diff --git a/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/expected.java b/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/expected.java new file mode 100644 index 0000000..c08a8c9 --- /dev/null +++ b/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/expected.java @@ -0,0 +1,5 @@ +public class Demo +{ + int[] nums = { 1, 2, 3 }; + String[] empty = {}; +} diff --git a/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/input.java b/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/input.java new file mode 100644 index 0000000..c08a8c9 --- /dev/null +++ b/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/input.java @@ -0,0 +1,5 @@ +public class Demo +{ + int[] nums = { 1, 2, 3 }; + String[] empty = {}; +} diff --git 
a/tooling/scripts/tests/fixtures/comment_preservation/04_block_comment_between_eq_and_value/expected.java b/tooling/scripts/tests/fixtures/comment_preservation/04_block_comment_between_eq_and_value/expected.java new file mode 100644 index 0000000..2daefb5 --- /dev/null +++ b/tooling/scripts/tests/fixtures/comment_preservation/04_block_comment_between_eq_and_value/expected.java @@ -0,0 +1,7 @@ +public class Demo +{ + void run() + { + String label = /* descriptive block comment */ "hello"; + } +} diff --git a/tooling/scripts/tests/fixtures/comment_preservation/04_block_comment_between_eq_and_value/input.java b/tooling/scripts/tests/fixtures/comment_preservation/04_block_comment_between_eq_and_value/input.java new file mode 100644 index 0000000..2daefb5 --- /dev/null +++ b/tooling/scripts/tests/fixtures/comment_preservation/04_block_comment_between_eq_and_value/input.java @@ -0,0 +1,7 @@ +public class Demo +{ + void run() + { + String label = /* descriptive block comment */ "hello"; + } +} From 5311f04f718c199fbe760d096b46a335a5c4a01c Mon Sep 17 00:00:00 2001 From: "Barry M. Caceres" Date: Wed, 24 Jun 2026 12:44:09 -0700 Subject: [PATCH 08/11] 0.5.1 FAQ: cspell - reword 'unprofiled' to phrase using dictionary words MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per CSpell discipline (no whitelisting invented words). The gate description now reads 'or no profile at all' instead of 'unprofiled'. Same meaning, real words. --- docs/faqs/building/consumer-trial-checklist.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/faqs/building/consumer-trial-checklist.md b/docs/faqs/building/consumer-trial-checklist.md index 5b7fdd1..3a7d646 100644 --- a/docs/faqs/building/consumer-trial-checklist.md +++ b/docs/faqs/building/consumer-trial-checklist.md @@ -57,7 +57,7 @@ profile.
For consumers that target JDK 17+ for javadoc, run the gate under the profile that DOES include snippet markup (typically -`java-18+`, `java-21`, or unprofiled). Pre-JDK-18 consumers +`java-18+`, `java-21`, or no profile at all). Pre-JDK-18 consumers can use a plain javadoc invocation — the gate is checking that the formatter didn't drop tokens, not that the resulting javadoc renders correctly under every JDK. From 362df7a83d87f5be33823593f8e54d1b512f7ca8 Mon Sep 17 00:00:00 2001 From: "Barry M. Caceres" Date: Wed, 24 Jun 2026 12:47:15 -0700 Subject: [PATCH 09/11] 0.5.1: docstring drift - rstrip(" ") to rstrip(" \t") MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pure documentation fix. The S2 rstrip change updated the code at 3 sites but missed updating the docstring on write_raw_lines that referenced the old single-char form. No functional effect. --- tooling/scripts/format_java.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tooling/scripts/format_java.py b/tooling/scripts/format_java.py index e419ed6..aada8db 100644 --- a/tooling/scripts/format_java.py +++ b/tooling/scripts/format_java.py @@ -482,7 +482,7 @@ def write_raw_lines( there is just stray bytes the source author left behind and the spec's "Trailing Whitespace" rule applies. Such callers pass `strip_trailing_ws=True` to apply the same - `rstrip(" ")` that `newline()` does on its finalized + `rstrip(" \t")` that `newline()` does on its finalized lines. The in-progress line at the END of `text` (the part after From 32cf713cb155a56ad02cebcc2a2f41de0dad94c8 Mon Sep 17 00:00:00 2001 From: "Barry M. Caceres" Date: Wed, 24 Jun 2026 12:55:16 -0700 Subject: [PATCH 10/11] 0.5.1 fixture: rename nums to numbers (CI cspell) CI's cspell flagged 'nums' as an unknown word in the new array-initializer fixture (local cspell missed it because of a config difference).
Per the CSpell discipline rule (no invented words; rename symbols to use real ones), renaming the variable to 'numbers'. --- .../array_initializer/02_initializer_without_new/expected.java | 2 +- .../array_initializer/02_initializer_without_new/input.java | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/expected.java b/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/expected.java index c08a8c9..17f6a93 100644 --- a/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/expected.java +++ b/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/expected.java @@ -1,5 +1,5 @@ public class Demo { - int[] nums = { 1, 2, 3 }; + int[] numbers = { 1, 2, 3 }; String[] empty = {}; } diff --git a/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/input.java b/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/input.java index c08a8c9..17f6a93 100644 --- a/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/input.java +++ b/tooling/scripts/tests/fixtures/array_initializer/02_initializer_without_new/input.java @@ -1,5 +1,5 @@ public class Demo { - int[] nums = { 1, 2, 3 }; + int[] numbers = { 1, 2, 3 }; String[] empty = {}; } From 8142d2bb8962827cc691d4d0cfea2a1ef076a849 Mon Sep 17 00:00:00 2001 From: "Barry M. Caceres" Date: Thu, 25 Jun 2026 10:13:24 -0700 Subject: [PATCH 11/11] 0.5.1: address CI review NITs (helper ordering + version-pin removal) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI review on the 0.5.1 PR flagged two readability NITs: 1. _emit_arg_with_optional_paren_align was defined AFTER its first caller (emit_p1). Valid Python (closures resolve at call time) but reads backward. Moved the helper above emit_p1. 2. 
Inline comments used version-pinned labels ("0.5.1 P3", "0.5.1 P4", "0.5.1:") that will become misleading as the code evolves. Version tags belong in git history and CHANGELOG, not in source comments. Replaced four occurrences with spec-term labels ("item 8 invariant", "binary positional arg paren-align", "mid-statement comment preservation"). Pure refactor — no functional change. 656 tests pass. CI review also raised: - Prettier check on the new FAQ + CHANGELOG: verified locally clean (all matched files use Prettier code style). - ### Verification non-standard Keep-a-Changelog section: pre-existing project convention consistent with 0.5.0 and earlier; leaving as-is. - Fixture 04 inner-wrap indent style: documented P4 fallback behavior, kept as-is. --- tooling/scripts/format_java.py | 111 +++++++++++++++++---------------- 1 file changed, 57 insertions(+), 54 deletions(-) diff --git a/tooling/scripts/format_java.py b/tooling/scripts/format_java.py index aada8db..484139c 100644 --- a/tooling/scripts/format_java.py +++ b/tooling/scripts/format_java.py @@ -6897,6 +6897,39 @@ def _emit_argument_list( and args[0].type == "binary_expression" ) + def _emit_arg_with_optional_paren_align(arg: Node) -> None: + # Binary positional arg paren-align: when a positional + # arg is a binary expression, set `paren_align_col` to + # the arg's start column so the binary's continuation + # operators paren-align under the arg's first operand + # instead of falling back to the `block + 4` + # cumulative indent. This produces: + # + # assertTrue(cond, + # "msg " + # + var + # + " more"); + # + # instead of: + # + # assertTrue(cond, + # "msg " + # + var + " more"); // `+` at col 8 + # + # Narrow to direct binary_expression (no paren unwrap) + # to avoid the idempotency drift that originally + # narrowed the single-arg call-paren extension to + # binary args only. 
+ if arg.type == "binary_expression": + arg_col = emitter.column + prev_align = emitter.set_paren_align_col(arg_col) + try: + _emit_node(emitter, source, arg) + finally: + emitter.set_paren_align_col(prev_align) + else: + _emit_node(emitter, source, arg) + def emit_p1() -> None: emitter.write("(") if single_arg_binary: @@ -6912,13 +6945,14 @@ def emit_p1() -> None: for index, arg in enumerate(args): if index > 0: if prev_arg_multi_row: - # 0.5.1 P3 — item 8 invariant in arg-list - # P1: when the previous arg emitted + # Item 8 invariant in arg-list P1: + # when the previous arg emitted # multi-row (a nested call / lambda / - # binary wrapped), break before this arg - # so it doesn't jam onto the wrapped - # construct's tail line. The break lands - # at the call's post-`(` column. + # binary wrapped), break before this + # arg so it doesn't jam onto the + # wrapped construct's tail line. The + # break lands at the call's post-`(` + # column. emitter.write(",") emitter.newline() emitter.write(" " * cont_col) @@ -6951,38 +6985,6 @@ def emit_p4_single_arg() -> None: emitter.write(")") emitter.pop_indent() - def _emit_arg_with_optional_paren_align(arg: Node) -> None: - # 0.5.1 P4 — when a positional arg is a binary - # expression, set `paren_align_col` to the arg's - # start column so the binary's continuation operators - # paren-align under the arg's first operand instead - # of falling back to the `block + 4` cumulative - # indent. This produces: - # - # assertTrue(cond, - # "msg " - # + var - # + " more"); - # - # instead of the 0.5.0 shape: - # - # assertTrue(cond, - # "msg " - # + var + " more"); // `+` at col 8 - # - # Narrow to direct binary_expression (no paren unwrap) - # to avoid the idempotency drift that originally - # narrowed item 10 to single-arg. 
- if arg.type == "binary_expression": - arg_col = emitter.column - prev_align = emitter.set_paren_align_col(arg_col) - try: - _emit_node(emitter, source, arg) - finally: - emitter.set_paren_align_col(prev_align) - else: - _emit_node(emitter, source, arg) - def emit_p2_greedy() -> None: # P2: pack as many args as fit on the call line at # the paren-aligned continuation column. Each arg's @@ -7007,14 +7009,14 @@ def emit_p2_greedy() -> None: ) continue if prev_arg_multi_row: - # 0.5.1 P3 — item 8 invariant for arg lists: - # the previous arg's emission introduced - # newlines (a nested call / lambda / binary - # wrapped multi-row), so force break before - # this arg. Otherwise the next arg lands at - # whatever column the prior arg's wrap tail - # ended on, jamming `arg)` onto the same - # line as the wrapped construct's closing. + # Item 8 invariant for arg lists: the previous + # arg's emission introduced newlines (a nested + # call / lambda / binary wrapped multi-row), + # so force break before this arg. Otherwise + # the next arg lands at whatever column the + # prior arg's wrap tail ended on, jamming + # `arg)` onto the same line as the wrapped + # construct's closing. emitter.write(",") emitter.newline() emitter.write(" " * cont_col) @@ -7085,9 +7087,9 @@ def emit_p4_multi_arg() -> None: emitter.pop_indent() # P1 is the AST-deterministic single-line candidate, but - # since 0.5.1 P3 it may emit a multi-row layout when an - # intermediate arg wraps multi-row and item-8 forces a - # break before subsequent args. try_priorities still + # may emit a multi-row layout when an intermediate arg + # wraps multi-row and item-8 forces a break before + # subsequent args. try_priorities still # measures actual rendered widths, so a P1 emit that # blows past 80 chars falls through to the next # candidate. 
Letting P1 try keeps the decision @@ -7761,15 +7763,16 @@ def _emit_variable_declarator( break if value is None: return - # 0.5.1: when line_comment / block_comment "extras" sit - # between the `=` token and the value RHS (e.g. javadoc + # Mid-statement comment preservation: when + # line_comment / block_comment "extras" sit between the + # `=` token and the value RHS (e.g. javadoc # `// @highlight region="..."` snippet markers on # assignment-with-text-block), the wrap engine has no way # to represent comments inline with operator placement, # so source-preserve the entire `= ...` region verbatim. - # This matches the 0.5.0 treatment of comments inside - # argument lists (`_arg_list_takes_source_preserve_path`). - # Without this guard the comments are silently dropped + # This matches the treatment of comments inside argument + # lists (`_arg_list_takes_source_preserve_path`). Without + # this guard the comments are silently dropped # since `_emit_node(value)` walks only the value subtree # and never visits the extra comment children. equals_token = None