Fix font fallback for compound/ZWJ-joined emoji (#861) #1071
Open
StefanoD wants to merge 1 commit into
Compound emoji that rely on automatic font fallback (i.e. that are not
explicitly wrapped in a <tspan> with an emoji font) were dropped entirely.
This affected flags (regional-indicator pairs such as the UK flag) and ZWJ
sequences (rainbow flag, family, ...). Emoji-only text and explicitly
tagged emoji worked, which is why the existing tests never caught it.
Root cause
----------
shape_text resolved missing glyphs by re-shaping the *whole* text with a
fallback font and then merging the two glyph lists. The merge required both
shapings to have the same number of glyphs (`glyphs.len() != fallback_glyphs.len()
=> break`) and copied glyphs one-by-one by index. Compound emoji ligate
several code points into a single glyph in the emoji font, so the glyph
counts differ and the whole fallback was abandoned, leaving the emoji
unrendered.
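The failure mode can be reproduced with a minimal sketch. The `Glyph` struct, the function name `merge_by_index`, and the glyph ids are hypothetical simplifications, not the actual usvg code; only the length check mirrors the condition quoted above.

```rust
// Hypothetical, simplified glyph record; the real usvg type carries more fields.
#[derive(Clone, Debug, PartialEq)]
struct Glyph {
    id: u16,         // 0 stands for .notdef (missing glyph)
    byte_idx: usize, // start of the glyph's cluster in the source text
}

/// Simplified model of the old merge: copy fallback glyphs by index, but
/// give up as soon as the two shapings disagree on the glyph count.
/// Returns `false` when the fallback was abandoned.
fn merge_by_index(glyphs: &mut [Glyph], fallback: &[Glyph]) -> bool {
    if glyphs.len() != fallback.len() {
        return false;
    }
    for (g, f) in glyphs.iter_mut().zip(fallback) {
        if g.id == 0 {
            *g = f.clone();
        }
    }
    true
}

fn main() {
    // "a" followed by the two regional indicators of a flag.
    // Primary font: one glyph per code point, both indicators are .notdef.
    let mut primary = vec![
        Glyph { id: 68, byte_idx: 0 }, // arbitrary id for 'a'
        Glyph { id: 0, byte_idx: 1 },
        Glyph { id: 0, byte_idx: 5 },
    ];
    // Emoji font: no 'a', but both indicators ligate into ONE flag glyph.
    let fallback = vec![
        Glyph { id: 0, byte_idx: 0 },
        Glyph { id: 900, byte_idx: 1 }, // arbitrary id for the flag
    ];

    // 3 glyphs vs 2 glyphs: the merge bails out and the flag stays .notdef.
    assert!(!merge_by_index(&mut primary, &fallback));
    assert_eq!(primary[1].id, 0);
    assert_eq!(primary[2].id, 0);
}
```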
The per-index assumption was wrong for a second reason: default-ignorable
code points (U+FE0F, U+200D) are turned into hidden space glyphs by the
shaper, so the .notdef glyphs of one emoji are not even contiguous.
Example, "Hi<rainbow-flag>there" shaped with Noto Sans (primary) produced
(glyph_id, text):
(H), (i), (0,""), (3,""), (3,""), (0,"<flag>"), (t), ...
-> id 0 = .notdef (U+1F3F3 / U+1F308), id 3 = hidden space (U+FE0F / U+200D)
Before: Hi[][]there (flag dropped, two .notdef boxes)
After: Hi<rainbow>there (single ligated glyph from the emoji font)
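To see why the two .notdef glyphs are separated, it helps to list the code points of the example string with their UTF-8 byte offsets. The snippet below uses only the Rust standard library; the helper name `code_points` is made up for illustration.

```rust
/// Returns (byte offset, code point) for every char in `text`.
fn code_points(text: &str) -> Vec<(usize, u32)> {
    text.char_indices().map(|(i, c)| (i, c as u32)).collect()
}

fn main() {
    // "Hi" + white flag + variation selector-16 + ZWJ + rainbow + "there"
    let text = "Hi\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}there";
    for (byte_idx, cp) in code_points(text) {
        println!("{byte_idx:>2}  U+{cp:04X}");
    }
}
```

U+1F3F3 starts at byte 2 and U+1F308 at byte 12, with U+FE0F (byte 6) and U+200D (byte 9) in between, which corresponds to the .notdef, hidden, hidden, .notdef pattern in the glyph list above.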
Fix
---
Replace the index-based merge with a cluster-based one
(merge_fallback_glyphs): the text is only cut at cluster boundaries shared
by *both* shapings, and whole clusters are replaced when the primary font
could not resolve them. This implements the existing
`// TODO: Replace clusters and not glyphs` and is correct regardless of how
many glyphs each font produces, for both LTR and RTL/BIDI runs (BIDI run
boundaries are always shared cluster boundaries).
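The idea can be sketched as follows. This is an illustration under stated assumptions, not the code of merge_fallback_glyphs: the `Glyph` struct, the helper names and the glyph ids are hypothetical, glyphs are assumed to be in logical (LTR) order, and the sketch only takes the fallback for a segment when the fallback itself has no .notdef there.

```rust
use std::collections::BTreeSet;

const NOTDEF: u16 = 0;

// Hypothetical, simplified glyph record; the real usvg type carries more fields.
#[derive(Clone, Debug, PartialEq)]
struct Glyph {
    id: u16,
    byte_idx: usize, // start of the glyph's cluster in the source text
}

/// Glyphs whose cluster starts inside [start, end).
fn in_range(gs: &[Glyph], start: usize, end: usize) -> Vec<&Glyph> {
    gs.iter()
        .filter(|g| g.byte_idx >= start && g.byte_idx < end)
        .collect()
}

/// Cut the text at cluster starts present in BOTH shapings, then pick
/// primary or fallback glyphs for each segment as a whole.
fn merge_by_cluster(primary: &[Glyph], fallback: &[Glyph], text_len: usize) -> Vec<Glyph> {
    let p_starts: BTreeSet<usize> = primary.iter().map(|g| g.byte_idx).collect();
    let f_starts: BTreeSet<usize> = fallback.iter().map(|g| g.byte_idx).collect();
    let mut cuts: BTreeSet<usize> = p_starts.intersection(&f_starts).copied().collect();
    cuts.insert(0);
    cuts.insert(text_len);
    let cuts: Vec<usize> = cuts.into_iter().collect();

    let mut out = Vec::new();
    for w in cuts.windows(2) {
        let p = in_range(primary, w[0], w[1]);
        let f = in_range(fallback, w[0], w[1]);
        let primary_missing = p.iter().any(|g| g.id == NOTDEF);
        let fallback_ok = !f.is_empty() && f.iter().all(|g| g.id != NOTDEF);
        let chosen = if primary_missing && fallback_ok { f } else { p };
        out.extend(chosen.into_iter().cloned());
    }
    out
}

fn main() {
    let text = "Hi\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}t";
    // Primary: .notdef (0) and hidden-space (3) glyphs for the flag sequence.
    let primary: Vec<Glyph> = [(43, 0), (76, 1), (0, 2), (3, 6), (3, 9), (0, 12), (87, 16)]
        .iter()
        .map(|&(id, byte_idx)| Glyph { id, byte_idx })
        .collect();
    // Emoji font: no Latin letters, one ligated glyph for the whole sequence.
    let fallback: Vec<Glyph> = [(0, 0), (0, 1), (900, 2), (0, 16)]
        .iter()
        .map(|&(id, byte_idx)| Glyph { id, byte_idx })
        .collect();

    let merged = merge_by_cluster(&primary, &fallback, text.len());
    let ids: Vec<u16> = merged.iter().map(|g| g.id).collect();
    assert_eq!(ids, vec![43, 76, 900, 87]);
}
```

Because the segment from byte 2 to byte 16 is swapped as a unit, the differing glyph counts (four in the primary shaping, one in the fallback) no longer matter.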
Tests
-----
Adds the usvg test `compound_emoji_font_fallback`, which fails before and
passes after the fix. The full suite (1724 render tests) stays green except
for one reference image (see below).
Updated reference image: tests/text/direction/rtl.png
------------------------------------------------------
This is the only one of the 1724 render tests whose output changes (by
80 px). The test mixes Arabic (Noto Sans -> Amiri fallback) with the Latin
word "SVG", i.e. a two-stage fallback where neither font covers everything.
With the old index merge the fallback glyphs were spliced into the primary
shaping's structure, carrying over advances position-by-position; the new
cluster merge takes Amiri's own advances for the whole fallback cluster,
which shifts the mixed run by a sub-pixel amount.
The change is a refinement, not a regression:
- the glyph shapes are identical and match Chrome's rendering of the same
fonts (verified visually);
- the isolated Arabic word renders byte-for-byte identically before and
after; only the mixed Arabic+Latin advances differ slightly.
Because tiny-skia rasterizes deterministically, the regenerated reference is
stable across platforms/CI, so the strict pixel comparison keeps working.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix font fallback for compound/ZWJ-joined emoji (#861)
Fixes #861.
Problem
Compound emoji that rely on automatic font fallback, i.e. that are not explicitly wrapped in a `<tspan>` with an emoji font, were dropped entirely. This affects flags (regional-indicator pairs like 🇬🇧) and ZWJ sequences (🏳️‍🌈, 👨‍👩‍👧‍👦). Emoji-only text and explicitly tagged emoji always worked, which is why the existing tests didn't catch it.
- Input: `Hi🏳️‍🌈there` (Noto Sans primary, emoji font as fallback)
- Before: `Hi□□there` (flag dropped, two `.notdef` boxes)
- After: `Hi🏳️‍🌈there` (single ligated glyph from the emoji font)
This now matches Chrome, which renders the ZWJ sequence as one combined glyph.
Root cause
`shape_text` re-shaped the whole text with a fallback font and merged the two glyph lists by index, bailing out whenever the counts differed (`glyphs.len() != fallback_glyphs.len() => break`). Compound emoji ligate several code points into one glyph in the emoji font, so the counts differ and the entire fallback was abandoned. The per-index assumption is also broken by default-ignorable code points (U+FE0F, U+200D), which the shaper turns into hidden space glyphs, so the `.notdef` glyphs of one emoji aren't even contiguous.
Fix
Replace the index-based merge with a cluster-based one (`merge_fallback_glyphs`): cut the text only at cluster boundaries shared by both shapings and replace whole clusters that the primary font couldn't resolve. This implements the existing `// TODO: Replace clusters and not glyphs` and is correct regardless of glyph counts, for LTR and RTL/BIDI alike (BIDI run boundaries are always shared cluster boundaries).
Tests
`compound_emoji_font_fallback` (usvg parser tests): fails before, passes after.
Why `tests/text/direction/rtl.png` changed
This is the only render test whose output changes (by 80 px). It mixes Arabic (Noto Sans → Amiri fallback) with the Latin word "SVG", a two-stage fallback where neither font covers everything. The old merge spliced fallback glyphs into the primary shaping's structure, carrying advances over position-by-position; the new cluster merge uses Amiri's own advances for the whole fallback cluster, shifting the mixed run by a sub-pixel amount. It's a refinement, not a regression: glyph shapes are unchanged and match Chrome's rendering of the same fonts, the isolated Arabic renders byte-for-byte identically, and only the mixed Arabic+Latin advances differ slightly. Since tiny-skia rasterizes deterministically, the regenerated reference is stable across CI.
🤖 Generated with Claude Code