
Fix font fallback for compound/ZWJ-joined emoji (#861) #1071

Open
StefanoD wants to merge 1 commit into linebender:main from StefanoD:fix/861-compound-emoji-font-fallback

@StefanoD

Fix font fallback for compound/ZWJ-joined emoji (#861)

Fixes #861.

Problem

Compound emoji that rely on automatic font fallback — i.e. that are not explicitly wrapped in a <tspan> with an emoji font — were dropped entirely. This affects flags (regional-indicator pairs like 🇬🇧) and ZWJ sequences (🏳️‍🌈, 👨‍👩‍👧‍👦). Emoji-only text and explicitly tagged emoji always worked, which is why the existing tests didn't catch it.

Hi🏳️‍🌈there (Noto Sans primary, emoji font as fallback):
  • before: Hi□□there (flag dropped, two .notdef boxes)
  • after: Hi🏳️‍🌈there (single ligated glyph from the emoji font)

This now matches Chrome, which renders the ZWJ sequence as one combined glyph.
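These emoji count as "compound" because each is a sequence of several code points that an emoji font ligates into a single glyph. The following standalone Rust sketch is not code from usvg; `code_points`, `is_regional_indicator` and `is_flag_pair` are hypothetical helper names used only for illustration.

```rust
/// True for the 26 Regional Indicator Symbols (U+1F1E6 ..= U+1F1FF).
/// A flag emoji is a pair of these.
fn is_regional_indicator(c: char) -> bool {
    ('\u{1F1E6}'..='\u{1F1FF}').contains(&c)
}

/// Renders every code point of `s` as "U+XXXX".
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

/// True if `s` is exactly two regional indicators, i.e. a flag sequence.
fn is_flag_pair(s: &str) -> bool {
    let cps: Vec<char> = s.chars().collect();
    cps.len() == 2 && cps.iter().all(|&c| is_regional_indicator(c))
}
```

With these helpers, the UK flag `"\u{1F1EC}\u{1F1E7}"` yields `["U+1F1EC", "U+1F1E7"]`, and the rainbow flag yields four code points (U+1F3F3, U+FE0F, U+200D, U+1F308), all of which the emoji font maps to one glyph.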

Root cause

shape_text re-shaped the whole text with a fallback font and merged the two glyph lists by index, bailing out whenever the counts differed (glyphs.len() != fallback_glyphs.len() => break). Compound emoji ligate several code points into one glyph in the emoji font, so the counts differ and the entire fallback was abandoned. The per-index assumption is also broken by default-ignorable code points (U+FE0F, U+200D), which the shaper turns into hidden space glyphs — so the .notdef glyphs of one emoji aren't even contiguous.
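The failure mode can be reproduced with a toy model. The sketch below is a simplified, hypothetical reconstruction of an index-based merge; the `Glyph` struct, the `NOTDEF` constant and `index_merge` are illustrative and are not usvg's actual types. With the rainbow flag, the primary shaping yields six glyphs and the emoji fallback only three, so the merge gives up and the .notdef glyphs remain.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Glyph {
    id: u16,        // glyph id in the shaping font; 0 is .notdef
    cluster: usize, // byte offset of the cluster this glyph belongs to
}

const NOTDEF: u16 = 0;

/// Toy model of an index-based merge: it only proceeds when both shapings
/// have the same glyph count, then swaps .notdef glyphs one by one by index.
/// Returns false when it bails out.
fn index_merge(primary: &mut [Glyph], fallback: &[Glyph]) -> bool {
    if primary.len() != fallback.len() {
        return false; // counts differ: the whole fallback is abandoned
    }
    for (p, f) in primary.iter_mut().zip(fallback) {
        if p.id == NOTDEF && f.id != NOTDEF {
            *p = *f;
        }
    }
    true
}
```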

Fix

Replace the index-based merge with a cluster-based one (merge_fallback_glyphs): cut the text only at cluster boundaries shared by both shapings and replace whole clusters that the primary font couldn't resolve. This implements the existing // TODO: Replace clusters and not glyphs and is correct regardless of glyph counts, for LTR and RTL/BIDI alike (BIDI run boundaries are always shared cluster boundaries).
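A minimal sketch of the cluster-based idea, assuming glyphs in logical (LTR) order, each tagged with the byte offset of its cluster, and glyph id 0 as .notdef. `Glyph`, `segment` and `merge_by_cluster` are hypothetical names, not the actual `merge_fallback_glyphs` signature, and the rule "take the fallback segment only if it contains no .notdef" is an assumption of this sketch.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Glyph {
    id: u16,        // glyph id in its font; 0 is .notdef
    cluster: usize, // byte offset of the cluster this glyph belongs to
}

const NOTDEF: u16 = 0;

/// Glyphs whose cluster lies in the byte range [start, end).
fn segment(glyphs: &[Glyph], start: usize, end: usize) -> Vec<Glyph> {
    glyphs
        .iter()
        .filter(|g| g.cluster >= start && g.cluster < end)
        .copied()
        .collect()
}

/// Cuts the text only where *both* shapings start a cluster and, per segment,
/// swaps in the fallback glyphs when the primary font left a .notdef there.
fn merge_by_cluster(primary: &[Glyph], fallback: &[Glyph]) -> Vec<Glyph> {
    let p: BTreeSet<usize> = primary.iter().map(|g| g.cluster).collect();
    let f: BTreeSet<usize> = fallback.iter().map(|g| g.cluster).collect();
    let mut cuts: Vec<usize> = p.intersection(&f).copied().collect();
    if cuts.first() != Some(&0) {
        cuts.insert(0, 0); // make sure the first segment starts at the text start
    }

    let mut out = Vec::with_capacity(primary.len());
    for (i, &start) in cuts.iter().enumerate() {
        let end = cuts.get(i + 1).copied().unwrap_or(usize::MAX);
        let p_seg = segment(primary, start, end);
        let f_seg = segment(fallback, start, end);
        let primary_missing = p_seg.iter().any(|g| g.id == NOTDEF);
        let fallback_ok = !f_seg.is_empty() && f_seg.iter().all(|g| g.id != NOTDEF);
        // Replace the whole segment, however many glyphs each font produced.
        out.extend(if primary_missing && fallback_ok { f_seg } else { p_seg });
    }
    out
}
```

For "Hi" + rainbow flag + "t", the shared cuts are 0, 1, 2 and 16. The four primary glyphs in [2, 16) (two .notdef, two hidden spaces) are replaced by the emoji font's single ligature, while "H", "i" and "t" keep their primary glyphs even though the fallback shaping in this toy data has .notdef for them.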

Tests

  • New compound_emoji_font_fallback (usvg parser tests) — fails before, passes after.
  • Full render suite (1724 tests) stays green except for one reference image:

Why tests/text/direction/rtl.png changed

This is the only render test whose output changes (80 pixels differ). It mixes Arabic (Noto Sans → Amiri fallback) with the Latin word "SVG", a two-stage fallback where neither font covers everything. The old merge spliced fallback glyphs into the primary shaping's structure, carrying advances over position by position; the new cluster merge uses Amiri's own advances for the whole fallback cluster, which shifts the mixed run by a sub-pixel amount. This is a refinement, not a regression: glyph shapes are unchanged and match Chrome's rendering of the same fonts, the isolated Arabic renders byte-for-byte identically, and only the advances in the mixed Arabic+Latin run differ slightly. Since tiny-skia rasterizes deterministically, the regenerated reference is stable across CI.
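Why a per-cluster advance difference moves the rest of the run: pen positions are prefix sums of advances, so a change in one cluster's advance carries over to every glyph after it. The numbers below are made up for illustration (not measured from Noto Sans or Amiri), and `pen_positions` is a hypothetical helper.

```rust
/// Pen x-position of each glyph when laid out left to right from `x0`:
/// glyph i starts at x0 + the sum of the advances of glyphs 0..i.
fn pen_positions(advances: &[f32], x0: f32) -> Vec<f32> {
    advances
        .iter()
        .scan(x0, |x, &adv| {
            let pos = *x;
            *x += adv;
            Some(pos)
        })
        .collect()
}
```

With advances `[10.0, 7.5, 7.5, 6.0]` the positions are `[0.0, 10.0, 17.5, 25.0]`; shrinking the second advance to `7.25` gives `[0.0, 10.0, 17.25, 24.75]`. Every later glyph moves by the same 0.25 px while its outline stays identical, consistent with the small pixel diff described above.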

🤖 Generated with Claude Code

Compound emoji that rely on automatic font fallback (i.e. that are not
explicitly wrapped in a <tspan> with an emoji font) were dropped entirely.
This affected flags (regional-indicator pairs such as the UK flag) and ZWJ
sequences (rainbow flag, family, ...). Emoji-only text and explicitly
tagged emoji worked, which is why the existing tests never caught it.

Root cause
----------
shape_text resolved missing glyphs by re-shaping the *whole* text with a
fallback font and then merging the two glyph lists. The merge required both
shapings to have the same number of glyphs (`glyphs.len() != fallback_glyphs.len()
=> break`) and copied glyphs one-by-one by index. Compound emoji ligate
several code points into a single glyph in the emoji font, so the glyph
counts differ and the whole fallback was abandoned, leaving the emoji
unrendered.

The per-index assumption was wrong for a second reason: default-ignorable
code points (U+FE0F, U+200D) are turned into hidden space glyphs by the
shaper, so the .notdef glyphs of one emoji are not even contiguous.
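The default-ignorable code points involved can be checked with a small predicate. This sketch covers only a subset of Unicode's Default_Ignorable_Code_Point property (U+200B..U+200F and the variation selectors U+FE00..U+FE0F), which is enough for the rainbow flag; the complete list is in the Unicode Character Database. The function names are hypothetical.

```rust
/// Partial check for Unicode's Default_Ignorable_Code_Point property.
/// NOTE: only a subset (zero-width/format characters and VS1..VS16),
/// not the full property.
fn is_default_ignorable_subset(c: char) -> bool {
    matches!(c, '\u{200B}'..='\u{200F}' | '\u{FE00}'..='\u{FE0F}')
}

/// Positions (counted in chars) of default-ignorable code points within `s`.
fn ignorable_positions(s: &str) -> Vec<usize> {
    s.chars()
        .enumerate()
        .filter(|&(_, c)| is_default_ignorable_subset(c))
        .map(|(i, _)| i)
        .collect()
}
```

For the rainbow flag (U+1F3F3 U+FE0F U+200D U+1F308), `ignorable_positions` returns `[1, 2]`: the two ignorables sit between the two visible code points, which is why their hidden-space glyphs end up between the two .notdef glyphs.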

Example: "Hi<rainbow-flag>there" shaped with Noto Sans (primary) produced
(glyph_id, text):
  (H), (i), (0,""), (3,""), (3,""), (0,"<flag>"), (t), ...
  -> id 0 = .notdef (U+1F3F3 / U+1F308), id 3 = hidden space (U+FE0F / U+200D)
Before:  Hi[][]there      (flag dropped, two .notdef boxes)
After:   Hi<rainbow>there  (single ligated glyph from the emoji font)
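Applied to the glyph ids listed above, a quick check shows that the two .notdef glyphs are separated by the hidden spaces. The ids 0 and 3 come from the example; the ids used here for "H", "i" and "t" are placeholders, and `notdef_indices` and `is_contiguous` are illustrative helpers.

```rust
const NOTDEF: u16 = 0;

/// Indices of .notdef glyphs in a shaped glyph-id sequence.
fn notdef_indices(ids: &[u16]) -> Vec<usize> {
    ids.iter()
        .enumerate()
        .filter(|&(_, &id)| id == NOTDEF)
        .map(|(i, _)| i)
        .collect()
}

/// True if the indices form a single unbroken run such as [2, 3, 4].
fn is_contiguous(indices: &[usize]) -> bool {
    indices.windows(2).all(|w| w[1] == w[0] + 1)
}
```

For the sequence H, i, 0, 3, 3, 0, t this returns `[2, 5]`, which is not contiguous, so even a merge that tried to patch "the run of .notdef glyphs" would have to skip over the hidden-space glyphs at indices 3 and 4.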

Fix
---
Replace the index-based merge with a cluster-based one
(merge_fallback_glyphs): the text is only cut at cluster boundaries shared
by *both* shapings, and whole clusters are replaced when the primary font
could not resolve them. This implements the existing
`// TODO: Replace clusters and not glyphs` and is correct regardless of how
many glyphs each font produces, for both LTR and RTL/BIDI runs (BIDI run
boundaries are always shared cluster boundaries).
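The claim that the approach holds for RTL as well can be illustrated on the boundary step alone: if shared boundaries are computed as a set intersection of cluster offsets, the result does not depend on whether the glyphs arrive in logical order or in visual (reversed) order. This is a sketch; `shared_boundaries` is a hypothetical name and the cluster offsets are made-up numbers.

```rust
use std::collections::BTreeSet;

/// Byte offsets at which both shapings start a cluster, in ascending order.
/// Set semantics make the result independent of glyph order (LTR logical
/// order or RTL visual order).
fn shared_boundaries(primary: &[usize], fallback: &[usize]) -> Vec<usize> {
    let p: BTreeSet<usize> = primary.iter().copied().collect();
    let f: BTreeSet<usize> = fallback.iter().copied().collect();
    p.intersection(&f).copied().collect()
}
```

If the fallback font ligates the clusters starting at offsets 0 and 2 into one (fallback clusters 0, 4, 6 against primary clusters 0, 2, 4, 6), only 0, 4 and 6 are safe cut points, whichever order the glyphs are listed in.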

Tests
-----
Adds the usvg test `compound_emoji_font_fallback`, which fails before and
passes after the fix. The full suite (1724 render tests) stays green except
for one reference image (see below).

Updated reference image: tests/text/direction/rtl.png
------------------------------------------------------
This is the only one of the 1724 render tests whose output changes (80
pixels differ). The test mixes Arabic (Noto Sans -> Amiri fallback) with the
Latin word "SVG", i.e. a two-stage fallback where neither font covers everything.
With the old index merge the fallback glyphs were spliced into the primary
shaping's structure, carrying over advances position-by-position; the new
cluster merge takes Amiri's own advances for the whole fallback cluster,
which shifts the mixed run by a sub-pixel amount.

The change is a refinement, not a regression:
  - the glyph shapes are identical and match Chrome's rendering of the same
    fonts (verified visually);
  - the isolated Arabic word renders byte-for-byte identically before and
    after; only the mixed Arabic+Latin advances differ slightly.
Because tiny-skia rasterizes deterministically, the regenerated reference is
stable across platforms/CI, so the strict pixel comparison keeps working.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
StefanoD force-pushed the fix/861-compound-emoji-font-fallback branch from 586f9df to cf58084 on June 14, 2026, 18:28

Development: successfully merging this pull request may close the issue "Compound/ZWJ-joined emoji".