feat(compiler): Support braced Unicode escapes in template string literals #67269

Open
kbrilla wants to merge 3 commits into angular:main from kbrilla:feat/braced-unicode-escapes-upstream

@kbrilla (Contributor) commented Feb 25, 2026

Braced Unicode escapes in Angular template string literals

Scope (feature-only)

This PR adds support for ES6 braced Unicode escapes in Angular template expression string literals:

  • {{ '\u{4f60}' }} -> 你 (CJK character)
  • {{ '\u{1F600}' }} -> 😀 (emoji, above U+FFFF)
  • {{ '\u4f60' }} -> still works (traditional 4-digit form)

Additional validated sequence example:

  • \u{48}\u{65}\u{6C}\u{6C}\u{6F} -> Hello

Working example:
[screenshot of the rendered output]

What changed

  • Extended expression lexer string escape handling to support both:
    • \uXXXX
    • \u{...}
  • Added code point handling above BMP (> 0xFFFF) via String.fromCodePoint(...).
  • Added parser tests for:
    • CJK braced escape (你)
    • Emoji braced escape (😀)
    • Legacy 4-digit escape compatibility
    • Multi-escape sequence producing Hello
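
The escape handling listed above can be illustrated outside Angular. What follows is a minimal standalone sketch under stated assumptions, not the lexer code from this PR: `decodeUnicodeEscapes` is a hypothetical helper that accepts both `\uXXXX` and `\u{...}` and uses `String.fromCodePoint` so that values above 0xFFFF come out as a single character.

```typescript
// Hypothetical helper for illustration only; it is not part of the Angular compiler.
// It decodes \uXXXX and \u{...} escapes found in a raw (unprocessed) string.
function decodeUnicodeEscapes(raw: string): string {
  let out = '';
  let i = 0;
  while (i < raw.length) {
    // Copy everything that is not the start of a "\u" escape.
    if (raw.charAt(i) !== '\\' || raw.charAt(i + 1) !== 'u') {
      out += raw.charAt(i);
      i++;
      continue;
    }
    i += 2; // skip the "\u" prefix
    let hex: string;
    if (raw.charAt(i) === '{') {
      // Braced form: one or more hex digits up to the closing brace.
      const close = raw.indexOf('}', i);
      if (close === -1) {
        throw new Error('Unterminated braced escape: \\u' + raw.slice(i));
      }
      hex = raw.slice(i + 1, close);
      i = close + 1;
      if (!/^[0-9a-fA-F]+$/.test(hex)) {
        throw new Error('Invalid braced escape: \\u{' + hex + '}');
      }
    } else {
      // Legacy form: exactly four hex digits.
      hex = raw.slice(i, i + 4);
      i += 4;
      if (!/^[0-9a-fA-F]{4}$/.test(hex)) {
        throw new Error('Invalid escape: \\u' + hex);
      }
    }
    const codePoint = parseInt(hex, 16);
    if (codePoint > 0x10ffff) {
      throw new Error('Code point out of range: \\u{' + hex + '}');
    }
    // fromCodePoint emits a surrogate pair automatically for values above 0xFFFF.
    out += String.fromCodePoint(codePoint);
  }
  return out;
}
```

Calling `decodeUnicodeEscapes('\\u{48}\\u{65}\\u{6C}\\u{6C}\\u{6F}')` on the sequence from the description yields `Hello`. The error messages and the strictness of the validation here are choices made for the sketch; the PR's actual error reporting may differ.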

Character mapping

| Character | Name / Description | ✅ Braced Escape (\u{...}) | 🛠️ Standard / Surrogate Pair |
|---|---|---|---|
| H | Latin Capital H | \u{48} | \u0048 |
| e | Latin Small E | \u{65} | \u0065 |
| l | Latin Small L | \u{6C} | \u006C |
| o | Latin Small O | \u{6F} | \u006F |
| 🚀 | Rocket | \u{1F680} | \uD83D\uDE80 (Pair) |
| 💎 | Gem Stone | \u{1F48E} | \uD83D\uDC8E (Pair) |
| 🧩 | Puzzle Piece | \u{1F9E9} | \uD83E\uDDE9 (Pair) |
| 𓃰 | Elephant (Hieroglyph) | \u{130F0} | \uD80C\uDCF0 (Pair) |

Key takeaways

  • The Why: Standard \uXXXX is limited to 16 bits (65,536 values, U+0000 to U+FFFF). Unicode code points run up to U+10FFFF, which needs 21 bits; the supplementary (astral) planes above U+FFFF hold many emoji and historic scripts.
  • Braced Escapes: \u{...} is the modern ES6 form. It directly expresses code points without manual surrogate math or zero-padding.
  • Surrogate Pairs: Legacy UTF-16 workaround where one non-BMP code point is encoded as two 16-bit units (high + low surrogate).
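
The "manual surrogate math" mentioned above is the standard UTF-16 split of a code point above U+FFFF into two 16-bit units. A small sketch follows; the function names `toSurrogatePair` and `toLegacyEscape` are made up for this illustration and do not come from the PR.

```typescript
// Illustrative helpers (hypothetical names): compute the UTF-16 surrogate pair
// for a supplementary-plane code point, as the legacy \uXXXX\uXXXX form requires.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError('Not a supplementary-plane code point: ' + codePoint.toString(16));
  }
  const offset = codePoint - 0x10000; // 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

// Formats the pair as the legacy escape text, e.g. for the table above.
function toLegacyEscape(codePoint: number): string {
  const [high, low] = toSurrogatePair(codePoint);
  return '\\u' + high.toString(16).toUpperCase() + '\\u' + low.toString(16).toUpperCase();
}
```

For the Rocket row, `toLegacyEscape(0x1f680)` produces the text `\uD83D\uDE80`, matching the table; the braced form `\u{1F680}` needs none of this arithmetic.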

Tests run

  • pnpm bazel test //packages/compiler/test:test --test_filter=".*unicode braced escapes.*" --test_output=errors (passed)
  • pnpm bazel test //packages/compiler/test:test --test_output=errors (passed)

Notes

AI disclosure

AI assisted with implementation and drafting under user direction; all changes were reviewed and validated with local tests before opening this PR.

@angular-robot angular-robot bot added the "detected: feature" (PR contains a feature commit) and "area: compiler" (Issues related to `ngc`, Angular's template compiler) labels Feb 25, 2026
@ngbot ngbot bot added this to the Backlog milestone Feb 25, 2026
@kbrilla kbrilla force-pushed the feat/braced-unicode-escapes-upstream branch 2 times, most recently from f18c605 to 1b1fbee Compare February 25, 2026 03:15
@kbrilla kbrilla force-pushed the feat/braced-unicode-escapes-upstream branch from 1b1fbee to 5c4e599 Compare February 25, 2026 03:17
@kbrilla kbrilla force-pushed the feat/braced-unicode-escapes-upstream branch from 5c4e599 to 07e183e Compare February 25, 2026 03:19
@kbrilla kbrilla marked this pull request as ready for review February 25, 2026 03:22
@kbrilla kbrilla changed the title from "WIP: support braced Unicode escapes in template string literals" to "Support braced Unicode escapes in template string literals" Feb 25, 2026
@pullapprove pullapprove bot requested a review from thePunderWoman February 25, 2026 04:09
@kbrilla kbrilla changed the title from "Support braced Unicode escapes in template string literals" to "feat(compiler): Support braced Unicode escapes in template string literals" Feb 25, 2026
const hexStart = this.index;
while (
this.input.charCodeAt(this.index) !== chars.$RBRACE &&
this.input.charCodeAt(this.index) !== chars.$EOF &&

Member commented:

I don't think this condition is helpful. this.input.charCodeAt(this.length) equals NaN

Member commented:

Could probably be written as:

while (this.index < this.length) {
  const ch = this.input.charCodeAt(this.index);
  if (ch === chars.$RBRACE) break;
  this.advance();
}

Contributor Author commented:

Thx for the catch! Applied your version.
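
The point raised in this thread can be checked in isolation. The sketch below assumes a sentinel constant `$EOF = 0` and a free-standing function `scanToClosingBrace`; both are stand-ins for illustration, not the lexer's real members. Reading past the end of a string with `charCodeAt` returns `NaN`, and `NaN !== x` is true for every `x`, so comparing against an EOF constant never stops the loop. Bounding the loop by the input length does.

```typescript
const $RBRACE = 125; // char code of '}'
const $EOF = 0; // assumed sentinel value, for illustration only

// Hypothetical stand-alone version of the suggested loop: advance until a
// closing brace or the end of the input, whichever comes first.
function scanToClosingBrace(
  input: string,
  start: number,
): {hex: string; hasClosingBrace: boolean; end: number} {
  let index = start;
  while (index < input.length) {
    if (input.charCodeAt(index) === $RBRACE) break;
    index++;
  }
  return {
    hex: input.substring(start, index),
    // Past the end, charCodeAt yields NaN, so this is false for unterminated input.
    hasClosingBrace: input.charCodeAt(index) === $RBRACE,
    end: index,
  };
}

// Why the original "!== $EOF" guard did not help: the comparison is always true.
const pastEnd = 'abc'.charCodeAt(3); // NaN
const guardStillTrue = pastEnd !== $EOF; // true, so the loop would not terminate here
```

With this shape, an unterminated escape such as `1F600` (no closing brace) ends the scan at the input length and reports `hasClosingBrace` as false, which is the information the later error message needs.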


const hasClosingBrace = this.input.charCodeAt(this.index) === chars.$RBRACE;
const hex = this.input.substring(hexStart, this.index);
unicodeEscapeForError = `\\u{${hex}}`;

Member commented:

wdyt?
unicodeEscapeForError = hasClosingBrace ? `\\u{${hex}}` : `\\u{${hex}`;

Labels

area: compiler (Issues related to `ngc`, Angular's template compiler), detected: feature (PR contains a feature commit)

Development

Successfully merging this pull request may close these issues.

Support braced Unicode escapes

2 participants