Labels: 3.15, extension-modules, performance, topic-unicode, type-feature
Description
For most ideographs, the Name property value is derived by concatenating a script-specific prefix string with the code point, expressed in uppercase hexadecimal using the usual 4- to 6-digit convention (see rule NR2 in section 4.8.1 of the Unicode 17.0.0 standard).
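As an illustration of rule NR2, here is a minimal Python sketch. The `NR2_RANGES` table and the `nr2_name` function are hypothetical and are not CPython's implementation. The ranges are an illustrative, possibly incomplete subset, and they should be checked against Unicode 17.0.0. Formatting with `"04X"` pads to at least 4 uppercase hex digits and grows to 5 or 6 digits for supplementary-plane code points, which matches the 4- to 6-digit convention.

```python
import bisect

# Hypothetical, partial table of NR2 ranges: (first, last, prefix), sorted by first.
# Illustrative only; the authoritative ranges and prefixes are defined by the
# Unicode Standard and must be checked against the target Unicode version.
NR2_RANGES = [
    (0x4E00, 0x9FFF, "CJK UNIFIED IDEOGRAPH-"),
    (0x17000, 0x187F7, "TANGUT IDEOGRAPH-"),
    (0x1B170, 0x1B2FB, "NUSHU CHARACTER-"),
]
_STARTS = [first for first, _, _ in NR2_RANGES]


def nr2_name(cp):
    """Return the NR2-derived name for cp, or None if cp is in no known range."""
    # Find the last range whose start is <= cp.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        first, last, prefix = NR2_RANGES[i]
        if first <= cp <= last:
            # "04X": uppercase hex, at least 4 digits, longer when needed.
            return prefix + format(cp, "04X")
    return None
```

For example, `nr2_name(0x4E00)` yields `"CJK UNIFIED IDEOGRAPH-4E00"` and `nr2_name(0x1B170)` yields `"NUSHU CHARACTER-1B170"`. A binary search over range starts keeps each lookup logarithmic in the number of ranges.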
As a result, the names of Hangul syllables (derived by rule NR1) and of most Han and Tangut ideographic characters are not listed explicitly in UnicodeData.txt; unicodedata generates them algorithmically (see #80667). However, ideographic characters of scripts other than Han and Tangut, as well as Egyptian hieroglyphs, have their names listed explicitly in UnicodeData.txt, even though those names are also derived by rule NR2. We could reduce the size of the name table by excluding names derived by rule NR2 and generating them with the existing code.
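The following Python sketch shows one possible way to find which explicitly listed names are derivable by NR2, so that a generator could drop them from the stored name table and emit ranges instead. The function `find_nr2_ranges` and its `prefixes` argument are hypothetical and are not part of CPython's `makeunicodedata.py`. The sketch assumes that the input lines are in ascending code point order, as in UnicodeData.txt, and that field 0 holds the code point and field 1 holds the name.

```python
def find_nr2_ranges(lines, prefixes):
    """Split UnicodeData.txt-style lines into names that must still be stored
    and contiguous (first, last, prefix) runs whose names follow rule NR2.

    prefixes: hypothetical iterable of NR2 prefix strings to test against.
    """
    stored = {}  # code point -> explicit name that cannot be derived
    runs = []    # [first, last, prefix] lists, extended while contiguous
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        cp = int(fields[0], 16)
        name = fields[1]
        if name.startswith("<"):
            continue  # <control> and range markers carry no explicit name
        hexcp = format(cp, "04X")
        prefix = next((p for p in prefixes if name == p + hexcp), None)
        if prefix is None:
            stored[cp] = name
        elif runs and runs[-1][2] == prefix and runs[-1][1] == cp - 1:
            runs[-1][1] = cp  # extend the current contiguous run
        else:
            runs.append([cp, cp, prefix])
    return stored, [tuple(r) for r in runs]
```

The sketch records runs instead of individual code points. As a result, the generated table would need only one (first, last, prefix) entry per contiguous block, instead of one stored name per character.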