The functions in this module return a CodePointSetData containing the set of characters with a particular Unicode property.
The descriptions of most properties are taken from TR44, the documentation for the Unicode Character Database. Some properties are instead defined in TR18, the documentation for Unicode regular expressions. In particular, Annex C of this document defines properties for POSIX compatibility.
Structs
- A wrapper around code point set data. It is returned by APIs that return Unicode property data in a set-like form, e.g., a set of code points sharing the same value for a Unicode property. Access its data via the borrowed version, CodePointSetDataBorrowed.
- A borrowed wrapper around code point set data, returned by CodePointSetData::as_borrowed(). More efficient to query.
- A wrapper around UnicodeSet data (characters and strings).
- A borrowed wrapper around UnicodeSet data, returned by UnicodeSetData::as_borrowed(). More efficient to query.
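The owned/borrowed split described above can be pictured with a small sketch that uses only the Rust standard library. `OwnedRangeSet`, `BorrowedRangeSet`, and `ascii_hex_digits` are hypothetical stand-ins, not types or functions from this module, and the range-based storage is an assumption made purely for illustration. Only the `as_borrowed()` name mirrors the method mentioned above.

```rust
use std::cmp::Ordering;

/// Hypothetical owned set: stores sorted, non-overlapping inclusive ranges.
struct OwnedRangeSet {
    ranges: Vec<(u32, u32)>,
}

/// Hypothetical borrowed view: a cheap, copyable handle used for queries.
#[derive(Clone, Copy)]
struct BorrowedRangeSet<'a> {
    ranges: &'a [(u32, u32)],
}

impl OwnedRangeSet {
    /// Named after the `as_borrowed()` method in the text; the body is an assumption.
    fn as_borrowed(&self) -> BorrowedRangeSet<'_> {
        BorrowedRangeSet { ranges: &self.ranges }
    }
}

impl<'a> BorrowedRangeSet<'a> {
    /// Membership test by binary search over the sorted ranges.
    fn contains(self, c: char) -> bool {
        let cp = c as u32;
        self.ranges
            .binary_search_by(|&(lo, hi)| {
                if cp < lo {
                    Ordering::Greater // this range lies above the code point
                } else if cp > hi {
                    Ordering::Less // this range lies below the code point
                } else {
                    Ordering::Equal // the code point falls inside this range
                }
            })
            .is_ok()
    }
}

/// Illustrative data only: the ASCII hexadecimal digits 0-9, A-F, a-f.
fn ascii_hex_digits() -> OwnedRangeSet {
    OwnedRangeSet {
        ranges: vec![(0x30, 0x39), (0x41, 0x46), (0x61, 0x66)],
    }
}
```

In this sketch the owned value is built once, and the borrowed view is what gets passed around and queried, for example `ascii_hex_digits().as_borrowed().contains('f')`.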
Functions
- Characters with the Alphabetic or Decimal_Number property. This is defined for POSIX compatibility.
- Alphabetic characters
- ASCII characters commonly used for the representation of hexadecimal numbers
- Characters and character sequences intended for general-purpose, independent, direct input. See Unicode Technical Standard #51 for more details.
- Format control characters which have specific functions in the Unicode Bidirectional Algorithm
- Characters that are mirrored in bidirectional text
- Horizontal whitespace characters
- Characters which are ignored for casing purposes
- Characters that are either the source of a case mapping or in the target of a case mapping
- Uppercase, lowercase, and titlecase characters
- Characters whose normalized forms are not stable under case folding
- Characters which may change when they undergo case mapping
- Characters whose normalized forms are not stable under a toLowercase mapping
- Characters which are not identical to their NFKC_Casefold mapping
- Characters whose normalized forms are not stable under a toTitlecase mapping
- Characters whose normalized forms are not stable under a toUppercase mapping
- Punctuation characters explicitly called out as dashes in the Unicode Standard, plus their compatibility equivalents
- For programmatic determination of default ignorable code points. New characters that should be ignored in rendering (unless explicitly supported) will be assigned in these ranges, permitting programs to correctly handle the default rendering of such characters when not otherwise supported.
- Deprecated characters. No characters will ever be removed from the standard, but the usage of deprecated characters is strongly discouraged.
- Characters that linguistically modify the meaning of another character to which they apply
- Characters that are emoji
- Characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as base characters for emoji keycaps
- Characters that are emoji modifiers
- Characters that can serve as a base for emoji modifiers
- Characters that have emoji presentation by default
- Pictographic symbols, as well as reserved ranges in blocks largely associated with emoji characters
- Characters whose principal function is to extend the value of a preceding alphabetic character or to extend the shape of adjacent characters.
- Returns a CodePointSetData for a value or a grouping of values of the General_Category property. See GeneralCategoryGroup.
- Characters that are excluded from composition. See https://unicode.org/Public/UNIDATA/CompositionExclusions.txt
- Visible characters. This is defined for POSIX compatibility.
- Property used together with the definition of Standard Korean Syllable Block to define “Grapheme base”. See D58 in Chapter 3, Conformance in the Unicode Standard.
- Property used to define “Grapheme extender”. See D59 in Chapter 3, Conformance in the Unicode Standard.
- Deprecated property. Formerly proposed for programmatic determination of grapheme cluster boundaries.
- Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents
- Deprecated property. Dashes which are used to mark connections between pieces of words, plus the Katakana middle dot.
- Characters that can come after the first character in an identifier. If using NFKC to fold differences between characters, use load_xid_continue instead. See Unicode Standard Annex #31 for more details.
- Characters that can begin an identifier. If using NFKC to fold differences between characters, use load_xid_start instead. See Unicode Standard Annex #31 for more details.
- Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs, or related siniform ideographs
- Characters used in Ideographic Description Sequences
- Characters used in Ideographic Description Sequences
- Format control characters which have specific functions for control of cursive joining and ligation
- A version of alnum() that uses custom data provided by a DataProvider.
- A version of alphabetic() that uses custom data provided by a DataProvider.
- A version of ascii_hex_digit() that uses custom data provided by a DataProvider.
- A version of basic_emoji() that uses custom data provided by a DataProvider.
- A version of bidi_control() that uses custom data provided by a DataProvider.
- A version of bidi_mirrored() that uses custom data provided by a DataProvider.
- A version of blank() that uses custom data provided by a DataProvider.
- A version of case_ignorable() that uses custom data provided by a DataProvider.
- A version of case_sensitive() that uses custom data provided by a DataProvider.
- A version of cased() that uses custom data provided by a DataProvider.
- A version of changes_when_casefolded() that uses custom data provided by a DataProvider.
- A version of changes_when_casemapped() that uses custom data provided by a DataProvider.
- A version of changes_when_lowercased() that uses custom data provided by a DataProvider.
- A version of changes_when_nfkc_casefolded() that uses custom data provided by a DataProvider.
- A version of changes_when_titlecased() that uses custom data provided by a DataProvider.
- A version of changes_when_uppercased() that uses custom data provided by a DataProvider.
- A version of dash() that uses custom data provided by a DataProvider.
- A version of default_ignorable_code_point() that uses custom data provided by a DataProvider.
- A version of deprecated() that uses custom data provided by a DataProvider.
- A version of diacritic() that uses custom data provided by a DataProvider.
- A version of emoji() that uses custom data provided by a DataProvider.
- A version of emoji_component() that uses custom data provided by a DataProvider.
- A version of emoji_modifier() that uses custom data provided by a DataProvider.
- A version of emoji_modifier_base() that uses custom data provided by a DataProvider.
- A version of emoji_presentation() that uses custom data provided by a DataProvider.
- A version of extended_pictographic() that uses custom data provided by a DataProvider.
- A version of extender() that uses custom data provided by a DataProvider.
- Returns a type capable of looking up values for a property specified as a string, as long as it is a binary property listed in ECMA-262, using strict matching on the names in the spec.
- A version of load_for_ecma262 that uses custom data provided by a DataProvider.
- A version of load_for_ecma262 that uses custom data provided by an AnyProvider.
- A version of for_general_category_group() that uses custom data provided by a DataProvider.
- A version of full_composition_exclusion() that uses custom data provided by a DataProvider.
- A version of graph() that uses custom data provided by a DataProvider.
- A version of grapheme_base() that uses custom data provided by a DataProvider.
- A version of grapheme_extend() that uses custom data provided by a DataProvider.
- A version of grapheme_link() that uses custom data provided by a DataProvider.
- A version of hex_digit() that uses custom data provided by a DataProvider.
- A version of hyphen() that uses custom data provided by a DataProvider.
- A version of id_continue() that uses custom data provided by a DataProvider.
- A version of id_start() that uses custom data provided by a DataProvider.
- A version of ideographic() that uses custom data provided by a DataProvider.
- A version of ids_binary_operator() that uses custom data provided by a DataProvider.
- A version of ids_trinary_operator() that uses custom data provided by a DataProvider.
- A version of join_control() that uses custom data provided by a DataProvider.
- A version of logical_order_exception() that uses custom data provided by a DataProvider.
- A version of lowercase() that uses custom data provided by a DataProvider.
- A version of math() that uses custom data provided by a DataProvider.
- A version of nfc_inert() that uses custom data provided by a DataProvider.
- A version of nfd_inert() that uses custom data provided by a DataProvider.
- A version of nfkc_inert() that uses custom data provided by a DataProvider.
- A version of nfkd_inert() that uses custom data provided by a DataProvider.
- A version of noncharacter_code_point() that uses custom data provided by a DataProvider.
- A version of pattern_syntax() that uses custom data provided by a DataProvider.
- A version of pattern_white_space() that uses custom data provided by a DataProvider.
- A version of prepended_concatenation_mark() that uses custom data provided by a DataProvider.
- A version of print() that uses custom data provided by a DataProvider.
- A version of quotation_mark() that uses custom data provided by a DataProvider.
- A version of radical() that uses custom data provided by a DataProvider.
- A version of regional_indicator() that uses custom data provided by a DataProvider.
- A version of segment_starter() that uses custom data provided by a DataProvider.
- A version of sentence_terminal() that uses custom data provided by a DataProvider.
- A version of soft_dotted() that uses custom data provided by a DataProvider.
- A version of terminal_punctuation() that uses custom data provided by a DataProvider.
- A version of unified_ideograph() that uses custom data provided by a DataProvider.
- A version of uppercase() that uses custom data provided by a DataProvider.
- A version of variation_selector() that uses custom data provided by a DataProvider.
- A version of white_space() that uses custom data provided by a DataProvider.
- A version of xdigit() that uses custom data provided by a DataProvider.
- A version of xid_continue() that uses custom data provided by a DataProvider.
- A version of xid_start() that uses custom data provided by a DataProvider.
- A small number of spacing vowel letters occurring in certain Southeast Asian scripts such as Thai and Lao
- Lowercase characters
- Characters used in mathematical notation
- Characters that are inert under NFC, i.e., they do not interact with adjacent characters
- Characters that are inert under NFD, i.e., they do not interact with adjacent characters
- Characters that are inert under NFKC, i.e., they do not interact with adjacent characters
- Characters that are inert under NFKD, i.e., they do not interact with adjacent characters
- Code points permanently reserved for internal use
- Characters used as syntax in patterns (such as regular expressions). See Unicode Standard Annex #31 for more details.
- Characters used as whitespace in patterns (such as regular expressions). See Unicode Standard Annex #31 for more details.
- A small class of visible format controls, which precede and then span a sequence of other characters, usually digits.
- Printable characters (visible characters and whitespace). This is defined for POSIX compatibility.
- Punctuation characters that function as quotation marks.
- Characters used in the definition of Ideographic Description Sequences
- Regional indicator characters, U+1F1E6..U+1F1FF
- Characters that are starters in terms of Unicode normalization and combining character sequences
- Punctuation characters that generally mark the end of sentences
- Characters with a “soft dot”, like i or j. An accent placed on these characters causes the dot to disappear.
- Punctuation characters that generally mark the end of textual units
- A property which specifies the exact set of Unified CJK Ideographs in the standard
- Uppercase characters
- Characters that are Variation Selectors.
- Spaces, separator characters and other control characters which should be treated by programming languages as “white space” for the purpose of parsing elements
- Hexadecimal digits. This is defined for POSIX compatibility.
- Characters that can come after the first character in an identifier. See Unicode Standard Annex #31 for more details.
- Characters that can begin an identifier. See Unicode Standard Annex #31 for more details.
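A few of the properties listed above reduce to small, fixed ranges, so they can be illustrated without any property data. The sketch below uses only the Rust standard library and is not this module's API. The helper names are hypothetical, and the fullwidth ranges used for the "compatibility equivalents" of the hexadecimal digits are taken from the Unicode Character Database rather than from the descriptions above.

```rust
/// ASCII_Hex_Digit: the ASCII characters 0-9, A-F, a-f.
fn is_ascii_hex_digit(c: char) -> bool {
    c.is_ascii_hexdigit()
}

/// Hex_Digit: ASCII_Hex_Digit plus the fullwidth compatibility forms
/// (fullwidth 0-9, A-F, a-f). Ranges assumed from the Unicode Character Database.
fn is_hex_digit(c: char) -> bool {
    is_ascii_hex_digit(c)
        || matches!(
            c,
            '\u{FF10}'..='\u{FF19}' | '\u{FF21}'..='\u{FF26}' | '\u{FF41}'..='\u{FF46}'
        )
}

/// Regional_Indicator: U+1F1E6..U+1F1FF, as stated in the list above.
fn is_regional_indicator(c: char) -> bool {
    matches!(c, '\u{1F1E6}'..='\u{1F1FF}')
}
```

Hand-written predicates like these only work for properties whose membership is small and fixed. Most properties in the list depend on full Unicode data, which is why the functions here return set objects backed by that data.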