6.4.2.1.1. General Category

The values in this field are abbreviations for the categories listed below. Some of the values are normative, and some are informative. For more information, see the Unicode Standard. Note: the standard does not assign information to control characters (except for TAB in the Bidirectional Algorithm). Implementations generally also assign categories to certain control characters, notably CR and LF, according to platform conventions.

General Category, Normative.

Code  Meaning
Mn    Mark, Non-Spacing
Mc    Mark, Spacing Combining
Me    Mark, Enclosing
Nd    Number, Decimal Digit
Nl    Number, Letter
No    Number, Other
Zs    Separator, Space
Zl    Separator, Line
Zp    Separator, Paragraph
Cc    Other, Control
Cf    Other, Format
Cs    Other, Surrogate
Co    Other, Private Use
Cn    Other, Not Assigned

General Category, Informative.

Code  Meaning
Lu    Letter, Uppercase
Ll    Letter, Lowercase
Lt    Letter, Titlecase
Lm    Letter, Modifier
Lo    Letter, Other
Pc    Punctuation, Connector
Pd    Punctuation, Dash
Ps    Punctuation, Open
Pe    Punctuation, Close
Pi    Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
Pf    Punctuation, Final quote (may behave like Ps or Pe depending on usage)
Po    Punctuation, Other
Sm    Symbol, Math
Sc    Symbol, Currency
Sk    Symbol, Modifier
So    Symbol, Other
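
As an illustration of how these two-letter codes are typically consumed, the following is a minimal sketch using Python's standard unicodedata module, whose category() function returns the General Category abbreviation for a character. The names MAJOR_CLASSES and describe are hypothetical helpers introduced only for this example, and the mapping simply reflects the first letter of each code in the lists above. Results depend on the version of the Unicode database bundled with the Python interpreter.

```python
import unicodedata

# Hypothetical helper table: first letter of a General Category code
# mapped to the major class named in the lists above.
MAJOR_CLASSES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def describe(ch):
    """Return (two-letter code, major class) for a single character."""
    code = unicodedata.category(ch)
    return code, MAJOR_CLASSES[code[0]]


if __name__ == "__main__":
    # A few sample characters: letters, a digit, a space, a combining
    # mark (U+0301), and some punctuation and symbols.
    for ch in ["A", "a", "5", " ", "\u0301", "$", "+", "(", "_"]:
        code, major = describe(ch)
        print(f"U+{ord(ch):04X}  {code}  {major}")
```

Splitting on the first letter is a convenient way to test for a whole class (for example, any Mark) without enumerating every subcategory.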