File defining charset: format description

The data is represented in tab-delimited format. The columns are:

char - the character itself, or its code in decimal or hexadecimal (0xHH) notation, in the charset that this file defines.

white-space, digit, hex-digit, letter, word - five flag columns, one per character class. An empty field means the character does not belong to that class; a non-empty field (e.g. 'x') means it does.

For more detail on character classes, see any reference on regular expressions.

lowercase - if the character has a lowercase counterpart, this field contains it (as either a character or a code). For example, 'W' pairs with 'w'. This field is used for case-insensitive matching in regular expressions and by the upper and lower methods of the string class.

unicode1 - the character's primary Unicode value. If it is the same as the character code, this field may be left empty.

unicode2 - the character's additional Unicode value, if one exists.
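
The column layout above can be illustrated with a small reader for a single line. This is only a sketch in Python, not the actual loader. It assumes the nine columns appear in the order listed, that trailing empty fields may be omitted, that a one-character char or lowercase field is a literal character while anything longer is a code, and that the Unicode fields are always numeric. The names parse_number, parse_char, parse_line and CharEntry are hypothetical, and the sample lines in the usage note are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Flag columns, in the order the description lists them.
FLAG_NAMES = ("white-space", "digit", "hex-digit", "letter", "word")


def parse_number(field: str) -> int:
    """Parse a decimal or 0xHH hexadecimal code."""
    if field.lower().startswith("0x"):
        return int(field, 16)
    return int(field, 10)


def parse_char(field: str) -> int:
    """Parse a field that holds either a literal character or a code.

    Assumption: a one-character field is the literal character.
    """
    return ord(field) if len(field) == 1 else parse_number(field)


@dataclass
class CharEntry:
    code: int                 # character code in the described charset
    classes: frozenset        # names of the classes the character belongs to
    lowercase: Optional[int]  # code of the lowercase counterpart, if any
    unicode1: int             # primary Unicode value
    unicode2: Optional[int]   # additional Unicode value, if any


def parse_line(line: str) -> CharEntry:
    fields = line.rstrip("\r\n").split("\t")
    # Assumption: trailing empty fields may be omitted, so pad to nine columns.
    fields += [""] * (9 - len(fields))
    code = parse_char(fields[0])
    flags = fields[1:6]
    lower, uni1, uni2 = fields[6:9]
    return CharEntry(
        code=code,
        classes=frozenset(n for n, f in zip(FLAG_NAMES, flags) if f),
        lowercase=parse_char(lower) if lower else None,
        # An empty unicode1 means the Unicode value equals the character code.
        unicode1=parse_number(uni1) if uni1 else code,
        unicode2=parse_number(uni2) if uni2 else None,
    )
```

For example, parse_line("W\t\t\t\tx\tx\tw") would describe 'W' as a letter and word character whose lowercase counterpart is 'w' and whose Unicode value equals its code, while parse_line("0xC0\t\t\t\tx\tx\t0xE0\t0x0410") would describe a character with code 0xC0 that maps to Unicode 0x0410.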

Last updated: 01.04.2004