The standard units data file is in Unicode, using UTF-8 encoding. Most definitions use only ASCII characters (i.e., code points U+0000 through U+007F); definitions using non-ASCII characters appear in blocks beginning with ‘!utf8’ and ending with ‘!endutf8’.
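Because non-ASCII definitions are fenced off by these two directives, a loader only needs a simple line filter to keep or discard them. The Python sketch below is illustrative only: units itself is written in C, and `filter_utf8_blocks`, its `unicode_active` flag, and the sample definitions are hypothetical. It drops everything between ‘!utf8’ and ‘!endutf8’ when Unicode support is off, and passes those definitions through otherwise.

```python
from typing import Iterable, Iterator


def filter_utf8_blocks(lines: Iterable[str], unicode_active: bool) -> Iterator[str]:
    """Yield data-file lines, skipping !utf8 ... !endutf8 blocks when
    Unicode support is off. Hypothetical helper, not part of units."""
    in_block = False
    for line in lines:
        # Directives are recognized only when '!' is the first character.
        directive = line.split(maxsplit=1)[0] if line.startswith("!") else None
        if directive == "!utf8":
            in_block = True
            continue
        if directive == "!endutf8":
            in_block = False
            continue
        if in_block and not unicode_active:
            continue  # non-ASCII definition, ignored without Unicode support
        yield line


# Illustrative definitions, not copied from the standard data file.
sample = [
    "angstrom  1e-10 m",
    "!utf8",
    "Å  angstrom",
    "!endutf8",
    "fermi  1e-15 m",
]
print(list(filter_utf8_blocks(sample, unicode_active=False)))
print(list(filter_utf8_blocks(sample, unicode_active=True)))
```

The directive lines themselves are consumed in both modes; only the lines between them depend on the flag.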
When units starts, it checks the locale to determine the character set. If units is compiled with Unicode support and the character set is UTF-8, units reads the UTF-8 definitions; otherwise these definitions are ignored. When Unicode support is active, units checks every line of all of the units data files for invalid or non-printing UTF-8 sequences; if such sequences occur, units ignores the entire line. In addition to checking validity, units determines the display width of non-ASCII characters to ensure proper positioning of the pointer in some error messages and to align columns for the ‘search’ and ‘?’ commands.
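The two checks described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not units' actual C code: `decode_line` and `display_width` are hypothetical names, "non-printing" is approximated with `str.isprintable()` (with tabs allowed), and the width rule (combining marks take zero columns, East Asian wide and fullwidth characters take two, everything else one) is a simplification of what the POSIX `wcwidth()` family reports.

```python
import unicodedata
from typing import Optional


def decode_line(raw: bytes) -> Optional[str]:
    """Return the decoded line, or None if it should be ignored.

    Assumption: 'non-printing' is approximated by str.isprintable(),
    treating tab as acceptable whitespace."""
    try:
        text = raw.decode("utf-8")  # strict mode: invalid sequences raise
    except UnicodeDecodeError:
        return None
    text = text.rstrip("\r\n")
    if not all(ch.isprintable() or ch == "\t" for ch in text):
        return None
    return text


def display_width(text: str) -> int:
    """Approximate the number of terminal columns 'text' occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the preceding character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two columns
        else:
            width += 1
    return width
```

Counting columns rather than characters or bytes matters for alignment: "A" followed by U+030A COMBINING RING ABOVE is two code points and three UTF-8 bytes, but it occupies a single column on screen.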
At present, units does not support Unicode under Microsoft Windows. The UTF-16 and UTF-32 encodings are not supported on any systems.
If definitions that contain non-ASCII characters are added to a units data file, those definitions should be enclosed within ‘!utf8’ … ‘!endutf8’ to ensure that they are only loaded when Unicode support is available. As usual, the ‘!’ must appear as the first character on the line. As discussed in Units Data Files, it’s usually best to put such definitions in supplemental data files linked by an ‘!include’ command or in a personal units data file.
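If you generate a supplemental or personal data file with a script, the grouping can be automated. The hypothetical helper below (a Python sketch, not a units tool) moves every definition containing a character above U+007F into a single ‘!utf8’ … ‘!endutf8’ block and leaves ASCII-only definitions outside it; the sample definitions are illustrative.

```python
from typing import List


def wrap_non_ascii(definitions: List[str]) -> List[str]:
    """Group non-ASCII definitions inside one !utf8 ... !endutf8 block.

    Hypothetical helper; each group keeps its original relative order."""
    ascii_defs = [d for d in definitions if d.isascii()]
    utf8_defs = [d for d in definitions if not d.isascii()]
    lines = list(ascii_defs)
    if utf8_defs:
        lines.append("!utf8")  # '!' must be the first character on the line
        lines.extend(utf8_defs)
        lines.append("!endutf8")
    return lines


for line in wrap_non_ascii(["smoot  67 inch", "Å  angstrom"]):
    print(line)
```

Writing the result to a file and loading it with ‘!include’ keeps the standard data file untouched, as recommended above.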
When Unicode support is not active, units makes no assumptions about character encoding, except that characters in the range 00–7F hexadecimal correspond to ASCII encoding. Non-ASCII characters are simply sequences of bytes and have no special meaning; for definitions in supplemental units data files, you can use any encoding consistent with this assumption. For example, if you wish to use non-ASCII characters in definitions when running units under Windows, you can use a character set such as Windows “ANSI” (code page 1252 in the US and Western Europe). You can even use UTF-8, though some messages may be improperly aligned, and units will not detect invalid UTF-8 sequences. If you use UTF-8 encoding when Unicode support is not active, you should place any definitions with non-ASCII characters outside ‘!utf8’ … ‘!endutf8’ blocks; otherwise, they will be ignored.
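The difference between these encodings is easy to see by encoding the same text in each. In the Python sketch below (`encode_report` is a hypothetical name), the ASCII characters produce identical bytes in every encoding, which is all units assumes when Unicode support is off, while ‘µ’ (U+00B5) is one byte in code page 1252 but two bytes in UTF-8.

```python
from typing import Dict, Optional, Sequence


def encode_report(
    text: str, encodings: Sequence[str] = ("ascii", "cp1252", "utf-8")
) -> Dict[str, Optional[bytes]]:
    """Map each encoding name to the bytes of 'text', or None if the
    text cannot be represented in that encoding."""
    report: Dict[str, Optional[bytes]] = {}
    for enc in encodings:
        try:
            report[enc] = text.encode(enc)
        except UnicodeEncodeError:
            report[enc] = None  # e.g. 'µ' has no ASCII representation
    return report


for enc, data in encode_report("5 µm").items():
    print(enc, data.hex(" ") if data is not None else "not representable")
```

One plausible source of the misalignment mentioned above: a program that treats each byte as one column counts the two-byte UTF-8 ‘µ’ as two columns, although it occupies only one on screen.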
Typeset material other than code examples usually uses the Unicode minus (U+2212) rather than the ASCII hyphen-minus operator (U+002D) used in units; the figure dash (U+2012) and en dash (U+2013) are also occasionally used. To allow such material to be copied and pasted for interactive use or in units data files, units converts these characters to U+002D before further processing. Because of this, none of these characters can appear in unit names.
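The conversion amounts to a character-for-character substitution. The Python sketch below mirrors the three code points listed above; the names `MINUS_LIKE` and `normalize_minus` are hypothetical, and the sketch is not the code units actually uses.

```python
# Code points that are replaced by ASCII hyphen-minus (U+002D).
MINUS_LIKE = {
    0x2212: "-",  # MINUS SIGN
    0x2012: "-",  # FIGURE DASH
    0x2013: "-",  # EN DASH
}


def normalize_minus(text: str) -> str:
    """Replace minus-like characters with U+002D before further parsing."""
    return text.translate(MINUS_LIKE)


# An expression pasted from typeset text, using U+2212 as the operator.
print(normalize_minus("6 ft \u2212 2 in"))
```

Since the substitution happens before unit names are parsed, a name containing any of these characters would be split at the resulting ‘-’, which is why such names are not possible.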