Unicode - MIT/GNU Scheme 7.7.90+


5.7 Unicode

MIT/GNU Scheme provides rudimentary support for Unicode characters. In an ideal world, Unicode would be the base character set for MIT/GNU Scheme. But MIT/GNU Scheme predates the invention of Unicode, and converting an application of this size is a considerable undertaking. So for the time being, the base character set for I/O and strings is ISO-8859-1, and Unicode support is grafted on.

This Unicode support was implemented as part of the XML parser (see XML Support). XML uses Unicode as its base character set, so any XML implementation must support Unicode.

The basic unit in a Unicode implementation is the code point, an integer that identifies a character. The character object corresponding to a code point is called a wide character.

— procedure: unicode-code-point? object

Returns #t if object is a Unicode code point. Code points are implemented as exact non-negative integers. The Unicode standard limits them to values strictly less than #x110000; this predicate additionally excludes the surrogate range #xD800 through #xDFFF and the noncharacters #xFFFE and #xFFFF.
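To make the range test above concrete, here is a minimal sketch of it in portable (R7RS-style) Scheme. The name code-point-sketch? is hypothetical: it only restates the conditions listed for unicode-code-point? and is not MIT/GNU Scheme's own implementation.

```scheme
;; Hypothetical restatement of the conditions described for
;; unicode-code-point?; not MIT/GNU Scheme's actual code.
(define (code-point-sketch? object)
  (and (integer? object)                  ; rejects non-numbers and fractions
       (exact? object)                    ; rejects inexact values such as 65.0
       (<= 0 object)                      ; non-negative
       (< object #x110000)                ; Unicode upper bound
       (not (<= #xD800 object #xDFFF))    ; surrogate range excluded
       (not (= object #xFFFE))            ; noncharacter excluded
       (not (= object #xFFFF))))          ; noncharacter excluded
```

With this sketch, (code-point-sketch? #x41) returns #t, while #xD800, #xFFFE, #x110000, -1, the inexact 65.0 and the symbol a all return #f.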

— procedure: wide-char? object

Returns #t if object is a wide character, that is, a character that has no bucky bits and whose code satisfies unicode-code-point?.
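Portable Scheme has no notion of bucky bits, so the following hypothetical wide-char-sketch? models only the second half of the definition: the object must be a character whose code passes the code-point range test. It assumes char->integer returns the Unicode scalar value, as R7RS specifies for Unicode characters; it is an illustration, not MIT/GNU Scheme's definition.

```scheme
;; Hypothetical sketch: bucky bits are not modelled, because portable
;; Scheme characters do not carry them.
(define (wide-char-sketch? object)
  (and (char? object)
       (let ((code (char->integer object)))   ; code point of the character
         (and (< code #x110000)
              (not (<= #xD800 code #xDFFF))   ; surrogates excluded
              (not (= code #xFFFE))           ; noncharacters excluded
              (not (= code #xFFFF))))))
```

Here (wide-char-sketch? #\a) returns #t, while the integer 97 and the string "a" return #f, because they are not characters.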

The Unicode implementation consists of three parts:

An implementation of wide strings, which are character strings that support the full Unicode character set with constant-time access to individual characters.
I/O procedures that read and write Unicode characters in several external representations, specifically UTF-8, UTF-16, and UTF-32.
An alphabet abstraction, which is an efficient implementation of sets of Unicode code points (similar to the char-set abstraction).
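UTF-8, one of the external representations listed above, encodes each code point as one to four bytes. As a hedged illustration of what such an encoder does, and not of MIT/GNU Scheme's own I/O procedures (whose names are not given here), the following portable Scheme sketch maps a single code point to a list of UTF-8 byte values. The name code-point->utf8-bytes is hypothetical, and the argument is assumed to already satisfy the code-point range test.

```scheme
;; Hypothetical helper: encode one code point as a list of UTF-8 byte values.
;; Uses quotient/modulo instead of bit operations to stay portable.
(define (code-point->utf8-bytes cp)
  ;; Continuation byte 10xxxxxx carrying the low 6 bits of n.
  (define (cont n) (+ #x80 (modulo n 64)))
  (cond ((< cp #x80)                          ; 1 byte:  0xxxxxxx
         (list cp))
        ((< cp #x800)                         ; 2 bytes: 110xxxxx 10xxxxxx
         (list (+ #xC0 (quotient cp 64))
               (cont cp)))
        ((< cp #x10000)                       ; 3 bytes: 1110xxxx + 2 continuations
         (list (+ #xE0 (quotient cp 4096))
               (cont (quotient cp 64))
               (cont cp)))
        (else                                 ; 4 bytes: 11110xxx + 3 continuations
         (list (+ #xF0 (quotient cp 262144))
               (cont (quotient cp 4096))
               (cont (quotient cp 64))
               (cont cp)))))
```

For example, #x41 encodes as (#x41), #xE9 as (#xC3 #xA9), #x20AC as (#xE2 #x82 #xAC), and #x1F600 as (#xF0 #x9F #x98 #x80).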