

33.3 Selecting a Representation

Sometimes it is useful to examine an existing buffer or string as multibyte when it was unibyte, or vice versa.

— Function: set-buffer-multibyte multibyte

Set the representation type of the current buffer. If multibyte is non-nil, the buffer becomes multibyte. If multibyte is nil, the buffer becomes unibyte.

This function leaves the buffer contents unchanged when viewed as a sequence of bytes. As a consequence, it can change the contents viewed as characters: a sequence of two bytes that is treated as one character in multibyte representation counts as two characters in unibyte representation. Character codes 128 through 159 are an exception: each is represented by one byte in a unibyte buffer, but is converted to a two-byte sequence when the buffer is set to multibyte, and back to a single byte when the buffer is set to unibyte.

This function sets enable-multibyte-characters to record which representation is in use. It also adjusts various data in the buffer (including overlays, text properties and markers) so that they cover the same text as they did before.

You cannot use set-buffer-multibyte on an indirect buffer, because indirect buffers always inherit the representation of the base buffer.
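The effect on character counts can be seen with a small experiment. The following is a minimal sketch, not part of the original text: `my-representation-demo` is a hypothetical name, and the expected result assumes an Emacs whose internal representation stores e-acute (U+00E9) in two bytes.

```elisp
;; `my-representation-demo' is a hypothetical helper used only for this sketch.
(defun my-representation-demo ()
  "Return (CHARS-AS-MULTIBYTE CHARS-AS-UNIBYTE FLAG) for a one-character buffer."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert "\u00e9")                   ; e-acute: one character, two bytes
    (let ((chars-as-multibyte (buffer-size)))
      (set-buffer-multibyte nil)        ; same bytes, now one character per byte
      (list chars-as-multibyte
            (buffer-size)
            enable-multibyte-characters))))

(my-representation-demo)
;; => (1 2 nil) under the assumption stated above
```

The bytes in the buffer are the same before and after the call; only the way they are grouped into characters, and the value of enable-multibyte-characters, changes.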

— Function: string-as-unibyte string

This function returns a string with the same bytes as string, but with each byte treated as a separate character. The value may therefore have more characters than string has.

If string is already a unibyte string, then the value is string itself. Otherwise it is a newly created string, with no text properties. If string is multibyte, any characters it contains from the charsets eight-bit-control or eight-bit-graphic are each converted to the corresponding single byte.
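As an illustration, the sketch below compares the length of a string with the length of its unibyte view. `my-unibyte-view` is a hypothetical name introduced here, and the first result assumes that e-acute (U+00E9) occupies two bytes internally. Recent Emacs releases mark string-as-unibyte as obsolete, so the byte compiler may warn about it.

```elisp
;; `my-unibyte-view' is a hypothetical helper used only for this sketch.
(defun my-unibyte-view (string)
  "Return (CHARS-IN-STRING CHARS-IN-UNIBYTE-VIEW MULTIBYTE-P) for STRING."
  (let ((bytes (string-as-unibyte string)))
    (list (length string)
          (length bytes)
          (multibyte-string-p bytes))))

(my-unibyte-view "\u00e9")
;; => (1 2 nil) under the assumption stated above

;; An ASCII-only literal is already unibyte, so the same object comes back.
(let ((s "abc"))
  (eq s (string-as-unibyte s)))
;; => t
```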

— Function: string-as-multibyte string

This function returns a string with the same bytes as string, but with each multibyte sequence treated as one character. The value may therefore have fewer characters than string has.

If string is already a multibyte string, then the value is string itself. Otherwise it is a newly created string, with no text properties. If string is unibyte and contains any individual 8-bit bytes (i.e., bytes that are not part of a multibyte form), each is converted to the corresponding multibyte character of charset eight-bit-control or eight-bit-graphic.
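A round trip shows the reverse direction. This is only a sketch: `my-multibyte-view` is a hypothetical name, and the input bytes are produced with string-as-unibyte so that they form a valid multibyte sequence whatever the internal encoding is. The result shown assumes that e-acute (U+00E9) occupies two bytes internally.

```elisp
;; `my-multibyte-view' is a hypothetical helper used only for this sketch.
(defun my-multibyte-view (bytes)
  "Return (LENGTH-OF-BYTES LENGTH-AS-MULTIBYTE MULTIBYTE-P) for unibyte BYTES."
  (let ((chars (string-as-multibyte bytes)))
    (list (length bytes)
          (length chars)
          (multibyte-string-p chars))))

;; Take the raw bytes of e-acute, then view them as multibyte text again.
(my-multibyte-view (string-as-unibyte "\u00e9"))
;; => (2 1 t) under the assumption stated above
```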