8 PR29 Functions

A deficiency in the specification of Unicode Normalization Forms has been found. The consequence is that some strings can be normalized into different strings by different implementations. In other words, two different implementations may return different output for the same input (because the specification is ambiguous and can be interpreted in more than one way). Further, an implementation invoked again on one of the output strings may return a different string (because one of the interpretations of the ambiguous specification makes normalization non-idempotent). Fortunately, only a select few character sequences exhibit this problem, and none of them are expected to occur in natural languages (due to different linguistic uses of the involved characters).

A full discussion of the problem may be found at:

http://www.unicode.org/review/pr-29.html

The PR29 functions below allow you to detect the problem sequences. So when would you want to use these functions? For most applications, such as those using Nameprep for IDN, this is likely to be only an interoperability problem. Thus, you may not need to care about it, as the character sequences will rarely occur naturally.

However, if you are using a profile such as SASLPrep to process authentication tokens, authorization tokens, or passwords, there is a real danger that attackers may try to use the peculiarities of these strings to attack parts of your system. As only a small number of strings, and no naturally occurring strings, exhibit this problem, the conservative approach of rejecting the strings is recommended. If this approach is not used, you should instead verify that all parts of your system that process the tokens and passwords use an NFKC implementation that produces the same output for the same input.

Technically inclined readers may be interested in knowing more about the implementation aspects of the PR29 flaw. See PR29 discussion.

8.1 Header file pr29.h

To use the functions explained in this chapter, you need to include the file pr29.h using:

     #include <pr29.h>

8.2 Core Functions

pr29_4

— Function: int pr29_4 (const uint32_t * in, size_t len)

in: input array with Unicode code points.

len: length of input array with Unicode code points.

Check the input to see if it may be normalized into different strings by different NFKC implementations, due to an anomaly in the NFKC specifications.

Return value: Returns the Pr29_rc value PR29_SUCCESS on success, and PR29_PROBLEM if the input sequence is a "problem sequence" (i.e., may be normalized into different strings by different implementations).

8.3 Utility Functions

pr29_4z

— Function: int pr29_4z (const uint32_t * in)

in: zero terminated array of Unicode code points.

Check the input to see if it may be normalized into different strings by different NFKC implementations, due to an anomaly in the NFKC specifications.

Return value: Returns the Pr29_rc value PR29_SUCCESS on success, and PR29_PROBLEM if the input sequence is a "problem sequence" (i.e., may be normalized into different strings by different implementations).

pr29_8z

— Function: int pr29_8z (const char * in)

in: zero terminated input UTF-8 string.

Check the input to see if it may be normalized into different strings by different NFKC implementations, due to an anomaly in the NFKC specifications.

Return value: Returns the Pr29_rc value PR29_SUCCESS on success, and PR29_PROBLEM if the input sequence is a "problem sequence" (i.e., may be normalized into different strings by different implementations), or PR29_STRINGPREP_ERROR if there was a problem converting the string from UTF-8 to UCS-4.

8.4 Error Handling

pr29_strerror

— Function: const char * pr29_strerror (Pr29_rc rc)

rc: a Pr29_rc return code.

Convert a return code integer to a text string. This string can be used to output a diagnostic message to the user.

PR29_SUCCESS: Successful operation. This value is guaranteed to always be zero; the remaining values are only guaranteed to be non-zero, for logical comparison purposes.

PR29_PROBLEM: A problem sequence was encountered.

PR29_STRINGPREP_ERROR: The character set conversion failed (only for pr29_8() and pr29_8z()).

Return value: Returns a pointer to a statically allocated string containing a description of the error with the return code rc.