com.sap.i18n.text
Class Utf16String

java.lang.Object
  |
  +--com.sap.i18n.text.Utf16String

public class Utf16String
extends java.lang.Object

Wrapper class for the JDK class String.
Purpose: handle surrogate pairs and combining characters.
Offers alternatives for searching, accessing substrings and accessing characters.
The main rules are:

All methods are static and take the string to operate on as first parameter (pseudo "this" parameter).


Constructor Summary
Utf16String()
           
 
Method Summary
static int charAt(java.lang.String str, int index)
          Alternative to String.charAt to handle surrogate pairs properly.
static int codeunitAt(java.lang.String str, int index)
          Alternative to String.charAt.
static java.lang.String combinedCharAt(java.lang.String str, int index)
          Alternative to String.charAt to handle surrogate pairs and combined characters.
static int countCodeUnits(int ch)
          Determine number of code units (16-bit) of char ch (1 or 2).
static int indexOf(java.lang.String str, int ch)
          Alternative to String.indexOf to handle combining characters properly.
static int indexOf(java.lang.String str, java.lang.String substr)
          Alternative to String.indexOf that handles combining characters properly.
static boolean isCombiningChar(int ch)
          Check if character combines with previous character.
static boolean isCombiningCharAt(java.lang.String str, int index)
          Check if character at str[index] combines with previous character.
static int lastIndexOf(java.lang.String str, int ch)
          Alternative to String.lastIndexOf that handles combining characters properly.
static int lastIndexOf(java.lang.String str, java.lang.String substr)
          Alternative to String.lastIndexOf that handles combining characters properly.
static java.lang.String limitLength(java.lang.String str, int limit)
          Alternative to String.substring that avoids cutting surrogate pairs and combining characters.
static java.lang.String removePostfix(java.lang.String str, java.lang.String postfix)
          Alternative to String.substring that avoids cutting surrogate pairs and combining characters.
static java.lang.String removePrefix(java.lang.String str, java.lang.String prefix)
          Alternative to String.substring that avoids cutting surrogate pairs and combining characters.
static char simplifiedCharAt(java.lang.String str, int index)
          Alternative to String.charAt.
static java.lang.String substringAfter(java.lang.String str, char delim)
          Alternative to String.substring that avoids cutting surrogate pairs and combining characters.
static java.lang.String substringAfter(java.lang.String str, java.lang.String delimstr)
          Alternative to String.substring that avoids cutting surrogate pairs and combining characters.
static java.lang.String substringBefore(java.lang.String str, char delim)
          Alternative to String.substring that avoids cutting surrogate pairs and combining characters.
static java.lang.String substringBefore(java.lang.String str, java.lang.String delimstr)
          Alternative to String.substring that avoids cutting surrogate pairs and combining characters.
static java.lang.String substringFixedFormat(java.lang.String str, int beginIndex, int endIndex)
          Alternative to String.substring.
static java.lang.String toLowerFirstChar(java.lang.String str)
          Converts the first character of str to lower case.
static java.lang.String toUpperFirstChar(java.lang.String str)
          Converts the first character of str to upper case.

coding sample old:
String getMethod = "get" + Character.toUpperCase( columnName.charAt(0) ) + columnName.substring(1);
coding sample new:
String getMethod = "get" + Utf16String.toUpperFirstChar( columnName );
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Utf16String

public Utf16String()
Method Detail

indexOf

public static int indexOf(java.lang.String str,
                          int ch)
Alternative to String.indexOf to handle combining characters properly. A valid match is found if character ch is found, and it is not immediately followed by a combining character. Returns the first match.

indexOf

public static int indexOf(java.lang.String str,
                          java.lang.String substr)
Alternative to String.indexOf that handles combining characters properly. A match is found if string substr is found, and it is not immediately followed by a combining character. Returns the first match.

lastIndexOf

public static int lastIndexOf(java.lang.String str,
                              int ch)
Alternative to String.lastIndexOf that handles combining characters properly. A match is found if character ch is found, and it is not immediately followed by a combining character. Returns the last match.

lastIndexOf

public static int lastIndexOf(java.lang.String str,
                              java.lang.String substr)
Alternative to String.lastIndexOf that handles combining characters properly. A match is found if string substr is found, and it is not immediately followed by a combining character. Returns the last match.
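
The matching rule used by the search methods (a hit only counts if it is not immediately followed by a combining character) can be illustrated with plain JDK calls. The following is a minimal sketch, not the Utf16String source: the class name CombiningAwareSearchSketch is hypothetical, and it assumes that "combining character" means a code point in one of the Unicode mark categories (Mn, Mc, Me).

```java
// Illustrative sketch, not the Utf16String source.
final class CombiningAwareSearchSketch {

    // Assumption: "combining character" means Unicode general category Mn, Mc or Me.
    static boolean isCombiningChar(int ch) {
        int type = Character.getType(ch);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // First index of ch that is not immediately followed by a combining character, or -1.
    static int indexOf(String str, int ch) {
        int from = 0;
        while (true) {
            int pos = str.indexOf(ch, from);
            if (pos < 0) {
                return -1;
            }
            int next = pos + Character.charCount(ch);
            if (next >= str.length() || !isCombiningChar(str.codePointAt(next))) {
                return pos;
            }
            from = pos + 1; // candidate is part of a combined character, keep searching
        }
    }

    public static void main(String[] args) {
        String s = "re\u0301sume"; // the 'e' at index 1 carries a combining acute accent
        System.out.println(s.indexOf('e'));  // prints 1
        System.out.println(indexOf(s, 'e')); // prints 6
    }
}
```

String.indexOf reports the accented 'e' as a match, while the sketch skips it and returns the plain 'e' at the end of the string.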

isCombiningCharAt

public static boolean isCombiningCharAt(java.lang.String str,
                                        int index)
Check if character at str[index] combines with previous character.
See Also:
isCombiningChar(int)

isCombiningChar

public static boolean isCombiningChar(int ch)
Check if character combines with previous character.
See Also:
isCombiningCharAt(java.lang.String, int)
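
How the class decides whether a character combines with its predecessor is not stated here. One plausible approximation with the JDK is to test the Unicode general category. The sketch below is an assumption, not the actual implementation, and the class name CombiningCharSketch is hypothetical.

```java
// Sketch only: one plausible way to classify combining characters with the JDK.
// The real Utf16String.isCombiningChar may use a different rule.
final class CombiningCharSketch {

    // True if code point ch belongs to a Unicode mark category (Mn, Mc or Me).
    static boolean isCombiningChar(int ch) {
        int type = Character.getType(ch);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // True if the code point starting at str[index] is a combining character.
    static boolean isCombiningCharAt(String str, int index) {
        return isCombiningChar(str.codePointAt(index));
    }

    public static void main(String[] args) {
        String s = "e\u0301x"; // 'e', COMBINING ACUTE ACCENT, 'x'
        System.out.println(isCombiningCharAt(s, 0)); // prints false
        System.out.println(isCombiningCharAt(s, 1)); // prints true
        System.out.println(isCombiningCharAt(s, 2)); // prints false
    }
}
```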

removePrefix

public static java.lang.String removePrefix(java.lang.String str,
                                            java.lang.String prefix)
Alternative to String.substring that avoids cutting surrogate pairs and combining characters. If str starts with prefix, return str without prefix, otherwise return null.

coding sample old:
 if ( name.startsWith("environment.") ) {
   env.setProperty( name.substring("environment.".length() ),
                    System.getProperty(name) );
 }
 
coding sample new:
 String prop = Utf16String.removePrefix( name, "environment." );
 if( prop != null ) {
   env.setProperty( prop, System.getProperty(name) );
 }
 

removePostfix

public static java.lang.String removePostfix(java.lang.String str,
                                             java.lang.String postfix)
Alternative to String.substring that avoids cutting surrogate pairs and combining characters. If str ends with postfix, return str without postfix, otherwise return null.
See Also:
removePrefix(java.lang.String, java.lang.String)
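
The calling pattern for removePostfix mirrors the one for removePrefix: test the result for null instead of calling endsWith and substring separately. The sketch below uses a hypothetical stand-in (class RemovePostfixSketch) that mirrors only the documented null-on-mismatch contract. It does not reproduce the surrogate-pair and combining-character checks of the real method, and the file name is an invented input.

```java
// Stand-in for illustration; not the Utf16String source.
final class RemovePostfixSketch {

    // Mirrors only the documented contract: str without postfix, or null if str
    // does not end with postfix. No surrogate or combining-character checks.
    static String removePostfix(String str, String postfix) {
        if (!str.endsWith(postfix)) {
            return null;
        }
        return str.substring(0, str.length() - postfix.length());
    }

    public static void main(String[] args) {
        String fileName = "messages.properties"; // hypothetical input
        String baseName = removePostfix(fileName, ".properties");
        if (baseName != null) {
            System.out.println(baseName); // prints messages
        }
        System.out.println(removePostfix("messages.xml", ".properties")); // prints null
    }
}
```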

substringBefore

public static java.lang.String substringBefore(java.lang.String str,
                                               char delim)
Alternative to String.substring that avoids cutting surrogate pairs and combining characters. If delim is found in str, return substring of str before delim, otherwise return null

coding sample old:
 if ( text.indexOf('|') >= 0 ) {
   text1 = text.substring( 0, text.indexOf('|') );
   text2 = text.substring( text.indexOf('|') + 1, text.length() );
 } else {
   text1 = text;
   text2 = "";
 }
 
coding sample new:
 text1 = Utf16String.substringBefore( text, '|' );
 text2 = Utf16String.substringAfter( text, '|' );
 if ( text1 == null ) {
   text1 = text;
   text2 = "";
 }
 

substringBefore

public static java.lang.String substringBefore(java.lang.String str,
                                               java.lang.String delimstr)
Alternative to String.substring that avoids cutting surrogate pairs and combining characters. If delimstr is found in str, return substring of str before delimstr, otherwise return null
See Also:
substringBefore( java.lang.String, char )

substringAfter

public static java.lang.String substringAfter(java.lang.String str,
                                              char delim)
Alternative to String.substring that avoids cutting surrogate pairs and combining characters. If delim is found in str, return substring of str after delim, otherwise return null.
See Also:
substringBefore( java.lang.String, char )

substringAfter

public static java.lang.String substringAfter(java.lang.String str,
                                              java.lang.String delimstr)
Alternative to String.substring that avoids cutting surrogate pairs and combining characters. If delimstr is found in str, return substring of str after delimstr, otherwise return null
See Also:
substringBefore( java.lang.String, char )
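
The string-delimiter variants can be used in the same way as the char variants. The following sketch uses hypothetical stand-ins (class DelimiterSplitSketch) that mirror only the null-on-miss contract. It assumes that the first occurrence of the delimiter is the one that counts, which the description above does not state explicitly, and it omits the surrogate-pair and combining-character checks.

```java
// Stand-ins for illustration; not the Utf16String source.
final class DelimiterSplitSketch {

    // Assumption: the first occurrence of delimstr is used.
    static String substringBefore(String str, String delimstr) {
        int pos = str.indexOf(delimstr);
        return pos < 0 ? null : str.substring(0, pos);
    }

    static String substringAfter(String str, String delimstr) {
        int pos = str.indexOf(delimstr);
        return pos < 0 ? null : str.substring(pos + delimstr.length());
    }

    public static void main(String[] args) {
        String entry = "timeout::30"; // hypothetical input
        String key = substringBefore(entry, "::");
        String value = substringAfter(entry, "::");
        if (key == null) { // delimiter not found
            key = entry;
            value = "";
        }
        System.out.println(key + "=" + value); // prints timeout=30
    }
}
```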

limitLength

public static java.lang.String limitLength(java.lang.String str,
                                           int limit)
Alternative to String.substring that avoids cutting surrogate pairs and combining characters. Cuts str so that it has no more than limit characters. If the cut would split a surrogate pair or a combined character, the cut is made before the surrogate pair or combined character.

coding sample old:
 if ( s.length() > maxLength ) {
   s = s.substring( 0, maxLength );
 }
 
coding sample new:
 s = Utf16String.limitLength( s, maxLength );
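
The cut rule can be sketched with JDK calls as follows. This is an assumption about the behavior, not the Utf16String source: the class name LimitLengthSketch is hypothetical, limit is taken to count 16-bit code units (as String.length does), and combining characters are approximated by the Unicode mark categories.

```java
// Illustrative sketch, not the Utf16String source.
final class LimitLengthSketch {

    // Assumption: "combining character" means Unicode general category Mn, Mc or Me.
    static boolean isCombiningChar(int ch) {
        int type = Character.getType(ch);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Assumption: limit counts 16-bit code units and is not negative.
    static String limitLength(String str, int limit) {
        if (str.length() <= limit) {
            return str;
        }
        int cut = limit;
        while (cut > 0) {
            if (Character.isLowSurrogate(str.charAt(cut))
                    && Character.isHighSurrogate(str.charAt(cut - 1))) {
                cut--; // cut would split a surrogate pair, move before the pair
            } else if (isCombiningChar(str.codePointAt(cut))) {
                cut = str.offsetByCodePoints(cut, -1); // move before the preceding character
            } else {
                break;
            }
        }
        return str.substring(0, cut);
    }

    public static void main(String[] args) {
        System.out.println(limitLength("cafe\u0301s", 4));                // prints caf
        System.out.println(limitLength("ab\uD83D\uDE00c", 3).length());   // prints 2
    }
}
```

In both calls String.substring(0, limit) would leave a damaged character at the end of the result: a base letter without its accent in the first case, an unpaired high surrogate in the second.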
 

substringFixedFormat

public static java.lang.String substringFixedFormat(java.lang.String str,
                                                    int beginIndex,
                                                    int endIndex)
Alternative to String.substring. Marks explicitly that str has a fixed format and the substring operation is safe.

coding sample old:
 char ch = str.charAt(i);
 if ( ch < 0x20 || ch > 0x7e ) {
  String s = "0000" + Integer.toString(ch, 16);
  retval.append("\\u" + s.substring( s.length() - 4, s.length() ) );
 }
 
coding sample new:
 char ch = str.charAt(i);
 if ( ch < 0x20 || ch > 0x7e ) {
  String s = "0000" + Integer.toString(ch, 16);
  retval.append("\\u" +
    Utf16String.substringFixedFormat( s, s.length() - 4, s.length() ) );
 }
 

toUpperFirstChar

public static java.lang.String toUpperFirstChar(java.lang.String str)
Converts the first character of str to upper case.

coding sample old:
 String getMethod = "get" +
                    Character.toUpperCase( columnName.charAt(0) ) +
                    columnName.substring(1);
 
coding sample new:
 String getMethod = "get" + Utf16String.toUpperFirstChar( columnName );
 

toLowerFirstChar

public static java.lang.String toLowerFirstChar(java.lang.String str)
Converts the first character of str to lower case.
See Also:
toUpperFirstChar(java.lang.String)
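
Converting only the first character is error-prone with charAt(0) and substring(1) when the string starts with a surrogate pair. The sketch below shows one way to do it with code points. It is an assumption about the behavior, not the Utf16String source, and the class name FirstCharCaseSketch is hypothetical.

```java
// Illustrative sketch, not the Utf16String source.
final class FirstCharCaseSketch {

    // Lower-cases the first code point of str and keeps the rest unchanged.
    static String toLowerFirstChar(String str) {
        if (str.isEmpty()) {
            return str;
        }
        int first = str.codePointAt(0);        // may span two code units
        int rest = Character.charCount(first); // index where the remainder starts
        return new StringBuilder(str.length())
            .appendCodePoint(Character.toLowerCase(first))
            .append(str, rest, str.length())
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(toLowerFirstChar("ColumnName")); // prints columnName
        // U+10400 (DESERET CAPITAL LETTER LONG I) lies outside the BMP
        String lowered = toLowerFirstChar("\uD801\uDC00x");
        System.out.println(Integer.toHexString(lowered.codePointAt(0))); // prints 10428
    }
}
```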

charAt

public static int charAt(java.lang.String str,
                         int index)
Alternative to String.charAt to handle surrogate pairs properly. Returns Unicode character stored at position index as 32-bit value. May consume two 16-bit code units that build a surrogate pair.

coding sample old:
 for (int i = 0; i < s.length(); ++i ) {
   char ch = s.charAt( i );
   doSomethingWith( ch );
 }
 
coding sample new:
 int ch;
 for (int i = 0; i < s.length(); i+=Utf16String.countCodeUnits(ch) ) {
   ch = Utf16String.charAt( s, i );
   doSomethingWith( ch );
 }
 

countCodeUnits

public static int countCodeUnits(int ch)
Determine number of code units (16-bit) of char ch (1 or 2). Intended to be used together with Utf16String.charAt; does not take a string as parameter.
See Also:
charAt(java.lang.String, int)
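
For comparison, the JDK offers String.codePointAt and Character.charCount, which play a similar role to charAt and countCodeUnits. The sketch below wraps them in hypothetical stand-ins (class CodePointLoopSketch) to show the iteration pattern. Whether Utf16String delegates to these JDK methods is an assumption.

```java
// Stand-ins built on JDK calls; not the Utf16String source.
final class CodePointLoopSketch {

    // Unicode character at index as a 32-bit value; may consume a surrogate pair.
    static int charAt(String str, int index) {
        return str.codePointAt(index);
    }

    // Number of 16-bit code units needed for ch (1 or 2).
    static int countCodeUnits(int ch) {
        return Character.charCount(ch);
    }

    // Counts Unicode characters by advancing 1 or 2 code units per step.
    static int countCharacters(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int ch = charAt(s, i);
            i += countCodeUnits(ch);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(s.length());         // prints 4
        System.out.println(countCharacters(s)); // prints 3
    }
}
```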

combinedCharAt

public static java.lang.String combinedCharAt(java.lang.String str,
                                              int index)
Alternative to String.charAt to handle surrogate pairs and combined characters. Returns complete sequence of combined characters stored at position index.
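
A possible reading of "complete sequence of combined characters" is: the character at index plus all combining characters that directly follow it. The sketch below implements that reading with JDK calls. It is an assumption, not the Utf16String source. The class name CombinedCharSketch is hypothetical and combining characters are approximated by the Unicode mark categories.

```java
// Illustrative sketch, not the Utf16String source.
final class CombinedCharSketch {

    // Assumption: "combining character" means Unicode general category Mn, Mc or Me.
    static boolean isCombiningChar(int ch) {
        int type = Character.getType(ch);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Returns the character at index together with all directly following combining characters.
    static String combinedCharAt(String str, int index) {
        int end = index + Character.charCount(str.codePointAt(index));
        while (end < str.length() && isCombiningChar(str.codePointAt(end))) {
            end += Character.charCount(str.codePointAt(end));
        }
        return str.substring(index, end);
    }

    public static void main(String[] args) {
        String s = "e\u0301\u0323x"; // 'e' with two combining marks, then 'x'
        System.out.println(combinedCharAt(s, 0).length()); // prints 3
        System.out.println(combinedCharAt(s, 3));          // prints x
    }
}
```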

simplifiedCharAt

public static char simplifiedCharAt(java.lang.String str,
                                    int index)
Alternative to String.charAt. If a "simple" character is stored at index, returns this character; otherwise returns the Unicode replacement character U+FFFD. A "simple" character is a character that is not part of a surrogate pair and that does not combine with following combining characters.

coding sample old:
 for( int i=0; i < arg.length ; i++ ) {

   if( arg[i].charAt(0) == '-' ) {
     if( arg[i].length() < 2 ) continue;

     // known options
     switch( arg[i].charAt(1) ) {

       case 'u': // user
         user = arg[i].substring(2);
         break;

       case 'd': // duration
         try {
           duration =
             Long.parseLong( arg[i].substring(2) );
         } catch( NumberFormatException e ) {
           System.err.println( "Wrong duration specified: " + arg[i] );
         }
         break;

       default:
          System.err.println( "Unknown option: " + arg[i] );
          break;
     }

   // not an option
   } else {
     otherArgs.add( arg[i] );
   }

 } // for i < arg.length
 
coding sample new:
 for( int i=0; i < arg.length ; i++ ) {

   if( Utf16String.simplifiedCharAt(arg[i],0) == '-' ) {
     if( arg[i].length() < 2 ) continue;

     // known options
     switch( Utf16String.simplifiedCharAt( arg[i], 1 ) ) {

       case 'u': // user
         user = Utf16String.removePrefix( arg[i], "-u" );
         break;

       case 'd': // duration
         try {
           duration =
             Long.parseLong( Utf16String.removePrefix( arg[i], "-d" ) );
         } catch( NumberFormatException e ) {
           System.err.println( "Wrong duration specified: " + arg[i] );
         }
         break;

       default:
          System.err.println( "Unknown option: " + arg[i] );
          break;
     }

   // not an option
   } else {
     otherArgs.add( arg[i] );
   }

 } // for i < arg.length
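
The "simple character" rule can be sketched with JDK calls as follows. This is an assumption about the behavior, not the Utf16String source: the class name SimplifiedCharSketch is hypothetical, combining characters are approximated by the Unicode mark categories, and a combining character that itself sits at index is returned unchanged because the description does not cover that case.

```java
// Illustrative sketch, not the Utf16String source.
final class SimplifiedCharSketch {

    // Assumption: "combining character" means Unicode general category Mn, Mc or Me.
    static boolean isCombiningChar(int ch) {
        int type = Character.getType(ch);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Returns the char at index if it is "simple", else U+FFFD.
    static char simplifiedCharAt(String str, int index) {
        char c = str.charAt(index);
        if (Character.isSurrogate(c)) {
            return '\uFFFD'; // half of a surrogate pair
        }
        int next = index + 1;
        if (next < str.length() && isCombiningChar(str.codePointAt(next))) {
            return '\uFFFD'; // base character of a combined character
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(simplifiedCharAt("-u", 0) == '-');            // prints true
        System.out.println(simplifiedCharAt("e\u0301", 0) == '\uFFFD');  // prints true
    }
}
```

Comparing the result with an ASCII literal such as '-' is therefore safe: a combined or supplementary character never compares equal to it.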
 

codeunitAt

public static int codeunitAt(java.lang.String str,
                             int index)
Alternative to String.charAt. Marks explicitly that the 16-bit code unit stored at index is needed (not the Unicode character value, not a sequence of combined characters).
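
The difference between a code unit and a Unicode character value shows up for characters outside the Basic Multilingual Plane. The sketch below uses a hypothetical stand-in (class CodeUnitViewsSketch) that simply delegates to String.charAt, which is an assumption about the real method, and compares it with the JDK's String.codePointAt.

```java
// Stand-in for illustration; not the Utf16String source.
final class CodeUnitViewsSketch {

    // Returns the raw 16-bit code unit at index, widened to int.
    static int codeunitAt(String str, int index) {
        return str.charAt(index);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600, stored as two code units
        System.out.println(Integer.toHexString(codeunitAt(s, 0)));   // prints d83d
        System.out.println(Integer.toHexString(codeunitAt(s, 1)));   // prints de00
        System.out.println(Integer.toHexString(s.codePointAt(0)));   // prints 1f600
    }
}
```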