NAME

cyrillic - Library for fast and easy Cyrillic text manipulation

SYNOPSIS

    use cyrillic qw/866 win2dos convert locase upcase detect/;

    print convert( 866, 1251, $str );
    print convert( 'dos', 'win', \$str );
    print win2dos $str;

DESCRIPTION

This module provides functions that convert Cyrillic strings from one charset to another, and to upper or lower case, without switching the locale. It also includes a routine that detects single-byte charsets.

New code pages are easy to add: all that is needed is the string describing the code page.

Supported charsets: ibm866, koi8-r, cp855, windows-1251, MacWindows, iso_8859-5, Unicode, UTF-8.

If the first imported parameter is a code page number, the locale is switched to that code page.

FUNCTIONS

* convert - convert between charsets
* upcase - convert to upper case
* locase - convert to lower case
* upfirst - convert the first character to upper case
* lofirst - convert the first character to lower case
* detect - detect the code page number
* charset - return the charset name for a code page number

Named converters can also be listed in the import list, for example:

    use cyrillic qw/dos2win win2koi mac2dos ibm2dos/;

NOTE! Specializations (such as win2dos and utf2win) are faster than convert.

NOTE! Only convert and its specializations work with Unicode and UTF-8 strings. All other functions work only with single-byte charsets.

Names for use in named charset converters:

    Short name   Charset        Code page
    dos          ibm866         866
    koi          koi8-r         20866
    ibm          cp855          855
    win          windows-1251   1251
    mac          ms-cyrillic    10007
    iso          iso_8859-5     28585
    uni          Unicode
    utf          UTF-8

The following rules apply to the converting functions:

* VAR may be a SCALAR or a REF to a SCALAR.
* If VAR is a REF to a SCALAR, that SCALAR is converted in place.
* If VAR is omitted, $_ is used.
* If the function is called in void context and VAR is not a REF, the result is placed in $_.

CONVERSION METHODS

convert SRC_CP, DST_CP, [VAR]
    Converts VAR from the SRC_CP code page to the DST_CP code page and returns the converted string.
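The DESCRIPTION says that adding a code page takes only a string, and that named specializations are faster than convert. The sketch below shows one way such a table-driven converter could work. It is an assumption about the technique, not the module's actual source. %TABLE, translate, make_converter and make_upcase are hypothetical names. Only cp866 and windows-1251 are covered, and each table string lists А..Я, Ё, а..я, ё as bytes of that code page. The generated closure also follows the VAR rules listed above.

```perl
use strict;
use warnings;

# Hypothetical code page tables (not taken from the cyrillic module).
# Each string holds 66 bytes: uppercase А..Я, Ё, then lowercase а..я, ё.
my %TABLE = (
    866  => join('', map { chr } 0x80 .. 0x9F, 0xF0,
                                 0xA0 .. 0xAF, 0xE0 .. 0xEF, 0xF1),
    1251 => join('', map { chr } 0xC0 .. 0xDF, 0xA8,
                                 0xE0 .. 0xFF, 0xB8),
);

# Map every byte of a string through %$map; bytes outside the
# table (ASCII, punctuation) are copied unchanged.
sub translate {
    my ($map, $str) = @_;
    return join '', map { exists $map->{$_} ? $map->{$_} : $_ } split //, $str;
}

# Hypothetical helper: build a dos2win-style converter between two
# code pages. The byte map is computed once, when the closure is made.
sub make_converter {
    my ($src, $dst) = @_;
    my %map;
    @map{ split //, $TABLE{$src} } = split //, $TABLE{$dst};

    return sub {
        # VAR omitted: convert $_ in place.
        if (!@_) {
            $_ = translate(\%map, $_);
            return $_;
        }
        my ($var) = @_;
        # REF to SCALAR: convert the referenced scalar in place.
        if (ref $var eq 'SCALAR') {
            $$var = translate(\%map, $$var);
            return $$var;
        }
        # Plain scalar: return the result, and also store it in $_
        # when called in void context.
        my $result = translate(\%map, $var);
        $_ = $result unless defined wantarray;
        return $result;
    };
}

# Hypothetical helper: upper-case mapping within one code page,
# mapping the 33 lowercase bytes onto the 33 uppercase bytes.
sub make_upcase {
    my ($cp) = @_;
    my @chars = split //, $TABLE{$cp};
    my %map;
    @map{ @chars[33 .. 65] } = @chars[0 .. 32];
    return sub { translate(\%map, $_[0]) };
}

my $dos2win   = make_converter(866, 1251);
my $upcase866 = make_upcase(866);

my $str = "\x8F\xE0\xA8\xA2\xA5\xE2";    # "Привет" in cp866
printf "%vX\n", $dos2win->($str);        # prints CF.F0.E8.E2.E5.F2
printf "%vX\n", $upcase866->($str);      # prints 8F.90.88.82.85.92

$dos2win->(\$str);                       # converted in place
printf "%vX\n", $str;                    # prints CF.F0.E8.E2.E5.F2
```

Computing the byte map once per code page pair is one plausible reason a specialization such as dos2win can be faster than a general convert call. This sketch handles only single-byte code pages.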
    Converting Unicode or UTF-8 data requires the Unicode::String and Unicode::Map modules to be installed.

upcase CODEPAGE, [VAR]
    Converts VAR to upper case using the CODEPAGE table and returns the converted string.

locase CODEPAGE, [VAR]
    Converts VAR to lower case using the CODEPAGE table and returns the converted string.

upfirst CODEPAGE, [VAR]
    Converts the first character of VAR to upper case using the CODEPAGE table and returns the converted string.

lofirst CODEPAGE, [VAR]
    Converts the first character of VAR to lower case using the CODEPAGE table and returns the converted string.

MAINTENANCE METHODS

charset CODEPAGE
    Returns the charset name for CODEPAGE.

detect ARRAY
    Detects the single-byte code page of the data in ARRAY and returns the code page number. If no code page is detected, returns an undefined value.

EXAMPLES

    use cyrillic qw/convert locase upcase detect dos2win win2dos/;

    $\ = "\n";
    $_ = "\x8F\xE0\xA8\xA2\xA5\xE2 \xF0\xA6\x88\xAA\x88!";
    print;
    upcase 866;  print;
    dos2win;     print;
    win2dos;     print;
    locase 866;  print;
    print detect $_;

    # CONVERSION TEST:
    use cyrillic qw/utf2dos mac2utf dos2mac win2dos utf2win/;

    $_ = "Хелло Ворльд!\n";
    print "UTF-8: $_";
    print "  DOS: ", utf2dos mac2utf dos2mac win2dos utf2win $_;

    # EQUIVALENT CALLS:
    dos2win( $str );          # void context -> result placed in $_
    $_ = dos2win( $str );

    dos2win( \$str );         # REF to string -> converted in place
    $str = dos2win( $str );

    dos2win();                # parameter omitted -> $_ converted
    dos2win( \$_ );
    $_ = dos2win( $_ );

    # EASY LOCALE CODE PAGE SWITCHING:
    use cyrillic qw/866/;     # locale switched to Russian_Russia.866

    use locale;
    print $str =~ /(\w+)/;
    no locale;
    print $str =~ /(\w+)/;

FAQ

* Q: Why does the module say "Can't create Unicode::Map for 'koi8-r' charset!"?
  A: Your Unicode::Map module can't find the map file for the 'koi8-r' charset. The Unicode::Map manual explains where to download this file and how to install it on your system.

* Q: Why does perl say "Undefined subroutine koi2win called"?
  A: The function koi2win is a specialization of convert. It is created only when its name is included in the import list, for example: use cyrillic qw/koi2win/;

AUTHOR

Albert MICHEEV

COPYRIGHT

Copyright (C) 2000, Albert MICHEEV

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.

AVAILABILITY

The latest version of this library is likely to be available from: http://www.perl.com/CPAN

SEE ALSO

Unicode::String, Unicode::Map.