Encode::Arabic::ArabTeX - Perl extension for multi-purpose processing of the ArabTeX notation of Arabic
$Revision: 1.18 $ $Date: 2003/09/08 14:43:14 $
    use Encode::Arabic::ArabTeX;        # imports just like 'use Encode' would, plus extended options

    while ($line = <>) {                # maps the ArabTeX notation for Arabic into the Arabic script

        print encode 'utf8', decode 'arabtex', $line;       # 'arabtex' alias 'ArabTeX'
    }

    # ArabTeX lower ASCII transliteration <--> Arabic script in Perl's internal format

    $string = decode 'ArabTeX', $octets;
    $octets = encode 'ArabTeX', $string;

    Encode::Arabic::ArabTeX->encoder('dump' => '!./encoder.code');  # dump the encoder engine to file
    Encode::Arabic::ArabTeX->decoder('load');   # load the decoder engine from module's extra sources
ArabTeX is an excellent extension to TeX/LaTeX designed for typesetting the right-to-left scripts of the Orient. It comes with very intuitive and comprehensible lower ASCII transliterations, the expressive power of which is even greater than that of the scripts themselves.
Encode::Arabic::ArabTeX implements the rules needed for proper interpretation of the ArabTeX notation of Arabic. The conversion itself is done by Encode::Mapper, and the user interface is built on the Encode::Encoding module.
Since the ArabTeX notation is not a simple mapping to the graphemes of the Arabic script, encoding the script into the notation is ambiguous. Two different strings in the notation may correspond to identical strings in the script. Heuristics must be engaged to decide which of the representations is more appropriate.
On top of this bottleneck, encoding may not be perfectly invertible by the decode operation, due to over-generation or approximations in the encoding algorithm.
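The ambiguity can be seen in miniature with a toy table. The sketch below is not the module's engine: it uses only three of the pairs quoted in this document, and the helpers toy_decode and toy_encode are hypothetical names introduced for illustration. It shows two different notation strings collapsing into one script string, so that the way back has to pick one of them.

```perl
use strict;
use warnings;

# Toy forward table, limited to entries quoted in this document.
my %to_script = (
    '|' => "",            # ArabTeX's "invisible consonant"
    'p' => "\x{067E}",    # pa'
    'g' => "\x{06AF}",    # gaf
);

# Naive inverse: one notation per grapheme; the empty string has no inverse.
my %to_notation = map { ( $to_script{$_} => $_ ) }
                  grep { length $to_script{$_} } keys %to_script;

sub toy_decode {          # notation -> script (hypothetical helper)
    my ($text) = @_;
    $text =~ s/([|pg])/$to_script{$1}/g;
    return $text;
}

sub toy_encode {          # script -> notation (hypothetical helper)
    my ($text) = @_;
    $text =~ s/(.)/exists $to_notation{$1} ? $to_notation{$1} : $1/ge;
    return $text;
}

my $same = toy_decode('p|g') eq toy_decode('pg');   # two notations, one script string
my $back = toy_encode(toy_decode('p|g'));           # the "|" cannot be recovered

print "identical script: ", ($same ? "yes" : "no"), "\n";
print "round trip of 'p|g': $back\n";
```

The real encoder has far more context to weigh than this one-to-one inverse, which is where the heuristics mentioned above come in.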
There are situations where conversion from the Arabic script to the ArabTeX notation is still convenient and useful. Imagine you need to edit the data, enhance it with vowels or other diacritical marks, produce phonetic transcripts and trim the typography of the script ... Do it in the ArabTeX notation, with unrivalled control over every change you make!
Nonetheless, encoding is not the very purpose for this module's existence ;)
The module decodes the ArabTeX notation as defined in the User Manual Version 3.09 of July 22, 1999, ftp://ftp.informatik.uni-stuttgart.de/pub/arabtex/doc/arabdoc.pdf. The implementation uses three levels of Encode::Mapper engines to decode the notation:
<'>
into the verbatim encoding of the relevant carrier.
This level of processing can become optional, if people ever need to encode the hamza carriers explicitly.
    [ "|", "" ],            # ArabTeX's "invisible consonant"
    [ "T", "\x{0629}" ],    # ta' marbu.ta

    [ "p", "\x{067E}" ],    # pa'
    [ "v", "\x{06A4}" ],    # va'
    [ "g", "\x{06AF}" ],    # gaf

    [ "c", "\x{0681}" ],    # .ha with hamza
    [ "^c", "\x{0686}" ],   # gim with three
    [ ",c", "\x{0685}" ],   # _ha with three
    [ "^z", "\x{0698}" ],   # zay with three
    [ "^n", "\x{06AD}" ],   # kaf with three
    [ "^l", "\x{06B5}" ],   # lam with a bow above
    [ ".r", "\x{0695}" ],   # ra' with a bow below
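To show how pairs like these might be applied, here is a minimal sketch in plain Perl. It is an assumption-laden stand-in, not the Encode::Mapper engine the module actually uses: the helper toy_map is hypothetical, only a subset of the pairs is included, and none of the context rules of the notation are modelled.

```perl
use strict;
use warnings;

# A small subset of the pairs listed above.
my %map = (
    'T'  => "\x{0629}",   # ta' marbu.ta
    'p'  => "\x{067E}",   # pa'
    'g'  => "\x{06AF}",   # gaf
    'c'  => "\x{0681}",   # .ha with hamza
    '^c' => "\x{0686}",   # gim with three
    ',c' => "\x{0685}",   # _ha with three
    '|'  => "",           # ArabTeX's "invisible consonant"
);

# Longest keys first: a general precaution that matters whenever
# one key happens to be a prefix of another.
my $alternation = join '|',
                  map  { quotemeta($_) }
                  sort { length($b) <=> length($a) } keys %map;

# Hypothetical helper: map known tokens, pass everything else through.
sub toy_map {
    my ($text) = @_;
    $text =~ s/($alternation)/$map{$1}/g;
    return $text;
}

# Print code points rather than raw characters to stay independent of fonts.
my $out = toy_map('^cp');
printf "U+%04X ", ord($_) for split //, $out;
print "\n";
```

Scanning left to right, the two-character token "^c" is consumed as a unit before "p" is looked at, so the input yields U+0686 followed by U+067E.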
There are many nice features in the notation, like assimilation, gemination, hyphenation, all implemented here. Defective and historical writings of vowels are supported, too! Try for yourself whether your fonts can handle these ;)
There are modes and options in ArabTeX that have not yet been dealt with in Encode::Arabic::ArabTeX; still, mutual consistency of the two systems is very high. This release does not support vowel quoting and works in ArabTeX's \fullvocalize mode only. The inconvenience will be made up for in forthcoming versions. Until then, regular expression substitution can help the user work around it.
The module exports as if use Encode also appeared in the package. The import options, except for a first-place :xml, are simply delegated to Encode, and the imports are performed accordingly.
If the first element in the list to use is :xml, all XML markup, or rather any data enclosed in the well-paired and non-nested angle brackets < and >, will be preserved. Properties of the Encode::Arabic::ArabTeX engines can be generally controlled through the Encode::Mapper API.
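What "preserved" means for well-paired, non-nested angle brackets can be sketched in a few lines of plain Perl. This is only an illustration of the behaviour described above, not the way the module implements it: convert_outside_markup is a hypothetical helper, and upper-casing stands in for the real ArabTeX conversion.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $convert to the text outside <...>
# and keep the bracketed pieces exactly as they are.
sub convert_outside_markup {
    my ($convert, $text) = @_;
    # The capturing group makes split return the bracketed pieces as well.
    my @pieces = split /(<[^<>]*>)/, $text;
    return join '', map { /\A<[^<>]*>\z/ ? $_ : $convert->($_) } @pieces;
}

# Stand-in conversion for the demonstration only.
my $upcase = sub { uc $_[0] };

print convert_outside_markup($upcase, 'kitab <b class="x">jadid</b> |'), "\n";
```

Only the text between the tags is passed to the conversion; the tags, including their attributes, come out unchanged.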
Initialization of the engines takes place the first time they are used, unless they have already been defined. There are two explicit methods for it, encoder and decoder, as used in the synopsis. Their --dump and --load options have some experimental meaning.
These methods will be refined in the future, becoming the interface for miscellaneous settings etc.
Encode::Arabic, Encode::Mapper, Encode::Encoding, Encode
ArabTeX system ftp://ftp.informatik.uni-stuttgart.de/pub/arabtex/arabtex.htm
Klaus Lagally http://www.informatik.uni-stuttgart.de/ifi/bs/people/lagall_e.htm
External Tools Not Only for ArabTeX Documents http://ufal.mff.cuni.cz/publications/year2002/FLM2002.zip
Arabeyes Arabic Unix Project http://www.arabeyes.org
Otakar Smrz, http://ckl.mff.cuni.cz/smrz/
    eval { '<' . 'smrz' . "\x40" . ( join '.', qw 'ckl mff cuni cz' ) . '>' }
Perl is also designed to make the easy jobs not that easy ;)
Copyright 2003 by Otakar Smrz
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.