NAME
    Grapheme::Ngram - n-grams of Unicode Extended Grapheme Clusters

SYNOPSIS
      use Grapheme::Ngram;

      my $class = 'Grapheme::Ngram';
      my @ngrams = $class->ngram($string, $width);

DESCRIPTION
    For many applications it is better to work with graphemes (what a user
    perceives as a single character) than with individual code points.
    Building n-grams is one of them.

METHODS
  new
      $object = Grapheme::Ngram->new();

  ngram
      my @ngram = $object->ngram($string, $width);

    $string ... string of characters
    $width  ... length of the resulting tokens, in graphemes. Default is 1.

    Returns an empty list if $string contains fewer graphemes than $width,
    or if $width is not an integer greater than 0.

  from_tokens
      my @ngram = $object->from_tokens(\@tokens, $width);

    Same as "ngram", but takes a reference to a list of tokens instead of a
    string. "ngram" uses this method internally. This makes it possible to
    plug in a custom tokenizer, e.g. one that also treats 'sh' as a single
    grapheme:

      my @tokens = $string =~ m/(Sh|sh|\X)/g;

  _tokenize
      my @graphemes = $object->_tokenize($string);

    This internal method splits $string into a list of graphemes.

SOURCE REPOSITORY

AUTHOR
    Helmut Wollmersdorfer,

COPYRIGHT AND LICENSE
    Copyright (C) 2014 by Helmut Wollmersdorfer

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.
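For illustration, the sliding-window construction described under "from_tokens" can be sketched in plain Perl. This is a minimal sketch, not the module's own code: the helper name sliding_ngrams is hypothetical, and joining each window into a single string is an assumption about the output format. It splits a string into extended grapheme clusters with \X and then slides a window of $width tokens over the list, returning an empty list for the edge cases the documentation names.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Grapheme::Ngram: builds n-grams from a
# list of tokens with a sliding window of $width tokens. Joining each window
# into one string is an assumption about the output format.
sub sliding_ngrams {
    my ($tokens, $width) = @_;
    $width = 1 unless defined $width;    # documented default width

    # Documented edge cases: a width that is not a positive integer, or
    # fewer tokens than $width, yields an empty list.
    return () unless $width =~ /\A[1-9][0-9]*\z/;
    return () if @$tokens < $width;

    my @ngrams;
    for my $i (0 .. @$tokens - $width) {
        push @ngrams, join '', @{$tokens}[ $i .. $i + $width - 1 ];
    }
    return @ngrams;
}

# "élève" written with combining accents: 7 code points, 5 graphemes.
my $string    = "e\x{301}le\x{300}ve";
my @graphemes = $string =~ m/(\X)/g;     # split into extended grapheme clusters
my @bigrams   = sliding_ngrams(\@graphemes, 2);

binmode STDOUT, ':encoding(UTF-8)';
print scalar(@graphemes), " graphemes, ", length($string), " code points\n";
print join(' | ', @bigrams), "\n";
```

Splitting with \X first keeps each base letter together with its combining accent, so no bigram ever starts or ends in the middle of a user-perceived character. Swapping the \X pattern for the custom tokenizer shown under "from_tokens" would make 'sh' count as one token in the same way.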