NAME
    WordList - Word lists

VERSION
    This document describes version 0.6.2 of WordList (from Perl
    distribution WordList), released on 2020-05-22.

SYNOPSIS
    Use one of the "WordList::*" modules.

DESCRIPTION
    "WordList::*" modules are modules that contain, well, lists of words.
    This module, "WordList", serves as a base class and establishes the
    conventions for such modules.

    "WordList" is an alternative interface for Games::Word::Wordlist and
    "Games::Word::Wordlist::*". The main difference is that "WordList::*"
    wordlists are read-only/immutable and the modules are designed to have
    low startup overhead. This makes them more suitable for use in CLI
    scripts, which often only want to pick a word from one or several
    lists.

    Unless you are defining a dynamic wordlist (see below), words (or
    phrases) must be put in the "__DATA__" section, one per line. Putting
    the wordlist in the "__DATA__" section relieves perl from having to
    parse the list while loading the module. To search for words or pick
    some random words from the list, the module also need not slurp the
    whole list into memory (and will not do so unless explicitly
    instructed).

    You must sort your words ASCIIbetically (or by Unicode code point).
    Sorting makes it more convenient to diff different versions of the
    module, and it makes binary search possible. If you use a sort order
    other than ASCIIbetical, you must set the package variable $SORT to
    some true value (say, "frequency").

    There must not be any duplicate entries in the word list.

    Dynamic and non-deterministic wordlist. A dynamic wordlist must set
    the package variable $DYNAMIC to either 1 (deterministic) or 2
    (non-deterministic). A dynamic wordlist does not put the wordlist in
    the DATA section; instead, users rely on "first_word()" +
    "next_word()", or "each_word()", or "all_words()" to get the list. A
    deterministic wordlist returns the same list every time "each_word()"
    or "all_words()" is called.
    A non-deterministic list can return a different list for a different
    "each_word()" or "all_words()" call. See
    WordListRole::Dynamic::FirstNextResetFromEach and
    WordListRole::Dynamic::EachFromFirstNextReset if you want to write a
    dynamic wordlist module. It is possible for a dynamic list to return
    unordered or duplicate entries, but this is not encouraged.

    Parameterized wordlist. When instantiating a wordlist class, the user
    can pass a list of key-value pairs as parameters. Normally only a
    dynamic wordlist would accept parameters. Parameters are defined in
    the %PARAMS package variable. It is a hash with parameter names as
    keys and parameter specifications as values. A parameter specification
    follows the function argument metadata specified in Rinci::function.

DIFFERENCES WITH GAMES::WORD::WORDLIST
    Since this is a new interface that is not backward-compatible with
    Games::Word::Wordlist, I also make some other changes:

    *   Namespace is put outside "Games::"

        Because obviously word lists are not only useful for games.

    *   Interface is simpler

        This is partly due to the list being read-only. The methods
        provided are just:

        -   "pick" (pick one or several random entries)

        -   "word_exists" (check whether a word is in the list)

        -   "each_word" (run code for each entry)

        -   "all_words" (return all the words in a list)

        A couple of other functions might be added, with careful
        consideration.

    *   Namespace is more language-neutral and not English-centric

METHODS
  new
    Usage:

     $wl = WordList::Module->new => obj

    Constructor.

  each_word
    Usage:

     $wl->each_word($code)

    Call $code for each word in the list. The code will receive the word
    as its first argument.

    If $code returns -2, the iteration exits early.

  first_word
    Another way to iterate over the word list is to call "first_word" to
    get the first word, then "next_word" repeatedly until you get "undef".

  next_word
    Get the next word. See "first_word" for more details.

  reset_iterator
    Reset the iterator. Basically, "first_word" is equivalent to
    "reset_iterator" + "next_word".

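    The two iteration styles described above can be sketched as follows.
    This is only an illustration under stated assumptions:
    "My::TinyWordList" is a hypothetical stand-in class that keeps its
    words in an array, not the real "WordList" implementation (which reads
    from the "__DATA__" section). It mimics only the documented behavior
    of "each_word", "first_word", "next_word" and "reset_iterator".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a WordList::* object. It stores the words in an
# array (not in __DATA__) purely to demonstrate the documented interface.
package My::TinyWordList;

sub new {
    my ($class, @words) = @_;
    return bless { words => [@words], pos => 0 }, $class;
}

# Call $code for each word; a return value of -2 stops the iteration early.
sub each_word {
    my ($self, $code) = @_;
    for my $word (@{ $self->{words} }) {
        my $ret = $code->($word);
        last if defined $ret && $ret eq '-2';
    }
    return;
}

# Move the iterator back to the start of the list.
sub reset_iterator {
    my ($self) = @_;
    $self->{pos} = 0;
    return;
}

# Return the next word, or undef once the list is exhausted.
sub next_word {
    my ($self) = @_;
    return undef if $self->{pos} >= @{ $self->{words} };
    return $self->{words}[ $self->{pos}++ ];
}

# Equivalent to reset_iterator followed by next_word.
sub first_word {
    my ($self) = @_;
    $self->reset_iterator;
    return $self->next_word;
}

package main;

my $wl = My::TinyWordList->new(qw(apple banana cherry date));

# Callback style: collect words and stop once "cherry" has been seen.
my @seen;
$wl->each_word(sub {
    my ($word) = @_;
    push @seen, $word;
    return $word eq 'cherry' ? -2 : 0;
});

# Iterator style: first_word, then next_word until undef is returned.
my @iterated;
for (my $w = $wl->first_word; defined $w; $w = $wl->next_word) {
    push @iterated, $w;
}

print "each_word saw: @seen\n";
print "iterator saw:  @iterated\n";
```

    With a real "WordList::*" module installed, the same calls would be
    made on the object returned by that module's constructor instead of
    on the stand-in class.
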
  pick
    Usage:

     $wl->pick([ $n , [ $allow_duplicates ] ]) => list

    Pick $n (default: 1) random word(s) from the list, without duplicates
    (unless $allow_duplicates is set to true). If there are fewer than $n
    words in the list and duplicates are not allowed, only that many words
    will be returned.

    The algorithm used is from perlfaq (perldoc -q "random line"), which
    scans the whole list once (that is, a single "each_word()" pass). The
    original algorithm returns a single entry and is modified here to
    support returning multiple entries.

  word_exists
    Usage:

     $wl->word_exists($word) => bool

    Check whether $word is in the list.

    The algorithm in this implementation is a linear scan (O(n)). Check
    out WordListRole::BinarySearch for an O(log n) implementation, or
    WordListRole::Bloom for an O(1) implementation.

  all_words
    Usage:

     $wl->all_words() => list

    Return all the words in the list, in order. Note that if the wordlist
    is very large, you might want to use "each_word" instead to avoid
    slurping all the words into memory.

HOMEPAGE
    Please visit the project's homepage at .

SOURCE
    Source repository is at .

BUGS
    Please report any bugs or feature requests on the bugtracker website.

    When submitting a bug or request, please include a test-file or a
    patch to an existing test-file that illustrates the bug or desired
    feature.

SEE ALSO
    "WordListRole::*" modules.

    "WordListMod::*" modules.

    "WordList::*" modules.

    Rinci.

AUTHOR
    perlancar

COPYRIGHT AND LICENSE
    This software is copyright (c) 2020, 2018, 2017, 2016 by
    perlancar@cpan.org.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.