author: | Pierre Nicodème |
---|---|
title: | q-gram analysis and urn models |
keywords: | Sequence comparison, Bernoulli model, urn models |
abstract: | Words of fixed size q are commonly referred to as q-grams. We consider the problem of q-gram filtration, a method commonly used to speed up sequence comparison. We are interested in the statistics of the number of q-grams common to two random texts (where multiplicities are not counted) in the non-uniform Bernoulli model. In the exact and dependent model, when omitting border effects, a q-gram in a random sequence depends on the q-1 preceding q-grams. In an approximate and independent model, we draw a q-gram at random at each position, independently of the other positions. Using ball and urn models, we analyze the independent model. Numerical simulations show that this model is an excellent first-order approximation to the dependent model. We provide an algorithm to compute the moments. |
reference: | Pierre Nicodème (2003), q-gram analysis and urn models, in Discrete Random Walks, DRW'03, Cyril Banderier and Christian Krattenthaler (eds.), Discrete Mathematics and Theoretical Computer Science Proceedings AC, pp. 243-258 |
bibtex: | For a corresponding BibTeX entry, please consider our BibTeX-file. |
ps.gz-source: | dmAC0124.ps.gz (80 K) |
ps-source: | dmAC0124.ps (248 K) |
pdf-source: | dmAC0124.pdf (216 K) |
The first source gives you the gzipped PostScript, the second the plain PostScript, and the third the PDF format for the Adobe Acrobat Reader. Depending on the installation of your web browser, at least one of these should (after some amount of time) pop up a window for you that shows the full article. If this is not the case, you should contact your system administrator to install your browser correctly.
Due to limitations of your local software, the PostScript and PDF formats may show up differently on your screen. If, e.g., you use xpdf to view the PDF, some of the graphics in the file may not come across. On the other hand, PDF can provide links to sections, bibliography and external references that will not appear with PostScript.
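
The two models described in the abstract can be compared numerically with a small simulation. The following is a minimal sketch under stated assumptions, not the algorithm of the paper: the function names, the example letter distribution, and the parameters `n` and `m` (numbers of q-gram positions in the two texts) are all hypothetical. For the independent model, the expected number of distinct common q-grams follows from linearity of expectation, since a word of probability p is present among n independent draws with probability 1 - (1 - p)^n. Higher moments, which the paper's algorithm addresses, are not computed here.

```python
import itertools
import random


def qgram_probs(letter_probs, q):
    """Probability of every q-gram when letters are drawn independently
    (non-uniform Bernoulli model). Letters are assumed to be single characters."""
    probs = {}
    for word in itertools.product(letter_probs, repeat=q):
        p = 1.0
        for c in word:
            p *= letter_probs[c]
        probs["".join(word)] = p
    return probs


def expected_common_independent(letter_probs, q, n, m):
    """Expected number of distinct q-grams shared by two texts in the
    independent model, with n and m independent q-gram draws."""
    return sum(
        (1 - (1 - p) ** n) * (1 - (1 - p) ** m)
        for p in qgram_probs(letter_probs, q).values()
    )


def common_independent(letter_probs, q, n, m, rng):
    """One sample of the independent model: each position draws a whole
    q-gram, independently of the other positions."""
    probs = qgram_probs(letter_probs, q)
    words, weights = list(probs), list(probs.values())
    a = set(rng.choices(words, weights=weights, k=n))
    b = set(rng.choices(words, weights=weights, k=m))
    return len(a & b)  # sets ignore multiplicities


def common_dependent(letter_probs, q, n, m, rng):
    """One sample of the dependent model: draw two random texts letter by
    letter and collect their overlapping q-grams."""
    letters = list(letter_probs)
    weights = [letter_probs[c] for c in letters]

    def qgram_set(num_positions):
        # A text with num_positions q-gram positions has num_positions + q - 1 letters.
        text = "".join(rng.choices(letters, weights=weights, k=num_positions + q - 1))
        return {text[i:i + q] for i in range(num_positions)}

    return len(qgram_set(n) & qgram_set(m))


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    letter_probs = {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3}
    q, n, m, trials = 3, 50, 50, 500
    rng = random.Random(0)
    indep = sum(common_independent(letter_probs, q, n, m, rng) for _ in range(trials)) / trials
    dep = sum(common_dependent(letter_probs, q, n, m, rng) for _ in range(trials)) / trials
    print("exact mean, independent model:", expected_common_independent(letter_probs, q, n, m))
    print("simulated mean, independent model:", indep)
    print("simulated mean, dependent model:", dep)
```

Running the script prints the exact mean for the independent model next to the simulated means for both models, which gives a rough feel for how close the two models are on a given letter distribution.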