regexps.com
Hash values and hash tables are useful in many programs. The Hackerlab C Library provides:
hash utilities -- functions for computing hash values in common situations. Hash functions for strings can be found elsewhere. (For strings of 8-bit characters, see Computing Hash Values From Strings. For Unicode strings, see XREF.)
an md5 implementation -- functions to compute an md5 digest of an arbitrary string of bytes.
hash tables -- a general-purpose implementation of variably sized hash tables.
#include <hackerlab/hash/hash-utils.h>
The functions in this section provide tools useful for computing hash values.
unsigned long hash_ul (unsigned long n);
Generate a hash value from an integer.
This function is slow, but attempts to give a good distribution of hash values even for a series of n which are not particularly random. "Slow" means that the function does roughly 3 * sizeof (n) array look-ups and lots of bit twiddling.
unsigned long hash_pointers (void * elts, size_t n_elts);
Compute a hash value from an array of pointers.
This function is slow, but attempts to give a good distribution of hash values even for a series of pointers which are not particularly random. Usually, pointers are not particularly random.
"Slow" means that the function does roughly 3 * sizeof (n) array look-ups and lots of bit twiddling, per pointer.
unsigned long hash_mem (t_uchar * elts, size_t n_bytes);
Compute a hash value from an array of bytes.
This function is slow, but attempts to give a good distribution of hash values even for a series of bytes which are not particularly random.
"Slow" means that the function does roughly 3 * sizeof (n) array look-ups and lots of bit twiddling, per sizeof (unsigned long) bytes.
The md5 routines allow you to compute an MD5 message digest according to the definition of MD5 in RFC 1321 (April 1992).
md5_context_t make_md5_context (alloc_limits limits);
Allocate and initialize an object which will keep track of the state of an md5 digest computation.
void md5_context_reset (md5_context_t ctx);
Reinitialize an md5 state object. This will undo the effects of any previous calls to md5_scan.
void free_md5_context (alloc_limits limits, md5_context_t ctx);
Free all resources associated with an md5 state object.
void md5_scan (md5_context_t hd, t_uchar * inbuf, size_t inlen);
Scan the next inlen bytes of inbuf, treating them as subsequent bytes in a message for which we are computing an md5 digest.
This function may be called repeatedly on sequential "bursts" of a total message.
void md5_final (t_uchar * result, md5_context_t state);
Declare that a complete message has been scanned using state and md5_scan.
Return the 16-byte md5 digest in result, which must point to storage for at least 16 bytes.
As a side-effect, state is reinitialized and may be used again with md5_scan to process a new message.
The hashtree library implements in-core hash tables that are automatically dynamically resized. Callers are given complete control over memory allocation of hash table data structures.
A hash tree is an associative data structure mapping keys to values. Both keys and values are stored as pointers of type void * and may point to any type of value.
When creating a new hash table, programs specify how parts of the hash table will be allocated and freed. For example, programs might simply use malloc and free, or they might use allocation from a size-limited pool, or they might arrange so that parts of the hashtree don't have to be allocated at all because they are stored in hashtree keys.
Programs also specify, in advance, how keys are compared for equality.
Programs do not specify in advance how hash values are computed for keys. Instead, when storing or looking up keys in a hash table, programs first compute a hash value for the key and then call either hashtree_find or hashtree_store. Hash values are unsigned long integers.
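As an illustration of this calling pattern, the sketch below computes an unsigned long hash value for a string key before the table is consulted. The function example_string_hash is hypothetical and uses 32-bit FNV-1a purely as a stand-in; it is not part of the Hackerlab library, whose own string hash functions would normally be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in string hash (32-bit FNV-1a).  NOT a Hackerlab function:
 * any function that maps equal keys to equal values would do here. */
static unsigned long
example_string_hash (const char * s)
{
  uint32_t h = 2166136261u;     /* FNV offset basis */

  assert (s != NULL);
  for (; *s; ++s)
    {
      h ^= (unsigned char)*s;   /* mix in one byte */
      h *= 16777619u;           /* FNV prime, wraps modulo 2^32 */
    }
  return (unsigned long)h;
}

/* A caller would then pass the hash value together with the key:
 *
 *   unsigned long hash = example_string_hash (name);
 *   struct hashtree_item * item = hashtree_store (table, hash, name, rules);
 */
```

The only requirement implied by the text is that the same key always yields the same hash value, and that keys which compare equal under the table's equality function hash identically.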
Internally, hashtrees are stored as trees. Each tree node has 16 children. The children of leaf nodes are lists of key/value pairs. At each level of the tree, four bits from the hash value are used to select a child. The minimum depth of the tree is two: a root node at level 0, lists of key/value pairs at level 1. The maximum depth of the tree is:
(2 * sizeof (unsigned long))
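The four-bits-per-level selection can be sketched as follows. Which four bits the library consumes first is not stated here, so starting from the low-order bits is an assumption, and the names max_depth and child_index are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum tree depth, as given in the text above. */
static size_t
max_depth (void)
{
  return 2 * sizeof (unsigned long);
}

/* Hypothetical helper: select one of 16 children at `level' by
 * extracting four bits of the hash value.
 * ASSUMPTION: level 0 uses the low-order four bits. */
static unsigned int
child_index (unsigned long hash, size_t level)
{
  assert (level < max_depth ());
  return (unsigned int)((hash >> (4 * level)) & 0xfUL);
}
```

Under this assumption, a hash value of 0x2A5 selects children 5, 10 and 2 at levels 0, 1 and 2. On platforms with 8-bit bytes, 2 * sizeof (unsigned long) is exactly the number of four-bit groups in a hash value.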
If a list of key/value pairs grows too long (more than 5 elements), and the node containing that list is not a maximum-depth node, the node overflows. When overflow occurs, the leaf node is made an internal node with 16 subtrees. Each of those subtrees is a new leaf node. All of the key/value pairs from the overflowing node are redistributed among the new leaf nodes.
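The overflow rule described above can be illustrated with a small toy tree. This is a sketch under stated assumptions, not the library's implementation: the toy_ names are hypothetical, pairs carry only a hash value, low-order bits are used first, and the deepest node level is assumed to be the one that consumes the last four bits of the hash.

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT    16  /* children per node, from the text */
#define MAX_LIST  5   /* a longer list triggers overflow */
#define MAX_LEVEL (2 * sizeof (unsigned long) - 1)  /* ASSUMPTION */

struct toy_pair
{
  unsigned long hash;
  struct toy_pair * next;
};

struct toy_node
{
  int is_leaf;
  struct toy_pair * lists[FANOUT];  /* used while is_leaf != 0 */
  struct toy_node * kids[FANOUT];   /* used once is_leaf == 0 */
};

static unsigned int
toy_nibble (unsigned long hash, size_t level)
{
  return (unsigned int)((hash >> (4 * level)) & 0xfUL);
}

static struct toy_node *
toy_new_leaf (void)
{
  struct toy_node * n = calloc (1, sizeof (*n));
  if (!n)
    abort ();  /* the sketch gives up on allocation failure */
  n->is_leaf = 1;
  return n;
}

static void toy_insert (struct toy_node * n, size_t level, struct toy_pair * p);

/* Make leaf `n' an internal node with 16 new leaves, then
 * redistribute every pair it held among those leaves. */
static void
toy_overflow (struct toy_node * n, size_t level)
{
  int i;

  n->is_leaf = 0;
  for (i = 0; i < FANOUT; ++i)
    n->kids[i] = toy_new_leaf ();
  for (i = 0; i < FANOUT; ++i)
    while (n->lists[i])
      {
        struct toy_pair * p = n->lists[i];
        n->lists[i] = p->next;
        toy_insert (n, level, p);
      }
}

/* Walk down to a leaf, push the pair on the selected list, and
 * overflow the leaf if that list became too long. */
static void
toy_insert (struct toy_node * n, size_t level, struct toy_pair * p)
{
  unsigned int slot;
  size_t len = 0;
  struct toy_pair * q;

  assert (n && p);
  while (!n->is_leaf)
    {
      n = n->kids[toy_nibble (p->hash, level)];
      ++level;
    }
  slot = toy_nibble (p->hash, level);
  p->next = n->lists[slot];
  n->lists[slot] = p;
  for (q = n->lists[slot]; q; q = q->next)
    ++len;
  if (len > MAX_LIST && level < MAX_LEVEL)
    toy_overflow (n, level);
}
```

Inserting six pairs whose hash values share the same low four bits fills one list of the root past the limit; the root then becomes an internal node and the six pairs land in a single new leaf, spread across its lists by their next four bits.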
This style of hash tree makes reasonably efficient use of memory for both small and moderately large numbers of key/value pairs. Access times are determined by the depth of the tree and are usually limited by the maximum depth. In the exceptional circumstance that a maximum-depth node contains a very large number of key/value pairs (indicating a poor distribution of hash values), access times for keys in that node grow linearly with the number of keys in that node.
The cost of adding an element (possibly causing overflow) is most commonly the same as the cost of adding an element without overflow. The cost of overflow is determined by the number of key/value pairs that must be redistributed. That number is limited by the number of key/value pairs in a leaf node at not-maximal depth (64 == 16 lists of key/value pairs * 4 pairs per list). Once again, in the exceptional circumstance of a maximum depth node with many key/value pairs, the cost of inserting a key grows linearly with the number of keys in that node.
WARNINGS:
Hashtrees are not a panacea.
If you know in advance roughly how many key/value pairs a hash table will contain, you can obtain better memory use and better access times by using a flat, fixed-size hash table.
Hashtrees are most useful when the number of key/value pairs may vary over a wide range, when it is important that tables with only a few key/value pairs remain small, and when the cost of adding an element (possibly causing overflow) must remain small.
Hashtrees are also a reasonable default choice when good hash-table performance is desirable but optimal performance is not necessary. In most situations, hashtrees will give at least good performance.
struct hashtree;
A struct hashtree represents a hash table. It is an opaque structure that should be allocated by hashtree_new or else initialized by being filled with 0 bytes.
struct hashtree_item;
A struct hashtree_item represents one key-value pair stored in a hash table. It contains (at least) the fields:
void * key;
void * binding;
It is safe to modify either field, but modifications must not change either the hash value of the key or its equality to other keys.
struct hashtree_rules;
A struct hashtree_rules contains function pointers and allocation limits.
It has these fields, in this order:
// a function to compare keys:
hashtree_eq_fn eq;
// a function to allocate tree nodes:
hashtree_alloc_fn hash_alloc;
// a function to free tree nodes:
free_hashtree_fn free_hash;
// a function to allocate key/value pairs:
hashtree_alloc_item_fn hash_item_alloc;
// a function to free key/value pairs:
free_hashtree_item_fn free_hash_item;
// allocation limits that apply to hash tables
// using these rules:
alloc_limits limits;
typedef int (*hashtree_eq_fn) (void * key1, void * key2, struct hashtree_rules * rules);
hashtree_eq_fn compares two keys for equality. It returns 1 if they are equal, 0 otherwise.
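For keys that are not unique objects, a custom comparison is needed. The sketch below shows one possible hashtree_eq_fn for keys that point to NUL-terminated strings. The name string_key_eq is hypothetical, and struct hashtree_rules is only forward-declared here so that the fragment stands alone; in a real program the declaration would come from the library header.

```c
#include <assert.h>
#include <string.h>

struct hashtree_rules;  /* opaque in this sketch */

/* Hypothetical comparison function for string keys.  Returns 1 if
 * the two strings have the same contents, 0 otherwise. */
static int
string_key_eq (void * key1, void * key2, struct hashtree_rules * rules)
{
  (void)rules;  /* not needed by this comparison */
  assert (key1 && key2);
  return strcmp ((const char *)key1, (const char *)key2) == 0;
}
```

With a comparison like this, two distinct buffers holding the same text are treated as the same key, so the hash values computed for them must also agree.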
typedef struct hashtree * (*hashtree_alloc_fn) (struct hashtree_rules * rules);
hashtree_alloc_fn allocates a new hash table tree node. Hash tables are nested to form trees; this function allocates one node of such a tree. This function may return 0 if allocation fails.
typedef void (*free_hashtree_fn) (struct hashtree * node, struct hashtree_rules * rules);
free_hashtree_fn frees an empty hash table tree node.
typedef struct hashtree_item * (*hashtree_alloc_item_fn) (void * key, struct hashtree_rules * rules);
hashtree_alloc_item_fn allocates a hash table item (key/value pair) for the indicated key. It may return 0 if allocation fails.
hashtree_alloc_item_fn must fill in the key and binding fields of the struct hashtree_item that it returns. Typically, the field key is set equal to the parameter key, and the field binding is initialized to 0, indicating that the key initially has no binding.
Sometimes, if a key will be stored in at most one hashtree, a useful optimization is to store a struct hashtree_item within each key. In that case, hashtree_alloc_item_fn doesn't have to allocate memory at all: it can return a pointer to the struct hashtree_item in the key.
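The embedded-item optimization might look like the following sketch. The key type struct symbol and the functions symbol_alloc_item and symbol_free_item are hypothetical. The struct hashtree_item defined here is a local stand-in containing only the two documented fields, so that the fragment stands alone; the real structure may contain more and would come from the library header.

```c
#include <assert.h>
#include <stddef.h>

struct hashtree_rules;  /* opaque in this sketch */

/* Local stand-in with only the documented fields.
 * ASSUMPTION: the library's own definition may hold more. */
struct hashtree_item
{
  void * key;
  void * binding;
};

/* Hypothetical key type that carries its own hash table item. */
struct symbol
{
  const char * name;
  struct hashtree_item item;
};

/* Item "allocator" that allocates nothing: it fills in and returns
 * the item stored inside the key itself. */
static struct hashtree_item *
symbol_alloc_item (void * key, struct hashtree_rules * rules)
{
  struct symbol * sym = (struct symbol *)key;

  (void)rules;
  assert (sym != NULL);
  sym->item.key = key;
  sym->item.binding = 0;  /* no binding yet */
  return &sym->item;
}

/* Matching "free" function: the storage belongs to the key, so
 * there is nothing to release. */
static void
symbol_free_item (struct hashtree_item * item, struct hashtree_rules * rules)
{
  (void)item;
  (void)rules;
}
```

Because each key holds exactly one item, this arrangement only works when a key is stored in at most one hashtree at a time, as the text notes.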
typedef void (*free_hashtree_item_fn) (struct hashtree_item * node, struct hashtree_rules * rules);
free_hashtree_item_fn frees a hash table item (key/value pair).
The field limits points to allocation limits which are used by the default implementations of these functions. By convention, non-default implementations should also use limits when performing allocations.
Defaults are provided for the functions in a struct hashtree_rules. If any particular function pointer is 0 or if the struct hashtree_rules * passed to hashtree_alloc is 0, the default implementations are used.
The default functions perform allocations by calling lim_malloc with the allocation limits stored in the field limits. (See Allocation With Limitations.) If limits is 0, or if the struct hashtree_rules * pointer is 0, the allocation limits lim_use_must_malloc are used. In that case, if allocation fails, the process exits by calling panic.
The default implementation of eq compares pointers to keys using ==.
struct hashtree_rules * hashtree_make_rules (alloc_limits limits);
Allocate a new hashtree_rules structure. (See hashtree_rules.)
The limits field of the allocated structure is initialized to limits.
The function pointers in the allocated structure point to default implementations. Those implementations will use limits when performing allocations.
If an allocation failure occurs, this function returns 0.
void hashtree_init_rules (struct hashtree_rules * rules, hashtree_eq_fn eq, hashtree_alloc_fn hash_alloc, free_hashtree_fn free_hash, hashtree_alloc_item_fn hash_item_alloc, free_hashtree_item_fn free_hash_item, alloc_limits limits);
Initialize a struct hashtree_rules structure.
It is also possible to initialize a rules structure by filling in the fields directly. The advantage of using hashtree_init_rules is that all fields must be specified -- omitting a field will cause a compilation error.
void hashtree_free_limit_rules (struct hashtree_rules * rules);
Free storage of a struct hashtree_rules allocated by hashtree_make_rules.
struct hashtree * hashtree_new (struct hashtree_rules * rules);
Allocate a new hash table.
rules specifies how memory is allocated for this hash tree, and how keys are compared for equality. rules may be 0, in which case default implementations are used. See Types for Hashtrees.
If an allocation failure occurs, this function returns 0.
void hashtree_free (struct hashtree * it, hashtree_free_data_fn freefn, struct hashtree_rules * rules);
Free all storage allocated to a hash table, including the struct hashtree itself. (See hashtree_free_static.)
The freefn is:
typedef void (*hashtree_free_data_fn) (struct hashtree_item * it, struct hashtree_rules * rules);
When called, it should free any storage associated with the key and binding fields of it.
void hashtree_free_static (struct hashtree * tab, hashtree_free_data_fn freefn, struct hashtree_rules * rules);
Free all storage allocated to a hash table, but not the struct hashtree itself. (See hashtree_free.)
struct hashtree_item * hashtree_find (struct hashtree * table, unsigned long hash, void * key, struct hashtree_rules * rules);
Search for an entry for key in hash table table and return the hash table item for that key (or 0). hash is the hash value for key.
struct hashtree_item * hashtree_store (struct hashtree * table, unsigned long hash, void * key, struct hashtree_rules * rules);
Ensure that there is an entry for key in hash table table and return the hash table item for that key. hash is the hash value for key.
If key is not already present in table, this function uses rules->hash_item_alloc to create a new key/value pair.
If an allocation failure occurs, this function returns 0.
void hashtree_delete (struct hashtree_item * it, struct hashtree_rules * rules);
Remove hash table item it from its hash table.
To remove a particular key, use hashtree_find to retrieve its key/value pair, and hashtree_delete to remove that pair from the tree.
This function does not free any storage associated with the key or the binding.
This function does call rules->free_hash_item to free it.