The Hackerlab at regexps.com

Hashing


Hash values and hash tables are useful in many programs. The Hackerlab C Library provides:

hash utilities -- functions for computing hash values in common situations. Hash functions for strings can be found elsewhere. (For strings of 8-bit characters, see Computing Hash Values From Strings. For Unicode strings, see XREF.)

an md5 implementation -- functions to compute an md5 digest of an arbitrary string of bytes.

hash tables -- a general-purpose implementation of variably sized hash tables.


Hash Utilities



#include <hackerlab/hash/hash-utils.h>

The functions in this section provide tools useful for computing hash values.

Function hash_ul

unsigned long hash_ul (unsigned long n);

Generate a hash value from an integer.

This function is slow, but attempts to give a good distribution of hash values even for a series of n which are not particularly random.

slow means that the function does roughly 3 * sizeof (n) array look-ups and lots of bit twiddling.



Function hash_pointers

unsigned long hash_pointers (void * elts, size_t n_elts);

Compute a hash value from an array of pointers.

This function is slow, but attempts to give a good distribution of hash values even for a series of pointers which are not particularly random. Usually, pointers are not particularly random.

slow means that the function does roughly 3 * sizeof (n) array look-ups and lots of bit twiddling, per pointer.



Function hash_mem

unsigned long hash_mem (t_uchar * elts, size_t n_bytes);

Compute a hash value from an array of bytes.

This function is slow, but attempts to give a good distribution of hash values even for a series of bytes which are not particularly random.

slow means that the function does roughly 3 * sizeof (n) array look-ups and lots of bit twiddling, per sizeof (unsigned long) bytes.




MD5 Routines


The md5 routines allow you to compute an MD5 message digest according to the definition of MD5 in RFC 1321 (April 1992).

Function make_md5_context

md5_context_t make_md5_context (alloc_limits limits);

Allocate and initialize an object which will keep track of the state of an md5 digest computation.



Function md5_context_reset

void md5_context_reset (md5_context_t ctx);

Reinitialize an md5 state object. This will undo the effects of any previous calls to md5_scan.



Function free_md5_context

void free_md5_context (alloc_limits limits, md5_context_t ctx);

Free all resources associated with an md5 state object.



Function md5_scan

void md5_scan (md5_context_t hd, t_uchar * inbuf, size_t inlen);

Scan the next inlen bytes of inbuf, treating them as subsequent bytes in a message for which we are computing an md5 digest.

This function may be called repeatedly on sequential "bursts" of a total message.



Function md5_final

void md5_final (t_uchar * result, md5_context_t state);

Declare that a complete message has been scanned using state and md5_scan().

Return the 16-byte md5 digest in result, which must point to storage for at least 16 bytes.

As a side-effect, state is reinitialized and may be used again with md5_scan () to process a new message.
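
The digest that md5_final stores in result is 16 raw bytes, not printable text. As a minimal sketch of one way to display it, the helper below converts such a buffer to lowercase hexadecimal. The name digest_to_hex and the macro MD5_DIGEST_BYTES are hypothetical and not part of the library, and the sketch uses plain unsigned char on the assumption that t_uchar is compatible with it.

```c
#include <assert.h>
#include <stddef.h>

/* Size of an md5 digest in bytes, as stated for md5_final. */
#define MD5_DIGEST_BYTES 16

/* Hypothetical helper (not a library function): write the 32-character
 * lowercase hex form of `digest` into `out`, followed by a NUL.
 * `out` must point to storage for at least 33 bytes.
 */
static void
digest_to_hex (const unsigned char digest[MD5_DIGEST_BYTES], char * out)
{
  static const char hex[] = "0123456789abcdef";
  size_t i;

  assert (digest != 0 && out != 0);
  for (i = 0; i < MD5_DIGEST_BYTES; ++i)
    {
      out[2 * i] = hex[digest[i] >> 4];        /* high four bits */
      out[2 * i + 1] = hex[digest[i] & 0x0f];  /* low four bits */
    }
  out[2 * MD5_DIGEST_BYTES] = '\0';
}
```

A caller would typically pass the buffer filled in by md5_final straight to this helper and print the resulting string.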




Hash Table Trees


The hashtree library implements in-core hash tables that are resized dynamically and automatically. Callers are given complete control over memory allocation of hash table data structures.


Hashtree Data Structures



How Hashtrees Work

A hash tree is an associative data structure mapping keys to values. Both keys and values are stored as pointers of type void * and may point to any type of value.

When creating a new hash table, programs specify how parts of the hash table will be allocated and freed. For example, programs might simply use malloc and free, or they might use allocation from a size-limited pool, or they might arrange so that parts of the hashtree don't have to be allocated at all because they are stored in hashtree keys.

Programs also specify, in advance, how keys are compared for equality.

Programs do not specify in advance how hash values are computed for keys. Instead, when storing or looking up keys in a hash table, programs first compute a hash value for the key and then call either hashtree_find or hashtree_store. Hash values are unsigned long integers.

Internally, hashtrees are stored as trees. Each tree node has 16 children. The children of leaf nodes are lists of key/value pairs. At each level of the tree, four bits from the hash value are used to select a child. The minimum depth of the tree is two: a root node at level 0, lists of key/value pairs at level 1. The maximum depth of the tree is:

     (2 * sizeof (unsigned long))
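
The depth limit follows from using four bits per level: an unsigned long with 8-bit bytes holds 2 * sizeof (unsigned long) four-bit groups. The sketch below illustrates how a child could be selected at a given level. The names MAX_DEPTH and child_index are hypothetical, and the bit order (level 0 taking the least significant four bits) is an assumption; the library may consume the bits in a different order.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum tree depth from the text: one level per 4-bit group. */
#define MAX_DEPTH (2 * sizeof (unsigned long))

/* Hypothetical helper: return the child index (0..15) selected by
 * `hash` at `level`.
 * ASSUMPTION: level 0 uses the least significant 4 bits.
 */
static unsigned int
child_index (unsigned long hash, size_t level)
{
  assert (level < MAX_DEPTH);
  return (unsigned int) ((hash >> (4 * level)) & 0xfUL);
}
```

Under that assumption, the hash value 0xABCD selects child 0xD at level 0, child 0xC at level 1, and so on.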

If a list of key/value pairs grows too long (more than 5 elements), and the node containing that list is not a maximum-depth node, the node overflows. When overflow occurs, the leaf node is made an internal node with 16 subtrees. Each of those subtrees is a new leaf node. All of the key/value pairs from the overflowing node are redistributed among the new leaf nodes.
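
To make the redistribution step concrete, the following sketch counts how many pairs from an overflowing node would land in each of the 16 new leaves. It is an illustration only, not the library's code: the names FANOUT and redistribute_counts are hypothetical, and it assumes that the new leaf is chosen by the four bits at position 4 * level, counting from the least significant bit.

```c
#include <assert.h>
#include <stddef.h>

#define FANOUT 16   /* children per node, from the text */

/* Hypothetical helper: for the n hash values of the pairs in an
 * overflowing node, count how many would go to each new leaf.
 * `level` names the 4-bit group that selects among the new leaves.
 * ASSUMPTION: groups are numbered from the least significant bits.
 */
static void
redistribute_counts (const unsigned long * hashes, size_t n,
                     size_t level, size_t counts[FANOUT])
{
  size_t i;

  assert (level < 2 * sizeof (unsigned long));
  for (i = 0; i < FANOUT; ++i)
    counts[i] = 0;
  for (i = 0; i < n; ++i)
    counts[(hashes[i] >> (4 * level)) & 0xfUL] += 1;
}
```

If the hash values are well distributed, the counts come out roughly even, which is why a single overflow is enough to shorten the lists in most cases.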

This style of hash tree makes reasonably efficient use of memory for both small and moderately large numbers of key/value pairs. Access times are determined by the depth of the tree and are usually limited by the maximum depth. In the exceptional circumstance that a maximum-depth node contains a very large number of key/value pairs (indicating a poor distribution of hash values), access times for keys in that node grow linearly with the number of keys in that node.

Adding an element usually does not cause overflow, and in that case costs no more than an ordinary insertion into a list. When overflow does occur, its cost is determined by the number of key/value pairs that must be redistributed. That number is limited by the number of key/value pairs in a leaf node at not-maximal depth (64 == 16 lists of key/value pairs * 4 pairs per list). Once again, in the exceptional circumstance of a maximum depth node with many key/value pairs, the cost of inserting a key grows linearly with the number of keys in that node.

WARNINGS:

Hashtrees are not a panacea.

If you know in advance roughly how many key/value pairs a hash table will contain, you can obtain better memory use and better access times by using a flat, fixed-size hash table.

Hashtrees are most useful when the number of key/value pairs may vary over a wide range, when it is important that tables with only a few key/value pairs remain small, and when the cost of adding an element (possibly causing overflow) must remain small.

Hashtrees are also a reasonable default choice when good hash-table performance is desirable but optimal performance is not necessary. In most situations, hashtrees will give at least good performance.


Types for Hashtrees

Type hashtree

struct hashtree;

A struct hashtree represents a hash table. It is an opaque structure that should be allocated by hashtree_new or else initialized by being filled with 0 bytes.



Type hashtree_item

struct hashtree_item;

A struct hashtree_item represents one key-value pair stored in a hash table. It contains (at least) the fields:

     void * key;
     void * binding;

It is safe to modify either field, but modifications must not change either the hash value of the key or its equality to other keys.



Type hashtree_rules

struct hashtree_rules;

A struct hashtree_rules contains function pointers and allocation limits.

It has these fields, in this order:

  // a function to compare keys:
  hashtree_eq_fn eq;

  // a function to allocate tree nodes:
  hashtree_alloc_fn hash_alloc;

  // a function to free tree nodes:
  free_hashtree_fn free_hash;

  // a function to allocate key/value pairs:
  hashtree_alloc_item_fn hash_item_alloc;

  // a function to free key/value pairs:
  free_hashtree_item_fn free_hash_item;

  // allocation limits that apply to hash tables
  // using these rules:
  alloc_limits limits;

typedef int
(*hashtree_eq_fn) (void * key1,
                   void * key2,
                   struct hashtree_rules * rules);


hashtree_eq_fn compares two keys for equality. It returns 1 if they are equal, 0 otherwise.
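
A custom comparison is needed whenever keys should be equal by content rather than by address. As a minimal sketch, the function below matches the hashtree_eq_fn signature for keys that are NUL-terminated strings. The name string_key_eq is hypothetical, and struct hashtree_rules is only forward-declared here so that the sketch stands alone without the library header.

```c
#include <assert.h>
#include <string.h>

struct hashtree_rules;   /* opaque in this sketch */

/* Hypothetical hashtree_eq_fn for keys that point to C strings:
 * returns 1 if the strings have the same contents, 0 otherwise.
 * The rules argument is unused.
 */
static int
string_key_eq (void * key1, void * key2, struct hashtree_rules * rules)
{
  (void) rules;
  assert (key1 != 0 && key2 != 0);
  return strcmp ((const char *) key1, (const char *) key2) == 0;
}
```

Two keys that compare equal must also be given the same hash value by the caller, since the hash value decides which list is searched.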

typedef struct hashtree * 
(*hashtree_alloc_fn) (struct hashtree_rules * rules);

hashtree_alloc_fn allocates a new hash table tree node. Hash tables are nested to form trees; this function allocates one node of such a tree. This function may return 0 if allocation fails.

typedef void
(*free_hashtree_fn) (struct hashtree * node,
                     struct hashtree_rules * rules);

free_hashtree_fn frees an empty hash table tree node.

typedef struct hashtree_item * 
(*hashtree_alloc_item_fn) (void * key, struct hashtree_rules * rules);

hashtree_alloc_item_fn allocates a hash table item (key/value pair) for the indicated key. It may return 0 if allocation fails.

hashtree_alloc_item_fn must fill in the key and binding fields of the struct hashtree_item that it returns. Typically, the field key is set equal to the parameter key, and the field binding is initialized to 0, indicating that the key initially has no binding.

Sometimes, if a key will be stored in at most one hashtree, a useful optimization is to store a struct hashtree_item within each key. In that case, hashtree_alloc_item_fn doesn't have to allocate memory at all: it can return a pointer to the struct hashtree_item in the key.
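
The optimization described above can be sketched as follows. This is an illustration under stated assumptions, not library code: struct hashtree_item is redefined as a stand-in with only the two documented fields so the example compiles on its own, and struct symbol, symbol_item_alloc, and symbol_item_free are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct hashtree_rules;   /* opaque in this sketch */

/* STAND-IN for the library's struct hashtree_item, reduced to the
 * two documented fields; the real structure may contain more.
 */
struct hashtree_item
{
  void * key;
  void * binding;
};

/* Hypothetical key type that carries its own hash table item. */
struct symbol
{
  const char * name;
  struct hashtree_item item;
};

/* Matches the hashtree_alloc_item_fn signature but allocates nothing:
 * it returns the item embedded in the key.
 */
static struct hashtree_item *
symbol_item_alloc (void * key, struct hashtree_rules * rules)
{
  struct symbol * sym = (struct symbol *) key;

  (void) rules;
  assert (sym != 0);
  sym->item.key = key;
  sym->item.binding = 0;   /* the key initially has no binding */
  return &sym->item;
}

/* Matching free function: the item lives inside the key, so there
 * is nothing to release.
 */
static void
symbol_item_free (struct hashtree_item * item, struct hashtree_rules * rules)
{
  (void) item;
  (void) rules;
}
```

This only works while each key belongs to at most one hashtree, because the embedded item can be linked into only one table at a time.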

typedef void
(*free_hashtree_item_fn) (struct hashtree_item * node,
                          struct hashtree_rules * rules);

free_hashtree_item_fn frees a hash table item (key/value pair).

The field limits points to allocation limits which are used by the default implementations of these functions. By convention, non-default implementations should also use limits when performing allocations.

Defaults are provided for the functions in a struct hashtree_rules. If any particular function pointer is 0 or if the struct hashtree_rules * passed to hashtree_alloc is 0, the default implementations are used.

The default functions perform allocations by calling lim_malloc with the allocation limits stored in the field limits. (See Allocation With Limitations.) If limits is 0, or if the struct hashtree_rules * pointer is 0, the allocation limits lim_use_must_malloc are used. In that case, if allocation fails, the process exits by calling panic.

The default implementation of eq compares pointers to keys using ==.




Creating Hashtree Rules


Function hashtree_make_rules

struct hashtree_rules * hashtree_make_rules (alloc_limits limits);

Allocate a new hashtree_rules structure. (See hashtree_rules.)

The limits field of the allocated structure is initialized to limits.

The function pointers in the allocated structure point to default implementations. Those implementations will use limits when performing allocations.

If an allocation failure occurs, this function returns 0.



Function hashtree_init_rules

void hashtree_init_rules (struct hashtree_rules * rules,
                          hashtree_eq_fn eq,
                          hashtree_alloc_fn hash_alloc,
                          free_hashtree_fn free_hash,
                          hashtree_alloc_item_fn hash_item_alloc,
                          free_hashtree_item_fn free_hash_item,
                          alloc_limits limits);

Initialize a struct hashtree_rules structure.

It is also possible to initialize a rules structure by filling in the fields directly. The advantage of using hashtree_init_rules is that all fields must be specified -- omitting a field will cause a compilation error.



Function hashtree_free_limit_rules

void hashtree_free_limit_rules (struct hashtree_rules * rules);

Free storage of a struct hashtree_rules allocated by hashtree_make_rules.




Hash Tree Allocation


Function hashtree_new

struct hashtree * hashtree_new (struct hashtree_rules * rules);

Allocate a new hash table.

rules specifies how memory is allocated for this hash tree, and how keys are compared for equality. rules may be 0, in which case default implementations are used. See Types for Hashtrees.

If an allocation failure occurs, this function returns 0.



Function hashtree_free

void hashtree_free (struct hashtree * it,
                    hashtree_free_data_fn freefn,
                    struct hashtree_rules * rules);

Free all storage allocated to a hash table, including the struct hashtree itself. (See hashtree_free_static.)

The freefn is:

  typedef void
  (*hashtree_free_data_fn) (struct hashtree_item * it,
                            struct hashtree_rules * rules);

When called, it should free any storage associated with the key and binding fields of it.



Function hashtree_free_static

void hashtree_free_static (struct hashtree * tab,
                           hashtree_free_data_fn freefn,
                           struct hashtree_rules * rules);

Free all storage allocated to a hash table, but not the struct hashtree itself. (See hashtree_free.)




Hash Tree Access


Function hashtree_find

struct hashtree_item * hashtree_find (struct hashtree * table,
                                      unsigned long hash,
                                      void * key,
                                      struct hashtree_rules * rules);

Search for an entry for key in hash table table and return the hash table item for that key, or 0 if there is no such entry. hash is the hash value for key.



Function hashtree_store

struct hashtree_item * hashtree_store (struct hashtree * table,
                                       unsigned long hash,
                                       void * key,
                                       struct hashtree_rules * rules);

Ensure that there is an entry for key in hash table table and return the hash table item for that key. hash is the hash value for key.

If key is not already present in table, this function uses rules->hash_item_alloc to create a new key/value pair.

If an allocation failure occurs, this function returns 0.



Function hashtree_delete

void hashtree_delete (struct hashtree_item * it,
                      struct hashtree_rules * rules);

Remove hash table item it from its hash table.

To remove a particular key, use hashtree_find to retrieve its key/value pair, and hashtree_delete to remove that pair from the tree.

This function does not free any storage associated with the key or the binding.

This function does call rules->free_hash_item to free it.



libhackerlab: The Hackerlab C Library