The Hackerlab at regexps.com

Arrays

Dynamically allocated arrays are a common feature of C programs. The Hackerlab C library provides two array-like data structures: variable size arrays, and power-of-two size sparse arrays.

Variable size arrays are contiguous regions of memory, similar to memory allocated by malloc. The size of a variable size array, measured as the number of elements it contains, can be retrieved at run-time using the function ar_size.

Power-of-two size sparse arrays are a tree structure, used to represent an array holding a number of elements which is a power of two. Access to these arrays is fast (though slower than access to ordinary arrays), and they are memory efficient for very large arrays which are sparsely populated.


Variable Size Arrays

A variable size array is a dynamically allocated block of memory, similar to a block returned by lim_malloc, except that a variable size array is tagged with its size, measured in the number of array elements.

A null pointer counts as an array of 0 elements. For example, if ar_size, which returns the size of a variable size array, is passed 0, it returns 0. That means there is no special function to allocate a new variable size array; instead, array pointers should be initialized to 0. This example creates an array of ten integers by using ar_ref to reference element 9 (the tenth element, counting from 0):

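     {
       int * the_array;
       int * tenth_element;

       /* A null pointer is an empty array. */
       the_array = 0;

       /* Referencing element 9 grows the array to 10 zero-filled
        * elements.  ar_ref takes a void **, so the address of the
        * array pointer is cast.
        */
       tenth_element = (int *)ar_ref ((void **)&the_array,
                                      lim_use_must_malloc,
                                      9,
                                      sizeof (int));
       *tenth_element = 42;

       /* Release the storage; this also resets the_array to 0. */
       ar_free ((void **)&the_array, lim_use_must_malloc);
     }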

A variable size array can be used as a stack. (See ar_push and ar_pop.)

Array functions use the lim_malloc family of functions to allocate memory. (See Allocation With Limitations.)
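The size tagging described above can be sketched in portable C. This is a minimal illustration under stated assumptions, not the Hackerlab implementation: it assumes the element count lives in a hidden header just before element 0, it calls plain realloc rather than the lim_malloc family, and the names toy_ar_size, toy_ar_ref, and toy_ar_free are hypothetical. It reproduces the documented behavior that a null pointer is an empty array and that growing the array zero-fills the new elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored just before element 0.  The max_align_t
   member pads the header so the elements that follow stay aligned. */
typedef union toy_ar_header
{
  size_t n_elts;       /* number of elements in the array */
  max_align_t align;   /* alignment padding only */
} toy_ar_header;

/* Recover the header from a pointer to element 0. */
static toy_ar_header * toy_header (void * base)
{
  return (toy_ar_header *)base - 1;
}

/* A null pointer counts as an array of 0 elements. */
static size_t toy_ar_size (void * base)
{
  return base ? toy_header (base)->n_elts : 0;
}

/* Return the address of element n (counting from 0), growing the array
   to n+1 zero-filled elements if necessary.  Returns NULL if realloc
   fails, in which case *base is left unchanged. */
static void * toy_ar_ref (void ** base, size_t n, size_t szof)
{
  size_t old_n = toy_ar_size (*base);

  assert (szof > 0);
  if (n >= old_n)
    {
      toy_ar_header * old_h = *base ? toy_header (*base) : NULL;
      toy_ar_header * h = realloc (old_h, sizeof (toy_ar_header) + (n + 1) * szof);

      if (!h)
        return NULL;
      /* Only the newly added elements are zeroed. */
      memset ((char *)(h + 1) + old_n * szof, 0, (n + 1 - old_n) * szof);
      h->n_elts = n + 1;
      *base = h + 1;
    }
  return (char *)*base + n * szof;
}

/* Release the array and reset the caller's pointer to the empty array. */
static void toy_ar_free (void ** base)
{
  if (*base)
    free (toy_header (*base));
  *base = NULL;
}
```

Because toy_ar_ref may call realloc, an element address obtained earlier can become invalid after a later call grows the array. The real ar_ref carries the same caution, since it documents that it may relocate the array.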


Basic Variable Size Array Functions

Function ar_size

int ar_size (void * base,
           alloc_limits limits,
           size_t elt_size);

Return the number of elements in the array. If base == 0, return 0.

limits is the allocation limits associated with this array.

elt_size is the size, in bytes, of one element of the array.



Function ar_ref

void * ar_ref (void ** base,
               alloc_limits limits,
               int n,
               int szof);

Return the address of element n (counting from 0) of an array, expanding the array to n+1 elements if necessary.

base is a pointer to a pointer to the array.

limits is the allocation limits associated with this array.

szof is the size, in bytes, of one element of the array.

If this function adds new elements to an array, those elements are filled with 0 bytes.

This function may resize and relocate the array. If it does, *base is updated to point to the new location of the array.



Function ar_setsize

void ar_setsize (void ** base,
               alloc_limits limits,
               int n,
               size_t szof);

Resize the array so that it contains exactly n elements.

base is a pointer to a pointer to the array.

limits is the allocation limits associated with this array.

szof is the size, in bytes, of one element.

If this function adds new elements to an array, those elements are filled with 0 bytes.

This function can be used to make an array smaller, but doing so does not reclaim any storage. (See ar_compact .)



Function ar_compact

void ar_compact (void ** base,
               alloc_limits limits,
               size_t szof);

Resize an array so that it is only as large as it needs to be.

base is a pointer to a pointer to the array.

limits is the allocation limits associated with this array.

szof is the size, in bytes, of one element.

This function may resize and relocate the array. If it does, *base is updated to point to the new location of the array.

Functions like ar_setsize can be used to make an array smaller, but doing so does not reclaim any storage used by the array and does not move the array in memory.

This function does attempt to reclaim storage (by using lim_realloc ). If the array occupies significantly more memory than needed, this function will move it to a smaller block. If lim_realloc returns 0 , this function has no effect.
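One way to picture the difference between ar_setsize and ar_compact is to track two numbers: the logical size and the allocated capacity. The sketch below is hypothetical (toy_setsize, toy_compact, and the two-field header are illustrative names, and plain realloc stands in for lim_realloc). It shows shrinking with toy_setsize leaving the storage in place, while toy_compact hands the surplus back to the allocator and leaves the array untouched if realloc fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header: logical size and allocated capacity, in elements. */
typedef union toy_hdr
{
  struct { size_t n_elts; size_t capacity; } s;
  max_align_t align;   /* keeps the elements that follow aligned */
} toy_hdr;

static toy_hdr * hdr (void * base) { return (toy_hdr *)base - 1; }
static size_t toy_size (void * base) { return base ? hdr (base)->s.n_elts : 0; }
static size_t toy_capacity (void * base) { return base ? hdr (base)->s.capacity : 0; }

/* Make the array exactly n elements long.  Growing zero-fills the new
   elements; shrinking only lowers the recorded size and keeps the storage.
   Returns 0 on success, -1 if realloc fails. */
static int toy_setsize (void ** base, size_t n, size_t szof)
{
  size_t old_n = toy_size (*base);

  assert (szof > 0);
  if (n > toy_capacity (*base))
    {
      toy_hdr * h = realloc (*base ? hdr (*base) : NULL, sizeof (toy_hdr) + n * szof);

      if (!h)
        return -1;
      h->s.capacity = n;
      *base = h + 1;
    }
  if (n > old_n)
    memset ((char *)*base + old_n * szof, 0, (n - old_n) * szof);
  if (*base)
    hdr (*base)->s.n_elts = n;
  return 0;
}

/* Shrink the allocation to the logical size.  If realloc fails, the
   array is left as it was. */
static void toy_compact (void ** base, size_t szof)
{
  toy_hdr * h;

  if (!*base)
    return;
  h = realloc (hdr (*base), sizeof (toy_hdr) + toy_size (*base) * szof);
  if (!h)
    return;
  h->s.capacity = h->s.n_elts;
  *base = h + 1;
}
```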



Function ar_free

void ar_free (void ** base, alloc_limits limits);

Release storage associated with the array pointed to by *base . Set *base to 0 .

limits is the allocation limits associated with this array.




Variable Sized Arrays as Stacks

Function ar_push

void * ar_push (void ** base,
              alloc_limits limits,
              size_t szof);

Grow the array by one element and return the address of the new element. If the array previously contained n elements, the new element is element n (counting from 0).

base is a pointer to a pointer to the array.

limits is the allocation limits associated with this array.

szof is the size, in bytes, of one element.

The new array element is filled with 0 bytes.

This function may resize and relocate the array. If it does, *base is updated to point to the new location of the array.



Function ar_pop

void * ar_pop (void ** base,
             alloc_limits limits,
             size_t szof);

Return the address of the last element of an array previously containing n elements (element n-1, counting from 0), and resize the array so that it contains exactly n-1 elements.

base is a pointer to a pointer to the array.

limits is the allocation limits associated with this array.

szof is the size, in bytes, of one element.

This function may resize and relocate the array. If it does, *base is updated to point to the new location of the array.
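The stack discipline of ar_push and ar_pop can be modeled with a hypothetical size-tagged array like the one below. The names toy_push and toy_pop are illustrative only, realloc stands in for the lim_malloc family, and the sketch grows the block by exactly one element per push for simplicity; a production implementation would typically over-allocate to avoid a realloc on every push.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header holding the element count before element 0. */
typedef union toy_hdr { size_t n_elts; max_align_t align; } toy_hdr;

static toy_hdr * hdr (void * base) { return (toy_hdr *)base - 1; }
static size_t toy_size (void * base) { return base ? hdr (base)->n_elts : 0; }

/* Append one zero-filled element and return its address (element n of
   an array that held n elements), or NULL if realloc fails. */
static void * toy_push (void ** base, size_t szof)
{
  size_t n = toy_size (*base);
  toy_hdr * h = realloc (*base ? hdr (*base) : NULL, sizeof (toy_hdr) + (n + 1) * szof);

  if (!h)
    return NULL;
  h->n_elts = n + 1;
  *base = h + 1;
  memset ((char *)*base + n * szof, 0, szof);
  return (char *)*base + n * szof;
}

/* Drop the last element and return its address (element n-1 of an array
   that held n elements).  This toy keeps the storage, so the returned
   pointer stays readable until the array is next modified. */
static void * toy_pop (void ** base, size_t szof)
{
  size_t n = toy_size (*base);

  assert (n > 0);   /* popping an empty array is a caller error here */
  hdr (*base)->n_elts = n - 1;
  return (char *)*base + (n - 1) * szof;
}
```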



Function ar_copy

void * ar_copy (void * base,
              alloc_limits limits,
              size_t szof);

Create a new array which is a copy of the array base.

limits is the allocation limits associated with this array.

szof is the size, in bytes, of one element.




Power-of-Two Sparse Arrays

#include <hackerlab/arrays/pow2-array.h>

A pow2_array (power-of-two sparse array) is an array-like data structure. It always holds a number of elements which is some power of two. It provides reasonably fast access to elements (but slower than ordinary arrays). It provides good memory efficiency for sparsely populated arrays.

NOTE: this interface is not yet complete. Some details of the existing interface may change in future releases.


The pow2_array Data Structure

A pow2_array is represented by a tree structure of uniform depth. Leaf elements are ordinary (dynamically allocated) arrays, each leaf having the same number of elements.

So that sparsely populated arrays can be stored efficiently in memory, subtrees which are populated entirely with default values are represented in one of two ways: the root of such a subtree may be represented as a NULL pointer, or it may be represented by a default node. In the latter case, one default node exists for each level of the tree (a default root, a default second-level node, a default leaf node, and so on). Representation by null pointers saves memory because no default nodes are allocated. Representation by default nodes speeds up access in some cases.

For each level of the tree, two values are defined: a shift and a mask. For a given internal node of the tree, the Nth element below that node is stored in the subtree with index:

             (N >> shift) & mask

The index of the same element within that subtree is:

             N & ((1 << shift) - 1)

For leaf nodes, shift is 0 .

The opaque type pow2_array_rules holds the set of shift and mask values which define a tree structure for arrays of some size. The opaque type pow2_array holds a particular array.
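The per-level shift and mask arithmetic can be written as a short loop. In this hypothetical sketch, toy_rules is a plain struct standing in for the opaque pow2_array_rules, listing levels from the root down; each step picks a slot with (N >> shift) & mask and then reduces N to its index within that subtree with N & ((1 << shift) - 1). The eight-element example that follows uses root shift 2 and mask 1, and leaf shift 0 and mask 3.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVELS 8

/* Hypothetical stand-in for pow2_array_rules: one shift and one mask per
   level, root first and leaf last (the leaf level has shift 0). */
typedef struct toy_rules
{
  int n_levels;
  int shift[TOY_MAX_LEVELS];
  size_t mask[TOY_MAX_LEVELS];
} toy_rules;

/* For element N, store in path[i] the slot chosen at level i.  After
   each level, N is reduced to the element's index within the chosen
   subtree. */
static void toy_path (const toy_rules * r, size_t N, size_t * path)
{
  assert (r->n_levels <= TOY_MAX_LEVELS);
  for (int i = 0; i < r->n_levels; ++i)
    {
      path[i] = (N >> r->shift[i]) & r->mask[i];
      N &= ((size_t)1 << r->shift[i]) - 1;
    }
}
```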

Here is a simple example: a sparse array containing 8 elements. (In ordinary use, we would presumably choose a much larger power of two.)

We will define one possible tree structure for this array: a two-level tree with four elements in each leaf. Other structures are possible: we might have defined a two-level structure with two elements in each leaf, or a three-level structure with two elements in each leaf and two subtrees below each internal node.

For the two level tree with four elements per leaf node, we have:

     root:                   shift == 2
                             mask == 1

     leaf nodes:             shift == 0
                             mask == 3

The default leaf node, at address Ld is a four element array:

     Ld:
     ---------------------------- 
     | dflt | dflt | dflt | dflt| 
     ---------------------------- 

The default root node, at address Rd is a two element array:

     Rd:
     -----------
     | Ld | Ld |
     -----------

An array with a non-default value (V) in element 1, but default values everywhere else, might look like:

                     root:
                     ----------------
                     |  leaf  |  Ld |
                     --/----------|--
                      /           |
                     /            |
          /---------/             |
     leaf:                       Ld:
     --------------------------  -----------------------------
     | dflt | V | dflt | dflt |  | dflt | dflt | dflt | dflt |
     --------------------------  -----------------------------

Suppose that elements in this array are of type T . Then, using the shift and mask values given above, the address of element N in that tree is:

     (T *)((char *)root[(N >> 2) & 1] + sizeof (T) * ((N & ((1 << 2) - 1)) & 3))

That is the address returned by the function pow2_array_rref . Note that this address might be in leaf , or it might be in the default leaf Ld .
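For the eight-element example, the read-only lookup can be sketched as follows, with T taken to be int and a hypothetical default value of -1. Writing the offset as arithmetic on a T * lets C scale the index by sizeof (T), which a formula written in terms of char * must do explicitly. The names Ld, DFLT, and toy_rref are illustrative; this is not the code of pow2_array_rref.

```c
#include <assert.h>
#include <stddef.h>

typedef int T;                /* element type for this example */
enum { DFLT = -1 };           /* hypothetical default element value */

/* Default leaf Ld for the eight-element, two-level example. */
static T Ld[4] = { DFLT, DFLT, DFLT, DFLT };

/* Read-only lookup: root shift 2 / mask 1, leaf shift 0 / mask 3.
   Adding the index to a T * scales it by sizeof (T) automatically.
   The result may point into a private leaf or into the shared Ld. */
static const T * toy_rref (T * const root[2], size_t N)
{
  T * leaf;

  assert (N < 8);
  leaf = root[(N >> 2) & 1];
  return leaf + ((N & ((1u << 2) - 1)) & 3);
}
```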

When modifying a particular element, it is important not to modify the default leaf, so a copy-on-write strategy is used. For example, before element 7 is modified, the tree is rewritten so that the second root slot points to a private copy of the default leaf:

                     root:
                     --------------------
                     |  leaf0  |  leaf1 |
                     --/------------/----
                      /            /
                     /            /
          /---------/            /
    leaf0:                      leaf1:
    --------------------------  -----------------------------
    | dflt | V | dflt | dflt |  | dflt | dflt | dflt | dflt |
    --------------------------  -----------------------------

The function pow2_array_ref performs that copy-on-write operation and then returns an element address similarly to pow2_array_rref .
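The copy-on-write step can be sketched for the same hypothetical eight-element example: before returning a writable address, toy_ref checks whether the covering root slot still points at the shared default leaf and, if so, replaces it with a private malloc'd copy. This illustrates the strategy rather than reproducing the library's code; it covers only the leaf level of a two-level tree, and the names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int T;                /* element type for this example */
enum { DFLT = -1 };           /* hypothetical default element value */

static T Ld[4] = { DFLT, DFLT, DFLT, DFLT };   /* shared default leaf */

/* Writable lookup for the two-level, eight-element example.  If the
   covering root slot still points at Ld, install a private copy first
   so that Ld itself is never written.  Returns NULL if malloc fails. */
static T * toy_ref (T * root[2], size_t N)
{
  size_t slot;

  assert (N < 8);
  slot = (N >> 2) & 1;
  if (root[slot] == Ld)
    {
      T * copy = malloc (sizeof Ld);

      if (!copy)
        return NULL;
      memcpy (copy, Ld, sizeof Ld);
      root[slot] = copy;
    }
  return root[slot] + (N & 3);
}
```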

If the default value for elements is a region of memory filled with 0 bytes, a tree can be represented without using default nodes. For example, the array with a non-default value only in element 1 would be represented:

                     root:
                     ----------------
                     |  leaf  |  0  |
                     --/-------------
                      /           
                     /            
          /---------/             
     leaf:                       
     -----------------  
     | 0 | V | 0 | 0 |  
     -----------------  

A tree of this variety is created by not specifying a default leaf node (passing 0 as default_page) when calling make_pow2_array_rules. pow2_array_rref returns a NULL pointer if it is used to access an element which is not currently in such a tree.
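Without default nodes, a null subtree stands for "all zero bytes". The hypothetical pair below (toy_rref0 and toy_ref0, for the same two-level, eight-element shape) shows the resulting contract: the read-only lookup yields a null pointer for an unpopulated element, while the writable lookup allocates a zero-filled leaf with calloc on first use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int T;   /* element type; the default value is all-zero bytes */

/* Read-only lookup: a null leaf means every element in it is zero, so
   there is no address to return. */
static const T * toy_rref0 (T * root[2], size_t N)
{
  T * leaf;

  assert (N < 8);
  leaf = root[(N >> 2) & 1];
  return leaf ? leaf + (N & 3) : NULL;
}

/* Writable lookup: allocate a zero-filled leaf on first write.
   Returns NULL if calloc fails. */
static T * toy_ref0 (T * root[2], size_t N)
{
  size_t slot;

  assert (N < 8);
  slot = (N >> 2) & 1;
  if (!root[slot])
    root[slot] = calloc (4, sizeof (T));
  return root[slot] ? root[slot] + (N & 3) : NULL;
}
```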

The function pow2_array_compact compresses the representation of a sparse array by eliminating identical subtrees. For example, after calling pow2_array_compact on an array which holds V in elements 1 and 5 and default values everywhere else, the tree would look like:

                     root:
                     -------------------
                     |  leaf  |  leaf |
                     -------\-------/---
                             \     /
                              \   /
                               \ /
                                leaf:
                                --------------------------
                                | dflt | V | dflt | dflt |
                                --------------------------

After calling pow2_array_compact, it is no longer safe to call pow2_array_ref for the same array; pow2_array_rref remains safe. pow2_array_compact is useful in combination with pow2_array_print.
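Eliminating identical subtrees can be sketched for a single root level by comparing leaves byte for byte and redirecting duplicates to the first match. This hypothetical toy_pow2_compact assumes that, on entry, every root slot points to its own separately malloc'd leaf (so a duplicate can be freed safely); how pow2_array_compact actually manages memory is not described here. The test also illustrates, for this toy, why writable access becomes unsafe afterward: both root slots share one leaf, so a write meant for element 1 also changes element 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int T;

/* Redirect each root slot whose leaf holds the same bytes as an earlier
   slot's leaf to that earlier leaf, freeing the duplicate.  Assumes every
   slot initially points to its own separately malloc'd leaf. */
static void toy_pow2_compact (T * root[], size_t n_slots, size_t leaf_len)
{
  assert (leaf_len > 0);
  for (size_t i = 1; i < n_slots; ++i)
    for (size_t j = 0; j < i; ++j)
      if (memcmp (root[j], root[i], leaf_len * sizeof (T)) == 0)
        {
          free (root[i]);
          root[i] = root[j];
          break;
        }
}
```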


Allocating Power-of-Two Sparse Arrays

Function make_pow2_array_rules

pow2_array_rules make_pow2_array_rules (alloc_limits limits,
                                        size_t elt_size,
                                        void * default_page,
                                        int shift,
                                        size_t mask,
                                        ...);

Return the pow2_array_rules which defines the tree structure for a particular type of sparse array.

limits is used when allocating the pow2_array_rules and default nodes. See Allocation With Limitations.

elt_size is the size, in bytes, of individual elements.

default_page is either 0 or a default leaf node.

The remaining arguments are a series of shift and mask pairs, ending with a pair in which shift is 0 .

See The pow2_array Data Structure for more information about default leaf nodes, shifts, and masks.

If allocation fails, this function returns 0.
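The variable argument list of make_pow2_array_rules is terminated by the pair whose shift is 0. A hypothetical collector for such a list might look like this. It assumes each additional shift is read as an int and each additional mask as a size_t (matching the types of the named shift and mask parameters); that is an assumption about the real function rather than documented behavior, and toy_levels and toy_collect_levels are illustrative names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

#define TOY_MAX_LEVELS 16

/* Hypothetical plain-struct view of a rules object; not pow2_array_rules. */
typedef struct toy_levels
{
  int n_levels;
  int shift[TOY_MAX_LEVELS];
  size_t mask[TOY_MAX_LEVELS];
} toy_levels;

/* Record shift/mask pairs, root first, stopping after the pair whose
   shift is 0 (the leaf level).  Returns 0 on success, -1 if there are
   too many levels. */
static int toy_collect_levels (toy_levels * out, int shift, size_t mask, ...)
{
  va_list ap;
  int s = shift;
  size_t m = mask;
  int result = 0;

  assert (out != NULL);
  out->n_levels = 0;
  va_start (ap, mask);
  for (;;)
    {
      if (out->n_levels == TOY_MAX_LEVELS)
        {
          result = -1;
          break;
        }
      out->shift[out->n_levels] = s;
      out->mask[out->n_levels] = m;
      out->n_levels++;
      if (s == 0)
        break;                      /* leaf level reached */
      s = va_arg (ap, int);         /* assumed: shifts passed as int */
      m = va_arg (ap, size_t);      /* assumed: masks passed as size_t */
    }
  va_end (ap);
  return result;
}
```

Under that assumption, callers should cast variadic mask arguments, as in toy_collect_levels (&lv, 2, (size_t)1, 0, (size_t)3) for the two-level, eight-element example: an unadorned 3 in a variadic call is passed as an int, and reading it back as a size_t is undefined where the two types differ in size.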



Function pow2_array_alloc

pow2_array pow2_array_alloc (alloc_limits limits,
                             pow2_array_rules rules);

Allocate a sparse array.

limits is used when allocating the array. It is also used by pow2_array_ref when allocating nodes within the array. See Allocation With Limitations.

rules defines the tree structure for the array and should be an object returned by make_pow2_array_rules .

If allocation fails, this function returns 0.




Accessing Elements in Sparse Arrays

Function pow2_array_rref

void * pow2_array_rref (pow2_array array, size_t addr);

Return the address of element addr within array.

The value pointed to by this address should not be modified.

If the element has never been modified, and no default leaf node was passed to make_pow2_array_rules , this function returns 0 .



Function pow2_array_ref

void * pow2_array_ref (pow2_array array, size_t addr);

Return the address of element addr within array.

The value pointed to by this address may be modified.

This function might allocate memory if the element has not previously been modified. If allocation fails, this function returns 0 .



libhackerlab: The Hackerlab C Library