Friday, January 14, 2011

Constant-time Set Membership

While scouring the internet yesterday, I came upon an article written by Preston Briggs, currently at NVIDIA and a former office mate in Bellevue. Efficient representation for sparse sets.

This paper considers set representations of objects for which a bijection to the natural numbers exists and provides a novel alternative to bit vectors.

Sets represented by bit vectors are one of the more common approaches [what we use in Ocelot]. Given a universe of size u, bit vectors indicate whether element i is a member of a set by setting bit i providing access times on the order O(u). This representation also requires O(u) time to clear the bit vector initially. If short-lived sets with large u are needed frequently, overheads can be prohibitive. A sparse representation may be desirable, but it should have O(1) access times.

Preston's approach represents a sparse set with two arrays of size u, denoted dense and sparse, and an integer k representing the number of elements in the set. The array dense stores a packed sequence of elements that have been added to the set, and valid indices are 0 ... k-1. The array sparse provides a mapping of natural numbers to possible indices into dense. For example, assume j belongs to the set and is stored in location dense[p]; the following is true: sparse[j] = p.

With two array access, both O(1) we can efficiently determine whether element j is in the set:

def isIn(j, S):
return (S.sparse[j] < S.k) and (S.dense[S.sparse[j]] == j)

That is, if sparse[j] is a valid index (from 0 to k-1) and the object at that index is indeed the element, then it belongs to the set. This technique has the advantage that neither dense nor sparse needs to be cleared initially. To clear the set, simply set k = 0.

Elements are added by:

def insert(S, j):
S.dense[S.k] = j
S.sparse[j] = S.k
S.k += 1

Storage requirements are 2*(u+1) for a set of u integers. I thought this was interesting.

No comments: