Initializing an array in O(1), and the Farray library

Tom Herman
10 min read · Dec 8, 2020


Let’s jump in.
You are using an array in your code, and you want to set it all to one value.

I’m going to use a C/C++ syntax here.
The trivial way is:

int A[100];
for (int i = 0; i < 100; i++)
    A[i] = 0;

Very nice and clean. The problem? It takes too much time!
Well, maybe the above code doesn’t, but this one does:

int A[100000000];
for (int i = 0; i < 100000000; i++)
    A[i] = 0;

You are not convinced yet? How about a program that does it in a loop?

int A[100000000];
for (int times = 0; times < 1000000; times++) {
    for (int i = 0; i < 100000000; i++)
        A[i] = 0;
    // some code...
}

Well, O(n) kinda sucks here.

We can allocate a binary array of the same length as the array and set it (and not the array itself) all to zeros. Then, each time we access index i in the array, we first check whether it’s uninitialized (binary_array[i] == 0); if so, we set the flag to 1 and set array[i] to the default value.
Initialization will be about 32 times faster (for 32-bit integers), but it’s still O(n).
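A minimal sketch of this idea for an int array (the LazyArray name and layout are mine, just for illustration):

#include <algorithm>
#include <vector>

struct LazyArray {
    std::vector<int>  data;      // the actual values (left untouched on fill)
    std::vector<bool> written;   // packed flags, ~1 bit per cell
    int def = 0;

    explicit LazyArray(size_t n) : data(n), written(n, false) {}

    // Still O(n), but over n bits instead of n integers.
    void fill(int v) {
        std::fill(written.begin(), written.end(), false);
        def = v;
    }
    int  read(size_t i) const   { return written[i] ? data[i] : def; }
    void write(size_t i, int v) { written[i] = true; data[i] = v; }
};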

We can use memset or std::fill, but both still run in O(n).
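For example, both of these still touch every element (or byte):

int A[100000000];
memset(A, 0, sizeof(A));          // needs <cstring>; byte-wise, O(n)
std::fill(A, A + 100000000, 0);   // needs <algorithm>; element-wise, O(n)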

Let’s aspire for a better time. Can we do it in O(1)?

Did someone say Hash Table?
We can create an empty hash table in O(1) time and read/write from it in amortized O(1). If some index isn’t in the hash table yet, we know it’s uninitialized — so we return the default value.
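In C++, that one-time trick could be sketched with std::unordered_map (DEFAULT and these tiny wrappers are my own names, not a standard API):

#include <unordered_map>

std::unordered_map<size_t, int> table;
const int DEFAULT = 0;

int read(size_t i) {
    auto it = table.find(i);                          // amortized O(1)
    return it == table.end() ? DEFAULT : it->second;  // missing == uninitialized
}

void write(size_t i, int v) { table[i] = v; }         // amortized O(1)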
That looks awesome! But what if we want to do it again?

Well, we can’t. We can’t magically make all the existing entries disappear; that would take about O(number of elements in the hash table) time. It’s an improvement, but far from O(1). So it is good for a one-time initialization, and it’s amortized. It’s not optimal.

Let’s continue with what we’ve found. We can use a fixed-size hash table as a way to “initialize” an array (one time). Each cell in it will be set by default to {0, 0}: the first value is the cell’s data, and the second is the current “time”. curr_time will be initialized to 0.

The read/write/fill (fill sets all cells) operations will be (in pseudo-code):

init() {
    curr_time = 1
    hash_table_init_array(A)    // create the (empty) hash table
    default_value = 0
}

fill() {
    curr_time++    // invalidates every existing entry at once
}

write(i, v) {
    A.set(i, {v, curr_time})    // store the value together with the current time
}

read(i) {
    v, time = A.get(i)
    if (time == curr_time)
        return v    // written since the last fill
    else
        return default_value    // stale or missing entry
}

Each fill() will increment curr_time by 1, so it can be applied about 2^n times (for an n-bit curr_time). That’s much better than one time, but still bounded. (It can be fixed by a single re-initialization of the array A, back to time = 0, but that takes Ө(n) time.)

Moreover, it still has major problems:
* First, it is still amortized O(1), meaning a single read/write can take Ө(n) in the worst case.
* Second, it uses an extra sizeof(curr_time) * n memory, plus the extra memory the hash table takes (probably about another Ө(n)).

The first solution:

The next solution is O(1) worst-time. Yep.

Besides the given array A[n], we’ll need two more arrays, B and C, both of type [some unsigned integer with at least log2(n) bits]. We’ll also save the current default value (def) and one more such unsigned integer — b.

fill(v) {
    b = 0    // everything is now considered uninitialized
    def = v
}

is_inited(i) {    // check if i is chained
    j = B[i]
    if (j < b && C[j] == i)
        return true
    return false
}

init(i) {    // create a chain for i
    C[b] = i
    B[i] = b
    b++
}

read(i) {
    if (is_inited(i))
        return A[i]
    else
        return def
}

write(i, v) {
    if (!is_inited(i))
        init(i)
    A[i] = v
}

The read and write are straightforward.
A cell of index i is considered initialized if B[i] is below b (say B[i] = j < b) and C[j] equals i. So: it is initialized if B and C are chained (below b).

The chained condition can be described as C[B[i]] == i (with B[i] < b), or as B[i] == j && C[j] == i.

Three quick observations:
- After every fill, b is set to 0, so all cells are then uninitialized.
- An already initialized cell will stay initialized until the next fill.
- For every j < b, i = C[j] is chained with B[i] (C[j] is used for the initialized cell of index i).

So all we are left to see is that an uninitialized cell will always return def.
Let’s look at an uninitialized cell with index i.
If B[i] ≥ b, then is_inited(i) returns false — great.
If B[i] < b (let’s say B[i] = j), then C[j] is already chained (as the last observation states), and therefore i’ = C[j] is different from i, because i’ represents an initialized cell while i doesn’t — so they must represent different cells.
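To make it concrete, here is a tiny hypothetical trace (n = 4; the garbage values are arbitrary):

fill(7)      // b = 0, def = 7; A, B, C still hold garbage
write(2, 5)  // not chained, so init(2): C[0] = 2, B[2] = 0, b = 1; then A[2] = 5
read(2)      // B[2] = 0 < b and C[0] == 2 (chained): returns A[2] = 5
read(0)      // say garbage B[0] = 3: 3 >= b, not chained: returns def = 7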

A bit confusing, but it works, and in worst-case constant-time.
You can read more about it (with an example) here.

We can also change the default value simply by setting def (and calling fill if you want to reset already-written cells too).

It still has one big flaw: it takes an extra O(n) space. Specifically, it takes an extra 2n memory words (words with at least log2(n) bits).
Keep in mind that such an O(1) fill operation is best and most needed when the array is big. In that case, having to allocate extra memory of about twice the array’s size is a huge waste of memory.

The better solution:

The next solution is O(1) worst-time, with only O(1) extra space.
The main idea is to take the last solution (the chain idea) and put it all inside the base array A, making the algorithm in-place.

It’s based on the In-Place Initializable Arrays paper.

The array (of type T[n]) is divided into blocks, each divided into two half-blocks of size T[m] for some constant m.
m is chosen such that sizeof(T)*m ≥ sizeof(size_t) (size_t being a type that can hold any index of the array). Each half-block is a union of size_t and T[m].
The algorithm requires an extra memory word — b, and the default value — def.

The b divides the array into two sections:
- UCA: The first b blocks (blocks 0-[b-1])
- WCA: The remaining blocks (blocks b-last_block)

Two blocks (with indices i1, i2) are considered chained iff both:
* one is from the UCA and the other from the WCA
* their first half-blocks form a chain:

blocks[i1].firstHalf.as_size_t == i2
&&
blocks[i2].firstHalf.as_size_t == i1

A UCA block is considered initialized if it’s not chained.
Its data is saved in its two half-blocks’ T[m] arrays.

A WCA block is considered initialized if it is chained.
Its data is saved in the second-half-block of both itself and the chained block.

That’s why, if a block is initialized, there is always a place for its 2*T[m] data. Notice that for every two chained blocks, exactly one is considered initialized.
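In C++ terms, the layout and the chain test might look roughly like this (a simplified sketch, not Farray’s actual internals):

template <typename T, size_t m>
union HalfBlock {
    size_t as_size_t;  // interpreted as an index, for chaining
    T      data[m];    // interpreted as m values of the array
};
// sizeof(T) * m >= sizeof(size_t) guarantees the index fits.

template <typename T, size_t m>
struct Block {
    HalfBlock<T, m> firstHalf, secondHalf;
};

template <typename T, size_t m>
bool chained(const Block<T, m>* blocks, size_t i1, size_t i2, size_t b) {
    bool oneOnEachSide = (i1 < b) != (i2 < b);  // one in UCA, one in WCA
    return oneOnEachSide
        && blocks[i1].firstHalf.as_size_t == i2
        && blocks[i2].firstHalf.as_size_t == i1;
}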

A fill(v) is just setting b to 0, and if necessary changing the def to a new value.

A read(i) is just deciding whether the block is initialized (if not — returning def), and if so, finding where its data is stored and reading it.

A write(i, v) is a bit more complicated: if the block is already initialized, we just need to edit the array in the right spot. Otherwise, we extend the UCA by one block (b++), fix the problems this causes, and use a freed block (after this extension there will always be a UCA block without a purpose) to chain with or hold data for the written block. The written block’s data is first initialized with def, and then v is written to the right spot.
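In rough pseudo-code (all helper names here are made up, and the chain-fixing details are glossed over):

write(i, v) {
    k = block_of(i)
    if (!is_initialized(k)) {
        b++                      // extend the UCA by one block
        fix_broken_chains()      // repair any chain this extension broke
        take_freed_block(k)      // chain with / store data in the freed block
        set_block_to_def(k)      // the block's 2m cells start as def
    }
    data_of(k)[offset_of(i)] = v // now write v in the right spot
}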

More delicate points:

Of course, some implementation details are missing here — what if setting a spot accidentally created a chain? What if the array can’t be divided into equal-sized blocks?

After each change to the memory, if a chain was accidentally created (this can only happen by changing the first half-block of a UCA block), the corresponding newly-chained WCA block is changed so that its first half-block contains its own index.
This breaks the chain without creating new ones, because a chain can’t be formed between a block and itself.
It’s OK to change the WCA block because it’s not initialized and wasn’t chained to any block before; it doesn’t hold any important information yet.

If the array can’t be divided into equally-sized blocks, the last remaining cells that can’t form a block are treated as a normal array: fill sets them all, and read/write are done normally. Complexity-wise this is fine, since this leftover is smaller than a block, which is of constant size.

An Implementation

Sounds awesome! It’s a great result, but it sounds quite hard to implement.
Well, you are not wrong. It is a bit tricky.
That’s why it is so nice that someone (me!) has written an open-source library just for that!

Farray — Fillable Array

As I read the above paper, I thought: wow, it’s a great result (*elaborated more at the end), but I couldn’t find a single implementation of it. Well, I found a Java implementation, but it only worked for integers, wasn’t general enough, and was a side implementation of a bigger project.

I wanted to create something general enough (allowing arrays of any data type) that would be easy to include in existing projects.

That’s why I created Farray — a C++ header-only implementation of this article.
Just download the header file you want, and include it.
That’s it.

It is templated and easy to use.
Here’s a code snippet from the GitHub page:

#include "farray1.hpp"
Farray1<int> A(n, 1); // allocate int[n] and set all to 1's

A[3] = 5;
int x = A[12] + A[19] + A[3]; // reading (1+1+5)

cout << "This must be seven: " << x << endl;

A = 2020; // simple initialization - set all values to 2020

for (int i = 5; i <= 10; i++)
    A[i] = i*i;

for (int i = 3; i <= 12; i++)
    cout << A[i] << " ";

The output will be:

This must be seven: 7
2020 2020 25 36 49 64 81 100 2020 2020

My implementation is thoroughly tested.
The code can be found here, together with the project’s website.

The above Farray1 class implements the 1-extra-bit solution described below (the implementation we just talked about is in the Farray class).

The best solution:

The paper explains how the above solution can be changed into one that needs only 1 extra bit of memory besides the array, and still allows fill, read, and write in O(1) time.

The trick is to expand the half-block size to size_t+size_t+T (increasing m to allow it) and to save b and def in the reserved size_t+T part of the last block. Whenever we need access to them, we access that place.

The key observation here is that the last block’s first-half-block [last size_t+T part] will be overwritten with T[m] values only when the whole array is done initializing.
It is the last part to be written before the array is fully initialized.

The last observation to make here is that after the whole array is initialized, it is in the form of a regular array: read/write can be done the normal way, and the array is fully functional on its own. No need for b and def anymore; they were overwritten by the last write to the array.

To differentiate between the two possible states of the array (fully initialized vs. not fully initialized) we need an extra bit of memory. Thus the 1-bit flag, which is set iff the array is fully initialized. That’s how we can know whether to just read/write as a normal array, or to use the special read/write we implemented (which uses the b and def found in the last block).
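Conceptually, every operation first consults that flag (a sketch; the names are mine, not the library’s real internals):

read(i) {
    if (fully_initialized)   // the single extra bit
        return A[i]          // behave exactly like a plain array
    return chained_read(i)   // the special read, using b and def from the last block
}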

The paper finishes by proving that for an array with no extra memory, the fill operation must take at least Ω(n) time. That’s why this solution is optimal.

Let’s just mention that it is space-optimal, and its time is still O(1), but it is a bit slower than the “better” implementation, simply because its block size is bigger.

The first header file I created was the 1-bit implementation (farray1.hpp), and it is thoroughly tested and complete. It is the first open implementation of this algorithm.

More Information

Check the Farray Website I created for my implementation of this paper. It features explanations, advanced features, the project’s vision & structure, and more.

Hope you learned something new, and you are welcome to use my library.

The reason I implemented it was, first, to show that it’s possible.
The second reason was to make it a write-once, use-many case: it’s complicated and tricky enough to be implemented with random bugs, and it should be simple to use, as it’s basically a variant of the basic array data structure.

The first and second reasons are complete. It was random-tested a lot (see the test folder for more information) and is super easy to use (including a stand-alone header file — no need for any make or other compilation-script change).

The third and last reason: I want it to succeed. I gave this project many hours of my life, much more than I probably should have, and I want it to get known. Please share the O(1) initializable array concept, the results, and the Farray library with two or three of your friends, and help it grow!

Contact me if you had as much fun with this topic as I did, and don’t forget to star the repo if you think it’s worth it.

Written by Tom Herman
CS master's student at the Technion, Israel