I encountered the following interview question on the Internet.
Describe a data structure for which getValue(int index), setValue(int index, int value), and setAllValues(int value) are all O(1).
Though array is good enough for the first and second operations to be performed in O(1), what can be proposed for the third (setAllValues)?
How about an array
of tuples {timestamp, value}
, with an additional {timestamp, value}
called all
. Since you only care about the relative time of insertions, you can use a monotonically increasing id
for the values of timestamp:
type Entry {
int timestamp,
int value
}
type structure {
int id
Entry all
Entry[] array
}
Initialise all fields to 0. Then the following should work for you:
setValue(index i, value v):
array[i] = {id++, value}
value getValue(index i)
if(all.timestamp > array[i].timestamp) return all.value
else return array[i].value
setAll(value v)
all = {id++, value}
A problem with this approach is that eventually you'll run out of ids for timestamp, and might wrap around. If you chose a 64 bit value to store timestamps, then this gives you 18,446,744,073,709,551,616 insertions or setAlls before this happens. Depending on the expected use of the datastructure, an O(n) cleanup phase might be appropriate, or you could just throw an exception.
Another issue that might need to be considered is multi-threading. Three obvious problems:
id++
isn't atomic and two threads obtained a new id
at the same time then you could get two entries with the same id.insert()
at the same time as a get()
then you might end up in a position where you've got say {new_id, old_value}
in the array, causing the wrong value to be returned.If any of these are problems, the absolute easiest solution to this is to put "not thread safe" in the documentation (cheating). Alternatively, if you can't implement the methods atomically in your language of choice, you'd need to put some sort of synchronisation locks around them.
current_time()
returning a different value though (otherwise, two timestamps may be equal - Damien_The_Unbeliever 2012-04-04 06:07
{system_timestamp,id}
in place of timestamps to get around this, when I realised that there's no need to involve system timestamps at all. See edit : - Timothy Jones 2012-04-04 06:43
id
is an obvious fix :-) - you might also mention in the "interview" that multi-threading issues may also need to be addressed - Damien_The_Unbeliever 2012-04-04 06:46
"an array of tuples"
? Is it an array with two fields {timestamp, value}
? something like HashMap? (I couldn't understand your approach) - Hengameh 2015-07-23 12:18
setAll(100)
, set the value at Y to anything, then read the value at X, the single boolean approach would give the wrong answer - Timothy Jones 2019-02-19 04:30
I got the same question in one of the technical interviews.
Here is my complete ready-to-use Java-implementation including the test cases.
The key idea is to keep the value of setAll()
in special variable (e.g. joker
) and then trace the change of this value in a proper way.
In order to keep the code clean, some access modifiers have been abolished.
Node class:
import java.time.LocalDate;
class Node {
int value;
LocalDate jokerSetTime;
Node(int val, LocalDate jokSetTime) {
value = val;
jokerSetTime = jokSetTime;
}
}
DS class:
class DS {
Node[] arr;
DS(int len) {
arr = new Node[len];
}
}
DataStructure class:
import java.time.LocalDate;
class DataStructure {
private boolean isJokerChanged;
private Integer joker;
private LocalDate jokerSetTime;
private DS myDS;
DataStructure(int len) {
myDS = new DS(len);
}
Integer get(int i) {
Integer result;
if (myDS.arr.length < i) {
return null;
}
// setAll() has been just called
if (isJokerChanged) {
return joker;
}
if (myDS.arr[i] == null) {
// setAll() has been previously called
if (joker != null) {
result = joker;
} else {
result = null;
}
} else if (myDS.arr[i].jokerSetTime == jokerSetTime) {
// cell value has been overridden after setAll() call
result = myDS.arr[i].value;
} else {
result = joker;
}
return result;
}
void setAll(int val) {
isJokerChanged = true;
joker = val;
jokerSetTime = LocalDate.now();
}
void set(int i, int val) {
isJokerChanged = false;
myDS.arr[i] = new Node(val, jokerSetTime);
}
}
Main class:
class Main {
public static void main(String[] args) {
DataStructure ds = new DataStructure(100);
Integer res;
res = ds.get(3);
ds.set(3, 10);
res = ds.get(3);
ds.setAll(6);
res = ds.get(3);
res = ds.get(15);
ds.set(4, 7);
res = ds.get(4);
res = ds.get(3);
ds.setAll(6);
ds.set(8, 2);
res = ds.get(3);
}
}
Update:
The code has been updated. The previous implementation didn't take into account the case when setAll()
is called twice in a row with the same value and is followed by set(x)
, get(y)
, e.g.: setAll(100)
, set(3, 1)
, setAll(100)
, set(5, 3)
, get(3)
.
The use of timestamp approach has been added to allow distinguishing between different setAll()
calls with identical values.
P.S. This implementation is not a thread safe.
ds.setAll(6); ds.set(3,10); ds.setAll(6); ds.set(4,1); res = ds.get();
It will return the incorrect value of 10, even though setAll(6)
was called after it. The second ds.set()
call is needed to clear the isJokerChanged
variable, which would have superficially defended against this bug otherwise - AnatolyVorobey 2018-07-12 17:29
How about an array of pointers to a single common value? Set the value, and all references will point to the single changed value in O(1)..
I was just asked this question very recently in an Interview. I came up with a hash table implementation. It would solve the problem of running out of the timestamp values but the thread safety feature needs to be implemented (probably using lazy initialization techniques)
Let's say in our class we have a private variable _defaultValue to hold a default value and we also have a private hashtable or dictionary _hashtable. SetAllValues could just set _defaultValue equal to the value passed and _hashtable initialized/set to a new hashtable and discard any reference to the old hash table. SetValue should just add a new value to _hashtable or update the value if the key ( or index) is already present in the _hashtable. GetValue should check whether the key (or index) is present in _hashtable, then return it, else return the value stored in _defaultValue.
This is my first answer on StackOverflow. I am a little lazy in writing up the code. Will probably edit the answer soon.
The interviewer nodded yes to this solution but insisted to implement it without using a hashtable. I guess, he was asking me to give a similar approach as Timothy's answer. And I wasn't able to get it at that moment :(. Anyways, Cheers!
EDIT: Posting the code (in C#)
class MyDataStruct
{
private int _defaultValue;
private Dictionary<int,int> _hashtable;
public MyDataStruct()
{
_defaultValue = 0; // initialize default with some value
_hashtable = new Dictionary<int, int>();
}
public void SetAllValues(int val)
{
_defaultValue = val;
_hashtable = new Dictionary<int, int>();
}
public void SetValue(int index, int val)
{
if (_hashtable.ContainsKey(index))
{
_hashtable.Add(index, val);
}
else
{
_hashtable[index] = val;
}
}
public int GetValue(int index)
{
if (_hashtable.ContainsKey(index))
{
return _hashtable[index];
}
else
{
return _defaultValue;
}
}
}
We can have a variable V which stores an int and an array of containing Tuple as {Value, id}..
And a global int variable G (which will act like identity in SQL and whenever any set or setAll operation is done its value get increment by 1)
initial all Ids and V value will be default say null..
so V = null All Tuple = {null, null}
set(index i, val v) -> G = G+1, arr[i].Val = v, arr[i].Id = G
get(index i) -> if V.Id > arr[i].Id return V.Val else return arr[i]
set-all(val v) -> G= G+1, V.Id= G, V.Val = v
All existing answers use a timestamp that is incremented on every setVal
operation. This is not necessary. In fact, it is necessary only to increment the timestamp on setAll
. Another issue some raised was arithmetic overflow. This can be handled without breaking the constant time bounds by updating a single cell on each setAll
and performing the time comparison carefully.
The basic concept is essentially similar to that of the other answers, but with a twist.
What they do: Store the value used for the last setAll
operation separately, and keep track of the time that operation was performed. Each time they perform a setVal
, they store the current time along with the given value in the array. Each time they perform a getVal
, they compare the time in the given location to the time the last setAll
occurred, and then choose either the value at the location or the setAll
value depending on which is greater.
Why this can be a problem: Suppose the current time overflows and a setAll
operation occurs soon afterward. It will appear as though most of the stored array values are newer than the setAll
value, when they are actually older.
The solution: Stop imagining that we're keeping track of the total amount of time that has passed since data structure initialization. Imagine a giant clock with a "second hand" that ticks not 60 times around the circle but rather 2^n times around the circle. The position of the second hand represents the time of the most recent setAll
operation. Each setVal
operation stores this time along with a value. So if we perform a setAll
when the "clock" is at 45, and then perform six setVal
operations on different elements, the setAll
time and the times at all six locations will be the same. We wish to maintain the following invariant:
The time at a given element location equals the
setAll
time if and only if that element was set withsetVal
more recently than the lastsetAll
operation.
Clearly, the procedure described above automatically ensures that if the element was set recently, then its time will equal the setAll
time. The challenge is to make the reverse implication hold as well.
To be continued ....
I've written this in Haskell because that is the language I know best, but it is not exactly the most natural language for the job.
{-# LANGUAGE BangPatterns #-}
module RepArr where
import Control.Monad.Primitive
import Data.Primitive.MutVar
import qualified Data.Vector.Mutable as V
import Data.Vector.Mutable (MVector)
import Control.Applicative
import Prelude hiding (replicate)
import Control.Monad.ST
import Data.Word
-- The Int in the MutVar is the refresh pointer
data RepArr s a = RepArr (MutVar s (Word, a, Int)) (MVector s (Word,a))
-- Create a fancy array of a given length, initially filled with a given value
replicate :: (PrimMonad m, Applicative m) => Int -> a -> m (RepArr (PrimState m) a)
replicate n a = RepArr <$> newMutVar (0,a,0) <*> V.replicate n (0,a)
getVal :: PrimMonad m => RepArr (PrimState m) a -> Int -> m a
getVal (RepArr allv vec) n = do
(vectime, vecval) <- V.read vec n
(alltime, allval, _) <- readMutVar allv
if (fromIntegral (alltime - vectime) :: Int) > 0
then return allval
else return vecval
setVal :: PrimMonad m => RepArr (PrimState m) a -> Int -> a -> m ()
setVal (RepArr allv vec) i v = do
(!alltime, _, _) <- readMutVar allv
V.write vec i (alltime, v)
setAll :: PrimMonad m => RepArr (PrimState m) a -> a -> m ()
setAll r@(RepArr allv vec) v = do
(oldt, _, oldc) <- readMutVar allv
getVal r oldc >>= setVal r oldc
let !newc = case oldc+1 of
op1 | op1 == V.length vec -> 0
| otherwise -> op1
let !newt = oldt+1
writeMutVar allv (newt, v, newc)
To avoid potential (rare) garbage collection pauses, it's actually necessary to unbox the Int
and Word
values, as well as using unboxed vectors instead of polymorphic ones, but I'm not really in the mood and this is a fairly mechanical task.
Here's a version in C (completely untested):
#include <malloc.h>
struct Pair {
unsigned long time;
void* val;
};
struct RepArr {
unsigned long allT;
void* allV;
long count;
long length;
struct Pair vec[];
};
struct RepArr *replicate (long n, void* val) {
struct RepArr *q = malloc (sizeof (struct RepArr)+n*sizeof (struct Pair));
q->allT = 1;
q->allV = val;
q->count = 0;
q->length = n;
int i;
for (i=0; i<n; i++) {
struct Pair foo = {0,val};
q->vec[i] = foo;
}
return q;
}
void* getVal (struct RepArr *q, long i) {
if ((long)(q->vec[i].time - q->allT) < 0)
return q->allV;
else
return q->vec[i].val;
}
void setVal (struct RepArr *q, long i, void* val) {
q->vec[i].time = q->allT;
q->vec[i].val = val;
}
void setAll (struct RepArr *q, void* val) {
setVal (q, q->count, getVal (q, q->count));
q->allV = val;
q->allT++;
q->count++;
if (q->count == q->length)
q->count = 0;
}
/*
At the time I am writing this, all of the solutions on this page will double (or more) the amount of space required to store the array. The following solution reduces the amount of wasted space from Ω(n) to θ(n/w), where w is the number of bits in a computer "word". On my machine, that's 64.
This prose in this answer is in C comments, so you can copy-and-paste this answer verbatim and compile it with your C compiler.
*/
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
/*
The problem is to support reading and writing values in an array in O(1) time along with bulk writes in which all values in the array are written at once in O(1) time. This is possible using a technique invented by Aho, Hopcroft and Ullman, as far as I know. I will present a version due to Gonzalo Navarro, "Constant-Time Array Initialization in Little Space".
The idea is to keep three metadata arrays along with the data array. We also
keep two integers: unset
, which is the last value used in a bulk write
operation, and size
, an approximation for the number of values that have been
set since the last bulk write. At all times, the number of distinct values
written since the last bulk write is between size
and w * size
.
The three metadata arrays describe information about blocks of w values in the data array. They are:
nth
: nth[i] is the ith unique block to be written to since the last bulk
write
inverse_nth
: inverse_nth[i] is the order in in which the ith block of the
array was written, counting from 0 at the last bulk write.
bitset
: The jth bit of bitset[i]
is 1 when the array cell numbered
64*i + j has been written to since the last bulk write.
bitset[i]
and inverse_nth[i]
are allowed to be invalid if i
is not a
member of the set {nth[0]
, nth[1]
, ... , nth[size-1]
}. In other words,
inverse_nth[i]
and bitset[i]
are valid if and only if inverse_nth[i] < size
and nth[inverse_nth[i]] == i
.
Rather than store three separate arrays of the same length, I chose to store one
array, is_set
, with three fields.
*/
typedef struct {
int nth_;
int inverse_nth_;
uint64_t bitset_;
} IsSetCell;
typedef struct {
int unset_;
int size_;
IsSetCell is_set_[];
} IsSetArray;
typedef struct {
IsSetArray * metadata_;
int payload_[];
} ResettableArray;
/*
To construct an array, we need a default value to return when the reading a value that has never been written to.
*/
ResettableArray * ConstructResettableArray(int n, int unset) {
ResettableArray* result =
malloc(offsetof(ResettableArray, payload_) + n * sizeof(int));
if (!result) return NULL;
n = (n + 63) / 64;
result->metadata_ =
malloc(offsetof(IsSetArray, is_set_) + n * sizeof(IsSetCell));
if (!result->metadata_) {
free(result);
return NULL;
}
result->metadata_->size_ = 0;
result->metadata_->unset_ = unset;
return result;
}
void DestructResettableArray(ResettableArray * a) {
if (a) free(a->metadata_);
free(a);
}
/*
The bulk of the algorithm is in writing and reading the metadata. After
IsSet()
and Set()
are defined (below), reading and writing the arrays is
straightforward.
*/
bool IsSet(const IsSetArray * a, int i);
void Set(IsSetArray * a, int i);
int GetValue(const ResettableArray * a, int i) {
if (!IsSet(a->metadata_, i)) return a->metadata_->unset_;
return a->payload_[i];
}
void SetValue(ResettableArray * a, int i, int v) {
a->payload_[i] = v;
Set(a->metadata_, i);
}
void SetAllValues(ResettableArray * a, int v) {
a->metadata_->unset_ = v;
}
/*
The complex part of reading and writing is the bidirectional relationship
between inverse_nth
and nth
. If they point to each other at location i
(is_set[is_set[i].inverse_nth].nth == i
) then location i contains valid data
that was written after the last bulk write, as long as is_set[i].inverse_nth <
size
.
*/
uint64_t OneBit(int i) {
return UINT64_C(1) << i;
}
bool IsSet(const IsSetArray * a, int i) {
const int cell = i/64, offset = i & 63;
const int inverse_nth = a->is_set_[cell].inverse_nth_;
return inverse_nth < a->size_ && a->is_set_[inverse_nth].nth_ == cell &&
a->is_set_[cell].bitset_ & OneBit(offset);
}
void Set(IsSetArray * a, int i) {
const int cell = i/64, offset = i & 63;
const int inverse_nth = a->is_set_[cell].inverse_nth_;
if (inverse_nth >= a->size_ || a->is_set_[inverse_nth].nth_ != cell) {
a->is_set_[cell].inverse_nth_ = a->size_;
a->is_set_[cell].bitset_ = 0;
a->is_set_[a->size_].nth_ = cell;
++a->size_;
}
a->is_set_[cell].bitset_ |= OneBit(offset);
}
Python example
class d:
def __init__(self, l):
self.len = l
self.g_p = [i for i in range(self.len)]
self.g_v = [0 for i in range(self.len)]
self.g_s = self.len - 1
self.g_c = 0
def getVal(self, i):
if (i < 0 or i >= self.len):
return
if (self.g_p[i] <= self.g_s):
return self.g_v[self.g_p[i]]
return self.g_c
def setVal(self, i, v):
if (i < 0 or i >= self.len):
return
if (self.g_p[i] > self.g_s):
self.g_s += 1
self.g_p[self.g_s], self.g_p[i] = self.g_p[i], self.g_p[self.g_s]
self.g_v[self.g_p[i]] = v
def setAll(self, v):
self.g_c = v
self.g_s = -1
Regarding Timothy Jone's answer:
A problem with this approach is that eventually you'll run out of ids for timestamp, and might wrap around. If you chose a 64 bit value to store timestamps, then this gives you 18,446,744,073,709,551,616 insertions or setAlls before this happens. Depending on the expected use of the datastructure, an O(n) cleanup phase might be appropriate, or you could just throw an exception.
This is exactly the worst case scenario which make this solution O(n) too, and not O(1). This stucture, although saves a lot of potential O(n) insert operation, is still in O(n) effeciency.
setAll
, it performs one step in the refresh process. Note that the comparison between the setAll
time and the individual entry time is not done using the usual a<=b
, but instead with a-b<=0
. These are different in the presence of overflow. I don't test whether one time is before the other, but rather whether one lags the other by at most an amount depending on the width of the type - dfeuer 2015-03-24 13:27
This is my answer in Java (I'm not completly sure about the syntax). I did the set-functions synchronized in order to avoid a situation where changeT and defChangeT are equal.
Struct ValueSt{
int value;
Timestamp changeT;
}
Class MyDS{
int default;
Timestamp defChangeT;
HashMap map;
public MyDS(){
map = new HashMap<int, ValueSt>();
}
public synchronized void mySet(Int key, int value){
ValueSt vs = new ValueSt(value, Timestamp(System.current));
map.put(key, vs);
}
public synchronized void setAll(int value){
default = value;
defChangeT = Timestamp(System.current));
}
public int myGet(int key){
ValueSt vs = map.get(key);
if(vs != null){
if(vs.changeT > defChangeT)
return vs.value;
else
return default;
}
return null;
}
}
The correct solution in C#:
public sealed class SpecialDictionary<T, V>
{
private Dictionary<T, Tuple<DateTime, V>> innerData;
private Tuple<DateTime, V> setAllValue;
private DateTime prevTime;
public SpecialDictionary()
{
innerData = new Dictionary<T, Tuple<DateTime, V>>();
}
public void Set(T key, V value) => innerData[key] = new Tuple<DateTime, V>(GetTime(), value);
public void SetAll(V value) => setAllValue = new Tuple<DateTime, V>(GetTime(), value);
public V Get(T key)
{
Tuple<DateTime, V> tmpValue = innerData[key];
if (setAllValue?.Item1 > tmpValue.Item1)
{
return setAllValue.Item2;
}
else
{
return tmpValue.Item2;
}
}
private DateTime GetTime()
{
if (prevTime == null)
{
prevTime = DateTime.Now;
}
else
{
if (prevTime == DateTime.Now)
{
Thread.Sleep(1);
}
prevTime = DateTime.Now;
}
return prevTime;
}
}
And the test:
static void Main(string[] args)
{
SpecialDictionary<string, int> dict = new SpecialDictionary<string, int>();
dict.Set("testA", 1);
dict.Set("testB", 2);
dict.Set("testC", 3);
Console.WriteLine(dict.Get("testC"));
dict.SetAll(4);
dict.Set("testE", 5);
Console.WriteLine(dict.Get("testC"));
Console.WriteLine(dict.Get("testE"));
Console.ReadKey();
}