Efficient design structure for iterating over many arrays


0

I have 20 arrays of different lengths, each holding the possible values of a distinct parameter, and I need to evaluate every possible combination of those values, one drawn from each array:

#define LENGTH_COUNT 6
#define WIDTH_COUNT 4
// etc., for all 20 arrays:

int length[LENGTH_COUNT];
int width[WIDTH_COUNT];
int height[HEIGHT_COUNT];
int weight[WEIGHT_COUNT];
int growth[GROWTH_COUNT];
int decay[DECAY_COUNT];
int sound[SOUND_COUNT];
int texture[TEXTURE_COUNT];
int moisture[MOISTURE_COUNT];
int volume[VOLUME_COUNT];
int speed[SPEED_COUNT];
int color[COLOR_COUNT];
int purpose[PURPOSE_COUNT];
int delay[DELAY_COUNT];
int vibrancy[VIBRANCY_COUNT];
int brix[BRIX_COUNT];
int ripeness[RIPENESS_COUNT];
int mold[MOLD_COUNT];
int temp[TEMP_COUNT];
int language[LANGUAGE_COUNT];


void iterate(void)
{
    for (int i = 0; i < LENGTH_COUNT; ++i)
        for (int j = 0; j < WIDTH_COUNT; ++j)
            for (int k = 0; k < HEIGHT_COUNT; ++k) {
                // etc., nesting one loop per array, 20 deep
                int value = doSomething(length[i], width[j], height[k] /* ..... */);
            }
}

There must be a less brain-dead way to do this. One idea I had is:

#define ARRAY_COUNT  20
#define MAX_LENGTH 12   // the longest array length is 12
int arrays[ARRAY_COUNT][MAX_LENGTH];

But if I did this, I don't know how to write the equivalent of what the iterate function does. Any ideas?

2012-04-05 01:17
by PaeneInsula
You are going to need to keep track of the size of each 'sub-array'. To make it easier, create a struct which has the maximum index of the sub-array, as well as a pointer to it - gbulmer 2012-04-05 01:26
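A minimal sketch of that struct idea, using the arrays from the question (the names dim and dims are illustrative, not from the post):

struct dim {
    int *values;  /* pointer to the sub-array            */
    int count;    /* number of valid entries it contains */
};

struct dim dims[] = {
    { length, LENGTH_COUNT },
    { width,  WIDTH_COUNT  },
    { height, HEIGHT_COUNT },
    /* ... one entry per array ... */
};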
So you need to calculate every combination of one value from each of the 20 arrays, with up to 12 values per array (let's call it 10 for simplicity)? That is about 10^20 combinations of values. Imagine just counting from 1 to 10^20: even if you could produce one value every nanosecond (10^-9 seconds), that's over 3,000 years. Are there some simplifying assumptions? What is the bigger problem, 'cos I don't think this approach is the way to go - gbulmer 2012-04-05 01:35
To reduce the complexity, I omitted the parallelization done here. Even parallelized it would take a few years, but I am going to run it for a couple of days, after which I will be able to toss out all the variables that don't have an impact on the results. That should greatly reduce the runtime, to a few hours (acceptable). The objective: run through all these possible combos, looking for the best fit to a certain other bit of info - PaeneInsula 2012-04-05 01:54
You would have to parallelize across 1,000 machines just to get this down to a few years. So are you saying there are no criteria (an objective function) that could be used to reduce the search space dramatically? Is this an assignment? I ask because I used to set exercises like this where the insight needed was to test the usefulness of each type of value within the search space. A trivial example was a word game called Boggle, which had most permutations of 16 letters. Generating all permutations took years, but constraining the search got it down to seconds on a 1988 IBM PC - gbulmer 2012-04-05 02:03
I see your point. The reason I wanted to do all the permutations is that while I have a hunch about which variables will be worthwhile, I would rather determine this empirically. Maybe I could constrain things by reducing all of the arrays to a length of 3, which should help me toss out the useless values and get down to a reasonable number. (btw, this is an assignment...I'm in my 40's and self-employed, and my wife has assigned me the task of putting some bread on the table...) - PaeneInsula 2012-04-05 02:41
Picking random combinations to test would be a better approach, if you can't reduce the search space as gbulmer suggests - jimw 2012-04-05 03:03
@gbulmer: how would I constrain the search space in this situation? The problem is that some of the values are useless by themselves but become useful in combination with other variables. By running a first pass with all 20 arrays limited to 3 values, it would be about 5 billion computations. This is running on a 24-logical-core machine with 192 GB, so each core would be handling a mere (!) 200 million calcs - PaeneInsula 2012-04-05 03:16
@user994179 - sorry I've been away. Are you still interested in an alternative approach, or have you found one? Approximately how much computer power can you apply to solving the actual problem? We can try to work backwards from that number to derive useful strategies - gbulmer 2012-04-06 21:46
@user994179 - put another way, if you know how much compute-power you have, and the major costs (to generate one permutation, measure its value, and record that), then you can estimate the size of the search space that can be explored within the resource limits. I expect there is only enough resources to explore a limited number of dimensions. Further, some of the time will be spent identifying the most productive dimensions - gbulmer 2012-04-07 00:48
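For what it's worth, here is a rough sketch of jimw's random-sampling suggestion, assuming the arrays[ARRAY_COUNT][MAX_LENGTH] table proposed in the question plus a hypothetical counts[] array holding each sub-array's actual length (rand() is used only for illustration and has a slight modulo bias):

#include <stdlib.h>
#include <time.h>

extern int arrays[ARRAY_COUNT][MAX_LENGTH];
extern int counts[ARRAY_COUNT];   /* hypothetical: actual length of each array */

void sample(int trials)
{
    srand((unsigned)time(NULL));
    for (int t = 0; t < trials; ++t) {
        int picks[ARRAY_COUNT];
        // Pick one random value from each array to form one combination.
        for (int a = 0; a < ARRAY_COUNT; ++a)
            picks[a] = arrays[a][rand() % counts[a]];
        /* score picks[] against the target data and keep the best fit */
    }
}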


1

You could create (untested/uncompiled):

int *arrays[] = { first_arr, second_arr, ... }; // array of arrays
int maxes[] = { FIRST_ARR_MAX, SECOND_ARR_MAX, ... }; // array of array lengths
int counter[NUM_ARRAYS] = {0}; // initialize a counter to 0 for each of the arrays.

int state = 0;

while (1) {
    doSomething(first_arr[counter[0]], second_arr[counter[1]], ... );

    // Update the counter.
    int i;
    for (i = NUM_ARRAYS - 1; i >= 0; i--) {
        counter[i]++;
        // If incrementing the current counter didn't overflow, we're good, so we break.
        if (counter[i] < maxes[i]) {
            break;
        } else {
            // Overflow by setting the counter to 0.
            counter[i] = 0;
            // Now we will fall through and move to the next counter.
        }
    }
    // Detect when the counters have wrapped all the way back to zero.
    // state == 0 means counter[0] hasn't advanced past 0 yet; once it has
    // (state == 1), counter[0] returning to 0 means every combination is done.
    if (state == 0 && counter[0] > 0) {
        state = 1;
    } else if (state == 1 && counter[0] == 0) {
        break;
    }
} 
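An untested alternative to the state flag, replacing the counter-update and termination logic inside the loop above: if the carry loop falls off the most significant counter, every position has wrapped to zero, so all combinations have been visited.

    // Update the counter; i < 0 afterwards means every position wrapped.
    int i;
    for (i = NUM_ARRAYS - 1; i >= 0; i--) {
        if (++counter[i] < maxes[i])
            break;        /* no carry; on to the next combination */
        counter[i] = 0;   /* wrapped; carry into the next counter */
    }
    if (i < 0)
        break;            /* back at all zeros: enumeration done */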
2012-04-05 01:33
by Words Like Jared


0

This seems to work, although I haven't tested it thoroughly. It's in Python, but hopefully it's clear how to apply it in your case. The important point is that it divides the problem into smaller parts and so avoids the deeply nested for loops.

test_arrays = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

def combine(partial, next_array):
    # Cross every partial combination with every value of the next array.
    new_array = []
    for p in partial:
        if not isinstance(p, list):
            p = [p]  # first pass: promote bare values to one-element lists
        for v in next_array:
            new_array.append(p + [v])
    return new_array

def combinations(arrays):
    # Fold combine() across the arrays, growing combinations one array at a time.
    base = arrays[0]
    for i in range(1, len(arrays)):
        base = combine(base, arrays[i])
    return base

print(combinations(test_arrays))
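With the three test arrays this prints all 27 three-value combinations, beginning [0, 3, 6], [0, 3, 7], [0, 3, 8], and so on.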

But as others have said above, you almost certainly don't want to do this at all. A better approach would be to select random combinations and test/evaluate those. Various algorithms can be used to improve on naive random sampling, but which is appropriate depends on your application.

2012-04-05 02:04
by jimw
Python has many, many features that make it hard to transliterate efficiently to C; in this case, particularly the .append method - huon 2012-04-05 02:33
Indeed, but the Python is really serving as pseudo-pseudo-code here - I hope the questioner won't just try to 'transliterate' the answer, but rather understand the reasoning behind the algorithm. Given that the approach itself is questionable, the principle seems to matter more than the practice - jimw 2012-04-05 02:35