I'm trying to generate an array of integers contains randoms adding up to a particular value. Here's my code:
private long[] getRandoms(long size , long sum) throws Exception {
double iniSum = 0;
System.out.println("sum = " + sum);
long[] ret = new long[(int) size];
for (int i = 0 ; i < ret.length; i++) {
ret[i] = randomInRange(1, sum);
iniSum += ret[i];
}
double finSum = 0;
for (int i = 0 ; i < ret.length; i++) {
ret[i] = Math.round((sum * ret[i]) / iniSum);
System.out.println("ret[" + i +"] = " + ret[i]);
finSum += ret[i];
}
if (finSum != sum) throw new Exception("Could not find " + size + " numbers adding up to " + sum + " . Final sum = " + finSum);
return ret;
}
private long randomInRange(long min , long max) {
Random rand = new Random();
long ret = rand.nextInt((int) (max - min + 1)) + min;
System.out.println("ret = " + ret);
return ret;
}
However, the results are not accurate, for instance:
Could not find 100 numbers adding up to 4194304 . Final sum = 4194305.0
I think I'm losing accuracy in this bit:
(sum * ret[i]) / iniSum
Can you recommend an alternative algorithm or a fix in my code which can help me achieve this objective?
double
for your sums vs long
? And you are aware, aren't you, that random behavior assures that the algorithm will fail most of the time - Hot Licks 2012-04-05 02:30
n
random numbers (in any range, as long as the distribution is uniform), summing them, and dividing them all by the sum is the standard way to generate n
random "fair" numbers from [0,1]
which sum to 1. It should be obvious that multiplying them all by another number k
will give you n
random "fair" numbers from [0,k]
- BlueRaja - Danny Pflughoeft 2012-04-06 16:08
Each time you scale a value with ret[i] = Math.round((sum * ret[i]) / iniSum)
, you will lose some precision, partly from the division operation itself, but mostly from storing the scaled value as an integer. The later situation is similar to proportional electoral systems where a small number of seats must be allocated in proprtion to a larger numbers of votes.
Two techniques for mitigating this:
First scale all the values in the list, but keep track of the difference between the ideal scaled value (a real number) and stored scaled value. Use truncation instead of rounding, so that the discrepency will always be positive. If there is a discrepency, you can increment some of the values in order of the difference between the ideal amount and the current stored amount.
long finSum = 0; // Might as well be a long
float[] idealValues = new float[size];
for (int i = 0 ; i < ret.length; i++) {
ideal[i] = (sum * ret[i]) / iniSum;
ret[i] = (long) (ideal[i]); // Truncate, not round
System.out.println("ret[" + i +"] = " + ret[i]);
finSum += ret[i];
}
/* There are iniSum - finSum extra amounts to add. */
for (int i = 0; i < iniSum - finSum; i++)
{
/* Pick a target slot to increment. This could be done by keeping a priority queue. */
int target = 0;
float bestScore = 0.0;
for (int j = 0; j < size; j++) {
float score = (ideal[j] - ret[j])/ret[j];
if (score > bestScore) {
target = j;
bestScore = score;
}
}
/* Allocate an additional value to the target. */
ret[target]++;
}
Or more simply, you could just set the last value in the list to whatever is outstanding after scaling doing all the others. That does statistically skew the output, however.
Just got an idea. Initialize the array to all 0. Randomly pick a number in the array and increase by 1 and decrease sum by 1 until sum is 0. It's not practical at all when sum is large:)
long ret = new long[size];
for(int i=0;i<size;i++) ret[i]=0;
for(int i=0;i<sum;i++) {
int n = random(0,size);
ret[n]++;
}
Yes, there is a standard way to do this which avoids floating-point inaccuracies. It is related to the method of counting the number of ways for n
numbers to sum to s
. It is not complicated, but oddly enough no one has mentioned it yet, so here goes:
Imagine we have a string with n-1
X's in it and s
o's. So, for example, for s=5, n=3, one example string would be
oXooXoo
Notice that the X's divide the o's into three distinct groupings: one of length 1, length 2, and length 2. This corresponds to the solution [1,2,2]. Every possible string gives us a different solution, by counting the number of o's grouped together (a 0 is possible: for example, XoooooX
would correspond to [0,5,0]).
So, all we need to do is imagine creating such a string by choosing random positions for the n-1
X's, then figure out what solution that corresponds to. Here is a more detailed breakdown:
n-1
unique random numbers between [1, s+n-1]
. Doing this is simple enough with rejection sampling - if a number is chosen twice, just drop it and pick another one.currentValue - previousValue - 1
.Finally, here is some example Java (untested) which should do this:
private List<long> getRandomSequenceWithSum(int numValues, long sum)
{
List<long> xPositions = new List<long>();
List<long> solution = new List<long>();
Random rng = new Random();
//Determine the positions for all the x's
while(xPositions.size() < numValues-1)
{
//See https://stackoverflow.com/a/2546186/238419
//for definition of nextLong
long nextPosition = nextLong(rng, sum+numValues-1) + 1; //Generates number from [1,sum+numValues-1]
if(!xPositions.contains(nextPosition))
xPositions.add(nextPosition);
}
//Add an X at the beginning and the end of the string, to make counting the o's simpler.
xPositions.add(0);
xPositions.add(sum+numValues);
Collections.sort(xPositions);
//Calculate the positions of all the o's
for(int i = 1; i < xPositions.size(); i++)
{
long oPosition = xPositions[i] - xPositions[i-1] - 1;
solution.add(oPosition);
}
return solution;
}
I believe your issue lies with Math.round
try modifying your code to use doubles and avoid any loss of precision
A good method is to work with a list of intervals which you then split at each step. Here's the pseudo-code
array intervals = [(0,M)]
while number intervals<desired_number
pick an split point x
find the interval containing x
split that interval at x
If you want to be really careful you need to check that your split point x is not an end point of an interval. You can do this by rejection, or you can pick the interval to split, then where to split that interval, but in that case you need to be careful about not introducing bias. (If you care about bias).
How about this:
long finSum = 0;
for (int i = 0 ; i < ret.length; i++) {
ret[i] = Math.round((sum * ret[i]) / iniSum);
System.out.println("ret[" + i +"] = " + ret[i]);
finSum += ret[i];
}
ret[0] += sum - finSum;
In other words, put all the rounding errors into the first element.
Suggestions:
for (int i = 0 ; i < ret.length; i++) {
ret[i] = randomInRange(1, sum);
iniSum += ret[i];
}
In the first for loop, instead of iterating from 0 to (ret.length-1), use iterations only from 0 to (ret.length-2). Keep the last element for adjustment.
Also dont call randomInRange(1, sum), use randomInRange(1, sum-iniSum) instead. This will reduce the normalization required.
In the end, the last output number will be (sum-iniSum). Thus, the 2nd for loop can be removed.
Take a number. Subtract a reasonable random number from it. Repeat until you have gotten enough numbers. Their sum is by definition fixed and is equal to the initial number. You make exactly n
operations where n
is the quantity of numbers you need; no guessing.
Here's Python code:
import random
def generate_randoms(initial, n):
# generates n random numbers that add up to initial.
result = [] # empty list
remainder = initial # we'll subtract random numbers and track the remainder
numbers_left = n
while numbers_left > 1:
# guarantee that each following step can subtract at least 1
upper_bound = remainder - numbers_left
next_number = random.randrange(1, upper_bound) # [1, upper_bound)
result.append(next_number)
remainder -= next_number # chop off
numbers_left -= 1
result.append(remainder) # we've cut n-1 numbers; don't forget what remains
return result
It works, but it has a downside: the farther are the numbers, the smaller they are. You might need to shuffle the resulting list to fight this, or play with smaller upper bounds.
This sure is an interesting question (upvote!), and I look forward to seeing some more creative solutions. One issue I have with the code you've listed is that you're using the Random
class, which only returns integers, not longs, but then you just cast it to a long. So if your function really needs longs, you'll have to find a different random number generator.
The "ideal" solution would be to generate all possible additive combinations (called partitions) and then pick one randomly. However, the best-known algorithms for doing this are very expensive indeed. There's lots of good source material out there on this problem. Here's a particularly good read on partitioning algorithms:
http://www.site.uottawa.ca/~ivan/F49-int-part.pdf
Part of the problem is figuring out what you mean by "random" in this instance. For example, if you're looking for an array of 5 numbers that sum to 1000, {1000,0,0,0,0} is just as valid as {200,200,200,200,200}, which is just as valid as {136,238,124,66,436}. An algorithm that has an equal chance of generating any of these sets would be truly random.
However, I'm guessing that you're not looking for a completely random distribution, but something in between.
Unfortunately, my Java is pretty rusty, and I'm doing most everything in C# these days, so I'll present in C#...it shouldn't take too much translation to convert these algorithms into Java.
Here's my take at the problem:
public int[] GetRandoms( int size, int sum, Random r=null ) {
if( r==null ) r = new Random();
var ret = new int[size];
while( sum > 0 ) {
var n = r.Next( 1, (int)Math.Ceiling((double)sum/size)+1 );
ret[r.Next(size)] += n;
sum -= n;
}
return ret;
}
(On a practical note, see how I'm passing in my random number generator? I default it to null
and just create a new one for convenience's sake, but now I can pass in a seeded random number generator, and generate the same results if I wish, which is great for testing purposes. Whenever you have a method that utilizes a random number generator, I highly recommend taking this approach.)
This algorithm basically keeps adding random numbers to random entries in the array until the sum has been reached. Because of its additive nature, it will favor larger numbers (i.e., it will be extremely unlikely that small numbers will appear in the result). However, the numbers will appear to be random, and they will sum appropriately.
If your target number is small (under 100 say), you can actually achieve the true random approach of generating all of the partitions and just picking one at random. For example, here's a partitioning algorithm (courtesy http://homepages.ed.ac.uk/jkellehe/partitions.php, translated into C#):
public List<int[]> GetPartitions( int n ) {
var result = new List<int[]>();
var a = new int[n+1];
var k = 1;
a[0] = 0;
var y = n - 1;
while( k != 0 ) {
var x = a[k-1] + 1;
k--;
while( 2*x <= y ) {
a[k] = x;
y -= x;
k++;
}
var l = k + 1;
while( x <= y ) {
a[k] = x;
a[l] = y;
result.Add( a.Take( k + 2 ).ToArray() );
x++;
y--;
}
a[k] = x + y;
y = x + y - 1;
result.Add( a.Take( k + 1 ).ToArray() );
}
return result;
}
(To aid you in the Java translation, a.Take(x)
means take the first x
elements out of array a
...I'm sure there's some equivalent magic in Java these days for doing that.)
This returns a list of partitions...just pick one at random, and you'll have a perfect random distribution.
-How about this?
private long[] getRandoms(int size , long sum) {
long[] array = new long[size];
long singleCell = sum / size;
Arrays.fill(array, singleCell);
array[size-1] += sum % singleCell;
int numberOfCicles = randomInRange(size, size*10);
for (int i = 0; i < numberOfCicles; i++) {
int randomIndex = randomInRange(0, size);
long randomNumToChange = randomInRange(0, array[randomIndex]);
array[randomIndex] -= randomNumToChange;
array[randomInRange(0, size)] += randomNumToChange;
}
return array;
}