Selecting a random row that hasnt been selected much before?

Go To StackoverFlow.com

0

Lets just put it at its simplest, a table with two fields: 'item_id' & 'times_seen'.

| item_id | times_seen |
----------+-------------
|   1001  |     48     |
|   1002  |     25     |
|   1003  |      1     |
|   1004  |     12     |
|   1005  |     96     |
|   1006  |     35     |

I'm trying to find a way to randomly select a row, but give preference to items that haven't been selected much before.

(obviously, a second query would be sent to increment the 'times-seen' field after it has been selected)

Although my current "project" is a php/mysql one, I'd like language agnostic solutions if possible. I'd much rather have a math based solution that could be adapted elsewhere. I'm not opposed to a php solution though. I'd just like to be able to understand how the code works rather than just copy and paste it.

2012-04-05 22:30
by Curtis Cook
What do you mean by "give preference to?" Would it be okay if you never generated the same item twice until you had generated everything else? Or should it always be possible to generate an item - templatetypedef 2012-04-05 22:31
it should always possible to generate any ite - Curtis Cook 2012-04-05 22:36


2

How about a SQL solution:

select * from item order by times_seen + Rand()*100 limit 1;

How much you multiply random with (Its a value between 0 and 1) depends on how much randomness you want..

Edit: http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand

2012-04-05 22:43
by barsju
+1 Thought about something similar but couln't make it to the end. This is just simple and brilliant! :- - Basti 2012-04-05 22:48
I understand that a different random number is added to each field for this to work, but does mysql actually do that, or does it add the same random number to all the fields - Curtis Cook 2012-04-05 23:00
Different random number.. Otherwise it would not work.. And it does - barsju 2012-04-05 23:02
Even though it's a SQL solution, the math seems strong. I quickly mocked a an excel spreadsheet, and ran through a few hundred thousand 'queries' and it curved right as expected - Curtis Cook 2012-04-05 23:13
On a side note, it seems that the rand multiplier should be the difference between the max and min times_seen values, as it produces what appears to be an even distribution type randomness - Curtis Cook 2012-04-05 23:16
"it should always possible to generate any item" This won't be the case if the difference between min and max is greater than the multiplier for Rand() - starblue 2012-04-06 07:53


2

  1. Fetch all the rows in the table
  2. Determine the max value for times_seen
  3. Assign each row a weight of max - times_seen
  4. Pick from list based on weights

Step 4 is the tricky part, but you could do it all like this:

$max = 1;
$rows = array();

$result = mysql_query("SELECT * FROM table");
while ($row = mysql_fetch_array($result)){
    $max = max($max, $row['times_seen']);
    $rows[] = $row;
}

$pick_list = array();
foreach ($rows as $row){
    $count = $max - $row['times_seen'];
    for ($i=0; $i<$count; $i++) $pick_list[] = $row['item_id'];
}
shuffle($pick_list);
$item_id = array_pop($item_id);

To do it all in SQL:

SELECT * 
FROM table 
ORDER BY RAND( ) * ( MAX( times_seen ) - times_seen ) DESC
LIMIT 1

This selects a single row with weightings inversely proportional to the times_seen

2012-04-05 22:36
by Cal
Ads