Hi all,
First of all, I'm not trying to create a social network, facebook is big enough! (comic) I've chosen this question as example because it fits exactly on what I'm trying to do.
Imagine that I have in MySQL a users
table and a user_connections
table with 'friend requests'. If so, it would be something like this:
Users Table:
userid username
1 John
2 Amalia
3 Stewie
4 Stuart
5 Ron
6 Harry
7 Joseph
8 Tiago
9 Anselmo
10 Maria
User Connections Table:
userid_request userid_accepted
2 3
7 2
3 4
7 8
5 6
4 5
8 9
4 7
9 10
6 1
10 7
1 2
Now I want to find circles between friends and create a structure array and put that circle on the database (none of the arrays can include the same friends that another has already).
Return Example:
// First Circle of Friends
Circleid => 1
CircleStructure => Array(
1 => 2,
2 => 3,
3 => 4,
4 => 5,
5 => 6,
6 => 1,
)
// Second Circle of Friends
Circleid => 2
CircleStructure => Array(
7 => 8,
8 => 9,
9 => 10,
10 => 7,
)
I'm trying to think of an algorithm to do that, but I think it will take a lot of processing time because it would randomly search the database until it 'closes' a circle.
PS: The minimum structure length of a circle is 3 connections and the limit is 100 (so the daemon doesn't search the entire database)
EDIT:
I've think on something like this:
function browse_user($userget='random',$users_history=array()){
$user = user::get($userget);
$users_history[] = $user['userid'];
$connections = user::connection::getByUser($user['userid']);
foreach($connections as $connection){
$userid = ($connection['userid_request']!=$user['userid']) ? $connection['userid_request'] : $connection['userid_accepted'];
// Start the circle array
if(in_array($userid,$users_history)) return array($user['userid'] => $userid);
$res = browse_user($userid, $users_history);
if($res!==false){
// Continue the circle array
return $res + array($user['userid'] => $userid);
}
}
return false;
}
while(true){
$res = browse_user();
// Yuppy, friend circle found!
if($res!==false){
user::circle::create($res);
}
// Start from scratch again!
}
The problem with this function is that it could search the entire database without finding the biggest circle, or the best match.
relationship table
?? then use the relationship table
to generate your circle connection
... this would act like a filter
by reducing the number to a manageable size .. not new users you instantly send request to a Job Queue to generate the relation and circl - Baba 2012-04-05 21:14
Truth
, i can't explain my real problem because it is into patenting situation. I've tested all possibilities comparing this situation with the real one and they are almost the same. The answer i want will solve my real problem - CuSS 2012-04-10 21:09
Alfred Godoy's answer is very useful, in that processing should be done incrementally. I'll try to flesh out some specific detail.
Rather than "people" being "friends" I'll talk about "nodes" connected by "edges". It's not true that cycles grow from 3 to 4 (or n to n+1) nodes. Several edges, not forming a cycle, could be closed by a new edge and form a new cycle.
So instead (or as well) as a list of cycles, we should maintain a list of edge chains. e.g. assuming this Connection table:
userid_request userid_accepted
2 7
3 7
3 8
Then the Chain table should contain this:
chainid start end length
1 2 3 2
1 2 8 3
2 7 8 2
This structure lets you record and retrieve chains of any length, and given two ends of a chain, follow it through using id and length. It takes space... But that's the memory cost of searching for cycles in a graph for you. It can be simplified, depending on exactly what you want to do; give details.
By the way, I assume that the graph is undirected - in other words, an edge from a node to the other goes both ways. In the connection, the node with the lowest id will always be the "request" and the highest id at the "accepted" end.
When a new node is added, you want to maintain the chain table and look whether the new node closes the chain cycles. It involves a couple of queries. I give you one.
Suppose a new entry is added in the Connection table:
userid_request userid_accepted
@new_request @new_accepted
You can use a query to select any such cycle. You already know about the new link and its two nodes so you're only looking for a chain that closes it:
select chainid, length
from chain
where (start = @new_request and end = @new_accepted)
or (end = @new_request and start = @new_accepted)
(note that because the graph is not directed, you have to look for two cycle configurations).
To maintain the chain table, each time an edge is added:
And then when nodes are removed, you should take out corresponding chains and cycles - the chain table will be big enough as it is.
Doing this kind of operations on large datasets is almost always a (too) big job for a single batch. When working with large amounts of data, you should build indexes continuously. Make a circle test at every time an "user" is added or become "friend" with another "user", and store the resulting circles in an index table. Then when a new "user" signs up or becomes "friend" with another "user", you should use your index database to find new circles based on the old ones.
Edit:
I got a quite enthusiastic on this problem, so I made a proof-of-concept class of boisvert's ideas on how to solve this. I put the code on GitHub at: https://github.com/alfreddatakillen/Six-Degrees-of-Separation (it would be a bit messy to post it here).
It seems to work pretty well. However, I have not tested it with huge amounts of data. I'm pretty sure that you will have to optimize this, since it basicly is a brute-force attack storing every step on the way... (If you do optimize or use the code, I'd be happy if you push changes to GitHub - I might want to use it in a future project myself.)
:-)
Are you trying to do something like Six Degrees of Seperation
This at first looks as a NP-hard problem, because in order to compute each circle you must traverse every node in the graph to find all possible circles out there. I believe that there is no easy way to do this, and most likely there is probably no way to do this real time.
So you should probably perform such computations in jobs, caching the results for efficiency.
Off the top of my head, if we consider this problem as graph with each user as a node and each relationship as an edge with weight of 1, we can use breadth-first search to find from user 1 all paths with maximum weight of 6. On each node traversal (up to total weigh of six) we search if the starting node and the current node share an edge. If so we close the circle. It will start of with a circle of 2 and expand onwards. If we reach weight 6 and the final node does not share an edge with the starting node we discard the circle.
Probably in a real case scenario to speed things up, you could try and calculate the circles up to a smaller total weight (say 3) for each user and then try to join the circles to get up to 6.
Just an idea for the creating of circles- This function would load every time the user attempts to view the circles:
Odd way of attempting this, I know
I think for this type of JOB you should relay on NOSQL database which are good at what you're looking for probably a graph database like Neo4j.