How to find connections between users so I can create a closed friend circle?

Go To StackoverFlow.com

12

Hi all,

First of all, I'm not trying to create a social network, facebook is big enough! (comic) I've chosen this question as example because it fits exactly on what I'm trying to do.

Imagine that I have in MySQL a users table and a user_connections table with 'friend requests'. If so, it would be something like this:

Users Table:

userid  username
1       John
2       Amalia
3       Stewie
4       Stuart
5       Ron
6       Harry
7       Joseph
8       Tiago
9       Anselmo
10      Maria


User Connections Table:

userid_request  userid_accepted
2               3
7               2
3               4
7               8
5               6
4               5
8               9
4               7
9               10
6               1
10              7
1               2

Now I want to find circles between friends and create a structure array and put that circle on the database (none of the arrays can include the same friends that another has already).

Return Example:

    // First Circle of Friends
    Circleid => 1
    CircleStructure => Array(
        1 => 2,
        2 => 3,
        3 => 4,
        4 => 5,
        5 => 6,
        6 => 1,
    )
    // Second Circle of Friends
    Circleid => 2
    CircleStructure => Array(
        7 => 8,
        8 => 9,
        9 => 10,
        10 => 7,
    )

I'm trying to think of an algorithm to do that, but I think it will take a lot of processing time because it would randomly search the database until it 'closes' a circle.

PS: The minimum structure length of a circle is 3 connections and the limit is 100 (so the daemon doesn't search the entire database)

EDIT:

I've think on something like this:

function browse_user($userget='random',$users_history=array()){

    $user = user::get($userget);

    $users_history[] = $user['userid'];

    $connections = user::connection::getByUser($user['userid']);
    foreach($connections as $connection){
        $userid = ($connection['userid_request']!=$user['userid']) ? $connection['userid_request'] : $connection['userid_accepted'];

        // Start the circle array
        if(in_array($userid,$users_history)) return array($user['userid'] => $userid);

        $res = browse_user($userid, $users_history);

        if($res!==false){
            // Continue the circle array
            return $res + array($user['userid'] => $userid);
        }
    }

  return false;
}

while(true){

    $res = browse_user();

    // Yuppy, friend circle found!
    if($res!==false){
            user::circle::create($res);
    }

    // Start from scratch again!
}

The problem with this function is that it could search the entire database without finding the biggest circle, or the best match.

2012-04-05 19:25
by CuSS
The best way to do this is to load entire dataset into memory and then look for circles, unless you have too much data to fit. Look for algorithms for finding cycles in (directed?) graphs - Crack 2012-04-05 19:32
To add to what Crack said, it sounds like a directed graph, so this may help: http://stackoverflow.com/questions/261573/best-algorithm-for-detecting-cycles-in-a-directed-grap - sparrow 2012-04-05 19:33
The problem is that, for now, it will be a little only, but speculating i will have arround 10.000.000.000 records to connect. The problem on that algorythm is that it would run always the first 'users', so, on each running cycle, it would be processing the same 'users' first. I will edit post and write my algorithm - CuSS 2012-04-05 19:40
Maybe try explaining your problem instead of trying to explain on users - any assumptions that can be made about your dataset may allow for a simpler algorithm - Crack 2012-04-05 19:49
It's more complex even than 'users', i've created the algorithm that i was thinking. But it won't grant me success - CuSS 2012-04-05 20:03
And also it will overload my MySQL server (this daemon will run on different computers - CuSS 2012-04-05 20:09
What about creating a relationship table ?? then use the relationship table to generate your circle connection ... this would act like a filter by reducing the number to a manageable size .. not new users you instantly send request to a Job Queue to generate the relation and circl - Baba 2012-04-05 21:14
I suggest you explain your real problem, instead of trying to simplify it for us. Were smart enough to understand your problem (as long as you explain it well enough). Like people here said, any assumption made on your dataset can result in a simplified algorithm and resolution of your problem faster - Madara Uchiha 2012-04-10 21:02
Hi Truth, i can't explain my real problem because it is into patenting situation. I've tested all possibilities comparing this situation with the real one and they are almost the same. The answer i want will solve my real problem - CuSS 2012-04-10 21:09
Are friend requests reciprocal, e.g., 1->2 also implies 2->1? If not, I feel like the working dataset is just getting tinyier, rather than the permutative behemoth it could've been. This isn't too dissimilar from a 6 degrees of bacon app I did on a dataset of 1.7mil, but then again. thats not 10billion like yours is.. - hexparrot 2012-04-14 01:28
You could look into using a network-based database, like Neo4J [not affiliated! - Rafa 2012-04-16 12:59
This is nothing more than a hierarchical query which mySQL doesn't support. Oracle uses the connect by Prior syntax to traverse a tree and find a complete listing used in combination with the no-cycle parameter, it will die once it finds a circle and move on to the next circle. MySQL might have ways around this but nothing I've been subject to - xQbert 2012-04-16 23:34
@CuSS, regarding "patent situation", you need to tell your employer that graph traversal falls under prior art whichever way you look at it :) - questions remain: is the graph directed? and: are there ways to simplify the problem, based on what you say about resetting the connections when a cycle is found? I'm afraid I haven't voted your question up yet, because until some of that information is clear, answers continue to feel like second guessing - boisvert 2012-04-17 09:24


9

Alfred Godoy's answer is very useful, in that processing should be done incrementally. I'll try to flesh out some specific detail.

Rather than "people" being "friends" I'll talk about "nodes" connected by "edges". It's not true that cycles grow from 3 to 4 (or n to n+1) nodes. Several edges, not forming a cycle, could be closed by a new edge and form a new cycle.

So instead (or as well) as a list of cycles, we should maintain a list of edge chains. e.g. assuming this Connection table:

userid_request  userid_accepted
2               7
3               7
3               8

Then the Chain table should contain this:

chainid  start  end  length
1        2      3    2
1        2      8    3
2        7      8    2

This structure lets you record and retrieve chains of any length, and given two ends of a chain, follow it through using id and length. It takes space... But that's the memory cost of searching for cycles in a graph for you. It can be simplified, depending on exactly what you want to do; give details.

By the way, I assume that the graph is undirected - in other words, an edge from a node to the other goes both ways. In the connection, the node with the lowest id will always be the "request" and the highest id at the "accepted" end.

When a new node is added, you want to maintain the chain table and look whether the new node closes the chain cycles. It involves a couple of queries. I give you one.

Suppose a new entry is added in the Connection table:

userid_request  userid_accepted
@new_request    @new_accepted

You can use a query to select any such cycle. You already know about the new link and its two nodes so you're only looking for a chain that closes it:

select chainid, length
from chain
where (start = @new_request and end = @new_accepted)
or (end = @new_request and start = @new_accepted)

(note that because the graph is not directed, you have to look for two cycle configurations).

To maintain the chain table, each time an edge is added:

  • look for existing chains prolonged by the node
  • look for new chains of length 2.

And then when nodes are removed, you should take out corresponding chains and cycles - the chain table will be big enough as it is.

2012-04-13 08:17
by boisvert
Hmm, yes you are of course right. Circles does not expand only by 1 - Alfred Godoy 2012-04-14 02:07
I changed my answer. Thanks - Alfred Godoy 2012-04-14 03:11
But it took your answer before I could get it - boisvert 2012-04-14 06:34
I made a proof-of-concept PHP class of your way to solve this problem: https://github.com/alfreddatakillen/Six-Degrees-of-Separatio - Alfred Godoy 2012-04-17 20:18
Can't vote up your answer more than I have! Good one. The problem remains memory cost: with 10 billion nodes, that's a lot of chain - boisvert 2012-04-17 23:02
My problem is that on my program, each "user" can be only into one "circle", unless i delete the "circle". I'm trying to understand the logic of your answer but i didn't get it - CuSS 2012-04-24 19:49
needs drawings - we'll have to wait until the guys at SO add post-it notes :) - more to the point, as others have pointed out to you, your problem keeps changing . You have to do some of this yourself since you can't tell the full story - boisvert 2012-04-25 08:52


11

Doing this kind of operations on large datasets is almost always a (too) big job for a single batch. When working with large amounts of data, you should build indexes continuously. Make a circle test at every time an "user" is added or become "friend" with another "user", and store the resulting circles in an index table. Then when a new "user" signs up or becomes "friend" with another "user", you should use your index database to find new circles based on the old ones.

Edit:

I got a quite enthusiastic on this problem, so I made a proof-of-concept class of boisvert's ideas on how to solve this. I put the code on GitHub at: https://github.com/alfreddatakillen/Six-Degrees-of-Separation (it would be a bit messy to post it here).

It seems to work pretty well. However, I have not tested it with huge amounts of data. I'm pretty sure that you will have to optimize this, since it basicly is a brute-force attack storing every step on the way... (If you do optimize or use the code, I'd be happy if you push changes to GitHub - I might want to use it in a future project myself.)

:-)

2012-04-05 23:04
by Alfred Godoy
Before anything, thanks for your reply. The problem is that these 'circles' must be aprooved, and once aprooved by all users, the circle will disappear, as like their 'connections'. (So, each user will accept the circle, and i must show to them a percentage of accepting rate). But i like the idea to try to reach a circle during connection or queue it on database so the daemon can search for a circle. I will vote the answer up! : - CuSS 2012-04-05 23:47
The approval stuff does not change anything in how the circles are built. That's just another column in the circles database table. And the removal stuff should as simple as just removing rows from your database tables - Alfred Godoy 2012-04-11 12:59
Wait... you remove a circle once it's found? Do you remove the users in question - Nathaniel Ford 2012-04-11 20:31
@CuSS: from your remarks, it sounds like you're not looking for "circles" but rather "groups" where the order in which people are connected doesn't really matter? In case the answers haven't made that clear already, you're asking for a solution to a difficult problem, but by changing the problem, even slightly, it can be made a lot simpler. But to find that, we need clearer information.. - boisvert 2012-04-13 11:24
I'm a little confused now because i'm without glasses and tired of programming during hours. But i will download your code, and trying to understand the logic. If it applies as a solution for my problem, i will put the code better and send to you again. Add me on facebook plz http://www.facebook.com/cuss.hh - CuSS 2012-04-24 19:43


1

Are you trying to do something like Six Degrees of Seperation

This at first looks as a NP-hard problem, because in order to compute each circle you must traverse every node in the graph to find all possible circles out there. I believe that there is no easy way to do this, and most likely there is probably no way to do this real time.

So you should probably perform such computations in jobs, caching the results for efficiency.

Off the top of my head, if we consider this problem as graph with each user as a node and each relationship as an edge with weight of 1, we can use breadth-first search to find from user 1 all paths with maximum weight of 6. On each node traversal (up to total weigh of six) we search if the starting node and the current node share an edge. If so we close the circle. It will start of with a circle of 2 and expand onwards. If we reach weight 6 and the final node does not share an edge with the starting node we discard the circle.

Probably in a real case scenario to speed things up, you could try and calculate the circles up to a smaller total weight (say 3) for each user and then try to join the circles to get up to 6.

2012-04-14 08:29
by mobius


0

Just an idea for the creating of circles- This function would load every time the user attempts to view the circles:

  1. Find the user (#1) with the most friends who is not in any circles yet.
  2. Loop all his friends looking for one (#2) who has most friends and is not in any circles yet
  3. Loop all the new found friend's friends the same way until loop ends, limit 20?
  4. check if last user found (#20) is friends with 1st user
  5. if yes, circle complete
  6. if no, check if any of (#19's) friends are friends with 1st user..
  7. if no, check if any of (#18's) friends are friends with 1st user..
  8. continue loop users backwards until someone is friends with 1st user
  9. circle complete

Odd way of attempting this, I know

2012-04-12 19:54
by squarephoenix


0

I think for this type of JOB you should relay on NOSQL database which are good at what you're looking for probably a graph database like Neo4j.

2012-04-17 16:28
by deej
Ads