Querying MongoDB Subset Two Levels Deep Using PHP Driver


1

I've accessed the Facebook Graph API to get a JSON object representing the latest posts on my feed (my Facebook wall). I then saved it into a MongoDB collection called feeds using the PHP Mongo Driver.

//$post['feed']['data'] contains the Facebook JSON object of wall posts
//create a mongo instance
$mongo = new Mongo();
//access the feeds collection
$feeds = $mongo->changeup->feeds;
//dump the feed right into mongo
$feeds->insert($post['feed']['data']);

This is what one of the arrays looks like when I read the whole object back from Mongo.

I'm only showing one, but there are several more, each indexed: the next one is [1] => Array() and so on. They are not all structured the same way: some contain the [story] field, others contain the [message] field, and some contain both.

Query:
$cursor = $feeds->find();

foreach ( $cursor as $feed ) { 
print_r($feed);
}

Result:
[0] => Array
        (
            [id] => 505212695_10150696450097696
            [from] => Array
                (
                    [name] => John Doe
                    [id] => 505212695
                )

            [story] => "Text of a story I posted on my wall..."
            [story_tags] => Array
                (
                    [38] => Array
                        (
                            [0] => Array
                                (
                                    [id] => 15212444
                                    [name] => John Doe
                                    [offset] => 38
                                    [length] => 10
                                    [type] => user
                                )

                        )

                )

            [type] => status
            [application] => Array
                (
                    [name] => Share_bookmarklet
                    [id] => 5085647995
                )

            [created_time] => 2012-04-04T05:51:21+0000
            [updated_time] => 2012-04-04T05:51:21+0000
            [comments] => Array
                (
                    [count] => 0
                )

)

The problem is that I don't want to fetch the entire collection. I want to find only those arrays that have, say, [message] or [story] fields, and then return just the contents of those fields and nothing else.

I'm trying to receive a subset, two levels deep:

//this works, however, I'm only able to get the 0 array 
$cursor = $feeds->find( array(), array('0.story' => true) );

How do I apply this to all of the arrays, not just the one at index 0?

I want my end result to look like this:

Array
(
    [_id] => MongoId Object
        (
            [$id] => 4f7db4dd6434e64959000000
        )

    [0] => Array
        (
            [story] => "Text of a story I posted on my wall..."
        )
    [1] => Array
        (
            [story] => "Text of a story I posted on my wall..."
        )
    [2] => Array 
        (
            [story] => "Text of a story I posted on my wall..."
            [message] => "In this case message text exists as well..."
        )
    [3] => Array
        (
            [message] => "Text of a message I posted on my wall..."
        )

    etc...
)
2012-04-05 15:24
by Allen


2

I believe the initial issue starts with your data structure for each feed document. Notice that your document is simply an _id followed by a series of incrementing numeric keys, and that's it. Ideally you would insert an actual object structure, with named keys and values, at the top level. Because you dumped the Facebook data straight into Mongo without formatting it, the driver mapped your array indexes to keys. As a result, each feed doc has a variable number of anonymous objects.

Refer to this: http://www.php.net/manual/en/mongo.writes.php

What I would think your feed doc should look like might be this:

{ 
    "_id" : ObjectId("4f7db4dd6434e64959000000"), 
    "posts" : 
    [
        {
            "story" : "Text of a story I posted on my wall...",
            "message" : "In this case message text exists as well..."
        },
        {
            "story" : "Text of a story I posted on my wall...",
            "message" : "In this case message text exists as well..."
        }
    ],
    "posts_meta1": "some val",
    "posts_meta2": "other data"
}

Notice that it contains a top-level "posts" key, with your array of post objects underneath. This fixes multiple issues: you have a named top-level key to query and index on instead of numeric keys, you have a cleaner root level for adding more feed fields, and you can cleanly achieve your find query.
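
As a rough illustration of that reshaping step, here is a minimal PHP sketch. The helper name buildFeedDocument and the sample data are assumptions made for the example, not part of the driver or the Graph API; it keeps only the story and message fields that the question asks about.

```php
<?php
// Hypothetical helper: reduce raw Graph API posts to a single feed document.
// Only "story" and "message" are kept, matching the fields in the question.
function buildFeedDocument(array $rawPosts)
{
    $posts = array();
    foreach ($rawPosts as $rawPost) {
        $post = array();
        foreach (array('story', 'message') as $field) {
            if (isset($rawPost[$field])) {
                $post[$field] = $rawPost[$field];
            }
        }
        if ($post) { // skip posts that have neither field
            $posts[] = $post;
        }
    }
    return array('posts' => $posts);
}

// Made-up stand-in for $post['feed']['data'] from the question.
$sample = array(
    array('id' => '1', 'story' => 'A story'),
    array('id' => '2', 'story' => 'Another story', 'message' => 'With a message'),
    array('id' => '3', 'type' => 'status'),
);

$doc = buildFeedDocument($sample);

// With the collection object from the question, this would then be stored as:
// $feeds->insert($doc);
```

Whether to drop posts that have neither field, or to keep more fields such as created_time, is a design choice that depends on what you plan to query later.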

A simple find might look like this:

// Return all feed docs, and only include the posts.story field
db.feeds.find({}, {"posts.story": 1})

A more advanced query might look like this:

// Return any feed document that contains either a posts.story
// field or a posts.message field
db.feeds.find({
    $or: [
        {"posts.story": {$exists: true}},
        {"posts.message": {$exists: true}}
    ]
})
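
Since the question uses the PHP driver rather than the shell, here is a sketch of how such a filter and projection might be written as PHP arrays. It assumes the restructured document with a "posts" key. Note that $exists nests under the field name, and that the operator keys are single-quoted so PHP does not treat them as variables.

```php
<?php
// Projection: return only posts.story (plus _id, which is included by default).
$fields = array('posts.story' => true);

// Filter: feed docs where some post has a story, or some post has a message.
$query = array(
    '$or' => array(
        array('posts.story'   => array('$exists' => true)),
        array('posts.message' => array('$exists' => true)),
    ),
);

// With the $feeds collection object from the question:
// $cursor = $feeds->find($query, $fields);
// foreach ($cursor as $feed) { print_r($feed); }
```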

In a nutshell, the data returned from Facebook should first be formatted into an object structure and only then inserted into Mongo. For instance, dates should be inserted as proper date objects rather than raw strings: http://www.php.net/manual/en/class.mongodate.php. This lets you run date-based queries in Mongo, and the PHP driver will convert the values back and forth so that they feel more native to your language.
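
A minimal sketch of the date conversion, assuming the created_time format shown in the question. The helper name parseFacebookTime is made up for this example; MongoDate belongs to the legacy PHP driver, so the sketch guards its use and falls back to the plain Unix timestamp when the class is not loaded.

```php
<?php
// Hypothetical helper: parse a Graph API timestamp such as
// "2012-04-04T05:51:21+0000" into Unix seconds.
function parseFacebookTime($value)
{
    $dt = DateTime::createFromFormat(DateTime::ISO8601, $value);
    if ($dt === false) {
        throw new InvalidArgumentException("Unrecognized timestamp: $value");
    }
    return $dt->getTimestamp();
}

$seconds = parseFacebookTime('2012-04-04T05:51:21+0000');

// Wrap in MongoDate only when the legacy driver is available.
$created = class_exists('MongoDate') ? new MongoDate($seconds) : $seconds;
```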

2012-04-05 17:06
by jdi
Thank you very much, works - Allen 2012-04-05 23:24


1

Without seeing the JSON data sent from Facebook, it's hard to tell what the structure should look like in the story_tags field. You may need to decode the JSON coming from Facebook and force json_decode to convert into a PHP associative array:

$ar = json_decode($post['feed']['data'], true);

Passing true as the second argument makes json_decode return associative arrays instead of objects.

You'd then insert as follows:

$feeds->insert($ar);
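
To make the effect of that second argument concrete, here is a small sketch using a made-up JSON string in place of the real Facebook payload:

```php
<?php
// Stand-in JSON; the real payload would come from the Graph API.
$json = '[{"story":"A story","from":{"name":"John Doe"}}]';

$asArrays  = json_decode($json, true); // nested associative arrays
$asObjects = json_decode($json);       // array of stdClass objects

// The same value, reached through each representation.
$nameFromArray  = $asArrays[0]['from']['name'];
$nameFromObject = $asObjects[0]->from->name;
```

The array form is the one that lines up with the print_r output shown in the question.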

Either way, I'd be inclined to restructure the data to something that suits your needs better before storing it in the database - this will allow you to use indexes more effectively among other things. If you really need to store the whole response from Facebook, you could always store it as a nested object:

$ar['raw'] = $post['feed']['data'];
2012-04-05 18:15
by Mick Sear