Is there any way to emit attachment data in a CouchDB view?


12

I have found it very useful to use CouchDB attachments when displaying image data on a website. However, when I replicate the database to a mobile environment, it's very inefficient to run a view and then have to cycle through the documents to get access to their attachments. On the iOS/Android platform it seems a lot more efficient to store the data as regular BLOBs and have access to all binary data with just one view query: the view query that emits all the document data in the first place. Is there a way to read attachment DATA in my map function and include it in the emit statement? I see that there is attachment information available via _attachments, but this does not give access to the data.

Update: A major drawback (not detailed in the accepted answer) to using BLOBs in the document itself rather than attachments is that when you update a document you have to GET the entire document and then POST it back. If you aren't using attachments, you have to transfer all that binary data on every update; with attachments you don't. If you will be performing updates on your documents, using attachments is really the only reasonable way to design for binary data.

2012-04-05 22:12
by deepwinter


14

You've hit upon a great question!

What is an attachment?

Is it binary data? Well, you can Base64 encode data and store it directly in the document (similar to a data URI). And you can of course have text or application/json attachments. So it's not that.
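To make the inline option concrete, here is a minimal sketch of a document that carries Base64 data and a map function that emits it. The field names `thumbnail`, `content_type`, and `data` and the helper `makeDocWithInlineImage` are hypothetical, chosen only for illustration. `emit()` is supplied by CouchDB's view server, and `Buffer` assumes the document is built in Node.js.

```javascript
// Build a document that stores binary data inline as Base64.
// The "thumbnail" field layout is an assumption, not a CouchDB convention.
function makeDocWithInlineImage(id, bytes) {
  return {
    _id: id,
    thumbnail: {
      content_type: 'image/png',
      data: Buffer.from(bytes).toString('base64')
    }
  };
}

// Map function sketch: because the data lives in the document body,
// it is visible here and can be emitted with the row.
// emit() is provided by the CouchDB view server at index time.
function mapInlineThumbnail(doc) {
  if (doc.thumbnail && doc.thumbnail.data) {
    emit(doc._id, doc.thumbnail);
  }
}
```

With a view like this, one query returns every thumbnail, at the cost of CouchDB encoding and decoding that data each time the document is processed.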

Is it direct downloads? Not really. Show and list functions let you serve any part of the document directly (or build new content based on document data).

So what is an attachment? To me, a working definition of an attachment is data that is not accessible in views, shows, and lists. It is an optimization. You lose access to it from server-side JavaScript; however, you gain speed because CouchDB doesn't have to encode and decode a huge amount of data.
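As a rough illustration of what a map function does see, the sketch below emits the metadata that `_attachments` exposes for each attachment stub (`content_type` and `length`). The function name `mapAttachmentInfo` is made up for this example; `emit()` comes from CouchDB's view server. The attachment bytes themselves are not present on the stub.

```javascript
// Map function sketch: doc._attachments contains stubs (metadata only),
// so a view can report what is attached but not the attached data.
function mapAttachmentInfo(doc) {
  var atts = doc._attachments || {};
  for (var name in atts) {
    if (Object.prototype.hasOwnProperty.call(atts, name)) {
      emit([doc._id, name], {
        content_type: atts[name].content_type,
        length: atts[name].length
      });
    }
  }
}
```

A client can use rows like these to decide which attachments to request afterwards.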

Sometimes I also think about it like C pointers. Working with pointers is very fast. They are a small, simple data type. However, the cost is extra programming work, because they must be dereferenced to get to the data. You always have an extra step. That is the cost of the speed. CouchDB attachments are similar.

If your data is small (maybe favicons, vCards, text) and fits comfortably in the document itself, go for it! Don't attach it. However, for larger data such as most images and other files, attachments become necessary.

Multiple fetches

Suppose you query a view and get 20 rows for display on the screen. Now you must fetch 20 image attachments.

Programmers instinctively find this undesirable. And, yes, it might be a deal-breaker. However in many cases it's a fine trade-off. Donald Knuth says, "premature optimization is the root of all evil." Will it kill us to make 21 total fetches from a local server, backed by SSD?

In many cases, it is just fine to make 20 queries. The key is to make them all at the same time. CouchDB (and mobile Couchbase) is optimized for concurrent requests. You will find that fetching 20 images concurrently takes basically the same time as fetching one.

Fetching HTTP resources concurrently works differently in every language. In JavaScript, it is very easy, requiring a few lines of code with async.js.

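// Assuming fetch_image(name, callback), and images = ['alice.png', 'bob.png', etc.]
var async = require('async')

// Start every fetch at once; the final callback runs after all finish
// (or as soon as one of them reports an error).
async.forEach(images, fetch_image, function(er) {
  if(er) throw er
  console.log('Fetched ' + images.length + ' images!')
})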
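
The same idea can be sketched without async.js, using promises. This is an illustration under assumptions, not the answer's original code: it assumes a runtime with a global `fetch` (for example Node 18+ or a browser), view rows that carry an `id`, and ordinary document ids (not `_design/...`). The helper names `attachmentUrl` and `fetchAllAttachments` and the optional `fetchImpl` parameter are hypothetical. The URL layout `/{db}/{docid}/{attachment}` is how CouchDB serves attachments.

```javascript
// Build the URL of one attachment: /{db}/{docid}/{attachment name}.
function attachmentUrl(baseUrl, db, docId, name) {
  return [
    baseUrl,
    encodeURIComponent(db),
    encodeURIComponent(docId),
    encodeURIComponent(name)
  ].join('/');
}

// Start one request per view row and wait for all of them together.
// fetchImpl is an optional stand-in for the global fetch (handy for testing).
async function fetchAllAttachments(baseUrl, db, rows, name, fetchImpl) {
  const doFetch = fetchImpl || fetch;
  const requests = rows.map(async (row) => {
    const res = await doFetch(attachmentUrl(baseUrl, db, row.id, name));
    if (!res.ok) {
      throw new Error('Fetch failed for ' + row.id + ': ' + res.status);
    }
    return { id: row.id, data: await res.arrayBuffer() };
  });
  // Promise.all rejects as soon as any single request fails.
  return Promise.all(requests);
}
```

All requests are issued before any response is awaited, so the total time is governed by the slowest response rather than the sum of all of them.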
2012-04-06 00:08
by JasonSmith
Thanks, that's the notion I've been getting as I've been learning. One problem, however, that arises in the mobile environment is that if I need to query 100 records, and I want all 100 thumbnails, I need to query CouchDB or Couchbase Mobile 101 times (1 for the initial view query, and 100 attachment queries) if those thumbs are attachments, which has significant performance implications. If they are just blobs, it's only 1 query, and fast. Could I use show or list to get around this? - deepwinter 2012-04-06 00:18
Yes, that is a fundamental trade-off. If they are attachments, you cannot use show or list to get around it. You must make multiple queries. However, that is not so bad. I will update the answer with an idea - JasonSmith 2012-04-06 00:59