What's the cost of converting a sequential collection into a parallel one, versus creating it from scratch?

Tags: scala, collections, parallel-processing, parallel-collections

According to the official docs, there are two ways to create parallel collections:

// Option 1: create an (empty) parallel collection directly
import scala.collection.parallel.immutable.ParVector
val pv = new ParVector[Int]

// Option 2: convert an existing sequential collection
val pv2 = Vector(1,2,3,4,5,6,7,8,9).par

Now, what are the differences? Is there any performance penalty when I convert from a simple sequential collection?

What would you do if you had to create a big parallel collection (say, several thousand elements)? Would you create it from scratch or convert it?

Thank you guys!

EDIT:

As @oxbow_lakes says, there's a section of the docs that focuses on this topic, but I'm looking for advice from experience. I mean, what would YOU do if you had to read a big collection from a DB, for instance?

2012-04-04 03:41
by santiagobasulto

Depends on the collection. Converting a Vector is basically free: ParVector is just a wrapper around the vector. The same goes for Arrays. Others, e.g. List, have to be completely copied into a different structure that is more amenable to parallelism, and then copied back into a new List if you want your result to be a List too.
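To make that concrete, here is a minimal sketch of the Vector case, the List case, and the "load from a DB" scenario from the question. It assumes Scala 2.12 or earlier, where .par is part of the standard library (on 2.13+ the parallel collections live in the separate scala-parallel-collections module). The object name, the helper names, and the Iterator standing in for a DB cursor are all hypothetical and only there for illustration.

```scala
// Assumes Scala 2.12 or earlier, where .par is built in.
// On Scala 2.13+ you would need the scala-parallel-collections module and:
//   import scala.collection.parallel.CollectionConverters._
object ParConversionSketch {

  // Vector: .par is expected to wrap the existing vector rather than copy it,
  // so the conversion itself should be cheap.
  def incrementVector(xs: Vector[Int]): Vector[Int] =
    xs.par.map(_ + 1).toVector

  // List: .par has to copy every element into a parallel-friendly structure,
  // and .toList copies the result back into a new List.
  def incrementList(xs: List[Int]): List[Int] =
    xs.par.map(_ + 1).toList

  // Hypothetical DB scenario: `rows` stands in for a result-set cursor.
  // Rows are collected into a Vector first, then .par is called on it.
  def sumLoadedRows(rows: Iterator[Int]): Long = {
    val loaded = rows.toVector
    loaded.par.map(_.toLong).sum
  }

  def main(args: Array[String]): Unit = {
    println(incrementVector(Vector.range(0, 10)))
    println(incrementList(List.range(0, 10)))
    println(sumLoadedRows(Iterator.range(0, 1000)))
  }
}
```

If the data arrives element by element anyway, collecting it into a Vector (or an Array) and calling .par afterwards keeps the conversion cost low, which is probably the simplest option for the DB case.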

You may want to look at the brand new guide on the Scala documentation site, in the section Creating a parallel collection.

2012-04-04 07:26
by Didier Dupont

The official documentation for the par method says:

For most collection types, this method creates a new parallel collection by copying all the elements. For these collection, par takes linear time [...]

Specific collections (e.g. ParArray or mutable.ParHashMap) override this default behaviour by creating a parallel collection which shares the same underlying dataset. For these collections, par takes constant or sublinear time.

That is, in general the operation is O(n), except when using the mutable collections ParArray and ParHashMap, where it is less than O(n), though possibly not constant time.
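As a rough illustration of the two creation routes for ParArray, the sketch below builds one directly through the companion object and one by calling .par on a plain Array. It assumes Scala 2.12 or earlier (built-in .par); the object and method names are hypothetical. It makes no claim about whether the underlying storage is shared or copied, since that may depend on the element type and library version.

```scala
// Assumes Scala 2.12 or earlier, where .par is built in.
import scala.collection.parallel.mutable.ParArray

object ParArrayCreationSketch {

  // Route 1: build the parallel collection directly via its companion object.
  def direct(n: Int): ParArray[Int] =
    ParArray.tabulate(n)(i => i * i)

  // Route 2: build a sequential Array first, then convert it with .par.
  def converted(n: Int): ParArray[Int] =
    Array.tabulate(n)(i => i * i).par

  def main(args: Array[String]): Unit = {
    // Both routes should yield the same elements in the same order.
    println(direct(5).toList == converted(5).toList)
  }
}
```

Either route ends up with the same ParArray contents; which one is cheaper in practice would need to be measured for the collection sizes involved.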

2012-04-04 07:28
by oxbow_lakes

+1, thanks @oxbow_lakes. I've read that, but I was looking for advice based on experience. For example, what would YOU do if you had to create a big collection (say, by reading it from a DB)? - santiagobasulto 2012-04-04 10:12

I'm not sure if you are claiming only mutable collections enjoy this benefit or not, but that is certainly not true. List needs copying, true, but Vector does not, for example - Daniel C. Sobral 2012-04-04 18:45