According to the official docs, there are two ways to create parallel collections:
1)
// ParVector lives in scala.collection.parallel.immutable, not .mutable
import scala.collection.parallel.immutable.ParVector
val pv = new ParVector[Int]
2)
val pv = Vector(1,2,3,4,5,6,7,8,9).par
Now, what are the differences? Is there any performance penalty when I convert a simple sequential collection?
What would you do if you had to create a big parallel collection (say, several thousand elements)? Would you create it from scratch or convert it?
Thank you guys!
EDIT:
As @oxbow_lakes says, there's a section of the docs that focuses on this topic, but I'm looking for advice from experience. I mean, what would YOU do if you had to read a big collection from a DB, for instance?
Depends on the collection. For Vector the conversion is basically free: ParVector is just a wrapper around the vector. Same for Arrays. Others, e.g. List, have to be completely copied into a different structure that is more amenable to parallelism, and then copied back into a new list if you want your result to be a List too.
You may want to have a look at this brand new guide on the Scala documentation site, section Creating a parallel collection.
The official documentation for the par method says:
For most collection types, this method creates a new parallel collection by copying all the elements. For these collection, par takes linear time [...]
Specific collections (e.g. ParArray or mutable.ParHashMap) override this default behaviour by creating a parallel collection which shares the same underlying dataset. For these collections, par takes constant or sublinear time.
That is, in general the operation is O(n), except when converting to the mutable collections ParArray and ParHashMap, where it is less than O(n), though possibly not constant time.
List needs copying, true, but Vector does not, for example. - Daniel C. Sobral 2012-04-04 18:45
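The sharing described in the quoted documentation can be checked directly for arrays. This sketch (same assumption: Scala 2.13+ with scala-parallel-collections) converts an Array[String] and then writes to the original array; if the resulting ParArray shares the storage, the write shows up through the parallel collection too. The object name SharedBacking is hypothetical, and the sketch deliberately uses an array of a reference type: whether arrays of primitives are handed off without a copy is an assumption not checked here.

```scala
// Assumes Scala 2.13+ with the scala-parallel-collections module.
// On Scala 2.12 and earlier, .par is built in: remove this import.
import scala.collection.parallel.CollectionConverters._

object SharedBacking {
  // Converts an array to a parallel array, then mutates the original array.
  // Returns true if the change is visible through the parallel collection,
  // i.e. if both views share the same underlying storage.
  def arrayWriteVisible(): Boolean = {
    val arr: Array[String] = Array("a", "b", "c")
    val parArr = arr.par // expected to wrap arr rather than copy it
    arr(0) = "z"
    parArr(0) == "z"
  }
}
```

Sharing makes the conversion cheap, but it also means mutating the source array while a parallel operation runs on it is unsafe.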