I am currently writing a bulk-processing algorithm for pitch detection in audio streamed from disk. I have tightened up my algorithm so that it runs in nearly real time on serially streamed data.
Ideally I'd like the system to run much faster than real time, so that I could hand it real-time data and, after a short delay, be generating the pitch-track data.
Now what strikes me is that the serial processing of the data is where I could gain a great deal of speedup. I'm running on a quad-core i7 (with 8 hardware threads), so I ought to be able to improve the speed significantly by spreading the processing across multiple blocks.
As it stands, I currently do the following:
Now it strikes me that once I have a window I could easily copy that data into a given thread work buffer (as well as providing a memory location the result will be written to). This way I could, effectively buffer up to 7 (Leave thread 8 open to pump the data) threads worth of data that a thread pool would then process.
When I try to submit the 8th window of audio I want the pool to block until a thread is available to process the data and so on. The idea being that I would keep 7 threads constantly working processing the data. From previous experience I would expect to see about 5x speed up from doing this.
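The blocking-submit behaviour described above can be sketched with a bounded `BlockingCollection<T>`; this is only one way to get that semantics, and `ProcessWindow` here is a hypothetical stand-in for the pitch-detection kernel:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class WindowPool
{
    // Bounded to 7 entries: Add() blocks once 7 windows are queued,
    // giving exactly the back-pressure described above.
    static readonly BlockingCollection<float[]> queue =
        new BlockingCollection<float[]>(boundedCapacity: 7);

    public static void Main()
    {
        // 7 consumer threads, leaving one hardware thread for the producer.
        var workers = new Task[7];
        for (int i = 0; i < workers.Length; i++)
            workers[i] = Task.Factory.StartNew(() =>
            {
                foreach (var window in queue.GetConsumingEnumerable())
                    ProcessWindow(window);   // hypothetical pitch-detection kernel
            }, TaskCreationOptions.LongRunning);

        // Producer: would stream windows from disk (stubbed here).
        for (int w = 0; w < 100; w++)
            queue.Add(new float[1024]);      // blocks while 7 items are pending

        queue.CompleteAdding();              // lets consumers drain and exit
        Task.WaitAll(workers);
    }

    static void ProcessWindow(float[] window) { /* placeholder */ }
}
```

`GetConsumingEnumerable` handles the wake-up/shutdown logic, so no hand-rolled lock-free queue is needed for this throughput level.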
In the past I wrote my own task-based system in C++ that would do the job perfectly, but this app is being developed in C#. To get good parallelism with low overhead under C++, I spent a significant amount of time building a good lock-free queueing mechanism.
I was rather hoping that, under C#, someone would have taken the pain out of doing this for me. However, I can't find anything that seems to work. I've looked at System.Threading.ThreadPool, and it appears to have no way of checking how many threads are currently in action; the overhead also seems prohibitive. The bigger problem is that I can't re-use an existing pre-allocated structure (which is important in my processing), forcing me to re-create it each time I submit a work item. This has the huge disadvantage that I then generate work faster than I can process it, so not only do I waste tonnes of time setting up structures and workspaces that I really ought not to need, but my memory usage spirals out of control.
I then found System.Threading.Tasks, but that doesn't seem to offer the functionality I'm after either.
I suppose I could just use my C++ task manager via interop, but I really assumed that in this day and age someone would already have built something similar. So am I missing something? Or can anyone point me to such a task-management engine?
The Task Parallel Library was designed and implemented for exactly the kind of task you are trying to solve! You can also pipeline this process.
So you have to ensure:
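As a minimal sketch of the TPL approach: `Parallel.ForEach` with a capped degree of parallelism pulls work on demand, so the producer is throttled by the consumers. `ReadWindows` and `ProcessWindow` are hypothetical stand-ins for your streaming and analysis code:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class TplExample
{
    public static void Main()
    {
        // Cap parallelism at 7 so one hardware thread stays free for I/O.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 7 };

        // The loop pulls windows from the enumerable as workers free up,
        // so work is never generated faster than it can be processed.
        Parallel.ForEach(ReadWindows(), options, window =>
        {
            ProcessWindow(window);   // hypothetical per-window analysis
        });
    }

    // Stand-in for streaming windows from disk.
    static IEnumerable<float[]> ReadWindows()
    {
        for (int i = 0; i < 100; i++)
            yield return new float[1024];
    }

    static void ProcessWindow(float[] window) { /* placeholder */ }
}
```

Note that the default partitioner may buffer chunks of the enumerable; `Partitioner.Create` with `EnumerablePartitionerOptions.NoBuffering` can disable that if strict one-at-a-time hand-off matters.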
Well, as always in these cases, I recommend using ZeroMQ. It'll allow you to control the number of consumers quite easily.
As for your scratch-pad areas: first, 0.5 GB is not a lot of memory these days. I think my phone has more RAM than that, let alone my desktop... If you want to go really easy on memory consumption, just create one scratch-pad area per thread, put all of them in a pool, and have the producer grab a scratch-pad area before queuing a task, attaching that area to the task. When the consumer is done, it returns the scratch-pad area to the pool.
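That pool idea can be sketched in a few lines; a `BlockingCollection<T>` doubles as the pool, so `Take` blocks (throttling the producer) whenever every pre-allocated scratch-pad is in use. The class name and sizes are illustrative:

```csharp
using System.Collections.Concurrent;

// One pre-allocated scratch-pad per worker, recycled through a pool,
// so no per-task allocation ever happens.
public class ScratchPool
{
    readonly BlockingCollection<float[]> pool = new BlockingCollection<float[]>();

    public ScratchPool(int count, int size)
    {
        for (int i = 0; i < count; i++)
            pool.Add(new float[size]);   // allocate once, up front
    }

    // Blocks when all scratch-pads are in use.
    public float[] Take()
    {
        return pool.Take();
    }

    // Consumer hands the buffer back when done.
    public void Return(float[] pad)
    {
        pool.Add(pad);
    }
}
```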
I would use the TPL Dataflow library here. It's designed to allow the creation of processing blocks that can be chained together, with explicit control over the degree of parallelism and blocking semantics.
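A rough sketch of what that looks like, assuming a hypothetical `DetectPitch` kernel (the Dataflow blocks themselves are the library's real API; the package is `System.Threading.Tasks.Dataflow`):

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public class DataflowExample
{
    public static void Main()
    {
        // Analysis stage: up to 7 windows in flight; BoundedCapacity caps
        // the input queue so SendAsync applies back-pressure when full.
        var analyse = new TransformBlock<float[], double>(
            window => DetectPitch(window),   // hypothetical pitch kernel
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 7,
                BoundedCapacity = 7
            });

        // Collection stage: single-threaded; Dataflow preserves window order.
        var collect = new ActionBlock<double>(pitch =>
            Console.WriteLine(pitch));

        analyse.LinkTo(collect,
            new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 100; i++)
            analyse.SendAsync(new float[1024]).Wait();  // blocks when pipeline is full

        analyse.Complete();
        collect.Completion.Wait();
    }

    static double DetectPitch(float[] window) { return 0.0; }
}
```

A nice property for pitch tracking: `TransformBlock` emits results in the order the windows were posted, even though they are analysed in parallel.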