I am new to OpenCL and I am writing an RSA factoring application. Ideally the application should work across both NVIDIA and AMD GPU targets, but I am not finding an easy way to determine the total number of cores/stream processors on each GPU.
Is there an easy way to determine how many total cores/stream processors there are on any hardware platform, and then spawn a factoring thread on each available core? The target RSA modulus would be in shared memory, with each factoring thread running a rho factoring attack against the modulus.
Also, any idea whether OpenCL supports multi-precision math libraries similar to GNU MP, to store large semiprime numbers?
Thanks in advance
On the GPU, you don't spawn one thread for each core, like you would on a CPU. Instead, you want to start many more threads than there are cores. I wouldn't worry about the exact number of cores available on a given target platform. Instead, focus on what fits best for your problem.
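To make "what fits your problem" a bit more concrete, here is a minimal sketch in plain C of one way the work could be split by problem size rather than by core count. It assumes that "Rho" in the question means Pollard's rho, and that each work-item runs an independent walk x -> x*x + c (mod n) with its own constant c taken from its global ID. The names `rho_walk` and `factor_with_work_items` are made up for illustration, the host-side loop is only a stand-in for an actual kernel launch, and the modulus is limited to 32 bits so that products fit in a `uint64_t`; a real RSA modulus would need multi-word arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* One Pollard rho walk x -> x*x + c (mod n) with Floyd cycle detection.
 * Assumption for this sketch: n < 2^32, so x*x + c cannot overflow 64 bits.
 * Returns a nontrivial factor of n, or 0 if this walk fails or runs out of steps. */
static uint64_t rho_walk(uint64_t n, uint64_t c, uint64_t max_steps) {
    assert(n > 3 && n < (1ULL << 32));
    c %= n;
    uint64_t x = 2, y = 2;
    for (uint64_t i = 0; i < max_steps; i++) {
        x = (x * x + c) % n;            /* tortoise: one step */
        y = (y * y + c) % n;            /* hare: two steps */
        y = (y * y + c) % n;
        uint64_t d = gcd_u64(x > y ? x - y : y - x, n);
        if (d == n) return 0;           /* walk collapsed; another c is needed */
        if (d != 1) return d;           /* nontrivial factor found */
    }
    return 0;
}

/* Host-side stand-in for a kernel launch: "work-item" gid uses c = gid + 1.
 * On a GPU every iteration of this loop would be a separate work-item. */
static uint64_t factor_with_work_items(uint64_t n, uint32_t num_items,
                                       uint64_t max_steps) {
    for (uint32_t gid = 0; gid < num_items; gid++) {
        uint64_t d = rho_walk(n, (uint64_t)gid + 1, max_steps);
        if (d != 0) return d;
    }
    return 0;
}
```

Calling `factor_with_work_items(8051, 64, 1000)` returns a factor of 8051 = 83 * 97. The number of work-items here is chosen by how many independent walks are worth running, not by how many cores the device has.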
To add to Roger's answer, the reason you want many more threads than cores is that GPUs implement very efficient context switching to hide memory latency. Generally, each memory access is very expensive in terms of the time it takes for a processor to receive the requested data. But if a thread is waiting on a memory transaction, it can be "paused" and another thread can be activated to do computations (or other memory accesses) in the meantime. So if you have enough threads, you can essentially hide the memory access latency, and your software can run at the full computational capacity of the hardware (which would rarely happen otherwise).
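As a rough illustration of "enough threads", here is a toy model in C. It assumes every thread alternates a fixed stretch of arithmetic with one memory access of fixed latency, and that switching threads is free. Both function names and all cycle counts are hypothetical, not measurements of any real GPU.

```c
#include <stdint.h>

/* Toy model: a thread computes for `compute_cycles`, then waits
 * `latency_cycles` for memory. Returns how many resident threads one core
 * needs so that some thread is always ready to compute. */
static uint32_t threads_to_hide_latency(uint32_t compute_cycles,
                                        uint32_t latency_cycles) {
    if (compute_cycles == 0) return 0;  /* model is undefined without compute */
    /* One thread waits while ceil(latency / compute) others take turns. */
    return 1 + (latency_cycles + compute_cycles - 1) / compute_cycles;
}

/* Fraction of time the core spends computing with `k` resident threads
 * under the same model, capped at 1.0. */
static double core_utilization(uint32_t k, uint32_t compute_cycles,
                               uint32_t latency_cycles) {
    double u = (double)k * compute_cycles /
               ((double)compute_cycles + latency_cycles);
    return u > 1.0 ? 1.0 : u;
}
```

With the made-up figures of 20 compute cycles per 400-cycle memory access, the model asks for 21 threads per core, while a single thread would keep the core busy less than 5% of the time.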
I would have put this in a comment to Roger's post, but its size is beyond the limit.