Offloading coordinate transformations to GPU (tags: c#, opencl, gpu, gpgpu, coordinate-transformation)

I have a legacy map viewer application using WinForms. It is sloooooow. (The speed used to be acceptable, but then Google Maps and Google Earth came along and users got spoiled. Now I am permitted to make it faster :)

After doing all the obvious speed improvements (caching, parallel execution, not drawing what does not need to be drawn, etc.), my profiler shows that the real choke point is the coordinate transformation when converting points from map space to screen space. In its simple form, the conversion code looks like this:

    public Point MapToScreen(PointF input)
    {
        // Note that North is negative!
        var result = new Point(
           (int)((input.X - this.currentView.X) * this.Scale),
           (int)((input.Y - this.currentView.Y) * this.Scale));
        return result;
    }

The real implementation is trickier. Latitudes/longitudes are represented as integers. To avoid losing precision, they are multiplied by 2^20 (about 1 million). This is how a coordinate is represented:

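    public struct Position
    {
        public const int PrecisionCompensationPower = 20;
        public const int PrecisionCompensationScale = 1 << PrecisionCompensationPower; // 1048576
        public readonly int LatitudeInt; // North is negative!
        public readonly int LongitudeInt;

        public Position(int latitudeInt, int longitudeInt)
        {
            LatitudeInt = latitudeInt;
            LongitudeInt = longitudeInt;
        }
    }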
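As an illustration of the fixed-point format, here is a minimal sketch in plain C (C is also the base of the OpenCL C used in one of the answers). It assumes the stored unit is degrees, which the question does not state outright; `encode_deg` and `decode_deg` are hypothetical helpers, not part of the application. Under that assumption one step is 1/2^20 of a degree (roughly 0.1 m at the equator), and ±180° becomes ±188,743,680, which comfortably fits in a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the constants of the Position struct in the question. */
#define PRECISION_COMPENSATION_POWER 20
#define PRECISION_COMPENSATION_SCALE (1 << PRECISION_COMPENSATION_POWER) /* 1048576 */

/* Hypothetical helper: degrees -> fixed-point integer, rounded half away
   from zero. Assumes the unit is degrees and |degrees| <= 180. */
static int32_t encode_deg(double degrees)
{
    assert(degrees >= -180.0 && degrees <= 180.0);
    double scaled = degrees * PRECISION_COMPENSATION_SCALE;
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Hypothetical helper: fixed-point integer -> degrees. */
static double decode_deg(int32_t fixed)
{
    return (double)fixed / PRECISION_COMPENSATION_SCALE;
}
```

Decoding is exact, since the quotient of a 32-bit integer and a power of two is exactly representable in a double.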

Importantly, the possible scale factors are also restricted to powers of 2. This allows us to replace the multiplication with a bit shift, so the real algorithm looks like this:

    public Point MapToScreen(Position input)
    {
        // Shifting right by (20 - ZoomLevel) divides by 2^(20 - ZoomLevel);
        // valid for 0 <= ZoomLevel <= Position.PrecisionCompensationPower.
        int shift = Position.PrecisionCompensationPower - this.ZoomLevel;
        Point result = new Point();
        result.X = (input.LongitudeInt - this.UpperLeftPosition.LongitudeInt) >> shift;
        result.Y = (input.LatitudeInt - this.UpperLeftPosition.LatitudeInt) >> shift;
        return result;
    }
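To check that the shift really is the multiplication it replaces, here is a small C sketch that computes one axis both ways; `map_axis_shift` and `map_axis_mul` are hypothetical names. It only handles non-negative deltas (points to the right of or below the upper-left corner), because right-shifting a negative signed integer is implementation-defined in C. C#'s `>>` on `int` is an arithmetic shift, so there negative deltas round toward negative infinity.

```c
#include <assert.h>
#include <stdint.h>

#define PRECISION_COMPENSATION_POWER 20

/* One screen axis as in MapToScreen: divide the delta by 2^(20 - zoom)
   with a right shift. Restricted to non-negative deltas. */
static int32_t map_axis_shift(int32_t coord, int32_t upper_left, int zoom)
{
    assert(zoom >= 0 && zoom <= PRECISION_COMPENSATION_POWER);
    int32_t delta = coord - upper_left;
    assert(delta >= 0);
    return delta >> (PRECISION_COMPENSATION_POWER - zoom);
}

/* The same axis written as the multiplication by 2^zoom / 2^20 that the
   shift replaces. The cast truncates toward zero, so it agrees with the
   shift only for non-negative deltas. */
static int32_t map_axis_mul(int32_t coord, int32_t upper_left, int zoom)
{
    double scale = (double)(1 << zoom) / (1 << PRECISION_COMPENSATION_POWER);
    return (int32_t)((coord - upper_left) * scale);
}
```

At zoom level z, one pixel spans 2^(20 - z) fixed-point units, so zoom level 20 maps one unit to one pixel. Note also that C# masks an `int` shift count to its low 5 bits, so a ZoomLevel above 20 silently produces a wrong shift instead of an error.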

(UpperLeftPosition represents the upper-left corner of the screen in map space.) I am now thinking of offloading this calculation to the GPU. Can anyone show me an example of how to do that?

We use .NET 4.0, but the code should preferably run on Windows XP, too. Furthermore, we cannot use libraries licensed under the GPL.

2012-04-04 16:35
by user256890

Now, one year later, the problem arose again, and we found a very banal answer. I feel a bit stupid for not realizing it earlier. We draw the geographic elements to a bitmap via ordinary WinForms GDI, and GDI is hardware accelerated. All we have to do is NOT perform the transformation ourselves, but set the transform on the System.Drawing.Graphics object with Graphics.TranslateTransform(...) and Graphics.ScaleTransform(...). We do not even need the bit-shifting trick.
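The world transform set by TranslateTransform and ScaleTransform is a 3x2 affine matrix, and the order of the calls matters. The C sketch below models it with a hypothetical `Affine` struct laid out like the GDI+ Matrix elements (m11, m12, m21, m22, dx, dy) and applied to row vectors. It assumes the default MatrixOrder.Prepend, under which the most recently added operation is applied to points first, so calling ScaleTransform before TranslateTransform reproduces `(x - upperLeft) * 2^(zoom - 20)`.

```c
#include <assert.h>

/* Hypothetical 3x2 affine matrix in GDI+ element order, applied to row
   vectors: (x', y') = (x, y, 1) * [m11 m12; m21 m22; dx dy]. */
typedef struct { double m11, m12, m21, m22, dx, dy; } Affine;

static Affine affine_identity(void)
{
    Affine m = {1, 0, 0, 1, 0, 0};
    return m;
}

/* a * b: transform a is applied to points first, then b. */
static Affine affine_multiply(Affine a, Affine b)
{
    Affine r;
    r.m11 = a.m11 * b.m11 + a.m12 * b.m21;
    r.m12 = a.m11 * b.m12 + a.m12 * b.m22;
    r.m21 = a.m21 * b.m11 + a.m22 * b.m21;
    r.m22 = a.m21 * b.m12 + a.m22 * b.m22;
    r.dx  = a.dx * b.m11 + a.dy * b.m21 + b.dx;
    r.dy  = a.dx * b.m12 + a.dy * b.m22 + b.dy;
    return r;
}

/* Prepend order: the new operation runs on points before the existing ones. */
static void translate_prepend(Affine *m, double tx, double ty)
{
    Affine t = {1, 0, 0, 1, tx, ty};
    *m = affine_multiply(t, *m);
}

static void scale_prepend(Affine *m, double sx, double sy)
{
    Affine s = {sx, 0, 0, sy, 0, 0};
    *m = affine_multiply(s, *m);
}

static void affine_apply(const Affine *m, double x, double y, double *ox, double *oy)
{
    *ox = x * m->m11 + y * m->m21 + m->dx;
    *oy = x * m->m12 + y * m->m22 + m->dy;
}

/* Scale is added first and translation second, so points are translated
   by the upper-left corner first and then scaled, as in MapToScreen. */
static Affine map_to_screen_transform(double ul_lon, double ul_lat, int zoom)
{
    assert(zoom >= 0 && zoom <= 20);
    Affine m = affine_identity();
    double scale = (double)(1 << zoom) / (1 << 20);
    scale_prepend(&m, scale, scale);
    translate_prepend(&m, -ul_lon, -ul_lat);
    return m;
}
```

In C# terms that would be `g.ScaleTransform(s, s)` followed by `g.TranslateTransform(-ul.LongitudeInt, -ul.LatitudeInt)`, where `s` = 2^(ZoomLevel - 20) and `ul` stands for the upper-left Position. One caveat: GDI+ works in single-precision floats, which represent integers exactly only up to 2^24, so raw 2^20-scaled coordinates near ±180° (around 1.9 × 10^8) are rounded to multiples of 16 units. That is negligible at low zoom levels but can amount to several pixels near zoom level 20.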

2013-06-14 08:27
by user256890

I suggest you look at using OpenCL and Cloo to do this. Take a look at the vector add example and then change it to map the values from two input ComputeBuffers (one for the LatitudeInt and one for the LongitudeInt of each point) to two output ComputeBuffers. I suspect the OpenCL code would look something like this:

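    // Assumes the global work size equals the number of points.
    __kernel void CoordTrans(__global const int *lat,
                             __global const int *lon,
                             const int ulpLat,
                             const int ulpLon,
                             const int zl,
                             __global int *outx,
                             __global int *outy)
    {
        int i = get_global_id(0);
        const int pcp = 20; // Position.PrecisionCompensationPower

        // Same shift as MapToScreen; requires 0 <= zl <= pcp.
        outx[i] = (lon[i] - ulpLon) >> (pcp - zl);
        outy[i] = (lat[i] - ulpLat) >> (pcp - zl);
    }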

but you would want to do more than one coordinate transform per core. I need to rush off; I recommend you read up on OpenCL before doing this.
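Doing more than one transform per work-item is commonly done with a strided loop: each work-item starts at its global id and steps by the global size. The sketch below is plain C so it can run on the CPU; `gid` and `gsize` stand in for OpenCL's get_global_id(0) and get_global_size(0), and both function names are hypothetical. The shift (20 - zl) is computed once up front, and deltas are assumed non-negative, since right-shifting a negative signed int is implementation-defined in plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Body of one work-item that handles points gid, gid + gsize, gid + 2*gsize, ... */
static void coord_trans_work_item(size_t gid, size_t gsize,
                                  const int32_t *lat, const int32_t *lon, size_t n,
                                  int32_t ulp_lat, int32_t ulp_lon, int shift,
                                  int32_t *outx, int32_t *outy)
{
    for (size_t i = gid; i < n; i += gsize) {
        outx[i] = (lon[i] - ulp_lon) >> shift;
        outy[i] = (lat[i] - ulp_lat) >> shift;
    }
}

/* Emulates a launch of gsize work-items by running them one after another.
   gsize may be smaller or larger than n. */
static void coord_trans_launch(size_t gsize,
                               const int32_t *lat, const int32_t *lon, size_t n,
                               int32_t ulp_lat, int32_t ulp_lon, int zl,
                               int32_t *outx, int32_t *outy)
{
    assert(gsize > 0 && zl >= 0 && zl <= 20);
    for (size_t gid = 0; gid < gsize; ++gid)
        coord_trans_work_item(gid, gsize, lat, lon, n,
                              ulp_lat, ulp_lon, 20 - zl, outx, outy);
}
```

Compared with giving each work-item one contiguous chunk, the strided layout keeps neighbouring work-items on neighbouring addresses, which GPUs generally handle more efficiently; whether it pays off for this workload is something to measure.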

Also, if the number of coordinates is modest (under roughly 100,000 to 1,000,000), the CPU-based solution will likely be faster.
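For inputs that small, and for validating GPU output, a sequential CPU version is useful. A minimal C sketch, assuming a hypothetical C mirror of the Position struct: `split_positions` turns an array of points into the two int arrays the kernel takes (structure of arrays), and `coord_trans_ref` runs the kernel body in a loop, with `i` playing the role of get_global_id(0). Deltas are assumed non-negative, since right-shifting a negative signed int is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterpart of the question's Position struct. */
typedef struct { int32_t latitude_int; int32_t longitude_int; } Position;

/* Array of structs -> two separate int arrays, one per kernel input buffer. */
static void split_positions(const Position *in, size_t n, int32_t *lat, int32_t *lon)
{
    for (size_t i = 0; i < n; ++i) {
        lat[i] = in[i].latitude_int;
        lon[i] = in[i].longitude_int;
    }
}

/* Sequential reference for the CoordTrans kernel. */
static void coord_trans_ref(const int32_t *lat, const int32_t *lon, size_t n,
                            int32_t ulp_lat, int32_t ulp_lon, int zl,
                            int32_t *outx, int32_t *outy)
{
    const int pcp = 20; /* Position.PrecisionCompensationPower */
    assert(zl >= 0 && zl <= pcp);
    for (size_t i = 0; i < n; ++i) {
        outx[i] = (lon[i] - ulp_lon) >> (pcp - zl);
        outy[i] = (lat[i] - ulp_lat) >> (pcp - zl);
    }
}
```

Comparing `coord_trans_ref` with the GPU result on a sample of points is a cheap way to catch argument-order or buffer-size mistakes on the host side.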

2012-04-04 17:01
by Callum Rogers

I'm coming from a CUDA background, and can only speak for NVIDIA GPUs, but here goes.

The problem with doing this on a GPU is the ratio of computation to data transfer.

You have on the order of 1 operation to perform per element; you'd really want to do more than that per element to get a real speed improvement. The bandwidth between global memory and the threads on a GPU is around 100 GB/s. So, if you have to load one 4-byte integer to do one FLOP, your theoretical maximum speed is 100/4 = 25 GFLOPS. This is far from the hundreds of GFLOPS advertised.

Note that this is the theoretical maximum; the real result might be worse. And it is even worse if you're loading more than one element per operation. In your case it looks like 2, so you might get a maximum of 12.5 GFLOPS. In practice, it will almost certainly be lower.
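The bound above is bandwidth divided by bytes moved per operation. A one-function C sketch, taking the answer's 100 GB/s figure and 4-byte ints as inputs (`memory_bound_ops_per_s` is a hypothetical name). If the two stores per point (outx, outy) are counted as well, each point moves 16 bytes, which halves the bound again to 6.25 billion points per second; host-to-device transfers over the PCIe bus, typically much slower than on-board memory, are not included at all.

```c
#include <assert.h>

/* Ceiling on operations per second for a memory-bound kernel:
   bandwidth in bytes/s divided by bytes moved per operation. */
static double memory_bound_ops_per_s(double bandwidth_bytes_per_s,
                                     double bytes_per_op)
{
    assert(bytes_per_op > 0.0);
    return bandwidth_bytes_per_s / bytes_per_op;
}
```

With one 4-byte load per operation this gives the 25 G figure above, with two loads the 12.5 G figure.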

If this sounds ok to you though, then go for it!

2012-04-05 01:35
by P O'Conbhui

+1 for showing the theoretical boundaries - user256890 2012-04-05 09:27

Just to put the numbers in perspective, what is the approximate speed of an average 2-core CPU in FLOPS? - user256890 2012-04-05 09:29

It depends on what you call a FLOP. Let's say your 2-core CPU has a clock speed of 2 GHz and a FLOP takes 4 clock cycles. Then you could do 2*2/4 = 1 GFLOPS. That's a very crude estimate. - P O'Conbhui 2012-04-06 11:18

XNA can be used to do all the transformations you require, and it gives very good performance. It can also be hosted inside a WinForms application: http://create.msdn.com/en-US/education/catalog/sample/winforms_series_1

2012-04-05 01:46
by bouvierr