I'm trying to write some code to convert an array of a native C++ type into an appropriately sized vector-type defined by the OpenCL standard.
Endianness and packing are OpenCL implementation specific, and the OpenCL types do not provide a convenient operator[] (the API is actually C). Another issue: cl_int4 has a .s3 member, but cl_int2 does not.
I have something that functionally works, but you can see that I've wandered off into template crazy land.
Can this be done in a better way? These functions will not be called often, so better would be a combination of reduced program binary size and less lengthy source code.
Here's what I've got so far. I'm not showing all the dimensional specializations (3-6 are omitted), and I would also like to implement this for at least the integer type.
#include <CL/cl.h>
template < typename HOST_T, int NUM_DIM >
struct Payload_t;
// Vector length needs to be (for dims 1-6): 2, 4, 8, 8, 16, 16
// single precision
template < >
struct __attribute__((packed)) Payload_t <float, 1> {
    cl_float2 vec;
    void setElement( int pos, float value )
    {
        switch (pos) {
            case 0: vec.s0 = value; return;
            case 1: vec.s1 = value; return;
            default: return;
        }
    }
};
template < >
struct __attribute__((packed)) Payload_t <float, 2> {
    cl_float4 vec;
    void setElement( int pos, float value )
    {
        switch (pos) {
            case 0: vec.s0 = value; return;
            case 1: vec.s1 = value; return;
            case 2: vec.s2 = value; return;
            case 3: vec.s3 = value; return;
            default: return;
        }
    }
};
// double precision
template < >
struct __attribute__((packed)) Payload_t <double, 1> {
    cl_double2 vec;
    void setElement( int pos, double value )
    {
        switch (pos) {
            case 0: vec.s0 = value; return;
            case 1: vec.s1 = value; return;
            default: return;
        }
    }
};
template < >
struct __attribute__((packed)) Payload_t <double, 2> {
    cl_double4 vec;
    void setElement( int pos, double value )
    {
        switch (pos) {
            case 0: vec.s0 = value; return;
            case 1: vec.s1 = value; return;
            case 2: vec.s2 = value; return;
            case 3: vec.s3 = value; return;
            default: return;
        }
    }
};
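The vector lengths in the comment above (2, 4, 8, 8, 16, 16 for dims 1-6) look like 2 * NUM_DIM rounded up to the next power of two. Assuming that reading is right, the length could be computed at compile time instead of being written into each specialization. A minimal sketch; next_pow2 and vector_length are hypothetical names, and static_assert needs C++11:

```cpp
// Smallest power of two >= N. Hypothetical helper, not part of OpenCL.
template <unsigned N, unsigned P = 1, bool DONE = (P >= N)>
struct next_pow2 {
    static const unsigned value = next_pow2<N, P * 2>::value;
};

// Recursion stops once P has reached or passed N.
template <unsigned N, unsigned P>
struct next_pow2<N, P, true> {
    static const unsigned value = P;
};

// ASSUMED rule: room for 2 * NUM_DIM scalars, padded up to a power of two.
template <unsigned NUM_DIM>
struct vector_length {
    static const unsigned value = next_pow2<2 * NUM_DIM>::value;
};

// Compile-time checks against the table in the comment above (dims 1-6).
static_assert(vector_length<1>::value == 2,  "dim 1");
static_assert(vector_length<2>::value == 4,  "dim 2");
static_assert(vector_length<3>::value == 8,  "dim 3");
static_assert(vector_length<4>::value == 8,  "dim 4");
static_assert(vector_length<5>::value == 16, "dim 5");
static_assert(vector_length<6>::value == 16, "dim 6");
```

This only computes the length; mapping a (type, length) pair to the matching cl_ type would still need a small traits table.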
I guess you might be curious how I would use this class. In one example, I have a class templated on type REAL that has an instance of the following member class, which in turn has an instance of Payload_t.
template <int NUM_DIM >
struct cartesian_box_descriptor_t : cartesian_box_descriptor_base_t
{
    static const int vectorLengthArray[6];
    void set_dx( REAL * dx_vec )
    {
        for (int i = 0; i < NUM_DIM; ++i)
            payload.setElement( i, dx_vec[i] );
    }
    void set_startx( REAL * startx_vec )
    {
        for (int i = 0; i < NUM_DIM; ++i)
            payload.setElement( NUM_DIM + i , startx_vec[i] );
    }
    virtual WxAny getDescriptorStruct() const
    {
        return WxAny( payload ); // packages this simple structure as 'scalar' with hidden type
    }
    Payload_t< REAL, NUM_DIM> payload;
};
getDescriptorStruct() packages the OpenCL-supported type so that I can send it to the OpenCL API as a kernel argument with all the bytes falling in the right place.
If anyone is thinking about a paradigm shift, I will only ever need to set the entire vector at once.
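Since only whole-vector assignment is needed, one option is to fill the vector in a single loop over an array member instead of a per-element switch. The Khronos cl_platform.h headers I am aware of declare the cl_ vector types as unions that include an array member s (so vec.s[i] exists even where .s3 does not), but please check that against your own header before relying on it. The sketch below uses a stand-in type float4_like so that it compiles without the OpenCL SDK; set_all and array_length are hypothetical helpers:

```cpp
#include <cstddef>

// Stand-in for cl_float4, ASSUMING the real header exposes an array member `s`.
// Swap in the real type from <CL/cl.h> if your header matches.
struct float4_like {
    float s[4];
};

// Number of elements in a fixed-size array (deduced from the array type).
template <typename T, std::size_t N>
std::size_t array_length(T (&)[N]) { return N; }

// Copy `count` host values into the vector, starting at element `first`.
// Returns false (and writes nothing) if the values would not fit.
template <typename VEC, typename HOST_T>
bool set_all(VEC& vec, const HOST_T* src, std::size_t count, std::size_t first = 0)
{
    if (first + count > array_length(vec.s)) return false;
    for (std::size_t i = 0; i < count; ++i)
        vec.s[first + i] = src[i];
    return true;
}
```

With NUM_DIM = 2, set_all(payload.vec, dx_vec, 2) followed by set_all(payload.vec, startx_vec, 2, 2) would mirror set_dx and set_startx, and the same function template would cover the integer types without further specializations.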
I'm not sure whether to be proud or ashamed of this, but it works. You do need to make sure all calls to set() use the exact right types. It handles new cl_ types automatically and new sizes of cl_ types with changes in only 3 places. It could probably be cleaned up further, if anyone felt so inclined.
#include <iostream>
#include <assert.h>
// Stand-ins for the OpenCL types, so the example builds without <CL/cl.h>
struct cl_float1 {
    float s0;
};
struct cl_float2 {
    float s0;
    float s1;
};
#define ZERO_THROUGH_15(pre) \
    pre i0; \
    pre i1; \
    pre i2; \
    pre i3; \
    pre i4; \
    pre i5; \
    pre i6; \
    pre i7; \
    pre i8; \
    pre i9; \
    pre i10; \
    pre i11; \
    pre i12; \
    pre i13; \
    pre i14; \
    pre i15
// Per-type table of pointers to the s0..s15 members (NULL where a member is missing)
template<typename SIMD, typename POD>
struct offset {
    static POD SIMD::* data[16];
    ZERO_THROUGH_15(static bool);
    offset() {
        ZERO_THROUGH_15();
    }
};
template<typename SIMD, typename POD>
/*static*/ POD SIMD::* offset<SIMD,POD>::data[16];
// Primary template: nothing known about member s<n>, so report NULL
template<int n>
struct offsetGetter {
    template<typename SIMD, typename POD>
    static POD SIMD::* get(...) {
        return NULL;
    }
};
#define GET_OFFSET(n) \
    template<> \
    struct offsetGetter<n> { \
        template<typename SIMD, typename POD, POD SIMD::* OFS> \
        struct check {}; \
        \
        template<typename SIMD, typename POD> \
        static POD SIMD::* get(check<SIMD, POD, &SIMD::s ## n>*) { \
            return &SIMD::s ## n; \
        } \
        \
        template<typename SIMD, typename POD> \
        static POD SIMD::* get(...) { \
            return NULL; \
        } \
        template<typename SIMD, typename POD> \
        static bool init() { \
            offset<SIMD,POD>::data[n] = get<SIMD,POD>(NULL); \
            return true; /* a bool function must return a value */ \
        } \
    }; \
    template<typename SIMD, typename POD> \
    /*static*/ bool offset<SIMD,POD>::i##n = offsetGetter<n>::init<SIMD,POD>()
GET_OFFSET(0);
GET_OFFSET(1);
GET_OFFSET(2);
GET_OFFSET(3);
GET_OFFSET(4);
GET_OFFSET(5);
GET_OFFSET(6);
GET_OFFSET(7);
GET_OFFSET(8);
GET_OFFSET(9);
GET_OFFSET(10);
GET_OFFSET(11);
GET_OFFSET(12);
GET_OFFSET(13);
GET_OFFSET(14);
GET_OFFSET(15);
template<typename SIMD, typename POD>
void set(SIMD& simd, int n, POD val) {
    offset<SIMD,POD> ignoreme;
    POD SIMD::* ofs = offset<SIMD,POD>::data[n];
    assert(ofs);
    simd.*ofs = val;
}
int main() {
    cl_float2 x;
    set(x, 0, 42.0f);
    std::cout << x.s0 << std::endl; // prints 42
    set(x, 1, 52.0f);
    std::cout << x.s1 << std::endl; // prints 52
    cl_float1 y;
    set(y, 1, 42.0f); // assertion failure
}
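For readers unfamiliar with the simd.*ofs = val line: it assigns through a pointer to data member, which is what lets set() pick a field by run-time index. Here is a stripped-down sketch of just that mechanism, with the table written out by hand rather than detected per type as in the code above; vec2_like and set_by_index are made-up names for illustration:

```cpp
#include <cstddef>

// Stand-in for a two-element cl_ vector type that has named fields only.
struct vec2_like {
    float s0;
    float s1;
};

// Write `val` into the field selected by `n`; returns false if n is out of range.
inline bool set_by_index(vec2_like& v, std::size_t n, float val)
{
    // Table of pointers to data members, indexed by element position.
    static float vec2_like::* const fields[] = { &vec2_like::s0, &vec2_like::s1 };
    if (n >= sizeof(fields) / sizeof(fields[0])) return false;
    v.*fields[n] = val;   // same operator as `simd.*ofs = val` above
    return true;
}
```

Returning false for an out-of-range index is a design choice of this sketch; the code above uses assert for the same situation.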
assert(ofs) line even before any set() line in main(). I'm also getting 80 warnings, all related, I think, to the static bool init() function not returning a value. Can you comment on those two issues? - NoahR 2013-01-10 23:43