4

This is a thought exercise, not a particular problem, but I'd like to hear your opinion. Suppose I have some matrix expression DSL using templates (Eigen, ublas, etc.).

Now suppose I have some matrices which are constants, for example:

```
Matrix2 sigma1 = {{0,1}, {1,0}};
Matrix2 sigma2 = {{0,i}, {-i,0}};
... etc
```

... and I have some operations with those matrices involving runtime values:

```
a*sigma1 + b*sigma2; // a,b runtime
```

What ideas do you have to implement constant matrices such that the compiler can optimize the expressions the most? In particular, how do I resolve the `(i,j)` operator to a constant?

Just to make it a real-life scenario, transformation matrices are exactly that, so, e.g., rotation by some given angle or scaling by some given factor could be optimized away - TC1 2012-04-06 23:41

Exactly as @TC1 pointed out. I think the key here is determining the closed form of the expression under evaluation. There are excellent resources on this topic if you go looking. - MrGomez 2012-04-07 00:17

You can use expression templates, but you should notice that some (most?) compilers aren't able to optimize multiplication by 0 or by 1 when dealing with floating point. That's because they are still supposed to work if `a` or `b` from your code are `NaN` or `inf`. GCC has an option (`-ffinite-math-only`) which (I think) can solve this issue - sbabbi 2012-04-12 00:35
@sbabbi `-ffast-math` discards some IEEE requirements. My goal is to see if there is a way to optimize those muls through template means somehow - Anycorn 2012-04-12 00:43

@dpiskyulev yes, I had expression templates, but the question is how. It's a bit tricky - Anycorn 2012-04-12 00:44

Since the C++ standard doesn't allow the use of float/double as non-type template arguments, you could perhaps find a way to use int template arguments. To use floating-point values in static initialization, have an additional integer template argument for the fractional part (or as a two-part rational number). Likely to be very complicated, but it ought to be possible. Of course, supplying a float via this method won't be feasible, so it's not a neat solution - Pete 2012-04-12 12:14

4

From what I understand of the problem space: given a domain-specific language, we want to determine the minimal transformation such that some operator over the data (for example, `(i,j)`) results in a lookup of a constant instead of the computation of mathematical formulae (for example, `a*sigma1 + b*sigma2`).

Let's explore some of the possibilities here:

*Performing the mathematical operations directly*

The 0th-level implementation. Given no explicit optimization by your compiler, what happens if we execute instructions directly?

The answer is: it depends. On most processors, your instruction stream will hit the CPU's caches and branch predictors, which will optimize execution to the best of your cores' abilities. The guts of that process are actually really cool, but we'll assume you want to move beyond these capabilities and address the constantification of your operation directly in your code.

*Bound and capture the space using a compiler-compiler*

The first-order optimization would be to bound and capture the possible input and output space using a compiler-compiler. While this will address your input range effectively, limiting it only to the set of inputs and outputs that you desire, it does *nothing* for performance otherwise. So, we must move on.

*Stringification and macro expansion*

The second-order optimization would be to perform string or macro expansion of your value space directly. While this is fraught with corner cases and surprising implementation-level tar pits, it can be done directly by your compiler if the operation is desired. (See also: loop unwinding.)

*Manual derivation and stack-bound satisfiability of the closed-form expression* (using, for example, a lookup table)

Finally, our third-order optimization would be to bound your space directly. This requires you to have a well-defined closed-form relation with a bounded input and output space to work effectively. If this relation cannot be determined or carries no bounds, you're out of luck and should consider keeping your current implementation if a better one is not known to exist.

Of these optimization techniques, the latter two are the most applicable to linear algebraic operations, given the problem bounds that you've described. Because most operations, such as matrix translation, rotation, and scaling, are deterministic in nature, you can indeed optimize and bound your space effectively.

For a more theoretical answer, I recommend consulting http://cs.stackexchange.com, http://cstheory.stackexchange.com, and http://math.stackexchange.com. All three have many, many threads devoted to the decidability and proof of closed-form, polynomial solutions for entire classes of equations.

2

Ok, well this is hideous, but related to my comment on the original post.

Using this structure it should be possible to define the relevant operations you need, but that will be a lot of work to write all the appropriate specializations. You may also prefer your rows/columns to be swapped.

Defining the matrices at the end certainly isn't as elegant as in your original post, but perhaps this could be improved, especially with the use of `auto` in C++11.

```
//-----------------------------------------------------------------------------
struct Unused {};

struct Imaginary {
    Imaginary() {}
    Imaginary(Unused const&) {}
};

struct MinusImaginary {
    MinusImaginary() {}
    MinusImaginary(Unused const&) {}
};
//-----------------------------------------------------------------------------
// Compile-time constant: integer part I, optional fractional part F.
template <int I, int F = 0>
struct Fixed {
    Fixed() {}
    Fixed(Unused const&) {}
};
//-----------------------------------------------------------------------------
// Runtime floating-point value.
struct Float {
    Float(float value) : value_(value) {}
    const float value_;
};
//-----------------------------------------------------------------------------
template <typename COL0, typename COL1>
struct Vector2 {
    typedef COL0 col0_t;
    typedef COL1 col1_t;

    template <typename T0, typename T1>
    Vector2(T0 const& t0, T1 const& t1)
        : col0_(t0)
        , col1_(t1)
    {}

    COL0 col0_;
    COL1 col1_;
};
//-----------------------------------------------------------------------------
template <typename ROW0, typename ROW1>
struct Matrix2 {
    typedef ROW0 row0_t;
    typedef ROW1 row1_t;

    Matrix2()
        : row0_(Unused(), Unused())
        , row1_(Unused(), Unused())
    {}

    template <typename M00, typename M01, typename M10, typename M11>
    Matrix2(M00 const& m00, M01 const& m01, M10 const& m10, M11 const& m11)
        : row0_(m00, m01)
        , row1_(m10, m11)
    {}

    ROW0 row0_;
    ROW1 row1_;
};
//-----------------------------------------------------------------------------
Matrix2<
    Vector2< Fixed<0>, Fixed<1> >,
    Vector2< Fixed<1>, Fixed<0> >
> sigma1;

const float f = 0.1f;
//-----------------------------------------------------------------------------
Matrix2<
    Vector2< Fixed<0>, Imaginary >,
    Vector2< MinusImaginary, Fixed<0> >
> sigma2;
//-----------------------------------------------------------------------------
Matrix2<
    Vector2< Fixed<0>, Float >,
    Vector2< Float, Fixed<0> >
> m3(Unused(), 0.2f,
     0.8f, Unused());
// EDIT: Nicer initialization syntax in C++11
//-----------------------------------------------------------------------------
template <typename M00, typename M01, typename M10, typename M11>
Matrix2< Vector2<M00, M01>, Vector2<M10, M11> >
MakeMatrix(M00 const& m00, M01 const& m01, M10 const& m10, M11 const& m11)
{
    return Matrix2< Vector2<M00, M01>, Vector2<M10, M11> >(m00, m01, m10, m11);
}

auto m4 = MakeMatrix(Fixed<0>(), Float(0.2f),
                     Float(0.8f), Fixed<0>());
```