c++ - Personal SSE library
OK, I've been using operator overloading of SSE/AVX intrinsics to facilitate their usage in more trivial situations where vector processing is useful. The class definition looks like this:
    #define float16a float __attribute__((__aligned__(16)))

    class sse
    {
    private:
        __m128 vec __attribute__((__aligned__(16)));
        float16a *temp;

    public:
        //=================================================================
        sse();
        sse(float *value);
        //=================================================================
        void operator +  (float *param);
        void operator -  (float *param);
        void operator *  (float *param);
        void operator /  (float *param);
        void operator %  (float *param);
        void operator ^  (int number);
        void operator =  (float *param);
        void operator == (float *param);
        void operator += (float *param);
        void operator -= (float *param);
        void operator *= (float *param);
        void operator /= (float *param);
    };
With each individual function bearing resemblance to:
    void sse::operator + (float *param)
    {
        vec = _mm_add_ps(vec, _mm_load_ps(param));
        _mm_store_ps(temp, vec);
    }
Thus far I have had few problems writing the code, but I have run into a few performance problems: compared to fairly trivial scalar code, the SSE/AVX code takes a significant performance hit. I know that this type of code can be difficult to profile, and I'm not sure where the bottleneck is. If there are any pointers that can be thrown at me, they would be appreciated.
Note that this is a personal project that I'm writing to further my own knowledge of SSE/AVX, so replacing it with an external library would not be of much help.
It would seem to me that the amount of overhead you are introducing could easily overwhelm any speed you gain through the use of SSE operations.
Without looking at the assembly produced I can't say exactly what is happening, but here are two possible forms of overhead.
Calling a function (unless it is inlined) involves a call, a ret, and likely a push, pop, etc. to create a stack frame.
You're calling _mm_store_ps on each operation, so if you chain more than one operation you're paying the cost of the store more times than necessary.
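To make those two points concrete, here is a minimal sketch of one way to avoid both costs. It is only an illustration, not a drop-in replacement for your class: the names vec4, store and fused_example are made up for this example, and it uses the unaligned load/store intrinsics so that it works with any float array. The operators are defined inside the class (so they are implicitly inline) and return by value, and memory is only written when store is called.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, ...

// Hypothetical wrapper; the name vec4 is not from the question.
class vec4 {
    __m128 v;
public:
    vec4() : v(_mm_setzero_ps()) {}
    explicit vec4(__m128 x) : v(x) {}
    // Unaligned load, so the caller's array needs no special alignment.
    explicit vec4(const float *p) : v(_mm_loadu_ps(p)) {}

    // Defined in the class body, hence implicitly inline.
    // Each operator returns a new value and performs no store.
    vec4 operator+(const vec4 &o) const { return vec4(_mm_add_ps(v, o.v)); }
    vec4 operator-(const vec4 &o) const { return vec4(_mm_sub_ps(v, o.v)); }
    vec4 operator*(const vec4 &o) const { return vec4(_mm_mul_ps(v, o.v)); }
    vec4 operator/(const vec4 &o) const { return vec4(_mm_div_ps(v, o.v)); }

    // The only member that writes to memory.
    void store(float *out) const { _mm_storeu_ps(out, v); }
};

// Computes (a + b) * c on four floats with a single store at the end.
inline void fused_example(const float *a, const float *b,
                          const float *c, float *out) {
    ((vec4(a) + vec4(b)) * vec4(c)).store(out);
}
```

Whether the compiler really keeps everything in registers depends on optimisation settings, so it is still worth checking the generated assembly for your own build.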
Also, it isn't clear from your code whether this is a problem, but make sure temp is a valid pointer.
Hope that helps somewhat. Good luck.
Follow-up to your comment:
I'm not sure if this is good C++ or not (please educate me if it isn't), but here's what I'd propose given my limited knowledge. I'd be interested if other people have better suggestions.
Use what I believe is called a "conversion operator". Since what you're returning isn't a single float but 4 floats, you'll need to add a type as well.
    struct float_data
    {
        float data[4];
    };

    class sse
    {
        ...
        float_data floatdata;
        ...
        operator float_data&();
        ...
    };

    sse::operator float_data&()
    {
        // _mm_store_ps requires floatdata.data to be 16-byte aligned
        _mm_store_ps(floatdata.data, vec);
        return floatdata;   // return the member itself, not its address
    }
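For what it's worth, here is a self-contained sketch of how that conversion operator might be used. The class below is a cut-down stand-in for yours, not your actual code: it only has a constructor and operator +=, it uses the unaligned intrinsics to sidestep the alignment question, and add_twice is a made-up helper for the example. The point is that the two additions happen without touching memory, and the store runs once when the float_data is requested.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Plain aggregate holding the four lanes.
struct float_data {
    float data[4];
};

// Cut-down stand-in for the question's class (assumption: only += is needed).
class sse {
    __m128 vec;
    float_data floatdata;
public:
    explicit sse(const float *value)
        : vec(_mm_loadu_ps(value)), floatdata() {}

    // Accumulates in the register; no store here.
    sse &operator+=(const float *param) {
        vec = _mm_add_ps(vec, _mm_loadu_ps(param));
        return *this;
    }

    // The store happens only when a caller asks for the floats.
    operator float_data &() {
        _mm_storeu_ps(floatdata.data, vec);
        return floatdata;
    }
};

// Hypothetical helper: two additions, one store.
inline float_data add_twice(const float *start, const float *step) {
    sse acc(start);
    acc += step;
    acc += step;
    float_data &result = acc;  // implicit conversion triggers the store
    return result;             // returned by value (copied)
}
```

Note that the reference returned by the operator points into the sse object, so it should not outlive it; the helper above copies the data out for that reason.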