C++ - Personal SSE library


OK, I've been using operator overloading of SSE/AVX intrinsics to facilitate their usage in more trivial situations where vector processing is useful. The class definition looks like this:

    #define float16a float __attribute__((__aligned__(16)))

    class sse
    {
        private:
            __m128 vec __attribute__((__aligned__(16)));
            float16a *temp;

        public:
    //=================================================================
            sse();
            sse(float *value);
    //=================================================================
            void operator + (float *param);
            void operator - (float *param);
            void operator * (float *param);
            void operator / (float *param);
            void operator % (float *param);

            void operator ^ (int number);
            void operator = (float *param);

            void operator == (float *param);
            void operator += (float *param);
            void operator -= (float *param);
            void operator *= (float *param);
            void operator /= (float *param);
    };

With each individual function bearing a resemblance to:

    void sse::operator + (float *param)
    {
        vec = _mm_add_ps(vec, _mm_load_ps(param));
        _mm_store_ps(temp, vec);
    }

Thus far I have had few problems writing the code, but I have run into a few performance problems when using it: compared with fairly trivial scalar code, the SSE/AVX code takes a significant performance hit. I know that this type of code can be difficult to profile, but I'm not really sure what the bottleneck is. If there are any pointers that can be thrown at me, they would be appreciated.

Note that this is a personal project that I'm writing to further my own knowledge of SSE/AVX, so replacing it with an external library would not be of much help.

It would seem to me that the amount of overhead you are introducing could easily overwhelm any speed you gain through the use of the SSE operations.

Without looking at the assembly produced I can't say exactly what is happening, but here are two possible forms of overhead.

Calling a function (unless it is inlined) involves a call and a ret, and most likely a push and a pop etc. as well to create a stack frame.

You're calling _mm_store_ps for each operation. If you chain more than one operation together, you're paying the cost of this more times than necessary.
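Both kinds of overhead can usually be reduced by defining the operators in the header, so the compiler is able to inline them, and by having them return a value instead of storing to memory each time. What follows is only a minimal sketch under those assumptions, not the class from the question: the names `vec4`, `load`, `store` and `add_then_scale` are made up for illustration, and it assumes an x86 target whose compiler provides `<xmmintrin.h>`.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, ...

// Hypothetical wrapper (not the class from the question): it holds only the
// register value, so chained operations can stay in XMM registers.
struct vec4
{
    __m128 v;

    vec4() : v(_mm_setzero_ps()) {}
    explicit vec4(__m128 x) : v(x) {}

    // Unaligned load/store, so callers do not need 16-byte aligned arrays.
    static vec4 load(const float *p) { return vec4(_mm_loadu_ps(p)); }
    void store(float *p) const { _mm_storeu_ps(p, v); }
};

// Defined in the header so the compiler can inline them at the call site.
inline vec4 operator+(vec4 a, vec4 b) { return vec4(_mm_add_ps(a.v, b.v)); }
inline vec4 operator-(vec4 a, vec4 b) { return vec4(_mm_sub_ps(a.v, b.v)); }
inline vec4 operator*(vec4 a, vec4 b) { return vec4(_mm_mul_ps(a.v, b.v)); }
inline vec4 operator/(vec4 a, vec4 b) { return vec4(_mm_div_ps(a.v, b.v)); }

// Example: out[i] = (a[i] + b[i]) * c[i], with a single store at the end.
inline void add_then_scale(const float *a, const float *b, const float *c, float *out)
{
    vec4 r = (vec4::load(a) + vec4::load(b)) * vec4::load(c);
    r.store(out);
}
```

Whether this actually helps depends on the compiler and optimization level, so it is still worth checking the generated assembly.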

Also, it isn't clear from your code if this is a problem, but make sure temp is a valid pointer.
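For reference, _mm_load_ps and _mm_store_ps require a non-null pointer to 16-byte aligned memory. One way to rule the pointer out as a problem is to check it, or to let the object own an aligned buffer rather than hold a raw pointer. The sketch below assumes C++11 (for `alignas` and `nullptr`); the names `is_aligned16`, `aligned_result` and `add4` are hypothetical and do not appear in the question.

```cpp
#include <cstdint>      // std::uintptr_t
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: true if p is non-null and 16-byte aligned, which is
// what _mm_load_ps and _mm_store_ps require.
inline bool is_aligned16(const void *p)
{
    return p != nullptr && (reinterpret_cast<std::uintptr_t>(p) % 16) == 0;
}

// Hypothetical alternative to a raw 'temp' pointer: the buffer is part of the
// object, so it cannot dangle and its alignment is guaranteed by alignas.
struct aligned_result
{
    alignas(16) float data[4];
};

inline aligned_result add4(const float *a, const float *b)
{
    aligned_result r;
    // Inputs may be unaligned (loadu); the destination is aligned (store).
    _mm_store_ps(r.data, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    return r;
}
```

A check such as `assert(is_aligned16(temp))` in the constructor would catch an uninitialized or misaligned pointer early in debug builds.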

Hope that helps somewhat. Good luck.


Follow-up for the comment.

I'm not sure if this is good C++ or not (please educate me if it isn't), but here's what I'd propose given my limited knowledge. I'd be interested to hear if other people have better suggestions.

Use what I believe is called a "conversion operator". Since what you're returning isn't a single float but four floats, you need to add a type as well.

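    struct float_data
    {
        float data[4];
    };

    class sse
    {
        ...
        float_data floatdata;
        ...
        operator float_data&();
        ...
    };

    sse::operator float_data&()
    {
        // storeu: floatdata.data is not guaranteed to be 16-byte aligned
        _mm_storeu_ps(floatdata.data, vec);
        return floatdata;  // return the member itself, not the address of the type
    }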
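To show how the conversion operator would be used, here is a self-contained sketch. It is a trimmed-down stand-in for the sse class, not the original: the constructor signature, the `operator+=` that returns `*this`, and the `sum3` helper are assumptions made for the example, and unaligned loads and stores are used so the inputs need no special alignment.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

struct float_data
{
    float data[4];
};

// Trimmed-down stand-in for the sse class above; only what the example needs.
class sse
{
    __m128 vec;
    float_data floatdata;

public:
    explicit sse(const float *value) : vec(_mm_loadu_ps(value)), floatdata() {}

    // No store here: the result stays in vec until someone asks for it.
    sse &operator+=(const float *param)
    {
        vec = _mm_add_ps(vec, _mm_loadu_ps(param));
        return *this;
    }

    // Conversion operator: the only place that writes vec back to memory.
    operator float_data &()
    {
        _mm_storeu_ps(floatdata.data, vec);
        return floatdata;
    }
};

// Hypothetical helper: adds three 4-float arrays with one store at the end.
inline float_data sum3(const float *a, const float *b, const float *c)
{
    sse acc(a);
    acc += b;
    acc += c;
    float_data &out = acc;  // implicit conversion triggers the single store
    return out;
}
```

The point of this design is that chained operations no longer pay for a store each; the cost is paid once, when the result is converted.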
