C++ - SSE and AVX intrinsics mixture


In addition to comparing SSE-copy, AVX-copy and std::copy performance: suppose we need to vectorize a loop in the following manner:

1) Vectorize the first loop batch (whose length is a multiple of 8) via AVX.
2) Split the loop's remainder into 2 batches. Vectorize the batch whose length is a multiple of 4 via SSE.
3) Process the residual batch of the entire loop via a serial routine.

Let's consider an example of copying arrays:

#include <immintrin.h>
#include <algorithm>   // std::generate

template<int length,
         int unroll_bound_avx = length & (~7),
         int unroll_tail_avx  = length - unroll_bound_avx,
         int unroll_bound_sse = unroll_tail_avx & (~3),
         int unroll_tail_last = unroll_tail_avx - unroll_bound_sse>
void simd_copy(float *src, float *dest)
{
    auto src_  = src;
    auto dest_ = dest;

    //vectorize first part of loop via avx
    for(; src_!=src+unroll_bound_avx; src_+=8, dest_+=8)
    {
        __m256 buffer = _mm256_load_ps(src_);
        _mm256_store_ps(dest_, buffer);
    }

    //vectorize remainder part of loop via sse
    for(; src_!=src+unroll_bound_sse+unroll_bound_avx; src_+=4, dest_+=4)
    {
        __m128 buffer = _mm_load_ps(src_);
        _mm_store_ps(dest_, buffer);
    }

    //process residual elements
    for(; src_!=src+length; ++src_, ++dest_)
        *dest_ = *src_;
}

int main()
{
    const int sz = 15;
    //32-byte alignment: _mm256_load_ps/_mm256_store_ps require it
    float *src = (float *)_mm_malloc(sz*sizeof(float), 32);
    float *dest = (float *)_mm_malloc(sz*sizeof(float), 32);
    float a=0;
    std::generate(src, src+sz, [&](){return ++a;});

    simd_copy<sz>(src, dest);

    _mm_free(src);
    _mm_free(dest);
}
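To make the batch arithmetic concrete, here is a minimal sketch that recomputes the default template arguments for a given length. The names `BatchSplit` and `split_batches` are hypothetical helpers introduced only for illustration; they are not part of the code in the question. The sketch assumes a non-negative length and a C++14 compiler.

```cpp
// Hypothetical helper (not in the question's code): mirrors the default
// template arguments of simd_copy for a given element count.
struct BatchSplit {
    int avx_elems;     // elements copied 8 at a time
    int sse_elems;     // elements copied 4 at a time
    int scalar_elems;  // elements copied one by one
};

constexpr BatchSplit split_batches(int length)
{
    int bound_avx = length & (~7);        // largest multiple of 8 <= length
    int tail_avx  = length - bound_avx;   // 0..7 elements left after AVX
    int bound_sse = tail_avx & (~3);      // 0 or 4 of those go to SSE
    int tail_last = tail_avx - bound_sse; // 0..3 elements left for the serial loop
    return BatchSplit{bound_avx, bound_sse, tail_last};
}

// For sz = 15 as in main(): 8 elements via AVX, 4 via SSE, 3 serially.
static_assert(split_batches(15).avx_elems == 8, "AVX batch");
static_assert(split_batches(15).sse_elems == 4, "SSE batch");
static_assert(split_batches(15).scalar_elems == 3, "serial batch");
```

Since the AVX batch always ends on a multiple of 8 floats (32 bytes), the SSE batch starts at an address that is still 16-byte aligned whenever the buffer itself is 32-byte aligned.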

Is it correct to use both SSE and AVX? Do I need to avoid AVX-SSE transitions?

You can mix SSE and AVX intrinsics as much as you want.

The only thing you want to make sure of is that you specify the correct compiler flag to enable AVX.

  • GCC: -mavx
  • Visual Studio: /arch:AVX

Failing to do so will result either in the code not compiling (GCC) or, in the case of Visual Studio, in code that compiles but pays a heavy performance penalty.

What the flag does is force all SIMD instructions to use VEX encoding, which avoids the AVX-SSE state-switching penalties that the question asks about.
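One way to see whether the flag is actually in effect is the `__AVX__` predefined macro, which both GCC and Visual Studio define when AVX code generation is enabled. The following is a rough sketch under that assumption, not the code from the question: `copy_floats`, `built_with_avx` and `COPY_HAS_SSE` are hypothetical names. Unaligned loads and stores are used so that nothing is assumed about buffer alignment.

```cpp
#include <cstddef>

// Assumption: __SSE__ is the GCC/Clang macro; MSVC x64 always has SSE2
// but signals it through _M_X64 instead.
#if defined(__SSE__) || defined(_M_X64)
#define COPY_HAS_SSE 1
#endif

#if defined(__AVX__) || defined(COPY_HAS_SSE)
#include <immintrin.h>
#endif

// Hypothetical helper: copies n floats, mixing AVX, SSE and scalar code.
inline void copy_floats(const float* src, float* dest, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX__)
    // Compiled only when AVX is enabled (-mavx or /arch:AVX).
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dest + i, _mm256_loadu_ps(src + i));
#endif
#if defined(COPY_HAS_SSE)
    // With AVX enabled, the compiler emits the VEX-encoded form of these
    // 128-bit operations as well, so no state switch occurs here.
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dest + i, _mm_loadu_ps(src + i));
#endif
    // Serial loop for whatever is left.
    for (; i < n; ++i)
        dest[i] = src[i];
}

// True when this translation unit was built with AVX enabled.
constexpr bool built_with_avx()
{
#if defined(__AVX__)
    return true;
#else
    return false;
#endif
}
```

Calling `copy_floats(src, dest, 15)` gives the same result whichever path is compiled in; `built_with_avx()` only reports which build configuration was used, and the generated assembly remains the place to confirm that the 128-bit instructions carry the `v` prefix.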

