c - Is vectorization profitable in this case?
I broke a kernel down into several loops in order to vectorize each one of them afterwards. One of the loops looks like this:
    int *array1; // its size is "size+1"
    int *array2; // its size is "size+1"
    // all positions of array1 and array2 are set to 0 here
    int *sarray1 = array1 + 1; // shift by 1 position so writing starts at pos 1
    int *sarray2 = array2 + 1; // shift by 1 position so writing starts at pos 1
    int bb = 0;
    for (int i = 0; i < size; i++) {
        if (a[i] + bb > b[i]) {
            bb = 1;
            sarray1[i] = s;
            sarray2[i] = 1;
        } else
            bb = 0;
    }
Please note the loop-carried dependency in bb: each comparison depends on bb's value, which is modified on the previous iteration.
What I thought about:
- I can be absolutely sure of some cases. For example, when a[i] is greater than b[i], I do not need to know what value bb carries from the previous iteration;
- When a[i] equals b[i], I need to know what value bb carries from the previous iteration. However, I also need to account for the case when this happens in two consecutive positions. When I started to shape up these cases, it seemed that they become overly complicated and that vectorization doesn't pay off.
Essentially, I'd like to know whether this can be vectorized in an effective manner or whether it is better to run it without any vectorization whatsoever.
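The cases listed in the question can be written as a single state update for bb. This is only a minimal sketch: the helper name next_bb is hypothetical (it does not appear in the original code), and it assumes int inputs for which a[i] + bb does not overflow.

```c
#include <assert.h>

/* Hypothetical helper, not part of the original kernel: returns the value
 * bb holds after one element has been processed, given the value carried in.
 * For bb in {0, 1} this matches the original test (a_i + bb > b_i),
 * assuming a_i + bb does not overflow. */
int next_bb(int a_i, int b_i, int bb)
{
    assert(bb == 0 || bb == 1);
    if (a_i > b_i)
        return 1;   /* greater: the carried value does not matter */
    if (a_i < b_i)
        return 0;   /* smaller: the carried value does not matter */
    return bb;      /* equal: the carried value passes through unchanged */
}
```

Under these assumptions, a run of consecutive equal positions simply forwards whatever bb entered the run, so the "two consecutive positions" case needs no rule of its own.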
You might not want to iterate over single elements, but instead loop over chunks (where a chunk is defined as a run of consecutive elements that all yield the same bb).
The search for chunk boundaries could be vectorized (probably by hand, using compiler-specific SIMD intrinsics). The action taken for a single chunk with bb=1 could be vectorized, too. The loop transformation is as follows:
    /* isize is the element count; note that a[0] is read, so isize > 0 is assumed */
    size_t i_chunk_start = 0, i_chunk_end;
    int bb_chunk = a[0] > b[0] ? 1 : 0;
    while (i_chunk_start < isize) {
        if (bb_chunk) {
            /* find end of current chunk */
            for (i_chunk_end = i_chunk_start + 1; i_chunk_end < isize; ++i_chunk_end) {
                if (a[i_chunk_end] < b[i_chunk_end]) {
                    break;
                }
            }
            /* process current chunk */
            for (size_t i = i_chunk_start; i < i_chunk_end; ++i) {
                sarray1[i] = s;
                sarray2[i] = 1;
            }
            bb_chunk = 0;
        } else {
            /* find end of current chunk */
            for (i_chunk_end = i_chunk_start + 1; i_chunk_end < isize; ++i_chunk_end) {
                if (a[i_chunk_end] > b[i_chunk_end]) {
                    break;
                }
            }
            bb_chunk = 1;
        }
        /* prepare for next chunk */
        i_chunk_start = i_chunk_end;
    }
Now, each of the inner loops (all the for loops) could potentially be vectorized.
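As an illustration of vectorizing one boundary search by hand, here is a rough sketch of the "find the first a[i] < b[i]" loop using SSE2 intrinsics, with a scalar fallback when SSE2 is not available. The function name find_first_less is hypothetical, and the sketch assumes a 32-bit int; it is one possible approach, not necessarily the fastest.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: returns the first index i in [start, size) with
 * a[i] < b[i], or size if there is no such index. Assumes 32-bit int. */
size_t find_first_less(const int *a, const int *b, size_t start, size_t size)
{
    size_t i = start;
#if defined(__SSE2__)
    /* Compare four elements per step using unaligned loads. */
    for (; i + 4 <= size; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Each 32-bit lane becomes all ones where a < b; movemask then
         * yields 4 bits per lane (one bit per byte). */
        int mask = _mm_movemask_epi8(_mm_cmplt_epi32(va, vb));
        if (mask != 0) {
            for (size_t lane = 0; lane < 4; ++lane) {
                if (mask & (0xF << (4 * lane)))
                    return i + lane;    /* lowest matching lane */
            }
        }
    }
#endif
    /* Scalar tail, and the whole search when SSE2 is unavailable. */
    for (; i < size; ++i) {
        if (a[i] < b[i])
            return i;
    }
    return size;
}
```

The search for the first a[i] > b[i] would look the same with _mm_cmpgt_epi32 in place of _mm_cmplt_epi32. Whether this beats what the compiler generates for the plain loop is something only measurement can tell.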
Whether vectorization in this manner is superior to the non-vectorized version depends on whether the chunks are sufficiently long on average. You will only find out by benchmarking.
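To make such a comparison concrete, the sketch below wraps the original loop and the chunked transformation in two functions with the same signature, plus a small timing helper. The names mark_scalar, mark_chunked, kernel_fn and time_kernel are hypothetical, the output arrays stand in for sarray1 and sarray2, and clock() only gives coarse CPU time, so treat this as a starting point rather than a rigorous benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Reference: the original element-by-element loop.
 * out1 and out2 must hold `size` ints and be zeroed by the caller. */
void mark_scalar(const int *a, const int *b, size_t size, int s,
                 int *out1, int *out2)
{
    int bb = 0;
    for (size_t i = 0; i < size; i++) {
        if (a[i] + bb > b[i]) {
            bb = 1;
            out1[i] = s;
            out2[i] = 1;
        } else {
            bb = 0;
        }
    }
}

/* Chunked variant following the loop transformation shown earlier. */
void mark_chunked(const int *a, const int *b, size_t size, int s,
                  int *out1, int *out2)
{
    if (size == 0)
        return;                         /* a[0] is read below */
    size_t start = 0, end;
    int bb_chunk = a[0] > b[0];
    while (start < size) {
        if (bb_chunk) {
            /* a bb=1 chunk lasts until a[i] < b[i] */
            for (end = start + 1; end < size; ++end)
                if (a[end] < b[end])
                    break;
            for (size_t i = start; i < end; ++i) {
                out1[i] = s;
                out2[i] = 1;
            }
            bb_chunk = 0;
        } else {
            /* a bb=0 chunk lasts until a[i] > b[i] */
            for (end = start + 1; end < size; ++end)
                if (a[end] > b[end])
                    break;
            bb_chunk = 1;
        }
        start = end;
    }
}

typedef void (*kernel_fn)(const int *, const int *, size_t, int, int *, int *);

/* Hypothetical timing helper: CPU seconds spent on `reps` runs of one
 * kernel, re-zeroing the outputs before each run. */
double time_kernel(kernel_fn f, const int *a, const int *b, size_t size,
                   int s, int *out1, int *out2, int reps)
{
    assert(reps > 0);
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) {
        memset(out1, 0, size * sizeof *out1);
        memset(out2, 0, size * sizeof *out2);
        f(a, b, size, s, out1, out2);
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

A reasonable way to use this is to first check that both kernels produce identical outputs on your real data, then call time_kernel with each of them on inputs whose chunk lengths are representative of your workload, since that is what decides the outcome.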