Vectorizing list with zeros

Vectorizing list with zeros

Colleagues,

I watched the recent Intel webinare on parallelization/vectorization and found extremely useful. I've been working through the K's of lines of code I'm responsble for, changing things when necessary to effect vectorization. One kind of simple task I've not been able to vectorize:  packing an integer list to remove zeros. What I have now won't do:

n=0

do m=1,Num

if ( List(m) == 0 ) cycle

n=n+1

NewList(n) =List(m)

end do

I have not been able to think of a better algorithm or procedure. Can anyone suggest something that could take advantage of vectorization?

7 帖子 / 0 new

Please see the section "PACK Function" in the Intel Fortran documentation. The compiler option /assume:realloc_lhs may be of use, as well.

For illustration:

```program tst
integer, parameter :: m=10
integer n
integer :: a(10)=[0,1,0,2,0,3,0,0,4,5]
integer, allocatable :: b(:)
!
n=count(a /= 0)
allocate(b(n))
b=pack(a,a /= 0)
write(*,*)'n=',n
write(*,*)b
end program tst```

I've tried the pack function and abandoned it long ago. It is slower (by far!) than even the simple loop in my original post.

My integer lists are usually a few K elements long.

PACK works well in my tests for target -mmic, but not so much, as David said, for Xeon host.  If someone is interested enough to post an issue report with an example to premier.intel.com, the question could be posed whether it might be made more efficient.

How often do 0's appear in the list?

If infrequent, then consider finding the zones of non-zero and using A[out:outEnd] = A[in:inEnd] in a loop. I will leave this a exercize for you to figure out the positions of the zones, and don't forget you may need a wipe of the residual data. For this to be effective, the copy zones would have to be relatively large (or zeros infrequent).

David, you also have a bug in your first example. You do not null out the place of the variable being copied. IOW your end array may end up with duplicate entries and interspersed residual 0's.

Jim Dempsey

Thanks Jim,

The difficult, low-efficiency part of the task, appears to be recording the locations of the zeros that can be vectorized. Copying the data is easily vectorized.  I can, for example, use the count function on the original list to determine the number of zeros. It is vectorized (so reports the compiler) and the timings (a peak at the assembler generated) supports that. In the loop in my example, having to maintain/update "n" as a counter of the current number of non-zero values and the location in the compacted list, prevents the process of being piplined; vectorized. Apparently.

I don't actually need a compacted list in most cases: I could use a linked-list that points from one non-zero location to the next. Then I wouldn't need another list to hold the compacted data. But I have not been able to devise a way to generate such a linked-list that can be piplined, viectorized. I wished I hadn't lent (and lost!) my copies of Knuth's Art of Computer Programming.  Might be something in there.

David

Why is it you need to squish the 0's out?

Can you not leave them in place, use the 0's in the vectorized computation sections, and then ignore results for those locations with 0's?

Jim Dempsey