# Efficient ways to count setted bits in bytes/words?

## Efficient ways to count setted bits in bytes/words?

I read the IA-32/64 developer's handbook, so far there isnt anything even at the machine code level, that can count the total numbers of setted bits for a given bytes or word without using loops, am I right about this?

10 posts / 0 new
For more complete information about compiler optimizations, see our Optimization Notice.

>>I read the IA-32/64 developer's handbook, so far there isnt anything even at the machine code level, that can count the total
>>numbers of setted bits for a given bytes or word without using loops, am I right about this?

If you need a very fast algorithm to count the total number of bits which are set to 1 in a byte or in a word two look-up tables have to be used:

- 256 elemets for bytes
- 65536 elements for words ( I assume that sizeof( word ) is 2 bytes )

You can try using popcnt instruction, if supported by your target CPU.

Quote:

Sergey Kostrov wrote:

>>I read the IA-32/64 developer's handbook, so far there isnt anything even at the machine code level, that can count the total
>>numbers of setted bits for a given bytes or word without using loops, am I right about this?

If you need a very fast algorithm to count the total number of bits which are set to 1 in a byte or in a word two look-up tables have to be used:

- 256 elemets for bytes
- 65536 elements for words ( I assume that sizeof( word ) is 2 bytes )

Very interesting way to do bit-wise operation, I will definitly give it a try, btw, have you benchmark this algorthim against traditional loop test?

>>...Very interesting way to do bit-wise operation, I will definitly give it a try, btw, have you benchmark this algorthim against
>>traditional loop test?

No. It will be faster because only a couple of CPU instructions will be needed ( after translation a C code to assembler ). Here is an example:
...
__int8 iLUTNumOfBits[ 256 ] =
{
/*
0, 1, 2, 3, 4, 5, 6, ... , 254, 255
*/
0, 1, 2, 3, 1, 2, 3, ... , 7, 8
};
...
byte byVariable = 7;
...
int iNumOfBits = iLUTNumOfBits[ byVariable ]; // ( 7 - 1 ) = 6 ( our index value ) -> 3 ( bits ) from LUT -> 00000111 = 0x07
...
Note: Take into account that just one Look-Up Table could be used.

>>...You can try using popcnt instruction, if supported by your target CPU...

There are two intrinsic functions if you don't want to use assembler:

POPCNT: int _mm_popcnt_u32( unsigned int a );
POPCNT: int64_t _mm_popcnt_u64( unsigned __int64 a );

Also, for bytes XLAT/XLATB Table Look-up Translation instruction.could be used and it is supported on all Intel CPUs.

If you do this for words, you might also use a Look-up Table for one byte with 256 entries.

Then you go in the table with every byte of the word and sum the values up.

I am not sure, but in my opinion it might be faster than a larger table. The small table might be kept in the changes very near to the CPU. And selecting bytes of a word is a simple AND operation.

But this is only an assumption.

>>...Then you go in the table with every byte of the word and sum the values up.

This is actually a good idea and you need to decide which way of counting is the best for your task.

Some more ways: en.wikipedia.org/wiki/Hamming_weight

It is easy to modify code shown there for 8,16,32 bit variables in a traditional c language and for 128, 256 bits using sse/avx registers/operations.

Which way (_mm_popcnt_uXX, byte/word-table lookup, simple arithmetics, arithmetics with multiple, etc) is faster (and, in general, is available) depends on your application and target system(s).

>>Which way (_mm_popcnt_uXX, byte/word-table lookup, simple arithmetics, arithmetics with multiple, etc) is faster (and, in general,
>>is available) depends on your application and target system(s).

This is absolutely right point of view. I regret to see that XLAT/XLATB instructions are almost forgotten. By the way, in the article about Hamming Weight there are two very interesting statements:

>>...
>>With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64 bit integer.
>>...

It looks like overcompilation of a really simple thing and the author didn't think about practical applications of that method.

and

>>...
>>In C++ STL, the bit-array data structure bitset has a count() method that counts the number of bits that are set.
>>...

I tested it some time ago and it is very slow when compared to already mentioned methods.

## Leave a Comment

Please sign in to add a comment. Not a member?