There are plenty of bit-wise operations you can do to make your codes faster and sometimes cleaner. Bitwise optimization are sometimes in order of magnitude faster than conventional programs.

The simplest way to explain this is that a job done in parallel is much faster and depends upon the number of entities working on the job. In our computing world we have computers with different architectures. Some are 16 bit, some other are 32 or 64 bit. So if we have a register size of 32 bits, then it can manipulate 32 bits in 1 instruction cycle. Some of the operations that can be done in 1 instruction cycle are +, -, |, &, ^ ...etc.

I assume you have understanding of the bit-wise operators.

~A

-A

A |= (1 << i)

A &= ~(1 << i)

A = A^B

B = A^B

A = A^B

if A & (A-1) == 0 implies A is a power of 2.

least significant bit = A & -A

A -= A & -A

This technique can also be used to count the number of set bits of A.

__builtin_popcount(A): This is a built-in function in C and C++ and does its job.

Lets learn some cool things you can do with bit-wise operations.

Suppose we have a set A = { 1,2,3,4,5 };

We would like to print the power set of A.

let N be the size of the set. Size of power set = pow(2, N)

In C, C++,

int A[] = {1,2,3,4,5};

int N = 5;

int Total = 1 << N;

for ( int i = 0; i < Total; i++ )

{

for ( int j = 0; j < N; j++)

if ( (i >> j) & 1 )

cout << A[j];

cout << endl;

}

In fact this is a very widely used technique. Also it is much faster than the recursive version.

This technique could be used to solve the subset sum problem also if the size of the set is small (less than 50).

A brute force of 2 ^ 50 would take ages to complete.

Hence the trick is to divide the set into 2 sets of almost equal size. 2 ^ 25 = 33554432 which is a small number.

Generate all the power sets of one of the sets. Compute the sum of numbers in each set. Sort and store them.

Generate all the power sets of another set. Compute the sum of numbers in each set. For each sum, do a binary search for the number ( K-sum ) in the sorted array.

http://p--np.blogspot.com/2011/04/binary-indexed-tree.html

Suppose we have 'n' boys and 'm' girls. Some boys don't like some girls and vice-versa.

Lets say we have a boolean matrix valid[i][j] which says if ith boy can be paired with jth girl.

We need to know how many ways can 'k' pairs be chosen for a dance competition. 1 <= n,m,k <=10, and k <= min (m , n)

This is a little difficult to explain in layman's words. Nevertheless I will try here.

Let 'mask' be a bitwise array. Each set bit in the mask represents which girls have already been paired with some boys (we dont know who but we are sure that she is paired).

Now read this line very carefully.

This means number of set bits of mask, __builtin_popcount(mask) <= i.

Also __builtin_popcount(mask) = number of pairs selected so far.

memset( dp, 0, sizeof dp );

dp[0][0] = 1;

for (int i = 0; i < n; i++ )

{

for ( int mask = 0; mask < (1 << m); mask++ )

{

dp[i+1][mask] += dp[i][mask];

for( int t = 0; t < m; t++)

if ( (mask >> t) & 1 == 0 && valid[i][t] )

dp[i+1][mask | (1 << t)] += dp[i][mask];

}

}

long long ans = 0;

for ( int mask = 0; mask < (1 << m); mask++ )

if ( __builtin_popcount(mask) == k )

ans += dp[n][mask];

Variable 'ans' contains the answer to the problem.

The complexity of the whole algorithm is n * O( 2^m ).

dp[0][0] = 1; //1 way to choose no boys and no girls.

for each set of girls that have been already paired we can do one of the following:

Now finally all we need to do is count dp[n][mask] for all those mask whose number of set bits is 'k'.

The space complexity is n * O( 2^m ).

With a small observation it can be reduced to O( 2^m ).

That's for homework :)

**So you might ask why is bit-wise faster??**The simplest way to explain this is that a job done in parallel is much faster and depends upon the number of entities working on the job. In our computing world we have computers with different architectures. Some are 16 bit, some other are 32 or 64 bit. So if we have a register size of 32 bits, then it can manipulate 32 bits in 1 instruction cycle. Some of the operations that can be done in 1 instruction cycle are +, -, |, &, ^ ...etc.

**Basics**I assume you have understanding of the bit-wise operators.

__1's Complement of A__~A

__2's Complement of A__-A

__Set ith bit of A__A |= (1 << i)

__Unset ith bit of A__A &= ~(1 << i)

__Swapping A and B numbers using bit-wise operations__A = A^B

B = A^B

A = A^B

__Checking if A is a power of 2__if A & (A-1) == 0 implies A is a power of 2.

__Extracting least significant bit of A__least significant bit = A & -A

__Removing least significant bit of A__A -= A & -A

This technique can also be used to count the number of set bits of A.

__Test ith bit of A__- A & (1 << i) //this can cause problems if i = 32 as it would be and integer overflow. use
**A & (1LL << i)** - (A>>i) & 1 //this is recommended

__Count the number of bits of A____builtin_popcount(A): This is a built-in function in C and C++ and does its job.

**Advanced**Lets learn some cool things you can do with bit-wise operations.

__Generating Power Set__Suppose we have a set A = { 1,2,3,4,5 };

We would like to print the power set of A.

let N be the size of the set. Size of power set = pow(2, N)

In C, C++,

**(1 << N)**is equivalent to computing pow(2, N).int A[] = {1,2,3,4,5};

int N = 5;

int Total = 1 << N;

for ( int i = 0; i < Total; i++ )

{

for ( int j = 0; j < N; j++)

if ( (i >> j) & 1 )

cout << A[j];

cout << endl;

}

In fact this is a very widely used technique. Also it is much faster than the recursive version.

This technique could be used to solve the subset sum problem also if the size of the set is small (less than 50).

A brute force of 2 ^ 50 would take ages to complete.

Hence the trick is to divide the set into 2 sets of almost equal size. 2 ^ 25 = 33554432 which is a small number.

Generate all the power sets of one of the sets. Compute the sum of numbers in each set. Sort and store them.

Generate all the power sets of another set. Compute the sum of numbers in each set. For each sum, do a binary search for the number ( K-sum ) in the sorted array.

__Binary Indexed Tree__http://p--np.blogspot.com/2011/04/binary-indexed-tree.html

__Some Dynamic Programming applications__Suppose we have 'n' boys and 'm' girls. Some boys don't like some girls and vice-versa.

Lets say we have a boolean matrix valid[i][j] which says if ith boy can be paired with jth girl.

We need to know how many ways can 'k' pairs be chosen for a dance competition. 1 <= n,m,k <=10, and k <= min (m , n)

This is a little difficult to explain in layman's words. Nevertheless I will try here.

Let 'mask' be a bitwise array. Each set bit in the mask represents which girls have already been paired with some boys (we dont know who but we are sure that she is paired).

Now read this line very carefully.

**dp[i][mask]**represents number of ways of making pairs using some boys numbered from 1 to i and some girls represented by set bits of mask.This means number of set bits of mask, __builtin_popcount(mask) <= i.

Also __builtin_popcount(mask) = number of pairs selected so far.

memset( dp, 0, sizeof dp );

dp[0][0] = 1;

for (int i = 0; i < n; i++ )

{

for ( int mask = 0; mask < (1 << m); mask++ )

{

dp[i+1][mask] += dp[i][mask];

for( int t = 0; t < m; t++)

if ( (mask >> t) & 1 == 0 && valid[i][t] )

dp[i+1][mask | (1 << t)] += dp[i][mask];

}

}

long long ans = 0;

for ( int mask = 0; mask < (1 << m); mask++ )

if ( __builtin_popcount(mask) == k )

ans += dp[n][mask];

Variable 'ans' contains the answer to the problem.

The complexity of the whole algorithm is n * O( 2^m ).

dp[0][0] = 1; //1 way to choose no boys and no girls.

for each set of girls that have been already paired we can do one of the following:

- The current boy can be paired with a girl that is not paired. [line 8 to 10]
- The current boy may not be paired with any girl. [line 7]

Now finally all we need to do is count dp[n][mask] for all those mask whose number of set bits is 'k'.

The space complexity is n * O( 2^m ).

With a small observation it can be reduced to O( 2^m ).

That's for homework :)