Tuesday, June 17, 2014

Ubuntu - GNU parallel - It's awesome

GNU parallel is a shell tool for executing jobs in parallel on one or more machines. If you have used xargs in shell scripts, GNU parallel will be easy to learn, because it is written to have the same options as xargs. If you write loops in shell, GNU parallel can often replace them and make them faster by running several jobs at once.
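As a rough sketch of the loop replacement described above, the two functions below do the same work: one with a plain shell loop, one with GNU parallel. The function names count_lines_loop and count_lines_parallel are made up for this illustration, and the sketch assumes GNU parallel is the parallel found on your PATH.

```shell
# Hypothetical helper: count lines one file at a time with a plain loop.
count_lines_loop() {
    for f in "$@"; do
        wc -l "$f"
    done
}

# Hypothetical helper: the same work handed to GNU parallel.
# Everything after ::: is treated as the list of arguments,
# and parallel runs one "wc -l" job per argument.
count_lines_parallel() {
    parallel wc -l ::: "$@"
}

# Usage (after sourcing this snippet):
#   count_lines_loop *.log
#   count_lines_parallel *.log
```

The parallel version may print its results in a different order, because jobs finish at different times. If the order matters, the -k (keep order) option of GNU parallel should keep the output in input order.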

To install the package on Ubuntu:

sudo apt-get install parallel

Here is an example of how to use GNU parallel.

Suppose you have a directory of large log files and you need to count the lines in each file to find the file with the most lines. GNU parallel can do this efficiently, because by default it runs as many jobs at once as the server has CPU cores.

Here the heaviest operation is counting the lines of each file. Instead of doing that sequentially, one file after another, we can do it in parallel with GNU parallel.

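Sequential way

ls | xargs -n 1 wc -l | sort -n -r | head -n 1

(The -n 1 makes xargs run wc once per file. Without it, wc receives all the files at once and also prints a "total" line, which would sort to the top.)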

Parallel way

ls | parallel wc -l | sort -n -r | head -n 1
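The ls-based pipeline breaks on file names that contain spaces, and it also hands directories to wc. A slightly more careful sketch, assuming your find supports -maxdepth and -print0 and that GNU parallel is installed, is below. The function name largest_by_lines is invented for this example.

```shell
# Hypothetical helper: print the line count and name of the file
# with the most lines in a directory (default: current directory).
largest_by_lines() {
    dir=${1:-.}
    # -type f skips directories; -print0 and -0 pass file names
    # separated by NUL bytes, so spaces in names are safe.
    find "$dir" -maxdepth 1 -type f -print0 |
        parallel -0 wc -l |
        sort -n -r |
        head -n 1
}

# Usage (after sourcing this snippet):
#   largest_by_lines /var/log
```

If you do not want to use every core, the -j option limits the number of simultaneous jobs, for example parallel -j 4 -0 wc -l.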

This is only one example; you can speed up many other operations the same way with GNU parallel. :)