Parallelize repetitive tasks on a list of files
Many repetitive jobs can be performed more efficiently if you utilize more of your computer's resources (i.e. CPUs and RAM). Below is an example of running multiple jobs in parallel.
Suppose you have a list of files, say the output from ls. Suppose also that these files are bz2-compressed and the following tasks need to be performed on them, in order:
- Decompress the bz2 files to stdout using bzcat
- Filter (grep) lines containing specific keyword(s) using grep <some keyword>
- Pipe the output to be concatenated into one single gzipped file using gzip
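As a sketch, the three steps above chain together into one pipeline for a single file. The file name input.bz2 and the sample contents here are made up for illustration:

```shell
# create a sample bz2-compressed file (for demonstration only)
printf 'I love puppies\nno match here\npuppies again\n' | bzip2 > input.bz2

# decompress -> filter -> compress into a single gzipped file
bzcat input.bz2 | grep puppies | gzip > output.gz

# inspect the result; shows the two matching lines
zcat output.gz
```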
Running this using a while-loop may look like this:

filenames="file_list.txt"
while read -r line
do
    ## grab lines with puppies in them
    bzcat "$line" | grep puppies | gzip >> output.gz
done < "$filenames"
Using GNU Parallel, we can run 3 parallel jobs at once by simply doing:

parallel -j 3 "bzcat {} | grep puppies" ::: $( cat file_list.txt ) | gzip > output.gz
This command is simple, concise, and more efficient when the number of files and the file sizes are large. The jobs are initiated by parallel: the option -j 3 runs up to 3 jobs at a time, and the input arguments for the jobs are supplied after :::. The combined output is eventually piped to gzip > output.gz.