How do you take a list of files and do something with them in the UNIX shell? xargs is the key.

If you’ve run into xargs, it’s probably in its simplest form:

xargs rm < list.txt

Here list.txt is a list of whitespace-separated file names. < list.txt is shell-speak for “send the file list.txt to the STDIN of the command”; it has the same effect as cat list.txt | xargs rm, without the overhead of an extra command.

one.txt
two.txt
three.txt
...
nine-thousand-four-two.txt
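
A quick way to convince yourself that the redirect and the cat pipeline behave the same is to swap rm for echo, so nothing gets deleted. This is only a sketch: the scratch directory and the three names written to list.txt are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
tmpdir=$(mktemp -d)
printf '%s\n' one.txt two.txt three.txt > "$tmpdir/list.txt"

# echo stands in for rm: it prints the arguments xargs would pass along.
via_redirect=$(xargs echo < "$tmpdir/list.txt")
via_cat=$(cat "$tmpdir/list.txt" | xargs echo)

echo "$via_redirect"   # one.txt two.txt three.txt
echo "$via_cat"        # one.txt two.txt three.txt

rm -r "$tmpdir"
```

Putting echo in front of the real command is a cheap dry run for any xargs invocation you’re unsure about.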

What the above command line does is break the list up into chunks of 5000 (the default varies from system to system; some versions limit a chunk by total command-line length rather than by count) and pass each chunk to the command rm. The effect is the same as running:

rm one.txt two.txt ... four-thousand-nine-hundred-ninety-nine.txt five-thousand.txt
rm five-thousand-one.txt ... nine-thousand-four-one.txt nine-thousand-four-two.txt

xargs will run the command as many times as it needs to consume the chunks of 5000; the last chunk will be however many arguments are left over. You can change the number of arguments per chunk with -n, e.g. -n 500.
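
To watch the chunking happen without needing thousands of files, a small -n and echo are enough. A minimal sketch, with made-up single-letter arguments:

```shell
# Five arguments in chunks of two: echo runs three times,
# and the last run gets the single leftover argument.
chunks=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$chunks"
# a b
# c d
# e
```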

That’s handy, but not super useful. Much handier is using xargs with a pipe:

find . -user alice -print0 | xargs -0 -n 500 chgrp staff

The find command finds all of the files owned by “alice” (more on find some other time) and passes them to xargs which changes the group ID of the files, in chunks of 500, to be group “staff”. What’s with the -print0 and -0 arguments?

As I noted above, xargs expects arguments to be separated by white space, meaning that:

one.txt
two.txt
three.txt

and:

one.txt two.txt three.txt

both appear as three separate files to xargs. Of course, UNIX-like systems allow spaces in file names; given a file called My Secret Plans, xargs would see three files: My, Secret, and Plans.

The -0 argument tells xargs to instead use a NUL (“\0”) as the separator. find normally prints each file name followed by a newline; -print0 causes it to instead follow each file name with a NUL. The combination of these two arguments allows xargs to handle file names with spaces, tabs, and even newlines in them.
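
The difference is easy to see by counting how many arguments actually reach the command. The sketch below assumes a scratch directory holding a single file named My Secret Plans; echo runs once per argument, so the line count is the argument count.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/My Secret Plans"

# Run find from inside the directory so the only path is "./My Secret Plans".
# tr strips the padding some versions of wc add around the number.
naive=$(cd "$tmpdir" && find . -type f | xargs -n 1 echo | wc -l | tr -d ' ')
safe=$(cd "$tmpdir" && find . -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "whitespace-separated: $naive"   # 3
echo "NUL-separated: $safe"           # 1

rm -r "$tmpdir"
```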

That’s as far as most people get with xargs, but it has a few other tricks up its sleeve.

Before we get to them, it’s important to note that there’s a bit of a schism in xargs versions. The two main variants are the GNU version, which ships with most, if not all, Linux distros, and the BSD version, which ships with the BSDs and with OS X.

First, verbosity: the -t option will print each command line before executing it. If you’re not sure you trust the xargs command, you can take it one step further with -p. In addition to printing the command, it will prompt you before each execution:

ls | xargs -p rm
rm 1.txt 2.txt 3.txt 4.txt?...
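
The -t trace is written to standard error, so it can be captured separately from the command’s own output. A small sketch, again with echo as a harmless stand-in for rm and made-up file names; the exact formatting of the trace line may differ between xargs versions, so no expected trace is shown.

```shell
tmpdir=$(mktemp -d)

# stdout (echo's output) lands in $output; stderr (the -t trace) goes to a file.
output=$(printf '%s\n' 1.txt 2.txt 3.txt | xargs -t echo 2> "$tmpdir/trace.txt")
trace=$(cat "$tmpdir/trace.txt")

echo "trace:  $trace"
echo "output: $output"

rm -r "$tmpdir"
```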

Next, there are positional arguments. Most examples of xargs pass the arguments as the final arguments to the command. However, by using the -I option you can specify a string that will be replaced with each argument wherever it appears in the command. For example, to copy a list of directories and files to “destdir”:

xargs -I % cp -rp % destdir < list-of-things-to-copy.txt
# cp -rp file-one.txt destdir
# cp -rp some-directory destdir
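
To preview what the -I form will run before letting it copy anything, you can put echo in front of cp. A sketch using the two names from the comments above, fed in with printf instead of a file:

```shell
# Each input line replaces % and triggers its own command;
# the leading echo prints that command instead of running it.
planned=$(printf '%s\n' file-one.txt some-directory | xargs -I % echo cp -rp % destdir)
echo "$planned"
# cp -rp file-one.txt destdir
# cp -rp some-directory destdir
```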

This form will run one command per argument, as if you had specified -n 1. The BSD version also has the -J option that works like -I but replaces the string with the whole chunk of arguments. The GNU version lacks this option.

xargs -J % cp -rp % destdir < list-of-things-to-copy.txt
# cp -rp file-one.txt some-directory destdir
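
Where -J isn’t available, one common workaround is to let xargs hand the whole chunk to a tiny sh -c script, which can then place "$@" wherever it’s needed. This is a general shell idiom rather than an xargs option, and the sketch below keeps echo in front of cp so it only prints the command it would run:

```shell
# xargs appends the chunk after the trailing "sh", which becomes $0;
# the arguments themselves arrive as "$@" inside the script.
planned=$(printf '%s\n' file-one.txt some-directory |
  xargs sh -c 'echo cp -rp "$@" destdir' sh)
echo "$planned"
# cp -rp file-one.txt some-directory destdir
```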

Instead, the GNU version has a (deprecated) -i option which is the equivalent of -I {}:

xargs -i cp -rp {} destdir < list-of-things-to-copy.txt
# cp -rp file-one.txt destdir
# cp -rp some-directory destdir

Finally, there’s parallel mode. xargs’ default behavior is to run one process at a time, one per chunk of data, waiting for each command to complete before continuing. -P 5 will cause it to run up to 5 processes at a time. Some versions of xargs allow -P 0, which means “run as many processes as possible”.
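
A rough way to see -P at work is to time a few sleeps. This sketch assumes nothing beyond sleep and date; the four 1-second sleeps are just a stand-in for real work.

```shell
# Four 1-second sleeps, one argument per process, up to four at a time.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep
end=$(date +%s)

elapsed=$((end - start))
# In parallel this takes about a second; without -P it would take about four.
echo "elapsed: ${elapsed}s"
```

Keep in mind that the output of parallel commands can interleave, so -P suits jobs whose output order doesn’t matter.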

It’s important to note that while in most examples it’s being passed a list of files, xargs doesn’t care; arguments are arguments. For example, to create a list of new users as members of the developers group:

sudo xargs -n 1 useradd -G developers < list-of-users.txt

So, next time you need to process a big list of arguments, reach for xargs. You can find a little more on the history and usage of xargs on Wikipedia.
