Duplicating streams in *nix and pointful shell programming

September 23, 2014

Ever needed to print info from the middle of a pipeline and keep using that info? tee is exactly what you need!

% cat /etc/fstab | tac | tee /dev/tty | grep /dev/ | md5sum
#  ^                ^     ^              ^ use the stream just printed
#  |                |     └-> print and continue piping
#  |                └-> manipulate it a bit
#  └->  do stuff

But you do not need to limit yourself to directing one of tee’s outputs to a file: send them both to programs!

% mkfifo /tmp/f
% cat /etc/fstab | tac | tee /tmp/f | grep /dev/ & md5sum /tmp/f
# run md5sum alongside the pipeline: tee blocks on the FIFO until a reader opens it

So ugly! Thankfully, we’ve got these cool shell redirections!

% cat /etc/fstab | tac | tee >(md5sum >&2) | grep /dev/
#                                     ^ md5sum inherits tee's stdout (the pipe into grep), so print the hash on stderr

The >() and <() forms (process substitution) should work in bash and zsh. The direction of the comparison sign denotes the direction of data flow.
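To make the two directions concrete, here is a small sketch, assuming bash (zsh should behave the same way). The variable names read_out and tee_out are only there for illustration.

```shell
#!/usr/bin/env bash
# <(cmd) expands to a path you can read from: data flows OUT of cmd.
read_out=$(cat <(echo "from a command"))
echo "$read_out"

# >(cmd) expands to a path you can write to: data flows INTO cmd.
# The substituted tr inherits tee's stdout, so its upper-cased copy
# lands in the same pipe as tee's own copy. Sorting in the C locale
# makes the interleaved result deterministic for this demo.
tee_out=$(printf 'one\ntwo\n' | tee >(tr a-z A-Z) | LC_ALL=C sort)
echo "$tee_out"
```

Note that the command inside >() writes wherever tee's stdout points, so redirect it (for example with >&2) when you do not want it mixed into the rest of the pipeline.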

Pointful shell programming

Usually the common points of interaction are std{in,out,err}, files named by command line arguments, or some hardcoded filename. That means that if a program needs more than 1 input or 2 outputs, you have to create named files for it to reach. When the content is generated by other programs, that’s a hassle, much as it would be in a programming language where you had to make a variable for every intermediate stage of the application: for every f (g x) you’d have to write a = g x; f a. Time spent writing that is wasted. This only happens when std{in,out,err} are not enough, but it still happens.
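The difference between a = g x; f a and f (g x) can be sketched in shell like this. It assumes bash and uses comm -12 (print lines common to two sorted inputs) purely as an example of a program that wants two input files; the variable names are made up for the demo.

```shell
#!/usr/bin/env bash
# "a = g x; f a": every intermediate result needs a named file.
a=$(mktemp)
b=$(mktemp)
printf '3\n1\n2\n' | sort > "$a"
printf '2\n3\n4\n' | sort > "$b"
with_files=$(comm -12 "$a" "$b")
rm -f "$a" "$b"

# "f (g x)": the intermediate results are plugged in directly.
with_subst=$(comm -12 <(printf '3\n1\n2\n' | sort) <(printf '2\n3\n4\n' | sort))

echo "$with_files"
echo "$with_subst"
```

Both variants print the same two lines (2 and 3); the second one just never touches the filesystem.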

So the std* points of interaction are actually not that special, they’re just conventional. You can make more, the same way you make any new file descriptor: with open(2). The std* ones are easy to use because they’re always there, shells make it easy to operate on them, and you can magically access them through /dev/stdin, /dev/stdout, /dev/stderr. Just kidding, they’re not magical!
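In a shell you rarely call open yourself; the exec builtin combined with a redirection does it for you. A minimal sketch, assuming a POSIX-like shell. The descriptor numbers 3 and 4 and the scratch file are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)               # scratch file, only for this demo

exec 3>"$tmp"               # open fd 3 for writing
echo "hello via fd 3" >&3   # write to it like you would to stdout
exec 3>&-                   # close fd 3

exec 4<"$tmp"               # open fd 4 for reading
read -r line <&4            # read from it like you would from stdin
exec 4<&-                   # close fd 4

rm -f "$tmp"
echo "$line"
```

Nothing about 3 and 4 is different from 0, 1 and 2, except that no program expects them to be there by convention.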

% ls -l /dev/std{in,out,err}
lrwxrwxrwx 1 root root 15 Sep 18 15:11 /dev/stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root 15 Sep 18 15:11 /dev/stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root 15 Sep 18 15:11 /dev/stdout -> /proc/self/fd/1

And if the shell makes yet another pipe(2), it can supply you a path to it, just like with the ones above.

% echo <(echo)
/proc/self/fd/11
% echo >(echo)
/proc/self/fd/12

What that allows you to do is write shell code whose parts define points of interaction between themselves using just virtual files, with no need to clutter the fs with permanent ones. I guess Haskell users would call this code pointful. It is also much easier to think about code if it doesn’t change the state of anything, like the state of your hard drive through the creation of files.

Finally, it makes me sorta feel like an electrician. Imagine, if you will, that a program is a ball with plugs, hanging in space. Instantiate some of these balls and plug them together with wires in their own isolated universe. Wrap them in another ball, define some plugs that come out of it, and you’ve just mind-soldered yourself a new program. I guess both electricians and we need to make interfaces.
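One way to read the ball-with-plugs picture in shell terms is a function: the inner programs are wired together inside it, and its plugs are stdin, stdout and any paths it takes as arguments. A rough sketch, assuming bash; new_entries is a hypothetical helper invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical "ball" made of two smaller ones (sort and grep).
# Plugs: stdin  = candidate lines
#        $1     = path to a list of already-known lines
#        stdout = unique candidates that are not in that list
new_entries() {
    sort -u | grep -vxFf "$1"
}

# Plug a generated stream into the extra input, no file needed.
result=$(printf 'c\na\nb\na\n' | new_entries <(printf 'b\n'))
echo "$result"
```

The caller decides what gets plugged into $1: a real file, a FIFO, or a process substitution all look the same from inside the function.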

❦