Filter Definition

A filter is a small and (usually) specialized program in Unix-like operating systems that transforms plain text (i.e., human-readable) data in some meaningful way. Filters can be used together with other filters and pipes to form a series of operations that produces highly specific results.

As is generally the case with command line (i.e., all-text mode) programs in Unix-like operating systems, filters read data from standard input and write to standard output. Standard input is the source of data for a program, and by default it is text typed in at the keyboard. However, it can be redirected to come from a file or from the output of another program. Standard output is the destination of output from a program, and by default it is the display screen. This means that if the output of a command is not redirected to a file or another device (such as a printer) or piped to another filter for further processing, it will be sent to the monitor, where it will be displayed.
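The redirections described above can be sketched with the sort filter. This is only an illustration: the scratch directory and the file names fruit.txt and sorted.txt are assumptions made for the example, not part of any standard setup.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/fruit.txt"   # hypothetical sample file

# Standard input redirected from a file; standard output goes to the screen.
sort < "$workdir/fruit.txt"

# Standard output redirected to a file instead of the screen.
sort < "$workdir/fruit.txt" > "$workdir/sorted.txt"

# Standard input supplied by the output of another program through a pipe.
first=$(printf 'pear\napple\nfig\n' | sort | head -n 1)
echo "$first"
```

In each case sort itself is unchanged; only the source of its input and the destination of its output differ.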

Numerous filters are included on Unix-like systems, a few of which are awk, cat, comm, csplit, cut, diff, expand, fold, grep, head, join, less, more, paste, sed, sort, spell, tail, tr, unexpand, uniq and wc.

It is a simple matter to construct a pipeline of commands with a highly specific function by stringing multiple filters together with pipes. As a trivial example, the following would display the last three entries, in alphabetical order, in the directory /sbin (which contains basic programs used for system maintenance or administrative tasks) whose names contain the string (i.e., sequence of characters) mk:

ls /sbin | grep mk | sort -r | head -3

The ls command lists the contents of /sbin and pipes its output (using the pipe operator, which is designated by the vertical bar character) to the filter grep, which passes along only those lines (in this case, the names of files and directories) that contain the letter sequence mk. grep then pipes its output to the sort filter, which, with its -r option, sorts it in reverse alphabetical order. sort, in turn, pipes its output to the head filter. The default behavior of head is to write the first ten lines of a file or of the output from another command, but the -3 option here (a traditional shorthand for -n 3) tells it to write only the first three. head thus writes the first three results from sort (i.e., the last three filenames or directory names from /sbin that contain the string mk) to the display screen.
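Because the contents of /sbin differ from system to system, the stages of this pipeline are easier to follow on a known set of names. The sketch below builds a scratch directory of empty files with made-up names (they are assumptions for the example, not a claim about what any particular /sbin contains) and runs the pipeline one stage at a time. It uses head -n 3, the long form of the same option.

```shell
# Scratch directory with hypothetical file names standing in for /sbin.
demo=$(mktemp -d)
touch "$demo/fsck" "$demo/mkdosfs" "$demo/mkfs" \
      "$demo/mkhomedir" "$demo/mkswap" "$demo/reboot"

# Stage 1 and 2: list the names, keep only those containing "mk".
ls "$demo" | grep mk

# Stage 3: the same names in reverse alphabetical order.
ls "$demo" | grep mk | sort -r

# Stage 4: only the first three lines of the reversed list.
result=$(ls "$demo" | grep mk | sort -r | head -n 3)
echo "$result"
```

With these names the final stage prints mkswap, mkhomedir and mkfs, one per line; mkdosfs is dropped because it comes last in the reversed list.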

cat is one of the most frequently used commands on Unix-like systems. It is best known for its ability to display the contents of files rather than for its ability to transform them, and thus it might not initially appear to fall into the category of a filter. However, it has the two additional (and not unrelated) functions of creating files and concatenating (i.e., combining) copies of them, which clearly makes it a filter.

In the next example, cat combines copies of file1, file2 and file3, and the combined output is piped to wc, a filter that by default writes the number of lines, words and bytes in its input to the display screen:

cat file1 file2 file3 | wc
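To try this without existing files, the sketch below first creates three small files in a scratch directory. Their contents are assumptions chosen so that the counts are easy to check by hand.

```shell
# Three hypothetical input files in a scratch directory.
tmp=$(mktemp -d)
printf 'one two\n'       > "$tmp/file1"
printf 'three\n'         > "$tmp/file2"
printf 'four five six\n' > "$tmp/file3"

# Default behavior: line, word and byte counts for the combined input.
cat "$tmp/file1" "$tmp/file2" "$tmp/file3" | wc

# The -l, -w and -c options select a single count each.
# tr removes the padding that some versions of wc add around the number.
lines=$(cat "$tmp/file1" "$tmp/file2" "$tmp/file3" | wc -l | tr -d ' ')
words=$(cat "$tmp/file1" "$tmp/file2" "$tmp/file3" | wc -w | tr -d ' ')
bytes=$(cat "$tmp/file1" "$tmp/file2" "$tmp/file3" | wc -c | tr -d ' ')
echo "$lines lines, $words words, $bytes bytes"
```

The counts describe the concatenated stream as a whole: wc sees one input and has no knowledge of where one file ended and the next began.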

Although most filters are highly specialized programs with a very limited range of functions, there are a few exceptions. Most notable among them is awk, a pattern matching program that has evolved into a powerful and full-fledged programming language.
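As a small illustration of awk going beyond simple pattern matching, the following sketch totals the second column of its input for each name in the first column. The input data are invented for the example.

```shell
# Sum the numbers in column 2, grouped by the name in column 1.
# awk does not guarantee the order of "for (name in total)",
# so the result is piped through sort to make it predictable.
totals=$(printf 'alice 3\nbob 5\nalice 4\n' |
    awk '{ total[$1] += $2 }
         END { for (name in total) print name, total[name] }' |
    sort)
echo "$totals"
```

Even here awk behaves as a filter: it reads standard input and writes standard output, so it fits into a pipeline like any other.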

An important tenet of the Unix philosophy has been to try to develop every program (at least every command line program) so that it is a filter rather than just a stand-alone program.
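In practice, writing a program as a filter means having it read standard input and write standard output, so that it can be placed anywhere in a pipeline. A minimal sketch, using a hypothetical shell function named number_lines:

```shell
# Hypothetical filter: prefixes each line of standard input with its
# line number and writes the result to standard output.
number_lines() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
        printf '%d: %s\n' "$n" "$line"
    done
}

# Because it touches only standard input and standard output, it
# combines with existing filters without any special arrangement.
numbered=$(printf 'pear\napple\n' | sort | number_lines)
echo "$numbered"
```

The function never opens a file or names a device itself; where its data come from and where they go is decided entirely by the command line that invokes it.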

Created June 15, 2004. Updated May 21, 2005.
Copyright © 2005 The Linux Information Project. All Rights Reserved.