# 05-D.7.2: Handling Text Files - sort/diff Commands

## The sort Command

The sort command sorts a file, arranging its records (lines) in a particular order. By default, sort compares lines as text, using the character ordering of the current locale (plain ASCII order in the C locale). With the appropriate options, it can also sort numerically.

Some features of the command are as follows:

• sort sorts the contents of text files line by line.
• It is a standard command-line program that prints the lines of its input, or the concatenation of all files listed as arguments, in sorted order.
• It supports sorting alphabetically, in reverse order, by number, and by month, and it can also remove duplicate lines.
• It can sort by items that are not at the beginning of the line, ignore case, and report whether a file is already sorted. Sorting is based on one or more sort keys extracted from each line of input.
• By default, the entire line is taken as the sort key, and blank space is the default field separator.
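
Two of the features listed above, removing duplicates and checking whether a file is sorted, can be sketched as follows. This is a minimal illustration, not part of the original example: the file name `dupes` and its contents are hypothetical, and `LC_ALL=C` is set only to make the ordering predictable.

```shell
#!/bin/sh
# Hypothetical sample data: a short list with one repeated entry.
workdir=$(mktemp -d)
printf '%s\n' Texas Ohio Texas Alabama > "$workdir/dupes"

# -u sorts the input and keeps only one copy of each duplicate line.
unique=$(LC_ALL=C sort -u "$workdir/dupes")
printf '%s\n' "$unique"

# -c only checks the order; it sorts nothing.
# Exit status 0 means the file is already sorted.
if LC_ALL=C sort -c "$workdir/dupes" 2>/dev/null; then
    sorted=yes
else
    sorted=no
fi
echo "dupes already sorted: $sorted"

rm -rf "$workdir"
```

Here `sort -u` prints Alabama, Ohio, and Texas once each, and the check reports that the unsorted input file is not in order.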

Syntax:

sort [OPTION]... [FILE]...


Command Options:

| Option | Meaning |
| --- | --- |
| -d, --dictionary-order | consider only blanks and alphanumeric characters |
| -f, --ignore-case | fold lower case to upper case characters |
| -g, --general-numeric-sort | compare according to general numerical value |
| -i, --ignore-nonprinting | consider only printable characters |
| -M, --month-sort | compare (unknown) < 'JAN' < ... < 'DEC' |
| -h, --human-numeric-sort | compare human readable numbers (e.g., 2K 1G) |
| -n, --numeric-sort | compare according to string numerical value |
| -R, --random-sort | shuffle, but group identical keys. See shuf(1) |
| --random-source=FILE | get random bytes from FILE |
| -r, --reverse | reverse the result of comparisons |
| --sort=WORD | sort according to WORD: general-numeric -g, human-numeric -h, month -M, numeric -n, random -R, version -V |
| -V, --version-sort | natural sort of (version) numbers within text |
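
A short sketch of how a few of these options change the result. The files `numbers` and `words` and their contents are made up for illustration, `LC_ALL=C` is set so the text ordering is plain ASCII, and `paste` is used only to print each result on one line.

```shell
#!/bin/sh
# Hypothetical sample data, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 10 9 100 2 > "$workdir/numbers"
printf '%s\n' cherry Banana apple > "$workdir/words"

# The default comparison is textual, so "10" and "100" come before "2".
as_text=$(LC_ALL=C sort "$workdir/numbers" | paste -s -d ' ' -)
# -n compares by numeric value instead.
as_numbers=$(LC_ALL=C sort -n "$workdir/numbers" | paste -s -d ' ' -)
# -r reverses whichever comparison is in effect.
descending=$(LC_ALL=C sort -n -r "$workdir/numbers" | paste -s -d ' ' -)

# In the C locale, uppercase letters sort before lowercase ones;
# -f folds case so that "apple" comes before "Banana".
case_sensitive=$(LC_ALL=C sort "$workdir/words" | paste -s -d ' ' -)
case_folded=$(LC_ALL=C sort -f "$workdir/words" | paste -s -d ' ' -)

echo "text order:     $as_text"         # 10 100 2 9
echo "numeric order:  $as_numbers"      # 2 9 10 100
echo "reverse order:  $descending"      # 100 10 9 2
echo "case sensitive: $case_sensitive"  # Banana apple cherry
echo "case folded:    $case_folded"     # apple Banana cherry

rm -rf "$workdir"
```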

The sort command is another command with an abundance of options; only a few are shown in the table above. The example below is a very straightforward sort. The cat command displays the states file, which lists state names in no particular order. The sort command then outputs the list of states in alphabetical order. NOTE: the original file, states, is not altered at all. The sorted list is simply written to the terminal.

pbmac@pbmac-server $ cat states
California
New York
Florida
Texas
North Carolina
Alabama
South Dakota
Washington
Georgia
Ohio
pbmac@pbmac-server $ sort states
Alabama
California
Florida
Georgia
New York
North Carolina
Ohio
South Dakota
Texas
Washington


With its many options, sort can order lines alphabetically or numerically, in ascending or reverse order. For columnar data, it can sort on any one of the columns (selected with -k), using any single character as the column delimiter (specified with -t).

This flexibility makes sort useful for far more than simple alphabetical lists.
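
Sorting columnar data might look like the sketch below. The file `scores`, its colon-delimited layout, and its values are all hypothetical; `-t` sets the delimiter and `-k` selects the column to sort on.

```shell
#!/bin/sh
# Hypothetical colon-delimited records of the form name:score.
workdir=$(mktemp -d)
printf '%s\n' 'carol:72' 'alice:9' 'bob:105' > "$workdir/scores"

# -t ':' makes the colon the column delimiter.
# -k 1,1 sorts on the first column only (as text).
by_name=$(LC_ALL=C sort -t ':' -k 1,1 "$workdir/scores" | paste -s -d ' ' -)

# -k 2,2n sorts on the second column only, compared numerically.
by_score=$(LC_ALL=C sort -t ':' -k 2,2n "$workdir/scores" | paste -s -d ' ' -)

echo "by name:  $by_name"    # alice:9 bob:105 carol:72
echo "by score: $by_score"   # alice:9 carol:72 bob:105

rm -rf "$workdir"
```

Without the `n` modifier on the second key, the scores would be compared as text and 105 would sort before 72.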

## The diff Command

The diff command displays the differences between two files by comparing them line by line. Unlike the related commands cmp and comm, it tells us which lines in one file have to be changed to make the two files identical.

The important thing to remember is that diff describes those changes with special symbols and line numbers. Its output is a set of instructions for changing the first file to make it match the second file.

Special symbols are:

a : add
c : change
d : delete

Syntax:

diff [ OPTIONS ] File1 File2


Command Options:

| Option | Meaning |
| --- | --- |
| -b | Ignore changes in the amount of white space. |
| -c | Display the differences with three lines of context. |
| -i | Ignore case differences. |
| -t | Expand tab characters to spaces in output lines. |
| -u | Output results in unified format, which is more compact than context format. |
| -w | Ignore all white space, including tabs. |
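
The sketch below illustrates how -i, -b, and -w affect whether two files count as different, and prints one unified-format diff. All file names and contents are hypothetical, and the helper function `compare` is introduced here only for illustration; it relies on diff returning exit status 0 when no differences are found.

```shell
#!/bin/sh
# Hypothetical one-line files that differ only in case or spacing.
workdir=$(mktemp -d)
printf '%s\n' 'New York'        > "$workdir/mixed.case"
printf '%s\n' 'new york'        > "$workdir/lower.case"
printf '%s\n' 'South Dakota'    > "$workdir/one.space"
printf '%s\n' 'South    Dakota' > "$workdir/many.spaces"
printf '%s\n' 'SouthDakota'     > "$workdir/no.space"

# Illustrative helper: passes its arguments to diff and reports the result.
# Any non-zero exit status (including errors) is reported as "differ".
compare() {
    if diff "$@" > /dev/null; then echo same; else echo differ; fi
}

plain=$(compare "$workdir/mixed.case" "$workdir/lower.case")
no_case=$(compare -i "$workdir/mixed.case" "$workdir/lower.case")
b_many=$(compare -b "$workdir/one.space" "$workdir/many.spaces")
b_none=$(compare -b "$workdir/one.space" "$workdir/no.space")
w_none=$(compare -w "$workdir/one.space" "$workdir/no.space")

echo "no options:              $plain"    # differ
echo "-i (ignore case):        $no_case"  # same
echo "-b, one vs many spaces:  $b_many"   # same
echo "-b, one vs no space:     $b_none"   # differ
echo "-w, one vs no space:     $w_none"   # same

# Unified format: removed lines start with "-", added lines with "+".
unified=$(diff -u "$workdir/mixed.case" "$workdir/lower.case" || true)
printf '%s\n' "$unified"

rm -rf "$workdir"
```

The unified output starts with two header lines naming the files, followed by `-New York` and `+new york`.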

Let's say we have two files named states.1 and states.2, each containing a list of American states.

pbmac@pbmac-server $ cat states.1
New York
Florida
Texas
Alabama
South Dakota
Washington
pbmac@pbmac-server $ cat states.2
California
New York
Florida
Texas
North Carolina
Alabama
Washington
Ohio


Now, running the diff command without any options, we get the following output:

pbmac@pbmac-server $ diff states.1 states.2
0a1
> California
3a5
> North Carolina
5d6
< South Dakota
6a8
> Ohio


NOTE: neither file is altered; the differences are only written to the terminal.

Let’s take a look at what this output means. Each change in the diff output begins with a line that contains:

• Line numbers corresponding to the first file
• A special symbol
• Line numbers corresponding to the second file.

In our case, 0a1 means that after line 0 of the first file (that is, at the very beginning), you have to add line 1 of the second file, California. The lines that follow show the affected lines from each file, each preceded by a symbol:

• Lines preceded by a < are lines from the first file.
• Lines preceded by > are lines from the second file.

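The next change, 3a5, means that after line 3 of the first file we need to add line 5 from the second file (North Carolina). Then 5d6 means that we have to delete line 5 (South Dakota) from the first file; the 6 is not a line to delete but the line of the second file after which the deleted line would have appeared. Finally, 6a8 means that after line 6 of the first file we add line 8 from the second file (Ohio).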
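
The states example uses only the a and d symbols. A c (change) appears when a line in the first file has to be replaced by a different line from the second file. The sketch below shows this with two hypothetical files, `before` and `after`, and also captures the exit status of diff, which is 1 when differences are found and 0 when the files are identical.

```shell
#!/bin/sh
# Hypothetical files that differ only in their second line.
workdir=$(mktemp -d)
printf '%s\n' Alabama Florida Texas > "$workdir/before"
printf '%s\n' Alabama Georgia Texas > "$workdir/after"

# Capture the diff output and its exit status.
status=0
changes=$(diff "$workdir/before" "$workdir/after") || status=$?

printf '%s\n' "$changes"
# Prints:
# 2c2
# < Florida
# ---
# > Georgia
echo "exit status: $status"   # 1, because the files differ

rm -rf "$workdir"
```

Here 2c2 means that line 2 of the first file must be changed to match line 2 of the second file; the `---` line separates the old line (marked <) from the new one (marked >).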