Engineering LibreTexts

05-D.7.2: Handling Text Files - sort/diff Commands

    The sort Command

    The sort command sorts the lines of a file, arranging the records in a particular order. By default, it compares lines as text, character by character, in the collating order of the current locale (plain ASCII order in the C locale). With options, it can also sort numerically.

    Some features of the command are as follows:

    • sort sorts the contents of a text file line by line.
    • sort is a standard command-line program that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order.
    • It supports sorting alphabetically, in reverse order, by number, and by month, and it can also remove duplicates.
    • It can also sort on items that are not at the beginning of the line, ignore case, and report whether a file is already sorted. Sorting is based on one or more sort keys extracted from each line of input.
    • By default, the entire line is taken as the sort key, and blank space is the default field separator.
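
Two of the features listed above, removing duplicates and checking whether input is already sorted, can be sketched as follows. The file name visits and its contents are made up for illustration, and LC_ALL=C is set only as an assumption to keep the ordering independent of the local language settings.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf '%s\n' Texas Ohio Texas Alabama Ohio > "$workdir/visits"

# -u keeps a single copy of each distinct line in the sorted output.
unique_states=$(LC_ALL=C sort -u "$workdir/visits")

# -c does not sort anything; it only checks the order and sets the exit
# status: 0 if the input is already sorted, non-zero if it is not.
if LC_ALL=C sort -c "$workdir/visits" 2>/dev/null; then
    visits_sorted=yes
else
    visits_sorted=no
fi

# The -u output is itself sorted, so -c accepts it.
if printf '%s\n' "$unique_states" | LC_ALL=C sort -c 2>/dev/null; then
    unique_sorted=yes
else
    unique_sorted=no
fi

printf 'unique states:\n%s\n' "$unique_states"
echo "visits already sorted? $visits_sorted"
echo "unique list sorted?    $unique_sorted"

rm -r "$workdir"
```

Because -c reports its result through the exit status, it fits naturally into an if statement like the ones above.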

    Syntax:

    sort [OPTION]... [FILE]...
    

    Command Options:

    Option                          Meaning
    -b, --ignore-leading-blanks     ignore leading blanks
    -d, --dictionary-order          consider only blanks and alphanumeric characters
    -f, --ignore-case               fold lower case to upper case characters
    -g, --general-numeric-sort      compare according to general numerical value
    -i, --ignore-nonprinting        consider only printable characters
    -M, --month-sort                compare (unknown) < 'JAN' < ... < 'DEC'
    -h, --human-numeric-sort        compare human readable numbers (e.g., 2K 1G)
    -n, --numeric-sort              compare according to string numerical value
    -R, --random-sort               shuffle, but group identical keys. See shuf(1)
    --random-source=FILE            get random bytes from FILE
    -r, --reverse                   reverse the result of comparisons
    --sort=WORD                     sort according to WORD: general-numeric -g, human-numeric -h, month -M, numeric -n, random -R, version -V
    -V, --version-sort              natural sort of (version) numbers within text
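
To see why the numeric options matter, the sketch below sorts the same four numbers three ways. The file name numbers is hypothetical, and LC_ALL=C is an assumption added so that the default text comparison behaves the same on every system.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf '%s\n' 10 9 100 25 > "$workdir/numbers"

# Default comparison is character by character, so "100" sorts before "25".
text_order=$(LC_ALL=C sort "$workdir/numbers")

# -n compares the lines by numeric value instead.
numeric_order=$(LC_ALL=C sort -n "$workdir/numbers")

# -r reverses whichever comparison is in effect; here, largest number first.
reverse_numeric_order=$(LC_ALL=C sort -n -r "$workdir/numbers")

printf 'default:\n%s\n\n-n:\n%s\n\n-n -r:\n%s\n' \
    "$text_order" "$numeric_order" "$reverse_numeric_order"

rm -r "$workdir"
```

The default run lists 10, 100, 25, 9, while -n gives 9, 10, 25, 100 and -n -r gives the same numbers from largest to smallest.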

    The sort command has an abundance of options, and only a few are shown in the table above. The example below is a very straightforward sort. The cat command shows the state names in their original, unsorted order. The sort command then outputs the list of states in alphabetical order. NOTE: the original file, states, is not altered at all. The sorted list is simply written to the terminal.

    pbmac@pbmac-server $ cat states
    
    California
    New York
    Florida
    Texas
    North Carolina
    Alabama
    South Dakota
    Washington
    Georgia
    Ohio
    pbmac@pbmac-server $ sort states
    Alabama
    California
    Florida
    Georgia
    New York
    North Carolina
    Ohio
    South Dakota
    Texas
    Washington
    

    With its many options, sort can order lines by alphabetic or numeric value, or in reverse. For columnar data, it can sort on any one of the columns (the -k option), with any character specified as the column delimiter (the -t option).
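
As a minimal sketch of sorting columnar data, the example below uses the -t option to name the delimiter and the -k option to pick the column. The file name capitals and its colon-separated layout (state, capital, year of statehood) are assumptions made for illustration, as is LC_ALL=C, which only pins the text ordering.

```shell
workdir=$(mktemp -d)

# Sample data, fields separated by ":" -> state:capital:year of statehood
cat > "$workdir/capitals" <<'EOF'
Ohio:Columbus:1803
Texas:Austin:1845
Florida:Tallahassee:1845
Alabama:Montgomery:1819
EOF

# -t sets the field delimiter; -k2,2 restricts the key to field 2 (the capital).
by_capital=$(LC_ALL=C sort -t: -k2,2 "$workdir/capitals")

# Two keys: field 3 compared numerically (n), then field 1 to break ties.
by_year_then_state=$(LC_ALL=C sort -t: -k3,3n -k1,1 "$workdir/capitals")

printf 'by capital:\n%s\n\nby year, then state:\n%s\n' \
    "$by_capital" "$by_year_then_state"

rm -r "$workdir"
```

Writing the key as -k2,2 rather than -k2 keeps the comparison inside a single field; a bare -k2 would use everything from field 2 to the end of the line.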

    This command is very useful and very powerful.

    The diff Command

    The diff command displays the differences between two files by comparing them line by line. Unlike the related commands cmp and comm, it tells us which lines in one file have to be changed to make the two files identical.

    The important thing to remember is that diff describes the differences as a set of instructions, written with special symbols, for changing the first file so that it matches the second file.

    Special symbols are:

    a : add
    c : change
    d : delete

    Syntax:

    diff [OPTION]... FILE1 FILE2
    

    Command Options:

    Option  Meaning
    -b      Ignore changes in the amount of white space.
    -c      Display the differences in context format, with three lines of context.
    -i      Ignore case differences.
    -t      Expand tab characters to spaces in output lines.
    -u      Output results in unified format, which is more streamlined.
    -w      Ignore all white space.
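
The effect of the -i, -b and -w options can be sketched by comparing small files that differ only in case or spacing. The file names and the compare helper below are hypothetical, introduced only for this illustration. The helper relies on the exit status of diff, which is 0 when no differences are found and non-zero otherwise.

```shell
workdir=$(mktemp -d)
printf 'New York\nFlorida\n'   > "$workdir/plain"
printf 'new york\nFlorida\n'   > "$workdir/lowercase"  # differs only in case
printf 'New   York\nFlorida\n' > "$workdir/spaced"     # extra spaces between words
printf 'NewYork\nFlorida\n'    > "$workdir/joined"     # no space at all

# Hypothetical helper: turn diff's exit status into a word.
compare() {
    if diff "$@" > /dev/null 2>&1; then echo same; else echo different; fi
}

case_default=$(compare "$workdir/plain" "$workdir/lowercase")
case_ignored=$(compare -i "$workdir/plain" "$workdir/lowercase")
spacing_b=$(compare -b "$workdir/plain" "$workdir/spaced")
joined_b=$(compare -b "$workdir/plain" "$workdir/joined")
joined_w=$(compare -w "$workdir/plain" "$workdir/joined")

echo "case, no option:       $case_default"
echo "case, -i:              $case_ignored"
echo "extra spaces, -b:      $spacing_b"
echo "missing space, -b:     $joined_b"
echo "missing space, -w:     $joined_w"

rm -r "$workdir"
```

The last two results show the difference between the two spacing options: -b treats runs of white space as equal but still notices a space that is missing entirely, while -w ignores white space altogether.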

    Let's say we have two files named states.1 and states.2, containing six and eight American states respectively.

    pbmac@pbmac-server $ cat states.1
    New York
    Florida
    Texas
    Alabama
    South Dakota
    Washington
    pbmac@pbmac-server $ cat states.2
    California
    New York
    Florida
    Texas
    North Carolina
    Alabama
    Washington
    Ohio
    

    Running diff without any options produces the following output:

    pbmac@pbmac-server $ diff states.1 states.2
    0a1
    > California
    3a5
    > North Carolina
    5d6
    < South Dakota
    6a8
    > Ohio
    

    NOTE: neither file is altered. The differences are only written to the terminal.

    Let’s take a look at what this output means. Each change in the diff output begins with a line that contains:

    • Line numbers corresponding to the first file
    • A special symbol
    • Line numbers corresponding to the second file.

    In our case, 0a1 means that after line 0 of the first file (that is, at the very beginning of the file) you have to add California to match line 1 of the second file. diff then shows the affected lines from each file, preceded by a symbol:

    • Lines preceded by a < are lines from the first file.
    • Lines preceded by > are lines from the second file.

    The next change, 3a5, means that after line 3 of the first file we need to add line 5 from the second file. Then 5d6 means that we have to delete line 5 from the first file; the 6 indicates that, had the line been kept, it would have followed line 6 of the second file. Finally, 6a8 means that after line 6 of the first file we add line 8 from the second file.
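
The walkthrough above can be checked with a short script that recreates the two files and pulls the pieces of the output apart. This is a sketch that assumes a diff whose default output is the traditional format shown in this section (as GNU diff produces). The variable names are made up for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'New York' Florida Texas Alabama 'South Dakota' Washington \
    > "$workdir/states.1"
printf '%s\n' California 'New York' Florida Texas 'North Carolina' Alabama \
    Washington Ohio > "$workdir/states.2"

# diff exits with status 1 when the files differ; "|| true" keeps that from
# stopping a script that runs with "set -e".
diff_output=$(diff "$workdir/states.1" "$workdir/states.2" || true)

# The change commands are the lines that start with a digit (0a1, 3a5, ...).
change_commands=$(printf '%s\n' "$diff_output" | grep '^[0-9]')

# ">" lines come from the second file, "<" lines from the first file.
lines_to_add=$(printf '%s\n' "$diff_output" | grep -c '^>')
lines_to_delete=$(printf '%s\n' "$diff_output" | grep -c '^<')

printf 'change commands:\n%s\n' "$change_commands"
echo "lines to add to the first file:      $lines_to_add"
echo "lines to delete from the first file: $lines_to_delete"

rm -r "$workdir"
```

Three additions and one deletion turn the six-line first file into the eight-line second file, which matches the four change commands described above.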

    Adapted from:
    "SORT command in Linux/Unix with examples" by Mohak Agrawal, Geeks for Geeks is licensed under CC BY-SA 4.0
    "diff command in Linux with examples" by AKASH GUPTA 6, Geeks for Geeks is licensed under CC BY-SA 4.0
