05-D.7.2: Handling Text Files - sort/diff Commands

The sort Command

The sort command sorts the lines of a file, arranging the records in a particular order. By default, sort compares lines as text, character by character, using the collating order of the current locale (plain ASCII byte order in the C locale). With options, it can also sort numerically.

Some features of the command are as follows:

sort is a standard command line utility that sorts the contents of text files line by line. It prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order.
It supports sorting alphabetically, in reverse order, by number, and by month, and it can also remove duplicate lines.
It can sort by items that are not at the beginning of the line, ignore case, and report whether a file is already sorted.
Sorting is based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key.
When keys are built from fields, blank space is the default field separator.
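Two of the features listed above, removing duplicates and checking whether input is already sorted, correspond to the -u and -c options. The following is a minimal sketch; the sample names and variable names are made up for the illustration.

```shell
# Sample input invented for this illustration (note the repeated "Texas").
names=$(printf '%s\n' Texas Ohio Texas Alabama)

# -u keeps only one copy of each distinct line in the sorted output.
unique_sorted=$(printf '%s\n' "$names" | sort -u)

# -c does not sort anything; it only checks the order.
# Exit status 0 means "already sorted", non-zero means "not sorted".
check_status=0
printf '%s\n' "$names" | sort -c 2>/dev/null || check_status=$?

printf 'unique and sorted:\n%s\n' "$unique_sorted"
printf 'exit status of sort -c on the unsorted input: %s\n' "$check_status"
```

Because -c reports its result through the exit status, it is convenient in scripts that need to decide whether a sort is required at all.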

Syntax:

sort [ OPTION ] filename

Command Options:

Options	Option Meaning
-b, --ignore-leading-blanks	ignore leading blanks
-d, --dictionary-order	consider only blanks and alphanumeric characters
-f, --ignore-case	fold lower case to upper case characters
-g, --general-numeric-sort	compare according to general numerical value
-i, --ignore-nonprinting	consider only printable characters
-M, --month-sort	compare (unknown) < 'JAN' < ... < 'DEC'
-h, --human-numeric-sort	compare human readable numbers (e.g., 2K 1G)
-n, --numeric-sort	compare according to string numerical value
-R, --random-sort	shuffle, but group identical keys. See shuf(1)
--random-source=FILE	get random bytes from FILE
-r, --reverse	reverse the result of comparisons
--sort=WORD	sort according to WORD: general-numeric -g, human-numeric -h, month -M, numeric -n, random -R, version -V
-V, --version-sort	natural sort of (version) numbers within text
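To see why the numeric option matters, the sketch below sorts the same made-up numbers three ways: with the default text comparison, with -n, and with -n combined with -r. LC_ALL=C is set on the first command only so that the text ordering is the same on any system; the variable names are illustrative.

```shell
# Sample data invented for this illustration.
numbers=$(printf '%s\n' 10 9 100 2)

# Default: lines are compared as text, so "10" and "100" sort before "2".
text_sorted=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n: compare by numeric value instead.
numeric_sorted=$(printf '%s\n' "$numbers" | sort -n)

# -n together with -r: numeric order, largest value first.
reverse_sorted=$(printf '%s\n' "$numbers" | sort -n -r)

printf 'text:\n%s\n\nnumeric:\n%s\n\nreverse numeric:\n%s\n' \
  "$text_sorted" "$numeric_sorted" "$reverse_sorted"
```

The text sort produces 10, 100, 2, 9, while -n produces 2, 9, 10, 100.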

The sort command has an abundance of options, and only a few are shown in the table above. The example below is a very straightforward sort. The cat command shows the contents of the file states, a list of state names in no particular order. The sort command then outputs the same names sorted in alphabetical order. NOTE: the original file, states, is not altered at all. The sorted list is simply output to the terminal.

pbmac@pbmac-server $ cat states
California
New York
Florida
Texas
North Carolina
Alabama
South Dakota
Washington
Georgia
Ohio
pbmac@pbmac-server $ sort states
Alabama
California
Florida
Georgia
New York
North Carolina
Ohio
South Dakota
Texas
Washington

With its many options, sort can order lines by alphabetic or numeric value, in ascending or reverse order. For columnar data, it can sort on any one of the columns (the -k option), with any single character specified as the column delimiter (the -t option).

This command is very useful and very powerful.
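As a sketch of sorting columnar data, the example below uses the -t option to set the column delimiter and the -k option to pick the column to sort on. The comma-separated data and the counts in it are hypothetical, invented only for this illustration.

```shell
# Hypothetical comma-separated data: a state name, then a made-up count.
data=$(printf '%s\n' 'Texas,30' 'Ohio,7' 'Alabama,120')

# -t,     use the comma as the field separator
# -k2,2n  use only the second field as the sort key and compare it numerically
by_count=$(printf '%s\n' "$data" | sort -t, -k2,2n)

# Adding r to the key reverses the order (largest count first).
by_count_desc=$(printf '%s\n' "$data" | sort -t, -k2,2nr)

printf 'ascending:\n%s\n\ndescending:\n%s\n' "$by_count" "$by_count_desc"
```

Writing the key as -k2,2 rather than -k2 keeps the key from running on to the end of the line, which matters once a file has more than two columns.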

The diff Command

The diff command displays the differences between two files by comparing them line by line. Unlike the related commands cmp and comm, it tells us which lines in one file have to be changed to make the two files identical.

The important thing to remember is that diff describes the differences as instructions, written with a few special symbols, for changing the first file so that it matches the second file.

Special symbols are:

a : add
c : change
d : delete

Syntax:

diff [ OPTIONS ] File1 File2

Command Options:

Options	Option Meaning
-b	Ignore changes in the amount of white space.
-c	Display a list of differences with three lines of context.
-i	Ignore case differences.
-t	Expand tab characters in output lines.
-u	Output results in unified mode, which presents a more streamlined format.
-w	Ignore all white space (spaces and tabs) when comparing lines.
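The effect of the -i and -b options can be observed through the exit status of diff, which is 0 when no differences are found and 1 when the files differ. The sketch below is only an illustration: the temporary directory, the file names one, two and three, and their contents are all made up for the example.

```shell
# Work in a throwaway directory; the file names are hypothetical.
tmpdir=$(mktemp -d)
printf 'New York\nTexas\n'   > "$tmpdir/one"
printf 'new york\nTexas\n'   > "$tmpdir/two"    # differs from "one" only in case
printf 'New   York\nTexas\n' > "$tmpdir/three"  # differs from "one" only in spacing

# Without options, the case difference counts: exit status 1.
plain_status=0
diff "$tmpdir/one" "$tmpdir/two" > /dev/null || plain_status=$?

# With -i, the case difference is ignored: exit status 0.
ignore_case_status=0
diff -i "$tmpdir/one" "$tmpdir/two" > /dev/null || ignore_case_status=$?

# With -b, the change in the amount of white space is ignored: exit status 0.
ignore_space_status=0
diff -b "$tmpdir/one" "$tmpdir/three" > /dev/null || ignore_space_status=$?

echo "plain: $plain_status, with -i: $ignore_case_status, with -b: $ignore_space_status"
rm -r "$tmpdir"
```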

Let's say we have two files named states.1 and states.2, each containing a list of American states.

pbmac@pbmac-server $ cat states.1
New York
Florida
Texas
Alabama
South Dakota
Washington
pbmac@pbmac-server $ cat states.2
California
New York
Florida
Texas
North Carolina
Alabama
Washington
Ohio

Now, applying the diff command without any options, we get the following output:

pbmac@pbmac-server $ diff states.1 states.2
0a1
> California
3a5
> North Carolina
5d6
< South Dakota
6a8
> Ohio

NOTE: neither file is altered; only the differences are output to the terminal.

Let’s take a look at what this output means. The first line of the diff output will contain:

Line numbers corresponding to the first file
A special symbol
Line numbers corresponding to the second file.

In our case, the first line is 0a1, which means that after line 0 of the first file (that is, at the very beginning of the file) you have to add California, which is line 1 of the second file. diff then shows the affected lines from each file, each preceded by a symbol:

Lines preceded by a < are lines from the first file.
Lines preceded by > are lines from the second file.

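The next change command, 3a5, means that after line 3 of the first file we need to add line 5 from the second file (North Carolina). Then 5d6 means that we have to delete line 5 from the first file (South Dakota); line 6 is not deleted, the 6 only indicates that the deleted line would have appeared after line 6 of the second file. Finally, 6a8 means that after line 6 of the first file we add line 8 from the second file (Ohio).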
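The reading rule for these change commands can be written down as a small shell function. This is only a sketch: explain_change is a hypothetical helper, not part of diff, and it simply splits a command such as 5d6 at its letter. A range such as 2,4 is printed as-is rather than expanded.

```shell
# Hypothetical helper (not part of diff): describe one change command in words.
explain_change() {
  change=$1
  case $change in
    # ${change%%a*} is the text before the letter, ${change##*a} the text after it.
    *a*) echo "after line ${change%%a*} of the first file, add line ${change##*a} of the second file" ;;
    *d*) echo "delete line ${change%%d*} of the first file (it would follow line ${change##*d} of the second file)" ;;
    *c*) echo "change line ${change%%c*} of the first file into line ${change##*c} of the second file" ;;
  esac
}

# The four change commands from the example output above.
for item in 0a1 3a5 5d6 6a8; do
  printf '%s: %s\n' "$item" "$(explain_change "$item")"
done
```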

Adapted from:
"SORT command in Linux/Unix with examples" by Mohak Agrawal, Geeks for Geeks is licensed under CC BY-SA 4.0
"diff command in Linux with examples" by AKASH GUPTA 6, Geeks for Geeks is licensed under CC BY-SA 4.0
