Engineering LibreTexts

Data wrangling


      The efficient collection of data can be critical to project success.

      This page introduces methods of data wrangling.

      In the first example, the download URL is copied and passed to wget.


      Open the page at the following link:

      A simple list of files available for download is shown.

      In this example, the aim is to download the data file named dataFile-53.dat.

      Copy the link address using the method appropriate for your OS. On Windows, it can be done as follows:

      1. Right click on the URL of interest (dataFile-53.dat)
      2. In the menu select Copy link location
      3. Paste the link into an editor to confirm you have it in your buffer.
      4. Change directories to where you want to put the data, for example ‘Downloads’ (or make a project directory).
      5. Type wget then paste from the buffer. 

      It should look like this:


      Running the above command produces output similar to the following.

      --2020-07-20 12:17:50--
      Resolving (
      Connecting to (||:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 34
      Saving to: ‘dataFile-53.dat’

      dataFile-53.dat                100%[=================================================>]      34  --.-KB/s    in 0s      

      2020-07-20 12:17:50 (799 KB/s) - ‘dataFile-53.dat’ saved [34/34]
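
The final line of the output reports [34/34], meaning all 34 bytes announced on the Length line were saved. A minimal sketch of the same check from the shell follows. The helper name size_matches is hypothetical, and the stand-in file is 34 bytes of filler created only so the check can run without the real download.

```shell
# Hypothetical helper: succeeds when FILE exists and is exactly EXPECTED bytes.
size_matches() {
  file=$1
  expected=$2
  [ -f "$file" ] || return 1
  # wc -c counts bytes; tr strips the padding some wc versions add.
  actual=$(wc -c < "$file" | tr -d ' ')
  [ "$actual" -eq "$expected" ]
}

# Stand-in for the downloaded file: 34 bytes of filler, NOT the real data.
DEMO_DIR=$(mktemp -d)
printf '%034d' 0 > "$DEMO_DIR/dataFile-53.dat"

if size_matches "$DEMO_DIR/dataFile-53.dat" 34; then
  echo "size OK"
else
  echo "size mismatch"
fi
```

For a real download, point the helper at the saved file and pass the number shown on the Length line.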


      The destination filename can be changed as follows:

      wget  -O ~/Downloads/new-name-dataFile-53.dat

      In the above example, the ‘-O’ switch sets the target filename. Note that it is a capital letter O, not a zero.
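
As a hedged illustration of how the target name can be assembled, the sketch below derives it from the link itself. The URL is a hypothetical placeholder (example.com), not the address used on this page, so the command is printed rather than executed.

```shell
# Hypothetical placeholder; substitute the link you copied from the page.
URL="http://example.com/data/dataFile-53.dat"

# Reuse the original file name, with a prefix, under ~/Downloads.
TARGET="$HOME/Downloads/new-name-$(basename "$URL")"

# Printed rather than run, because the URL above is only a placeholder.
echo wget "$URL" -O "$TARGET"
```

Once the printed command looks right with your real link, drop the echo to perform the download.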


      This manual method is fine for a small collection of small files.

      The next example uses a script to handle a larger download.

      The aim is to download only files 54 through 60.

      First, open a new script in your preferred editor, then write and test the following for-loop in it.




      for N in {54..60}
      do
        echo $N
      done

      Save it and open another terminal to run it.

      bash ./
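
The write-then-run cycle can be sketched end to end as follows. The script name get-data.sh and the temporary directory are assumptions made for this demo only; the page does not fix a name.

```shell
# Hypothetical working directory and script name, used only for this demo.
DEMO_DIR=$(mktemp -d)
SCRIPT="$DEMO_DIR/get-data.sh"

# Write the test loop into the script; the quoted 'EOF' keeps $N literal.
cat > "$SCRIPT" <<'EOF'
for N in {54..60}
do
  echo $N
done
EOF

# Run it with bash, which expands {54..60} into 54 55 ... 60.
bash "$SCRIPT"
```

The run should print the numbers 54 through 60, one per line. The brace range is a bash feature, which is one reason to invoke the script with bash explicitly.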

      If it runs without error, return to the terminal with the editor and make the following modification.

      for N in {54..60}
      do
        echo wget$N.dat
      done

      Again, edit, save, and run it in the other terminal.

      bash ./

      If it runs without errors and prints what look like functional wget commands, remove the echo and comment out the exit.

      After saving, it is a script that automates the data download.
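
One possible shape for the finished script is sketched below, under stated assumptions: BASE_URL is a hypothetical placeholder, the files are assumed to follow the dataFile-N.dat naming seen above, and the DRY_RUN switch and the function name download_range are illustrative additions. A while loop is used instead of {54..60} so the range can be passed as arguments, since bash brace ranges do not accept variables.

```shell
# Hypothetical placeholder; replace with the site's real base address.
BASE_URL="http://example.com/data"
# 1 prints the commands; set to 0 to actually download.
DRY_RUN=1

download_range() {
  n=$1
  last=$2
  while [ "$n" -le "$last" ]; do
    url="$BASE_URL/dataFile-$n.dat"
    if [ "$DRY_RUN" -eq 1 ]; then
      echo wget "$url"
    else
      # Stop at the first failure instead of continuing silently.
      wget "$url" || return 1
    fi
    n=$((n + 1))
  done
}

download_range 54 60
```

Keeping the dry-run switch mirrors the echo-first approach described above: inspect the printed commands, then set DRY_RUN=0 once they look correct.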


      A similar method can be used to efficiently process the data.

      Start a new script called

      for FILE in ./dataFile-*.dat
      do
        ls -l "$FILE"
      done

      If it runs without errors, the ‘ls -l’ output confirms that each data file name is held in the variable FILE in turn, ready for further processing appropriate to the data type.
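
As a sketch of what a further processing step might look like, the loop below replaces ‘ls -l’ with a byte count per file. The helper name process_dir and the stand-in files are hypothetical; real processing would depend on the data type.

```shell
# Hypothetical helper: report the size in bytes of each data file in DIR.
process_dir() {
  for FILE in "$1"/dataFile-*.dat; do
    # With no matches the pattern stays literal, so skip non-files.
    [ -f "$FILE" ] || continue
    bytes=$(wc -c < "$FILE" | tr -d ' ')
    echo "$(basename "$FILE") $bytes"
  done
}

# Stand-in files so the sketch runs without the real downloads.
DEMO_DIR=$(mktemp -d)
printf 'abc\n' > "$DEMO_DIR/dataFile-54.dat"
printf 'abcdef\n' > "$DEMO_DIR/dataFile-55.dat"

process_dir "$DEMO_DIR"
```

The file test inside the loop guards against the case where no file matches the pattern, in which the shell would otherwise hand the unexpanded pattern to the command.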




      Data wrangling is shared under a GNU General Public License 3.0 license and was authored, remixed, and/or curated by LibreTexts.
