# Data wrangling


The efficient collection of data can be critical to project success.

This section introduces methods of data wrangling.

In the first example, the download URL is copied from a web page and passed to wget.

Open the page at the following link:

In this example, the aim is to download the data file named dataFile-53.dat.

Copy the link address using the method for your operating system. On Windows, it can be done as follows:

1. Right-click the URL of interest (dataFile-53.dat) and copy the link address.
2. Paste the link into an editor to confirm it is in your buffer.
3. Change to the directory where you want to store the data, for example ‘Download’ (or make a project directory).
4. Type wget followed by a space, then paste the link from the buffer.

The command should look like this: `wget http://bioinformaticstoolspw.us/files/dataFile-53.dat`

Running the command produces output similar to the following:

```
--2020-07-20 12:17:50--  http://bioinformaticstoolspw.us/files/dataFile-53.dat
Resolving bioinformaticstoolspw.us (bioinformaticstoolspw.us)... 162.241.217.126
Connecting to bioinformaticstoolspw.us (bioinformaticstoolspw.us)|162.241.217.126|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34
Saving to: ‘dataFile-53.dat’

dataFile-53.dat                100%[=================================================>]      34  --.-KB/s    in 0s

2020-07-20 12:17:50 (799 KB/s) - ‘dataFile-53.dat’ saved [34/34]
```

The destination filename can be changed with the ‘-O’ switch (a capital letter O, not a zero), which sets the name of the target file.
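For instance, a command like the one below would save dataFile-53.dat under a different local name. The name myData-53.dat is hypothetical, chosen only for illustration.

```shell
# Fetch dataFile-53.dat but save it as myData-53.dat (hypothetical local name).
# -O (capital O) sets the output file name.
wget -O myData-53.dat http://bioinformaticstoolspw.us/files/dataFile-53.dat
```

Without ‘-O’, wget names the local file after the last part of the URL, which is why the first example produced dataFile-53.dat.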

----------------------

This manual method is fine for a small collection of small files. For a larger collection, a for-loop in a bash script can generate and run the wget commands.

First, open a new script file in your preferred editor, then write and test the following for-loop in it.

```
vim wgetInForloop.sh
```

or

```
nano wgetInForloop.sh
```

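```
for N in {54..60}
do
    echo $N
    # exit stops the script after the first pass, which is enough for a test run
    exit
done
```

Save it and open another terminal to run it:

```
bash ./wgetInForloop.sh
```

If it runs without errors, return to the terminal with the editor and make the next modification, so that the loop prints a wget command instead of the bare number:

```
for N in {54..60}
do
    echo wget http://bioinformaticstoolspw.us/files/dataFile-$N.dat
    exit
done
```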


Again, edit, save, and run the script in the other terminal:

```
bash ./wgetInForloop.sh
```

If it runs without errors and prints what looks like a working wget command, remove the echo and comment out the exit so that the loop downloads all seven files (dataFile-54.dat to dataFile-60.dat).
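With both changes applied, wgetInForloop.sh might look like the following sketch. It assumes that dataFile-54.dat through dataFile-60.dat are published under the same URL pattern as dataFile-53.dat; quoting the URL is a small safety habit added here and does not change what is downloaded.

```shell
# wgetInForloop.sh after the test run: echo removed, exit commented out.
# Assumes dataFile-54.dat through dataFile-60.dat follow the dataFile-53.dat URL pattern.
for N in {54..60}
do
    wget "http://bioinformaticstoolspw.us/files/dataFile-$N.dat"
    # exit
done
```

Run it from the target directory with bash ./wgetInForloop.sh; each pass of the loop prints wget progress output like the example shown earlier.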

-----------------------------

A similar method can be used to efficiently process the data.

Start a new script called processData.sh.

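```
for FILE in ./dataFile-*.dat
do
    # List the file to confirm the variable holds a real file name
    ls -l "$FILE"
    # exit stops after the first file while testing
    exit
done
```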


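If it runs without errors, the ‘ls -l’ output confirms that the variable holds a data file, ready for further processing according to the data type. As with the download script, comment out the exit to loop over every file instead of stopping after the first.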
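As one possible version of that further processing, the sketch below replaces ‘ls -l’ with a byte count, drops the exit so every file is visited, and skips the unexpanded pattern that bash leaves in FILE when no file matches. The size report and the COUNT variable are illustrative additions; real processing would depend on the format of the .dat files.

```shell
# processData.sh variant: report the size of every downloaded data file.
# The byte count stands in for real, format-specific processing.
COUNT=0
for FILE in ./dataFile-*.dat
do
    # With no matching files, bash keeps the literal pattern; skip it.
    [ -e "$FILE" ] || continue
    echo "$FILE: $(wc -c < "$FILE") bytes"
    COUNT=$((COUNT + 1))
done
echo "Processed $COUNT file(s)"
```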

Data wrangling is shared under a GNU General Public License 3.0 license and was authored, remixed, and/or curated by LibreTexts.