Bash script to download files from URLs to specified directories listed in a text file

Fooby
2014-07-07

    I originally posted this script in my review, but markdown formatting is apparently not supported in a review, so I am posting it here.

    As I mentioned in my review, I have been looking for a CLI alternative to DownThemAll for Firefox because DTA cripples Firefox and slows the entire system on my pretty fast 8-core Mac Pro. DTA also struggles to maintain difficult connections, and I have plenty of those. aria2 handles these with ease and does not slow my system down in any way; it just sits there in the background downloading away at max speed. I tried wget and curl, but they are single-threaded and very slow on the files I am downloading. I tried puf, but it didn't work at all with HTTPS. I tried axel, which worked with HTTPS, BUT tcpdump showed that it had somehow switched over to HTTP when I wasn't looking, which really pissed me off (can I say that?). Most people would probably not even have checked and would have been lulled into a false sense of security--like what happened with Heartbleed. Bad, bad axel, misleading us like that… aria2 not only worked the first time, right out of the box, handling HTTPS traffic flawlessly (believe me, I checked…), but it also downloaded files at record speeds--my full allotted bandwidth. Needless to say, "Goodbye DTA…", "Sayonara axel…".

    I used the following syntax to get these speeds:

    aria2c --file-allocation=none -c -x 10 -s 10 -d "mydir" URL
    


    Note that aria2c is the binary Homebrew has installed--don't ask me why, I don't know…
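
    For reference, this is all it takes to get it via Homebrew (assuming Homebrew itself is already set up); the package is called aria2, but the binary it installs is aria2c:

    brew install aria2   # the formula is "aria2"; the binary it provides is "aria2c"
    which aria2c         # should print something like /usr/local/bin/aria2c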

    --file-allocation=none speeds up the initialization of the download, which can otherwise take quite a long time for a multi-GB file.

    -c allows continuation of a download if it was incomplete the first time. This came in really handy when, for some reason, the speed started flagging and I ctrl-c-ed out of the download and restarted it. It resumed right where it left off at max speed. Nice.

    -x 10 and -s 10 allow up to 10 connections per server and split the download into 10 pieces, to speed things along. I suspect the -s 10 is unnecessary, but I prefer to err on the side of overkill.

    -d downloads files to the specified directory.
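
    As a concrete example of the -c resume behavior (the URL below is just a hypothetical placeholder): interrupt a download with ctrl-c, then rerun the exact same command; as long as the partial file and its .aria2 control file are still in the download directory, it picks up where it left off.

    aria2c --file-allocation=none -c -x 10 -s 10 -d "mydir" "https://example.com/big-file.iso"
    # ctrl-c partway through, then rerun the same command to resume:
    aria2c --file-allocation=none -c -x 10 -s 10 -d "mydir" "https://example.com/big-file.iso"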


    I wrote a script that reads directory names and URLs from a text file, automatically creates the directories, and downloads the files into them--similar to the way DTA works, only much faster/better ;).

    aria2files.sh:

    #!/bin/bash

    filename="$1" # list of directories and URLs, passed as the first command-line argument
    currdir="."   # default download directory (the PWD) until a directory line is read

    while read -r line
    do
        if [ -n "$line" ] # skip blank lines
        then
            if [[ "$line" =~ (https?|ftp):// ]] # line contains a URL, download the file
            then
                echo "URL: '$line'"
                aria2c --file-allocation=none -c -x 10 -s 10 -d "$currdir" "$line"
            else # line contains a directory name, create the directory if not already present
                echo "Directory: '$line'"
                currdir="$line"
                if [ ! -d "$currdir" ]
                then
                    mkdir -p "$currdir" # '-p' enables creation of nested directories in one command
                fi
            fi
        fi
    done < "$filename"
    


    The regex will detect HTTP(S) and FTP URLs. The if [ -n "$line" ] test is the explicit, portable way to check for a non-empty line and works in OS X bash as well as on other *NIX systems; the shorter if [ "$line" ] does the same thing, while if [ -z "$line" ] tests the opposite (an empty line).
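
    A quick way to sanity-check the regex in an interactive bash session (the URLs and directory name below are just hypothetical examples):

    [[ "https://example.com/file.zip" =~ (https?|ftp):// ]] && echo "match"   # prints "match"
    [[ "ftp://example.com/file.zip" =~ (https?|ftp):// ]] && echo "match"     # prints "match"
    [[ "some directory/subdir" =~ (https?|ftp):// ]] || echo "no match"       # prints "no match"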

    The text file has the format:

    files.txt:

    directory 1
    url1
    url2

    directory 2/subdirectory/sub-subdirectory/
    url3
    url4
    


    The script reads the filename from the command line:

    aria2files.sh files.txt
    


    files.txt is in the PWD, and the listed directories are created as subdirectories of the PWD. Notice that you can list a nested directory path on one line and the entire hierarchy will be created. There is no checking done, so if, for example, the first non-empty line of files.txt is not a directory name but a URL, that file will be saved to the PWD, and subsequent URLs will do the same until a directory name is encountered. If the script hasn't finished yet, you can keep adding directories/URLs to the bottom of the text file and saving it.
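
    For example, while the script is still working through the list in one terminal, you can append another block from a second terminal (the directory name and URL below are hypothetical):

    printf '%s\n' "directory 3" "https://example.com/another-file.zip" >> files.txt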

    I put the script in

    /usr/local/bin
    


    so it is in my PATH.
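
    Something along these lines, assuming the script was saved as aria2files.sh in the current directory (sudo may or may not be needed depending on the permissions of /usr/local/bin):

    chmod +x aria2files.sh              # make the script executable
    mv aria2files.sh /usr/local/bin/    # move it onto the PATH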

     
    Last edit: Fooby 2014-07-08