Recursively download all the contents of a Google Drive folder using Python, wget and a Bash script

Muntasir Wahed
Published in DataDrivenInvestor on Jul 7, 2020


When will you need this?

Google Drive is currently one of the most popular places to store datasets, some of which are publicly available. However, some datasets are split across multiple folders and hundreds of individual files, and downloading them one by one is extremely inefficient. It is even harder on a remote server, where you usually have no GUI and cannot use a browser.

The Script

To serve my purpose, I wrote Python code that generates a Bash script, which I then copy to the remote server and execute. Here are the steps.

Step 1

I used the PyDrive library. Before you can use it, you have to complete the setup described in the PyDrive quickstart guide. For the sake of completeness, I list the steps here.

  1. Go to the Google APIs Console and create your own project.
  2. Search for ‘Google Drive API’, select the entry, and click ‘Enable’.
  3. Select ‘Credentials’ from the left menu, click ‘Create Credentials’, select ‘OAuth client ID’.
  4. Click ‘Configure consent screen’ and follow the instructions to set the product name and consent screen. Once finished, continue creating the OAuth client ID:
  5. Set ‘Application type’ to Web application.
  6. Enter an appropriate name.
  7. Enter http://localhost:8080 under ‘Authorized JavaScript origins’.
  8. Enter http://localhost:8080/ under ‘Authorized redirect URIs’.
  9. Click ‘Save’.
  10. Click ‘Download JSON’ on the right side of the client ID to download client_secret_<really long ID>.json.
  11. The downloaded file contains all the authentication information for your application. Rename it to “client_secrets.json” and place it in your working directory.
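Before moving on, it can help to confirm that the downloaded file is the right kind of client. The sketch below uses a hypothetical helper, `check_client_secrets`, which is not part of PyDrive. It assumes the usual layout of a Google OAuth client file for a Web application, with the settings under a top-level `web` key, and checks that the `http://localhost:8080/` redirect URI is present. PyDrive's `GoogleAuth().LocalWebserverAuth()` reads `client_secrets.json` from the working directory by default and starts a local server on port 8080, which is why the localhost URIs are needed.

```python
import json
from pathlib import Path

# Keys expected inside the "web" section (assumed layout of Google's client file).
REQUIRED_KEYS = ("client_id", "client_secret", "redirect_uris")
EXPECTED_REDIRECT = "http://localhost:8080/"


def check_client_secrets(path: str = "client_secrets.json") -> str:
    """Hypothetical sanity check for the OAuth client file; returns the client ID."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if "web" not in data:
        # A Desktop ("installed") client or a malformed file ends up here.
        raise ValueError("expected a 'web' client; was 'Application type' set to Web application?")
    client = data["web"]
    missing = [key for key in REQUIRED_KEYS if key not in client]
    if missing:
        raise ValueError(f"client_secrets.json is missing keys: {missing}")
    if EXPECTED_REDIRECT not in client["redirect_uris"]:
        raise ValueError(f"{EXPECTED_REDIRECT} is not an authorized redirect URI")
    return client["client_id"]
```

Calling `check_client_secrets()` with no argument checks the file in the current working directory, the same place PyDrive looks for it.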

Step 2

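Now, create a Python script or notebook in the same working directory where you saved “client_secrets.json”. The code authenticates with Google Drive, walks the folder tree, and writes the commands that create the subfolders and download the files (with wget) to “script.sh”. I have attached the notebook below.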
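The following is not the author's notebook; it is a minimal sketch of what such code can look like, under several assumptions. The helper names `build_script` and `pydrive_lister` are hypothetical. Listing uses PyDrive's `ListFile` with a Drive v2 query (`'<id>' in parents and trashed=false`), and each file is fetched from `https://drive.google.com/uc?export=download&id=<ID>`, a link that works for publicly shared files but may return a confirmation page instead of the file for very large files; native Google Docs files are not handled specially. The Drive listing is passed in as a function, so the recursion can be tried without network access.

```python
import shlex
from typing import Callable, Dict, Iterable, List

FOLDER_MIME = "application/vnd.google-apps.folder"
# Assumed direct-download link for publicly shared files (not the author's choice).
DOWNLOAD_URL = "https://drive.google.com/uc?export=download&id={}"

# A lister maps a folder ID to its children, each with 'id', 'title', 'mimeType'.
ListChildren = Callable[[str], Iterable[Dict[str, str]]]


def pydrive_lister(drive) -> ListChildren:
    """Adapt a pydrive.drive.GoogleDrive instance to the ListChildren interface."""
    def list_children(folder_id: str):
        query = f"'{folder_id}' in parents and trashed=false"
        return drive.ListFile({"q": query}).GetList()
    return list_children


def build_script(root_id: str, local_root: str, list_children: ListChildren) -> str:
    """Return a Bash script that recreates the Drive folder tree with mkdir and wget."""
    lines: List[str] = ["#!/bin/bash", "set -e"]

    def walk(folder_id: str, path: str) -> None:
        lines.append(f"mkdir -p {shlex.quote(path)}")
        for item in list_children(folder_id):
            # '/' is legal in Drive titles but not in local file names.
            child = f"{path}/{item['title'].replace('/', '_')}"
            if item["mimeType"] == FOLDER_MIME:
                walk(item["id"], child)  # recurse into the subfolder
            else:
                url = DOWNLOAD_URL.format(item["id"])
                lines.append(f"wget -O {shlex.quote(child)} {shlex.quote(url)}")

    walk(root_id, local_root)
    return "\n".join(lines) + "\n"
```

With PyDrive, the pieces could be wired together as `gauth = GoogleAuth()`, `gauth.LocalWebserverAuth()`, `drive = GoogleDrive(gauth)`, and then `Path("script.sh").write_text(build_script(folder_id, "dataset", pydrive_lister(drive)))`, where `folder_id` and the local folder name `dataset` are placeholders. Because the generated script starts with `set -e`, it stops at the first failed download instead of silently skipping it.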

Step 3

Use scp to copy “script.sh” to the remote server:
`scp ~/path/script.sh username@ip:path`
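If you prefer to stay in Python for this step as well, the same copy can be issued through `subprocess`. This is a minimal sketch: `scp_upload` is a hypothetical helper, and the user, host, and remote path are placeholders. It builds the same `user@host:path` target as the command above and only runs scp when `dry_run` is false.

```python
import subprocess
from typing import List


def scp_upload(local_path: str, user: str, host: str, remote_path: str,
               dry_run: bool = False) -> List[str]:
    """Hypothetical helper: copy local_path to user@host:remote_path with scp."""
    # scp addresses the remote side as user@host:path, as in the command above.
    argv = ["scp", local_path, f"{user}@{host}:{remote_path}"]
    if not dry_run:
        # check=True raises CalledProcessError if scp exits with a non-zero status.
        subprocess.run(argv, check=True)
    return argv
```

Passing the arguments as a list, rather than one command string, avoids local shell quoting problems with paths that contain spaces.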

Step 4

  1. Log in to the remote server and navigate to the path. Then make the script executable:
    `chmod +x script.sh`
  2. Run the script and voila!
    `./script.sh`
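On a server where Python is available, the two commands above can also be combined in a small script. The sketch below uses a hypothetical helper, `make_executable_and_run`; it adds the execute bits for user, group, and others (the same effect as `chmod a+x`) and then runs the script with `subprocess.run`.

```python
import os
import stat
import subprocess


def make_executable_and_run(script_path: str, capture: bool = False) -> subprocess.CompletedProcess:
    """Hypothetical helper: set the execute bits on script_path, then run it."""
    mode = os.stat(script_path).st_mode
    # Add execute permission for user, group and others (like `chmod a+x`).
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Use an absolute path so the script is found regardless of the current directory;
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run([os.path.abspath(script_path)], check=True,
                          capture_output=capture, text=True)
```

`make_executable_and_run("script.sh")` lets wget print its progress to the terminal as usual; pass `capture=True` only for short scripts whose output you want to inspect afterwards.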

This creates the subfolders and downloads all the files under the parent folder you specified in the Python code.

I have used this folder to test my code: https://drive.google.com/drive/folders/1NIGvjHBuUQHWnMqzboyg-zLI1q_bOuCH?usp=sharing
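The Python code needs the ID of the parent folder rather than the full link. Here is a minimal sketch for pulling it out of a sharing link like the one above; the helper `folder_id_from_url` is hypothetical, and it assumes the `/folders/<ID>` path form, with an `open?id=<ID>` style link as a fallback.

```python
from urllib.parse import parse_qs, urlparse


def folder_id_from_url(url: str) -> str:
    """Hypothetical helper: extract the folder ID from a Google Drive sharing link."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # Folder links look like /drive/folders/<ID> (the ?usp=sharing query is ignored).
    if "folders" in parts:
        idx = parts.index("folders")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    # Assumed fallback for links that carry the ID as ?id=<ID>.
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"no folder ID found in {url!r}")
```

For the test folder above, it returns `1NIGvjHBuUQHWnMqzboyg-zLI1q_bOuCH`.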
