Working with large numbers of files stored on Google Drive can be challenging. In my case, I needed to embed images from Google Drive storage in a web page. The standard Google embedding was not useful, because it overlays controls and external links on top of the picture, while we needed just a plain image on an HTML page. So we decided to download those images and host them on our own web server.
The source storage contains multiple folders and subfolders, and the files were named by users, so their names do not follow any naming mask. The source database, built with Google Sheets, contains only file URLs, not the actual file names. Additionally, every file has to be renamed according to a mask.
There are different approaches to such tasks, for example a Python script, or a bash script combined with the Google Drive desktop app. But we will do it with Apache NiFi, which in my opinion is more manageable and makes the whole process truly automatic.
Let's start.
Prepare Google service account credentials in the Google Cloud console (https://console.cloud.google.com/):
- Create a project and give it a meaningful name.
- Enable the Google Drive API in the API Library.
- Create a service account in the "IAM & Admin" section of the console main menu. This creates a dedicated service account email address (it ends in .iam.gserviceaccount.com).
- In the same section, create a key for the new service account. It is downloaded to your computer as a .json file.
In Google Drive, share the required files with the new service account email address.
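If you are not sure which address to share the files with, it is stored in the downloaded key file itself, in the client_email field. A small Python sketch that reads it (the function name is my own, and the file path you pass in is whatever your key was saved as):

```python
import json
from pathlib import Path


def service_account_email(key_path: str) -> str:
    """Return the client_email field from a service account JSON key file."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    # Service account key files declare their type explicitly.
    if data.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key file")
    email = data.get("client_email")
    if not email:
        raise ValueError(f"{key_path} has no client_email field")
    return email
```

For example, service_account_email("key.json") returns the address to paste into the Google Drive sharing dialog.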
Prepare an Apache NiFi instance:
- Set up a server with at least 2 GB of RAM, 16 GB of storage and 1 or 2 CPUs.
- To run Apache NiFi on any OS, you can use Docker, for example the official image at https://hub.docker.com/r/apache/nifi. Alternatively, you can install and start NiFi directly by following the official instructions at https://nifi.apache.org/
- From here on, I assume you have an instance running and ready for the next steps. If not, feel free to ask in the comments.
Create a NiFi flow for downloading files:
- Add a GenerateFlowFile processor. In its Properties tab, click the "+" button and add a property named drive.id; it becomes an attribute of the generated FlowFile, and we will use it later.
- Add a FetchGoogleDrive processor. For its credentials property, choose to create a new GCPCredentialsControllerService; once the service appears in the property line, click the arrow button on the right.
- In the Controller Services window, click the settings button on the right, paste the contents of your Google Drive API key .json file into the Service Account JSON property and click "OK". Then enable the service.
- Close the pop-up windows to return to the NiFi canvas and add a PutFile processor. In its properties, set the Directory to the destination path where the files should be saved.
- Last but not least, add a Wait processor, just so that PutFile has a downstream connection where its results can be seen. Connect the processors with their success relationships and auto-terminate all other relationships in each processor's configuration. The resulting flow should look like this: GenerateFlowFile → FetchGoogleDrive → PutFile → Wait.
To test the flow, set the drive.id property of the GenerateFlowFile processor to the ID of the file you need. To get a Google file ID, select the file in your Google Drive account, click "Copy link" and paste the link into a text editor: the long string of random-looking characters in it is the ID.
Google file sharing URL example:
https://docs.google.com/presentation/d/1lG8sdaur8oK-dePztkd98NRlFzjSYOsnEG4frCuUQ7fdirYE/edit?usp=sharing
Google file ID example:
1lG8sdaur8oK-dePztkd98NRlFzjSYOsnEG4frCuUQ7fdirYE
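Since our spreadsheet holds only sharing URLs, pulling the ID out by hand does not scale. A minimal Python sketch, assuming the links follow the /d/<ID>/ pattern shown above or the older ?id=<ID> form; other link types (for example folder links) are not covered:

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Sharing links usually carry the ID as the path segment after "/d/",
# e.g. https://docs.google.com/presentation/d/<ID>/edit?usp=sharing
_PATH_ID = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_file_id(url: str) -> Optional[str]:
    """Return the Google Drive file ID from a sharing URL, or None if not found."""
    parsed = urlparse(url)
    match = _PATH_ID.search(parsed.path)
    if match:
        return match.group(1)
    # Older-style links such as https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None
```

Applied to the sharing URL above, it returns 1lG8sdaur8oK-dePztkd98NRlFzjSYOsnEG4frCuUQ7fdirYE, the value to put into drive.id.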
Run each processor in turn: right-click it and choose "Run Once". After PutFile has successfully done its job, navigate to the destination folder you configured and verify that the file is there.
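If the file does not show up, it can help to rule out credential or sharing problems outside NiFi. The sketch below is not part of the NiFi flow; it is a hypothetical Python equivalent of the FetchGoogleDrive and PutFile steps, assuming the google-api-python-client and google-auth packages are installed. It uses the Drive API v3 files().get and files().get_media calls, which work for binary files such as images; native Google Docs, Sheets or Slides files would need files().export_media instead.

```python
from pathlib import Path

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def build_drive_service(key_path: str):
    """Create a Drive v3 client authenticated as the service account."""
    # Imported lazily so download_file can be used without these libraries installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(key_path, scopes=SCOPES)
    return build("drive", "v3", credentials=creds)


def download_file(service, file_id: str, dest_dir: str) -> Path:
    """Download a Drive file by ID into dest_dir, keeping its original name."""
    meta = service.files().get(fileId=file_id, fields="name").execute()
    # For *_media methods, execute() returns the raw file bytes.
    content = service.files().get_media(fileId=file_id).execute()
    # Drop any directory parts from the user-supplied name before writing.
    target = Path(dest_dir) / Path(meta["name"]).name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```

A call like download_file(build_drive_service("key.json"), "<file ID>", "/tmp/drive-test") fails with an HTTP 403 or 404 error if the file has not been shared with the service account, which points at the sharing step rather than at the NiFi configuration.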
This workaround is a proof of concept showing that the task can be accomplished in minutes. Our next steps will be to extend this flow: fetching a large set of Google file IDs from the database, renaming each file before writing it to storage, writing the newly generated names back to our database, and so on.
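For the renaming step, the new names can be planned up front from the list of file IDs. A minimal Python sketch, assuming a made-up mask that produces img_0001.jpg, img_0002.png and so on (the real mask is not specified here), and assuming the original name is known, for example from the name field of the Drive file metadata:

```python
from pathlib import PurePath
from typing import Dict, Iterable, Tuple

# Hypothetical naming mask; replace it with the mask your project requires.
MASK = "{prefix}_{index:04d}{ext}"


def plan_renames(files: Iterable[Tuple[str, str]], prefix: str = "img") -> Dict[str, str]:
    """Map each Drive file ID to a new name built from MASK.

    `files` yields (file_id, original_name) pairs. The original extension is
    kept (lower-cased) so the web server can still infer the content type.
    """
    renames: Dict[str, str] = {}
    for index, (file_id, original_name) in enumerate(files, start=1):
        ext = PurePath(original_name).suffix.lower()
        renames[file_id] = MASK.format(prefix=prefix, index=index, ext=ext)
    return renames
```

The resulting ID-to-name map is also what could be written back to the spreadsheet. Inside NiFi, the same idea would typically be implemented by setting the filename attribute, which PutFile uses as the output file name, for example with an UpdateAttribute processor.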