API to access data over the internet

I want to download a dataset of around 50 million rows using GET APIs that have the following limits:
Each request can return a maximum of 1000 rows.
Maximum of 30 requests per minute per IP address.
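Given these limits, a rough lower bound on the total download time is the number of requests needed divided by the combined request rate. The sketch below assumes every request returns a full 1000-row page and that each host running the script has its own IP address; the function name is hypothetical.

```python
import math

def estimate_download_minutes(total_rows: int, rows_per_request: int,
                              requests_per_minute: int, num_ips: int) -> float:
    """Lower bound on wall-clock minutes, assuming full pages and one
    independent rate limit per IP address."""
    requests_needed = math.ceil(total_rows / rows_per_request)
    return requests_needed / (requests_per_minute * num_ips)

minutes = estimate_download_minutes(50_000_000, 1000, 30, 3)
print(f"{minutes:.0f} minutes = {minutes / 60:.1f} hours")  # prints "556 minutes = 9.3 hours"
```

With a single IP the same call gives about 1,667 minutes (roughly 27.8 hours), so each additional host with a distinct IP divides the time accordingly.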

My initial thought was to run a Python script on the compute nodes, but since they don't have internet access, that isn't possible. (If it could be made possible, it would reduce the total time considerably.)
I plan to run the download script on 2 transfer nodes and 1 login node, each making up to 30 requests per minute. At 90 requests (90,000 rows) per minute in total, the 50,000 requests needed take around 9 hours.
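To split the work across the 2 transfer nodes and 1 login node, each host can be given its own contiguous block of pages so that no page is requested twice. This is a minimal sketch assuming the API accepts a page (or offset) parameter; the host names and the page_ranges helper are hypothetical.

```python
import math

def page_ranges(total_rows: int, rows_per_page: int, num_hosts: int) -> list[range]:
    """Split page indices 0..N-1 into num_hosts contiguous, near-equal ranges."""
    num_pages = math.ceil(total_rows / rows_per_page)
    base, extra = divmod(num_pages, num_hosts)
    ranges, start = [], 0
    for i in range(num_hosts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first hosts
        ranges.append(range(start, start + size))
        start += size
    return ranges

hosts = ["transfer1", "transfer2", "login1"]  # hypothetical host names
for host, pages in zip(hosts, page_ranges(50_000_000, 1000, len(hosts))):
    print(host, pages.start, pages.stop - 1)  # first and last page for this host
```

This prints "transfer1 0 16666", "transfer2 16667 33333" and "login1 33334 49999"; each host then passes only its own range to the download loop.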

Is there a more efficient or faster approach?

@dpwani That's the approach we would recommend, along with running the script inside a tmux session in case you disconnect or log out. I'm not sure what your script looks like, but you could, for example, call sleep 60 within a for or while loop to stay within the request limit.
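A minimal sketch of such a loop, assuming a hypothetical fetch_page(page) function that performs one GET request and returns up to 1000 rows (for example via requests.get with whatever page or offset parameter the API actually uses). It sleeps 60 seconds after every 30 requests, writes each page to its own file, and skips pages already on disk, so if the script dies, rerunning it resumes instead of starting over.

```python
import json
import time
from pathlib import Path
from typing import Callable

REQUESTS_PER_MINUTE = 30  # per-IP limit from the question

def download_pages(pages: range, fetch_page: Callable[[int], list],
                   out_dir: Path, sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch each page in `pages`, writing one JSON file per page.

    Pages already on disk are skipped. Within one run, the loop sleeps
    60 seconds after every REQUESTS_PER_MINUTE requests.
    Returns the number of requests actually made.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    made = 0
    for page in pages:
        out_file = out_dir / f"page_{page:06d}.json"
        if out_file.exists():
            continue  # downloaded in an earlier run
        if made and made % REQUESTS_PER_MINUTE == 0:
            sleep(60)  # the "sleep 60" pause between batches
        rows = fetch_page(page)  # hypothetical: one GET returning <= 1000 rows
        tmp = out_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(rows))
        tmp.replace(out_file)  # rename so a crash never leaves a partial page file
        made += 1
    return made
```

Each host would call download_pages with its own page range inside a tmux session (e.g. tmux new -s download, then detach with Ctrl-b d). Since the request count restarts on each run, waiting a minute before restarting after a crash keeps the rate safely under the limit.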


Thank you, @dstrong!
Sure, I will run my script in a tmux session, and yes, I am already using sleep 60 in my script :slight_smile: