What is the fastest way to transfer data between directories and filesystems (e.g., /scratch to /scratch2)? I use WinSCP and FileZilla, but they do not allow me to transfer between these two filesystems (maybe I am doing something wrong; if so, please let me know). I also tried the command line (cp -r source destination), but the folders I have are huge, so it always times out. Any suggestions, or is there anything I am doing wrong? Much appreciated.
@habbaspo You could use the CARC OnDemand file manager instead. It would not be any faster than the command line though.
At the command line, there are a few things you could try. Copying between the file systems happens over a network. You could try compressing files first, such as by using xz -T4, or on the fly with rsync -z. If there is also a large number of files, you could put them in a compressed archive first using tar -cJ and then copy that single file. Both of these would take extra time though, so on net you might not gain any speedup for the transfer.
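As a rough illustration of the archive-then-copy idea, here is a minimal sketch. The directories demo_scratch and demo_scratch2 are hypothetical stand-ins for locations on /scratch and /scratch2, and the sample files are made up. It uses gzip (-z) instead of xz (-J) only so that it runs without xz installed; on the cluster you would likely swap in -J as described above.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins for directories on /scratch and /scratch2.
src=demo_scratch
dst=demo_scratch2
mkdir -p "$src/project" "$dst"

# Create a few small sample files to archive.
for i in 1 2 3 4 5; do
    echo "sample data $i" > "$src/project/file$i.txt"
done

# Bundle the whole directory into one compressed archive.
# -z selects gzip here (an assumption for portability); -J would select xz.
tar -cz -f "$src/project.tar.gz" -C "$src" project

# Copy the single archive instead of many small files.
cp "$src/project.tar.gz" "$dst/"

# Unpack at the destination and count the files that arrived.
tar -xz -f "$dst/project.tar.gz" -C "$dst"
ls "$dst/project" | wc -l
```

Whether this beats a plain copy depends on how long the archiving step takes relative to the transfer itself.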
Finally, for long-running transfers, I recommend using rsync -rltP within a tmux session, so that you can run it in the background and continue the transfer even if you log out or get disconnected. See a basic tmux guide here: https://www.carc.usc.edu/user-information/user-guides/software-and-programming/tmux
Using the hpc-transfer1 or hpc-transfer2 nodes might also help.
To speed up compression and archiving, you could use multiple cores with xz. For example, you could use a debug node with 16 cores and all memory. Then run something like this:
module load tar xz
tar -c -I 'xz -T16' -f files.tar.xz /path/to/dir
It depends on how big the files are though. You might need more memory than the debug nodes have. You could also increase the compression level.
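For the compression level, a minimal sketch along these lines might help. The demo_xz directory and its sample files are hypothetical, and the thread count of 2 is chosen only so the example runs on a small machine. It pipes tar into xz rather than using -I, which should behave the same way but does not depend on GNU tar.

```shell
#!/bin/sh
set -eu

# Hypothetical sample directory standing in for /path/to/dir.
mkdir -p demo_xz/dir
for i in 1 2 3; do
    seq 1 2000 > "demo_xz/dir/numbers$i.txt"
done

# Stream the archive through xz. -T2 uses two threads, and -9 raises the
# compression level above the default of 6, at the cost of more memory and time.
tar -c -f - -C demo_xz dir | xz -T2 -9 > demo_xz/files.tar.xz

# Check the archive's integrity, then list its contents without extracting.
xz -t demo_xz/files.tar.xz
xz -dc demo_xz/files.tar.xz | tar -t -f -
```

Higher levels combined with many threads increase memory use noticeably, which is where the memory limit of a debug node could come into play.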
Thank you so much Derek, I appreciate your help, this is super useful. Just a quick check:
I cannot use tmux on hpc-transfer1 or hpc-transfer2 with module load tmux. Is there another way to load it on those nodes, or is it not available there? Thanks.
Oh, right, on the transfer nodes it would be module load usc tmux.
thank you! I will use that!
Does running transfers in parallel slow down the process? For example, if I transfer some files with transfer1 and some other files with transfer2, is this slower than each of those happening at different times?
I don’t think so, in general. The metadata load from having to read/write 100s or 1000s of files at the same time can be a bottleneck. Archiving them into a single tar file first reduces that issue, which will then speed up the transfer.