Several tools can transfer data from a source to a destination. The rsync utility efficiently syncs a source directory to a destination directory (local or remote) and uses checksums to verify the integrity of transferred files. The fpsync utility runs multiple rsync processes in parallel to speed up transfers. Both can be used to copy data efficiently from one CARC storage system to another. Examples are given below.
First, remove any source files that do not need to be transferred.
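As a rough aid for this cleanup step, the sketch below summarizes a directory so that large or unneeded files are easier to spot. The function name summarize_source is hypothetical and not part of any CARC tooling; it only reads the directory and deletes nothing.

```shell
#!/bin/bash
# Hypothetical helper: summarize a directory before transferring it.
# It only reads the directory; nothing is modified or deleted.
summarize_source() {
    local src="$1"
    # Total size of the directory tree
    du -sh "$src"
    # Number of regular files (file names containing newlines would be miscounted)
    echo "Files: $(find "$src" -type f | wc -l)"
    # Ten largest files, biggest first (sizes in KiB)
    find "$src" -type f -exec du -k {} + | sort -rn | head -n 10
}
```

For example, summarize_source /project/ttrojan_123 prints the total size, the file count, and the ten largest files. On a large project directory this scan can take a while.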
Then, submit a Slurm job like the following (for a data transfer from /project to /project2 using fpsync):
#!/bin/bash
#SBATCH --partition=main,epyc-64,oneweek
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=16G
#SBATCH --time=24:00:00
module purge
module load ver/2506
module load gcc/14.3.0
module load fpart/1.7.0
module load rsync/3.4.1
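# -t: directory for fpsync temporary and log files
# -o: options passed to each rsync process (-l copies symlinks as symlinks, -t preserves modification times)
# -n: number of concurrent rsync processes
fpsync -vvv -t ~/fpsync-$SLURM_JOB_ID -o "-lt" -n $SLURM_CPUS_PER_TASK /project/ttrojan_123/ /project2/ttrojan_123/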
This saves the transfer log files to ~/fpsync-$SLURM_JOB_ID in your home directory so that you can review them after the transfer completes. You can remove them once you are satisfied with the transfer.
If the transfer does not complete before the Slurm job times out, resubmit the job. Because rsync skips files that already exist at the destination with the same size and modification time, the new job effectively resumes where the previous one stopped.
With the rsync options used here (-lt, without -o or -g to preserve owner and group), all files in the destination directory will be owned by you and group ownership will change to the destination’s default (e.g., the project group ttrojan_123).
If you do not have permission to read certain files or directories in the source, they will not be transferred to the destination, and the log will contain a message indicating so. The owner of those files will need to change the file permissions or run their own transfer.
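To find such files before starting the transfer, something like the following sketch may help. The function name list_unreadable is hypothetical, and the sketch assumes GNU find, since the -readable test and -printf action are GNU extensions.

```shell
#!/bin/bash
# Hypothetical helper: list items under a source directory that the
# current user cannot read, together with each item's owner.
# Assumes GNU find (-readable and -printf are GNU extensions).
list_unreadable() {
    local src="$1"
    # -readable tests actual access for the current user.
    # Errors from directories that cannot be entered are discarded;
    # those directories are still listed as unreadable.
    find "$src" ! -readable -printf '%u\t%p\n' 2>/dev/null
}
```

For example, list_unreadable /project/ttrojan_123 prints one line per unreadable item, with the owner in the first column so that you know whom to contact.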
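rsync can only preserve hard links among files that are part of the same transfer, whereas fpsync splits the files across separate rsync processes. If you want to preserve hard links, then you will need to run an extra single-process rsync transfer like the following: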
#!/bin/bash
#SBATCH --partition=main,epyc-64,oneweek
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --time=24:00:00
module purge
module load ver/2506
module load gcc/14.3.0
module load rsync/3.4.1
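# -r: recurse into directories, -l: copy symlinks as symlinks, -H: preserve hard links
# -t: preserve modification times, -v: verbose output, -h: human-readable numbers
rsync -rlHtvh /project/ttrojan_123/ /project2/ttrojan_123/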
Finally, note that rsync (and fpsync) leave the source files untouched. If needed, you can delete the source files once you are satisfied with the transfer.
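One possible way to check the transfer before deleting anything is an rsync dry run, sketched below. The function name pending_transfers is hypothetical. The -n (--dry-run) and -i (--itemize-changes) options are standard rsync options, and the other options mirror the ones used in the transfers above.

```shell
#!/bin/bash
# Hypothetical helper: report items that rsync would still copy from
# the source to the destination, without changing anything.
pending_transfers() {
    local src="$1" dst="$2"
    # -n (--dry-run) makes no changes; -i (--itemize-changes) prints one
    # line per item that differs. Lines starting with ">f" are files that
    # would be transferred; lines starting with "c" are items (such as
    # directories or symlinks) that would be created or changed.
    # "|| true" keeps the exit status zero when grep finds nothing.
    rsync -rltn --itemize-changes "${src%/}/" "${dst%/}/" | grep -E '^(>f|c)' || true
}
```

For example, pending_transfers /project/ttrojan_123 /project2/ttrojan_123 printing nothing suggests that no files remain to be copied. By default this comparison relies on file size and modification time only; adding -c (--checksum) to the rsync options compares file contents instead, at the cost of reading every file on both sides.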