The CARC’s new project file system, /project, is now ready for you to migrate your data from /home/rcf-proj. The deadline for migration is October 11 at midnight. After this deadline, /home/rcf-proj will be taken offline.
Note: These instructions can be used for migrating your data from /home/rcf-proj, /scratch, and /scratch2, but the examples used are specific to migrating from /home/rcf-proj. To migrate from /scratch or /scratch2: Use either hpc-transfer1 or hpc-transfer2, not hpc-transfer, in step 1. Then simply substitute the source path to your data in /scratch or /scratch2 in step 4.
How should I perform the data migration?
The hpc-transfer node has dedicated 40 GB/s links to both the old /home/rcf-proj and the new /project file systems, so data migration in and out is much faster than using other login nodes.
Because the data transfer could take a long time, we’ll use screen in combination with rsync to do the migration so that your progress doesn’t get interrupted if your SSH connection gets dropped.
We recommend having a designated person in your group who can lead this data migration effort. That way, you will avoid any confusion with respect to file permission issues or creating duplicated copies. Communication between research groups and the CARC support team will also be much clearer if you experience any trouble during the process.
Step 1: Log in to the transfer node
From your computer, log in to hpc-transfer.usc.edu and authenticate via Duo:
ssh ttrojan@hpc-transfer.usc.edu
Be sure to substitute ttrojan with your username, which is your USC NetID (the first part of your USC email address).
Note: If you get an SSH error about “remote host identification has changed” when attempting to connect, remove the outdated entry from the “known_hosts” file that is referenced in the error message. Open the “known_hosts” file in the .ssh directory under your home directory (note the “.” in front of the directory name), manually delete the line beginning with “hpc-transfer.usc.edu”, and save the file. Then try the login command again and confirm the authenticity of the new host key; the error should no longer occur.
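As an alternative to editing the file by hand, the stale entry can usually be removed with OpenSSH's ssh-keygen -R. The sketch below wraps it in a small helper; the name forget_host_key is hypothetical, and the default path assumes your known_hosts file is in the standard location.

```shell
# Remove every stored key for a host from a known_hosts file.
# ssh-keygen -R also handles hashed host names and keeps a backup
# of the previous file as <known_hosts>.old.
# forget_host_key is a hypothetical helper name, not a CARC tool.
forget_host_key() {
    host="$1"
    known_hosts="${2:-$HOME/.ssh/known_hosts}"   # assumed default location
    ssh-keygen -R "$host" -f "$known_hosts"
}

# Example call (run on your own computer, not on the cluster):
#   forget_host_key hpc-transfer.usc.edu
```

After the entry is removed, the next login prompts you to confirm the new host key, as described above.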
Step 2: Start a Screen session
Enter screen at the command line to start a Screen session. This will allow the migration to continue if your SSH connection is dropped or you want to log out.
Step 3: Identify data to be migrated
Identify and document unwanted or re-generatable/re-downloadable data in your /home/rcf-proj directory, especially folders containing large numbers of small files (e.g., custom Python/R/Anaconda installations). Transferring many small files will slow down the migration substantially. Specific files and directories can be excluded from the transfer (see Step 4).
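One possible way to spot folders with many small files is to count files per top-level subdirectory. This is only a sketch that assumes the standard find, du, and sort utilities are available; summarize_subdirs is a hypothetical helper name, and the path in the usage comment is the placeholder path used elsewhere in this guide.

```shell
# Print "<file count> <size in KB> <path>" (tab-separated) for each
# top-level subdirectory of the given directory, largest file count first.
# Note: the * glob skips hidden directories such as .conda.
summarize_subdirs() {
    src="$1"
    for d in "$src"/*/; do
        [ -d "$d" ] || continue                          # no subdirectories at all
        count=$(find "$d" -type f | wc -l | tr -d ' ')   # number of regular files
        size_kb=$(du -sk "$d" | cut -f1)                 # total size in KB
        printf '%s\t%s\t%s\n' "$count" "$size_kb" "${d%/}"
    done | sort -rn
}

# Example call (placeholder path):
#   summarize_subdirs /home/rcf-proj/xxx/yyy
```

Directories near the top of the list with a high file count but a small total size are good candidates for exclusion or cleanup before the transfer.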
Use the myquota command to find the path to the project directory you belong to. It will be of the form:
/project/<PI_name>_xxx
where PI_name is the username of the project owner and xxx is a 2 or 3 digit project ID. You can also find the project ID and path on the project page in the User Portal.
If you belong to the project <PI_name>_xxx, you will be able to create your own subdirectory under it.
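Creating the subdirectory might look like the sketch below. The helper name make_project_subdir is hypothetical, and naming the subdirectory after your username is an assumption rather than a CARC requirement; substitute the project path reported by myquota.

```shell
# Create a personal subdirectory under a project directory and print its path.
# make_project_subdir is a hypothetical helper name, not a CARC tool.
make_project_subdir() {
    project_dir="$1"                  # e.g. the path reported by myquota
    user_name="${2:-$(id -un)}"       # defaults to the current username
    mkdir -p "$project_dir/$user_name" && printf '%s\n' "$project_dir/$user_name"
}

# Example call (placeholder project path):
#   make_project_subdir /project/<PI_name>_xxx
```

The printed path can then be used as the destination in the rsync command.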
Step 4: Start migration using rsync
Enter an rsync command that looks something like the following:
rsync -rltvP /home/rcf-proj/xxx/yyy/ /project/aaa/bbb/
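Note: Using the rsync -a option will cause a group ID mismatch between the two systems and lead to incorrect quota information. The -a option implies -g and -o, which preserve the group and owner IDs from the source, whereas -rlt does not.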
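Here, /home/rcf-proj will be your source and /project will be your destination. Be sure to substitute your correct /home/rcf-proj and /project directory paths, and pay attention to the trailing / in the paths: with a trailing / on the source, rsync copies the contents of that directory rather than the directory itself. Entering the command will start the transfer and display its progress.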
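The effect of the trailing / on the source path can be seen in a small, throwaway example. The sketch below uses only temporary directories (the names src, with_slash, and without_slash are illustrative) and assumes rsync is installed where you run it.

```shell
# Demonstrate the trailing-slash rule in a throwaway temp directory.
demo=$(mktemp -d)
mkdir -p "$demo/src/data" "$demo/with_slash" "$demo/without_slash"
echo "example" > "$demo/src/data/file.txt"

# Trailing / on the source: copies the contents of src/ into the destination.
rsync -rlt "$demo/src/" "$demo/with_slash/"

# No trailing / on the source: copies the src directory itself into the destination.
rsync -rlt "$demo/src" "$demo/without_slash/"

ls "$demo/with_slash"      # data
ls "$demo/without_slash"   # src
```

With the trailing / on the source, as in the migration command, your files land directly in the destination directory instead of inside an extra nested directory.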
If there are specific files or directories that you do not need to transfer, add the --exclude option and specify the files or directories with a relative path based on the source directory. For example, if you want to exclude /home/rcf-proj/xxx/yyy/Anaconda and /home/rcf-proj/xxx/yyy/R, then use:
rsync -rltvP --exclude 'Anaconda' --exclude 'R' /home/rcf-proj/xxx/yyy/ /project/aaa/bbb/
Information on the exclude option: https://linuxize.com/post/how-to-exclude-files-and-directories-with-rsync/
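To see how the exclude patterns behave before running a long transfer, they can be tried on a small test tree. This sketch uses a temporary directory with illustrative names and assumes rsync is installed where you run it.

```shell
# Demonstrate --exclude in a throwaway temp directory (paths are illustrative).
work=$(mktemp -d)
mkdir -p "$work/src/Anaconda" "$work/src/R" "$work/src/data" "$work/dst"
echo "pkg"     > "$work/src/Anaconda/pkg.txt"
echo "lib"     > "$work/src/R/lib.txt"
echo "results" > "$work/src/data/results.txt"

# Exclude patterns are interpreted relative to the source directory (src/ here).
rsync -rlt --exclude 'Anaconda' --exclude 'R' "$work/src/" "$work/dst/"

ls "$work/dst"   # lists only: data
```

A pattern without a leading / matches at any depth, so --exclude 'R' also skips nested directories named R; writing --exclude '/R' limits the match to the top of the source directory. Adding -n (--dry-run) to the real migration command lists what would be transferred without copying anything.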
Step 5: Detach from the Screen session
Press Ctrl-a and then d to detach from the Screen session, which will let the rsync job keep running in the background. You can then continue other work or log out, and the transfer will continue on hpc-transfer.
Step 6: Check on the migration process
To reattach your Screen session and check the migration progress, enter:
screen -r
Step 7: Migration is done - close Screen session
Once your data migration is complete, you can close the Screen session by entering exit from within the session. Your data has now been migrated to the new project file system (/project).