Migrating your data from rcf-proj to the new project file system

The CARC’s new project file system, /project, is now ready for you to migrate your data from /home/rcf-proj. The deadline for migration is October 11 at midnight. After this deadline, /home/rcf-proj will be taken offline.

Note: These instructions can be used for migrating your data from /home/rcf-proj, /scratch, and /scratch2, but the examples used are specific to migrating from /home/rcf-proj. To migrate from /scratch or /scratch2: Use either hpc-transfer1 or hpc-transfer2, not hpc-transfer, in step 1. Then simply substitute the source path to your data in /scratch or /scratch2 in step 4.

How should I perform the data migration?

The hpc-transfer node has dedicated 40 GB/s links to both the old /home/rcf-proj and the new /project file systems, so migrating data through it is much faster than through the other login nodes.

Because the data transfer could take a long time, we’ll use screen in combination with rsync to do the migration so that your progress doesn’t get interrupted if your SSH connection gets dropped.

We recommend designating one person in your group to lead this data migration effort. That way, you will avoid confusion over file permissions and avoid creating duplicate copies. Communication between your research group and the CARC support team will also be much clearer if you run into trouble during the process.

Step 1: Log in to the transfer node

From your computer, log in to hpc-transfer.usc.edu and authenticate via Duo:

ssh ttrojan@hpc-transfer.usc.edu

Substitute ttrojan with your username, which is your USC NetID (the first part of your USC email address).

Note: If you get an SSH error about “remote host identification has changed” when attempting to connect, remove the outdated entry from the “known_hosts” file referenced in the error message. Open the “known_hosts” file in the .ssh directory under your home directory (note the “.” in front of the directory name), delete the line beginning with “hpc-transfer.usc.edu”, and save the file. Then try the login command again and confirm the authenticity of the new host key; the error should no longer occur.
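As an alternative to editing the file by hand, OpenSSH's ssh-keygen can remove the outdated entry for you. The following is a minimal sketch, assuming the default known_hosts location; the helper name forget_host_key is hypothetical.

```shell
# Hypothetical helper: delete every known_hosts entry for one host.
# ssh-keygen -R edits ~/.ssh/known_hosts in place and keeps the
# previous version as ~/.ssh/known_hosts.old.
forget_host_key() {
    ssh-keygen -R "$1"
}

# Usage (on the computer you connect from):
#   forget_host_key hpc-transfer.usc.edu
```

After this, the next ssh login should ask you to confirm the new host key once.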

Step 2: Start a Screen session

Enter screen at the command line to start a screen session. This will allow the migration to continue if your SSH connection is dropped or you want to log out.

Information on screen: https://linuxize.com/post/how-to-use-linux-screen/

Step 3: Identify data to be migrated

Identify and document unwanted or re-generatable/re-downloadable data in your /home/rcf-proj directory, especially folders containing large numbers of small files (e.g., custom Python/R/Anaconda installations). Large numbers of small files will slow down the migration substantially. Specific files and directories can be excluded from the transfer (see below).
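To spot folders with many small files, one option is to count the files under each top-level subdirectory. This is only a sketch using standard find, wc, and sort; the helper name count_files_per_dir is hypothetical, and the count will be off for file names that contain newlines.

```shell
# Hypothetical helper: print "<file count><TAB><directory>" for each
# immediate subdirectory of the given path, largest count first.
count_files_per_dir() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r dir; do
        n=$(find "$dir" -type f | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "$dir"
    done | sort -rn
}

# Usage: count_files_per_dir /home/rcf-proj/xxx/yyy
```

Directories near the top of the output are the likeliest candidates for exclusion, provided their contents can be regenerated or downloaded again.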

Use the myquota command to find the path to the project directory you belong to. It will be of the form:

/project/<PI_name>_xxx

where PI_name is the username of the project owner and xxx is a two- or three-digit project ID. You can also find the project ID and path on the project page in the User Portal.

If you belong to the project <PI_name>_xxx, you will be able to create your own subdirectory under it:

mkdir /project/<PI_name>_xxx/username

Step 4: Start migration using rsync

Enter an rsync command that looks something like the following:

rsync -rltvP /home/rcf-proj/xxx/yyy/ /project/aaa/bbb/

Note: Do not use the rsync -a option. It is shorthand for -rlptgoD, which also preserves owner and group IDs; the resulting group ID mismatch between the two systems leads to incorrect quota information.

/home/rcf-proj will be your source and /project will be your destination. Be sure to substitute your correct /home/rcf-proj and /project directory paths, and pay attention to the trailing / in the paths: with a trailing / on the source, rsync copies the contents of that directory rather than the directory itself. The command starts the transfer and displays its progress.
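Before starting a long transfer, a dry run can confirm that the paths are what you intend. rsync's -n (--dry-run) option lists what would be transferred without copying anything. A minimal sketch; the helper name preview_migration is hypothetical.

```shell
# Hypothetical helper: preview a migration without copying anything.
preview_migration() {
    # A trailing "/" on the source makes rsync copy the directory's
    # contents rather than the directory itself; the destination is
    # written the same way for consistency.
    local src="${1%/}/"
    local dst="${2%/}/"
    # -n (--dry-run) lists what would be transferred but makes no changes
    rsync -rltvn "$src" "$dst"
}

# Usage: preview_migration /home/rcf-proj/xxx/yyy /project/aaa/bbb
```

If the listed paths look right, run the real command with -P in place of -n.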

If there are specific files or directories that you do not need to transfer, add the --exclude option and specify the files or directories with a relative path based on the source directory. For example, if you want to exclude /home/rcf-proj/xxx/yyy/Anaconda and /home/rcf-proj/xxx/yyy/R, then use:

rsync -rltvP --exclude 'Anaconda' --exclude 'R' /home/rcf-proj/xxx/yyy/ /project/aaa/bbb/
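Note that a pattern such as 'R' without a leading / matches a file or directory of that name at any depth of the source tree. For a longer list, or to limit matches to the top level, rsync's --exclude-from option reads patterns from a file. This is a sketch under assumptions: the file name rsync_excludes.txt and the helper name migrate_with_excludes are hypothetical.

```shell
# Hypothetical file name; one pattern per line. A leading "/" anchors a
# pattern to the top of the source directory, so "/R" skips only
# /home/rcf-proj/xxx/yyy/R and not a directory named R deeper in the tree.
cat > rsync_excludes.txt <<'EOF'
/Anaconda
/R
EOF

# Hypothetical helper: the usual options, with the exclude list applied.
migrate_with_excludes() {
    rsync -rltvP --exclude-from=rsync_excludes.txt "$1" "$2"
}

# Usage: migrate_with_excludes /home/rcf-proj/xxx/yyy/ /project/aaa/bbb/
```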

Information on rsync: https://linuxize.com/post/how-to-use-rsync-for-local-and-remote-data-transfer-and-synchronization/

Information on the exclude option: https://linuxize.com/post/how-to-exclude-files-and-directories-with-rsync/

Step 5: Detach from the Screen session

Press Ctrl-a and then d to detach from the Screen session, which lets the rsync job keep running in the background. You can then continue other work or log out, and the transfer will continue on hpc-transfer.

Step 6: Check on the migration process

To reattach your Screen session and check the migration progress, enter screen -r.

Step 7: Migration is done - close Screen session

Once your data migration is complete, you can close the Screen session by entering exit from within the session. Your data has now been migrated to the new project file system (/project).
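As a quick sanity check before relying on the new copy, you could compare the number of files on both sides. This is only a sketch: the helper names are hypothetical, the counts will differ if you excluded anything, and equal counts do not prove the contents are identical.

```shell
# Hypothetical helper: count regular files under a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: report the file counts of source and destination
# and return success only if they match.
compare_file_counts() {
    local src_n dst_n
    src_n=$(count_files "$1")
    dst_n=$(count_files "$2")
    echo "source: $src_n files, destination: $dst_n files"
    [ "$src_n" -eq "$dst_n" ]
}

# Usage: compare_file_counts /home/rcf-proj/xxx/yyy /project/aaa/bbb
```

Re-running the same rsync command is another check: if everything was transferred, it should finish quickly without listing new files.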



I did the migration as described above. But when I log in to HPC using hpc-login2.usc.edu, the /project directory is not there (whereas it is there when I log in using hpc-transfer.usc.edu).

Hi @hrayrhar, this is intentional. To make sure that people get good transfer rates, access to both the old /home/rcf-proj and the new /project was set up only on hpc-transfer.usc.edu.

Two ways to find the path to the project directory you belong to:

  1. Using the myquota command on the command line.
  2. On the User Portal homepage (hpcaccount.usc.edu).

I ran into rsync error code 23, which says that some files failed to migrate because permission was denied. How can I change the permissions of these files in rcf-proj?

Do I need to migrate my data under /home/rcf-proj2?

@kerenxu That sounds like you don’t have read permissions for those files. Can you check who owns those files/directories and what the permissions are? You can use ls -ld /path/to/file, for example. The owner can either transfer those files or change the permissions to allow you to transfer them, using the chmod or setfacl commands. You would need read permissions for files and read/execute permissions for directories.
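For the owner of the files, granting that access could look like the sketch below. It assumes the person doing the migration belongs to the same Unix group as the files; the helper name grant_group_read is hypothetical.

```shell
# Hypothetical helper, to be run by the owner of the files.
# g+r gives the group read access; the capital X adds execute (search)
# permission only to directories and to files that are already executable.
grant_group_read() {
    chmod -R g+rX "$1"
}

# Usage: grant_group_read /home/rcf-proj/xxx/yyy
#
# Alternative for a single user (here the example user ttrojan) via an ACL:
#   setfacl -R -m u:ttrojan:rX /home/rcf-proj/xxx/yyy
```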


@yli272 Yes, files under any of the rcf-proj directories should be migrated.


I migrated another user’s folder from /rcf-proj to the new /project/<PI_name> directory but it now has my user as the owner. How can I change the directory’s ownership back to the other user? I tried using chown but got an “Operation not permitted” even though I was the current owner.

I got the following error (please see attached screenshot)
How can I find the files not transferred? I migrated my entire rcf-proj/xxx folder so don’t know an efficient way of finding the missing/non-transferred files.
Thank you


It’s possible some files were not transferred due to a permissions issue. You can check by re-running rsync and directing the output to a log file. Then you can search through the log to see which files weren’t transferred and check permissions.

You can do something like this:

rsync -rltvP /home/rcf-proj/xxx/yyy/ /project/aaa/bbb/ &> rsync_log

That should create a file named rsync_log which you can look through and find which files caused the issue.
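Because the log also contains every transferred file name and progress line, filtering it helps. rsync prefixes its own error messages with "rsync:", so a search for that prefix is one way to pull them out. A minimal sketch; the helper name show_rsync_errors is hypothetical.

```shell
# Hypothetical helper: print only the error lines from an rsync log.
# Matches messages such as: rsync: ... failed: Permission denied (13)
show_rsync_errors() {
    grep 'rsync: ' "$1"
}

# Usage: show_rsync_errors rsync_log
# To narrow this to permission problems:
#   show_rsync_errors rsync_log | grep 'Permission denied'
```

Each matching line names the file or directory that could not be read, which you can then inspect with ls -ld.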


thank you, Cesar! i will do this


I have an HSDA account, and when I tried to transfer the files from our rcf-proj folder, it said permission denied. Am I supposed to transfer these files? I also have an rcf-40 account; will those files be okay?

HSDA data should stay on the login-pd system. Any data saved in your home directory, /home/rcf-XX/username will need to be copied over to your new home directory, /home1/username, assuming it is not HSDA data.

I have been able to create a new directory under the new project directory, but for some reason when running the rsync command I can’t seem to access anything in the /rcf-proj folder. It just keeps saying no such file or directory. I know I have the right directory, because I can access and see in the rcf-proj while on the head node.

Can anyone point me in the right direction?

Thanks!

@rsschell Could you share the command you are using? Does it transfer some files? This could be a permissions issue. Are you the owner of the files?

Yes I am the owner. I am a ‘user’ in the /home/rcf-proj/ime folder but the owner of /home/rcf-proj/ime/rsschell

This is the command:
rsync -rltvP /home/rcf-proj/ime/rsschell/ /project/ehrenrei_315/rsschell/

This is the output/error:
sending incremental file list
rsync: change_dir "/home/rcf-proj/ime/rsschell" failed: No such file or directory (2)

sent 20 bytes received 12 bytes 64.00 bytes/sec
total size is 0 speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]

But also even ls or trying to cd to the /home/rcf-proj/ime/rsschell/ folder on the transfer node produces this error:

cannot access /home/rcf-proj/ime/rsschell/: No such file or directory

Meanwhile, when I copy and paste the same commands on login2, I can list and cd into that folder.

We have extended the deadline to Sunday, October 11, before midnight. If you haven’t already done so, please complete your data migration this week or this weekend.

Okay, that all looks correct. Which transfer node are you on? This should work only from hpc-transfer.usc.edu (not hpc-transfer1/2).