I am trying to run LAMMPS and i-PI on an HPC cluster. When I open two terminals and run the two programs separately from the shell, they find each other and exchange data back and forth. However, the runs consume a lot of resources, so I submitted them as jobs using a Slurm script, but the two programs cannot find each other once they are running on compute nodes. Does anyone have any suggestions? Thanks in advance.
Here are the i-PI and LAMMPS input files and the Slurm script. The parts that set up the port and socket are highlighted.
When you say that you opened two terminals and ran LAMMPS and i-PI, it sounds like you ran them on the login node, which is why they were able to find each other. Is that correct?
If you want to run these programs as a Slurm job, you will have to decide whether both should run on the same node or on two separate nodes.
If a single compute node can support both programs running at the same time, that is the easiest option: start each program in the background, then have the job script wait until both processes have finished before it exits.
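The start-both-then-wait pattern can be sketched in Python as follows, assuming i-PI acts as the socket server and LAMMPS as the client. The command lines and the fixed startup delay are assumptions, not values from this thread. In a plain bash batch script the equivalent is launching each program with `&` and ending the script with `wait`.

```python
import subprocess
import time
from typing import List, Sequence


def run_pair(server_cmd: Sequence[str], client_cmd: Sequence[str],
             startup_delay: float = 5.0) -> List[int]:
    """Start the server (e.g. i-PI), give it time to open its socket,
    start the client (e.g. LAMMPS), then wait for both to exit.

    Returns the exit codes as [server_code, client_code].
    """
    # Launch the server in the background so this function keeps running.
    server = subprocess.Popen(list(server_cmd))
    # Assumption: a fixed delay is long enough for the server to create its
    # socket; a client that starts too early may fail to connect.
    time.sleep(startup_delay)
    client = subprocess.Popen(list(client_cmd))
    # Block until both processes finish, like `wait` in a batch script,
    # so the job does not end while either program is still running.
    client_code = client.wait()
    server_code = server.wait()
    return [server_code, client_code]
```

Inside the job you might call `run_pair(["i-pi", "input.xml"], ["lmp", "-in", "in.lammps"])`; these names are hypothetical, and the LAMMPS executable in particular is often named differently (for example `lmp_mpi`). Starting the server first and pausing briefly is a simple way to avoid a race in which the client looks for a socket that does not exist yet.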
If you want to run one program on one compute node and the other on a different compute node, you will need a way to tell each program which host its partner is running on.
From Slurm you can get the names of the compute nodes assigned to your job from the $SLURM_NODELIST environment variable. You can parse this list and assign one host to one program and the other host to the other program. This assumes each program can be configured with the host where its partner is listening.
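$SLURM_NODELIST uses Slurm's compressed hostlist notation (for example `node[01-02]`), and `scontrol show hostnames` is the standard way to expand it. If you prefer to do the parsing in Python, here is a minimal sketch that handles only simple lists (at most one bracket group per name), picks one host for i-PI and one for LAMMPS, and builds the matching socket lines. The port number, the ffsocket name `lammps`, and the fix ID `1` are hypothetical placeholders; the ffsocket name must match the forcefield referenced elsewhere in your i-PI input, and the cluster network must allow TCP connections between compute nodes.

```python
import re
from typing import Dict, List


def _split_top_level(nodelist: str) -> List[str]:
    """Split on commas that are not inside [...] range groups."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in nodelist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_nodelist(nodelist: str) -> List[str]:
    """Expand a simple hostlist such as 'node[01-03,07],gpu5'.

    Assumption: at most one [...] group per host name.
    """
    hosts: List[str] = []
    for item in _split_top_level(nodelist):
        match = re.fullmatch(r"([^\[]*)\[([^\]]*)\](.*)", item)
        if match is None:
            hosts.append(item)  # plain host name, no range group
            continue
        prefix, body, suffix = match.groups()
        for piece in body.split(","):
            if "-" in piece:
                lo, hi = piece.split("-")
                width = len(lo)  # keep zero padding, e.g. 01 -> 02
                for i in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{i:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{piece}{suffix}")
    return hosts


def assign_hosts(nodelist: str) -> Dict[str, str]:
    """Put i-PI on the first node and LAMMPS on the second."""
    hosts = expand_nodelist(nodelist)
    if len(hosts) < 2:
        raise ValueError("need at least two nodes to split the programs")
    return {"ipi": hosts[0], "lammps": hosts[1]}


def socket_settings(ipi_host: str, port: int = 31415) -> Dict[str, str]:
    """Both sides point at the host where i-PI (the server) listens."""
    return {
        "ipi_xml": (f'<ffsocket name="lammps" mode="inet">'
                    f"<address>{ipi_host}</address><port>{port}</port>"
                    f"</ffsocket>"),
        "lammps_fix": f"fix 1 all ipi {ipi_host} {port}",
    }
```

In the batch script you could read `os.environ["SLURM_NODELIST"]`, write the two generated lines into the inputs, and start each program on its assigned host, for example with `srun -w <host>`.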
From your job script, it looks like you are running both programs on the same node. Did the job produce an error message?
Thank you for your reply. Yes, I am running both on the same node, and I have figured out how to set up the Slurm script following your suggestion. Here is what I am using.