On the old cluster, users could manually ssh into any node they had been allocated (i.e., through sbatch or salloc). However, when I attempt this on the Discovery cluster (e.g., ssh b02-38), I get the message “Permission denied (publickey)”, even though I have a job actively running on that node. This used to be a useful way to check whether a submitted job had stalled (through “top”). Is there a way to do this, or are there plans to implement this feature? Thanks!
@bhcooper This is enabled, but perhaps there’s a configuration issue. Can you check if you have both a ~/.ssh/cluster file and ~/.ssh/cluster.pub file owned by you? And your ~/.ssh/config file should contain a line with IdentityFile ~/.ssh/cluster. There should also be an ~/.ssh/authorized_keys file with a Warewulf key included. Also try testing a different partition/node, such as the shared partition on Endeavour.
Thanks Derek! I was missing the Warewulf key in my authorized_keys, so I just copied it over from cluster.pub.
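For anyone hitting the same error, the checks suggested in the reply above can be scripted roughly as follows. This is only a sketch: check_cluster_ssh is a made-up helper name, not a tool provided on the cluster, and it assumes that the key in cluster.pub is the one authorized_keys needs to contain, which is what worked in this thread.

```shell
# Sketch only: check_cluster_ssh is a hypothetical helper, not a cluster tool.
# Pass the directory to inspect; it defaults to ~/.ssh.
check_cluster_ssh() {
    local dir="${1:-$HOME/.ssh}"
    local status=0
    local f key

    # The key pair should exist and be owned by the current user.
    for f in cluster cluster.pub; do
        if [ -f "$dir/$f" ] && [ -O "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "problem: $f is missing or not owned by you"
            status=1
        fi
    done

    # config should contain the line: IdentityFile ~/.ssh/cluster
    if grep -Eq '^[[:space:]]*IdentityFile[[:space:]]+~/\.ssh/cluster[[:space:]]*$' \
        "$dir/config" 2>/dev/null; then
        echo "ok: config"
    else
        echo "problem: config has no 'IdentityFile ~/.ssh/cluster' line"
        status=1
    fi

    # Assumption: authorized_keys should contain the key from cluster.pub.
    key=$(cat "$dir/cluster.pub" 2>/dev/null)
    if [ -n "$key" ] && grep -qF -- "$key" "$dir/authorized_keys" 2>/dev/null; then
        echo "ok: authorized_keys"
    else
        echo "problem: authorized_keys does not contain the key from cluster.pub"
        status=1
    fi

    return "$status"
}
```

Running check_cluster_ssh with no arguments inspects ~/.ssh and prints one line per check. If only the authorized_keys check fails, appending the key with cat ~/.ssh/cluster.pub >> ~/.ssh/authorized_keys is the fix described above.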