I am submitting an array job with task IDs 0-154 on the Discovery cluster.
I am noticing that a few log files are missing.
For example, the log files for tasks 0 and 1 are missing out of the 155 expected log files.
All of the array tasks are running; none of them are pending.
I have also noticed that certain tasks in the array get stuck indefinitely. These tasks typically take about 200 seconds, but some of them have been running for more than 20 minutes.
When I run the same tasks manually, they finish in the expected 200 seconds.
Is there something wrong with certain nodes?
JOBID       PARTITION  NAME      USER      ST  TIME   NODES  NODELIST(REASON)
10222509_0  epyc-64    filter.j  arunbaal  R   12:24  1      a03-19
10222509_1  epyc-64    filter.j  arunbaal  R   12:24  1      a03-19
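To check whether the slow tasks share particular nodes, Slurm's accounting records can report each task's elapsed time and node. The sketch below is a minimal example, assuming job accounting is enabled on the cluster; `slow_by_node` is a hypothetical helper (not part of Slurm) that reads `sacct --parsable2` lines and counts, per node, how many tasks have run longer than a given number of seconds.

```shell
#!/bin/sh
# Hypothetical helper: summarise slow array tasks per node.
# Reads lines in sacct --parsable2 form: JobID|State|Elapsed|NodeList
# e.g. produced by:
#   sacct -j 10222509 -X --noheader --parsable2 --format=JobID,State,Elapsed,NodeList
# Usage: slow_by_node <limit_in_seconds>
slow_by_node() {
  local limit=$1
  awk -F'|' -v limit="$limit" '
    # Convert sacct Elapsed values ([DD-][HH:]MM:SS) to seconds.
    function to_secs(t,    d, parts, n, days) {
      days = 0
      if (index(t, "-") > 0) {
        split(t, d, "-")
        days = d[1]
        t = d[2]
      }
      n = split(t, parts, ":")
      if (n == 3) return days * 86400 + parts[1] * 3600 + parts[2] * 60 + parts[3]
      if (n == 2) return days * 86400 + parts[1] * 60 + parts[2]
      return days * 86400 + parts[1]
    }
    # Skip blank or malformed lines; count every task and the slow ones per node.
    NF >= 4 {
      total[$4]++
      if (to_secs($3) > limit + 0) slow[$4]++
    }
    # Print "<node> <slow>/<total> slow" for each node (order not guaranteed).
    END {
      for (node in total) printf "%s %d/%d slow\n", node, slow[node] + 0, total[node]
    }'
}
```

For instance, `sacct -j 10222509 -X --noheader --parsable2 --format=JobID,State,Elapsed,NodeList | slow_by_node 200` would flag tasks running past the usual 200 seconds. If the slow tasks concentrate on one or two nodes, that would point toward those nodes rather than the job script.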
(base) [arunbaal@discovery2 scripts]$ ls slurm-10222509_*
slurm-10222509_100.out slurm-10222509_129.out slurm-10222509_17.out slurm-10222509_46.out slurm-10222509_74.out
slurm-10222509_101.out slurm-10222509_12.out slurm-10222509_18.out slurm-10222509_47.out slurm-10222509_75.out
slurm-10222509_102.out slurm-10222509_130.out slurm-10222509_19.out slurm-10222509_48.out slurm-10222509_76.out
slurm-10222509_103.out slurm-10222509_131.out slurm-10222509_20.out slurm-10222509_49.out slurm-10222509_77.out
slurm-10222509_104.out slurm-10222509_132.out slurm-10222509_21.out slurm-10222509_4.out slurm-10222509_78.out
slurm-10222509_105.out slurm-10222509_133.out slurm-10222509_22.out slurm-10222509_50.out slurm-10222509_79.out
slurm-10222509_106.out slurm-10222509_134.out slurm-10222509_23.out slurm-10222509_51.out slurm-10222509_7.out
slurm-10222509_107.out slurm-10222509_135.out slurm-10222509_24.out slurm-10222509_52.out slurm-10222509_80.out
slurm-10222509_108.out slurm-10222509_136.out slurm-10222509_25.out slurm-10222509_53.out slurm-10222509_81.out
slurm-10222509_109.out slurm-10222509_137.out slurm-10222509_26.out slurm-10222509_54.out slurm-10222509_82.out
slurm-10222509_10.out slurm-10222509_138.out slurm-10222509_27.out slurm-10222509_55.out slurm-10222509_83.out
slurm-10222509_110.out slurm-10222509_139.out slurm-10222509_28.out slurm-10222509_56.out slurm-10222509_84.out
slurm-10222509_111.out slurm-10222509_13.out slurm-10222509_29.out slurm-10222509_57.out slurm-10222509_85.out
slurm-10222509_112.out slurm-10222509_140.out slurm-10222509_2.out slurm-10222509_58.out slurm-10222509_86.out
slurm-10222509_113.out slurm-10222509_141.out slurm-10222509_30.out slurm-10222509_59.out slurm-10222509_87.out
slurm-10222509_114.out slurm-10222509_142.out slurm-10222509_31.out slurm-10222509_5.out slurm-10222509_88.out
slurm-10222509_115.out slurm-10222509_143.out slurm-10222509_32.out slurm-10222509_60.out slurm-10222509_89.out
slurm-10222509_116.out slurm-10222509_144.out slurm-10222509_33.out slurm-10222509_61.out slurm-10222509_8.out
slurm-10222509_117.out slurm-10222509_145.out slurm-10222509_34.out slurm-10222509_62.out slurm-10222509_90.out
slurm-10222509_118.out slurm-10222509_146.out slurm-10222509_35.out slurm-10222509_63.out slurm-10222509_91.out
slurm-10222509_119.out slurm-10222509_147.out slurm-10222509_36.out slurm-10222509_64.out slurm-10222509_92.out
slurm-10222509_11.out slurm-10222509_148.out slurm-10222509_37.out slurm-10222509_65.out slurm-10222509_93.out
slurm-10222509_120.out slurm-10222509_149.out slurm-10222509_38.out slurm-10222509_66.out slurm-10222509_94.out
slurm-10222509_121.out slurm-10222509_14.out slurm-10222509_39.out slurm-10222509_67.out slurm-10222509_95.out
slurm-10222509_122.out slurm-10222509_150.out slurm-10222509_3.out slurm-10222509_68.out slurm-10222509_96.out
slurm-10222509_123.out slurm-10222509_151.out slurm-10222509_40.out slurm-10222509_69.out slurm-10222509_97.out
slurm-10222509_124.out slurm-10222509_152.out slurm-10222509_41.out slurm-10222509_6.out slurm-10222509_98.out
slurm-10222509_125.out slurm-10222509_153.out slurm-10222509_42.out slurm-10222509_70.out slurm-10222509_99.out
slurm-10222509_126.out slurm-10222509_154.out slurm-10222509_43.out slurm-10222509_71.out slurm-10222509_9.out
slurm-10222509_127.out slurm-10222509_15.out slurm-10222509_44.out slurm-10222509_72.out
slurm-10222509_128.out slurm-10222509_16.out slurm-10222509_45.out slurm-10222509_73.out
(base) [arunbaal@discovery2 scripts]$ ls slurm-10222509_* | grep slurm_1022509
(base) [arunbaal@discovery2 scripts]$ ls slurm-10222509_* | grep "0.out"
(base) [arunbaal@discovery2 scripts]$ ls slurm-10222509* | grep "10.out"
slurm-10222509_10.out
(base) [arunbaal@discovery2 scripts]$ ls slurm-10222509* | grep "0.out"
(base) [arunbaal@discovery2 scripts]$ ls slurm-10222509* | grep "_1.out"
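Grepping the `ls` output for a suffix such as `0.out` is ambiguous, because the pattern also matches `10.out`, `100.out`, and so on. A minimal sketch that instead tests each expected file name directly, assuming Slurm's default array output name `slurm-<jobid>_<taskid>.out` and that the logs sit in the current directory (`missing_tasks` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: list array task IDs in [first, last] with no log file.
# Assumption: default Slurm array output name slurm-<jobid>_<taskid>.out,
# with the logs in the current directory.
# Usage: missing_tasks <jobid> <first> <last>
missing_tasks() {
  local jobid=$1 i=$2 last=$3
  while [ "$i" -le "$last" ]; do
    # Test the exact file name, so task 0 is never confused with 10 or 100.
    if [ ! -e "slurm-${jobid}_${i}.out" ]; then
      echo "$i"
    fi
    i=$((i + 1))
  done
}
```

Running `missing_tasks 10222509 0 154` in the `scripts` directory prints the ID of every task whose log file does not exist yet, one per line.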