Some of my jobs cannot read the /project file system. The problem does not occur at the start of training; instead, it happens at random points during training. Here is the error message.
This problem has happened many times since the last maintenance. I also submitted a ticket two weeks ago. Although that ticket was closed, I am still facing the same problem.
Most of my jobs need to run for over 20 hours. If they fail at random, I have to wake up at midnight to resume them, which is quite painful.