Job Manager is not responding
The job manager, SLURM, is not responding to requests to start new jobs. When a job request is submitted, SLURM responds with a job number, but the job never starts or shows up.
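One way to confirm the symptom is to capture the job number that SLURM returns and ask the queue about it. The sketch below is only an illustration: `job_id_from_sbatch` is a hypothetical helper name, `job.sh` is a placeholder script, and it assumes `sbatch` prints its usual "Submitted batch job <id>" line.

```shell
# Hypothetical helper: read sbatch output on stdin and print the job ID
# from the first "Submitted batch job <id>" line.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4; exit }'
}

# Usage on a login node (job.sh is a placeholder, not from this page):
#   id=$(sbatch job.sh | job_id_from_sbatch)
#   squeue -j "$id" -h -o "%T %r"   # job state and reason, if the job is known
```

If `squeue` reports nothing for a job number that was just issued, that matches the behaviour described above.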
1. Check logs
2. Check /etc/hosts to see if the host name from the logs is in there
Note whether the nodes cannot be seen in the file. If the cluster is hosted on an Azure VMSS, this can happen when the VMSS nodes restart and do not come back with the same name.
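The lookup in step 2 can be scripted. This is a minimal sketch, not the procedure this page originally showed: `host_in_hosts` is a hypothetical helper, and the node name passed to it is whatever host name appears in your logs.

```shell
# Hypothetical helper: succeed if NAME is listed as a host name or alias
# in a hosts-format FILE (defaults to /etc/hosts).
host_in_hosts() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }                                # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # field 1 is the IP
        END { exit !found }
    ' "$file"
}

# Usage (node001 is a placeholder name):
#   host_in_hosts node001 && echo "present" || echo "missing"
```

Matching whole fields rather than using a plain substring search avoids false hits, for example `node1` matching `node10`.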
3. Check nodes with:
Should return:
4. Check if the slurmctld service is up and running
5. If the service is not running properly, restart it, or stop it and start it again
6. Check that the service is running and that the servers are up and processing jobs
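Steps 3 to 6 can be sketched as a pair of shell helpers. The exact commands this page originally showed are not preserved, so treat the following as one possible approach: it assumes slurmctld runs as a systemd unit named `slurmctld`, that `sudo` is available, and that node states come from `sinfo -N -h -o "%N %T"`. The function names are hypothetical.

```shell
# Hypothetical helper: read "NODE STATE" pairs on stdin and print each node
# whose state suggests it cannot run jobs (down, drain*, fail*, unknown,
# or any state flagged with a trailing "*", meaning not responding).
unhealthy_nodes() {
    awk '{ s = tolower($2) }
         s ~ /^(down|drain|fail|unknown)/ || s ~ /\*$/ { print $1 }' | sort -u
}

# Hypothetical helper: restart slurmctld only if systemd reports it inactive,
# then ask the controller whether it responds.
ensure_slurmctld() {
    if ! systemctl is-active --quiet slurmctld; then
        sudo systemctl restart slurmctld
    fi
    scontrol ping
}

# Usage on the controller host:
#   sinfo -N -h -o "%N %T" | unhealthy_nodes   # step 3
#   ensure_slurmctld                           # steps 4 and 5
#   squeue                                     # step 6: jobs should be moving
```

An empty result from `unhealthy_nodes` together with a responding controller suggests the cluster is ready to process jobs again.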