Allocation in Epyc-64 partition is broken

I am trying to get a single job allocated in the epyc-64 partition, but I cannot because individual users are running many jobs at once.
A single user is running more than 31 jobs in the same partition.
There seem to be no per-user limits in that partition.
It is becoming extremely hard to run my experiments since most resources are consumed by a few users.
The queue is also full of these multi-job requests from single users.
squeue output:

          10006632   epyc-64 sh_s1_c0 rtjohnso  R    5:59:27      1 b22-23
          10006633   epyc-64 sh_s2_c6 rtjohnso  R    5:59:27      1 b22-24
          10006634   epyc-64 sh_s5_c0 rtjohnso  R    5:59:27      1 b22-25
          10006636   epyc-64 sh_s5_c2 rtjohnso  R    5:59:27      1 a01-03
          10006637   epyc-64 sh_s2_c4 rtjohnso  R    5:59:27      1 a01-04
          10006639   epyc-64 sh_s5_c4 rtjohnso  R    5:59:27      1 b22-18
          10006640   epyc-64 sh_s1_c4 rtjohnso  R    5:59:27      1 b22-19
          10006641   epyc-64 sh_s2_c4 rtjohnso  R    5:59:27      1 b22-20
          10006642   epyc-64 sh_s4_c6 rtjohnso  R    5:59:27      1 b22-21
          10006643   epyc-64 sh_s2_c6 rtjohnso  R    5:59:27      1 b22-01
          10006644   epyc-64 sh_s5_c0 rtjohnso  R    5:59:27      1 b22-02
          10006646   epyc-64 sh_s2_c2 rtjohnso  R    5:59:27      1 b22-04
          10006647   epyc-64 sh_s1_c4 rtjohnso  R    5:59:27      1 b22-05
          10006618   epyc-64 sh_s5_c6 rtjohnso  R    5:59:30      1 a02-12
          10006619   epyc-64 sh_s1_c6 rtjohnso  R    5:59:30      1 a02-14
          10006622   epyc-64 sh_s2_c0 rtjohnso  R    5:59:30      1 a04-04
          10006623   epyc-64 sh_s1_c2 rtjohnso  R    5:59:30      1 b01-04
          10006624   epyc-64 sh_s5_c0 rtjohnso  R    5:59:30      1 b22-14
          10006625   epyc-64 sh_s2_c4 rtjohnso  R    5:59:30      1 a01-11
          10006627   epyc-64 sh_s1_c4 rtjohnso  R    5:59:30      1 a04-08
          10006628   epyc-64 sh_s5_c4 rtjohnso  R    5:59:30      1 a04-09
          10006629   epyc-64 sh_s2_c2 rtjohnso  R    5:59:30      1 a03-07
          10006630   epyc-64 sh_s1_c6 rtjohnso  R    5:59:30      1 a03-08
          10006631   epyc-64 sh_s5_c6 rtjohnso  R    5:59:30      1 a03-09
          10006650   epyc-64 sh_s1_c0 rtjohnso  R    5:15:30      1 a01-12
          10006652   epyc-64 sh_s2_c0 rtjohnso  R    5:02:05      1 b22-06
          10006651   epyc-64 sh_s5_c2 rtjohnso  R    5:05:31      1 a01-05
          10006653   epyc-64 sh_s1_c2 rtjohnso  R    4:20:10      1 a01-02
          10006654   epyc-64 sh_s2_c2 rtjohnso  R    3:36:33      1 a01-07
          10007204   epyc-64 2_concoc cykojima  R    1:54:31      1 a01-08
          10006655   epyc-64 sh_s5_c6 rtjohnso  R    3:24:18      1 a01-09
          10006656   epyc-64 sh_s1_c6 rtjohnso  R    3:16:39      1 a02-07
          10006386   epyc-64 interact  wentaoy  R    7:50:56      1 a02-17
          10007345   epyc-64 MMI_LMP. ziyuhuan  R       5:11      2 a04-07,b22-03
          10007344   epyc-64 MMI_LMP. ziyuhuan  R      17:17      3 a02-[17,19],a04-02
          10007343   epyc-64 MMI_LMP. ziyuhuan  R      17:27      2 a02-[09,19]
          10007339   epyc-64 MMI_LMP. ziyuhuan  R      23:19      3 b22-[07-09]
          10007340   epyc-64 MMI_LMP. ziyuhuan  R      23:19      2 b22-[09-10]
          10007341   epyc-64 MMI_LMP. ziyuhuan  R      23:19      3 b22-[10-12]
          10007342   epyc-64 MMI_LMP. ziyuhuan  R      23:19      5 a01-08,a03-18,b01-08,b22-[12,22]
          10007338   epyc-64 MMI_LMP. ziyuhuan  R      30:22      3 a01-14,a02-08,a03-18
           9984338   epyc-64      QC1 shafikov  R   23:18:08      1 b01-07
          10003133   epyc-64      QC1 shafikov  R    9:32:27      1 b01-02
          10003125   epyc-64      QC1 shafikov  R    9:36:30      1 a03-19
          10006247   epyc-64   run.sl  cuichen  R    8:56:50     17 a02-[02-05],a03-[02-05,11-14],a04-[16-19],b22-15
          10006825   epyc-64     G_16 homingla  R    4:05:19      1 a03-18
          10006826   epyc-64     G_17 homingla  R    4:05:19      1 b01-08
          10006823   epyc-64     G_14 homingla  R    4:13:50      1 a02-08
          10006824   epyc-64     G_15 homingla  R    4:13:50      1 a02-08
          10006818   epyc-64      G_9 homingla  R    4:16:59      1 a01-14
          10006819   epyc-64     G_10 homingla  R    4:16:59      1 a01-13
          10006820   epyc-64     G_11 homingla  R    4:16:59      1 a01-13
          10006821   epyc-64     G_12 homingla  R    4:16:59      1 a01-13
          10006822   epyc-64     G_13 homingla  R    4:16:59      1 a01-13
          10006809   epyc-64      G_0 homingla  R    4:30:50      1 a03-18
          10006810   epyc-64      G_1 homingla  R    4:30:50      1 a02-17
          10006811   epyc-64      G_2 homingla  R    4:30:50      1 a01-08
          10006812   epyc-64      G_3 homingla  R    4:30:50      1 b01-08
          10006813   epyc-64      G_4 homingla  R    4:30:50      1 b01-08
          10006814   epyc-64      G_5 homingla  R    4:30:50      1 a02-13
          10006815   epyc-64      G_6 homingla  R    4:30:50      1 a02-13
          10006816   epyc-64      G_7 homingla  R    4:30:50      1 a02-13
          10006817   epyc-64      G_8 homingla  R    4:30:50      1 a02-13
           9973796   epyc-64 15h-cd-f  sgumber  R 1-16:59:58      8 a01-[18-19],a03-[16-17],b22-[29-32]
          10006544   epyc-64   d62222    tjani  R    7:03:54      1 b22-22
          10006314   epyc-64 TCGA_chr  mpostel  R    7:20:51      1 a04-03
          10006313   epyc-64 TCGA_chr  mpostel  R    8:11:18      1 b22-28
          10006309   epyc-64 TCGA_chr  mpostel  R    8:11:51      1 b22-27
          10006308   epyc-64 TCGA_chr  mpostel  R    8:12:19      1 b22-26
          10006307   epyc-64 TCGA_chr  mpostel  R    8:12:51      1 b22-16
           9976310   epyc-64      100  souravd  R 1-06:50:32      1 a02-11
           9976306   epyc-64     1000  souravd  R 1-06:51:33      1 a02-16
           9976301   epyc-64      800  souravd  R 1-06:53:04      1 a01-17
           9976294   epyc-64      700  souravd  R 1-06:54:05      1 a01-16
           9976292   epyc-64      600  souravd  R 1-06:54:35      1 b22-17
           9976289   epyc-64      500  souravd  R 1-06:56:06      1 b22-13
           9976285   epyc-64      400  souravd  R 1-06:57:37      1 b01-03
           9976281   epyc-64      300  souravd  R 1-06:58:37      1 a04-05
           9976279   epyc-64      200  souravd  R 1-06:59:38      1 a02-18
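
For anyone who wants to check the per-user job counts in a listing like the one above, here is a minimal sketch in Python. It assumes the default squeue column order (JOBID, PARTITION, NAME, USER, ST, TIME, NODES, NODELIST) and job names without spaces. The function name `jobs_per_user` and the three-line sample are illustrative only and are not part of any Slurm tooling.

```python
from collections import Counter


def jobs_per_user(squeue_text):
    """Count jobs per user from default-format squeue output (assumed layout)."""
    counts = Counter()
    for line in squeue_text.splitlines():
        fields = line.split()
        # Assumed columns: JOBID PARTITION NAME USER ST TIME NODES NODELIST.
        # Skip blank lines and the header row, whose first field is not numeric.
        if len(fields) < 8 or not fields[0][:1].isdigit():
            continue
        counts[fields[3]] += 1  # fields[3] is the USER column
    return counts


# A few rows copied from the listing above, used only as sample input.
sample = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          10006632   epyc-64 sh_s1_c0 rtjohnso  R    5:59:27      1 b22-23
          10007204   epyc-64 2_concoc cykojima  R    1:54:31      1 a01-08
          10006633   epyc-64 sh_s2_c6 rtjohnso  R    5:59:27      1 b22-24
"""

# Print users from most to fewest jobs.
for user, n in jobs_per_user(sample).most_common():
    print(user, n)
```

In practice the text could be pasted from the terminal or read from a saved file. A job name containing whitespace would shift the columns, so this is only a rough tally.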

Hi there,

As far as I can tell, the epyc-64 partition is working as intended, although we are open to suggestions for improving our policies.

Due to the shared nature of the Discovery cluster, it’s always going to be tough to balance the needs of all of our users. That said, there are limits on the number of jobs and CPUs a user can use. You can read more about them here: https://www.carc.usc.edu/user-information/user-guides/hpc-basics/running-jobs (look under “Job Limits”).

I took a look at some of your currently running jobs, and it looks like they were in the queue for a few hours, which I think is fairly reasonable given the size and duration of the jobs.

Let me know if you have any further questions/comments.

Best,
Cesar Sul
CARC