Cromwell does not deal well with Singularity on compute nodes that have internet access disabled.
I am not able to run or see your workflow, but the error messages tell me that Singularity is trying to build the container, which should not happen unless the config file is incorrect.
I got it to work today with my GATK test case. While it still requires providing a Docker image URL, it actually uses a pre-downloaded image that was converted to a SIF file.
The steps to reproduce my setup are below (I adjusted the files to include your username and account):
Cromwell config (this version uses SIF files directly): cromwellslurmsingularitynew.conf
# This line is required. It pulls in default overrides from the embedded cromwell
# `reference.conf` (in core/src/main/resources) needed for proper performance of cromwell.
include required(classpath("application"))
# Cromwell HTTP server settings
webservice {
#port = 8000
#interface = 0.0.0.0
#binding-timeout = 5s
#instance.name = "reference"
}
# Cromwell "system" settings
system {
# If 'true', a SIGINT will trigger Cromwell to attempt to abort all currently running jobs before exiting
#abort-jobs-on-terminate = false
# If 'true', a SIGTERM or SIGINT will trigger Cromwell to attempt to gracefully shutdown in server mode,
# in particular clearing up all queued database writes before letting the JVM shut down.
# The shutdown is a multi-phase process, each phase having its own configurable timeout. See the Dev Wiki for more details.
#graceful-server-shutdown = true
# Cromwell will cap the number of running workflows at N
#max-concurrent-workflows = 5000
# Cromwell will launch up to N submitted workflows at a time, regardless of how many open workflow slots exist
#max-workflow-launch-count = 50
# Number of seconds between workflow launches
#new-workflow-poll-rate = 20
# Since the WorkflowLogCopyRouter is initialized in code, this is the number of workers
#number-of-workflow-log-copy-workers = 10
# Default number of cache read workers
#number-of-cache-read-workers = 25
io {
# throttle {
# # Global Throttling - This is mostly useful for GCS and can be adjusted to match
# # the quota available on the GCS API
# #number-of-requests = 100000
# #per = 100 seconds
# }
# Number of times an I/O operation should be attempted before giving up and failing it.
#number-of-attempts = 5
}
# Maximum number of input file bytes allowed in order to read each type.
# If exceeded a FileSizeTooBig exception will be thrown.
input-read-limits {
#lines = 128000
#bool = 7
#int = 19
#float = 50
#string = 128000
#json = 128000
#tsv = 128000
#map = 128000
#object = 128000
}
abort {
# These are the default values in Cromwell; in most circumstances there should not be a need to change them.
# How frequently Cromwell should scan for aborts.
scan-frequency: 30 seconds
# The cache of in-progress aborts. Cromwell will add entries to this cache once a WorkflowActor has been messaged to abort.
# If on the next scan an 'Aborting' status is found for a workflow that has an entry in this cache, Cromwell will not ask
# the associated WorkflowActor to abort again.
cache {
enabled: true
# Guava cache concurrency.
concurrency: 1
# How long entries in the cache should live from the time they are added to the cache.
ttl: 20 minutes
# Maximum number of entries in the cache.
size: 100000
}
}
# Cromwell reads this value into the JVM's `networkaddress.cache.ttl` setting to control DNS cache expiration
dns-cache-ttl: 3 minutes
}
docker {
hash-lookup {
# Set this to match your available quota against the Google Container Engine API
#gcr-api-queries-per-100-seconds = 1000
# Time in minutes before an entry expires from the docker hashes cache and needs to be fetched again
#cache-entry-ttl = "20 minutes"
# Maximum number of elements to be kept in the cache. If the limit is reached, old elements will be removed from the cache
#cache-size = 200
# How should docker hashes be looked up. Possible values are "local" and "remote"
# "local": Lookup hashes on the local docker daemon using the cli
# "remote": Lookup hashes on docker hub, gcr, gar, quay
#method = "remote"
enabled = "false"
}
}
# Here is where you can define the backend providers that Cromwell understands.
# The default is a local provider.
# To add additional backend providers, you should copy paste additional backends
# of interest that you can find in the cromwell.example.backends folder
# folder at https://www.github.com/broadinstitute/cromwell
# Other backend providers include SGE, SLURM, Docker, udocker, Singularity, etc.
# Don't forget you will need to customize them for your particular use case.
backend {
# Override the default backend.
default = slurm
# The list of providers.
providers {
# Copy paste the contents of a backend provider in this section
# Examples in cromwell.example.backends include:
# LocalExample: What you should use if you want to define a new backend provider
# AWS: Amazon Web Services
# BCS: Alibaba Cloud Batch Compute
# TES: protocol defined by GA4GH
# TESK: the same, with kubernetes support
# Google Pipelines, v2 (PAPIv2)
# Docker
# Singularity: a container safe for HPC
# Singularity+Slurm: and an example on Slurm
# udocker: another rootless container solution
# udocker+slurm: also exemplified on slurm
# HtCondor: workload manager at UW-Madison
# LSF: the Platform Load Sharing Facility backend
# SGE: Sun Grid Engine
# SLURM: workload manager
# Note that these other backend examples will need tweaking and configuration.
# Please open an issue https://www.github.com/broadinstitute/cromwell if you have any questions
slurm {
actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
config {
# Root directory where Cromwell writes job results in the container. This value
# can be used to specify where the execution folder is mounted in the container.
# It is used for the construction of the docker_cwd string in the submit-docker
# value below.
dockerRoot = "/cromwell-executions"
concurrent-job-limit = 10
# If an 'exit-code-timeout-seconds' value is specified:
# - check-alive will be run at this interval for every job
# - if a job is found to be not alive, and no RC file appears after this interval
# - Then it will be marked as Failed.
## Warning: If set, Cromwell will run 'check-alive' for every job at this interval
exit-code-timeout-seconds = 360
filesystems {
local {
localization: [
# soft link does not work for docker with --contain. Hard links won't work
# across file systems
"copy", "hard-link", "soft-link"
]
caching {
duplication-strategy: ["copy", "hard-link", "soft-link"]
hashing-strategy: "file"
}
}
}
#
runtime-attributes = """
Int runtime_minutes = 600
Int cpus = 2
Int requested_memory_mb_per_core = 8000
String? docker
String? partition
String? account
String? IMAGE
"""
submit = """
sbatch \
--wait \
--job-name=${job_name} \
--chdir=${cwd} \
--output=${out} \
--error=${err} \
--time=${runtime_minutes} \
${"--cpus-per-task=" + cpus} \
--mem-per-cpu=${requested_memory_mb_per_core} \
--partition=${partition} \
--account=${account} \
--wrap "/bin/bash ${script}"
"""
submit-docker = """
# SINGULARITY_CACHEDIR needs to point to a directory accessible by
# the jobs (i.e. not lscratch). Might want to use a workflow local
# cache dir like in run.sh
source /scratch2/rklotz/set_singularity_cachedir.sh
SINGULARITY_CACHEDIR=/scratch2/rklotz/singularity-cache
echo "SINGULARITY_CACHEDIR $SINGULARITY_CACHEDIR"
if [ -z "$SINGULARITY_CACHEDIR" ]; then
CACHE_DIR=$HOME/.singularity
else
CACHE_DIR=$SINGULARITY_CACHEDIR
fi
mkdir -p $CACHE_DIR
LOCK_FILE=$CACHE_DIR/singularity_pull_flock
# we want to avoid all the cromwell tasks hammering each other trying
# to pull the container into the cache for the first time. flock works
# on GPFS, netapp, and vast (of course only for processes on the same
# machine which is the case here since we're pulling it in the master
# process before submitting).
#flock --exclusive --timeout 1200 $LOCK_FILE \
# singularity exec --containall docker://${docker} \
# echo "successfully pulled ${docker}!" &> /dev/null
# Ensure singularity is loaded if it's installed as a module
#module load Singularity/3.0.1
# Build the Docker image into a singularity image
#IMAGE=${docker}.sif
#singularity build $IMAGE docker://${docker}
# Submit the script to SLURM
sbatch \
--wait \
--job-name=${job_name} \
--chdir=${cwd} \
--output=${cwd}/execution/stdout \
--error=${cwd}/execution/stderr \
--time=${runtime_minutes} \
${"--cpus-per-task=" + cpus} \
--mem-per-cpu=${requested_memory_mb_per_core} \
--partition=${partition} \
--account=${account} \
--wrap "singularity exec --containall --bind ${cwd}:${docker_cwd} ${IMAGE} ${job_shell} ${docker_script}"
"""
kill = "scancel ${job_id}"
check-alive = "squeue -j ${job_id}"
job-id-regex = "Submitted batch job (\\d+).*"
}
}
}
}
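The commented-out flock guard in submit-docker is worth keeping in mind if many tasks might pull the same image for the first time concurrently. Here is a minimal, locally runnable sketch of the same serialization idea (the lock file and echoed message are placeholders, not part of the real setup):

```shell
# Serialize a critical section across processes with an exclusive flock,
# mirroring the commented-out image-pull guard in submit-docker.
LOCK_FILE=$(mktemp)
# Only one process at a time may hold the lock; others wait up to 10 seconds
# (the real config uses --timeout 1200 around the singularity pull).
RESULT=$(flock --exclusive --timeout 10 "$LOCK_FILE" echo "pull serialized")
echo "$RESULT"
rm -f "$LOCK_FILE"
```

Because flock operates per-machine, this works here since all tasks are submitted from the same node.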
Workflow: gatktest.wdl
workflow helloCountBasesCaller {
call CountBasesCaller
}
task CountBasesCaller {
String GATKcontainer
String sampleName
String partition
String account
String IMAGE
File inputBAM
command {
gatk \
CountBases \
-I ${inputBAM} \
> ${sampleName}.txt
}
output {
File rawTXT = "${sampleName}.txt"
}
runtime {
docker: "${GATKcontainer}"
IMAGE: "${IMAGE}"
partition: "${partition}"
account: "${account}"
}
}
JSON file with inputs: gatktest_inputs.json
{
"helloCountBasesCaller.CountBasesCaller.inputBAM": "/project/biodb/NA12878.bam",
"helloCountBasesCaller.CountBasesCaller.sampleName": "outdata",
"helloCountBasesCaller.CountBasesCaller.GATKcontainer": "broadinstitute/gatk:4.2.3.0",
"helloCountBasesCaller.CountBasesCaller.partition": "main",
"helloCountBasesCaller.CountBasesCaller.account": "rklotz_600",
"helloCountBasesCaller.CountBasesCaller.IMAGE": "/scratch2/rklotz/singularity-cache/pull/gatk_4.2.3.0.sif"
}
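Before launching, it can save a failed run to check that every file path in the inputs JSON actually exists on the cluster filesystem. A rough sketch (the throwaway JSON below stands in for gatktest_inputs.json; on the cluster, point INPUTS at the real file):

```shell
# Flag missing input files referenced in an inputs JSON.
# A temp JSON is used here purely for illustration.
INPUTS=$(mktemp)
cat > "$INPUTS" <<'EOF'
{ "helloCountBasesCaller.CountBasesCaller.inputBAM": "/project/biodb/NA12878.bam" }
EOF
# Pull out every absolute path in the JSON and test for existence.
STATUS=$(grep -o '"/[^"]*"' "$INPUTS" | tr -d '"' | while read -r p; do
  if [ -e "$p" ]; then echo "OK: $p"; else echo "MISSING: $p"; fi
done)
echo "$STATUS"
rm -f "$INPUTS"
```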
Then prepare for the test run.
On the login node:
mkdir -p /scratch2/rklotz/singularity-cache/pull
cd /scratch2/rklotz
cat > set_singularity_cachedir.sh <<'EOF'
#!/bin/bash
export SINGULARITY_CACHEDIR=/scratch2/rklotz/singularity-cache
export SINGULARITY_TMPDIR=$SINGULARITY_CACHEDIR/tmp
export SINGULARITY_PULLDIR=$SINGULARITY_CACHEDIR/pull
export CWL_SINGULARITY_CACHE=$SINGULARITY_PULLDIR
EOF
source set_singularity_cachedir.sh
singularity pull docker://broadinstitute/gatk:4.2.3.0
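One subtlety when writing that script via a heredoc: an unquoted EOF delimiter expands $SINGULARITY_CACHEDIR at write time (when it may still be empty), while a quoted 'EOF' keeps the variables literal so they expand only when the script is sourced. A quick local demo (the DEMO_* names and paths are placeholders):

```shell
# With a quoted heredoc delimiter, the $VARS survive into the file verbatim
# and are expanded only when the script is sourced.
unset DEMO_CACHE DEMO_TMP
DEMO_SCRIPT=$(mktemp)
cat > "$DEMO_SCRIPT" <<'EOF'
export DEMO_CACHE=/tmp/demo-cache
export DEMO_TMP=$DEMO_CACHE/tmp
EOF
. "$DEMO_SCRIPT"
# DEMO_TMP is built from DEMO_CACHE at source time, not at write time.
echo "$DEMO_TMP"
rm -f "$DEMO_SCRIPT"
```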
Start an interactive session:
salloc --nodes=1 --ntasks=4 --cpus-per-task=4 --mem=16GB --account=rklotz_600 --partition=main --time=8:00:00
Inside the interactive session:
module purge
module load USC openjdk
java -jar -Dconfig.file=/scratch2/rklotz/cromwell/cromwellslurmsingularitynew.conf /scratch2/rklotz/cromwell/cromwell-71.jar run /scratch2/rklotz/cromwell/gatktest.wdl -i /scratch2/rklotz/cromwell/gatktest_inputs.json
If that goes well and you see something like this:
[2022-04-15 11:09:07,31] [info] WorkflowExecutionActor-b3cdada3-6972-4d8d-94f1-96874c9533b1 [b3cdada3]: Workflow helloCountBasesCaller complete. Final Outputs:
{
"helloCountBasesCaller.CountBasesCaller.rawTXT": "/home1/osinski/cromwell-executions/helloCountBasesCaller/b3cdada3-6972-4d8d-94f1-96874c9533b1/call-CountBasesCaller/execution/outdata.txt"
}
[2022-04-15 11:09:10,36] [info] WorkflowManagerActor: Workflow actor for b3cdada3-6972-4d8d-94f1-96874c9533b1 completed with status 'Succeeded'. The workflow will be removed from the workflow store.
[2022-04-15 11:09:12,95] [info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
"outputs": {
"helloCountBasesCaller.CountBasesCaller.rawTXT": "/home1/osinski/cromwell-executions/helloCountBasesCaller/b3cdada3-6972-4d8d-94f1-96874c9533b1/call-CountBasesCaller/execution/outdata.txt"
},
"id": "b3cdada3-6972-4d8d-94f1-96874c9533b1"
}
[2022-04-15 11:09:15,40] [info] Workflow polling stopped
Then, once you adjust your WDL and JSON files (adding IMAGE to the runtime section as in this test case), everything should work.
Best regards,
Tomek