CWLCon 2024 Session 2
Session 2 Schedule (UTC Time) APAC-Americas Friendly time
Session 2 (Online only)
Thursday 9 May / Friday 10 May
Session 2 Schedule - CWLCon 2024
The discussions on handling data in cloud and distributed environments were particularly engaging. It's clear that the development environments, including various editors, are becoming more robust, making development more accessible.
One notable trend: several presentations used only CWL's CommandLineTool, or wrote a complete Workflow and treated it as a single executable unit. These presentations often included custom implementations for distributing tasks, combining their results, or retrying after errors. They are designed on the assumption that they will be deployed across multiple environments.
For tasks intended for multiple environments, it seems many environments already provide APIs. Therefore, the abstract parts are written first, followed by environment-specific adapters for control, similar to how Toil operates.
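The "abstract part first, adapters second" structure can be sketched as below. This is a minimal illustration, not the design of Toil or of any tool presented at the session; `Job`, `Backend`, `InMemoryBackend`, and `run_everywhere` are hypothetical names introduced here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Job:
    """A unit of work: one CWL document plus its input object."""
    cwl_path: str
    inputs: dict


class Backend(ABC):
    """Abstract part: what every environment must be able to do."""

    @abstractmethod
    def submit(self, job):
        """Submit the job and return a backend-specific job ID."""

    @abstractmethod
    def status(self, job_id):
        """Return 'success' or 'failed' for a submitted job."""


class InMemoryBackend(Backend):
    """Hypothetical adapter that completes every job immediately.

    A real adapter would call the environment's own API here.
    """

    def __init__(self, name):
        self.name = name
        self._jobs = {}

    def submit(self, job):
        job_id = f"{self.name}-{len(self._jobs) + 1}"
        self._jobs[job_id] = job
        return job_id

    def status(self, job_id):
        return "success" if job_id in self._jobs else "failed"


def run_everywhere(job, backends):
    """Environment-agnostic control logic, written once for all adapters."""
    results = {}
    for backend in backends:
        job_id = backend.submit(job)
        results[job_id] = backend.status(job_id)
    return results
```

Only the adapter classes know about a specific environment; the control logic in `run_everywhere` depends on the abstract interface alone.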
Tasks were discussed in terms of granularity:
- Per sample: Execution completes on a single compute node (instance) per file.
- Per hardware: Differentiates targets like FPGA and GPU.
- Per job queue system: Distributes tasks between systems like Slurm and Kubernetes.
JSON-Schemas for Validating Your CWL Code and CWL Code Inputs
This relates to the CWL development environment. JSON Schema definitions can become a powerful tool during CWL development, both for validating CWL code and for validating its inputs. This topic was previously discussed at the APAC/EMEA regional meeting.
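To give a feel for what validating CWL inputs against a schema means, here is a minimal sketch. It is a hand-rolled stand-in that only understands top-level `required` and `type`, not a real JSON Schema validator and not the approach shown in the talk; the schema and the input names (`reads`, `threads`) are made up for illustration.

```python
import json

# Hypothetical schema for a tool's input object; the property names are invented.
INPUT_SCHEMA = {
    "type": "object",
    "required": ["reads", "threads"],
    "properties": {
        "reads": {"type": "object"},
        "threads": {"type": "integer"},
    },
}

# Mapping from schema type names to Python types (a small subset only).
_PY_TYPES = {"object": dict, "integer": int, "string": str, "array": list, "boolean": bool}


def validate_inputs(inputs, schema):
    """Return a list of error messages; an empty list means the inputs passed."""
    errors = []
    for key in schema.get("required", []):
        if key not in inputs:
            errors.append(f"missing required input: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in inputs:
            continue
        value = inputs[key]
        type_ok = isinstance(value, _PY_TYPES[rule["type"]])
        # bool is a subclass of int in Python, so reject it explicitly for "integer".
        if rule["type"] == "integer" and isinstance(value, bool):
            type_ok = False
        if not type_ok:
            errors.append(f"{key}: expected {rule['type']}")
    return errors


# An input object where "threads" was given as a string by mistake.
job = json.loads('{"reads": {"class": "File", "path": "sample.fastq"}, "threads": "4"}')
print(validate_inputs(job, INPUT_SCHEMA))
```

Catching this kind of mistake before submission is the practical benefit: the job fails in the editor or on the command line instead of after it has been scheduled.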
Joining regional meetings, if time permits, is recommended for such insights:
Community | Common Workflow Language (CWL)
NGS360 - A NGS Data Management and Analysis Platform
The task execution part employs CWL's CommandLineTool. They are developing an implementation called PAML for orchestrating sample inputs and error handling.
NGS360/PAML: Multi-Platform Launcher Framework
Currently, it supports:
- Arvados
- Seven Bridges
CWL @ ICA (Illumina Connected Analytics)
CWL @ ICA (Illumina Connected Analytics) - CWLCon 2024 - Common Workflow Language Discourse
I personally found it interesting how they use temporaryFailure to handle instances when a Spot Instance goes down. The way it was presented was very engaging. I am extremely interested in the internal implementation.
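The internal implementation was not shown, so the following is only a guess at the general idea: classify a tool's exit code as a temporary failure and rerun it. CWL's CommandLineTool does have a `temporaryFailCodes` field for declaring such exit codes; everything else here (the exit code 75, `classify`, `run_with_retries`) is a hypothetical illustration.

```python
# Assumption: the exit code the tool returns when its instance was interrupted.
TEMPORARY_FAIL_CODES = {75}


def classify(exit_code, temporary_fail_codes=TEMPORARY_FAIL_CODES):
    """Map an exit code to a CWL-style process status."""
    if exit_code == 0:
        return "success"
    if exit_code in temporary_fail_codes:
        return "temporaryFailure"
    return "permanentFailure"


def run_with_retries(run_once, max_attempts=3):
    """Call run_once() until it stops failing temporarily or attempts run out.

    run_once is any callable returning an exit code; it stands in for
    launching the tool on a fresh instance.
    """
    status = "temporaryFailure"
    attempts = 0
    while status == "temporaryFailure" and attempts < max_attempts:
        attempts += 1
        status = classify(run_once())
    return status, attempts


# Simulate two interruptions followed by a successful run.
exit_codes = iter([75, 75, 0])
print(run_with_retries(lambda: next(exit_codes)))
```

Permanent failures are returned immediately, so only failures that a rerun could plausibly fix consume retry attempts.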
CWL-Enabled Reusable and Reproducible Genomic Data Management and Analysis in R
Discussion centered on using R. Notably, the implementation rworkflow/RcwlCloud allows submitting CWL jobs to cloud environments from R.
Backends include:
- AnVIL
- CAVATICA
- Cancer Genomics Cloud
The working environment features interactive apps such as RStudio and Jupyter, and supports both CWL and WDL workflows.
Performance Evaluation of GPU-intensive Genome Analysis Workflows in HPC and Cloud
This presentation involved benchmarking analysis pipelines using CWL, focusing on GPU usage. Future discussions may include GPU scheduling challenges.
Extending CWL for High-Performance Computing: A Visual Workflow System with HPC Enhancements
Discussions covered operational uses in supercomputing centers, particularly visualizing job submissions. The API provided in the supercomputing environment uses Slurm and Kubernetes.
zetako/cwl.go: CWL Parser and Runner.
Developed in Go, this project has sparked considerable interest.
TODO
TODO: Need to follow up with Alexis regarding input file validation methods discussed previously.