Flexible Partitioning of Scientific Workflows Using the JX Workflow Language
Time: Tuesday, July 30, 6:30pm - 8:30pm
Location: Crystal Foyer and Crystal B
Description: Scientific workflows are typically expressed as a graph of logical tasks, each one representing a single program along with its input and output files. A conventional workflow manager transforms each logical task into a discrete batch job and submits it to an underlying execution system. However, converting every logical task into one batch job is not necessarily the most efficient partitioning of a workflow. By grouping multiple logical tasks into a single batch job, we may decrease data transfer, increase system utilization, and reduce the execution time of a workflow. This paper presents JX (JSON eXtended), a declarative language that can express complex workloads as an assembly of sub-graphs that can be partitioned in flexible ways. We present a case study of using JX to represent complex workflows for the Lifemapper biodiversity project. We evaluate partitioning approaches across several computing environments, including HTCondor at the University of Notre Dame, TACC Stampede2, and SDSC Comet, and show that a coarse partitioning results in faster turnaround times, reduced data transfer, and lower master utilization across all three systems.
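The difference between fine and coarse partitioning can be illustrated with a minimal Python sketch. This is not JX syntax and not the paper's implementation: the function name `partition_tasks`, the task names, and the group size are hypothetical, and the sketch assumes the logical tasks are independent, so it ignores the dependency edges a real workflow graph would carry.

```python
from typing import List


def partition_tasks(tasks: List[str], group_size: int) -> List[List[str]]:
    """Group consecutive logical tasks into batch jobs of at most group_size tasks.

    Hypothetical helper for illustration only; assumes the tasks are independent.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [tasks[i:i + group_size] for i in range(0, len(tasks), group_size)]


# Hypothetical logical tasks (placeholders, not taken from the Lifemapper workflow).
tasks = [f"task_{i}" for i in range(10)]

# Fine partitioning: one batch job per logical task (the conventional approach).
fine = partition_tasks(tasks, 1)

# Coarse partitioning: several logical tasks share one batch job.
coarse = partition_tasks(tasks, 4)

print(len(fine), "batch jobs with fine partitioning")
print(len(coarse), "batch jobs with coarse partitioning")
```

With fewer batch jobs, the workflow manager has fewer submissions to track, which is one plausible reading of how coarse partitioning could lower master utilization; the actual gains depend on the workflow and the execution system.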