I/O of Scientific Workflows Monitored in Detail

Author:	Witzke, J., Lösser, A., Bountris, V., Scheuermann, B., Kundel, R., & Meuser, T.
Published in:	e-Science ’24: 20th IEEE International Conference on e-Science
Year:	2024
Type:	Academic articles
DOI:	10.1109/e-Science62913.2024.10678728

Correlating detailed local resource utilization data with the high-level concepts of a distributed scientific workflow system that ultimately cause it is challenging. When running a large-scale scientific data analysis workflow across a distributed execution environment, we want to analyze its I/O behaviour to identify potential bottlenecks. Because tasks can be assigned to any available node, the local resource usage on a node does not directly show which tasks are causing it. We acquire resource usage profiles of the involved nodes and link them to the individual workflow tasks by associating low-level trace metadata with high-level task information from log files and job management systems such as Kubernetes. This information helps identify the areas of the workflow, at the logical task level, where improvements can have the biggest impact.
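
To make the linking step concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each low-level I/O trace event carries a node name and a container identifier, and that the job management system can report which task ran in which container. The names IoEvent, TaskPlacement, io_per_task, and the container_id field are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IoEvent:
    """One low-level trace record (fields are assumptions for illustration)."""
    node: str
    container_id: str
    bytes_read: int
    bytes_written: int


@dataclass(frozen=True)
class TaskPlacement:
    """High-level task information, e.g. taken from logs or the scheduler."""
    task: str
    node: str
    container_id: str


def io_per_task(events, placements):
    """Sum read/written bytes per logical task by joining on (node, container_id)."""
    # Lookup table from the low-level identifiers to the logical task name.
    owner = {(p.node, p.container_id): p.task for p in placements}
    totals = defaultdict(lambda: [0, 0])
    for e in events:
        # Events without a known placement are kept visible, not dropped.
        task = owner.get((e.node, e.container_id), "<unattributed>")
        totals[task][0] += e.bytes_read
        totals[task][1] += e.bytes_written
    return {task: (r, w) for task, (r, w) in totals.items()}


placements = [
    TaskPlacement("align", "node-1", "c1"),
    TaskPlacement("sort", "node-1", "c2"),
]
events = [
    IoEvent("node-1", "c1", 100, 0),
    IoEvent("node-1", "c2", 10, 50),
    IoEvent("node-1", "c1", 20, 5),
    IoEvent("node-2", "c9", 7, 0),
]
print(io_per_task(events, placements))
```

Joining on a shared identifier rather than on the node alone is what allows two tasks running on the same node to be told apart. Which identifier is actually available depends on the tracing tool and the execution environment.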
