Author: Leser, U., Hilbrich, M., Draxl, C., Eisert, P., Grunske, L., Hostert, P., Kainmüller, D., Kao, O., Kehr, B., Kehrer, C., Koch, C., Markl, V., Meyerhenke, H., Rabl, T., Reinefeld, A., Reinert, K., Ritter, K., Scheuermann, B., Schintke, F., Schweikart, N., & Weidlich, M.
Published in: Datenbank-Spektrum, 21, 255-260
Year: 2021
Type: Academic articles
DOI: https://doi.org/10.1007/s13222-021-00397-5
Today’s scientific data analysis very often requires complex Data Analysis Workflows (DAWs) executed over distributed computational infrastructures, e.g., clusters. Much research effort is devoted to the tuning and performance optimization of specific workflows for specific clusters. However, an arguably even more important problem for accelerating research is the reduction of development, adaptation, and maintenance times of DAWs. We describe the design and setup of the Collaborative Research Center (CRC) 1404 “FONDA – Foundations of Workflows for Large-Scale Scientific Data Analysis”, in which roughly 50 researchers jointly investigate new technologies, algorithms, and models to increase the portability, adaptability, and dependability of DAWs executed over distributed infrastructures. We describe the motivation behind our project, explain its underlying core concepts, introduce FONDA’s internal structure, and sketch our vision for the future of workflow-based scientific data analysis. We also describe some lessons learned during the “making of” a CRC in Computer Science with strong interdisciplinary components, with the aim of fostering similar endeavors.