Interactive Storage Layout Visualization in Distributed Storage

Supercomputers are typically accompanied by high-performance storage systems that provide fast access to large volumes of data. Files containing scientific data routinely range from hundreds of GiB to several TiB in size, even for single timesteps. A single storage device, such as a hard drive or an SSD, offers neither the access performance nor the capacity needed for meaningful analysis across such volumes of data. For this reason, high-performance storage systems aggregate the performance of many (tens of thousands of) devices. Typically, on the order of 50 to 100 disks are managed by a so-called storage server, and hundreds of storage servers form a high-performance storage system, which may hold hundreds of petabytes of data and deliver data rates reaching into the TiB/s range. In practice, most applications achieve far lower throughput because their data is distributed suboptimally across the available storage servers. In fact, by default, many files are not spread across multiple storage servers at all, which is a reasonable strategy for smaller files.

Most parallel file systems and many object stores offer command-line utilities to inspect and fine-tune how a file is striped across the available storage targets. To achieve optimal read and write performance, however, it is necessary to take the topological relationship between the compute allocation, the network, and the storage system into account. The textual output of existing storage tools and APIs offers no intuitive representation that makes problematic data mappings quick to spot, nor does it take relationships to the rest of the supercomputer into account.
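To illustrate what such a data mapping looks like, the following is a minimal Python sketch of round-robin (RAID-0 style) striping, a common scheme in parallel file systems. It is not the module to be developed in the thesis: the names `StripeLayout`, `bytes_per_target`, and `render_text` are hypothetical, and the stripe size and target indices are made-up values, not output from a real storage system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical description of a file striped round-robin over targets."""
    stripe_size: int   # bytes per stripe chunk (assumed value, not queried)
    targets: tuple     # storage target indices in stripe order (assumed)

    def target_for_offset(self, offset: int) -> int:
        """Return the storage target that holds the byte at `offset`."""
        chunk = offset // self.stripe_size
        return self.targets[chunk % len(self.targets)]


def bytes_per_target(layout: StripeLayout, file_size: int) -> dict:
    """Count how many bytes of a file of `file_size` land on each target."""
    usage = {target: 0 for target in layout.targets}
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than a full stripe.
        length = min(layout.stripe_size, file_size - offset)
        usage[layout.target_for_offset(offset)] += length
        offset += length
    return usage


def render_text(usage: dict, width: int = 40) -> list:
    """Render one text bar per target, scaled to the busiest target."""
    peak = max(usage.values(), default=0) or 1  # avoid division by zero
    lines = []
    for target, nbytes in sorted(usage.items()):
        bar = "#" * round(width * nbytes / peak)
        lines.append(f"target {target:>4} | {bar} {nbytes} B")
    return lines


if __name__ == "__main__":
    # Assumed example: 1 MiB stripes over four targets, 10.5 MiB file.
    layout = StripeLayout(stripe_size=1 << 20, targets=(12, 13, 40, 41))
    usage = bytes_per_target(layout, file_size=10 * (1 << 20) + (1 << 19))
    print("\n".join(render_text(usage)))
```

Even this text-only rendering shows uneven load at a glance. A graphical version could additionally relate each target to its storage server and to its position in the network.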

As part of this thesis, you will develop a specification and implement a Python module to conveniently access, expose for reuse, and visualize this information. Besides static visualizations using matplotlib, interactive visualization components that integrate with Jupyter notebooks should also be considered. This topic allows you to advance your Python and JavaScript (for visualization) knowledge while also working with real supercomputing resources and complex Linux environments. You will learn to navigate the Linux shell and command-line interfaces comfortably and to apply expert optimizations for, for example, Lustre-based file systems, which supercomputing sites around the world trust to manage petabytes of data. You will also learn advanced techniques for expressing yourself with Jupyter notebooks and popular JavaScript visualization libraries such as D3.js or Three.js.

Contact: Michael Kuhn and Jakob Lüttgau

Last Modification: 11.01.2021 - Contact Person: Webmaster