Mapping POSIX I/O to NetCDF

Some I/O workloads produce many files to store different types of data. Depending on the underlying storage system, managing large numbers of (small) files can be problematic, because every file adds metadata overhead, and it can significantly degrade performance. Self-describing data formats such as NetCDF can hold multiple datasets within a single file. NetCDF could therefore be used to reduce the number of files, for example by mapping directories to NetCDF groups and files to NetCDF variables.
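
The naming side of this mapping can be sketched in C. The function `map_path` is hypothetical (it is not part of netCDF-C) and assumes absolute paths. It splits a path into the group that would hold the file and the variable name that would store its contents. It also applies a simplified version of NetCDF's naming rules, as assumed here: a name may not contain `/` and must start with an ASCII letter, digit or underscore (or a multibyte UTF-8 character). As a result, a file such as `.hidden` is rejected rather than silently renamed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified name check (an assumption, not NetCDF's full rule set):
 * non-empty, starting with an ASCII letter, digit or '_', or with a
 * byte >= 0x80 (start of a multibyte UTF-8 character). '/' cannot
 * occur here because the caller splits the path at '/'. */
static int is_usable_name(const char *name, size_t len)
{
    if (len == 0)
        return 0;
    unsigned char c = (unsigned char)name[0];
    return isalnum(c) || c == '_' || c >= 0x80;
}

/* Hypothetical mapping: "/run1/out/temp.dat" -> group "/run1/out",
 * variable "temp.dat". Files directly under "/" map to the root group.
 * Returns 0 on success, -1 if the path is not absolute, a component is
 * not a usable name, or an output buffer is too small. */
int map_path(const char *path, char *group, size_t group_size,
             char *var, size_t var_size)
{
    if (path == NULL || path[0] != '/')
        return -1;

    /* Validate every component: directories become groups and the
     * last component becomes the variable. A trailing '/' or "//"
     * yields an empty component and is rejected. */
    const char *p = path + 1;
    for (;;) {
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (!is_usable_name(p, len))
            return -1;
        if (end == NULL)
            break;
        p = end + 1;
    }

    const char *slash = strrchr(path, '/'); /* non-NULL: path[0] == '/' */
    const char *name = slash + 1;
    size_t name_len = strlen(name);
    size_t group_len = (slash == path) ? 1 : (size_t)(slash - path);

    if (name_len >= var_size || group_len >= group_size)
        return -1;

    memcpy(group, path, group_len);
    group[group_len] = '\0';
    memcpy(var, name, name_len + 1); /* copies the terminating '\0' */
    return 0;
}
```

With netCDF-C, the group path could then be created one component at a time with `nc_def_grp`, and the file contents stored in a variable defined with `nc_def_var`, for instance of type `NC_UBYTE` (a NetCDF-4 type). How to handle names that violate the rules, such as dot files, remains a design decision.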

As part of this thesis, you will implement a software solution that maps POSIX I/O to NetCDF. It could be implemented either as a user-space file system using FUSE or as a preloadable library that overrides the necessary I/O functions. Because both approaches work with low-level C interfaces (the libfuse API or the C library's I/O functions), the implementation will most likely have to be written in a systems programming language such as C, C++ or Rust.
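
Whichever approach is chosen, an intercepted read (a FUSE `read` callback or an overridden `pread()`) must eventually be turned into a NetCDF access. Assume, as a hypothetical design, that each file's contents live in a one-dimensional byte variable whose length equals the file size. The translation is then a clamp of the requested byte range to a `start`/`count` hyperslab. `map_read` is an illustrative name, not part of libfuse or netCDF-C.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: translate a positional read, as in
 * pread(fd, buf, count, offset), on a file stored as a 1-D byte
 * variable of var_len elements into start/count values for
 * nc_get_vara_uchar(). Returns the number of bytes to read, 0 at or
 * past end of file (the caller should then skip the NetCDF call), or
 * -1 for a negative offset. */
ssize_t map_read(off_t offset, size_t count, size_t var_len,
                 size_t *start, size_t *nc_count)
{
    if (offset < 0)
        return -1;

    /* Compare in uintmax_t so a large 64-bit off_t is not truncated. */
    if ((uintmax_t)offset >= var_len) {
        /* Like read(), report end of file with a zero-length result. */
        *start = 0;
        *nc_count = 0;
        return 0;
    }

    /* Clamp the request to the bytes that exist (short read near EOF). */
    size_t avail = var_len - (size_t)offset;
    *start = (size_t)offset;
    *nc_count = count < avail ? count : avail;
    return (ssize_t)*nc_count;
}
```

For a one-dimensional variable, the results would be passed as one-element arrays, e.g. `nc_get_vara_uchar(ncid, varid, &start, &count, buf)`, whenever the returned count is positive. Writes that extend a file would additionally require the variable to grow, for example along an unlimited dimension.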

https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html
https://github.com/libfuse/libfuse
http://www.goldsborough.me/c/low-level/kernel/2016/08/29/16-48-53-the_-ld_preload-_trick/

Contact: Michael Kuhn and Piet Jarmatz