The research group I'm working in is preparing for a move and will get a shiny new cluster that uses the Fraunhofer Parallel Cluster File System (FhGFS) to store the bulk of the data. FhGFS seems to work best with a small number of large files, since the metadata for all files has to be distributed across the nodes.
Since a subset of our group has to work with many (millions of) small (< 1 MB) files, I'm worried about suboptimal performance in this setting. Do you think this will be a problem in practice? Any way to improve the performance?
I've thought about letting users store groups of small files in Ext4 filesystem images kept as single large files on the FhGFS volume and mounting those images through FUSE, but I have no idea whether this is a horrible idea performance-wise. Can anyone advise?
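To make the idea concrete, here's a rough sketch of what I mean (Python just as a stand-in for the shell steps; the paths, sizes, and image names are made up, and I haven't tried any of this on FhGFS):

```python
#!/usr/bin/env python3
"""Sketch: pack a user's small files into an ext4 image that FhGFS sees
as ONE large file, then mount that image locally.

Assumptions: the paths below are hypothetical, mkfs.ext4/mount/truncate
are available on the node, and the plain loop mount here needs root --
the FUSE route is exactly the part I'm asking about.
"""
import subprocess
from pathlib import Path

FHGFS_DIR = Path("/fhgfs/groups/smallfiles")  # hypothetical FhGFS path
IMAGE = FHGFS_DIR / "user1.ext4.img"          # one big file from FhGFS's view
MOUNTPOINT = Path("/mnt/user1-smallfiles")    # hypothetical local mountpoint


def run(*cmd: str) -> None:
    """Run a command, echoing it first and failing loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# 1. Create a sparse 50 GB image file on the FhGFS volume.
run("truncate", "-s", "50G", str(IMAGE))

# 2. Format it as ext4 (-F is needed because the target is a regular file,
#    not a block device).
run("mkfs.ext4", "-F", str(IMAGE))

# 3. Mount it so the user can work with their millions of small files
#    inside the image instead of directly on FhGFS.
MOUNTPOINT.mkdir(parents=True, exist_ok=True)
run("mount", "-o", "loop", str(IMAGE), str(MOUNTPOINT))
```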
Edit: It looks like mounting an Ext4 filesystem read-write through FUSE is not currently possible. Does anyone have a recommendation for a read-write filesystem that can be mounted through FUSE?
Thanks for your help!