Hi all,
I’m a sysadmin in a Theoretical Physics Institute with ~70 workstations. They’re all running Ubuntu 12.04 or 14.04 and we have a queuing system for numerical calculations. Currently we export the spare capacity of the local HDDs (typically 200 to 800 GB) via NFS and every user has a directory on one of these NFS exports for large numerical results. Obviously this is not optimal for various reasons:
- Users who need a lot of space have to split their files across multiple workstations.
- The NFS export is unreachable while a machine reboots (or if a user pulls the network cable…).
- The data is lost in case of a HDD failure.
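For context, the current setup amounts to a plain NFS export per workstation. A minimal sketch (hostnames, paths, and the subnet are made up for illustration):

```shell
# On a workstation, /etc/exports might contain a line like:
#   /export/scratch  10.0.0.0/24(rw,sync,no_subtree_check)
exportfs -ra    # reload the export table after editing /etc/exports

# On a client, mount it (or add the equivalent /etc/fstab entry):
#   ws01:/export/scratch  /data/ws01  nfs  defaults  0  0
mount -t nfs ws01:/export/scratch /data/ws01
```

Each user then gets a directory under one of these mounts, which is exactly what leads to the splitting and single-point-of-failure problems above.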
That’s why we’re looking for a distributed filesystem. We had a look at GlusterFS and Ceph but have doubts about whether either is the appropriate solution.
- Gluster is really easy to set up. Bricks have to be added in pairs (for two-way replication), but that’s no problem. However, as far as I can see, metadata operations (e.g. ls) have to fan out to all bricks, since there is no dedicated metadata server. I fear serious performance problems (at least for interactive use) when the cluster grows from 6 to 60 nodes.
- I ran into trouble with the Gluster setup when a node went offline suddenly: the client mounts hung and couldn’t be unmounted until I stopped and restarted the whole Gluster volume. That would be unbearable in production.
- Ceph is much harder to configure. Furthermore, it seems to target professional storage deployments, i.e. homogeneous nodes that do nothing but serve data. I’m not sure whether Ceph would fit our situation (varying HDD capacities, storage only as a secondary role for the machines). I haven’t got the Ceph test cluster running yet.
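For reference, the Gluster setup described above boils down to a few commands. A hedged sketch, with made-up hostnames (ws01/ws02), brick path, and volume name; the two tunables at the end are knobs that may soften the hung-mount problem, not a guaranteed fix:

```shell
# Create and start a two-way replicated volume across two workstations
# (bricks are listed in replica pairs).
gluster volume create results replica 2 ws01:/export/brick ws02:/export/brick
gluster volume start results

# Clients wait for a dead brick for network.ping-timeout seconds (default 42)
# before giving up on it; lowering it shortens the hang when a node vanishes,
# at the cost of more churn on brief network blips.
gluster volume set results network.ping-timeout 10

# On a client, naming a fallback server for fetching the volume description
# avoids depending on a single host at mount time:
mount -t glusterfs -o backupvolfile-server=ws02 ws01:/results /mnt/results
```

Note that these options only affect how long clients block; with replica 2 the data itself stays available from the surviving brick.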
Do you have any recommendations, either regarding one of the problems named above, or perhaps a completely different system I’m not aware of?
Best regards!