I've been struggling with this problem. So far I've done one 5TB transfer, and another is being shipped to me next week on an encrypted USB3 drive with an NTFS filesystem. This data needs to be copied over to an NFS share. Yes, small file I/O is a problem. How do you deal with this? It took over a month to complete the last 5TB transfer. It shouldn't have taken that long, but the transfer was interrupted 3 times. The last time it was interrupted, it took over 36 hours for rsync to rebuild the incremental file list.
I've tried rsync and robocopy; rsync performed better. I'm using a USB3 port. Disk write speeds seem to be around 65MB/s, but I'm only seeing an average of 12MB/s in nload during the actual rsync transfer.
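For what it's worth, the rsync invocation was roughly this shape (a sketch with placeholder paths, not the exact command I ran):

    # archive mode, keep partial files so an interrupted run can resume
    rsync -a --partial --info=progress2 /mnt/usb/ /mnt/nfs/dest/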
Another thing I tried was running 4 different rsync jobs on subdirectories (roughly like the sketch below). There are about 9 levels of subdirectories, each level having progressively more files and folders. There was no performance improvement with 4 rsync jobs vs 1.
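The 4-job attempt looked roughly like this (a sketch; the paths are placeholders and it assumes the top-level directory names don't contain newlines):

    # one rsync per top-level subdirectory, 4 running at a time
    cd /mnt/usb
    ls -1 | xargs -P4 -I{} rsync -a "{}/" "/mnt/nfs/dest/{}/"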
It's probably not helping that the filesystem on the USB drive is NTFS and I'm using the FUSE userspace driver in Linux, but it was still faster than robocopy.
Any ideas on how I can make this next 5TB transfer take less than a month to complete? I feel like a total failure taking that long to do something so simple.
The best idea I have so far is to start out with a plain old cp &lt;source&gt; &lt;NFS destination&gt;, and then if it fails, pick it back up with rsync.
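Something along these lines (a sketch with placeholder paths):

    # first pass: plain copy, no per-file comparison overhead
    cp -a /mnt/usb/. /mnt/nfs/dest/
    # if/when it dies partway, resume with rsync, which skips files
    # that already exist with the same size and mtime
    rsync -a /mnt/usb/ /mnt/nfs/dest/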
The USB drive will be connected to a server with 16 cores and 32GB of RAM (USB3), and the destination NFS share can easily hit gigabit line speed on large single-file transfers. I can't find any other bottleneck beyond the fact that it's millions of very small files.
Any ideas on how to do this better?