rsync; 12+ million files; 20TB data

Hello all,

I run a fairly large HPC cluster with a little more than 800 users. I recently completed a migration from a PanFS-based parallel file storage appliance to a GPFS-based one.

The upgrade went well, although slowly; the rsyncs took several days to complete, which isn't too bad over 10GbE and InfiniBand. I used GNU parallel combined with an input list of all my users to parallelize the file transfer. I ran my parallel rsync command quite a few times in the lead-up to the shutdown window for the final transfer. After I kicked all the users off the cluster and stopped all the jobs, I ran the parallel rsync several more times to be sure that everything had been transferred.
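For reference, the parallel invocation was along these lines; the job count, list file, rsync flags, and mount points below are placeholders rather than the exact values I used:

    # users.txt holds one username per line; /panfs and /gpfs stand in
    # for the old and new mount points (placeholder paths).
    cat users.txt | parallel -j 16 \
        rsync -aH --numeric-ids /panfs/home/{}/ /gpfs/home/{}/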

After I allowed users back on the cluster, two of them complained that they had files missing. I double-checked the old PanFS appliance against the new GPFS appliance and, sure enough, files were missing. This kicked off a bit of panic mode on my part as I scrambled through all 800+ users to find out whether anyone else had files missing.

I ran finger on every user on the system to find accounts that hadn't logged in since the shutdown, then quickly wrote a script (sketched below) that compared the file count in each home directory on the new GPFS appliance against the old PanFS appliance. I found several users who were missing lots of files that the repeated rsyncs had skipped.
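The comparison script was essentially a per-user file count on both sides, roughly like this (again with placeholder mount points and list file):

    #!/bin/bash
    # Compare per-user file counts between the old PanFS and new GPFS
    # appliances; /panfs and /gpfs are placeholder mount points.
    while read -r user; do
        old=$(find "/panfs/home/$user" -type f 2>/dev/null | wc -l)
        new=$(find "/gpfs/home/$user" -type f 2>/dev/null | wc -l)
        if [ "$old" -ne "$new" ]; then
            echo "$user: panfs=$old gpfs=$new"
        fi
    done < users.txt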

Now I am a bit puzzled how this could have happened... worse yet, I use a very similar method of parallel rsyncs on my backup servers.

Can rsync simply not handle the volume of files I am feeding it? Has anyone else run into this issue before?

Thanks!

submitted by xathor
