Friday, December 3, 2010

How to restore data from aggregate snapshot

Today one of our users found himself in wet pants when he noticed his robocopy job had overwritten a folder rather than appending new data to it. Panicked, he ran to me looking for any tape or snapshot backup of his original data, which unfortunately wasn't there, as he had previously confirmed that it didn't need any kind of protection.

At this point I had only one place left where I could recover the data: aggregate-level snapshots. I looked at the aggregate snapshots and saw they went back to a time when the data was still in place. Knowing that data deleted from a volume is still locked in the aggregate's snapshots, I felt good about having reserved some space for aggregate-level snapshots, which no one had ever advocated.


The next step was to recover the data. The problem was that reverting the aggregate with "snap restore -A" would revert all the volumes in that aggregate, which would create an even bigger problem. So I had to go a different way: use the aggregate copy function to copy the aggregate's snapshot to an empty aggregate, and then restore the data from there.


Here’s the cookbook for this.


Pre-checks:


  • The volume you lost data from is a flexible volume 
  • Identify an empty aggregate to use as the destination (it can be on another controller) 
  • Make sure the destination aggregate is equal to or larger than the source aggregate 
  • /etc/hosts.equiv has an entry for the filer you want to copy data to, and /etc/hosts has its IP address. When copying on the same controller, add the loopback address (127.0.0.1) to /etc/hosts and the local filer name to /etc/hosts.equiv 
  • Note the name of the aggregate snapshot you want to copy 
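For the hosts.equiv pre-check, the entries look roughly like this (a sketch; the filer names and IP are hypothetical placeholders, not from the original post):

```
# /etc/hosts on the source filer -- resolve the destination filer's name
192.168.1.20    filer_dest

# /etc/hosts.equiv on the destination filer -- trust the source filer
filer_src

# Same-controller copy instead: /etc/hosts gets the loopback,
# /etc/hosts.equiv gets the local filer's own name
127.0.0.1       filer_src
```

Without these entries the aggr copy will fail with a permission/connection error, since aggr copy uses the same rsh-style trust mechanism as vol copy.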

Example:

Let's say the volume we lost data from is 'vol1', the aggregate containing it is 'aggr_source', the aggregate snapshot holding the lost data is 'hourly.1', and the empty aggregate we will copy to is 'aggr_destination'.


Execution:


  • Restrict the destination aggregate using 'aggr restrict aggr_destination' 
  • Start the aggregate data copy using 'aggr copy start -s hourly.1 aggr_source aggr_destination' 
  • Once the copy is complete, bring the aggregate online using 'aggr online aggr_destination' 
  • If you copied on the same controller, the system will rename volume 'vol1' in 'aggr_destination' to 'vol1(1)' 
  • Now export the volume or LUN, and all your lost data is available.
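Putting the steps above together, the whole recovery looks like this at the console (a sketch using the example names; the 'filer>' prompt is a placeholder, and 'aggr copy status' / 'vol status' are just the usual ways to watch progress and confirm the renamed volume):

```
filer> aggr restrict aggr_destination
filer> aggr copy start -s hourly.1 aggr_source aggr_destination
filer> aggr copy status                 # wait until the copy operation completes
filer> aggr online aggr_destination
filer> vol status                       # on a same-controller copy, vol1 appears as vol1(1)
```

From here you export the recovered volume (or map the LUN) and copy the lost files back to where they belong.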
So here's the answer to another popular question: why do I need to reserve space for aggregate-level snapshots? Do you have the answer now?

3 comments:

equals42 said...

We all know you can recover this way but not many shops have extra disk or empty aggrs lying about that they can restore to. NetApp seems to be moving us toward ever bigger aggrs which will make this less of an option going forward unless you have 16+ TBs of disk unused. I'm certainly not going to recommend that people keep aggr snapshots running hourly, keep aggr snap reserve at 5% and keep an empty aggr the size of your largest aggregate just for aggr recovery. Kinda crazy. I'm glad for you and your user that you had the extra space but for most clients, this won't fly.

Unknown said...

Yeah, I agree with your comment that not everyone has a free aggregate to do an aggr copy for recovering lost data from a volume, or a whole volume that was deleted accidentally. But you can do a snap restore of the whole aggr if feasible, and the most compelling reason is recovery when your aggr is corrupted and you have to run wafliron on it.

I always recommend having 2-3% space reserved for aggr snapshots.

Anonymous said...

Why not change your standards to always have volume snapshots enabled? If you use aggr snapshots with a few day retention you're 'paying for' the blocks locked in snapshots anyway. Better to have this retention (few days) at the vol level as standard and ease recovery when the unexpected happens. Maybe even require some special approval for customers requesting configs without snapshots.

My 2 cents.