At this point I had only one place left from which I could recover the data: aggregate-level snapshots. I looked at the aggregate snapshots and saw that they went back to a time when he still had the data in place. Knowing that the data deleted from the volume was still locked in the aggregate’s snapshot, I felt good that I had reserved some space for aggregate-level snapshots, something no one ever advocated.
The next step was to recover the data. The problem was that reverting the aggregate with “snap restore -A” would revert every volume in that aggregate, which would be an even bigger problem. So I had to go a different way: use the aggregate copy function to copy the aggregate’s snapshot to an empty aggregate, then restore the data from there.
Here’s the cookbook for this.
Pre-checks:
- The volume you lost data from is a flexible volume
- Identify an empty aggregate that can be used as the destination (it can also be on another controller)
- Make sure the destination aggregate is the same size as or larger than the source aggregate
- /etc/hosts.equiv has an entry for the filer you want to copy data to, and /etc/hosts has its IP address. If you are copying on the same controller, the loopback address (127.0.0.1) should be in /etc/hosts and the local filer name should be in /etc/hosts.equiv
- Note the name of the aggregate snapshot you want to copy
Example:
Let’s say the volume we lost data from is ‘vol1’, the aggregate that holds this volume is ‘aggr_source’, the aggregate snapshot that still has the lost data is ‘hourly.1’, and the empty aggregate we will copy the data to is ‘aggr_destination’.
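As a rough sketch, the pre-checks above could be run from the filer console along these lines. The 'filer>' prompt is a placeholder, the lines starting with '#' are annotations rather than something to type, and the commands assume a Data ONTAP 7-mode system; check each against the command reference for your release before relying on it.

```
# List the aggregate's snapshots and pick the one that predates the deletion
filer> snap list -A aggr_source

# Compare aggregate sizes; the destination must be at least as large as the source
filer> df -A

# Confirm the host entries needed for the copy are in place
filer> rdfile /etc/hosts
filer> rdfile /etc/hosts.equiv
```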
Execution:
- Restrict the destination aggregate using ‘aggr restrict aggr_destination’
- Start the aggregate data copy using ‘aggr copy start -s hourly.1 aggr_source aggr_destination’
- Once the copy has completed, bring the destination aggregate online using ‘aggr online aggr_destination’
- If you did the copy on the same controller, the system will rename the volume ‘vol1’ in ‘aggr_destination’ to ‘vol1(1)’
- Now export the volume or LUN, and all your lost data is available again.
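Put together, the execution steps for the example names above would look roughly like this on the console. This is only a sketch of the sequence described in the list: the 'filer>' prompt is a placeholder, the '#' lines are annotations, and the final export or LUN mapping step is left out because it depends on how the original volume was presented to clients.

```
# Destination must be restricted before it can receive the copy
filer> aggr restrict aggr_destination

# Copy the contents of snapshot hourly.1 from the source aggregate
filer> aggr copy start -s hourly.1 aggr_source aggr_destination

# After the copy finishes, bring the destination online
filer> aggr online aggr_destination
```

Note that the copy overwrites whatever is on the destination aggregate, which is why the pre-checks call for an empty one.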
3 comments:
We all know you can recover this way but not many shops have extra disk or empty aggrs lying about that they can restore to. NetApp seems to be moving us toward ever bigger aggrs which will make this less of an option going forward unless you have 16+ TBs of disk unused. I'm certainly not going to recommend that people keep aggr snapshots running hourly, keep aggr snap reserve at 5% and keep an empty aggr the size of your largest aggregate just for aggr recovery. Kinda crazy. I'm glad for you and your user that you had the extra space but for most clients, this won't fly.
Yeah, I agree with your comment that not everyone has a free aggregate available to do an aggr copy, whether to recover lost data from a volume or to recover a whole volume that was deleted accidentally. But you can still do a snap restore of the whole aggr if that is feasible, and the most appealing reason to keep aggr snapshots is recovery when your aggr is corrupted and you have to run wafliron on it.
I always recommend having 2-3% space reserved for aggr snapshots.
Why not change your standards to always have volume snapshots enabled? If you use aggr snapshots with a few day retention you're 'paying for' the blocks locked in snapshots anyway. Better to have this retention (few days) at the vol level as standard and ease recovery when the unexpected happens. Maybe even require some special approval for customers requesting configs without snapshots.
My 2 cents.