Tuesday, September 22, 2009

NetApp NFS agent for VCS - Part 3

In the first post I explained why I needed this agent and what its features are, and in the last post I covered how to configure it on the cluster nodes. That post was getting too long, so I had to stop; here is the remaining, and very important, part of the configuration.

How do you configure a different account name in the NetApp NFS agent for VCS?

Hunting around in the agent's configuration guides from Veritas and NetApp didn't turn up anything, and even their KB searches were no help. So I was left to find my own way and explore, starting with creating a new customized account on the filer just for this purpose.

Here are the actual commands I used, starting with a customized role and working up to the account.

useradmin role add exportfs -c "To manage NFS exports from CLI" -a cli-exportfs*,cli-lock*,cli-priv*,cli-sm_mon*
useradmin group add cli-exportfs-group -r exportfs -c "Group to manage NFS exportfs from CLI"
useradmin user add vcsagent -g cli-exportfs-group -c "To manage NFS exports from NetApp VCS Agent"

And here's the account after creation:

testfiler1> useradmin user list vcsagent
Name: vcsagent
Info: To manage NFS exports from NetApp VCS Agent
Rid: 131090
Groups: cli-exportfs-group
Full Name:
Allowed Capabilities: cli-exportfs*,cli-lock*,cli-priv*,cli-sm_mon*
Password min/max age in days: 0/4294967295
Status: enabled

The next step was to give the cluster nodes limited access through the vcsagent user and revoke their root access, which was nothing more than removing the DSA keys from the /etc/sshd/root/.ssh/authorized_keys file and adding them to the /etc/sshd/vcsagent/.ssh/authorized_keys file.
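On the filer side this is just a file move on the root volume; a minimal sketch, assuming vol0 is NFS-mounted on an admin host at /mnt/testfiler1 (a hypothetical mount point):

# Create the vcsagent user's .ssh directory on the filer's root volume
mkdir -p /mnt/testfiler1/etc/sshd/vcsagent/.ssh
# Move the nodes' public keys from root's authorized_keys to vcsagent's,
# revoking the nodes' SSH access as root in the same step
mv /mnt/testfiler1/etc/sshd/root/.ssh/authorized_keys \
   /mnt/testfiler1/etc/sshd/vcsagent/.ssh/authorized_keys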

After completing that I headed back to the host and created a new file named config in root's .ssh directory with the content below:

Host testfiler1
    User vcsagent
    Port 22
    HostName testfiler1.lab.com

As a test I ran "ssh testfiler1 version" from a node terminal and got an access denied error, which was perfectly fine: with the config file in place, "ssh testfiler1" now connects as the vcsagent user, which has no capability to run the version command. Everything looked good, so I started testing by moving the resource from one node to another. To my surprise the moves failed to make any changes on the filer, and the filer audit logs showed the nodes were still connecting over SSH as root.

Until that failed test I had assumed the agent simply relied on the OS for the SSH username, since NetApp hadn't provided any username attribute in the agent; with nothing configured in the OS either, the agent's "ssh testfiler1" would naturally connect as root (the cluster node's local logged-in user).

But the failed test made me believe the username was hardcoded in the agent script, so I started looking through it and soon found the line below in the file NetApp_VCS.pm:

$cmd = "$main::ssh -n root\@$host '$remote_cmd'";

After this finding it was no great feat to figure out what was going wrong and what I had to do. I just removed root@ from the script and it started working, because ssh now honours the config file in the .ssh directory and logs in as vcsagent. Alternatively I could have replaced root with vcsagent directly in the script, keeping it simple and avoiding the config file altogether, but I felt this way was much better.
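After the edit the line looks like this (my modification, not code NetApp ships):

# No user specified, so ssh falls back to ~/.ssh/config for this host
$cmd = "$main::ssh -n $host '$remote_cmd'";

With no user given on the command line, ssh consults the Host testfiler1 entry in root's ~/.ssh/config and logs in as vcsagent.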

Unfortunately, to this day there is no alternative to modifying the script, as neither NetApp nor Veritas could help us beyond the statement "we will raise a product enhancement request".


Update: you also need to give the security-priv-advanced capability to the user, so the role should look like the one below.

testfiler01> useradmin role list exportfs
Name: exportfs
Info: To manage NFS exports from CLI
Allowed Capabilities: cli-exportfs*,cli-lock*,cli-priv*,cli-sm_mon*,security-priv-advanced
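If you created the role earlier without this capability, you shouldn't need to recreate it; something along these lines ought to work (a sketch from memory of the 7-mode useradmin syntax, so verify it on your filer first):

useradmin role modify exportfs -a cli-exportfs*,cli-lock*,cli-priv*,cli-sm_mon*,security-priv-advanced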


Monday, September 21, 2009

NetApp NFS agent for VCS - Part 2

In the last post I explained why I needed this agent and what its features are. In this post I will describe how I implemented the agent on a 4-node RHEL 5.2 VCS cluster in our test environment. As this post is centred on configuring the NetApp NFS agent for VCS, I will not cover installing and configuring VCS itself on RHEL.

First I created an NFS volume on our filer testfiler1 and exported it with rw access to all 4 nodes of the cluster (lablincl1n1, lablincl1n2, lablincl1n3, lablincl1n4); to keep it simple I used sec=sys rather than Kerberos or anything else. The next step was to download the agent from the NOW site and install it on all the cluster nodes; this was pretty straightforward and well documented in the admin guide, so it went well with no hurdles.
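For reference, the export rule was a single line in the filer's /etc/exports along these lines (the volume name oradata01 is made up for illustration; only the node names are real):

/vol/oradata01 -sec=sys,rw=lablincl1n1:lablincl1n2:lablincl1n3:lablincl1n4,root=lablincl1n1:lablincl1n2:lablincl1n3:lablincl1n4

Running exportfs -a afterwards (re)exports everything listed in /etc/exports.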

Once the agent installation and volume creation were done, I started configuring the NFS share in the agent through the GUI.

I updated the following attributes (the full resource definition is sketched below):
  • FilerName = testfiler1 (the name of the NetApp filer exporting the NFS share)
  • MountPoint = the local mount point name
  • MountOptions = Oracle-specific mount options
  • FilerPathName = the NFS volume name
  • NodeNICs = lablincl1n1, lablincl1n2, lablincl1n3, lablincl1n4 (the names of all the cluster nodes)
  • ClearNFSLocks = 2
  • UseSSH = 1

I left the rest of the options untouched since their defaults were fine: FilerPingTimeout=240, RebootOption=empty, HostingFilerName=empty, RouteViaAddress=empty. I also skipped MultiNIC and the /etc/hosts entries, because NIC teaming was already done at the OS level and I felt too lazy to maintain lots of IP addresses in the hosts file; as a matter of fact, I knew our BIND servers were robust enough.
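Put together, the resource definition in main.cf looked roughly like the sketch below. Treat it as an illustration only: the resource type name (NetAppNFS), the mount point, the volume path, and the mount options are reconstructed from memory rather than copied from my cluster, so check them against the types file shipped with the agent:

NetAppNFS ora_nfs (
    FilerName = "testfiler1"
    FilerPathName = "/vol/oradata01"
    MountPoint = "/u01/oradata"
    MountOptions = "rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600"
    NodeNICs = { lablincl1n1, lablincl1n2, lablincl1n3, lablincl1n4 }
    ClearNFSLocks = 2
    UseSSH = 1
    )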

Note:
Please don't get confused by the HostingFilerName field; you need it only if you are using a vfiler. If you are exporting the NFS volume from a vfiler, put the vfiler name in the FilerName field and the name of the physical filer (on which the vfiler is created) in HostingFilerName. For example, if a (hypothetical) vfiler vfiler1 hosted on testfiler1 exported the volume, FilerName would be vfiler1 and HostingFilerName would be testfiler1.

The next step was configuring SSH, which was pretty easy: just use the "ssh-keygen -t dsa" command to generate a public/private key pair for root on each of your nodes, and copy the public keys into the "authorized_keys" file in the /etc/sshd/root/.ssh folder of your filer.
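A minimal sketch of that key setup, run as root on each node, assuming the filer's root volume is NFS-mounted at /mnt/testfiler1 (a hypothetical mount point):

# Generate a passphrase-less DSA key pair; the agent runs ssh non-interactively
ssh-keygen -t dsa -N "" -f /root/.ssh/id_dsa
# Append this node's public key to root's authorized_keys on the filer
cat /root/.ssh/id_dsa.pub >> /mnt/testfiler1/etc/sshd/root/.ssh/authorized_keys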

With that, the configuration was complete and everything was working as expected, within just 4 hours of effort.

At this point everything was done except one very important thing: security. Following the agent's admin guide I had added the DSA keys to root's authorized_keys file, so anyone with root access on any of the 4 cluster nodes would also have root access on my filer, which I wasn't comfortable with. So I started looking through the agent's attributes for a way to configure a different account name for the agent to use, but to my surprise there was nothing, and none of the documents touched on it either. So I went my own way to solve it, and it worked well after some extra effort.

As this post is getting quite long, I will cover configuring a different user name in the VCS agent in the next post.

Thursday, September 17, 2009

NetApp NFS agent for VCS

Recently we found ourselves going through a space crunch again, but this time it was on our DMX systems spinning 15k FC disks. We started looking at the space allocation and soon found a lot of low-IOPS Oracle databases using that space; adding up their allocations came to 460TB in total.

Wow, wasn't that enough space to give us a few more months before placing new orders? Oh yes. So we decided to move them onto NetApp boxes using 7.2k 1TB SATA disk storage, not over FC or iSCSI but over NFS, as I knew NetApp provides a VCS agent that works with their NFS exports and gives some cool features. Though I had never used it, I was confident enough that it would work, so I started implementing it in our test environment.

Here are the details of its features.

The NetApp NFS client agent for VCS on Red Hat Linux/Solaris/SUSE Linux monitors mount points on NetApp storage systems. In this environment the clustered nodes (or a single node) use the NFS protocol to access the shared volume on the NetApp storage system, and the agent carries out commands from VCS to bring resources online, monitor their status, and take them offline as needed.

Key Features for version 5.0 of agent are given below.
  • Supports VCS 4.1 and 5.0*
  • Supports Exportfs persistency
  • Supports IPMultiNIC and MultiNICA
  • Supports Data ONTAP 7.1.x or later
  • Supports fine granularity NFS lock clearing (requires Data ONTAP 7.1.1 or later)
  • Supports communication with the storage system through SSH, in addition to RSH
  • Multithreading (NumThreads >1) is supported (requires IPMultiNIC with MultiNICA)
  • Supports automatic fencing of the export (ro access for the other cluster nodes) as a resource moves from one node to another
  • Supports failover of a single resource group when multiple resource groups of the same type are active on the same cluster node

Kernel Requirement
Linux Kernel 2.6.9-34.EL, 2.6.9-34.ELsmp for RHEL, 2.6.5-7.287.3-smp for SUSE

* VCS 4.1 is not supported for SUSE Linux
# With Solaris 10, local zones are also supported in addition to global zones.


In the next part I will post how to implement it, which also requires some modification to the script.



Saturday, September 12, 2009

SSH broken if you disable Telnet in ONTAP 7.3.1

And here's another bug, which we hit last month.

Last month, when I was setting up our new filers, I disabled telnet on the systems along with lots of other tweaking, but later when I tried to connect to a system over SSH it refused. Thinking I might have turned off some other deep registry feature, I went through the entire registry but couldn't find anything suspicious.

So I turned on SSH verbose logging, tried re-running the SSH setup with different key sizes, and what not, but no joy. Finally I tried enabling telnet and voila, it worked. By the time it worked it was around 7 pm, so I called it a day and left the office scratching my head.
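For context, the commands involved were along these lines on 7-mode (quoted from memory, so double-check against your Data ONTAP version):

options telnet.enable off
secureadmin setup ssh
secureadmin enable ssh2

The setup step is where you get prompted for the host and server key sizes.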

The next morning I again started looking around for something obvious I might be missing, but no, I couldn't find anything, even on the NOW site. So I opened a case with NetApp, and even the NetApp engineer couldn't understand why the system was behaving like this; but finally, late in the evening, he came back to me with BURT #344484, which is fixed in 7.3.1.1P2.

Now there was a big problem, as I wasn't quite ready to upgrade my systems to a patched version, so I decided to leave telnet enabled and wait for 7.3.2 to arrive. But ever since, I kept getting bugged by the IT-security team: I was trying to get these systems connected to the network so I could start allocating some space and get rid of the space-low warnings, but they wouldn't allow it while telnet was enabled on the filers. Finally, this past week, I noticed 7.3.2RC1 and 8.0RC1 on the NOW site and breathed a sigh of relief: I believe the 7.3.2 GA should now be available within a month, and finally I can have my systems meet my organization's security policy and, more importantly, get rid of the pending space allocation requests.