
Ultimate NAS How-To

Step 3: Resource Configuration

First we need to start the corosync service followed by the pacemaker service. Bring up all your nodes that way (a minimal sketch follows), then run the command crm_mon -1. It should give you output similar to the example after the sketch:
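
Assuming the stock init scripts shipped with the corosync and pacemaker packages, that step on each node is simply:

# On every node, in this order:
service corosync start
service pacemaker start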

============
Last updated: Wed Aug  8 16:17:14 2012
Last change: Wed Aug  8 16:17:09 2012 via crm_attribute on nas1-3
Stack: openais
Current DC: nas1-2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, 3 expected votes
0 Resources configured.
============

Online: [ nas1-1 nas1-2 nas1-3 ]

If you have problems with nodes coming online, review your configuration. Make sure you distributed /etc/corosync/authkey to all your nodes, that each node has a unique nodeid, that the bindnetaddr is correct, and that host resolution works.
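
A few quick checks on a node that refuses to join (nothing exotic, just the usual suspects):

corosync-cfgtool -s                        # ring status as seen by the local node
md5sum /etc/corosync/authkey               # the checksum should match on every node
grep -E 'nodeid|bindnetaddr' /etc/corosync/corosync.conf
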
Next, we are going to create a dlm and an o2cb resource. Execute the command crm configure edit. That will drop you into a vim-like interface for configuring cluster resources. Pacemaker stores its configuration in what is known as the CIB (Cluster Information Base). The CIB is formatted in XML, but creating a configuration by hand is complex and error-prone. The crm shell has a much simpler language for creating resources, constraints and other configuration, and it is also the tool used to manage the cluster.
After you drop into the crm editing interface, enter the commands below and save the file. Again, it's a vim interface, so if you aren't familiar with vim, find a cheat sheet first:

primitive p_dlm ocf:pacemaker:controld \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="120s"
primitive p_o2cb ocf:pacemaker:o2cb \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="120s"
clone c_dlm-clone p_dlm \
        meta globally-unique="false" interleave="true"
clone c_o2cb-clone p_o2cb \
        meta globally-unique="false" interleave="true"
colocation o2cb-with-dlm inf: c_o2cb-clone c_dlm-clone
order start-o2cb-after-dlm inf: c_dlm-clone c_o2cb-clone
property $id="cib-bootstrap-options" \
        no-quorum-policy="ignore" \
        stonith-enabled="false"

After you drop back to the bash shell, execute the crm_mon -1 command and you should see something similar to this:

============
Last updated: Wed Aug  8 16:39:56 2012
Last change: Wed Aug  8 16:38:35 2012 via crmd on nas1-2
Stack: openais
Current DC: nas1-2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, 3 expected votes
6 Resources configured.
============

Online: [ nas1-2 nas1-3 nas1-1 ]

 Clone Set: c_dlm-clone [p_dlm]
     Started: [ nas1-1 nas1-2 nas1-3 ]
 Clone Set: c_o2cb-clone [p_o2cb]
     Started: [ nas1-1 nas1-2 nas1-3 ]

The commands above configured two resources in the cluster. To see what a resource agent provides and which parameters it accepts, run the command crm ra meta o2cb:

OCFS2 daemon resource agent (ocf:pacemaker:o2cb)

This Resource Agent controls the userspace daemon needed by OCFS2.

Parameters (* denotes required, [] the default):

sysfs (string, [/sys/fs]): Sysfs location
    Location where sysfs is mounted

configfs (string, [/sys/kernel/config]): Configfs location
    Location where configfs is mounted

stack (string, [pcmk]): Userspace stack
    Which userspace stack to use.  Known values: pcmk, cman

daemon_timeout (string, [10]): Daemon Timeout
    Number of seconds to allow the control daemon to come up

Operations' defaults (advisory minimum):

    start         timeout=90
    stop          timeout=100
    monitor       timeout=20

Now run the same command for the controld resource (crm ra meta ocf:pacemaker:controld):

DLM Agent for OCFS2 (ocf:pacemaker:controld)

This Resource Agent can control the dlm_controld services needed by ocfs2.
It assumes that dlm_controld is in your default PATH.
In most cases, it should be run as an anonymous clone.

Parameters (* denotes required, [] the default):

args (string, [-q 0]): DLM Options
    Any additional options to start the dlm_controld service with

configdir (string, [/sys/kernel/config]): Location of configfs
    The location where configfs is or should be mounted

daemon (string, [dlm_controld.pcmk]): The daemon to start
    The daemon to start - supports gfs_controld(.pcmk) and dlm_controld(.pcmk)

Operations' defaults (advisory minimum):

    start         timeout=90
    stop          timeout=100
    monitor       interval=10 timeout=20 start-delay=0

Pacemaker resources are essentially scripts that can be found in /usr/lib/ocf/resource.d. As you can see from running the crm ra meta command, the resources we configured are required in order to use OCFS2. The o2cb resource lets the cluster control the OCFS2 userspace daemon, and the controld resource agent controls the dlm_controld service(s), which are essentially an interface to the Linux kernel's distributed lock manager, since the cluster stack runs in userspace.
We configured these resources using the primitive command and then cloned them. I won't try to explain the concept of clones in depth because you can read all about them here, but essentially they allow you to run a resource globally and indiscriminately (anonymously) on all nodes. We then added an ordering constraint that tells the cluster in what order to start the resources, a colocation constraint so that the configured resources are always co-located, and two cluster properties so that stonith is disabled (for the time being) and loss of quorum does not result in resources being stopped (needed if you aren't running at least 3 nodes).

At this point, since we have the required services running, we can create an OCFS2 filesystem. For example, the command mkfs.ocfs2 -J block64 -C 16K --fs-feature-level=max-features /dev/xvdc1 will create an OCFS2 filesystem on /dev/xvdc1 with a 64-bit journal, a cluster size of 16K and every filesystem feature enabled. I detailed some of the features in my first how-to entry.
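
As a sketch of that step plus a quick sanity check (the device is just the example from the command above and the test mount point is made up, so substitute your own):

mkfs.ocfs2 -J block64 -C 16K --fs-feature-level=max-features /dev/xvdc1

# Quick single-node test mount -- the dlm/o2cb resources above must be running:
mkdir -p /mnt/ocfs2-test
mount -t ocfs2 /dev/xvdc1 /mnt/ocfs2-test
umount /mnt/ocfs2-test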

Note: If you aren't using the uek kernel from Oracle's repository and the latest ocfs2-tools, then you won't be able to use the "max-features" value for --fs-feature-level.

Create a filesystem, then try mounting it on one of your nodes (as sketched above). If you don't get any errors, proceed to the crm configuration editing shell and append a filesystem resource, a clone of that resource and additional constraints to your configuration:

primitive p_sharedFS1 ocf:heartbeat:Filesystem \
        params options="acl,localalloc=16,atime_quantum=86400,data=writeback" device="/dev/xvdc2" directory="/srv/samba/shares/data" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
clone c_sharedFS1-clone p_sharedFS1 \
        meta globally-unique="false" interleave="true"
order start-sharedFS1-after-o2cb inf: c_o2cb-clone c_sharedFS1-clone

This configuration block tells the cluster to create a heartbeat-class Filesystem resource and gives it all the options you would need to provide if you were going to mount it yourself. We then clone that resource so it can run anonymously cluster-wide, and instruct the cluster that the c_sharedFS1-clone resource should be started after the c_o2cb-clone resource.

Once again save the configuration and run the command crm_mon -1. It should look something like this:

============
Last updated: Wed Aug  8 17:59:30 2012
Last change: Wed Aug  8 17:58:32 2012 via cibadmin on nas1-1
Stack: openais
Current DC: nas1-2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, 3 expected votes
9 Resources configured.
============

Online: [ nas1-2 nas1-3 nas1-1 ]

 Clone Set: c_dlm-clone [p_dlm]
     Started: [ nas1-1 nas1-2 nas1-3 ]
 Clone Set: c_o2cb-clone [p_o2cb]
     Started: [ nas1-1 nas1-2 nas1-3 ]
 Clone Set: c_sharedFS1-clone [p_sharedFS1]
     Started: [ nas1-1 nas1-2 nas1-3 ]

Now that we have a working filesystem, we should proceed to configuring CTDB and at least one dedicated IP address for our cluster.
Since CTDB manages samba and its related services (winbind and nmbd) in addition to NFS, you'll need to ensure that your /etc/samba/smb.conf is everything you want it to be and that your /etc/exports reflects any filesystems you want to export. This part is a little complicated, so here is an overview of what we are going to do:

  1. Put all cluster nodes minus one on standby
  2. Configure csync2
  3. Configure a non-cloned IP resource including a portblock/unblock resource
  4. Add and configure a CTDB resource clone and necessary constraints
  5. Manually start the smb service and add the NAS cluster to Active Directory
  6. Manually stop the smb service and modify the CTDB resource
  7. Validate the configuration

To put a node in standby, execute the command crm node standby [node-name]. Again, do that for all your nodes save one.
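
With the node names used throughout this guide, and keeping nas1-1 active, that would be:

crm node standby nas1-2
crm node standby nas1-3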

We now need to create a configuration file for csync2. Csync2's role will be to keep our configuration files in sync across the cluster, and to keep the tickle directory (yes, yes, I'll get to explaining this 🙂 ) in sync as well. Here's a copy of my config to get you started. Copy it to /etc/csync2.cfg.

nossl * *;
group nas-group1 {
 host nas1-1;
 host nas1-2;
 host nas1-3;
 key /etc/csync2/ticklegroup.key;
 include /etc/csync2.cfg;
 include /dev/shm/tickle;
 include /etc/samba/smb.conf;
 include /etc/exports;
 include /etc/vsftpd/vsftpd.conf;
 auto younger;
 }
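
You will also need the shared key referenced in the config plus an initial sync run. A rough sketch of that step is below; csync2 is normally started from xinetd, so make sure that service is enabled on every node:

# Generate the key once, then copy it and the config to the other nodes
# (create /etc/csync2 there first if it does not exist):
csync2 -k /etc/csync2/ticklegroup.key
scp /etc/csync2.cfg nas1-2:/etc/
scp /etc/csync2/ticklegroup.key nas1-2:/etc/csync2/
scp /etc/csync2.cfg nas1-3:/etc/
scp /etc/csync2/ticklegroup.key nas1-3:/etc/csync2/

# Test a verbose sync run from the first node:
csync2 -xv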

The file is self-explanatory to some degree. The config basically says "sync these files across this group; no SSL is needed, and conflicts are resolved automatically using the file with the most recent time stamp" (hence the reason I decided to change the CTDB resource agent).
You'll notice that I've included /dev/shm/tickle in the sync as well. This directory will need to be created at every boot, so add an entry to /etc/rc.local to do it.
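
Something as simple as this, appended to /etc/rc.local on every node, will do:

# Recreate the tickle directory used by the portblock resources at boot:
mkdir -p /dev/shm/tickle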

Now we’ll go back to the crm editor to append the rest of the resource configuration.

primitive p_block-ip1 ocf:heartbeat:portblock \
        params ip="172.24.100.15" protocol="tcp" portno="137,138,139,445,2049,595,596,597,598,599" action="block" \
        op monitor interval="10" timeout="60" depth="0"
primitive p_unblock-ip1 ocf:heartbeat:portblock \
        params ip="172.24.100.15" protocol="tcp" portno="137,138,139,445,2049,595,596,597,598,599" action="unblock" tickle_dir="/dev/shm/tickle" sync_script="/usr/sbin/csync2 -xvr" \
        op monitor interval="10" timeout="60" depth="0"
primitive p_ip1 ocf:heartbeat:IPaddr2 \
        params ip="172.24.100.15" cidr_netmask="24" \
        op monitor interval="5s"
primitive p_ctdb ocf:heartbeat:CTDB \
        params ctdb_recovery_lock="/srv/samba/shares/data/ctdb.lock" ctdb_manages_nfs="yes" ctdb_manages_samba="no" ctdb_manages_winbind="no" ctdb_start_as_disabled="no" \
        op monitor interval="10" timeout="30" \
        op start interval="0" timeout="360" \
        op stop interval="0" timeout="100"
primitive p_vsftpd lsb:vsftpd \
        op start interval="0" timeout="60s" \
        op monitor interval="5s" timeout="20s" \
        op stop interval="0" timeout="60s"
group g_ip1-group p_block-ip1 p_ip1 p_unblock-ip1
clone c_ctdb-clone p_ctdb \
        meta globally-unique="false" interleave="true"
clone c_vsftpd-clone p_vsftpd \
        meta globally-unique="false" interleave="true"
colocation ip1-with-ctdb-vsftpd-sharedFS1 inf: ( g_ip1-group ) c_ctdb-clone c_vsftpd-clone c_sharedFS1-clone
order start-block-ip1-after-sharedFS1 inf: c_sharedFS1-clone p_block-ip1
order start-ip1-after-block-ip1 inf: p_block-ip1 p_ip1

Save this configuration, then run crm_mon -1 again. Here is what my cluster looks like at this point:

============
Last updated: Fri Aug 10 10:10:06 2012
Last change: Fri Aug 10 10:04:47 2012 via cibadmin on nas1-1
Stack: openais
Current DC: nas1-2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, 3 expected votes
18 Resources configured.
============

Node nas1-2: standby
Node nas1-3: standby
Online: [ nas1-1 ]

 Clone Set: c_dlm-clone [p_dlm]
     Started: [ nas1-1 ]
     Stopped: [ p_dlm:1 p_dlm:2 ]
 Clone Set: c_o2cb-clone [p_o2cb]
     Started: [ nas1-1 ]
     Stopped: [ p_o2cb:1 p_o2cb:2 ]
 Clone Set: c_sharedFS1-clone [p_sharedFS1]
     Started: [ nas1-1 ]
     Stopped: [ p_sharedFS1:1 p_sharedFS1:2 ]
 Resource Group: g_ip1-group
     p_block-ip1        (ocf::heartbeat:portblock):     Started nas1-1
     p_ip1      (ocf::heartbeat:IPaddr2):       Started nas1-1
     p_unblock-ip1      (ocf::heartbeat:portblock):     Started nas1-1
 Clone Set: c_ctdb-clone [p_ctdb]
     Started: [ nas1-1 ]
     Stopped: [ p_ctdb:1 p_ctdb:2 ]
 Clone Set: c_vsftpd-clone [p_vsftpd]
     Started: [ nas1-1 ]
     Stopped: [ p_vsftpd:1 p_vsftpd:2 ]

Let me try to explain what we did. We created and configured a CTDB resource as well as a vsftpd resource and cloned both. If you look closely at the crm configuration for the vsftpd primitive, you'll notice that it's an lsb-class resource, meaning that Pacemaker will just use the standard script found in /etc/init.d to manage it.
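
Because Pacemaker drives an lsb resource purely through its init script, the script needs to be LSB compliant. A quick sanity check on one node might look like this (the expected exit codes come from the LSB spec):

/etc/init.d/vsftpd start;  echo $?    # 0 on success
/etc/init.d/vsftpd status; echo $?    # 0 while running
/etc/init.d/vsftpd stop;   echo $?    # 0 on success
/etc/init.d/vsftpd status; echo $?    # 3 once the service is stopped
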
We also created an IPaddr2 resource plus portblock and unblock resources, then grouped them together. Notice that we didn't clone that group; this type of IPaddr2 resource should only run on a single node.
The portblock and unblock resources essentially speed up recovery. They work together and use iptables, in addition to some IP magic, when failover occurs or the IP resource migrates. You'll notice that the unblock resource uses the csync2 command to sync its tickle_dir. If you examine that directory, you should have a file that corresponds to each one of your IPs; these files contain a list of connections for each IP address. When a migration or failover happens, the node that takes over the IP sends an ACK to all clients so they switch over. It's a proactive approach and should be much faster than waiting for clients to time out and then ARPing.
We also needed some constraints to ensure that resources start in the right order and that chaos doesn’t occur. For example, you don’t want nodes running an IP address resource if the filesystem is not mounted or services aren’t running. That would be bad for business. The constraints work to that effect, assuming you define them correctly ;).
The most interesting constraint we defined is colocation ip1-with-ctdb-vsftpd-sharedFS1 inf: ( g_ip1-group ) c_ctdb-clone c_vsftpd-clone c_sharedFS1-clone. This is referred to as a resource set, though it's kind of difficult to grasp the concept unless you examine the XML in the CIB. Execute the command cibadmin --query --scope constraints to see it.
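
The command prints raw XML. A rough reconstruction of what it should contain is below; the element names are standard CIB syntax, but the auto-generated set ids will differ on your cluster:

<constraints>
  <rsc_colocation id="ip1-with-ctdb-vsftpd-sharedFS1" score="INFINITY">
    <resource_set id="ip1-with-ctdb-vsftpd-sharedFS1-0" sequential="false">
      <resource_ref id="g_ip1-group"/>
    </resource_set>
    <resource_set id="ip1-with-ctdb-vsftpd-sharedFS1-1">
      <resource_ref id="c_ctdb-clone"/>
      <resource_ref id="c_vsftpd-clone"/>
      <resource_ref id="c_sharedFS1-clone"/>
    </resource_set>
  </rsc_colocation>
  <rsc_colocation id="o2cb-with-dlm" score="INFINITY" rsc="c_o2cb-clone" with-rsc="c_dlm-clone"/>
  <rsc_order id="start-o2cb-after-dlm" score="INFINITY" first="c_dlm-clone" then="c_o2cb-clone"/>
  <rsc_order id="start-sharedFS1-after-o2cb" score="INFINITY" first="c_o2cb-clone" then="c_sharedFS1-clone"/>
  <rsc_order id="start-block-ip1-after-sharedFS1" score="INFINITY" first="c_sharedFS1-clone" then="p_block-ip1"/>
  <rsc_order id="start-ip1-after-block-ip1" score="INFINITY" first="p_block-ip1" then="p_ip1"/>
</constraints>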

A resource set is sort of like a resource group. You can read more about them here, but the basic concept is that they allow you to create simple or complex dependencies. A plain-text explanation of our configuration would read roughly as follows:

  1. In order for the shared filesystem, ctdb and vsftpd to run, those three resources must run concurrently (this would not be the case if their set had "sequential=false").
  2. In order for the resource group "g_ip1-group" to start, the shared filesystem, ctdb and vsftpd must have already been started successfully.

Hopefully that makes sense. Either way, let's move on; now we need to add the cluster to the domain.

In order to add the cluster to the domain, we need to start samba manually first. Execute the /etc/init.d/smb start command, then run net ads join -U [Member-of-Domain-Admins]. The output should look just like mine below:

[root@nas1-1 events.d]# net ads join -U Administrator
Enter Administrator's password:
Using short domain name -- FOO
Joined 'NAS-CLUSTER1' to realm 'foo.local'
Not doing automatic DNS update in a clustered setup.

The message Not doing automatic DNS update in a clustered setup. lets you know that Samba recognizes that you are clustering and that your TDBs will be set up properly.
Otherwise, you will have problems when we reconfigure the CTDB resource to manage smb and winbind. Speaking of that, let's go ahead and do that right now.
Go ahead and manually stop samba, then execute crm resource stop c_ctdb-clone. Next, enter the vim-like interface to edit your cluster config and change ctdb_manages_samba="no" to ctdb_manages_samba="yes" and ctdb_manages_winbind="no" to ctdb_manages_winbind="yes". Save the configuration, start the c_ctdb-clone resource again and quickly tail the log.ctdb file under /var/log. You should see lines similar to those below:

2012/08/10 11:59:57.173244 [17181]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2012/08/10 11:59:57.873023 [recoverd:17243]: The rerecovery timeout has elapsed. We now allow recoveries to trigger again.
2012/08/10 11:59:57.875639 [17181]: Forced running of eventscripts with arguments ipreallocated
2012/08/10 11:59:58.174155 [17181]: CTDB_WAIT_UNTIL_RECOVERED
2012/08/10 11:59:58.174246 [17181]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2012/08/10 11:59:59.174660 [17181]: CTDB_WAIT_UNTIL_RECOVERED
2012/08/10 11:59:59.174726 [17181]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2012/08/10 12:00:00.175457 [17181]: CTDB_WAIT_UNTIL_RECOVERED
2012/08/10 12:00:00.175529 [17181]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2012/08/10 12:00:01.175943 [17181]: CTDB_WAIT_UNTIL_RECOVERED
2012/08/10 12:00:01.176065 [17181]: ctdb_recheck_presistent_health: OK[7] FAIL[0]
2012/08/10 12:00:01.176079 [17181]: server/ctdb_monitor.c:310 Recoveries finished. Running the "startup" event.
2012/08/10 12:00:01.540170 [17181]: 50.samba: Starting Winbind services: [  OK  ]
2012/08/10 12:00:01.832467 [17181]: 50.samba: Starting SMB services: [  OK  ]
2012/08/10 12:00:01.961015 [17181]: 50.samba: Starting NMB services: [  OK  ]
2012/08/10 12:00:02.087247 [17181]: 60.nfs: Shutting down NFS daemon: [FAILED]
2012/08/10 12:00:02.119600 [17181]: 60.nfs: Shutting down NFS mountd: [FAILED]
2012/08/10 12:00:02.161889 [17181]: 60.nfs: Shutting down NFS quotas: [FAILED]
2012/08/10 12:00:02.334433 [17181]: 60.nfs: Stopping NFS statd: [FAILED]
2012/08/10 12:00:02.464439 [17181]: 60.nfs: Starting NFS statd: [  OK  ]
2012/08/10 12:00:02.639363 [17181]: 60.nfs: Starting NFS services:  [  OK  ]
2012/08/10 12:00:02.661867 [17181]: 60.nfs: Starting NFS quotas: [  OK  ]
2012/08/10 12:00:02.676858 [17181]: 60.nfs: Starting NFS mountd: [  OK  ]
2012/08/10 12:00:02.901706 [17181]: 60.nfs: Stopping RPC idmapd: [  OK  ]
2012/08/10 12:00:03.016523 [17181]: 60.nfs: Starting RPC idmapd: [  OK  ]
2012/08/10 12:00:03.052096 [17181]: 60.nfs: Starting NFS daemon: [  OK  ]
2012/08/10 12:00:03.162077 [17181]: startup event OK - enabling monitoring
2012/08/10 12:00:05.651235 [17181]: monitor event OK - node re-enabled
2012/08/10 12:00:05.651896 [17181]: Node became HEALTHY. Ask recovery master 0 to perform ip reallocation
2012/08/10 12:00:05.886716 [17181]: Forced running of eventscripts with arguments ipreallocated

Notice that it appears that CTDB successfully started samba and winbind, but the proof is in the pudding. Run the command wbinfo -g; you should get back a list of your AD groups. Also, if you run crm_mon, you'll notice that all your resources are now running.
If you've confirmed all that, then we are ready to bring the other nodes online. First, stop the CTDB resource using the command crm resource stop c_ctdb-clone and wait a few moments. All your resources except dlm and controld should be stopped (because of the constraints).
Now execute the command crm node online [node-name] for each of your standby nodes, then start the c_ctdb-clone resource. It may take some time for CTDB to start up on some of your nodes; you can look at the log.ctdb file to see if you have any errors.
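
Using the example node names once more, the sequence looks roughly like this:

crm node online nas1-2
crm node online nas1-3
crm resource start c_ctdb-clone

# Keep an eye on things while CTDB starts up everywhere:
crm_mon -1
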
Once everything starts up, here is what it should look like:

============
Last updated: Fri Aug 10 13:00:33 2012
Last change: Fri Aug 10 12:59:49 2012 via cibadmin on nas1-1
Stack: openais
Current DC: nas1-2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, 3 expected votes
18 Resources configured.
============

Online: [ nas1-2 nas1-3 nas1-1 ]

 Clone Set: c_dlm-clone [p_dlm]
     Started: [ nas1-1 nas1-2 nas1-3 ]
 Clone Set: c_o2cb-clone [p_o2cb]
     Started: [ nas1-1 nas1-2 nas1-3 ]
 Clone Set: c_sharedFS1-clone [p_sharedFS1]
     Started: [ nas1-1 nas1-2 nas1-3 ]
 Resource Group: g_ip1-group
     p_block-ip1        (ocf::heartbeat:portblock):     Started nas1-1
     p_ip1      (ocf::heartbeat:IPaddr2):       Started nas1-1
     p_unblock-ip1      (ocf::heartbeat:portblock):     Started nas1-1
 Clone Set: c_ctdb-clone [p_ctdb]
     Started: [ nas1-1 nas1-2 nas1-3 ]
 Clone Set: c_vsftpd-clone [p_vsftpd]
     Started: [ nas1-1 nas1-2 nas1-3 ]

Phew – that was a long road, huh? We're still not done yet, though. Let's take this baby for a spin!


Comments

  • Excellent.

    But I would like to see a samba ctdb only from you.

    Possible ? 🙂

    • I could, but samba already has a pretty good explanation of how to do it at ctdb.samba.org. Not to mention, there are many reasons why you would not want to run ctdb, samba and a cluster filesystem without a full blown cluster-stack.

  • Hi,

    When I try and apply the CTDB patch i get the following:

    [root@cluster1 heartbeat]# cat ~/ctdb.patch | patch
    patching file CTDB
    Hunk #1 succeeded at 78 with fuzz 2 (offset -3 lines).
    patch: **** malformed patch at line 34: @@ -371,6 +391,11 @@

    Any suggestions ?

    I am using the latest resource agents from GIT as I am using GlusterFS instead of fighting with DRBD / OCFS2.

    I am also running directly on Oracle Linux rather than Centos with the kernel patched in.

    Your guide has worked for the majority of it so far with a few teeth gnashes between parts 🙂

    Cheers,

    Kane.

    • Hey, thanks for the comment and sorry for any troubles. I tried to test as much as possible lol.
      Perhaps it's the formatting of the patch? Try this db link . Let me know if it works/doesn't work for you.
      If you have time to elaborate, I'd love to hear about any other frustrations or problems you experienced.

      Thanks

  • That worked, thanks.

    Most of my problems were getting the ocfs2_controld.pcmk to come up, it would install each time but pacemaker could never start it. dlm_docntold.pcmk was running but there was no /dlm for ocfs2 to attach onto.

    Otherwise it was silly things like DRDB tools (8.13) and kernel mod (8.11) are different in Oracle so when you yum update you then have to downgrade the tools or exclude them from the update.

    I have to document the build I am doing for work so I will drop you a copy of it, GlusterFS once running seems to have a lot less to go wrong but of course only time and testing will tell.

    Cheers

    Kane.
