Clustering, High Availability, How-Tos, Linux | August 5, 2012, 9:02 pm

Ultimate NAS How-To

Step 2: Configuring Pacemaker and Corosync

So if you haven’t already, make sure pacemaker, corosync, resource-agents and openais are installed. Then we need to apply a small patch to the CTDB resource-agent script in /usr/lib/ocf/resource.d/heartbeat. The stock CTDB resource-agent script doesn’t manage NFS; having CTDB manage it simplifies administration tremendously and should help speed up recovery, since CTDB includes scripts that send tickle-ACKs to clients when services are migrated or a failover occurs.
In addition, I commented out two functions that modify the smb.conf file when CTDB is started or stopped. Left in place, they would cause csync2 to mark the smb.conf file as dirty even though no real changes had occurred.
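If you still need the packages, installation on a yum-based distribution (this series targets CentOS; package names may differ on your distribution) would look something like this:

```shell
# Install the cluster stack plus clustered Samba.
# Package names are typical for CentOS 6 with the HA repos enabled;
# adjust them for your distribution.
yum install -y pacemaker corosync resource-agents openais ctdb

# The resource-agent script we're about to patch lives here:
ls -l /usr/lib/ocf/resource.d/heartbeat/CTDB
```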

--- /usr/lib/ocf/resource.d/heartbeat/CTDB      2012-08-08 12:06:01.806465356 -0400
+++ /usr/lib/ocf/resource.d/heartbeat/CTDB      2012-08-09 21:35:47.368810599 -0400
@@ -81,6 +81,8 @@
 : ${OCF_RESKEY_ctdb_service_nmb:=""}
 : ${OCF_RESKEY_ctdb_service_winbind:=""}
 : ${OCF_RESKEY_ctdb_samba_skip_share_check:=yes}
+: ${OCF_RESKEY_ctdb_manages_nfs:=no}
+: ${OCF_RESKEY_ctdb_nfs_skip_share_check:=yes}
 : ${OCF_RESKEY_ctdb_monitor_free_memory:=100}
 : ${OCF_RESKEY_ctdb_start_as_disabled:=yes}

@@ -186,6 +188,24 @@
 <shortdesc lang="en">Skip share check during monitor?</shortdesc>
 <content type="boolean" default="yes" />
 </parameter>
+<parameter name="ctdb_manages_nfs" unique="0" required="0">
+<longdesc lang="en">
+Should CTDB manage starting/stopping the NFS service for you?
+</longdesc>
+<shortdesc lang="en">Should CTDB manage NFS?</shortdesc>
+<content type="boolean" default="no" />
+</parameter>
+
+<parameter name="ctdb_nfs_skip_share_check" unique="0" required="0">
+<longdesc lang="en">
+If there are very many shares it may not be feasible to check that all
+of them are available during each monitoring interval.  In that case
+this check can be disabled.
+</longdesc>
+<shortdesc lang="en">Skip share check during monitor?</shortdesc>
+<content type="boolean" default="yes" />
+</parameter>
+
 <parameter name="ctdb_monitor_free_memory" unique="0" required="0">
 <longdesc lang="en">
 If the amount of free memory drops below this value the node will
@@ -371,6 +391,11 @@
        else
                chmod a-x $event_dir/50.samba
        fi
+       if ocf_is_true "$OCF_RESKEY_ctdb_manages_nfs"; then
+               chmod u+x $event_dir/60.nfs
+       else
+               chmod a-x $event_dir/60.nfs
+       fi
 }

 # This function has no effect (currently no way to set CTDB_SET_*)
@@ -454,6 +479,8 @@
 CTDB_SAMBA_SKIP_SHARE_CHECK=$(ocf_is_true "$OCF_RESKEY_ctdb_samba_skip_share_check" && echo 'yes' || echo 'no')
 CTDB_MANAGES_SAMBA=$(ocf_is_true "$OCF_RESKEY_ctdb_manages_samba" && echo 'yes' || echo 'no')
 CTDB_MANAGES_WINBIND=$(ocf_is_true "$OCF_RESKEY_ctdb_manages_winbind" && echo 'yes' || echo 'no')
+CTDB_MANAGES_NFS=$(ocf_is_true "$OCF_RESKEY_ctdb_manages_nfs" && echo 'yes' || echo 'no')
+CTDB_NFS_SKIP_SHARE_CHECK=$(ocf_is_true "$OCF_RESKEY_ctdb_nfs_skip_share_check" && echo 'yes' || echo 'no')
 EOF
        append_ctdb_sysconfig CTDB_SERVICE_SMB $OCF_RESKEY_ctdb_service_smb
        append_ctdb_sysconfig CTDB_SERVICE_NMB $OCF_RESKEY_ctdb_service_nmb
@@ -490,11 +517,12 @@
        done

        # Add necessary configuration to smb.conf
-       init_smb_conf
-       if [ $? -ne 0 ]; then
-               ocf_log err "Failed to update $OCF_RESKEY_smb_conf."
-               return $OCF_ERR_GENERIC
-       fi
+       # This section is commented out because it messes with csync2 synchronizing smb.conf
+       #init_smb_conf
+       #if [ $? -ne 0 ]; then
+       #       ocf_log err "Failed to update $OCF_RESKEY_smb_conf."
+       #       return $OCF_ERR_GENERIC
+       #fi

        # Generate new CTDB sysconfig
        generate_ctdb_sysconfig
@@ -525,7 +553,8 @@
                -d $OCF_RESKEY_ctdb_debuglevel
        if [ $? -ne 0 ]; then
                # cleanup smb.conf
-               cleanup_smb_conf
+               # This command is commented out because it messes up file synchronization with csync2
+               #cleanup_smb_conf

                ocf_log err "Failed to execute $OCF_RESKEY_ctdbd_binary."
                return $OCF_ERR_GENERIC
@@ -583,7 +612,8 @@
        done

        # Cleanup smb.conf
-       cleanup_smb_conf
+       # This command is disabled because it messes up file synchronization with csync2
+       #cleanup_smb_conf

        # It was a clean shutdown, return success
        [ $rv -eq $OCF_SUCCESS ] && return $OCF_SUCCESS

Apply that patch, then run the command corosync-keygen. This generates the authentication key required to run your cluster securely; it may take some time to gather enough entropy. Once it’s done, copy the resulting file, /etc/corosync/authkey, to your other cluster nodes. You’ll also need to modify /etc/corosync/corosync.conf for your environment. Here is my file as an example:
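Concretely, those steps might look like the following on the first node; the patch filename and the node names in the loop are placeholders for your own:

```shell
# Apply the patch from inside the resource.d/heartbeat directory;
# without a -p option, patch uses just the basename (CTDB) from the
# diff headers
cd /usr/lib/ocf/resource.d/heartbeat
patch < ~/ctdb-nfs.patch          # filename is an example

# Generate the cluster authentication key; corosync-keygen reads
# /dev/random, so it can block until enough entropy accumulates
corosync-keygen

# Push the key and config to the remaining nodes (hostnames are
# examples)
for node in nas2 nas3; do
    scp /etc/corosync/authkey /etc/corosync/corosync.conf "$node":/etc/corosync/
done
```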

totem {
        version: 2

        # How long before declaring a token lost (ms)
        token: 5000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 20

        # How long to wait for join messages in the membership protocol (ms)
        join: 1000

        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus: 7500

        # Turn off the virtual synchrony filter
        vsftype: none

        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Disable encryption
        #secauth: off

        # How many threads to use for encryption/decryption
        threads: 10

        # Optionally assign a fixed node id (integer)
        nodeid: 0001

        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: none

        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 172.24.100.11
                mcastaddr: 226.94.1.1
                mcastport: 5410
        }
}

amf {
        mode: disabled
}

service {
        # Load the Pacemaker Cluster Resource Manager
        ver:       1
        name:      pacemaker
}

aisexec {
        user:   root
        group:  root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: local0
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

Corosync uses multicast to communicate, so at the very least you’ll want to make sure you select a unique mcastaddr/mcastport combination for each cluster on your network.

Modify the configuration file on all of your hosts, ensuring that you set the optional nodeid value to something unique per node and that bindnetaddr is set to the IP address that resolves back to the node’s hostname.
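A quick way to double-check both values on each node (the address shown is from my example config; use your own):

```shell
# bindnetaddr should be the address this node's hostname resolves to
getent hosts "$(hostname)"

# ...and that address should actually be configured on an interface
ip addr show | grep '172.24.100.11'
```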
Next, we need to enable an openais plugin for Corosync: the checkpoint service (ckpt). It’s a requirement if you are going to use OCFS2, as we are. Create the file /etc/corosync/service.d/ckpt with the following contents:

service {
        name: openais_ckpt
        ver: 0
}
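After Corosync is restarted you can confirm the plugin actually loaded. With the logging section above pointed at syslog (local0), the service-engine messages usually end up in /var/log/messages, though the exact wording and log location vary by version and syslog configuration:

```shell
# Look for the openais checkpoint service among the loaded
# service engines
grep -i 'checkpoint' /var/log/messages
```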

Lastly, ensure that the init scripts for Pacemaker and Corosync are disabled at boot. It’s good practice to bring up the cluster stack manually, so that a node that was fenced by the cluster or rebooted by the watchdog service doesn’t rejoin the cluster until you’ve had a chance to look it over. Now let’s start configuring resources.
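On a RHEL/CentOS-style SysV init system (assumed here; substitute your init system’s equivalents), that might look like:

```shell
# Keep the cluster stack out of the boot sequence
chkconfig corosync off
chkconfig pacemaker off

# Bring the stack up by hand on each node, corosync first
service corosync start
service pacemaker start

# Verify ring status and cluster membership
corosync-cfgtool -s
crm_mon -1
```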

7 Comments

  • Excellent.

    But I would like to see a Samba + CTDB-only setup from you.

    Possible ? 🙂

    • I could, but samba already has a pretty good explanation of how to do it at ctdb.samba.org. Not to mention, there are many reasons why you would not want to run ctdb, samba and a cluster filesystem without a full blown cluster-stack.

  • Hi,

    When I try and apply the CTDB patch i get the following:

    [root@cluster1 heartbeat]# cat ~/ctdb.patch | patch
    patching file CTDB
    Hunk #1 succeeded at 78 with fuzz 2 (offset -3 lines).
    patch: **** malformed patch at line 34: @@ -371,6 +391,11 @@

    Any suggestions ?

    I am using the latest resource agents from GIT as I am using GlusterFS instead of fighting with DRBD / OCFS2.

    I am also running directly on Oracle Linux rather than Centos with the kernel patched in.

    Your guide has worked for the majority of it so far with a few teeth gnashes between parts 🙂

    Cheers,

    Kane.

    • Hey, thanks for the comment and sorry for any troubles. I tried to test as much as possible lol.
      Perhaps it’s the formatting of the patch? Try this db link. Let me know if it works/doesn’t work for you.
      If you have time to elaborate, I’d love to hear about any other frustrations or problems you experienced.

      Thanks

  • That worked, thanks.

    Most of my problems were getting ocfs2_controld.pcmk to come up; it would install each time, but pacemaker could never start it. dlm_controld.pcmk was running, but there was no /dlm for ocfs2 to attach onto.

    Otherwise it was silly things like the DRBD tools (8.13) and kernel mod (8.11) being different versions in Oracle, so when you yum update you then have to downgrade the tools or exclude them from the update.

    I have to document the build I am doing for work so I will drop you a copy of it, GlusterFS once running seems to have a lot less to go wrong but of course only time and testing will tell.

    Cheers

    Kane.
