Clustering,High Availability,How To-s,Linux July 11, 2012 10:20 pm

OCFS2, Pacemaker and Corosync on CentOS 6.x

In an earlier post here, I shared my frustration that it didn’t seem possible to get a Pacemaker cluster going with OCFS2 as the cluster filesystem on CentOS 6. But I wouldn’t be a “guru” if I couldn’t get it to work, now would I? <evil laugh> hahahaha </evil laugh>. Enough laughter – on with the how-to.

Note: I’m going to try and be as detailed as possible on this how to. I ran through it several times just to be sure I wasn’t missing anything, but I’m human – I may have made a mistake or two. Let me know if you run into issues.

Step 1: Installing and Configuring CentOS

Installing CentOS is pretty easy, so I won’t go into tremendous detail here.
I strongly suggest that you use a development host or virtual machine because you are going to install quite a few development libraries and their corresponding dependencies. I chose to use a vm with 3 GB of memory and a 12 GB disk. Once we get all the RPMs and the one or two stand alone binaries built, you can simply transfer and install them on your production systems.

For my system, I simply went with the default partitioning scheme; feel free to adjust the disk layout to suit your needs. For the installation type, I again chose the default, “minimal”. I suggest you do the same to avoid potential conflicts between this how-to and your system configuration.

Once your installation is complete, reboot, log in and perform some basic configuration:

  • Set a static IP address and DNS
  • Ensure name resolution is properly configured and the OS can ping itself (e.g. ping `uname -n`)
  • Disable selinux (edit /etc/sysconfig/selinux) and iptables (chkconfig iptables off; chkconfig ip6tables off)
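
If you would rather script the checklist above, something along these lines should do. This is only a sketch: the function names are made up for illustration, and the SELinux config path is passed as an argument so you can try it on a copy of the file first.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of any package.

# Set SELINUX=disabled in the given config file (e.g. /etc/sysconfig/selinux).
# The file is rewritten through a temp copy so a symlinked config stays a symlink.
disable_selinux_in() {
    conf="$1"
    tmp=$(mktemp) || return 1
    if sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}

# Keep both firewall services from starting at boot, as in the checklist.
disable_firewall_at_boot() {
    chkconfig iptables off
    chkconfig ip6tables off
}

# The host should be able to resolve and ping its own node name.
can_ping_self() {
    ping -c 1 "$(uname -n)" >/dev/null 2>&1
}

# Usage on the real host (as root):
#   disable_selinux_in /etc/sysconfig/selinux
#   disable_firewall_at_boot
#   can_ping_self || echo "name resolution is not set up yet"
```

The SELinux change only takes effect after the next reboot, which is fine here because Step 2 ends with one anyway.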

Step 2: Installing the UEK

I would love for this step to be optional, but the ocfs2 kernel module does not ship with any of the CentOS kernels (AFAIK) and it doesn’t seem to be provided by any other package. Therefore, we need to go outside the distribution to get a compatible kernel with the module built in. The latest Unbreakable Enterprise Kernel (UEK) from Oracle at the time of this how-to is v2.6.39, and it does the job quite nicely: it ships with the ocfs2 kernel module, version 1.8, which provides some really cool features. I’ll try to highlight some of them:

  • POSIX ACL support
  • Extended attributes for SELinux
  • Quota recovery
  • Quota syncing
  • Quota accounting on mount, disable on umount
  • Name based indexed b-tree of directory inodes
  • Optimized inode allocation
  • CoW support and unlimited inode-based writeable snapshot
  • Huge volume (> 16 TiB) support
  • Mount option (coherency=*) to control how to handle cluster coherency for O_DIRECT writes
  • SSD trimming support

So, are you excited yet? ;) OK, to install it, run yum install wget, then follow the instructions here: http://public-yum.oracle.com/. Edit the repository file so that only the [ol6_UEK_latest] repository is enabled, then execute yum update; yum install kernel-uek kernel-uek-devel. After the kernel is installed, edit /boot/grub/menu.lst and make sure the UEK is the default. Before you reboot, though, one additional change is needed.
The Pacemaker cluster stack requires a control device in /dev/misc to function. The full path to the device is /dev/misc/ocfs2_control. On CentOS 6.x, with no changes, this device is either inaccessible or non-existent. We fix that by adding a udev rule (more on udev here: http://en.wikipedia.org/wiki/Udev) that will take effect at each boot.
To add the rule, create the file /etc/udev/rules.d/99-ocfs2_control.rules. The file should contain a single line: KERNEL=="ocfs2_control", NAME="misc/ocfs2_control", MODE="0660". Perform this step then reboot.
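
Here is a rough sketch of the two pre-reboot changes. The helper names are hypothetical. The first one only reports which menu.lst entry is the UEK (GRUB legacy counts entries from zero) and leaves setting default= to you; the second writes the rule file into whatever directory you give it.

```shell
#!/bin/sh
# Sketch only: helper names are made up for this example.

# Print the zero-based index of the first "title" entry that mentions "uek"
# in a GRUB legacy menu.lst. Prints nothing if no such entry exists.
uek_grub_index() {
    awk '$1 == "title" { n++; if (tolower($0) ~ /uek/) { print n - 1; exit } }' "$1"
}

# Write the single-line udev rule from the text into the given rules directory
# (/etc/udev/rules.d on the real host).
write_ocfs2_udev_rule() {
    rules_dir="$1"
    cat > "$rules_dir/99-ocfs2_control.rules" <<'EOF'
KERNEL=="ocfs2_control", NAME="misc/ocfs2_control", MODE="0660"
EOF
}

# Usage on the real host (as root):
#   uek_grub_index /boot/grub/menu.lst    # use the printed number as default=N
#   write_ocfs2_udev_rule /etc/udev/rules.d
```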

Step 3: Ready the Development Environment

So at this point, you have a pristine installation of CentOS 6.x with the Oracle Unbreakable Enterprise Kernel, which has built-in support for OCFS2. Time to start installing our software.
Here is the list of RPMs you need: pacemaker openais corosync pacemaker-libs pacemaker-libs-devel gcc corosync-devel openais-devel rpm-build e2fsprogs-devel libuuid-devel git pygtk2 python-devel readline-devel clusterlib-devel redhat-lsb sqlite-devel gnutls-devel byacc flex nss-devel. Install them with a single yum install command; any dependencies will be pulled in automatically.
Lastly, create a symlink to /usr/include/libxml2/libxml in /usr/include. Now let’s try to build /usr/sbin/dlm_controld.pcmk.
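
For reference, here is one way to script this step. It is a sketch with a couple of assumptions of mine: the -y flag on yum (to skip the confirmation prompt), the helper names, and /usr/include/libxml as the name of the symlink, which is how I read the instruction above.

```shell
#!/bin/sh
# Sketch only: helper names and the -y flag are additions for this example.

# Package list copied from the text above.
BUILD_PKGS="pacemaker openais corosync pacemaker-libs pacemaker-libs-devel gcc
corosync-devel openais-devel rpm-build e2fsprogs-devel libuuid-devel git pygtk2
python-devel readline-devel clusterlib-devel redhat-lsb sqlite-devel
gnutls-devel byacc flex nss-devel"

install_build_pkgs() {
    # Word splitting on $BUILD_PKGS is intentional: one argument per package.
    yum install -y $BUILD_PKGS
}

# Create LINK pointing at TARGET unless something already exists at LINK.
ensure_symlink() {
    target="$1"
    link="$2"
    [ -e "$link" ] || [ -L "$link" ] || ln -s "$target" "$link"
}

# Usage on the build host (as root):
#   install_build_pkgs
#   ensure_symlink /usr/include/libxml2/libxml /usr/include/libxml
```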

Step 4: Building dlm_controld.pcmk

So now it’s time to build /usr/sbin/dlm_controld.pcmk. This is Pacemaker’s interface to the kernel’s distributed lock manager. Without it, you can’t run a cluster with OCFS2.
The sources are available via git at https://fedorahosted.org/cluster/wiki/HomePage; however, we need to patch them, otherwise they won’t build successfully.
Copy the source code below into a file on your system. Name it something like dlm_controld_pacemaker.patch. We will need it later.

--- a/group/dlm_controld/pacemaker.c    2012-07-12 19:07:56.555023010 -0400
+++ b/group/dlm_controld/pacemaker.c    2012-07-12 19:06:11.058024609 -0400
@@ -16,7 +16,7 @@
 #undef SUPPORT_HEARTBEAT
 #define SUPPORT_HEARTBEAT 0
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -64,7 +64,7 @@
     crm_log_init("cluster-dlm", LOG_INFO, FALSE, TRUE, 0, NULL);

     if(init_ais_connection(NULL, NULL, NULL, &local_node_uname, &our_nodeid) == FALSE) {
-       log_error("Connection to our AIS plugin (%d) failed", CRM_SERVICE);
+       log_error("Connection to our AIS plugin (CRM) failed");
        return -1;
     }

So here is a synopsis of what we are going to do:

  1. Clone the Repository
  2. Checkout the appropriate branch
  3. Patch group/dlm_controld/pacemaker.c
  4. Run the configure script
  5. Run make
  6. Profit??

Shell we begin?

First, clone the repository.

[root@pcmk-dev2 ~]# mkdir cluster-build
[root@pcmk-dev2 ~]# cd cluster-build/
[root@pcmk-dev2 cluster-build]# ls
[root@pcmk-dev2 cluster-build]# git clone http://git.fedorahosted.org/git/dlm.git
Initialized empty Git repository in /root/cluster-build/dlm/.git/

Then checkout the “pacemaker” branch.

[root@pcmk-dev2 cluster-build]# cd dlm
[root@pcmk-dev2 dlm]# git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/dlm-fixes
  remotes/origin/fscontrol
  remotes/origin/master
  remotes/origin/pacemaker
  remotes/origin/sles
[root@pcmk-dev2 dlm]# 

[root@pcmk-dev2 dlm]# git checkout pacemaker
Branch pacemaker set up to track remote branch pacemaker from origin.
Switched to a new branch 'pacemaker'

Now patch the code

[root@pcmk-dev2 dlm]# pwd
/root/cluster-build/dlm
[root@pcmk-dev2 dlm]# cat ../dlm_controld_pacemaker.patch | patch -p1
patching file group/dlm_controld/pacemaker.c
[root@pcmk-dev2 dlm]#
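
If you are nervous about a patch that only half applies, GNU patch has a --dry-run option that checks every hunk without touching any files. A small wrapper (the name apply_patch_checked is hypothetical) could look like this:

```shell
#!/bin/sh
# Sketch only: apply_patch_checked is a made-up helper name.

# Apply a -p1 patch in the current directory only if a dry run succeeds,
# so a rejected hunk never leaves the tree half-patched.
apply_patch_checked() {
    patchfile="$1"
    patch -p1 --dry-run < "$patchfile" >/dev/null || return 1
    patch -p1 < "$patchfile"
}

# Usage, from inside the dlm checkout:
#   apply_patch_checked ../dlm_controld_pacemaker.patch
```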

Run the configure script.

[root@pcmk-dev2 dlm]# ./configure --enable_pacemaker

Configuring Makefiles for your system...

Checking tree: nothing to do

Checking kernel:
 WARNING: Could not determine kernel version.
          Build might fail!
Completed Makefile configuration

[root@pcmk-dev2 dlm]#

Finally, run make

[root@pcmk-dev2 dlm]# make
[ -n "" ] || make -C dlm all
make[1]: Entering directory `/root/cluster-build/dlm/dlm'
set -e && \
        for i in libdlm libdlmcontrol tool man; do \
                make -C $i all; \
        done
make[2]: Entering directory `/root/cluster-build/dlm/dlm/libdlm'
gcc -Wall -Wformat=2 -MMD -O2 -g -I/root/cluster-build/dlm/make -DENABLE_PACEMAKER=1 -DLOGDIR=\"/var/log/cluster\" -DSYSLOGFACILITY=LOG_LOCAL4 -DSYSLOGLEVEL=LOG_INFO -DRELEASE_VERSION=\"1342137464\" -fPIC -I/root/cluster-build/dlm/dlm/libdlm -I/usr/include -I/lib/modules/2.6.39-200.29.1.el6uek.x86_64/source/include  -D_REENTRANT -c -o libdlm.o /root/cluster-build/dlm/dlm/libdlm/libdlm.c
In file included from /root/cluster-build/dlm/dlm/libdlm/libdlm.c:21:
/lib/modules/2.6.39-200.29.1.el6uek.x86_64/source/include/linux/types.h:13:2: warning: #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders"
ar cru libdlm.a libdlm.o
ranlib libdlm.a
gcc -Wall -Wformat=2 -MMD -O2 -g -I/root/cluster-build/dlm/make -DENABLE_PACEMAKER=1 -DLOGDIR=\"/var/log/cluster\" -DSYSLOGFACILITY=LOG_LOCAL4 -DSYSLOGLEVEL=LOG_INFO -DRELEASE_VERSION=\"1342137464\" -fPIC -I/root/cluster-build/dlm/dlm/libdlm -I/usr/include -I/lib/modules/2.6.39-200.29.1.el6uek.x86_64/source/include  -c -o libdlm_lt.o /root/cluster-build/dlm/dlm/libdlm/libdlm.c
In file included from /root/cluster-build/dlm/dlm/libdlm/libdlm.c:21:
/lib/modules/2.6.39-200.29.1.el6uek.x86_64/source/include/linux/types.h:13:2: warning: #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders"
ar cru libdlm_lt.a libdlm_lt.o
ranlib libdlm_lt.a
gcc -shared -o libdlm.so.3.0 -Wl,-soname=libdlm.so.3 libdlm.o -lpthread   -L/usr/lib
ln -sf libdlm.so.3.0 libdlm.so
ln -sf libdlm.so.3.0 libdlm.so.3
gcc -shared -o libdlm_lt.so.3.0 -Wl,-soname=libdlm_lt.so.3 libdlm_lt.o  -L/usr/lib
ln -sf libdlm_lt.so.3.0 libdlm_lt.so
ln -sf libdlm_lt.so.3.0 libdlm_lt.so.3
cat /root/cluster-build/dlm/dlm/libdlm/libdlm.pc.in | \
        sed \
                -e 's#@PREFIX@#/usr#g' \
                -e 's#@LIBDIR@#/usr/lib#g' \
                -e 's#@INCDIR@#/usr/include#g' \
                -e 's#@VERSION@#1342137464#g' \
        > libdlm.pc
cat /root/cluster-build/dlm/dlm/libdlm/libdlm_lt.pc.in | \
        sed \
                -e 's#@PREFIX@#/usr#g' \
                -e 's#@LIBDIR@#/usr/lib#g' \
                -e 's#@INCDIR@#/usr/include#g' \

****

make[2]: Entering directory `/root/cluster-build/dlm/bindings/python'
set -e && \
        for i in ; do \
                make -C $i all; \
        done
make[2]: Leaving directory `/root/cluster-build/dlm/bindings/python'
make[1]: Leaving directory `/root/cluster-build/dlm/bindings'
[ -n "" ] || make -C contrib all
make[1]: Entering directory `/root/cluster-build/dlm/contrib'
set -e && \
        for i in ; do \
                make -C $i all; \
        done
make[1]: Leaving directory `/root/cluster-build/dlm/contrib'
[root@pcmk-dev2 dlm]#

At the end of this process, you should have a shiny new dlm_controld.pcmk in group/dlm_controld. Admire it, then go ahead and run make install, and we will move on to the next step.

Step 5: Building ocfs2-tools

OK, so once again, we are going to have to go outside of the CentOS repository as the ocfs2-tools rpm is nowhere to be found.
Visit the Oracle Public Yum Repository. Download the SRPM for ocfs2-tools 1.8 and install it. We are going to rebuild the RPM, but before we do that, we need to patch the code using the patch below.

--- a/ocfs2_controld/pacemaker.c        2012-07-12 20:33:52.525019305 -0400
+++ b/ocfs2_controld/pacemaker.c        2012-07-12 20:34:01.781025802 -0400
@@ -30,7 +30,9 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
+#include 

 #include "ocfs2-kernel/kernel-list.h"
 #include "o2cb/o2cb.h"
@@ -155,7 +157,7 @@
        crm_log_init("ocfs2_controld", LOG_INFO, FALSE, TRUE, 0, NULL);

        if(init_ais_connection(NULL, NULL, NULL, &local_node_uname, &our_nodeid) == FALSE) {
-               log_error("Connection to our AIS plugin (%d) failed", CRM_SERVICE);
+               log_error("Connection to our AIS plugin (CRM) failed");
                return -1;
        }

If that patch looks like the patch we applied to group/dlm_controld/pacemaker.c in the source for dlm_controld.pcmk, you are right. The common theme in both cases is that CRM_SERVICE is not properly defined and some headers are not properly referenced.
Copy that code to the server and name it ocfs2_controld.patch. Also, download my RPM spec file from this link and put it on your server. For some odd reason, the spec file shipped with the SRPM builds but doesn’t package /usr/sbin/ocfs2_controld.pcmk (or /usr/sbin/ocfs2_controld.cman for that matter). My spec file corrects this.

So here is the synopsis for building the ocfs2-tools rpm.

  1. Download and install the SRPM from the link I provided
  2. Untar and patch the ocfs2-tools
  3. Re-tar the source and replace the one shipped with the rpm
  4. Build the rpm using my spec file
  5. Install the rpm

OK, so let’s go.

Download the SRPM and install it.

[root@pcmk-dev2 ~]# wget http://public-yum.oracle.com/repo/OracleLinux/OL6/latest/x86_64/ocfs2-tools-1.8.0-10.el6.src.rpm
--2012-07-12 22:02:01--  http://public-yum.oracle.com/repo/OracleLinux/OL6/latest/x86_64/ocfs2-tools-1.8.0-10.el6.src.rpm
Resolving public-yum.oracle.com... 141.146.44.34
Connecting to public-yum.oracle.com|141.146.44.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 853920 (834K) [application/x-rpm]
Saving to: “ocfs2-tools-1.8.0-10.el6.src.rpm”

100%[===================================================================================================================================================================================================>] 853,920     1.38M/s   in 0.6s

2012-07-12 22:02:01 (1.38 MB/s) - “ocfs2-tools-1.8.0-10.el6.src.rpm”

[root@pcmk-dev2 ~]# rpm -ivh ocfs2-tools-1.8.0-10.el6.src.rpm
   1:ocfs2-tools            warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
########################################### [100%]
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
[root@pcmk-dev2 ~]#

Untar the source and patch it

[root@pcmk-dev2 ~]# tar -zxf rpmbuild/SOURCES/ocfs2-tools-1.8.0.tar.gz
[root@pcmk-dev2 ~]# cd ocfs2-tools-1.8.0
[root@pcmk-dev2 ocfs2-tools-1.8.0]# cat ../ocfs2_controld.patch | patch -p1
patching file ocfs2_controld/pacemaker.c
[root@pcmk-dev2 ocfs2-tools-1.8.0]#

Now re-tar the source code and replace the tarball shipped with the SRPM in rpmbuild/SOURCES

[root@pcmk-dev2 ocfs2-tools-1.8.0]# cd ../
[root@pcmk-dev2 ~]# tar -czf ocfs2-tools-1.8.0.tar.gz ocfs2-tools-1.8.0
[root@pcmk-dev2 ~]# cp ocfs2-tools-1.8.0.tar.gz rpmbuild/SOURCES/.
cp: overwrite `rpmbuild/SOURCES/./ocfs2-tools-1.8.0.tar.gz'? y
[root@pcmk-dev2 ~]#
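
One thing worth checking before the build: the %prep stage unpacks the tarball and works inside ocfs2-tools-1.8.0, so the re-tarred archive needs to keep that as its single top-level directory. A quick way to look (the helper name is hypothetical):

```shell
#!/bin/sh
# Sketch only: tarball_topdirs is a made-up helper name.

# Print the distinct top-level entries of a gzip-compressed tarball, one per line.
tarball_topdirs() {
    tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Usage:
#   tarball_topdirs rpmbuild/SOURCES/ocfs2-tools-1.8.0.tar.gz
# A correctly re-tarred archive prints only: ocfs2-tools-1.8.0
```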

Finally, build the rpm with my spec file

[root@pcmk-dev2 ~]# rpmbuild -bb ocfs2-tools.spec
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.dttsXy
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd /root/rpmbuild/BUILD
+ rm -rf ocfs2-tools-1.8.0
+ /bin/tar -xvvf -
+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/ocfs2-tools-1.8.0.tar.gz
drwxr-xr-x root/root         0 2011-03-20 03:13 ocfs2-tools-1.8.0/
-rw-r--r-- root/root      2390 2011-03-20 02:48 ocfs2-tools-1.8.0/mbvendor.m4
drwxr-xr-x root/root         0 2011-03-20 03:13 ocfs2-tools-1.8.0/listuuid/
-rw-r--r-- root/root      5210 2011-03-20 02:48 ocfs2-tools-1.8.0/listuuid/listuuid.c
-rw-r--r-- root/root       640 2011-03-20 02:48 ocfs2-tools-1.8.0/listuuid/Makefile
****
****
****
Processing files: ocfs2-tools-devel-1.8.0-10.el6.x86_64
Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/ocfs2-tools-1.8.0-10.el6.x86_64
warning: Installed (but unpackaged) file(s) found:
   /sbin/ocfs2_controld.cman
   /sbin/ocfs2_controld.pcmk
warning: Could not canonicalize hostname: pcmk-dev2
Wrote: /root/rpmbuild/RPMS/x86_64/ocfs2-tools-1.8.0-10.el6.x86_64.rpm
Wrote: /root/rpmbuild/RPMS/x86_64/ocfs2-tools-devel-1.8.0-10.el6.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.IvQAGC
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd ocfs2-tools-1.8.0
+ rm -rf /root/rpmbuild/BUILDROOT/ocfs2-tools-1.8.0-10.el6.x86_64
+ exit 0

If all goes well, you should see output similar to that above. Install the resulting RPM and rejoice! The hard work is done. Tar up the compiled source tree for dlm_controld.pcmk and copy it, along with the ocfs2-tools 1.8 RPM produced above, to your servers. On your production servers you’ll simply need to install the UEK kernel, redhat-lsb, clusterlib, corosync, pacemaker, openais and resource-agents in addition to the ocfs2-tools RPM you built. You’ll also need to untar the dlm_controld.pcmk source tree and run make install.
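
Before copying the RPM around, it may be worth confirming that it really contains ocfs2_controld.pcmk, since that is the file the stock spec leaves out. A small sketch, assuming the RPM path from the build output above; the helper name is made up, and it reads the file list from stdin so it works with rpm -qlp:

```shell
#!/bin/sh
# Sketch only: lists_controld_pcmk is a hypothetical helper.

# Succeeds if the file list on stdin contains a path ending in ocfs2_controld.pcmk.
lists_controld_pcmk() {
    grep -q '/ocfs2_controld\.pcmk$'
}

# Usage:
#   rpm -qlp /root/rpmbuild/RPMS/x86_64/ocfs2-tools-1.8.0-10.el6.x86_64.rpm \
#       | lists_controld_pcmk || echo "ocfs2_controld.pcmk is not in the package"
```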

Stay tuned. In the next how to, I’ll show you how to put this software to work!

4 Comments

  • Francisco Olcina

    Hi, FYI: if you install the latest pacemaker-libs* packages (1.1.8-7), the dlm build won’t work. The latest pacemaker-libs* packages that work well are version 1.1.7-6.el6. If you have already installed the 1.1.8-7 version, you can downgrade the packages with this command: yum downgrade pacemaker pacemaker-cli pacemaker-cluster-libs pacemaker-cts pacemaker-libs pacemaker-libs-devel.

    Regards.

    • Thanks for the heads up Francisco. I’ll add that as a note in the how-to.

  • Richard Sharpe

    Hmmm, I am having linking problems:

    pacemaker.o: In function `dlm_process_node':
    /root/cluster-build/dlm/group/dlm_controld/pacemaker.c:122: undefined reference to `crm_is_member_active'
    /root/cluster-build/dlm/group/dlm_controld/pacemaker.c:214: undefined reference to `crm_is_member_active'
    /root/cluster-build/dlm/group/dlm_controld/pacemaker.c:214: undefined reference to `crm_is_member_active'
    pacemaker.o: In function `process_cluster':
    /root/cluster-build/dlm/group/dlm_controld/pacemaker.c:94: undefined reference to `ais_dispatch'

    etc.

    Assuming it is a problem with the pacemaker libs, I will try to downgrade and see if that works.

  • Richard Sharpe

    So, I eventually got this working with CentOS 6.6 and OCFS2 but I had to build the ocfs2-tools code and fish out ocfs2_controld.cman and place it in the correct place.

    Also, of course, I had to use the correct approach to configure things, mostly with help from this site:

    http://floriancrouzat.net/2013/04/rhel-6-4-pacemaker-1-1-8-adding-cman-support-and-getting-rid-of-the-plugin/

    Also, the OCFS2 documentation is a little wrong (with respect to the steps), but that is OK.

    I now have CTDB running on a two node cluster on two VBox VMs using OCFS2, CMAN and Pacemaker.
