I have used ZFS before on FreeBSD but never on any Linux variant.
So it was time to play around with ZFS on Linux, but as you can read below, there are still some major issues.
As I had just downloaded the latest CentOS 7 Minimal CD, CentOS became my first try.
After the minimal install and updating all packages on my little VM, we were ready to go.
ZFS is not part of CentOS, so you will need to enable extra repositories to install it.
First we enable EPEL and add the ZFS repo:
sudo yum install epel-release
sudo yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
Then install ZFS:
sudo yum install kernel-devel zfs
This is where the first issue came up: ZFS did not seem to have installed correctly, giving me not much more than:
Failed to load ZFS module stack.
It seemed that ZFS was not built correctly because of the kernel update that was installed when I updated the system for the first time. After numerous attempts to fix ZFS I decided to just remove all the ZFS packages and the old kernel and start over.
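If you run into the same error, it is worth first checking whether DKMS actually built the SPL/ZFS modules for the kernel you are running. Something along these lines gives a quick picture (a sketch; the exact output depends on your dkms and zfs versions):
# which kernel is actually running?
uname -r
# did DKMS build the spl/zfs modules, and against which kernel versions?
dkms status
# try loading the module by hand and look at the kernel log for clues
sudo modprobe zfs
dmesg | tail -n 20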
A tip on gmane also mentioned removing all the leftover modules, so sure, let's go:
find /lib/modules/$(uname -r)/extra -name "splat.ko" -or -name "zcommon.ko" -or -name "zpios.ko" -or -name "spl.ko" -or -name "zavl.ko" -or -name "zfs.ko" -or -name "znvpair.ko" -or -name "zunicode.ko" | xargs rm -f
find /lib/modules/$(uname -r)/weak-updates/ -name "splat.ko" -or -name "zcommon.ko" -or -name "zpios.ko" -or -name "spl.ko" -or -name "zavl.ko" -or -name "zfs.ko" -or -name "znvpair.ko" -or -name "zunicode.ko" | xargs rm -f
Then I removed my old kernel, the headers and all ZFS packages, and finally rebooted:
sudo yum erase kernel-3.10.0-229.el7.x86_64
sudo yum erase kernel-headers kernel-devel kernel-tools
sudo yum erase zfs zfs-dkms libzfs2 spl spl-dkms dkms
reboot
After the reboot it was time to install everything again:
sudo yum localinstall --nogpgcheck https://download.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
sudo yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
sudo yum install kernel-devel zfs
And finally ZFS worked like it should; no more error messages.
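To double-check that the module stack really is loaded this time, something like this should show the zfs and spl kernel modules (a sketch):
# the spl and zfs modules should show up once the stack is loaded
lsmod | egrep 'zfs|spl'
# modinfo shows which module version was built
modinfo zfs | head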
I added four small 5GB drives to my VM to test with: sdb, sdc, sdd and sde.
So let's create a pool called data01 in raidz. As the drives are blank and unformatted, you will need to force the creation of a new GPT.
zpool create data01 raidz sdb sdc sdd sde -f
See if all went well:
zpool status
  pool: data01
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data01      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
That looks fine; now let's see what zdb says:
zdb
data01:
    version: 5000
    name: 'data01'
    state: 0
    txg: 141
    pool_guid: 11371218379924896498
    errata: 0
    hostname: 'gluster01'
    vdev_children: 2
    vdev_tree:
        type: 'root'
        id: 0
        guid: 11371218379924896498
        create_txg: 4
        children[0]:
            type: 'raidz'
            id: 0
            guid: 832996839702454674
            nparity: 1
            metaslab_array: 34
            metaslab_shift: 27
            ashift: 9
            asize: 16060514304
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 2105630605786225987
                path: '/dev/sdb1'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 6145009721102371365
                path: '/dev/sdc1'
                whole_disk: 1
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 14635352678309772165
                path: '/dev/sdd1'
                whole_disk: 1
                create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 1544288362510023523
            path: '/dev/sde1'
            whole_disk: 1
            metaslab_array: 112
            metaslab_shift: 25
            ashift: 9
            asize: 5353504768
            is_log: 0
            create_txg: 134
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
That all looks fine, so we should have a 15GB filesystem mounted at /data01:
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 8.5G 1.1G 7.5G 13% /
devtmpfs 481M 0 481M 0% /dev
tmpfs 490M 0 490M 0% /dev/shm
tmpfs 490M 6.7M 484M 2% /run
tmpfs 490M 0 490M 0% /sys/fs/cgroup
/dev/sda1 497M 119M 378M 24% /boot
data01 15G 0 15G 0% /data01
Perfect, that all works fine. Now let's see if deduplication works:
I will create a 1GB file filled with just zeros to see if this works.
dd if=/dev/zero of=/data01/zerofile1 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.45455 s, 241 MB/s
ls -lh /data01/
total 1.0G
-rw-r--r--. 1 root root 1.0G Jul 28 22:42 zerofile1
df -h | grep data01
data01 15G 1.0G 14G 7% /data01
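As a side note, deduplication and compression are both off by default on a new pool, so it does not hurt to check what data01 is actually configured to do before copying the file around (a sketch):
# deduplication and compression have to be enabled explicitly
zfs get dedup,compression data01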
Okay, we have a single file on /data01 that uses 1GB of space. Let's copy the file a few times and see what happens; if everything works like it should, the usage of the drive should not increase:
i=2; while [ $i -lt 7 ]; do cp /data01/zerofile1 /data01/zerofile$i; i=$[i+1]; done
ls -lh /data01/
total 1.0G
-rw-r--r--. 1 root root 1.0G Jul 28 22:42 zerofile1
-rw-r--r--. 1 root root 1.0G Jul 28 22:53 zerofile2
-rw-r--r--. 1 root root 1.0G Jul 28 22:54 zerofile3
-rw-r--r--. 1 root root 1.0G Jul 28 22:54 zerofile4
-rw-r--r--. 1 root root 1.0G Jul 28 22:54 zerofile5
-rw-r--r--. 1 root root 1.0G Jul 28 22:54 zerofile6
df -h | grep data01
data01 15G 1.0G 14G 7% /data01
Yep, we have six identical 1GB files but still only 1GB of total space used. Looks good so far; let's clean up, and the drive should be empty again:
rm -rf zerofile*
ls -lh /data01/
total 0
df -h | grep data01
data01 15G 1.0G 14G 7% /data01
That's weird: I removed the files, but the drive still shows 1GB of space in use.
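df is not the only way to look at ZFS space usage; asking ZFS itself gives a second opinion on how much data is really still referenced (a sketch, using the data01 pool from above):
# space usage per dataset, broken down by data, snapshots and children
zfs list -o space data01
# allocated versus free space at the pool level
zpool list data01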
But if you unmount the drive and mount it again, the disk space is freed up again:
cd /
umount /data01/
zfs mount data01
df -h | grep data
data01 15G 0 15G 0% /data01
This seems to be a bug; creating a child filesystem with xattr=sa and dropping the kernel caches seems to 'fix' the issue:
zfs create -o xattr=sa data01/fs
echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/zero of=/data01/fs/testfile bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 1.98644 s, 541 MB/s
# df -h | grep data
data01/fs 15G 1.0G 14G 7% /data01/fs
# rm /data01/fs/testfile
rm: remove regular file ‘/data01/fs/testfile’? y
# df -h | grep data
data01/fs 15G 128K 15G 1% /data01/fs
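If this workaround works for you, it is probably easier to set xattr=sa once on the top-level dataset so that child filesystems inherit it, instead of passing -o on every zfs create (a sketch; data01/fs2 is just an example name):
# set the property on the pool's root dataset; children inherit it
zfs set xattr=sa data01
# a new child filesystem now picks up xattr=sa automatically
zfs create data01/fs2
# verify where each dataset gets its xattr setting from
zfs get -r xattr data01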
So it seems ZFS on CentOS 7 still has some issues; next up to test is Debian.