NFS3 storage for K8S via ganesha+glusterfs
TAGS:Glusterfs cluster (replica 2) integration with k8s via Ganesha+NFS3
Components:
- Two data VMs in Openstack, CentOS 7:
test-glusterfs-1 10.1.39.241
test-glusterfs-2 10.1.39.240
(plus a third small VM, test-glusterfs-3 10.1.39.242, used as the arbiter/meta node)
- glusterfs version: 3.10
- keepalived 1.2.19
- nfs-ganesha 2.5.3
- kubernetes (deployed previously via rancher 1.6.9)
Our aim is to get working HA persistent storage for K8S apps
WHY we should do it
Some time ago I wrote an article, External glusterfs integration with k8s. But a few days ago I realized that this scheme is a really big mess. Why do I think so? Because the Heketi API deployed on bare-metal nodes (not in the k8s cluster - that's a different story) has no fuc*ing idea about any process beneath it. So if I want to add a new node or remove a failed one (which is currently unreachable), I can't do it. ALL nodes have to be online if you want to do anything. So there is no such thing as HA (hahaha) or failover. I repeat that one more time:
DO NOT USE THE HEKETI API ON A 2-NODE CLUSTER OUTSIDE KUBERNETES! Never!
SO YES WE ARE GOING NFS!
Step 1. Initial configuration of OS
run on all nodes
cat /etc/hosts
127.0.0.1 localhost
127.0.0.1 test-glusterfs-1
10.1.39.241 test-glusterfs-1
10.1.39.240 test-glusterfs-2
10.1.39.242 test-glusterfs-3
10.1.39.211 test-glusterfs-1v
10.1.39.212 test-glusterfs-2v
sudo passwd root
su
yum install -y centos-release-gluster310.noarch vim
yum install -y glusterfs glusterfs-server nfs-ganesha nfs-ganesha-gluster glusterfs-geo-replication glusterfs-ganesha
systemctl enable glusterd && systemctl start glusterd
systemctl disable firewalld
setenforce 0
vim /etc/selinux/config - set SELINUX=disabled
reboot
gluster peer probe test-glusterfs-1
gluster peer probe test-glusterfs-2
gluster peer probe test-glusterfs-3
run on first and second nodes
yum install pcs
vim /etc/corosync/corosync.conf
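The file itself didn't survive in my notes; a minimal two-node corosync.conf looks roughly like this (cluster_name is just an example):
totem {
    version: 2
    cluster_name: ganesha-ha
    transport: udpu
}

nodelist {
    node {
        ring0_addr: test-glusterfs-1
        nodeid: 1
    }
    node {
        ring0_addr: test-glusterfs-2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}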
systemctl enable pcsd && systemctl start pcsd
systemctl enable pacemaker && systemctl start pacemaker
systemctl enable corosync && systemctl start corosync
echo hapassword | passwd --stdin hacluster
Use this login pair (hacluster:hapassword) for the next commands
run on one (1st or 2nd) node
pcs cluster auth test-glusterfs-1
pcs cluster auth test-glusterfs-2
Step 2. Volumes, neutron port, secgroups, keepalived
- Create volumes in Openstack: a 300G data volume for each data VM and a 10G meta volume for the arbiter:
cinder create --name test-glusterfs-1-data 300
cinder create --name test-glusterfs-2-data 300
cinder create --name test-glusterfs-3-meta 10
- Attach them to instances
nova volume-attach $vm1_id test-glusterfs-1-data
nova volume-attach $vm2_id test-glusterfs-2-data
nova volume-attach $vm3_id test-glusterfs-3-meta
- Create and update neutron ports for pcs vip resources
. openrc
export OS_TENANT_NAME='K8S-Lab'
export OS_PROJECT_NAME='K8S-Lab'
neutron port-create --fixed-ip subnet_id=$yournetid,ip_address=$test-glusterfs-1v_IP $yournetid
neutron port-update $test-glusterfs-1v_IP_portid --allowed-address-pairs type=dict list=true ip_address=VIP,mac_address=mac1 ip_address=VIP,mac_address=mac2
neutron port-create --fixed-ip subnet_id=$yournetid,ip_address=$test-glusterfs-2v_IP $yournetid
neutron port-update $test-glusterfs-2v_IP_portid --allowed-address-pairs type=dict list=true ip_address=VIP,mac_address=mac1 ip_address=VIP,mac_address=mac2
neutron port-update $VM1_portid --allowed-address-pairs type=dict list=true ip_address=$VM1_ip,mac_address=$VM1_mac ip_address=$test-glusterfs-1v_IP,mac_address=$VM1_mac ip_address=$test-glusterfs-2v_IP,mac_address=$VM1_mac
neutron port-update $VM2_portid --allowed-address-pairs type=dict list=true ip_address=$VM2_ip,mac_address=$VM2_mac ip_address=$test-glusterfs-1v_IP,mac_address=$VM2_mac ip_address=$test-glusterfs-2v_IP,mac_address=$VM2_mac
- Create security groups in the UI (convenient way) or from the CLI (see the sketch below):
vrrp - IP protocol 112
ssh - 22 tcp
heketi - 8082 tcp
nfs - 2049, 564, 875 tcp and udp
Assign all these groups to our instances (and the default secgroup, of course)
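If you don't like clicking, roughly the same rules from the CLI (the group name is made up; also check that your neutron client accepts a numeric protocol for vrrp):
neutron security-group-create glusterfs-nfs
neutron security-group-rule-create --direction ingress --protocol 112 glusterfs-nfs
neutron security-group-rule-create --direction ingress --protocol tcp --port-range-min 22 --port-range-max 22 glusterfs-nfs
neutron security-group-rule-create --direction ingress --protocol tcp --port-range-min 8082 --port-range-max 8082 glusterfs-nfs
for port in 2049 564 875; do
  for proto in tcp udp; do
    neutron security-group-rule-create --direction ingress --protocol $proto --port-range-min $port --port-range-max $port glusterfs-nfs
  done
done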
Reboot instances
Step 3. LVM
run on all nodes
lsblk    (find your 300G DATA device, let's assume it is /dev/vdc)
pvcreate /dev/vdc
vgcreate glustervg /dev/vdc
lvcreate -n glusterlv -l 100%FREE glustervg
mkfs.xfs -i size=512 /dev/glustervg/glusterlv
mkdir -p /opt/gluster/vol
vim /etc/fstab
/dev/mapper/glustervg-glusterlv /opt/gluster/vol xfs defaults 0 0
mount -a
mkdir /opt/gluster/vol/gluster
Step 4. SSH configuration
vim /etc/ssh/sshd_config
PermitRootLogin yes
PasswordAuthentication yes
service sshd restart
run on first node
ssh-keygen -f /var/lib/glusterd/nfs/secret.pem
ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@test-glusterfs-1
ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@test-glusterfs-2
scp /var/lib/glusterd/nfs/secret.* root@test-glusterfs-2:/var/lib/glusterd/nfs/
run on node-1 and node-2
ssh -i /var/lib/glusterd/nfs/secret.pem root@test-glusterfs-1 exit
ssh -i /var/lib/glusterd/nfs/secret.pem root@test-glusterfs-2 exit
run on first node
vim /etc/ganesha/ganesha-ha.conf
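The original file isn't in these notes; a minimal ganesha-ha.conf for our two ganesha nodes would look roughly like this (HA_NAME is just an example, and check how your ganesha-ha.sh expects dots/dashes in the VIP_ keys - they usually get replaced with underscores):
# name of the pacemaker cluster that ganesha-ha.sh will create
HA_NAME="ganesha-ha-gluster"
# the nodes that will run nfs-ganesha
HA_CLUSTER_NODES="test-glusterfs-1,test-glusterfs-2"
# virtual IPs from /etc/hosts (test-glusterfs-1v / test-glusterfs-2v)
VIP_test_glusterfs_1="10.1.39.211"
VIP_test_glusterfs_2="10.1.39.212"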
gluster volume set all cluster.enable-shared-storage enable
wait a moment while the gluster_shared_storage volume gets created and mounted on all nodes
run on one node
gluster volume remove-brick gluster_shared_storage replica 2 test-glusterfs-3:/var/lib/glusterd/ss_brick force
IMPORTANT: gluster volume add-brick gluster_shared_storage replica 3 arbiter 1 test-glusterfs-3:/var/lib/glusterd/ss_brick force
gluster volume create cluster-demo replica 3 arbiter 1 test-glusterfs-1:/opt/gluster/vol/gluster/ test-glusterfs-2:/opt/gluster/vol/gluster/ test-glusterfs-3:/opt/gluster/vol/gluster/ force
mkdir /etc/ganesha/bak/
vim /etc/ganesha/bak/ganesha.conf
vim /usr/lib/pcsd/remote.rb - add sleep 5 on line 774 before pcsd_restart
start ganesha-ha creation
gluster nfs-ganesha enable
debug config
crm_verify -LV
if you get an error about stonith, run on ALL pcs nodes
pcs property set stonith-enabled=false
if you get this error: Error: creation of symlink ganesha.conf in /etc/ganesha failed
then you should disable selinux
vim /etc/selinux/config
SELINUX=disabled
reboot
gluster volume create data-1 replica 3 arbiter 1 transport tcp test-glusterfs-1:/opt/gluster/vol/gluster test-glusterfs-2:/opt/gluster/vol/gluster test-glusterfs-3:/opt/gluster/vol/gluster force
CHECK
mount | grep gluster
Step 5. Ganesha configuration
run on all nodes
vim /etc/ganesha/ganesha.conf
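I don't have the original file in these notes. The simplest ganesha.conf just includes the export file from the next step (remember that gluster nfs-ganesha enable manages this file via a symlink to the shared storage volume - that's where the symlink error above comes from, and why we keep a backup in /etc/ganesha/bak/):
%include "/etc/ganesha/gluster.conf"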
vim /etc/ganesha/gluster.conf
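A sketch of the export block; Export_Id, the paths and the volume name (I use data-1 here) are assumptions - point it at whatever gluster volume you actually created:
EXPORT {
    Export_Id = 1;
    Path = "/data-1";
    Pseudo = "/data-1";
    Access_Type = RW;
    Squash = No_root_squash;
    Disable_ACL = true;
    Protocols = "3";
    Transports = "UDP,TCP";
    SecType = "sys";
    FSAL {
        Name = GLUSTER;
        Hostname = "localhost";
        Volume = "data-1";
    }
}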
vim /lib/systemd/system/nfs-ganesha.service
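My notes don't say what exactly gets edited in the unit file; what I'd want here (this is an assumption, not the original change) is that ganesha only starts after the network and glusterd are up, i.e. in the [Unit] section:
After=network-online.target glusterd.service
Wants=network-online.target
Then run systemctl daemon-reload.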
CHECK
mkdir /tmp/dirt; mount -t nfs VIP:/volname /tmp/dirt    (volname = the export Path from gluster.conf)
IMPORTANT
- systemctl enable corosync.service && systemctl enable pacemaker.service && systemctl enable nfs-ganesha.service
- Sle
Step 6. Integration with K8S
Install kubectl (google it), mkdir ~/.kube, vim ~/.kube/config
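The config itself is generated by the Rancher UI; a generic kubeconfig looks roughly like this (server URL, token and the names below are just placeholders):
apiVersion: v1
kind: Config
clusters:
- name: k8s-lab
  cluster:
    server: https://your-apiserver:6443
    insecure-skip-tls-verify: true
contexts:
- name: k8s-lab
  context:
    cluster: k8s-lab
    user: k8s-lab
current-context: k8s-lab
users:
- name: k8s-lab
  user:
    token: "PASTE_TOKEN_FROM_RANCHER"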
CHECK
kubectl get po
on all COMPUTE nodes
apt -y install nfs-common
run on the same node with kubectl
mkdir k8s
vim k8s/glusterfs-pv-nfs.yml
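A sketch of the PV - the size, names, and which VIP/export it points at are my assumptions, adjust them to your gluster.conf:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: glusterfs-pv-nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.1.39.211   # one of the ganesha VIPs
    path: /data-1         # the Path from gluster.conf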
vim k8s/glusterfs-pvc-nfs.yml
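And a matching claim (it just has to fit the PV's size and access mode):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: glusterfs-pvc-nfs
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi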
vim k8s/nginx-deployment-pvc-nfs.yml
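A sketch of the deployment: it mounts the claim at /usr/share/nginx/html, which is where we will write the test file later. The apiVersion is the old one that the k8s behind rancher 1.6 still speaks (newer clusters use apps/v1 and require a selector):
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx-deploy
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - name: html
          mountPath: /usr/share/nginx/html
      volumes:
      - name: html
        persistentVolumeClaim:
          claimName: glusterfs-pvc-nfs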
kubectl create -f k8s/
CHECK
kubectl get pvc - the nfs claim should be Bound
Step 7. Deploying and testing the app
run on the same node with kubectl
kubectl get po | grep nginx-deploy
kubectl describe po $podname - find the compute node our pod is running on
ssh $computenode
docker ps | grep nginx-deploy
docker exec -it -u root $id bash
for i in {1..1000000}; do sleep 1; echo `date` >> /usr/share/nginx/html/omaigod; done
Now, while this shit is running, we can shut off/reboot any glusterfs instance
after some time, check this file in container:
cat /usr/share/nginx/html/omaigod
Step 7.5. If not all pods are running
If you are experiencing some problems with mounting volumes inside pods, you can try this thing:
ssh compute{1...}; mount -t nfs vip:/volume /opt/; umount /opt/
Just run these simple operations and after a few moments all the pods should be running.
Step 8. Benchmarking glusterfs
cat generate.sh
#!/bin/sh
# generate $NUMBER files of $SIZE x $COUNT random data in $TARGET
mkdir -p "$TARGET"
for i in $(seq 1 $NUMBER);
do
dd if=/dev/urandom of=$TARGET/file_$i bs=$SIZE count=$COUNT 2>&1 | grep -v records
done
Creating 10240 files of 100k
export NUMBER=10240
export COUNT=1
export TARGET=$(pwd)/100k
export SIZE=100K
sh generate.sh > 100k.log
Creating 1024 files of 1M
export NUMBER=1024
export TARGET=$(pwd)/1M
export SIZE=1M
sh generate.sh > 1M.log
Creating 100 files of 10M
export NUMBER=100
export TARGET=$(pwd)/10M
export SIZE=10M
sh generate.sh > 10M.log
Creating 10 files of 100M
export NUMBER=10
export COUNT=100
export TARGET=$(pwd)/100M
export SIZE=1M
sh generate.sh > 100M.log
Creating 1 file of 1G
export NUMBER=1
export TARGET=$(pwd)/1G
export SIZE=1M
export COUNT=1024
sh generate.sh > 1G.log
Average (field 8 of dd's summary line is the MB/s value):
cat 1M_root.log | awk '{print $8}' | awk '{a+=$1} END{print a/NR}' > 1M_root.result
Tear down cluster
ssh node1; gluster nfs-ganesha disable
pcs cluster node remove $node1
ssh node2
pcs cluster node remove $node2