OpenShift Ansible
TAGS:OpenShift Ansible origin/release-3.9
ON ALL HOSTS
vim /etc/ssh/sshd_config - set PermitRootLogin prohibit-password and PasswordAuthentication no
vim /root/.ssh/authorized_keys - paste your key
service sshd restart
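To apply the same change non-interactively on each host, a sketch using sed (assumes the stock CentOS sshd_config with the directives present or commented out):
``sed -i -e 's/^#\?PermitRootLogin.*/PermitRootLogin prohibit-password/' \
       -e 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && service sshd restart   # sshd -t validates the config before restarting``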
ssh-keygen
vim /root/.ssh/authorized_keys - paste all pub keys from all machines
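Instead of pasting every pub key by hand, ssh-copy-id can do the fan-out (assumes the key pasted in the previous step already grants access, since password auth is now off):
``for i in 1 2 3 4 5 7; do ssh-copy-id root@os$i.exampler.com; done``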
vim /etc/hosts - paste all hostnames, etc
1.1.1.39 os7.exampler.com
1.1.1.69 os5.exampler.com
1.1.1.74 os3.exampler.com
1.1.1.83 os4.exampler.com
1.1.1.37 os1.exampler.com
1.1.1.56 os2.exampler.com
for i in 1 2 3 4 5 7; do ssh -x os$i.exampler.com 'yum -y install wget git net-tools bind-utils iptables-services bridge-utils bash-completion epel-release docker PyYAML python-ipaddress; yum -y update'; done
vim docker (sample file, we will copy it to /etc/sysconfig)
OPTIONS='--selinux-enabled --log-driver=journald --signature-verification=false --insecure-registry 172.30.0.0/16'
if [ -z "${DOCKER_CERT_PATH}" ]; then
DOCKER_CERT_PATH=/etc/docker
fi
Copy file on all nodes:
for i in 1 2 3 4 5 7; do scp docker os$i.exampler.com:/etc/sysconfig/docker; done
for i in 1 2 3 4 5 7; do ssh -x os$i.exampler.com 'systemctl restart docker'; done
On Deploy Node
Setup DNS server
yum install -y epel-release
yum install -y bind (the package that provides named)
vim /etc/named.conf
vim /etc/named/test-os.com.zone
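A minimal sketch of those two files (the zone name and the oss record match the dig check below; the serial, the ns1 record and everything else are example values to adjust - also remember to widen listen-on and allow-query in the options block of /etc/named.conf, which default to localhost only):
``cat <<'EOF' >> /etc/named.conf
zone "test-os.com" IN {
    type master;
    file "/etc/named/test-os.com.zone";
};
EOF

cat <<'EOF' > /etc/named/test-os.com.zone
$TTL 86400
@       IN SOA ns1.test-os.com. admin.test-os.com. (
                2018080801 ; serial
                3600       ; refresh
                900        ; retry
                604800     ; expire
                86400 )    ; minimum TTL
        IN NS  ns1.test-os.com.
ns1     IN A   10.220.106.245
oss     IN A   10.220.106.245
EOF

named-checkconf && named-checkzone test-os.com /etc/named/test-os.com.zone``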
Enable service and firewall
systemctl enable named
systemctl restart named
firewall-cmd --permanent --add-port=53/tcp
firewall-cmd --permanent --add-port=53/udp
firewall-cmd --reload
Check it -
ssh vm2
dig @10.220.106.245 oss.test-os.com
Install packages and openshift-ansible
yum -y install ansible pyOpenSSL python-lxml java-1.8.0-openjdk-headless httpd-tools patch python2-passlib
git clone https://github.com/openshift/openshift-ansible.git
cd openshift-ansible
git checkout remotes/origin/release-3.9
vim hosts
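The inventory is the heart of the install. A minimal sketch for release-3.9 (hostnames match this doc; every variable value is an example to adjust; the registry storage vars line up with the /var/nfs/registry export shown further below):
``# hosts - minimal openshift-ansible release-3.9 inventory sketch
[OSEv3:children]
masters
nodes
etcd
nfs

[OSEv3:vars]
ansible_user=root
openshift_deployment_type=origin
openshift_release=v3.9
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'kind': 'HTPasswdPasswordIdentityProvider', 'challenge': 'true', 'login': 'true', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_file=/etc/origin/master/htpasswd
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_nfs_directory=/var/nfs
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=110Gi

[masters]
os1.exampler.com

[etcd]
os1.exampler.com

[nfs]
os1.exampler.com

[nodes]
os1.exampler.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true
os2.exampler.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
os3.exampler.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"``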
NFS (on deployment host)
yum -y install nfs-utils nfs-utils-lib
systemctl enable rpcbind
systemctl enable nfs-server
systemctl enable nfs-lock
systemctl enable nfs-idmap
systemctl start rpcbind
systemctl start nfs-server
systemctl start nfs-lock
systemctl start nfs-idmap
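Quick sanity check that the NFS stack came up:
``systemctl is-active rpcbind nfs-server
showmount -e localhost   # export list stays empty until openshift-ansible writes it``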
Add to the hosts inventory:
openshift_master_htpasswd_file=/etc/origin/master/htpasswd
cat /etc/origin/master/htpasswd
grafadmin:$apr1$gGoz5HDo$k7ft2vFTNXhWykxtdjead/
test@com:$apr1$ehx/M4nF$0wd3uK7VFzLWc2pU1Segsd/
cognoz:$apr1$d986J3RM$kCQfaztYcKOzBI2aPssdB.Ef.
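Entries like these come from htpasswd (part of httpd-tools installed above); usernames here are just examples:
``htpasswd -c /etc/origin/master/htpasswd grafadmin   # -c creates/overwrites the file: first user only
htpasswd /etc/origin/master/htpasswd cognoz          # append further users without -c``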
for i in 1.1.1.39 1.1.1.69 1.1.1.74 1.1.1.83 1.1.1.37 1.1.1.56; do ssh -x $i 'yum -y install NetworkManager; systemctl enable NetworkManager; systemctl start NetworkManager'; done
SELINUX
ON ALL NODES
vim /etc/default/grub
GRUB_CMDLINE_LINUX="consoleblank=0 fsck.repair=yes crashkernel=auto selinux=1 enforcing=1 rhgb quiet"
grub2-mkconfig -o /boot/grub2/grub.cfg
touch /.autorelabel
OPTIONALLY -
useradd -m -s /bin/bash centos
cp -r /root/.ssh /home/centos/
chown -R centos:centos /home/centos
reboot
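After the reboot (the touch /.autorelabel above makes it take a while), confirm SELinux is actually enforcing:
``getenforce   # should print Enforcing
sestatus | head -n 3``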
DEPLOY START
DEPLOY Node
ansible-playbook -i hosts playbooks/prerequisites.yml
ansible-playbook -i hosts playbooks/deploy_cluster.yml
ON MASTER
oc login -u system:admin
oc get nodes
vim /etc/docker/daemon.json
{ "insecure-registries": ["172.30.0.0/16"] }
Basic operations
Create new user
oc create user rklimenko
oc adm policy add-cluster-role-to-user cluster-admin rklimenko
htpasswd /etc/origin/master/htpasswd rklimenko
(do not pass -c here: it would overwrite the existing htpasswd file)
Log in at https://master_ip:8443
Tricks
Change a PVC without losing any data
I found it here: https://bugzilla.redhat.com/show_bug.cgi?id=1570583
``John Sanda 2018-06-05 10:46:12 EDT
The big challenge with moving components to a new namespace is avoiding data loss. Yesterday I asked on the aos-storage list how I can migrate data from a PV. Here are the detailed steps with which I was provided:
- Find your PV.
- Check PV.Spec.PersistentVolumeReclaimPolicy. If it is Delete or Recycle, change it to Retain (oc edit pv <xyz> or oc patch). Whatever happens now, the worst thing that can happen to your PV is that it can get to Released phase. Data won't be deleted.
- Rebind:
  - Create a new PVC in the new namespace. The new PVC should be the same as the old PVC - storage classes, labels, selectors, ... Explicitly, PVC.Spec.VolumeName must be set to PV.Name. This effectively turns off dynamic provisioning for this PVC. The new PVC will be Pending. That's OK, the old PVC is still the one that's bound to the PV.
  - Here comes the tricky part: change PV.Spec.ClaimRef exactly in this way:
    PV.Spec.ClaimRef.Namespace = <namespace of the new PVC>
    PV.Spec.ClaimRef.Name = <name of the new PVC>
    PV.Spec.ClaimRef.UID = <UID of the new PVC>
    The old PVC should get "Lost" in a couple of seconds (and you can safely delete it). The new PVC should be "Bound". The PV should be "Bound" to the new PVC.
- Restore the original PV.Spec.PersistentVolumeReclaimPolicy, if needed.
Note that this just rebinds the PV. It does not stop pods in the old namespace that use the PV and start them in the new one. Something else must do that. You should delete the deployment first and re-create it in the new namespace when the new PVC is bound.``
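A sketch of the same rebind with oc patch (PV/PVC names and the target namespace are placeholders; clearing uid and resourceVersion, instead of copying the new PVC's UID, also lets the controller complete the bind):
``oc patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# create the new PVC (with spec.volumeName: <pv-name>) in the new namespace first, then:
oc patch pv <pv-name> -p '{"spec":{"claimRef":{"namespace":"<new-ns>","name":"<new-pvc>","uid":null,"resourceVersion":null}}}'``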
BUGS
Incorrect image tag in metrics rc (v3.9.0 instead of v3.9)
https://github.com/openshift/origin/issues/19440
How to fix: add this line to the hosts inventory
openshift_metrics_image_version=v3.9
Problems with Cassandra (Hawkular Metrics) creating its data dir -
Do NOT create exports in /etc/exports on the NFS server manually - openshift
will create them automatically in /etc/exports.d/openshift-ansible.exports.
If you have already created one, do the following on the NFS server
cat /dev/null > /etc/exports
systemctl restart nfs-server
and this on the master node
oc login -u system:admin
oc -n openshift-infra delete po <hawkular-cassandra pod> <heapster pod> <hawkular-metrics pod> (the namespace is openshift-infra or openshift-metrics, depending on your install)
Problems with registry certificate
see https://bugzilla.redhat.com/show_bug.cgi?id=1553838
If you can't get the test app working (oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git)
because of this problem, do the following on ALL hosts (nodes and masters)
ls -la /etc/docker/certs.d/docker-registry.default.svc\:5000/node-client-ca.crt
rm -rf /etc/docker/certs.d/docker-registry.default.svc\:5000/node-client-ca.crt
ln -s /etc/origin/node/ca.crt /etc/docker/certs.d/docker-registry.default.svc\:5000/node-client-ca.crt
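Following the loop pattern used earlier, the same fix can be pushed to every host in one go:
``for i in 1 2 3 4 5 7; do ssh -x os$i.exampler.com 'rm -f /etc/docker/certs.d/docker-registry.default.svc:5000/node-client-ca.crt; ln -s /etc/origin/node/ca.crt /etc/docker/certs.d/docker-registry.default.svc:5000/node-client-ca.crt'; done``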
and restart the application
Problems with registry push (500) on nfs
If you have this error
e="2018-08-08T11:26:19.218110887Z" level=error msg="response completed with error" err.code=unknown err.detail="filesystem: mkdir /registry: file exists" err.message="unknown error" go.version=go1.9.2
Then you should verify your nfs share
example:
cat /etc/exports.d/openshift-ansible.exports
"/var/nfs/registry" *(rw,root_squash)
"/var/nfs/metrics/metrics" *(rw,root_squash)
"/exports/logging-es" *(rw,root_squash)
"/exports/logging-es-ops" *(rw,root_squash)
"/exports/etcd" *(rw,root_squash)
"/exports/prometheus" *(rw,root_squash)
"/exports/prometheus-alertmanager" *(rw,root_squash)
"/exports/prometheus-alertbuffer" *(rw,root_squash)
DO NOT create anything in this dir manually!!!!
If you have done so, then you need to delete everything in the registry
dir, restart nfs-server, recreate the default pv/pvc for the docker registry, recreate the registry pods, and recreate your app
cat pv.yaml
``apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  name: registry-volume-volume
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 110Gi
  nfs:
    path: /var/nfs/registry
    server: nfs-server-hostname
  persistentVolumeReclaimPolicy: Retain``
cat pvc.yaml
``apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  name: registry-volume-claim
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 110Gi
  storageClassName: ""
  volumeName: registry-volume-volume``
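Then drop and recreate the objects and roll the registry (assumes the registry lives in the default project, as openshift-ansible deploys it):
``oc -n default delete pvc registry-volume-claim
oc delete pv registry-volume-volume
oc create -f pv.yaml -f pvc.yaml
oc -n default rollout latest dc/docker-registry``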
PostDeploy
Add compute label to nodes without it
oc project default
oc get nodes -o wide
oc label node node01.finomancer.com node-role.kubernetes.io/compute="true"
oc label node node02.finomancer.com node-role.kubernetes.io/compute="true"
Add ha-svc-nodes label to compute nodes (it will be used for ipfailover)
for i in 1 2 3 4; do echo $i; oc describe nodes node0$i.finomancer.com | head -n 14; done
for i in 1 2 3 4; do oc label node node0$i.finomancer.com ha-svc-nodes="failovervip"; done
for i in 1 2 3 4; do echo $i; oc describe nodes node0$i.finomancer.com | head -n 14; done
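With the label in place, ipfailover can then be deployed against it; roughly (the VIP is a placeholder):
``oc project default
oc create serviceaccount ipfailover
oc adm policy add-scc-to-user privileged system:serviceaccount:default:ipfailover
oc adm ipfailover ipfailover --selector="ha-svc-nodes=failovervip" \
  --virtual-ips="1.1.1.100" --watch-port=80 --replicas=4 \
  --service-account=ipfailover --create``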