Cephadm on Ubuntu 22.04
Mission: deploy a new production-ready Ceph cluster on 4 new hardware nodes.
Overview
I had zero experience with the current cephadm orchestrator for Ceph. But time flies and ceph-ansible is being deprecated, so this old dog had to learn some new tricks.
Requirements
Nothing new here:
- JBOD for OSDs
- RAID1 for 2 OS disks (mdraid is fine)
- minimum 3 nodes (and maybe around 15 maximum, to keep possible rebalancing manageable)
- at least 2 network cards with 2x10 Gbps ports (better 25 Gbps) on each node
- LACP configured on the network switches
- at least 2 vCPUs and 4 GB of RAM per OSD on each OSD node
- a proxy Docker registry mirroring quay.io, accessible from all nodes
- 2 separate networks: Ceph public and Ceph cluster (with jumbo frames and MTU 9000)
- accessible corporate NTP/DNS servers, plus IPMI access in case something goes wrong
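Once the network is up, jumbo frames can be verified end-to-end with a do-not-fragment ping between cluster-network addresses (8972 bytes = 9000 minus IP and ICMP headers; the target IP below is just an example neighbor on the 10.10.20.0/24 network used later in this post):
ping -M do -s 8972 10.10.20.12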
Preparation
Step 1.
- provision all hosts, install Ubuntu 22.04
- configure hostnames and /etc/hosts localhost records
- configure chronyd and systemd-resolved. Here we configure /etc/resolv.conf to follow changes from netplan:
sudo rm -f /etc/resolv.conf
sudo ln -s /run/resolvconf/resolv.conf /etc/resolv.conf
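To confirm that time sync and the resolver symlink are in place:
chronyc sources
ls -l /etc/resolv.conf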
- configure netplan. Pay attention to the DHCP DNS overrides, MTU, and other configuration options:
network:
  version: 2
  bonds:
    bond0:
      interfaces:
        - enp180s0f0np0
        - enp180s0f1np1
      parameters:
        lacp-rate: fast
        mode: 802.3ad
        transmit-hash-policy: layer2+3
    bond1:
      interfaces:
        - enp179s0f0np0
        - enp179s0f1np1
      parameters:
        lacp-rate: fast
        mode: 802.3ad
        transmit-hash-policy: layer2+3
  ethernets:
    enp179s0f0np0:
      dhcp4: true
      mtu: 9000
    enp179s0f1np1:
      dhcp4: true
      mtu: 9000
    enp180s0f0np0:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
    enp180s0f1np1:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
  vlans:
    bond0.10:
      id: 10
      link: bond0
      addresses:
        - 10.10.10.10/24
      dhcp4-overrides:
        use-dns: false
      nameservers:
        addresses:
          - 10.10.10.200
          - 10.10.10.220
        search:
          - "mycompany.cloud"
      routes:
        - to: default
          via: 10.10.10.1
    bond1.20:
      id: 20
      link: bond1
      addresses:
        - 10.10.20.11/24
      mtu: 9000
- apply netplan configuration, verify that everything is fine
netplan apply
ip -4 a; ip l
timedatectl status
dig ceph-02.mycompany.cloud   # or: nslookup ceph-02
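To confirm that the LACP bonds actually negotiated with the switches, you can also inspect the kernel bonding state:
cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1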
- generate an SSH key (id_rsa) on the first host and copy it to all the others:
ssh ceph-01
ssh-keygen
echo "ssh-rsa ....." >> ~/.ssh/authorized_keys
ssh ceph-02 'echo "ssh-rsa ....." >> ~/.ssh/authorized_keys'
ssh ceph-03 'echo "ssh-rsa ....." >> ~/.ssh/authorized_keys'
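Alternatively, ssh-copy-id does the append for you (assuming password authentication is still enabled at this point):
ssh-copy-id ceph-02
ssh-copy-id ceph-03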
- configure correct apt repositories on all nodes
vi /etc/apt/sources.list
apt update
- configure the correct Docker apt repository (add the Docker GPG key even if you use a proxy repo)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
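For reference, the matching apt source entry looks like this (this is the standard entry from Docker's docs; swap the upstream URL for your proxy mirror if you use one):
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" | sudo tee /etc/apt/sources.list.d/docker.list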
- install docker.io and other dependencies
apt install -y docker.io lvm2 python3
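A quick sanity check that the container runtime is up:
systemctl is-active docker
docker info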
Bootstrapping the cluster
We are going to bootstrap a new cluster starting from the first node, which will be our “admin” node. So, on the first Ceph node:
apt install -y cephadm ceph-common
cephadm bootstrap --mon-ip 10.10.10.50 --log-to-file --registry-url 10.10.10.70:5002 --registry-username docker --registry-password password --cluster-network 10.10.20.0/24
Wait for some time, then check the status of the cluster:
ceph -s
ceph orch host ls
Adding new hosts
First of all, let’s make mons unmanaged. I prefer to know exactly on which nodes my daemons are located.
ssh ceph-01
ceph orch apply mon --unmanaged
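If you still want the orchestrator to manage mons, but only on nodes you choose, an explicit placement works as well (using the hostnames from this post):
ceph orch apply mon --placement="ceph-01,ceph-02,ceph-03"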
Next, let’s add the other hosts.
First, copy the SSH key of our admin node to them:
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-01
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-02
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-03
Next, add the new hosts via the orchestrator. I recommend waiting a little before proceeding to the next host if you use a single proxy registry.
Note: it’s crucial to use the correct existing hostnames of the new nodes here, exactly as the nodes report them. They could be in uppercase, contain special characters, etc.
ceph orch host add ceph-02 10.10.10.51
ceph orch host add ceph-03 10.10.10.52
And now add the monitors:
ceph orch daemon add mon ceph-02:10.10.10.51
ceph orch daemon add mon ceph-03:10.10.10.52
Check the status
ceph -s
ceph orch host ls
Adding OSDs
It’s recommended to go the easy path: just add all available devices as-is. This works fine, especially with a more or less homogeneous setup.
All your future OSD devices should be listed as available:
ceph orch device ls
Do a dry run first:
ceph orch apply osd --all-available-devices --dry-run
ceph orch apply osd --all-available-devices
Alternatively, you can add devices manually, with some advanced configuration:
ceph orch daemon add osd ceph-01:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2
Or even use a YAML service specification for that purpose:
docs.ceph.com/drivegroups
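As a rough sketch of such a spec (the service_id, host pattern, and device filters below are made-up examples, not taken from this cluster), this puts OSD data on spinning disks and DB/WAL on SSDs for all matching hosts:
service_type: osd
service_id: example_drivegroup
placement:
  host_pattern: 'ceph-*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
Apply it with:
ceph orch apply -i osd_spec.yml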
Disable auto-provisioning of OSDs
The same thing as with monitors: the Ceph orchestrator keeps creating new OSDs on every occasion. You wipe a disk, and it creates a new OSD; you add a new drive to a host, and it creates a new OSD.
I don’t know why, but I believe quite a few system administrators are NOT comfortable with this behavior. So let’s disable it:
ceph orch apply osd --all-available-devices --unmanaged=true
After that, if you want to set up new OSDs, you will need to do:
ceph orch daemon add osd <host>:<path-to-device>
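For example (the hostname and device path here are placeholders for illustration):
ceph orch daemon add osd ceph-02:/dev/sdd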
Removing an OSD
To remove an OSD, issue these commands:
ceph orch osd rm <osd_id(s)> [--replace] --force --zap
ceph orch osd rm status
You can also zap the device manually if you forgot to provide the --zap flag:
- determine the LVs/VGs of the drive to zap
cephadm shell --fsid <fsid> -- ceph-volume lvm list
- zap device via orch
ceph orch device zap my_hostname /dev/sdx --force
- OR via ceph-volume
cephadm shell --fsid <fsid> -- ceph-volume lvm zap \
  ceph-vgid/osd-block-lvid --destroy
It’s possible that you’ll also need to delete the OSD manually:
- check if the OSD is still there
ceph node ls
- remove osd
ceph osd rm osd.ID
- if that’s not sufficient, you can try to delete it from the CRUSH map manually
ceph osd crush rm osd.31
What else
Stray daemons
Sometimes ceph orch gets stuck; I’m not sure why, with some stray daemons that it can’t find, etc.
The only solution that I’ve found is:
ceph orch restart mgr
If you cannot perform it because you have only one mgr, deploy another one (even temporarily) through ceph orch daemon add mgr HOST:IP.
After restarting the mgrs, all duplicates should be gone.
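To see which daemons the cluster currently considers stray, check the health detail:
ceph health detail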
Auto-memory tuning
By default, cephadm sets osd_memory_target_autotune=true, which is highly unsuitable for heterogeneous or hyperconverged infrastructures.
You can check the current memory consumption and limits with:
ceph orch ps
You can either place a label on the node to prevent memory autotuning, or set the config option per OSD:
ceph orch host label add HOSTNAME _no_autotune_memory
OR
ceph config set osd.123 osd_memory_target_autotune false
ceph config set osd.123 osd_memory_target 16G
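To verify what you set (osd.123 is the example ID from above):
ceph config get osd.123 osd_memory_target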
Getting logs
Get logs from daemons
cephadm logs --name osd.34
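Since cephadm runs every daemon as a systemd unit, the same logs are also reachable through journald; the unit name embeds your cluster fsid (placeholder below):
journalctl -u ceph-<fsid>@osd.34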
Removing crash messages
ceph crash ls
ceph crash archive-all