Cephadm on Ubuntu 22.04
Mission: deploy a new production-ready Ceph cluster on 4 new hardware nodes.
Overview
I had zero experience with the current cephadm orchestrator for Ceph. But time flies and ceph-ansible is being deprecated, so this old dog had to learn some new tricks.
Requirements
Nothing new here:
- JBOD for OSDs
- RAID1 for 2 OS disks (mdraid is fine)
- minimum 3 nodes (and maybe around 15 maximum, to keep possible rebalancing manageable)
- at least 2 network cards with 2x10 Gbps ports (better 25 Gbps) on each node
- LACP configured on the network switches
- at least 2 vCPUs and 4 GB of RAM per OSD on each OSD node
- a proxy Docker registry mirroring quay.io, accessible from all nodes
- 2 separate networks: Ceph public and Ceph cluster (with jumbo frames and MTU 9000)
- accessible corporate NTP/DNS servers, plus IPMI access in case something goes wrong
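Once the network is up, jumbo frames can be verified end-to-end with a do-not-fragment ping between cluster-network addresses (8972 bytes = 9000 minus IP and ICMP headers; the target IP below is just an example neighbor on the 10.10.20.0/24 network used later in this post):
ping -M do -s 8972 10.10.20.12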
Preparation
Step 1.
- provision all hosts, install Ubuntu 22.04
- configure hostnames and /etc/hosts localhost records
- configure chronyd and systemd-resolved. Here we configure /etc/resolv.conf to follow changes from netplan:
sudo rm -f /etc/resolv.conf
sudo ln -s /run/resolvconf/resolv.conf /etc/resolv.conf
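To confirm that time sync and the resolver symlink are in place:
chronyc sources
ls -l /etc/resolv.conf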
- configure netplan. Pay attention to the DHCP DNS overrides, MTU, and other configuration options:
network:
  version: 2
  bonds:
    bond0:
      interfaces:
        - enp180s0f0np0
        - enp180s0f1np1
      parameters:
        lacp-rate: fast
        mode: 802.3ad
        transmit-hash-policy: layer2+3
    bond1:
      interfaces:
        - enp179s0f0np0
        - enp179s0f1np1
      parameters:
        lacp-rate: fast
        mode: 802.3ad
        transmit-hash-policy: layer2+3
  ethernets:
    enp179s0f0np0:
      dhcp4: true
      mtu: 9000
    enp179s0f1np1:
      dhcp4: true
      mtu: 9000
    enp180s0f0np0:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
    enp180s0f1np1:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
  vlans:
    bond0.10:
      id: 10
      link: bond0
      addresses:
        - 10.10.10.10/24
      dhcp4-overrides:
        use-dns: false
      nameservers:
        addresses:
          - 10.10.10.200
          - 10.10.10.220
        search:
          - "mycompany.cloud"
      routes:
        - to: default
          via: 10.10.10.1
    bond1.20:
      id: 20
      link: bond1
      addresses:
        - 10.10.20.11/24
      mtu: 9000
- apply netplan configuration, verify that everything is fine
netplan apply
ip -4 a; ip l
timedatectl status
dig ceph-02.mycompany.cloud   # or: nslookup ceph-02
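To confirm that the LACP bonds actually negotiated with the switches, you can also inspect the kernel bonding state:
cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1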
- generate an SSH key (id_rsa) on the first host and copy it to all the others:
ssh ceph-01
ssh-keygen
echo "ssh-rsa ....." >> ~/.ssh/authorized_keys
ssh ceph-02 'echo "ssh-rsa ....." >> ~/.ssh/authorized_keys'
ssh ceph-03 'echo "ssh-rsa ....." >> ~/.ssh/authorized_keys'
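Alternatively, ssh-copy-id does the append for you (assuming password authentication is still enabled at this point):
ssh-copy-id ceph-02
ssh-copy-id ceph-03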
- configure correct apt repositories on all nodes
vi /etc/apt/sources.list
apt update
- configure the correct Docker apt repository (add the Docker GPG key even if you use a proxy repo)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
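For reference, the matching apt source entry looks like this (this is the standard entry from Docker's docs; swap the upstream URL for your proxy mirror if you use one):
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" | sudo tee /etc/apt/sources.list.d/docker.list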
- install docker.io and other dependencies
apt install -y docker.io lvm2 python3
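A quick sanity check that the container runtime is up:
systemctl is-active docker
docker info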
Bootstrapping the cluster
We are going to bootstrap a new cluster starting from the first node, which will be our “admin” node. So, on the first Ceph node:
apt install -y cephadm ceph-common
cephadm bootstrap --mon-ip 10.10.10.50 --log-to-file --registry-url 10.10.10.70:5002 --registry-username docker --registry-password password --cluster-network 10.10.20.0/24
Wait for some time, then check the status of the cluster:
ceph -s
ceph orch host ls
Adding new hosts
First of all, let’s make mons unmanaged. I prefer to know exactly on which nodes my daemons are located.
ssh ceph-01
ceph orch apply mon --unmanaged
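If you still want the orchestrator to manage mons, but only on nodes you choose, an explicit placement works as well (using the hostnames from this post):
ceph orch apply mon --placement="ceph-01,ceph-02,ceph-03"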
Next, let’s add the other hosts.
First, copy the SSH key of our admin node to them:
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-01
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-02
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-03
Next, add the new hosts via the orchestrator. I recommend waiting a little before proceeding to the next host if you use a single proxy registry.
Note: it’s crucial to use the correct existing hostnames of the new nodes here, exactly as the nodes report them. They could be in uppercase, contain special characters, etc.
ceph orch host add ceph-02 10.10.10.51
ceph orch host add ceph-03 10.10.10.52
And now add the monitors:
ceph orch daemon add mon ceph-02:10.10.10.51
ceph orch daemon add mon ceph-03:10.10.10.52
Check the status
ceph -s
ceph orch host ls
Adding OSDs
It’s recommended to go the easy path: just add all available devices as-is. This works fine, especially with a more or less homogeneous setup.
All your future OSD devices should be listed as available:
ceph orch device ls
Do a dry run first:
ceph orch apply osd --all-available-devices --dry-run
ceph orch apply osd --all-available-devices
Alternatively, you can add devices manually, with some advanced configuration:
ceph orch daemon add osd ceph-01:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2
Or even use a YAML service specification for that purpose:
docs.ceph.com/drivegroups
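As a rough sketch of such a spec (the service_id, host pattern, and device filters below are made-up examples, not taken from this cluster), this puts OSD data on spinning disks and DB/WAL on SSDs for all matching hosts:
service_type: osd
service_id: example_drivegroup
placement:
  host_pattern: 'ceph-*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
Apply it with:
ceph orch apply -i osd_spec.yml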
Disable auto-provisioning of OSDs
The same thing as with monitors: the Ceph orchestrator keeps creating new OSDs on every occasion. You wipe a disk, and it creates a new OSD; you add a new drive to a host, and it creates a new OSD.
I don’t know why, but I believe quite a few system administrators are NOT comfortable with this behavior. So let’s disable it:
ceph orch apply osd --all-available-devices --unmanaged=true
After that, if you want to set up new OSDs, you will need to do:
ceph orch daemon add osd <host>:<path-to-device>
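For example (the hostname and device path here are placeholders for illustration):
ceph orch daemon add osd ceph-02:/dev/sdd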
Removing an OSD
To remove an OSD, issue these commands:
ceph orch osd rm <osd_id(s)> [--replace] --force --zap
ceph orch osd rm status
You can also zap the device manually if you forgot to provide the --zap flag:
- determine the LVs/VGs of the drive to zap
cephadm shell --fsid <fsid> -- ceph-volume lvm list
- zap device via orch
ceph orch device zap my_hostname /dev/sdx --force
- OR via ceph-volume
cephadm shell --fsid <fsid> -- ceph-volume lvm zap \
  ceph-vgid/osd-block-lvid --destroy
It’s possible that you’ll also need to delete the OSD manually:
- check if the OSD is still there
ceph node ls
- remove osd
ceph osd rm osd.ID
- if that’s not sufficient, you can try to delete it from the CRUSH map manually
ceph osd crush rm osd.31
What else
Stray daemons
Sometimes ceph orch gets stuck; I’m not sure why, with some stray daemons that it can’t find, etc.
The only solution that I’ve found is:
ceph orch restart mgr
If you cannot perform it because you have only one mgr, deploy another one (even temporarily) through ceph orch daemon add mgr HOST:IP.
After restarting the mgrs, all duplicates should be gone.
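To see which daemons the cluster currently considers stray, check the health detail:
ceph health detail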
Auto-memory tuning
By default, cephadm sets osd_memory_target_autotune=true, which is highly unsuitable for heterogeneous or hyperconverged infrastructures.
You can check the current memory consumption and limits with:
ceph orch ps
You can either place a label on the node to prevent memory autotuning, or set the config option per OSD:
ceph orch host label add HOSTNAME _no_autotune_memory
OR
ceph config set osd.123 osd_memory_target_autotune false
ceph config set osd.123 osd_memory_target 16G
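To verify what you set (osd.123 is the example ID from above):
ceph config get osd.123 osd_memory_target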
Getting logs
Get logs from daemons
cephadm logs --name osd.34
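Since cephadm runs every daemon as a systemd unit, the same logs are also reachable through journald; the unit name embeds your cluster fsid (placeholder below):
journalctl -u ceph-<fsid>@osd.34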
Removing crash messages
ceph crash ls
ceph crash archive-all