Ceph Storage

The Acme deployment uses Ceph as a storage backend.

The Ceph deployment is not managed by StackHPC Ltd.

Working with the Ceph deployment tool

cephadm configuration location

In the kayobe-config repository, under etc/kayobe/cephadm.yml (or, when using multiple Kayobe environments, in a specific environment, e.g. etc/kayobe/environments/production/cephadm.yml).

StackHPC’s cephadm Ansible collection relies on multiple inventory groups:

  • mons

  • mgrs

  • osds

  • rgws (optional)

Those groups are usually defined in etc/kayobe/inventory/groups.

Running cephadm playbooks

In the kayobe-config repository, under etc/kayobe/ansible, there is a set of cephadm-based playbooks utilising the stackhpc.cephadm Ansible Galaxy collection (an example invocation is shown after this list):

  • cephadm.yml - runs the end-to-end process, starting with deployment and followed by definition of EC profiles, CRUSH rules, pools and users

  • cephadm-crush-rules.yml - defines Ceph CRUSH rules

  • cephadm-deploy.yml - runs the bootstrap/deploy playbook without the additional playbooks

  • cephadm-ec-profiles.yml - defines Ceph EC profiles

  • cephadm-gather-keys.yml - gathers Ceph configuration and keys and populates kayobe-config

  • cephadm-keys.yml - defines Ceph users/keys

  • cephadm-pools.yml - defines Ceph pools
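
The playbooks are normally invoked with kayobe playbook run from the Ansible control host. A minimal sketch, assuming the standard kayobe-config layout and that KAYOBE_CONFIG_PATH has been set (for example by sourcing the kayobe-env script); the kayobe# prompt is a placeholder for the control host:

kayobe# kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm.yml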

Running Ceph commands

Ceph commands are usually run inside a cephadm shell utility container:

ceph# cephadm shell

Operating a cluster requires a keyring with admin access to be available for Ceph commands. Cephadm copies such a keyring to the nodes carrying the _admin label, which is present on the MON servers by default when using the StackHPC cephadm collection.
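
If admin access is needed on an additional host, the label can be applied with the orchestrator and the result checked in the host listing (substitute the real hostname):

ceph# cephadm shell
ceph# ceph orch host label add <host> _admin
ceph# ceph orch host ls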

Adding a new storage node

Add the node to the respective group (e.g. osds) and run the cephadm-deploy.yml playbook.

Note

To add node types other than OSDs (mons, mgrs, etc.) you need to specify -e cephadm_bootstrap=True when running the playbook, as in the example below.
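
A sketch of both cases, assuming the same kayobe playbook run invocation as in the earlier example; the first command is sufficient for OSD nodes, the second is needed for other node types:

kayobe# kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-deploy.yml
kayobe# kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-deploy.yml -e cephadm_bootstrap=True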

Removing a storage node

First, drain the node:

ceph# cephadm shell
ceph# ceph orch host drain <host>
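
Draining schedules the removal of all OSDs on the host and may take some time. Progress can be checked from within the cephadm shell, for example:

ceph# ceph orch osd rm status
ceph# ceph orch ps <host>

The first command shows OSDs still being drained, the second lists any daemons remaining on the host.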

Once all daemons are removed, you can remove the host:

ceph# cephadm shell
ceph# ceph orch host rm <host>

Then remove the host from the Ansible inventory (usually in etc/kayobe/inventory/overcloud).

Additional options and commands may be found in the Ceph host management documentation: https://docs.ceph.com/en/quincy/cephadm/host-management/

Replacing a Failed Ceph Drive

Once an OSD has been identified as having a hardware failure, the affected drive will need to be replaced.

If rebooting a Ceph node, first set noout to prevent excess data movement:

ceph# cephadm shell
ceph# ceph osd set noout

Reboot the node and replace the drive.

Unset noout after the node is back online:

ceph# cephadm shell
ceph# ceph osd unset noout

Remove the OSD using the Ceph orchestrator command:

ceph# cephadm shell
ceph# ceph orch osd rm <ID> --replace
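
With --replace, the OSD is marked destroyed rather than fully removed, so its ID is preserved for the replacement device. The state can be checked afterwards:

ceph# ceph osd tree

The affected OSD should be reported as destroyed until a new drive is deployed in its place.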

After removing OSDs, if the drives the OSDs were deployed on once again become available, cephadm may automatically try to deploy more OSDs on these drives if they match an existing drivegroup spec. If this is not your desired action plan, it is best to modify the drivegroup spec beforehand (the cephadm_osd_spec variable in etc/kayobe/cephadm.yml). Either set unmanaged: true to stop cephadm from picking up new disks, or modify it in some way that it no longer matches the drives you want to remove.
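
The OSD service specs currently active in the cluster can be reviewed from the cephadm shell, which helps to confirm whether a drive would be picked up again:

ceph# ceph orch ls osd --export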

Operations

Replacing a drive

See upstream documentation: https://docs.ceph.com/en/quincy/cephadm/services/osd/#replacing-an-osd

In the case where a disk holding the DB and/or WAL fails, it is necessary to recreate (using the replacement procedure above) all OSDs that are associated with this disk, usually an NVMe drive. The following single command is sufficient to identify which OSDs are tied to which physical disks:

ceph# ceph device ls
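
To narrow this down to a single OSD or host, the following may also help (substitute the real OSD ID and hostname):

ceph# ceph device ls-by-daemon osd.<ID>
ceph# ceph device ls-by-host <host>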

Host maintenance

https://docs.ceph.com/en/quincy/cephadm/host-management/#maintenance-mode

Upgrading

https://docs.ceph.com/en/quincy/cephadm/upgrade/

Troubleshooting

Investigating a Failed Ceph Drive

A failing drive in a Ceph cluster will cause the OSD daemon to crash. In this case Ceph will go into a HEALTH_WARN state. Ceph can report details about failed OSDs by running:

ceph# ceph health detail

Note

Remember to run ceph/rbd commands from within a cephadm shell (the preferred method) or after installing the Ceph client; details are in the official documentation. It is also required that the host where commands are executed has an admin Ceph keyring present. This is easiest to achieve by applying the _admin label (Ceph MON servers have it by default when using the StackHPC cephadm collection).

A failed OSD will also be reported as down by running:

ceph# ceph osd tree

Note the ID of the failed OSD.
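
With the OSD ID known, the orchestrator can report where the OSD lives, for example:

ceph# ceph osd find <ID>

The output includes the host and CRUSH location of the OSD.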

The failed disk is usually logged by the Linux kernel too:

storage-0# dmesg -T

Cross-reference the hardware device and OSD ID to ensure they match. (Using pvs and lvs may help make this connection).
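
On the storage node itself, cephadm can also map OSDs to their LVM volumes and underlying devices; a sketch:

storage-0# cephadm ceph-volume lvm list

The output lists each OSD together with its block (and, where present, DB/WAL) devices.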

Inspecting a Ceph Block Device for a VM

To find out what block devices are attached to a VM, go to the hypervisor that it is running on (an admin-level user can see this from openstack server show).
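
If the hypervisor is not already known, it can be looked up with the OpenStack CLI using admin credentials; a sketch (the admin# prompt is a placeholder for wherever the client is available):

admin# openstack server show <server-uuid> -c OS-EXT-SRV-ATTR:hypervisor_hostname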

On this hypervisor, enter the libvirt container:

comp0# docker exec -it nova_libvirt /bin/bash

Find the VM name using libvirt:

(nova-libvirt)[root@comp0 /]# virsh list
 Id    Name                State
------------------------------------
 1     instance-00000001   running

Now inspect the properties of the VM using virsh dumpxml:

(nova-libvirt)[root@comp0 /]# virsh dumpxml instance-00000001 | grep rbd
      <source protocol='rbd' name='acme-vms/51206278-e797-4153-b720-8255381228da_disk'>

On a Ceph node, the RBD pool can be inspected and the volume extracted as a RAW block image:

ceph# rbd ls acme-vms
ceph# rbd export acme-vms/51206278-e797-4153-b720-8255381228da_disk blob.raw

The raw block device (blob.raw above) can be mounted using the loopback device.
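
A minimal sketch of doing so, assuming the image contains a single partition (loop device names and partition numbers will vary):

ceph# losetup --find --show --partscan blob.raw
/dev/loop0
ceph# mount /dev/loop0p1 /mnt
ceph# ls /mnt
ceph# umount /mnt
ceph# losetup --detach /dev/loop0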

Inspecting a QCOW Image using LibGuestFS

The virtual machine’s root image can be inspected by installing libguestfs-tools and using the guestfish command:

ceph# export LIBGUESTFS_BACKEND=direct
ceph# guestfish -a blob.qcow
><fs> run
 100% [XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 00:00
><fs> list-filesystems
/dev/sda1: ext4
><fs> mount /dev/sda1 /
><fs> ls /
bin
boot
dev
etc
home
lib
lib64
lost+found
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
><fs> quit