Ceph Storage¶
The Acme deployment uses Ceph as a storage backend.
The Ceph deployment is not managed by StackHPC Ltd.
Working with the Ceph deployment tool¶
cephadm configuration location¶
In the kayobe-config repository, the configuration lives under etc/kayobe/cephadm.yml (or in a specific Kayobe environment when using multiple environments, e.g. etc/kayobe/environments/production/cephadm.yml).
StackHPC’s cephadm Ansible collection relies on multiple inventory groups:

- mons
- mgrs
- osds
- rgws (optional)

Those groups are usually defined in etc/kayobe/inventory/groups.
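For illustration, the groups might be defined along these lines; the group membership shown here is a sketch only, as it varies between deployments:

# Illustrative sketch of etc/kayobe/inventory/groups - group
# membership is deployment-specific
[mons:children]
storage

[mgrs:children]
mons

[osds:children]
storage

[rgws:children]
# optional - only populated when deploying RadosGW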
Running cephadm playbooks¶
In the kayobe-config repository, under etc/kayobe/ansible there is a set of cephadm-based playbooks utilising the stackhpc.cephadm Ansible Galaxy collection:

- cephadm.yml - runs the end-to-end process, starting with deployment and defining EC profiles/CRUSH rules/pools and users
- cephadm-crush-rules.yml - defines Ceph CRUSH rules
- cephadm-deploy.yml - runs the bootstrap/deploy playbook without the additional playbooks
- cephadm-ec-profiles.yml - defines Ceph EC profiles
- cephadm-gather-keys.yml - gathers Ceph configuration and keys and populates kayobe-config
- cephadm-keys.yml - defines Ceph users/keys
- cephadm-pools.yml - defines Ceph pools
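These playbooks can be run with Kayobe from the Ansible control host, for example (assuming $KAYOBE_CONFIG_PATH points at the etc/kayobe directory of the kayobe-config checkout):

kayobe# kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm.yml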
Running Ceph commands¶
Ceph commands are usually run inside a cephadm shell utility container:
ceph# cephadm shell
Operating a cluster requires a keyring with admin access to be available for Ceph commands. Cephadm will copy such a keyring to the nodes carrying the _admin label - present on MON servers by default when using the StackHPC cephadm collection.
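For example, to check overall cluster health from inside the shell:

ceph# cephadm shell
ceph# ceph -s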
Adding a new storage node¶
Add the node to the respective group (e.g. osds) and run the cephadm-deploy.yml playbook.
Note
To add node types other than osds (mons, mgrs, etc.) you need to specify
-e cephadm_bootstrap=True
when running the playbook.
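For example, assuming $KAYOBE_CONFIG_PATH is set as above:

kayobe# kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-deploy.yml -e cephadm_bootstrap=True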
Removing a storage node¶
First, drain the node:
ceph# cephadm shell
ceph# ceph orch host drain <host>
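The progress of the drain can be monitored with:

ceph# ceph orch osd rm status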
Once all daemons are removed, you can remove the host:
ceph# cephadm shell
ceph# ceph orch host rm <host>
Then remove the host from the inventory (usually in etc/kayobe/inventory/overcloud).
Additional options/commands may be found in the upstream Host management documentation: https://docs.ceph.com/en/quincy/cephadm/host-management/
Replacing a Failed Ceph Drive¶
Once an OSD has been identified as having a hardware failure, the affected drive will need to be replaced.
If rebooting a Ceph node, first set noout to prevent excess data movement:
ceph# cephadm shell
ceph# ceph osd set noout
Reboot the node and replace the drive. Once the node is back online, unset noout:
ceph# cephadm shell
ceph# ceph osd unset noout
Remove the OSD using the Ceph orchestrator command:
ceph# cephadm shell
ceph# ceph orch osd rm <ID> --replace
After removing OSDs, if the drives the OSDs were deployed on once again become available, cephadm may automatically try to deploy more OSDs on these drives if they match an existing drivegroup spec. If this is not the desired behaviour, it is best to modify the drivegroup spec beforehand (the cephadm_osd_spec variable in etc/kayobe/cephadm.yml). Either set unmanaged: true to stop cephadm from picking up new disks, or modify the spec in some way that it no longer matches the drives you want to remove.
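As an illustration, a spec with unmanaged: true might look like this; the service_id, placement and data_devices filter below are examples only, not the values of any particular deployment:

# Illustrative sketch of cephadm_osd_spec in etc/kayobe/cephadm.yml
cephadm_osd_spec:
  service_type: osd
  service_id: osd_spec_default      # example name
  placement:
    host_pattern: "*"               # example placement
  unmanaged: true                   # stop cephadm deploying OSDs on matching disks
  spec:
    data_devices:
      all: true                     # example device filter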
Operations¶
Replacing a drive¶
See upstream documentation: https://docs.ceph.com/en/quincy/cephadm/services/osd/#replacing-an-osd
In the case where a disk holding the DB and/or WAL fails, it is necessary to recreate (using the replacement procedure above) all OSDs associated with that disk - usually an NVMe drive. The following command is sufficient to identify which OSDs are tied to which physical disks:
ceph# ceph device ls
Host maintenance¶
See the upstream documentation: https://docs.ceph.com/en/quincy/cephadm/host-management/#maintenance-mode
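For example, a host can be placed into and taken out of maintenance mode with:

ceph# ceph orch host maintenance enter <host>
ceph# ceph orch host maintenance exit <host>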
Upgrading¶
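Cephadm-managed clusters are upgraded using the orchestrator; see the upstream documentation: https://docs.ceph.com/en/quincy/cephadm/upgrade/. A minimal sketch:

ceph# cephadm shell
ceph# ceph orch upgrade start --ceph-version <version>
ceph# ceph orch upgrade status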
Troubleshooting¶
Investigating a Failed Ceph Drive¶
A failing drive in a Ceph cluster will cause the OSD daemon to crash, and Ceph will go into a HEALTH_WARN state. Ceph can report details about failed OSDs by running:
ceph# ceph health detail
Note
Remember to run ceph/rbd commands from within cephadm shell (the preferred method) or after installing the Ceph client. Details are in the official documentation. It is also required that the host where the commands are executed has the admin Ceph keyring present - this is easiest to achieve by applying the _admin label (Ceph MON servers have it by default when using the StackHPC cephadm collection).
A failed OSD will also be reported as down by running:
ceph# ceph osd tree
Note the ID of the failed OSD.
The failed disk is usually logged by the Linux kernel too:
storage-0# dmesg -T
Cross-reference the hardware device and OSD ID to ensure they match (using pvs and lvs may help make this connection).
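For example, the LVM report fields can show which physical device backs each Ceph logical volume (the grep filter is illustrative; OSD volume groups are typically named ceph-<fsid>):

storage-0# lvs -o lv_name,vg_name,devices | grep ceph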
Inspecting a Ceph Block Device for a VM¶
To find out what block devices are attached to a VM, go to the hypervisor that it is running on (an admin-level user can see this from openstack server show).
On this hypervisor, enter the libvirt container:
comp0# docker exec -it nova_libvirt /bin/bash
Find the VM name using libvirt:
(nova-libvirt)[root@comp0 /]# virsh list
Id Name State
------------------------------------
1 instance-00000001 running
Now inspect the properties of the VM using virsh dumpxml:
(nova-libvirt)[root@comp0 /]# virsh dumpxml instance-00000001 | grep rbd
<source protocol='rbd' name='acme-vms/51206278-e797-4153-b720-8255381228da_disk'>
On a Ceph node, the RBD pool can be inspected and the volume extracted as a RAW block image:
ceph# rbd ls acme-vms
ceph# rbd export acme-vms/51206278-e797-4153-b720-8255381228da_disk blob.raw
The raw block device (blob.raw above) can be mounted using the loopback device.
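For example (the loop device name and partition layout below are illustrative):

ceph# losetup --find --show --partscan blob.raw
ceph# mount /dev/loop0p1 /mnt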
Inspecting a QCOW Image using LibGuestFS¶
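If starting from the raw export produced above, it can first be converted to a qcow2 image with qemu-img, for example:

ceph# qemu-img convert -f raw -O qcow2 blob.raw blob.qcow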
The virtual machine’s root image can be inspected by installing libguestfs-tools and using the guestfish command:
ceph# export LIBGUESTFS_BACKEND=direct
ceph# guestfish -a blob.qcow
><fs> run
100% [XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 00:00
><fs> list-filesystems
/dev/sda1: ext4
><fs> mount /dev/sda1 /
><fs> ls /
bin
boot
dev
etc
home
lib
lib64
lost+found
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
><fs> quit