Installing Ceph

Installing the Ceph daemon

Before we install any packages we should check that the correct USE flags are set, so that all required functionality is built and unnecessary functionality is left out. The sys-cluster/ceph package and its dependencies provide a variety of USE flags, only some of which are discussed here. As usual, feel free to add and remove USE flags as you see fit; the minimum set required to follow this guide in its entirety is shown below.

host1 ~ # emerge -pv sys-cluster/ceph

These are the packages that would be merged, in order:

Calculating dependencies... done!

[ebuild   N    ] dev-libs/libatomic_ops-7.2d
[ebuild   N    ] dev-libs/libaio-0.3.109-r4 USE="-static-libs"
[ebuild   N    ] dev-util/boost-build-1.49.0 USE="-examples -python"
[ebuild   N    ] app-arch/snappy-1.1.0 USE="-static-libs"
[ebuild   N    ] dev-libs/crypto++-5.6.2 USE="-static-libs"
[ebuild   N    ] sys-apps/keyutils-1.5.5
[ebuild   N    ] dev-libs/fcgi-2.4.1_pre0910052249 USE="-html"
[ebuild   N    ] dev-libs/libedit-20120311.3.0-r1 USE="-static-libs"
[ebuild   N    ] dev-libs/boost-1.49.0-r2 USE="-debug -doc -icu -mpi -python -static-libs -tools"
[ebuild   N    ] sys-fs/btrfs-progs-0.19.11
[ebuild   N    ] sys-libs/libunwind-1.1:7 USE="libatomic -debug -debug-frame -lzma -static-libs"
[ebuild   N    ] dev-util/google-perftools-2.0-r2:0/4 USE="-debug -largepages -minimal -static-libs"
[ebuild   N    ] net-misc/curl-7.31.0 USE="ssl threads -adns -idn -ipv6 -kerberos -ldap -metalink -rtmp -ssh -static-libs"
[ebuild   N    ] app-arch/unzip-6.0-r3 USE="bzip2 unicode -natspec"
[ebuild   N    ] dev-libs/leveldb-1.9.0-r5 USE="snappy -static-libs"
[ebuild   N    ] sys-cluster/ceph-0.67 USE="radosgw libatomic tcmalloc -debug -fuse -gtk -static-libs"
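One way to make these flags persistent is a package.use entry. The sketch below is only an illustration: the helper name write_ceph_use and the file name ceph are our own choices, and the flag list is copied from the sys-cluster/ceph line of the emerge output above.

```shell
# write_ceph_use FILE: write a package.use entry for sys-cluster/ceph to FILE.
# The helper and the target file name are hypothetical; the flags mirror the
# emerge output shown above.
write_ceph_use() {
    mkdir -p "$(dirname "$1")"
    printf '%s\n' \
        'sys-cluster/ceph radosgw libatomic tcmalloc -debug -fuse -gtk -static-libs' \
        > "$1"
}

# On a real system (as root, and assuming package.use is a directory) one
# might run:
#   write_ceph_use /etc/portage/package.use/ceph
```

Running emerge -pv sys-cluster/ceph again afterwards should show whether the flags were picked up.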

Once you are confident that the correct USE flags are set for the sys-cluster/ceph package and its dependencies, you can proceed with the installation by issuing the emerge command shown below.

host1 ~ # emerge sys-cluster/ceph

Initial configuration

Now that the sys-cluster/ceph package has been installed we can configure the first node. As it will be the only node, at least until we create some others, we shall configure a Monitor (MON), an Object Storage Daemon (OSD) and a Metadata Server (MDS), thus providing a complete Ceph stack.

Before we can proceed, however, we need to create some space for Ceph to store our data. The example below creates a 1TB logical volume named ceph in the host1_vg1 volume group, formats it with the ext4 filesystem and mounts it at /mnt/ceph. Don't forget to add an entry to /etc/fstab so that it is mounted automatically after a reboot.

host1 ~ # lvcreate -n ceph -L 1T host1_vg1
host1 ~ # mkfs.ext4 /dev/host1_vg1/ceph
host1 ~ # mkdir /mnt/ceph
host1 ~ # mount /dev/host1_vg1/ceph /mnt/ceph
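The /etc/fstab entry mentioned above could look like the line printed by the following sketch. The helper fstab_entry is hypothetical, and the noatime option and the fsck pass number 2 are assumptions rather than requirements of this guide.

```shell
# fstab_entry DEVICE MOUNTPOINT [OPTIONS]: print a tab-separated ext4 fstab line.
# OPTIONS defaults to "defaults"; dump is 0 and the fsck pass is 2 (assumed).
fstab_entry() {
    printf '%s\t%s\text4\t%s\t0 2\n' "$1" "$2" "${3:-defaults}"
}

# Print the line for the data volume created above; review it before
# appending it to /etc/fstab by hand.
fstab_entry /dev/host1_vg1/ceph /mnt/ceph noatime
```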

We will also need to create a logical volume for the MON maps. This is critical to the correct operation of the MON daemon as it will automatically shut down should this location ever become more than 95% full. Remember to add the new logical volume to the /etc/fstab file so it is automatically mounted.

host1 ~ # lvcreate -n ceph-mon -L 2G host1_vg1
host1 ~ # mkfs.ext4 /dev/host1_vg1/ceph-mon
host1 ~ # mkdir -p /var/lib/ceph/mon
host1 ~ # mount /dev/host1_vg1/ceph-mon /var/lib/ceph/mon
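Because the MON daemon stops once this location is more than 95% full, it can be worth checking the fill level from time to time. The sketch below is one possible check; the helper names and the default warning limit of 90% are our own assumptions, chosen to give some warning before the 95% mark is reached.

```shell
# usage_percent PATH: print how full the filesystem holding PATH is (0-100),
# taken from the Capacity column of POSIX-format df output.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full PATH [LIMIT]: print a warning and return 1 when usage has
# reached LIMIT percent. LIMIT defaults to 90, a hypothetical early-warning
# value below the 95% mentioned above.
warn_if_full() {
    used=$(usage_percent "$1")
    limit=${2:-90}
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $1 is ${used}% full (limit ${limit}%)"
        return 1
    fi
    return 0
}

# Example use on the monitor host:
#   warn_if_full /var/lib/ceph/mon
```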

Finally, we require a UUID to uniquely identify our filesystem. We can generate one using the uuidgen application, as shown below.

host1 ~ # uuidgen

f7693e88-148c-41f5-bd40-2fedeb00bfeb

Now that we have somewhere to store our data and we have generated a suitable FSID we can create a basic configuration file.

As you can see from the example below the configuration file is fairly simple. The first block, headed [global], contains the global settings which will be used by all Ceph daemons. In this case we have configured an FSID to uniquely identify our filesystem, a keyring (which would be used by admin roles when using the authentication system) and finally we have completely disabled authentication so that we can concentrate on creating a working configuration with as few additional components as possible.

/etc/ceph/ceph.conf

[global]
fsid = f7693e88-148c-41f5-bd40-2fedeb00bfeb
keyring = /etc/ceph/keyring.admin
auth cluster required = none
auth service required = none
auth client required = none

[mon]
keyring = /etc/ceph/keyring.$name

[mds]
keyring = /etc/ceph/keyring.$name

[osd]
keyring = /etc/ceph/keyring.$name
osd data = /mnt/ceph
osd journal = /mnt/ceph/journal
osd journal size = 2048
filestore xattr use omap = true

[mon.a]
host = host1
mon addr = 10.0.0.70:6789

[mds.a]
host = host1

[osd.0]
host = host1

The next three sections (headed [mon], [mds] and [osd]) specify configuration options which will be shared between all instances of the monitor, metadata server and object storage daemon respectively. As you can see, we have specified a path to the keyring to be used by each daemon (the $name component will be automatically expanded to the name of the daemon, for example osd.0).

We have also specified some basic attributes for all the OSD instances, such as the paths to the data and journal and the size of the journal (in this case 2048MB). Finally, the filestore xattr use omap entry tells the OSD to keep extended attributes (xattrs) in a separate object map, which is needed because we are using the ext4 filesystem. If we were using xfs or btrfs this entry would not be required.

The final three blocks (headed [mon.a], [mds.a] and [osd.0]) specify configuration options which will apply only to the named instance of the monitor, metadata server and object storage daemon respectively. As you can see, we have specified the short name of the host for all three types of daemon but have only specified the address of the monitor. We do not need to specify addresses for the other daemons, even if they are running on a different host, as they will register with the monitor and their addresses are therefore available to clients from there.
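The layering described above (instance section first, then the daemon type section, then [global], with $name expanded to the daemon name) can be sketched in a few lines of shell. This is a simplified illustration of the lookup order, not how Ceph itself parses the file; conf_get and conf_lookup are hypothetical helper names.

```shell
# conf_get FILE SECTION KEY: print the value of KEY inside [SECTION], if any.
conf_get() {
    awk -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# conf_lookup FILE DAEMON KEY: try [DAEMON], then its type section (the part
# of the name before the first dot), then [global]; expand $name to DAEMON.
conf_lookup() {
    daemon_type=${2%%.*}
    for section in "$2" "$daemon_type" global; do
        value=$(conf_get "$1" "$section" "$3")
        if [ -n "$value" ]; then
            printf '%s\n' "$value" | sed 's/\$name/'"$2"'/g'
            return 0
        fi
    done
    return 1
}

# Example, assuming the configuration file shown above:
#   conf_lookup /etc/ceph/ceph.conf osd.0 keyring
```

With the configuration above, looking up keyring for osd.0 falls through from [osd.0] to [osd] and yields /etc/ceph/keyring.osd.0, which matches the keyring path that appears in the OSD initialisation output later in this guide.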

To keep things as simple as possible we have disabled all authentication. If you are intending to use Ceph in a trusted environment you may wish to leave the configuration as it is for ease of use. If you are using Ceph in a more hostile environment don't forget to configure authentication later!

Node one setup

Now that we have a basic configuration in place we can create the various directories and files required by the Ceph daemons. The Ceph documentation suggests using the ceph-deploy utility for this; however, that tool does not currently work correctly on Gentoo Linux, so we shall use the mkcephfs utility instead. Before we begin, we need to create a directory for the first monitor to store its data files and a temporary directory which is only required for the duration of the configuration process.

host1 ~ # mkdir -p /var/lib/ceph/mon/ceph-a /tmp/ceph

Each monitor maintains a monitor map, usually referred to as a monmap, which lists the monitors in the cluster and their addresses. The example below will create a new monmap with a single monitor in the temporary directory we created earlier.

host1 ~ # mkcephfs -d /tmp/ceph -c /etc/ceph/ceph.conf --prepare-monmap

preparing monmap in /tmp/ceph/monmap
/usr/bin/monmaptool --create --clobber --add a 10.0.0.70:6789 --print /tmp/ceph/monmap
/usr/bin/monmaptool: monmap file /tmp/ceph/monmap
epoch 0
fsid f7693e88-148c-41f5-bd40-2fedeb00bfeb
last_changed 2013-09-02 21:08:39.021792
created 2013-09-02 21:08:39.021792
0: 10.0.0.70:6789/0 mon.a
/usr/bin/monmaptool: writing epoch 0 to /tmp/ceph/monmap (1 monitors)

Once we have created a temporary monmap we can initialise the local object storage daemon. As you can see from the example below, some errors are reported, but they are not serious, as the lines that follow them indicate. If any other errors are displayed, especially ones which are not resolved in later lines of output, resolve them before proceeding further.

host1 ~ # mkcephfs -d /tmp/ceph --init-local-daemons osd

=== osd.0 === 
2013-09-02 21:10:17.389471 7ff871a34780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2013-09-02 21:10:17.974284 7ff871a34780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2013-09-02 21:10:17.975701 7ff871a34780 -1 filestore(/mnt/ceph) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-09-02 21:10:18.375261 7ff871a34780 -1 created object store /mnt/ceph journal /mnt/ceph/journal for osd.0 fsid f7693e88-148c-41f5-bd40-2fedeb00bfeb
2013-09-02 21:10:18.375326 7ff871a34780 -1 auth: error reading file: /etc/ceph/keyring.osd.0: can't open /etc/ceph/keyring.osd.0: (2) No such file or directory
2013-09-02 21:10:18.375464 7ff871a34780 -1 created new key in keyring /etc/ceph/keyring.osd.0

We can now initialise the local metadata server. As you can see from the example below this is simply a process of creating the private key and keyring.

host1 ~ # mkcephfs -d /tmp/ceph --init-local-daemons mds

=== mds.a === 
creating private key for mds.a keyring /etc/ceph/keyring.mds.a
creating /etc/ceph/keyring.mds.a

Now that we have a metadata server and an object storage daemon configured and prepared, we can prepare the monitor as shown below. This will create a new OSD map and a new keyring containing the keys of the daemons we prepared in the previous steps.

host1 ~ # mkcephfs -d /tmp/ceph --prepare-mon

Building generic osdmap from /tmp/ceph/conf
/usr/bin/osdmaptool: osdmap file '/tmp/ceph/osdmap'
/usr/bin/osdmaptool: writing epoch 1 to /tmp/ceph/osdmap
Generating admin key at /tmp/ceph/keyring.admin
creating /tmp/ceph/keyring.admin
Building initial monitor keyring
added entity mds.a auth auth(auid = 18446744073709551615 key=AQC84iRSOK5sBxAAi0S85ksmD2D6IKEu90ttPQ== with 0 caps)
added entity osd.0 auth auth(auid = 18446744073709551615 key=AQCa4iRSWChfFhAAjWITcFkiaeENVG8tr0L7+w== with 0 caps)

The final steps initialise the local monitor daemon, copy the admin keyring to /etc/ceph/ and remove the temporary directory we have been using.

host1 ~ # mkcephfs -d /tmp/ceph --init-local-daemons mon

=== mon.a === 
/usr/bin/ceph-mon: created monfs at /var/lib/ceph/mon/ceph-a for mon.a

host1 ~ # cp /tmp/ceph/keyring.admin /etc/ceph/
host1 ~ # rm /tmp/ceph -rf

Testing the first node

With our first node configured and initialised, it is time to start the Ceph daemons. As you can see from the example below, this is performed using the /etc/init.d/ceph init script, as usual.

host1 ~ # /etc/init.d/ceph start

=== mon.a === 
Starting Ceph mon.a on host1...
Starting ceph-create-keys on host1...
=== mds.a === 
Starting Ceph mds.a on host1...
starting mds.a at :/0
=== osd.0 === 
Error ENOENT: osd.0 does not exist.  create it before updating the crush map
Starting Ceph osd.0 on host1...
starting osd.0 at :/0 osd_data /mnt/ceph /mnt/ceph/journal

Once the Ceph daemons have been started it is a good idea to check that they are still running and have not crashed or failed due to a configuration or initialisation error. The example below shows how this can be done and the output you should receive if all three Ceph daemons are still running. If any of them is not running, check the log files in /var/log/ceph for details of the problem and resolve any issues before proceeding.

host1 ~ # ps -A | grep ceph

11041 pts/0    00:16:13 ceph-mon
11153 ?        00:00:44 ceph-mds
11502 ?        02:46:51 ceph-osd
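The same check can be scripted. The sketch below reads a ps -A style listing on standard input and prints the name of every expected daemon that is missing from it; the helper name missing_daemons is hypothetical, and the three daemon names are taken from the output above.

```shell
# missing_daemons: read "ps -A" style output on stdin and print each of
# ceph-mon, ceph-mds and ceph-osd that does not appear as a command name
# (the last column of the listing).
missing_daemons() {
    listing=$(cat)
    for daemon in ceph-mon ceph-mds ceph-osd; do
        printf '%s\n' "$listing" \
            | awk -v d="$daemon" '$NF == d { found = 1 } END { exit !found }' \
            || printf '%s\n' "$daemon"
    done
}

# Example use on the node itself; no output means all three were found:
#   ps -A | missing_daemons
```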

Assuming that the ceph daemons are still operating correctly you can display the status of the ceph storage cluster using the command shown below.

host1 ~ # ceph --status

  cluster f7693e88-148c-41f5-bd40-2fedeb00bfeb
   health HEALTH_WARN 384 pgs stuck inactive; 384 pgs stuck unclean
   monmap e1: 1 mons at {a=10.0.0.70:6789/0}, election epoch 2, quorum 0 a
   osdmap e3: 1 osds: 1 up, 1 in
    pgmap v4: 384 pgs: 384 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
   mdsmap e3: 1/1/1 up {0=a=up:creating}

As you can see from the output in the above example, the Ceph storage cluster first creates some internal data structures. When this is complete, the placement group status will change from creating to active+degraded, as shown below.

host1 ~ # ceph --status

  cluster f7693e88-148c-41f5-bd40-2fedeb00bfeb
   health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%)
   monmap e1: 1 mons at {a=10.0.0.70:6789/0}, election epoch 2, quorum 0 a
   osdmap e3: 1 osds: 1 up, 1 in
    pgmap v8: 384 pgs: 384 active+degraded; 9518 bytes data, 127 MB used, 9296 MB / 9951 MB avail; 21/42 degraded (50.000%)
   mdsmap e4: 1/1/1 up {0=a=up:active}
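If you would rather not re-run the status command by hand, the transition from creating can be detected with a small filter over the status text. This is only a sketch: the helper names are hypothetical, and the patterns assume the exact output layout shown in the two examples above, which may differ in other Ceph releases.

```shell
# cluster_health: read "ceph --status" output on stdin and print the health
# word (for example HEALTH_WARN).
cluster_health() {
    awk '$1 == "health" { print $2; exit }'
}

# pgs_still_creating: succeed while the pgmap line on stdin still reports
# placement groups in the "creating" state.
pgs_still_creating() {
    grep -q 'pgmap .* creating'
}

# Possible polling loop on a live cluster (not run here):
#   while ceph --status | pgs_still_creating; do sleep 5; done
```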

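With only a single OSD this is as good as it will get: the 50% degraded figure indicates that Ceph wants a second copy of every object but has no second OSD to place it on. We should add the ceph init script to the default run-level, as shown below, so that the daemons start automatically.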

host1 ~ # rc-update add ceph default

In the next section, Additional Ceph nodes, we shall demonstrate how to configure additional OSD, MON and MDS nodes for a fully redundant cluster.