Configuring the system

Rebuilding the initramfs using Dracut

Once the sys-boot/last-known-good-dracut-module package has been installed, the next task is to use Dracut to rebuild the initramfs with the new module. The example below demonstrates how this may be accomplished. You may need to modify some of the parameters to suit your kernel version and configuration.

lisa dracut 3.8.13-gentoo-r1.img 3.8.13-gentoo-r1 --mdadmconf --nolvmconf --no-hostonly --omit "network nbd nfs iscsi"
Caution:
As you can see we have instructed Dracut to include the local mdadm.conf file to ensure that MD-RAID arrays are correctly and predictably assembled by the initramfs at boot time. We have also instructed Dracut to not include the local lvm.conf file as it is unnecessary and may interfere with the correct operation of the LKG system.
 

Assuming that all goes well you should see output similar to that shown below. As long as the output includes the line reporting that the lkg module has been included (I: *** Including module: lkg ***) and there are no obvious errors, the new initramfs should be ready to use.

I: dracut module 'network' will not be installed, because it's in the list to be omitted!
I: dracut module 'iscsi' will not be installed, because it's in the list to be omitted!
I: dracut module 'nbd' will not be installed, because it's in the list to be omitted!
I: dracut module 'nfs' will not be installed, because it's in the list to be omitted!
I: *** Including module: dash ***
I: *** Including module: caps ***
I: *** Including module: i18n ***
I: Skipping program /usr/lib/systemd/systemd-vconsole-setup as it cannot be found and is flagged to be optional
I: *** Including module: dm ***
I: Skipping udev rule: 64-device-mapper.rules
I: *** Including module: kernel-modules ***
I: Omitting driver nfs
I: Omitting driver nfsv2
I: Omitting driver nfsv3
I: Omitting driver nfsd
I: Omitting driver lockd
I: *** Including module: lvm ***
I: Skipping udev rule: 64-device-mapper.rules
I: *** Including module: mdraid ***
I: *** Including module: lkg ***
I: *** Including module: resume ***
I: *** Including module: rootfs-block ***
I: *** Including module: terminfo ***
I: *** Including module: udev-rules ***
I: Skipping program /lib/udev/create_floppy_devices as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/edd_id as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/firmware.sh as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/firmware as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/firmware.agent as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/hotplug.functions as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/fw_unit_symlinks.sh as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/hid2hci as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/path_id as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/input_id as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/usb_id as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/pcmcia-socket-startup as it cannot be found and is flagged to be optional
I: Skipping program /lib/udev/pcmcia-check-broken-cis as it cannot be found and is flagged to be optional
I: Skipping program /etc/pcmcia/config.opts as it cannot be found and is flagged to be optional
I: *** Including module: biosdevname ***
I: *** Including module: usrmount ***
I: *** Including module: base ***
I: Skipping program /usr/lib/systemd/systemd-timestamp as it cannot be found and is flagged to be optional
I: *** Including module: fs-lib ***
I: Skipping program xfs_db as it cannot be found and is flagged to be optional
I: Skipping program xfs_check as it cannot be found and is flagged to be optional
I: Skipping program xfs_repair as it cannot be found and is flagged to be optional
I: Skipping program xfs_metadump as it cannot be found and is flagged to be optional
I: Skipping program jfs_fsck as it cannot be found and is flagged to be optional
I: Skipping program reiserfsck as it cannot be found and is flagged to be optional
I: Skipping program btrfsck as it cannot be found and is flagged to be optional
I: *** Including module: shutdown ***
I: Skipping program kexec as it cannot be found and is flagged to be optional
I: *** Including modules done ***
I: *** Installing kernel module dependencies and firmware ***
I: *** Installing kernel module dependencies and firmware done ***
I: *** Stripping files ***
I: *** Stripping files done ***
I: *** Creating image file ***
I: *** Creating image file done ***
I: Wrote /root/initramfs.img:
I: -rw------- 1 root root 8423451 Jul 8 21:40 /root/3.8.13-gentoo-r1.img
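
If you would rather not scan the output by eye, the check can be scripted. The sketch below is a hypothetical helper (lkg_module_included is our own name, not part of Dracut or the LKG package) which reads saved Dracut output on standard input and succeeds only if the lkg module was reported as included. It assumes the message format shown in the example output above.

```shell
# Hypothetical helper: succeed if Dracut's output (read from stdin)
# reports that the lkg module was included in the image.
lkg_module_included() {
    # Match the literal "*** Including module: lkg ***" message.
    grep -q 'Including module: lkg \*\*\*'
}
```

To use it, save the output of the Dracut run to a file (capturing both standard output and standard error is the safest option) and then run lkg_module_included with that file as its standard input.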

Before the new initramfs can be used it must be moved or copied to the /boot partition, which will probably need to be mounted first. You may also need to copy the latest kernel image. The example below demonstrates these three tasks.

lisa mount /boot
lisa mv 3.8.13-gentoo-r1.img /boot/
lisa cp /usr/src/linux/arch/x86_64/boot/bzImage /boot/3.8.13-gentoo-r1
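
A quick sanity check at this point can save a failed boot later. The sketch below is a hypothetical helper (the name boot_files_present is ours) which confirms that both the kernel image and the initramfs are present and non-empty. It assumes the naming used in the example above, where the initramfs is the kernel version with .img appended.

```shell
# Hypothetical helper: check that a kernel image and its initramfs are
# both present and non-empty in the given boot directory.
# Usage: boot_files_present <boot-dir> <kernel-version>
boot_files_present() {
    boot_dir=$1
    version=$2
    for f in "$boot_dir/$version" "$boot_dir/$version.img"; do
        # -s is true only for a file that exists and has a size above zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}
```

For the example system this would be called as boot_files_present /boot 3.8.13-gentoo-r1.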

Reconfiguring Grub

So that the Last Known Good Configuration (LKGC) system is able to make backup copies of all the required boot-time files (including the kernel image, the initramfs and, if required, a hypervisor) these items must appear in pre-defined locations. The example below creates symbolic links, named kernel and initrd respectively, to the kernel image and initramfs from the previous step.

lisa ln -s /boot/3.8.13-gentoo-r1 /boot/kernel
lisa ln -s /boot/3.8.13-gentoo-r1.img /boot/initrd

If you are using a hypervisor (this example assumes that the Xen hypervisor is in use) you will also need to create a symbolic link to it named hyperv, as shown below.

lisa ln -s /boot/xen.gz /boot/hyperv
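
A dangling link is easy to create by mistyping a target. The sketch below is a hypothetical helper (lkg_links_ok is our own name, not something provided by the LKG package) which verifies that each named entry is a symbolic link and that its target exists.

```shell
# Hypothetical helper: verify that each named entry in the boot directory
# is a symbolic link whose target exists.
# Usage: lkg_links_ok <boot-dir> <name>...
lkg_links_ok() {
    boot_dir=$1
    shift
    for name in "$@"; do
        link=$boot_dir/$name
        # -L: the entry is a symbolic link; -e: the link target exists
        if [ ! -L "$link" ] || [ ! -e "$link" ]; then
            echo "broken or missing link: $link" >&2
            return 1
        fi
    done
    return 0
}
```

On the example system this would be called as lkg_links_ok /boot kernel initrd, with hyperv added to the list when a hypervisor is in use.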

Now that we have the required boot-time files in the expected locations we can modify the boot-loader configuration to use these files and those generated by the Last Known Good Configuration (LKGC) system. The example below demonstrates how to configure grub to boot our default kernel image and initramfs (pointed to by the symbolic links we created in the previous step) as well as the last known good kernel image and initramfs (pointed to by symbolic links which will be automatically created by the LKG system).

/boot/grub/grub.conf
title  Gentoo Linux / Normal boot
kernel (hd0,0)/kernel rd.md.uuid=d43c4166:3bd439c9:7a21b82b:80fd4af2 rd.dm=0 root=/dev/lkg/volumes/root
module (hd0,0)/initrd

title  Gentoo Linux / Last Known Good (LKG) [1]
kernel (hd0,0)/kernel.1 rd.md.uuid=d43c4166:3bd439c9:7a21b82b:80fd4af2 rd.dm=0 root=/dev/lkg/volumes/root rd.lkg.snapshot=1
module (hd0,0)/initrd.1

# Existing entries should be kept here until you know it works!

As always, the above example will need to be modified to reflect the exact configuration in use on your system. At the very least you will need to replace the rd.md.uuid value(s) with the unique identifier(s) associated with any MD-RAID array(s) which will need to be assembled by the initramfs before the logical volumes may be detected.
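
If you are unsure of the identifiers, they can be taken from the output of mdadm --detail --scan. The sketch below is a hypothetical helper (md_uuid_args is our own name) which converts that output into kernel arguments. It assumes each ARRAY line carries a UUID= field in the colon-separated form used in the example above.

```shell
# Hypothetical helper: turn "mdadm --detail --scan" output (read from
# stdin) into a space-separated list of rd.md.uuid= kernel arguments.
md_uuid_args() {
    # Keep only lines with a UUID= field, rewrite each one as an
    # rd.md.uuid= argument, then join the results onto a single line.
    sed -n 's/.*UUID=\([0-9a-fA-F:]*\).*/rd.md.uuid=\1/p' | paste -s -d ' ' -
}
```

Piping the output of mdadm --detail --scan into md_uuid_args prints the arguments ready to be pasted onto the kernel line.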

The example below demonstrates how to configure grub when a hypervisor is in use (specifically the Xen hypervisor, although other configurations would be similar).

/boot/grub/grub.conf
title  XEN (credit sched) / Gentoo Linux / Normal boot
kernel (hd0,0)/hyperv sched=credit dom0_mem=927M,max:927M dom0_max_vcpus=2
module (hd0,0)/kernel rd.md.uuid=d43c4166:3bd439c9:7a21b82b:80fd4af2 rd.dm=0 root=/dev/lkg/volumes/root
module (hd0,0)/initrd

title  XEN (credit sched) / Gentoo Linux / Last Known Good (LKG) [1]
kernel (hd0,0)/hyperv.1 sched=credit dom0_mem=927M,max:927M dom0_max_vcpus=2
module (hd0,0)/kernel.1 rd.md.uuid=d43c4166:3bd439c9:7a21b82b:80fd4af2 rd.dm=0 root=/dev/lkg/volumes/root rd.lkg.snapshot=1
module (hd0,0)/initrd.1

# Existing entries should be kept here until you know it works!
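
One way to cross-check the configuration against the contents of the boot partition is to list every file it refers to. The sketch below is a hypothetical helper (grub_referenced_files is our own name) which assumes the entry layout used in the examples above, where each kernel or module line names a file after a (hdX,Y) device prefix.

```shell
# Hypothetical helper: print each distinct file path named on the
# kernel, module or initrd lines of a grub.conf read from stdin.
grub_referenced_files() {
    awk '$1 == "kernel" || $1 == "module" || $1 == "initrd" {
            path = $2
            sub(/^\([^)]*\)/, "", path)   # strip the (hd0,0) device prefix
            print path
         }' | LC_ALL=C sort -u
}
```

Feeding /boot/grub/grub.conf to this function gives a list which can be compared with the files in /boot. Bear in mind that the .1 entries are created by the LKG system itself, so they are not expected to exist yet.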

When you are satisfied that the required boot-time files are present and the boot-loader configuration has been modified correctly the boot partition should be unmounted, as shown below.

lisa umount /boot

Marking the LVM snapshot sources

So that the Last Known Good Configuration (LKGC) system knows which logical volumes are part of the snapshot set we need to tag the appropriate volumes with @lkg_source. The example below demonstrates how to tag the opt, root, usr and var volumes. If your volumes are named differently, or are not contained in a volume group named volumes, then you will need to modify the paths accordingly.

lisa lvchange --addtag @lkg_source /dev/volumes/opt
lisa lvchange --addtag @lkg_source /dev/volumes/root
lisa lvchange --addtag @lkg_source /dev/volumes/usr
lisa lvchange --addtag @lkg_source /dev/volumes/var
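
With more than a handful of volumes it may be easier to generate these commands than to type them. The sketch below is a hypothetical helper (lkg_tag_commands is our own name) which only prints the lvchange commands so that they can be reviewed before anything is changed.

```shell
# Hypothetical helper: print (without running) the lvchange command that
# tags each named logical volume in a volume group with @lkg_source.
# Usage: lkg_tag_commands <volume-group> <logical-volume>...
lkg_tag_commands() {
    vg=$1
    shift
    for lv in "$@"; do
        printf 'lvchange --addtag @lkg_source /dev/%s/%s\n' "$vg" "$lv"
    done
}
```

Running lkg_tag_commands volumes opt root usr var prints the four commands shown above. Once you are happy with the list it can be piped to sh as root.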

You can verify that the tags have been applied correctly using the lvs application to display a list of all the volumes which have been tagged with @lkg_source as shown in the example below.

lisa lvs @lkg_source
  LV    VG        Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  opt   volumes   owi-aos- 128.00m                                           
  root  volumes   owi-aos-   2.00g                                           
  usr   volumes   owi-aos-   4.00g                                           
  var   volumes   owi-aos-   2.00g 
Caution:
If /var/log is not already stored on a separate logical volume you should move it to one now. Having the logs stored on the var volume, which will usually be one of the snapshot sources, will cause unnecessary writes to the snapshot volumes when new log entries are written and will also cause the logs to be rolled back when the LKGC system is used. If the logs are stored on a separate volume these issues do not occur and complete logs from any LKGC boots are available.
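
To find out whether /var/log already lives on its own file system you can look it up in the mount table. The sketch below is a hypothetical helper (is_mount_point is our own name) which assumes input in the format of /proc/mounts, where the second field of each line is the mount point.

```shell
# Hypothetical helper: succeed if the given path is itself a mount point
# according to a mount table in /proc/mounts format read from stdin.
# Usage: is_mount_point <path> < /proc/mounts
is_mount_point() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }'
}
```

If is_mount_point /var/log fails when given /proc/mounts as input, the logs are being written to whichever volume holds the parent directory.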
 

Testing

Installation and configuration of the Last Known Good Configuration (LKGC) system should now be complete. Before you rely on it, however, it is sensible to test that it is functioning as expected. Unfortunately, the only way to accomplish this is to reboot the system and find out!

lisa shutdown -r now && exit
Warning:
As always when rebooting a computer system you should first ensure that any users of the system are expecting the interruption in service. It is also a good idea to ensure that the previous configuration is available through a boot menu option and that physical or remote access is available should an issue arise which requires interaction with the boot process.
 

When the system restarts you should observe several signs that things are working, or at the very least some indication of where things went wrong. The first message you should see, during the execution of the initramfs, is shown below.

Last Known Good (LKG) [phase 1] 

During phase one symbolic links are created in the /dev/lkg directory to the volumes which have been tagged with @lkg_source so that they are available for the rest of the boot process.

After phase one is completed, and several other tasks have been performed by other components of the initramfs, phase two will be executed. This should produce a message similar to that shown below.

Last Known Good (LKG) [phase 2] - Creating LKG candidate snapshots 

During phase two any existing snapshot candidate volumes are deleted (these may occur when a boot failed and the generated candidates were not automatically removed) and new snapshot candidate volumes are created.

The boot process should now continue normally until the default runlevel has started. Once this is complete phase three will be entered.

During phase three the test scripts in /lib/lkg.tests are executed in order. Unless additional tests have been installed these will only comprise 01-default-runlevel (which tests to see if the runlevel we are currently in is the default as specified in inittab) and 02-services-started (which, as the name would suggest, checks that all services in the current runlevel have started correctly). Assuming that these tests are passed then the snapshot candidate volumes created in phase two are re-tagged as snapshot volumes.
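
To make the idea of running the tests in order concrete, the sketch below shows one way such a runner could be written. It is an illustration only and not the LKG system's own code. The function name is ours, and it assumes that a test reports failure through a non-zero exit status, which the text above does not confirm.

```shell
# Illustrative sketch only, not the LKG system's own code: run every
# executable file in a directory in lexical order and stop at the first
# failure. Assumes a test signals failure with a non-zero exit status.
# Usage: run_tests_in_order <test-dir>
run_tests_in_order() {
    test_dir=$1
    # The shell expands the glob in sorted order, so 01-... runs before 02-...
    for t in "$test_dir"/*; do
        # Skip anything that is not an executable regular file; this also
        # covers an empty directory, where the pattern is left unexpanded.
        [ -f "$t" ] && [ -x "$t" ] || continue
        if ! "$t"; then
            echo "test failed: $t" >&2
            return 1
        fi
    done
    return 0
}
```

Numeric prefixes such as 01- and 02- are what make the ordering predictable in a scheme like this.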

You can verify that this has indeed occurred using the lvs command shown in the example below.

lisa lvs @lkg_snapshot
  LV                       VG        Attr     LSize   Pool Origin   Data%  Move Log Copy%  Convert
  opt_lkg_20130605224318   volumes   swi-a-s- 128.00m      opt      0.01                        
  root_lkg_20130605224318  volumes   swi-a-s-   2.00g      root     0.66                        
  usr_lkg_20130605224318   volumes   swi-a-s-   4.00g      usr      0.56                        
  var_lkg_20130605224318   volumes   swi-a-s-   2.00g      var      2.39 
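
Checking that every source volume has a snapshot can also be scripted. The sketch below is a hypothetical helper (all_sources_snapshotted is our own name) which reads a list of snapshot origins, one per line, on standard input and compares it with the source volume names given as arguments. It assumes the list is produced with lvs --noheadings -o origin @lkg_snapshot.

```shell
# Hypothetical helper: succeed only if every named source volume appears
# as an origin in the list read from stdin (one origin per line, as
# printed by "lvs --noheadings -o origin @lkg_snapshot").
# Usage: all_sources_snapshotted <source-volume>...
all_sources_snapshotted() {
    # Strip the leading padding that lvs adds to each line.
    origins=$(awk '{ print $1 }')
    for src in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string
        if ! printf '%s\n' "$origins" | grep -qxF "$src"; then
            echo "no snapshot found for: $src" >&2
            return 1
        fi
    done
    return 0
}
```

On the example system the arguments would be opt root usr var.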

If the phase three tests fail then the snapshot candidate volumes created in phase two should be automatically deleted. You can verify that there are no snapshot candidate volumes present using the command shown in the example below.

lisa lvs @lkg_candidate

You should now test that the Last Known Good Configuration aspect of the system is also working correctly by rebooting once more and selecting the Last Known Good (LKG) option from the boot menu. Assuming all goes well a duplicate of the last known good snapshot volumes will be made and the system should then boot normally.

To verify that active duplicate snapshot volume(s) were indeed created you can use the lvs application as shown below.

lisa lvs @lkg_active
Warning:
Once you have verified that the Last Known Good Configuration (LKGC) system is indeed functioning correctly, remember to reboot one last time to return to a normal system instead of a duplicate of the last known good snapshot volumes. Failure to do so will result in all changes to the system being lost on the next reboot!