[64498] | 1 |
|
---|
| 2 | Testbox Imaging (Backup / Restore)
|
---|
| 3 | ==================================
|
---|
| 4 |
|
---|
| 5 |
|
---|
| 6 | Introduction
|
---|
| 7 | ------------
|
---|
| 8 |
|
---|
| 9 | This document explores deploying a very simple drive imaging solution to help
|
---|
| 10 | avoid needing to manually reinstall testboxes when a disk goes bust or the OS
|
---|
| 11 | install seems to be corrupted.
|
---|
| 12 |
|
---|
| 13 |
|
---|
| 14 | Definitions / Glossary
|
---|
| 15 | ======================
|
---|
| 16 |
|
---|
| 17 | See AutomaticTestingRevamp.txt.
|
---|
| 18 |
|
---|
| 19 |
|
---|
| 20 | Objectives
|
---|
| 21 | ==========
|
---|
| 22 |
|
---|
| 23 | - Off site, no admin interaction (no need for ILOM or similar).
|
---|
| 24 | - OS independent.
|
---|
| 25 | - Space and bandwidth efficient.
|
---|
| 26 | - As automatic as possible.
|
---|
| 27 | - Logging.
|
---|
| 28 |
|
---|
| 29 |
|
---|
| 30 | Overview of the Solution
|
---|
| 31 | ========================
|
---|
| 32 |
|
---|
| 33 | Here is a brief summary:
|
---|
| 34 |
|
---|
| 35 | - Always boot testboxes via PXE using PXELINUX.
|
---|
| 36 | - Default configuration is local boot (hard disk / SSD)
|
---|
| 37 | - Restore/backup action triggered by machine specific PXE config.
|
---|
| 38 | - Boots special debian maintenance install off NFS.
|
---|
| 39 | - A maintenance service (systemd style) does the work.
|
---|
| 40 | - The service reads action from TFTP location and performs it.
|
---|
| 41 | - When done the service removes the TFTP machine specific config
|
---|
| 42 | and reboots the system.
|
---|
| 43 |
|
---|
| 44 | Maintenance actions are:
|
---|
| 45 | - backup
|
---|
[64523] | 46 | - backup-again
|
---|
[64498] | 47 | - restore
|
---|
[64523] | 48 | - refresh-info
|
---|
| 49 | - rescue
|
---|
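As a rough illustration of the action list above, the maintenance service could validate the requested action before touching any disk. This is only a sketch: the function name ``is_valid_action`` is made up for this example and is not taken from ``testbox-maintenance.sh``.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the maintenance actions listed above.
is_valid_action() {
    case "$1" in
        backup|backup-again|restore|refresh-info|rescue)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}

# Example: check an action before acting on it.
if is_valid_action restore; then
    echo "restore is a known maintenance action"
fi
```

Rejecting unknown actions up front keeps a typo in the PXE configuration from being mistaken for a request to overwrite a disk.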
[64498] | 50 |
|
---|
| 51 | A possible modifier could indicate a subset of disks on testboxes with other OSes
|
---|
| 52 | installed. Support for partition level backup/restore is not explored here.
|
---|
| 53 |
|
---|
| 54 |
|
---|
[64523] | 55 | How to use
|
---|
| 56 | ----------
|
---|
| 57 |
|
---|
[64599] | 58 | To perform one of the above maintenance actions on a testbox, run the
|
---|
| 59 | ``testbox-pxe-conf.sh`` script::
|
---|
[64523] | 60 |
|
---|
[64599] | 61 | /mnt/testbox-tftp/pxelinux.cfg/testbox-pxe-conf.sh 10.165.98.220 rescue
|
---|
[64523] | 62 |
|
---|
[64599] | 63 | Then trigger a reboot. The box will then boot the NFS rooted debian image and
|
---|
| 64 | execute the maintenance action. On success, it will remove the testbox hex-IP
|
---|
| 65 | config file and reboot again.
|
---|
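The hex-IP config file name mentioned above is the testbox's IPv4 address written as eight upper-case hexadecimal digits, which is one of the names PXELINUX looks for under ``pxelinux.cfg/``. A minimal sketch of the conversion follows; the function name ``ip_to_pxe_hex`` is an assumption made for this example and need not match what ``testbox-pxe-conf.sh`` does internally.

```shell
#!/bin/sh
# Convert a dotted decimal IPv4 address to the upper-case hexadecimal
# form PXELINUX uses for per-machine config file names.
# (Hypothetical helper, not taken from testbox-pxe-conf.sh.)
ip_to_pxe_hex() {
    # Split the address on dots into the four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_pxe_hex 10.0.2.15    # prints 0A00020F
```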
[64524] | 66 |
|
---|
[64599] | 67 |
|
---|
[64498] | 68 | Storage Server
|
---|
| 69 | ==============
|
---|
| 70 |
|
---|
| 71 | The storage server will have three areas used here. Using NFS for all three
|
---|
| 72 | avoids extra work getting CIFS sharing right too (NFS is already a pain).
|
---|
| 73 |
|
---|
[64523] | 74 | 1. /export/testbox-tftp - TFTP config area. Read-write.
|
---|
| 75 | 2. /export/testbox-backup - Images and logs. Read-write.
|
---|
| 76 | 3. /export/testbox-nfsroot - Custom debian. Read-only, no root squash.
|
---|
[64498] | 77 |
|
---|
| 78 |
|
---|
[64523] | 79 | TFTP (/export/testbox-tftp)
|
---|
[64498] | 80 | ============================
|
---|
| 81 |
|
---|
| 82 | The testbox-tftp share needs to be writable, root squashing is okay.
|
---|
| 83 |
|
---|
| 84 | We need files from both PXELINUX and SYSLINUX to make this work now. On a
|
---|
| 85 | debian system, the ``pxelinux`` and ``syslinux`` packages need to be
|
---|
| 86 | installed. We actually do this further down when setting up the nfsroot, so
|
---|
| 87 | it's possible to get them from there by postponing this step a little. On
|
---|
| 88 | debian 8.6.0 the PXELINUX files are found in ``/usr/lib/PXELINUX`` and the
|
---|
| 89 | SYSLINUX ones in ``/usr/lib/syslinux``.
|
---|
| 90 |
|
---|
| 91 | The initial PXE image as well as associated modules come in three variants,
|
---|
| 92 | BIOS, 32-bit EFI and 64-bit EFI. We'll only need the BIOS one for now.
|
---|
| 93 | Perform the following copy operations::
|
---|
| 94 |
|
---|
[64523] | 95 | cp /usr/lib/PXELINUX/pxelinux.0 /mnt/testbox-tftp/
|
---|
| 96 | cp /usr/lib/syslinux/modules/*/ldlinux.* /mnt/testbox-tftp/
|
---|
| 97 | cp -R /usr/lib/syslinux/modules/bios /mnt/testbox-tftp/
|
---|
| 98 | cp -R /usr/lib/syslinux/modules/efi32 /mnt/testbox-tftp/
|
---|
| 99 | cp -R /usr/lib/syslinux/modules/efi64 /mnt/testbox-tftp/
|
---|
[64498] | 100 |
|
---|
| 101 |
|
---|
| 102 | For simplicity, all the testboxes boot using good old fashioned BIOS, no EFI.
|
---|
| 103 | However, it doesn't really hurt to be prepared.
|
---|
| 104 |
|
---|
| 105 | The PXELINUX related files go in the root of the testbox-tftp share. (As
|
---|
| 106 | mentioned further down, these can be installed on a debian system by running
|
---|
| 107 | ``apt-get install pxelinux syslinux``.) We need the ``*pxelinux.0`` files
|
---|
| 108 | typically found in ``/usr/lib/PXELINUX/`` on debian systems (recent ones
|
---|
| 109 | anyway). It is possible we may need one or more of the modules [1]_ that
|
---|
| 110 | ship with PXELINUX/SYSLINUX, so do copy ``/usr/lib/syslinux/modules`` to
|
---|
| 111 | ``testbox-tftp/modules`` as well.
|
---|
| 112 |
|
---|
| 113 |
|
---|
| 114 | The directory layout related to the configuration files is dictated by the
|
---|
| 115 | PXELINUX configuration file searching algorithm [2]_. Create a subdirectory
|
---|
| 116 | ``pxelinux.cfg/`` under ``testbox-tftp`` and create the world readable file
|
---|
| 117 | ``default`` with the following content::
|
---|
| 118 |
|
---|
| 119 | PATH bios
|
---|
| 120 | DEFAULT local-boot
|
---|
| 121 | LABEL local-boot
|
---|
| 122 | LOCALBOOT
|
---|
| 123 |
|
---|
| 124 | This makes booting from the local disk the default behavior.
|
---|
| 125 |
|
---|
[64599] | 126 | Copy the ``testbox-pxe-conf.sh`` script file found in the same directory as
|
---|
| 127 | this document to ``/mnt/testbox-tftp/pxelinux.cfg/``. Edit the copy to correct
|
---|
| 128 | the IP addresses near the top, as well as any linux, TFTP and PXE details near
|
---|
| 129 | the bottom of the file. This script will generate the PXE configuration file
|
---|
| 130 | when performing maintenance on a testbox.
|
---|
[64498] | 131 |
|
---|
| 132 |
|
---|
[64523] | 133 | Images and logs (/export/testbox-backup)
|
---|
[64498] | 134 | =========================================
|
---|
| 135 |
|
---|
| 136 | The testbox-backup share needs to be writable, root squashing is okay.
|
---|
| 137 |
|
---|
[64523] | 138 | In the root there must be a file ``testbox-backup`` so we can easily tell
|
---|
| 139 | whether we've actually mounted the share or are just staring at an empty mount
|
---|
| 140 | point directory.
|
---|
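The marker file check described above could look roughly like this. It is a sketch only: the function name ``backup_share_mounted`` is hypothetical, and the mount point ``/mnt/testbox-backup`` is the one used elsewhere in this document.

```shell
#!/bin/sh
# Hypothetical check: the share counts as mounted only if the marker
# file named testbox-backup is visible in its root.
backup_share_mounted() {
    [ -f "$1/testbox-backup" ]
}

if backup_share_mounted /mnt/testbox-backup; then
    echo "testbox-backup share is mounted"
else
    echo "testbox-backup share is NOT mounted" >&2
fi
```

Testing for the marker rather than for the directory avoids writing images into an empty mount point on the local file system.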
[64498] | 141 |
|
---|
[64523] | 142 | The ``testbox-maintenance.sh`` script maintains a global log in the root
|
---|
| 143 | directory that's called ``maintenance.log``. Errors will be logged there as
|
---|
| 144 | well as a ping and the action.
|
---|
[64498] | 145 |
|
---|
[64523] | 146 | We use a directory layout based on dotted decimal IP addresses here, so for a
|
---|
| 147 | server with the IP 10.40.41.42 all its files will be under ``10.40.41.42/``:
|
---|
[64498] | 148 |
|
---|
| 149 | ``<hostname>``
|
---|
| 150 | The name of the testbox (empty file). Helps with finding a testbox by name.
|
---|
| 151 |
|
---|
| 152 | ``testbox-info.txt``
|
---|
| 153 | Information about the testbox. Starting off with the name, decimal IP,
|
---|
| 154 | PXELINUX style hexadecimal IP, and more.
|
---|
| 155 |
|
---|
| 156 | ``maintenance.log``
|
---|
| 157 | Maintenance log file recording what the maintenance service does.
|
---|
| 158 |
|
---|
| 159 | ``disk-devices.lst``
|
---|
| 160 | Optional list of disk devices to consider backing up or restoring. This is
|
---|
| 161 | intended for testboxes with additional disks that are used for other purposes
|
---|
| 162 | and should not be touched.
|
---|
| 163 |
|
---|
| 164 | ``sda.raw.gz``
|
---|
| 165 | The gzipped raw copy of the sda device of the testbox.
|
---|
| 166 |
|
---|
| 167 | ``sd[bcdefgh].raw.gz``
|
---|
| 168 | The gzipped raw copies of sdb, sdc, sdd, sde, sdf, sdg, sdh, etc. if any of them exist
|
---|
| 169 | and are disks/SSDs.
|
---|
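To make the ``*.raw.gz`` naming concrete, here is one way such an image could be produced and written back. This is a sketch under assumptions: the helper names are hypothetical, and the real ``testbox-maintenance.sh`` may well use a different copy tool (``gddrescue`` is installed in the maintenance image, for instance).

```shell
#!/bin/sh
# Hypothetical helpers; a regular file can stand in for /dev/sda when
# trying this out.

# usage: backup_disk <device> <image.raw.gz>
backup_disk() {
    gzip -c < "$1" > "$2"
}

# usage: restore_disk <image.raw.gz> <device>
restore_disk() {
    gzip -dc < "$1" > "$2"
}

# Example invocations (not executed here, they need root and a real disk):
#   backup_disk  /dev/sda /mnt/testbox-backup/10.40.41.42/sda.raw.gz
#   restore_disk /mnt/testbox-backup/10.40.41.42/sda.raw.gz /dev/sda
```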
| 170 |
|
---|
| 171 |
|
---|
| 172 | Note! If it turns out we can be certain to get a valid host name, we might just
|
---|
| 173 | switch to use the hostname as the directory name instead of the IP.
|
---|
| 174 |
|
---|
| 175 |
|
---|
[64523] | 176 | Debian NFS root (/export/testbox-nfsroot)
|
---|
[64498] | 177 | ==========================================
|
---|
| 178 |
|
---|
| 179 | The testbox-nfsroot share should be read-only and must **not** have root
|
---|
[64599] | 180 | squashing enabled. Also, make sure setting the set-uid-bit is allowed by the
|
---|
| 181 | server, or ``su`` and ``sudo`` won't work.
|
---|
[64498] | 182 |
|
---|
| 183 | There are several ways of creating a debian nfsroot, but since we've got a
|
---|
| 184 | tool like VirtualBox around we've just installed it in a VM, prepared it,
|
---|
| 185 | and copied it onto the NFS server share.
|
---|
| 186 |
|
---|
| 187 | As of writing debian 8.6.0 is current, so a minimal 64-bit install of it was
|
---|
| 188 | done in a VM. After installation the following modifications were made:
|
---|
| 189 |
|
---|
[64601] | 190 | - ``apt-get install pxelinux syslinux initramfs-tools zip gddrescue sudo joe``
|
---|
[64523] | 191 | and optionally ``apt-get install smbclient cifs-utils``.
|
---|
[64498] | 192 |
|
---|
| 193 | - ``/etc/default/grub`` was modified to set ``GRUB_CMDLINE_LINUX_DEFAULT`` to
|
---|
| 194 | ``""`` instead of ``"quiet"``. This allows us to see messages during boot
|
---|
| 195 | and perhaps spot why something doesn't work on a testbox. Regenerate the
|
---|
| 196 | grub configuration file by running ``update-grub`` afterwards.
|
---|
| 197 |
|
---|
[64601] | 198 | - ``/etc/sudoers`` was modified to allow the ``vbox`` user to use sudo without
|
---|
| 199 | requiring any password.
|
---|
| 200 |
|
---|
[64498] | 201 | - Create the directory ``/etc/systemd/system/getty@tty1.service.d`` and create
|
---|
[64523] | 202 | the file ``noclear.conf`` in it with the following content::
|
---|
[64498] | 203 |
|
---|
| 204 | [Service]
|
---|
| 205 | TTYVTDisallocate=no
|
---|
| 206 |
|
---|
| 207 | This stops getty from clearing VT1 and lets us see the tail of the boot up
|
---|
| 208 | messages, which includes messages from the testbox-maintenance service.
|
---|
| 209 |
|
---|
| 210 | - Mount the testbox-nfsroot under ``/mnt/`` with write privileges. (The write
|
---|
| 211 | privileges are temporary - don't forget to remove them later on.)::
|
---|
| 212 |
|
---|
[64523] | 213 | mount -t nfs myserver.com:/export/testbox-nfsroot /mnt/
|
---|
[64498] | 214 |
|
---|
[64523] | 215 | Note! Adding ``-o nfsvers=3`` may help with some NFSv4 servers.
|
---|
| 216 |
|
---|
[64498] | 217 | - Copy the debian root and dev file system onto nfsroot. If you have ssh
|
---|
| 218 | access to the NFS server, the quickest way to do it is to use ``tar``::
|
---|
| 219 |
|
---|
| 220 | tar -cz --one-file-system -f /mnt/testbox-maintenance-nfsroot.tar.gz . dev/
|
---|
| 221 |
|
---|
| 222 | An alternative is ``cp -ax . /mnt/. && cp -ax dev/. /mnt/dev/.`` but this
|
---|
| 223 | is quite a bit slower, obviously.
|
---|
| 224 |
|
---|
[64599] | 225 | - Edit ``/etc/ssh/sshd_config`` setting ``PermitRootLogin`` to ``yes`` so we can ssh
|
---|
| 226 | in as root later on.
|
---|
| 227 |
|
---|
[64498] | 228 | - chroot into the nfsroot: ``chroot /mnt/``
|
---|
| 229 |
|
---|
[64523] | 230 | - ``mount -t proc proc /proc``
|
---|
[64498] | 231 |
|
---|
[64523] | 232 | - ``mount -t sysfs sysfs /sys``
|
---|
| 233 |
|
---|
| 234 | - ``mkdir /mnt/testbox-tftp /mnt/testbox-backup``
|
---|
| 235 |
|
---|
| 236 | - Recreate ``/etc/fstab`` with::
|
---|
| 237 |
|
---|
| 238 | proc /proc proc defaults 0 0
|
---|
| 239 | /dev/nfs / nfs defaults 1 1
|
---|
[64599] | 240 | 10.42.1.1:/export/testbox-tftp /mnt/testbox-tftp nfs tcp,nfsvers=3,noauto 2 2
|
---|
| 241 | 10.42.1.1:/export/testbox-backup /mnt/testbox-backup nfs tcp,nfsvers=3,noauto 3 3
|
---|
[64523] | 242 |
|
---|
[64599] | 243 | We use NFS version 3 as that works better for our NFS server and client,
|
---|
| 244 | remove if not necessary. The ``noauto`` option is to work around mount
|
---|
| 245 | trouble during early bootup on some of our boxes.
|
---|
| 246 |
|
---|
[64523] | 247 | - Do ``mount /mnt/testbox-tftp && mount /mnt/testbox-backup`` to mount the
|
---|
| 248 | two shares. This may be a good time to execute the instructions in the
|
---|
| 249 | sections above relating to these two shares.
|
---|
| 250 |
|
---|
[64498] | 251 | - Edit ``/etc/initramfs-tools/initramfs.conf`` and change the ``MODULES``
|
---|
| 252 | value from ``most`` to ``netboot``.
|
---|
| 253 |
|
---|
| 254 | - Append ``aufs`` to ``/etc/initramfs-tools/modules``. The advanced
|
---|
| 255 | multi-layered unification filesystem (aufs) enables us to use a
|
---|
| 256 | read-only NFS root. [3]_ [4]_ [5]_
|
---|
| 257 |
|
---|
| 258 | - Create ``/etc/initramfs-tools/scripts/init-bottom/00_aufs_init`` as
|
---|
| 259 | an executable file with the following content::
|
---|
| 260 |
|
---|
[64523] | 261 | #!/bin/sh
|
---|
[64498] | 262 | # Don't run during update-initramfs:
|
---|
| 263 | case "$1" in
|
---|
| 264 | prereqs)
|
---|
| 265 | exit 0;
|
---|
| 266 | ;;
|
---|
| 267 | esac
|
---|
| 268 |
|
---|
| 269 | modprobe aufs
|
---|
| 270 | mkdir -p /ro /rw /aufs
|
---|
| 271 | mount -t tmpfs tmpfs /rw -o noatime,mode=0755
|
---|
| 272 | mount --move $rootmnt /ro
|
---|
| 273 | mount -t aufs aufs /aufs -o noatime,dirs=/rw:/ro=ro
|
---|
| 274 | mkdir -p /aufs/rw /aufs/ro
|
---|
| 275 | mount --move /ro /aufs/ro
|
---|
| 276 | mount --move /rw /aufs/rw
|
---|
| 277 | mount --move /aufs /root
|
---|
| 278 | exit 0
|
---|
| 279 |
|
---|
| 280 | - Update the init ramdisk: ``update-initramfs -u -k all``
|
---|
| 281 |
|
---|
[64523] | 282 | Note! It may be necessary to do ``mount -t tmpfs tmpfs /var/tmp`` to help
|
---|
| 283 | this operation succeed.
|
---|
[64498] | 284 |
|
---|
[64523] | 285 | - Copy ``/boot`` to ``/mnt/testbox-tftp/maintenance-boot/``.
|
---|
[64498] | 286 |
|
---|
[64523] | 287 | - Copy the ``testbox-maintenance.sh`` file found in the same directory as this
|
---|
| 288 | document to ``/root/scripts/`` (need to create the dir) and make it
|
---|
| 289 | executable.
|
---|
[64498] | 290 |
|
---|
[64523] | 291 | - Create the systemd service file for the maintenance service as
|
---|
| 292 | ``/etc/systemd/system/testbox-maintenance.service`` with the content::
|
---|
[64498] | 293 |
|
---|
[64523] | 294 | [Unit]
|
---|
| 295 | Description=Testbox Maintenance
|
---|
| 296 | After=network.target
|
---|
| 297 | Before=getty@tty1.service
|
---|
[64498] | 298 |
|
---|
[64523] | 299 | [Service]
|
---|
| 300 | Type=oneshot
|
---|
| 301 | RemainAfterExit=True
|
---|
| 302 | ExecStart=/root/scripts/testbox-maintenance.sh
|
---|
| 303 | ExecStartPre=/bin/echo -e \033%G
|
---|
| 304 | ExecReload=/bin/kill -HUP $MAINPID
|
---|
| 305 | WorkingDirectory=/tmp
|
---|
| 306 | Environment=TERM=xterm
|
---|
| 307 | StandardOutput=journal+console
|
---|
| 308 |
|
---|
| 309 | [Install]
|
---|
| 310 | WantedBy=multi-user.target
|
---|
| 311 |
|
---|
| 312 | - Enable our service: ``systemctl enable /etc/systemd/system/testbox-maintenance.service``
|
---|
| 313 |
|
---|
| 314 | - xxxx ... more ???
|
---|
| 315 |
|
---|
| 316 | - Before leaving the chroot, do ``umount /proc /sys /mnt/testbox-*``.
|
---|
| 317 |
|
---|
| 318 |
|
---|
| 319 | - Testing the setup from a VM is kind of useful (if the nfs server can be
|
---|
| 320 | convinced to accept root nfs mounts from non-privileged client ports):
|
---|
| 321 |
|
---|
[64498] | 322 | - Create a VM using the 64-bit debian profile. Let's call it "pxe-vm".
|
---|
| 323 | - Mount the TFTP share somewhere, like M: or /mnt/testbox-tftp.
|
---|
| 324 | - Reconfigure the NAT DHCP and TFTP bits::
|
---|
| 325 |
|
---|
| 326 | VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/AboveDriver NAT
|
---|
| 327 | VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Action mergeconfig
|
---|
| 328 | VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Config/TFTPPrefix M:/
|
---|
| 329 | VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Config/BootFile pxelinux.0
|
---|
| 330 |
|
---|
| 331 | - Create the file ``testbox-tftp/pxelinux.cfg/0A00020F`` containing::
|
---|
| 332 |
|
---|
| 333 | PATH bios
|
---|
| 334 | DEFAULT maintenance
|
---|
| 335 | LABEL maintenance
|
---|
| 336 | MENU LABEL Maintenance (NFS)
|
---|
| 337 | KERNEL maintenance-boot/vmlinuz-3.16.0-4-amd64
|
---|
| 338 | APPEND initrd=maintenance-boot/initrd.img-3.16.0-4-amd64 ro ip=dhcp aufs=tmpfs \
|
---|
[64523] | 339 | boot=nfs root=/dev/nfs nfsroot=10.42.1.1:/export/testbox-nfsroot
|
---|
[64498] | 340 | LABEL local-boot
|
---|
| 341 | LOCALBOOT
|
---|
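A machine specific config like the one above lends itself to being generated. The following sketch shows the idea; ``gen_maintenance_cfg`` is a hypothetical function, the NFS server address and kernel version are passed in as parameters, and none of this is claimed to match the actual ``testbox-pxe-conf.sh``.

```shell
#!/bin/sh
# Hypothetical generator for a machine specific PXELINUX config.
# $1 = NFS server address, $2 = kernel version suffix.
gen_maintenance_cfg() {
    cat <<EOF
PATH bios
DEFAULT maintenance
LABEL maintenance
  MENU LABEL Maintenance (NFS)
  KERNEL maintenance-boot/vmlinuz-$2
  APPEND initrd=maintenance-boot/initrd.img-$2 ro ip=dhcp aufs=tmpfs boot=nfs root=/dev/nfs nfsroot=$1:/export/testbox-nfsroot
LABEL local-boot
LOCALBOOT
EOF
}

# Print the config for the values used in this document.
gen_maintenance_cfg 10.42.1.1 3.16.0-4-amd64
```

Redirecting the output to ``testbox-tftp/pxelinux.cfg/0A00020F`` would give a file equivalent to the hand-written one, with the ``APPEND`` options on a single line.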
| 342 |
|
---|
| 343 |
|
---|
[64653] | 344 | Troubleshooting
|
---|
| 345 | ===============
|
---|
| 346 |
|
---|
| 347 | ``PXE-E11`` or something like ``No ARP reply``
|
---|
| 348 | You probably have the TFTP and DHCP servers on different machines. Try moving the TFTP
|
---|
| 349 | server to the same machine as the DHCP server, then the PXE stack won't have to do any
|
---|
| 350 | additional ARP resolving. Google results suggest that a congested network
|
---|
| 351 | could cause the ARP reply to get lost. Our suspicion is that it might also be
|
---|
| 352 | related to the PXE stack shipping with the NIC.
|
---|
| 353 |
|
---|
| 354 |
|
---|
| 355 |
|
---|
[64498] | 356 | -----
|
---|
| 357 |
|
---|
| 358 | .. [1] See http://www.syslinux.org/wiki/index.php?title=Category:Modules
|
---|
| 359 | .. [2] See http://www.syslinux.org/wiki/index.php?title=PXELINUX#Configuration
|
---|
| 360 | .. [3] See https://en.wikipedia.org/wiki/Aufs
|
---|
| 361 | .. [4] See http://shitwefoundout.com/wiki/Diskless_ubuntu
|
---|
| 362 | .. [5] See http://debianaddict.com/2012/06/19/diskless-debian-linux-booting-via-dhcppxenfstftp/
|
---|
| 363 |
|
---|
| 364 |
|
---|
| 365 | -----
|
---|
| 366 |
|
---|
| 367 | :Status: $Id: TestBoxImaging.txt 82972 2020-02-04 11:13:09Z vboxsync $
|
---|
[82972] | 368 | :Copyright: Copyright (C) 2010-2020 Oracle Corporation.
|
---|
[64498] | 369 |
|
---|
| 370 |
|
---|