Dumping SmartOS boot zpool when booting from harddisk
Read first
Before reading this article, you may find it useful to become familiar with these other two:
Problem to solve
The steps described in the previous articles create a SmartOS installation that boots from the harddisk instead of the regular boot from a CD, a DVD, a USB pendrive, or PXE. This is useful when managing remote machines with no physical access. You must be careful, though, since a mistake in the manipulation of the boot zpool can leave the machine offline, with a slow and painful recovery procedure.
For my personal needs (servers hosted in a remote datacenter), my approach to boot recovery would be to restart the server with the rescue-mode facilities of the datacenter, using an Operating System with ZFS support. That would be enough to mount the harddisk and correct any misconfiguration.
This approach is fine 99.9% of the time, but it doesn't work (easily) if there is a problem in the early boot stages, such as corruption of GRUB, a botched GRUB upgrade (trying, for instance, to support features available in new ZFS releases), or if you want to migrate to the new Illumos loader, enable ZFS encryption, etc.
We need a fast way to recover the machine if something goes wrong in the early boot stage.
Since in my case the boot ZFS zpool is only 1 GB in size, what about just dumping the harddisk datablocks to a file on another server? At gigabit speed, you can rewrite 1 GB in about ten seconds. Booting into rescue mode and typing the appropriate commands would be the slow part!
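A back-of-the-envelope check of that claim. The dump covers 2097186 sectors of 512 bytes (the figure used later in this article), and I assume a gigabit link moves roughly 110 MB/s of real payload:

```shell
# Dump size in bytes: 2097186 sectors of 512 bytes each (about 1 GB).
bytes=$((2097186 * 512))
# Assumed effective gigabit throughput: ~110 MB/s of payload.
seconds=$((bytes / 110000000))
echo "dump size: ${bytes} bytes, transfer time: ~${seconds} s"
```

So the raw transfer really is on the order of ten seconds; the human steps dominate.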
Identifying GRUB area and ZFS boot zpool
The following instructions can be done safely in the Solaris global zone while the machine is in production.
- Let's see the Solaris harddisk partitions:
  [root@xXx ~]# fdisk /dev/rdsk/c1t0d0
               Total disk size is 60800 cylinders
               Cylinder size is 96390 (512 byte) blocks

                                                 Cylinders
        Partition   Status    Type          Start   End   Length    %
        =========   ======    ============  =====   ===   ======   ===
            1                 EFI               0  60799    60800    100

  SELECT ONE OF THE FOLLOWING:
     1. Create a partition
     2. Specify the active partition
     3. Delete a partition
     4. Change between Solaris and Solaris2 Partition IDs
     5. Edit/View extended partitions
     6. Exit (update disk configuration and exit)
     7. Cancel (exit without updating disk configuration)
  Enter Selection:
Here we see that the harddisk contains a single EFI partition spanning the whole disk.
Note
This machine has two harddisks with the same partitioning. I only show one for brevity.
- Under Solaris and derivatives, there is a second, nested partitioning schema called slices:
  [root@xXx ~]# format /dev/rdsk/c1t0d0p0
  selecting /dev/rdsk/c1t0d0p0
  [disk formatted]
  /dev/dsk/c1t0d0s0 is part of active ZFS pool arranque. Please see zpool(1M).
  /dev/dsk/c1t0d0s1 is part of active ZFS pool zones. Please see zpool(1M).

  FORMAT MENU:
          disk       - select a disk
          type       - select (define) a disk type
          partition  - select (define) a partition table
          current    - describe the current disk
          format     - format and analyze the disk
          fdisk      - run the fdisk program
          repair     - repair a defective sector
          label      - write label to the disk
          analyze    - surface analysis
          defect     - defect list management
          backup     - search for backup labels
          verify     - read and display labels
          inquiry    - show vendor, product and revision
          volname    - set 8-character volume name
          !<cmd>     - execute <cmd>, then return
          quit
  format> p

  PARTITION MENU:
          0      - change `0' partition
          1      - change `1' partition
          2      - change `2' partition
          3      - change `3' partition
          4      - change `4' partition
          5      - change `5' partition
          6      - change `6' partition
          expand - expand label to use whole disk
          select - select a predefined table
          modify - modify a predefined partition table
          name   - name the current table
          print  - display the current table
          label  - write partition map and label to the disk
          !<cmd> - execute <cmd>, then return
          quit
  partition> p
  Current partition table (original):
  Total disk sectors available: 5860516717 + 16384 (reserved sectors)

  Part      Tag    Flag     First Sector        Size        Last Sector
    0        usr    wm               34       1.00GB          2097185
    1        usr    wm          2097186       2.73TB          5860516750
    2 unassigned    wm                0          0               0
    3 unassigned    wm                0          0               0
    4 unassigned    wm                0          0               0
    5 unassigned    wm                0          0               0
    6 unassigned    wm                0          0               0
    8   reserved    wm       5860516751       8.00MB          5860533134

  partition>
Note
Notice that we are listing the slices in the first (and only) EFI partition of the harddisk. That is the reason we write /dev/rdsk/c1t0d0p0: the first partition of the disk /dev/rdsk/c1t0d0.
We can see this:
- There are 34 (not really) free sectors at the beginning of the partition. This is space reserved for the boot system. ZFS doesn't need it because it provides its own boot area, but GRUB is installed there by default. This is legacy, my friends.
- After that, there is a 1 GB ZFS zpool. This is the boot zpool, as described in the previous articles. The zpool is ONLY used when booting the system; after that, it is never touched again. SmartOS is a hypervisor and the Operating System runs from RAM. In a regular SmartOS deployment, the Operating System would be loaded from a CD, a DVD, a USB pendrive, or PXE.
- Then we have the main ZFS zpool, known as zones zpool in SmartOS nomenclature. Here is where the configuration and production data live.
- There is a small slice at the end. Forget about this, it is legacy.
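The slice table above can be sanity-checked with a little arithmetic (sectors are 512 bytes; the numbers come straight from the format output):

```shell
# Slice 0 (the boot zpool) spans partition-relative sectors 34..2097185.
# That is exactly 2097152 sectors, i.e. exactly 1 GiB:
boot_bytes=$(( (2097185 - 34 + 1) * 512 ))
echo "boot zpool slice: ${boot_bytes} bytes"
```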
- In order to preserve the boot environment, we need to dump the EFI partition table, the free (not actually free) sectors at the beginning of the partition, and the boot zpool slice.
The sector numbers reported by the format command are offsets inside the partition, but in this case the EFI partition starts at cylinder zero, so the offsets inside the partition match the absolute offsets on the disk: the region to dump runs from sector 0 through sector 2097185, the last sector of the boot zpool slice (2097186 sectors in total).
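Putting that together in numbers:

```shell
# The EFI partition starts at absolute sector 0, so partition-relative
# and absolute sector numbers coincide.  Dumping absolute sectors
# 0..2097185 captures the partition table, the reserved boot sectors
# and the whole boot zpool slice:
last_sector=2097185
count=$((last_sector + 1))
echo "sectors to dump from the start of the disk: ${count}"
```

That count is exactly the count= value used in the dd invocation below.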
- After booting, the ZFS boot zpool will be idle. Fine. We can be cautious, though, and export the boot zpool before dumping it. The intent is to get a consistent dump:
  [root@xXx ~]# zpool list
  NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
  arranque  1008M   300M   708M        -         -    22%    29%  1.00x  ONLINE  -
  zones     2.73T  1.21T  1.52T        -         -    56%    44%  1.00x  ONLINE  -
  [root@xXx ~]# zpool export arranque
  [root@xXx ~]# zpool list
  NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
  zones   2.73T  1.21T  1.52T       -         -    56%    44%  1.00x  ONLINE  -
- We dump the data from the global zone:
  [root@xXx ~]# cd /zones/z-jcea/
  [root@xXx /zones/z-jcea]# dd if=/dev/rdsk/c1t0d0 \
          of=SmartOS_boot-20190829.dump \
          bs=512 count=2097186
Note
This dump will take a while because we are reading in 512-byte chunks. It can be about six times faster if you notice that

512 * 2097186 = 3072 * 349531

so the same byte range can be read with bs=3072 count=349531.
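You can verify that both block-size/count pairs cover exactly the same number of bytes:

```shell
# The slow invocation (bs=512 count=2097186) and the fast one
# (bs=3072 count=349531) read exactly the same byte range:
small=$((512 * 2097186))
big=$((3072 * 349531))
echo "${small} = ${big}"
```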
Warning
If the data you are dumping doesn't start at the very beginning of the disk, you will need to use the skip= parameter.
- Now you can transfer that file to another machine for disaster recovery.
I would compute its hash and rename the file to include the hash (in order to detect corruption in the future), the SmartOS release, and the dump date. In this example, the filename would be:
SmartOS_xXx_boot-20190829-770171d16de18c1d17a6570b9842dc600cc5477ebfff1e9da74958e6fbdc1d44-20190918.dump
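A sketch of that renaming scheme, run here on a scratch file (assumptions: sha256sum is the GNU coreutils tool — on SmartOS you could use "digest -a sha256" instead — and the real input would be the dump produced above):

```shell
# Stand-in for the real dump file:
f=/tmp/SmartOS_boot-20190829.dump
printf 'stand-in for the real dump\n' > "$f"
# Compute the SHA-256 of the file (64 hex characters):
hash=$(sha256sum "$f" | awk '{print $1}')
# Embed release date, hash and dump date in the filename:
mv "$f" "/tmp/SmartOS_xXx_boot-20190829-${hash}-20190918.dump"
ls /tmp/SmartOS_xXx_boot-20190829-*-20190918.dump
```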
This machine has two harddisks configured as a ZFS mirror. The two sides of a ZFS mirror are not bit-by-bit identical. In a catastrophic failure you could restore just one side, boot the machine, rebuild the other side of the mirror, and reinstall GRUB on both disks.
Disaster recovery
If everything fails and you need to overwrite the boot system, you would:
- Boot the machine in rescue mode. The rescue Operating System is not important; use any you feel familiar with.
- Transfer the old dump to the machine. Usually the rescue mode runs from RAM, so you will need at least 1 GB of extra free RAM to download the recovery dump file. This is not an issue on this machine because it has plenty of memory.
- Then overwrite the boot area of the disk with the old dump. For instance, if you use Linux, you could type:
  root@RESCUE:~# dd if=DUMP_FILE of=/dev/sda
Warning
If the data you want to overwrite is not located at the beginning of the harddisk, you will need to use the seek= parameter.
Failing to do that WILL cause corruption and data loss.
BE CAREFUL!
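A demonstration of seek= on scratch files (the names are illustrative; on the real machine the output would be the disk device, not a file):

```shell
# Build a 100-sector "disk" image and a tiny stand-in "dump":
dd if=/dev/zero of=/tmp/disk.img bs=512 count=100 2>/dev/null
printf 'BOOT' > /tmp/dump.bin
# seek=10 skips 10 output blocks, so the dump lands at byte 5120
# instead of clobbering the start of the "disk"; conv=notrunc keeps
# the rest of the file intact:
dd if=/tmp/dump.bin of=/tmp/disk.img bs=512 seek=10 conv=notrunc 2>/dev/null
# Read it back from the same offset:
dd if=/tmp/disk.img bs=512 skip=10 count=1 2>/dev/null | head -c 4
```

Note the asymmetry in dd: skip= skips input blocks (used when dumping), while seek= skips output blocks (used when restoring).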
- Disable rescue mode and make sure your system will boot from the harddisk you have just corrected.
- Reboot and cross your fingers.