Installing and booting SmartOS from a hard disk without physical access

Warning

Installing SmartOS on your hard disk is an unsupported configuration. Be careful and make sure you understand what you are doing.

SmartOS is a hypervisor derived from Illumos, the open source evolution of OpenSolaris. I have been using Solaris since the early '90s and I still consider this operating system far more "serious" and enterprise-ready than Linux.

Most distributions based on Illumos are complete operating systems. SmartOS is a different beast: a typical SmartOS deployment consists of a thin hypervisor running multiple Solaris containers. It is similar to Docker, but more capable and far more secure.

As a hypervisor, SmartOS boots from read-only media such as a USB flash drive, a DVD or PXE. The image is quite small, less than 300 megabytes. Nothing is customized on the boot media: the server boots from it and then tries to read its configuration from the local hard disk. This is a nice approach: upgrading the hypervisor is trivial and failsafe, because you can always boot the previous version, so you cannot bork your server with a problematic upgrade. At under 300 megabytes each, you can keep quite a few versions of SmartOS around, just in case.

I like the idea a lot, but it usually requires direct access to the physical infrastructure: you need to swap the DVD, or you need control over the network in order to run PXE securely. Some hosting providers offer to plug in a USB flash drive as a one-time option, for a price. The problem is that you have to pay, and you will need to pay again if you migrate to a new server in the future. It may also require a KVM (more money) to configure the BIOS to boot from the USB flash drive. Also, many cloud providers don't offer this option at all. Going down this path restricts your choice of providers.

Installing SmartOS on a hard disk is undocumented and unsupported, but there are some nice and useful docs around. The approach they suggest has several problems:

  • It requires physical access for the initial install.
  • It requires booting SmartOS natively to install it in the harddisk.
  • The procedure installs SmartOS inside the zones ZFS pool. This becomes a problem if a future ZFS on-disk format upgrade is incompatible with the installed GRUB bootloader: you would be unable to boot after upgrading the pool.

This article documents a procedure to overcome those issues.

  1. The first step is to boot the remote server in rescue mode. We must determine whether its IP address and gateway are static or obtained via DHCP or a similar mechanism. In my case, the remote server gets its network parameters via DHCP. We also write down the MAC address of the network card.

  2. On a local machine, we create a virtual machine using VirtualBox or a similar product. 2GB of RAM and a 5GB virtual hard disk are enough.

    Note

    We choose 2GB of RAM because SmartOS will create a swap ZFS dataset of the same size, and we want the virtual hard disk image to be as small as possible. We will resize the swap ZFS dataset later.

  3. We boot the virtual machine using the SmartOS DVD image.

  4. Initial configuration:

    After booting, SmartOS tries to locate the zones ZFS pool. The hard disk is empty, so SmartOS starts the configuration process.

    • We answer the questions as follows:

      • (admin) IP address (or dhcp): dhcp
      • Enter the default gateway IP [none]: 10.0.2.2
      • Enter the Primary DNS server IP [8.8.8.8]: ENTER
      • Enter the Secondary DNS server IP [8.8.4.4]: 10.0.2.2
      • Default DNS search domain: jcea.es.
      • Enter an NTP server IP address or hostname [0.smartos.pool.ntp.org]: ENTER

      Note

      The IP address 10.0.2.2 is the VirtualBox virtual gateway, DNS server, etc.

    • We accept the configuration offered for the zones ZFS pool.

    • SmartOS will complete the configuration.

    • Reboot.

    • We select "recovery" mode.

    • Let's destroy the zones ZFS pool:

      # zpool import zones
      # zpool destroy zones
      

    Why destroy the configuration we just created? These steps are not pointless: the virtual hard disk now has an EFI label with the right alignment. Nor is it a waste of time, since the whole process is faster than even booting SmartOS.
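    A quick way to verify alignment is to look at each partition's start sector. Assuming sector numbers are reported in 512-byte units (as in the format listing later in this article), a partition is aligned to 4096-byte physical sectors when its start sector is a multiple of 8, because 8 x 512 = 4096. The is_4k_aligned helper below is a hypothetical sketch in plain POSIX shell, not a SmartOS tool, and the sample sectors are arbitrary (63 is the classic MBR start sector):

```shell
# is_4k_aligned START_SECTOR
# Hypothetical helper: succeeds when a start sector given in 512-byte
# units lands on a 4096-byte boundary (8 x 512 = 4096).
is_4k_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}

# Example: feed it start sectors read from format's partition menu or prtvtoc.
for start in 63 256 2048; do
    if is_4k_aligned "$start"; then
        echo "sector $start: 4K-aligned"
    else
        echo "sector $start: NOT 4K-aligned"
    fi
done
```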

  5. Real configuration:

    • We reboot again and start a new configuration, with the same parameters as in the previous step.

    • This time we reject the configuration offered for the zones ZFS pool and select "manual". SmartOS will launch a CLI.

      • We format the virtual hard disk with two partitions: the first one 1 gigabyte in size, the second one using the rest of the virtual hard disk.

        We pay attention to the EFI label created in the previous step, notably the start sector of the first partition and the empty space at the end of the virtual hard disk.

      • We edit /kernel/drv/sd.conf so that SmartOS treats the virtual hard disk as having 4096-byte sectors:

        "", "physical-block-size:4096",
        
      • Enable the configuration change:

        # update_drv -vf sd
        
      • Let's create the ZFS pools:

        # zpool create arranque c0t0d0s0
        # zpool create zones c0t0d0s1
        # zdb | egrep "name|ashift" # Let's confirm 12 bits
          name: 'arranque'
          hostname: ''
                  ashift: 12
          name: 'zones'
          hostname: ''
                  ashift: 12
        

        Note

        We want an ashift of 12 to avoid a performance hit when deploying on hard disks with 4096-byte sectors.

        Note

        In this context, arranque is the Spanish word for boot.
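        The ashift is the base-2 logarithm of the sector size ZFS assumes, so 4096-byte sectors give 2^12 = 4096, hence 12. The zdb check above can be made pass/fail with a small filter. This is a hypothetical sketch: the function names are illustrative, and it only relies on the "ashift: N" lines shown in the zdb output above:

```shell
# ashift_for BYTES: print log2 of a power-of-two sector size,
# which is the ZFS ashift for that size (4096 -> 12, 512 -> 9).
ashift_for() {
    size=$1 shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}

# check_ashift [WANT]: read `zdb` output on stdin and succeed only if at
# least one "ashift:" line is present and every one equals WANT (default 12).
check_ashift() {
    awk -v want="${1:-12}" '
        $1 == "ashift:" { seen = 1; if ($2 != want) bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }'
}

# Usage on the SmartOS box:
#   zdb | check_ashift "$(ashift_for 4096)" && echo "all vdevs use ashift=12"
```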

      • Let's set up the boot infrastructure:

        We mount the DVD boot image and copy the boot infrastructure to the hard disk.

        # zfs create arranque/os
        # zfs create arranque/os/20160915
        # mount -F hsfs /dev/dsk/c0t1d0p0 /mnt
        # cp -a /mnt/platform/ /arranque/os/20160915/
        # cp -a /mnt/boot/ /arranque/
        # cd /arranque/boot/grub/
        # mkdir bootsign
        # touch bootsign/pool_arranque
        # installgrub -m stage1 stage2 /dev/rdsk/c0t0d0s0
        

        The critical detail here is the -m parameter of the installgrub command, which also writes stage1 to the master boot sector. That is what is needed to install GRUB on an EFI-partitioned hard disk.
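        With bootfs set to arranque/os/20160915, GRUB resolves the kernel and boot archive paths of the menu.lst entry relative to that dataset, so the copy steps above should leave the files checked here. This is a hypothetical sanity check; the paths come from the commands and the menu.lst entry in this article, and it assumes cp placed the platform directory itself (not only its contents) under the version directory:

```shell
# check_boot_layout ROOT VERSION
# Hypothetical helper: report any file GRUB needs that is missing, e.g.
#   check_boot_layout /arranque 20160915
# Returns 0 when everything is in place, 1 otherwise.
check_boot_layout() {
    root=$1 version=$2 missing=0
    for f in \
        "$root/os/$version/platform/i86pc/kernel/amd64/unix" \
        "$root/os/$version/platform/i86pc/amd64/boot_archive" \
        "$root/boot/grub/stage1" \
        "$root/boot/grub/stage2" \
        "$root/boot/grub/bootsign/pool_arranque"
    do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```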

        Now we simply edit the /arranque/boot/grub/menu.lst file:

        title SmartOS (20160915)
           findroot(pool_arranque,0,a)
           bootfs arranque/os/20160915
           kernel$ /platform/i86pc/kernel/amd64/unix -B console=${os_console},${os_console}-mode="115200,8,n,1,-",root_shadow='**********',smartos=true
           module /platform/i86pc/amd64/boot_archive
        
    • We exit the interactive CLI session with Control+D and, back in the installer, finish the configuration process.
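    Because each platform image lives in its own arranque/os/<version> dataset, a later upgrade could add a second GRUB entry next to the existing one, keeping the old version bootable. A hypothetical sketch that prints an entry in the same format as the menu.lst above, assuming the new platform directory was copied into arranque/os/<version> the same way; the version string and the root_shadow hash are parameters (the article redacts the real hash):

```shell
# menu_entry VERSION SHADOW
# Hypothetical helper: print a GRUB stanza in the same format as menu.lst.
# ${os_console} is escaped so that GRUB, not the shell, expands it.
menu_entry() {
    printf 'title SmartOS (%s)\n' "$1"
    printf '   findroot(pool_arranque,0,a)\n'
    printf '   bootfs arranque/os/%s\n' "$1"
    printf "   kernel\$ /platform/i86pc/kernel/amd64/unix -B console=\${os_console},\${os_console}-mode=\"115200,8,n,1,-\",root_shadow='%s',smartos=true\n" "$2"
    printf '   module /platform/i86pc/amd64/boot_archive\n'
}

# Usage sketch (hypothetical version string):
#   menu_entry 20161020 "$ROOT_SHADOW" >> /arranque/boot/grub/menu.lst
```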

  6. We eject the virtual DVD and reboot. SmartOS now boots from the hard disk, and the hypervisor is running inside the virtual machine.

  7. In /usbkey/config we set the details of the real server: the MAC address, the IP address, DNS resolvers, default gateway, etc.

    We can also add our public SSH key, etc.

    Warning

    By doing this, we deliberately "misconfigure" the SmartOS instance for its current VirtualBox virtual machine. With luck, the new configuration will just work on the real bare-metal server.

    Now we have a SmartOS instance configured for the real bare-metal server.
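    /usbkey/config is a plain key=value file, so these edits can be scripted. A hypothetical sketch: set_config_key is an illustrative name, and admin_nic and admin_ip are used as examples of key names as they typically appear in a SmartOS /usbkey/config (check the names in your own file before relying on them). The MAC address in the usage lines is a placeholder:

```shell
# set_config_key FILE KEY VALUE
# Hypothetical helper: set KEY=VALUE in a key=value file such as
# /usbkey/config, replacing an existing line or appending a new one.
# The file is rewritten through a temporary copy.
# VALUE must not contain '|', '&' or backslashes (they are special to sed).
set_config_key() {
    file=$1 key=$2 value=$3
    tmp="$file.tmp.$$"
    if grep -q "^$key=" "$file"; then
        sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" || return 1
    else
        cp "$file" "$tmp" || return 1
        printf '%s=%s\n' "$key" "$value" >> "$tmp"
    fi
    mv "$tmp" "$file"
}

# Usage sketch with placeholder values for the real server:
#   set_config_key /usbkey/config admin_nic 00:11:22:33:44:55
#   set_config_key /usbkey/config admin_ip dhcp
```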

  8. We upload the SmartOS image to the Internet:

    • We boot SmartOS from the DVD in rescue mode. We want the virtual hard disk idle: no activity, pools "exported".

    • Since we are booting in rescue mode, we need to configure the network manually. Note that the VirtualBox virtual network interface is an Intel e1000:

      # ifconfig e1000g0 plumb
      # ifconfig e1000g0 10.0.2.15 up
      # route add default 10.0.2.2
      # echo "nameserver 10.0.2.2" > /etc/resolv.conf
      
    • Let's copy the virtual hard disk image somewhere on the Internet:

      # dd if=/dev/dsk/c0t0d0p0 | xz -v | ssh babylon5.jcea.es "dd of=/tmp/SmartOS.dump.xz"
      

      My laptop is eight years old and CPU-starved; a faster machine could use more aggressive compression. I also need to compress because my upstream bandwidth is very small. In my case it takes 1:00:49 (a little over an hour) to send the 5GB image, compressed to a mere 141MB. Good enough. [1]

      Compression is so effective because the virtual hard disk is basically empty: it only holds a SmartOS boot image and a tiny hypervisor configuration.

      [1]

      Uploading the image is reasonably fast. In my case, with a weak CPU and tiny bandwidth, it takes an hour. It could be much faster with beefier resources.

      Given this, I could upload the SmartOS image directly to the new server, without an intermediate server.

      My resources are constrained, so I prefer this intermediate step, just in case. As a bonus, I can make several deployment attempts without uploading the image again from home.
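      The upload figures give a feel for where the bottleneck is. A small sketch in shell arithmetic, assuming the "5GB" is the 5368709120-byte image written in step 9 and that "141MB" means 141 MiB; both function names are hypothetical:

```shell
# hms_to_seconds H:MM:SS -> total seconds (awk avoids the octal
# interpretation that shell arithmetic may apply to fields like "08").
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# avg_rate_kib BYTES SECONDS -> average rate in KiB/s (integer division)
avg_rate_kib() {
    echo $(( $1 / $2 / 1024 ))
}

secs=$(hms_to_seconds 1:00:49)
raw=$(avg_rate_kib 5368709120 "$secs")                 # image read through xz
net=$(avg_rate_kib $(( 141 * 1024 * 1024 )) "$secs")   # compressed upstream
echo "raw ${raw} KiB/s, upstream ${net} KiB/s, ratio about $(( 5368709120 / (141 * 1024 * 1024) )):1"
```

      Under these assumptions the pipeline averaged roughly 1.4 MiB/s of raw image but only about 40 KiB/s of compressed data on the wire, a ratio of about 36:1.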

  9. We deploy the SmartOS image on the new server:

    • Boot the new server with a rescue image, for instance Linux. The available options depend on your hosting provider; any UNIX-like operating system is good enough.

    • Download the SmartOS image we just uploaded. Since the compressed image is quite small, we can drop it in the /tmp/ directory. [1]
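      Before overwriting anything, it is worth checking that the target disk is at least as large as the image, and knowing how many records dd should report. A sketch for a Linux rescue system, where blockdev --getsize64 (util-linux) prints a block device's size in bytes; the helper names are hypothetical and /dev/sda matches the dd command in the next step:

```shell
# image_fits IMAGE_BYTES DISK_BYTES
# Hypothetical helper: succeed when an image of IMAGE_BYTES fits on a disk
# of DISK_BYTES.
image_fits() {
    [ "$1" -le "$2" ]
}

# dd_records IMAGE_BYTES BLOCK_SIZE
# Number of full records dd should report when copying the image with
# bs=BLOCK_SIZE (5368709120 / 65536 = 81920).
dd_records() {
    echo $(( $1 / $2 ))
}

# Usage sketch on the Linux rescue system, after unxz:
#   img=$(wc -c < SmartOS.dump)
#   disk=$(blockdev --getsize64 /dev/sda)
#   image_fits "$img" "$disk" || echo "image does not fit on /dev/sda" >&2
#   dd_records "$img" 65536
```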

    • Let's overwrite the first hard disk with this image:

      # unxz SmartOS.dump.xz
      # dd if=SmartOS.dump of=/dev/sda bs=65536
      81920+0 records in
      81920+0 records out
      5368709120 bytes (5.4 GB) copied, 35.5407 s, 151 MB/s
      
  10. Reboot. The server will try to boot from the local hard disk.

  11. Try to access via SSH. IT WORKS.

It worked on the first try. Amazing!

Now we have a running SmartOS instance. We can use the hypervisor to create new Solaris containers, and even real (and heavy) virtual machines. I am a happy man.

There are still some things to improve before moving this server to production:

  1. Let's use the entire hard disks and set up ZFS mirroring:

    This server has two 3TB hard disks. The current image only uses 5GB of one of them, with no redundancy.

    Warning

    In the following steps, be careful not to mix up 512-byte and 4096-byte sectors: different tools use different units.

    Be careful with data alignment, too. You can check the alignment details of the first hard disk and use them to configure the second one.
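    As a worked example of the unit arithmetic, the format listing below reports sectors in 512-byte units: slice 0 runs from sector 34 to sector 2097185, which is exactly 2097152 sectors, or 1 GiB. A hypothetical helper that converts an inclusive sector range to bytes:

```shell
# slice_bytes FIRST LAST [SECTOR_SIZE]
# Hypothetical helper: size in bytes of a slice given its first and last
# sector (inclusive), with 512-byte sectors unless told otherwise.
slice_bytes() {
    echo $(( ($2 - $1 + 1) * ${3:-512} ))
}

# Slice 0 from the format listing: sectors 34..2097185
slice_bytes 34 2097185                          # 1073741824 bytes, i.e. 1 GiB
# The same slice counted in 4096-byte sectors
echo $(( $(slice_bytes 34 2097185) / 4096 ))    # 262144
```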

    • On the second hard disk, create an EFI partition spanning the whole disk:

      # fdisk /dev/rdsk/c0t1d0
      
    • Create the first two Solaris slices with format. The first slice is 1GB in size; the second uses the rest of the partition (the entire hard disk):

      # format
      [...]
              partition> p
              Current partition table (original):
              Total disk sectors available: 5860516717 + 16384 (reserved sectors)
      
              Part      Tag    Flag     First Sector          Size          Last Sector
                0        usr    wm                34         1.00GB           2097185
                1        usr    wm           2097186         2.73TB           5860516750
                2 unassigned    wm                 0            0                0
                3 unassigned    wm                 0            0                0
                4 unassigned    wm                 0            0                0
                5 unassigned    wm                 0            0                0
                6 unassigned    wm                 0            0                0
                8   reserved    wm        5860516751         8.00MB           5860533134
      
    • Mirroring!:

      # zpool attach arranque c0t0d0s0 c0t1d0s0
      # zpool status arranque
        pool: arranque
       state: ONLINE
        scan: resilvered 270M in 0h0m with 0 errors on Tue Oct 25 14:54:07 2016
       config:
      
              NAME          STATE     READ WRITE CKSUM
              arranque      ONLINE       0     0     0
                mirror-0    ONLINE       0     0     0
                  c0t0d0s0  ONLINE       0     0     0
                  c0t1d0s0  ONLINE       0     0     0
      errors: No known data errors
      
      # zpool attach zones c0t0d0s1 c0t1d0s1
      # zpool status zones
        pool: zones
       state: ONLINE
        scan: resilvered 1.00G in 0h0m with 0 errors on Tue Oct 25 14:55:23 2016
      config:
      
              NAME          STATE     READ WRITE CKSUM
              zones         ONLINE       0     0     0
                mirror-0    ONLINE       0     0     0
                  c0t0d0s1  ONLINE       0     0     0
                  c0t1d0s1  ONLINE       0     0     0
      
      errors: No known data errors
      
    • Now we have mirrors. The arranque pool is fine, but the zones pool is still tiny (the 5GB image minus the 1GB boot slice), because the usable size of a mirror is the size of its smallest component.

      So now we have to break the mirrors and resize the Solaris slice holding the zones pool:

      • We break both mirrors. Beware: you must detach the slices of the first hard disk:

        # zpool detach arranque c0t0d0s0
        # zpool detach zones c0t0d0s1
        
      • Now we repeat the hard disk reconfiguration described above (the fdisk and format steps), this time on the first hard disk. Be extra careful when you type the disk ID: you want to reconfigure the FIRST hard disk.

      • Rebuild the mirror:

        # zpool attach arranque c0t1d0s0 c0t0d0s0
        # zpool attach zones c0t1d0s1 c0t0d0s1
        

        I find it quite unsatisfactory that the first hard disk in each zpool is now disk 2 and the second is disk 1. So I wait until the mirrors finish rebuilding, split them again and reorder the disks.

        This is purely aesthetic; let's call it beauty and "least surprise":

        Note

        Since the zpools are basically empty, rebuilding the mirrors (resilvering, in ZFS parlance) is nearly instantaneous. ZFS only copies live data, not the empty space in the pool. Cool!

        # zpool detach arranque c0t1d0s0
        # zpool attach arranque c0t0d0s0 c0t1d0s0
        # zpool detach zones c0t1d0s1
        # zpool attach zones c0t0d0s1 c0t1d0s1
        # zpool status
          pool: arranque
         state: ONLINE
          scan: resilvered 270M in 0h0m with 0 errors on Tue Oct 25 15:04:17 2016
        config:
        
                NAME          STATE     READ WRITE CKSUM
                arranque      ONLINE       0     0     0
                  mirror-0    ONLINE       0     0     0
                    c0t0d0s0  ONLINE       0     0     0
                    c0t1d0s0  ONLINE       0     0     0
        
        errors: No known data errors
        
          pool: zones
         state: ONLINE
          scan: resilvered 1.00G in 0h0m with 0 errors on Tue Oct 25 15:05:35 2016
        config:
        
                NAME          STATE     READ WRITE CKSUM
                zones         ONLINE       0     0     0
                  mirror-0    ONLINE       0     0     0
                    c0t0d0s1  ONLINE       0     0     0
                    c0t1d0s1  ONLINE       0     0     0
        
        errors: No known data errors
        
      • The GRUB bootloader is already installed on the first hard disk. We should install it on the second hard disk too, just in case:

        # cd /arranque/boot/grub/
        # installgrub -m stage1 stage2 /dev/rdsk/c0t0d0s0
        Updating master boot sector destroys existing boot managers (if any).
        continue (y/n)? y
        stage2 written to partition 0, 281 sectors starting at 1024 (abs 1058)
        stage1 written to partition 0 sector 0 (abs 34)
        stage1 written to master boot sector
        
        # installgrub -m stage1 stage2 /dev/rdsk/c0t1d0s0
        Updating master boot sector destroys existing boot managers (if any).
        continue (y/n)? y
        stage2 written to partition 0, 281 sectors starting at 1024 (abs 1058)
        stage1 written to partition 0 sector 0 (abs 34)
        stage1 written to master boot sector
        

        Now the server can boot from either hard disk.
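    The reordering above waits for each resilver to finish before detaching again. While a resilver is running, zpool status shows "resilver in progress" on its scan: line, so a hypothetical helper can simply poll for that phrase. The function names are illustrative:

```shell
# resilvering: read `zpool status POOL` output on stdin; succeed while
# a resilver is still running (hypothetical helper).
resilvering() {
    grep -q 'resilver in progress'
}

# wait_resilver POOL [SECONDS]: poll until POOL has finished resilvering.
# Fails immediately if `zpool status` itself fails (e.g. a mistyped pool).
wait_resilver() {
    zpool status "$1" > /dev/null || return 1
    while zpool status "$1" | resilvering; do
        sleep "${2:-10}"
    done
}

# Usage sketch for the reordering step:
#   zpool attach zones c0t1d0s1 c0t0d0s1
#   wait_resilver zones
#   zpool detach zones c0t1d0s1
```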

  2. Reboot. Everything should be nice and clean:

    # zpool list
      NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
      arranque  1008M   270M   738M         -    13%    26%  1.00x  ONLINE  -
      zones     2.73T  1.01G  2.73T         -     0%     0%  1.00x  ONLINE  -
    
  3. Resize the swap ZFS dataset:

    When we created the original SmartOS image inside the VirtualBox virtual machine, we gave it 2GB of RAM. By default SmartOS creates a swap ZFS dataset the same size as physical RAM.

    SmartOS requires a swap ZFS dataset at boot time. Once booted, however, we can play games with it:

    • We have a swap space of 2GB:

      # swap -l
      swapfile             dev    swaplo   blocks     free
      /dev/zvol/dsk/zones/swap 90,1         8  4192248  4192248
      
    • Let's see which ZFS properties are set locally on the swap ZFS dataset:

      # zfs get all zones/swap|grep -i local
      zones/swap  volsize               2.00G                  local
      zones/swap  refreservation        2.06G                  local
      

      The rest of the properties are default or inherited from parent ZFS dataset.

    • Let's disable the swap:

      # swap -d /dev/zvol/dsk/zones/swap
      
    • Let's create a new ZFS dataset with the desired properties. We don't simply modify the original swap ZFS dataset, currently disabled, because we don't want an unscheduled reboot to render the server unbootable:

      # zfs create -V 24gb zones/swap2
      # zfs get all zones/swap2|grep -i local
      zones/swap2  volsize               24G                    local
      zones/swap2  refreservation        24.8G                  local
      

      We don't need to manually set the refreservation property. Nice.

    • Let's check that everything is OK:

      # swap -a /dev/zvol/dsk/zones/swap2
      # swap -s
      total: 90356k bytes allocated + 27628k reserved = 117984k used, 39833764k available
      # swap -d /dev/zvol/dsk/zones/swap2
      
    • Replace the original swap ZFS dataset:

      # zfs destroy zones/swap
      # zfs rename zones/swap2 zones/swap
      
    • Reboot the machine and check that everything is good:

      # swap -s
      total: 89632k bytes allocated + 27488k reserved = 117120k used, 39843664k available
      

Everything is done.
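For reference, the swap replacement in step 3 can be collected into one function. This is a hypothetical sketch that runs the same commands in the same order and stops at the first failure, so the original zones/swap is only destroyed after the new zvol has worked as swap; the function name and size argument are assumptions:

```shell
# replace_swap SIZE: replace zones/swap with a new zvol of SIZE (e.g. 24gb),
# following the step-by-step procedure above. Stops at the first error.
replace_swap() {
    size=$1
    swap -d /dev/zvol/dsk/zones/swap || return 1     # stop using the old swap
    zfs create -V "$size" zones/swap2 || return 1     # refreservation is set automatically
    swap -a /dev/zvol/dsk/zones/swap2 || return 1     # try the new zvol as swap
    swap -s || return 1
    swap -d /dev/zvol/dsk/zones/swap2 || return 1
    zfs destroy zones/swap || return 1                # only now remove the original
    zfs rename zones/swap2 zones/swap                 # the name SmartOS looks for at boot
}

# Usage: replace_swap 24gb, then reboot and check `swap -s`.
```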

Catastrophe recovery

One of the interesting characteristics of my current hosting provider is that one of its "rescue" environments is a current release of FreeBSD.

If everything fails and SmartOS is unable to boot someday, we can use that rescue mode to access the ZFS pools, make changes and, if necessary, simply zfs send them somewhere else. Short of a total hardware failure, our data is safe.
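A minimal sketch of that rescue path, assuming the FreeBSD rescue system's ZFS supports every feature enabled on the pools and has ssh available. zpool import -f -R imports a pool last used by another system under an alternate root, and zfs send -R sends a recursive replication stream of a snapshot. The helper name, the snapshot name and backup.example.net are placeholders:

```shell
# rescue_send POOL HOST FILE
# Hypothetical helper for a FreeBSD rescue system: import POOL under
# /mnt, snapshot it recursively and stream everything to HOST via SSH.
# The return status is that of ssh; zfs send errors only show on stderr.
rescue_send() {
    pool=$1 host=$2 file=$3
    snap="$pool@rescue-$(date +%Y%m%d)"
    zpool import -f -R /mnt "$pool" || return 1
    zfs snapshot -r "$snap" || return 1
    zfs send -R "$snap" | ssh "$host" "cat > '$file'"
}

# Usage sketch: rescue_send zones backup.example.net /backup/zones.zfs
```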