Background

This is my personal checklist for setting up a new ezjail host. I like my jail hosts configured in a very specific way, and there is a good chance that what is right for me is not right for you. For example, my filesystem layout is basically a zpool on top of a geli device created on a zvol exported from another zpool. If you decide to go with this layout you shouldn't expect blazing filesystem performance. I don't need blazing filesystem performance, but I do need encryption, and I do need ZFS, so this is just right for me :).

Also note that I talk a lot about the German hosting provider Hetzner. If you are using another provider or doing this at home, just ignore the Hetzner-specific stuff. Much of the content here can be used with little or no change outside Hetzner.

Installation

OS install with mfsbsd

After receiving the server from Hetzner I boot it using the rescue system, which puts me at an mfsbsd prompt. I then edit the zfsinstall script /root/bin/zfsinstall and add "usr" to FS_LIST near the top of the script. I do this because I like to have /usr as a separate ZFS dataset.

I then run the zfsinstall script as shown below. I am going to export the majority of the available disk space as a ZVOL, which will be used for a GELI device with another ZFS pool on top. This pool will house the actual jails and data.

Note that the disks are new-ish (Power_On_Hours is 73 on both drives according to smartctl, which the mfsbsd author has been clever enough to include in mfsbsd), but I still found an MBR partition table that needed to be deleted first. This can be done with the destroygeom command, as shown below:

[root@rescue ~]# zfsinstall -d ad4 -d ad6 -r mirror -s 5G -t /nfs/mfsbsd/9.0-amd64-zfs.tar.xz
Error: /dev/ad4 already contains a partition table.

=>        63  5860533105  ad4  MBR  (2.7T)
          63  5860533105       - free -  (2.7T)

You may erase the partition table manually with the destroygeom command
[root@rescue ~]# destroygeom
Usage: /root/bin/destroygeom [-h] -d geom [-d geom ...] [-p zpool ...]
[root@rescue ~]# destroygeom -d ad4 -d ad6
Destroying geom ad4:
Destroying geom ad6:
[root@rescue ~]# zfsinstall -d ad4 -d ad6 -r mirror -s 5G -t /nfs/mfsbsd/9.0-amd64-zfs.tar.xz
Creating GUID partitions on ad4 ... done
Configuring ZFS bootcode on ad4 ... done
=>        34  5860533101  ad4  GPT  (2.7T)
          34        2014       - free -  (1.0M)
        2048         128    1  freebsd-boot  (64K)
        2176    10485760    2  freebsd-swap  (5.0G)
    10487936  5850045199    3  freebsd-zfs  (2.7T)

Creating GUID partitions on ad6 ... done
Configuring ZFS bootcode on ad6 ... done
=>        34  5860533101  ad6  GPT  (2.7T)
          34        2014       - free -  (1.0M)
        2048         128    1  freebsd-boot  (64K)
        2176    10485760    2  freebsd-swap  (5.0G)
    10487936  5850045199    3  freebsd-zfs  (2.7T)

Creating ZFS pool tank on ad4p3 ad6p3 ... done
Creating tank root partition: ... done
Creating tank partitions: var tmp usr ... done
Setting bootfs for tank to tank/root ... done
NAME            USED  AVAIL  REFER  MOUNTPOINT
tank            210K  2.68T    21K  none
tank/root        88K  2.68T    25K  /mnt
tank/root/tmp    21K  2.68T    21K  /mnt/tmp
tank/root/usr    21K  2.68T    21K  /mnt/usr
tank/root/var    21K  2.68T    21K  /mnt/var
Extracting FreeBSD distribution ... done
Writing /boot/loader.conf... done
Writing /etc/fstab...Writing /etc/rc.conf... done
Copying /boot/zfs/zpool.cache ... done

Installation complete.
The system will boot from ZFS with clean install on next reboot

You may type "chroot /mnt" and make any adjustments you need.
For example, change the root password or edit/create /etc/rc.conf for
for system services. 

WARNING - Don't export ZFS pool "tank"!
[root@rescue]#

Finally, I fix a small shortcoming in the zfsinstall script: it only enables swap on the first drive, even though it creates swap partitions on all drives. I run the commands below to find the gptid of the swap partition that is not yet in use, run swapon on that gptid, and check the swapinfo output again to verify that it worked:

[root@rescue]# swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/gptid/e0f61e4f-a59e-11e1-9   5242880        0  5242880     0%
[root@rescue]# glabel status ada0p2 | grep gptid
gptid/e0f61e4f-a59e-11e1-93ed-e840f2090be0     N/A  ada0p2
[root@rescue]# glabel status ada1p2 | grep gptid
gptid/e20f4cc8-a59e-11e1-93ed-e840f2090be0     N/A  ada1p2
[root@rescue]# swapon /dev/gptid/e20f4cc8-a59e-11e1-93ed-e840f2090be0
[root@rescue]# swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/gptid/e0f61e4f-a59e-11e1-9   5242880        0  5242880     0%
/dev/gptid/e20f4cc8-a59e-11e1-9   5242880        0  5242880     0%
Total            10485760        0 10485760     0%
[root@rescue]#
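
The manual lookup above could be wrapped in a small helper. This is only a rough sketch and not part of zfsinstall: the function name swap_gptid_devs is made up, and it assumes glabel status output in the three-column format shown above, with swap on the second partition (p2) of each disk.

```sh
# Hypothetical helper (not part of mfsbsd): read `glabel status` output on
# stdin and print the /dev path of every gptid label whose component is a
# second partition (p2), which is where zfsinstall put swap in this layout.
swap_gptid_devs() {
    awk '$1 ~ /^gptid\// && $3 ~ /p2$/ { print "/dev/" $1 }'
}

# Possible use on the host (a device that is already active as swap should
# just produce an error from swapon):
#   glabel status | swap_gptid_devs | while read -r dev; do swapon "$dev"; done
```

The function only parses text, so it can be tried on saved glabel output before it is let loose on a real system.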

Post install configuration (before reboot)

Before rebooting into the installed FreeBSD I need to make certain I can reach the server through SSH after the reboot. This means:

Adding network settings to /etc/rc.conf
Adding sshd_enable="YES" to /etc/rc.conf
Changing PermitRootLogin to yes in /etc/ssh/sshd_config
Setting the root password

All of these steps are essential if I am going to have any chance of logging in after reboot. Most of these changes can be done from the mfsbsd shell but the password change requires chroot into the newly installed environment.
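
As a rough illustration of the rc.conf settings listed above, the fragment below shows what the relevant lines might look like. Every value is a placeholder: the hostname is invented, the 192.0.2.x addresses come from the documentation range, and em0 is simply the interface name used in the firewall section of this page. The real values come from the provider.

```sh
# /mnt/etc/rc.conf (illustrative values only, all of them assumptions)
hostname="jailhost.example.org"
ifconfig_em0="inet 192.0.2.10 netmask 255.255.255.0"
defaultrouter="192.0.2.1"
sshd_enable="YES"
```

rc.conf is read as plain shell variable assignments, so a typo such as a missing quote can be caught by running sh -n on the file before rebooting.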

I use the chroot command and explicitly start csh, because bash is not installed in /mnt:

[root@rescue ~]# chroot /mnt/ csh
rescue# ee /etc/rc.conf
rescue# ee /etc/ssh/sshd_config
rescue# passwd
New Password:
Retype New Password:
rescue#

So, the network settings are sorted, root password is set, and root is permitted to ssh in. Time to reboot (this is the exciting part).

Creating the encrypted zvol

I export most of the available disk space as a ZVOL, which will house a GELI volume.

[root@ ~]# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
tank            359M  2.68T    21K  none
tank/root       359M  2.68T  84.8M  /
tank/root/tmp    28K  2.68T    28K  /tmp
tank/root/usr   274M  2.68T   274M  /usr
tank/root/var   412K  2.68T   412K  /var
[root@ ~]# zfs create -V 2640G tank/gelizvol 
[root@ ~]# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
tank           2.66T  17.0G    21K  none
tank/gelizvol  2.66T  2.68T    16K  -
tank/root       359M  17.0G  84.8M  /
tank/root/tmp    28K  17.0G    28K  /tmp
tank/root/usr   274M  17.0G   274M  /usr
tank/root/var   412K  17.0G   412K  /var
[root@ ~]# ls -l /dev/zvol/tank/gelizvol    
crw-r-----  1 root  operator    0, 124 May 24 13:10 /dev/zvol/tank/gelizvol
[root@ ~]#

Then I create a geli key from /dev/random and initialize the geli provider:

[root@ ~]# dd if=/dev/random of=/root/encrypted.key bs=64 count=1 
1+0 records in
1+0 records out
64 bytes transferred in 0.000031 secs (2064888 bytes/sec)
[root@ ~]# ls -l /root/encrypted.key 
-rw-r--r--  1 root  wheel  64 May 24 13:14 /root/encrypted.key
[root@ ~]# geli init -s 512 -K /root/encrypted.key /dev/zvol/tank/gelizvol 
Enter new passphrase:
Reenter new passphrase: 

Metadata backup can be found in /var/backups/zvol_tank_gelizvol.eli and
can be restored with the following command:

        # geli restore /var/backups/zvol_tank_gelizvol.eli /dev/zvol/tank/gelizvol

[root@ ~]#
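
The listing above shows the key file as world-readable (-rw-r--r--). If you would rather restrict it to root, one possible approach is to create it under a tight umask. This is a sketch only; the function name make_keyfile is illustrative and not something geli provides.

```sh
# Hypothetical helper: write a 64-byte random key to a NEW file that only
# its owner can read. The subshell keeps the umask change from leaking into
# the calling shell. The umask only affects newly created files; an
# existing file keeps whatever permissions it already has.
make_keyfile() {
    keyfile=$1
    (
        umask 077
        dd if=/dev/random of="$keyfile" bs=64 count=1 2>/dev/null
    )
}

# Possible use:
#   make_keyfile /root/encrypted.key
# An existing key could instead be tightened with:
#   chmod 600 /root/encrypted.key
```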

Next I attach the newly created geli provider:


[root@ ~]# geli attach -k /root/encrypted.key /dev/zvol/tank/gelizvol    
Enter passphrase:
[root@ ~]# ls -l /dev/zvol/tank/             
total 0
crw-r-----  1 root  operator    0, 124 May 24 13:20 gelizvol
crw-r-----  1 root  operator    0, 127 May 24 13:22 gelizvol.eli
[root@ ~]#

Now to create the zpool on top of the unlocked geli provider:

[root@ ~]# zpool create cryptopool /dev/zvol/tank/gelizvol.eli 
[root@ ~]# zpool list
NAME         SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
cryptopool  2.56T   108K  2.56T     0%  1.00x  ONLINE  -
tank        2.72T   363M  2.72T     0%  1.00x  ONLINE  -
[root@ ~]# zpool status cryptopool
  pool: cryptopool
 state: ONLINE
 scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        cryptopool                ONLINE       0     0     0
          zvol/tank/gelizvol.eli  ONLINE       0     0     0

errors: No known data errors
[root@ ~]#

The last remaining thing is to create a filesystem in the new zfs pool:

[root@ ~]# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
cryptopool       91K  2.52T    31K  /cryptopool
tank           2.66T  17.0G    21K  none
tank/gelizvol  2.66T  2.68T  1.16M  -
tank/root       359M  17.0G  84.8M  /
tank/root/tmp    28K  17.0G    28K  /tmp
tank/root/usr   274M  17.0G   274M  /usr
tank/root/var   419K  17.0G   419K  /var
[root@ ~]# zfs set mountpoint=none cryptopool
[root@ ~]# zfs create -o compression=gzip -o mountpoint=/usr/jails cryptopool/jails
[root@ ~]# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
cryptopool         149K  2.52T    31K  none
cryptopool/jails    31K  2.52T    31K  /usr/jails
tank              2.66T  17.0G    21K  none
tank/gelizvol     2.66T  2.68T  1.33M  -
tank/root          359M  17.0G  84.8M  /
tank/root/tmp       28K  17.0G    28K  /tmp
tank/root/usr      274M  17.0G   274M  /usr
[root@ ~]#

What remains is post-install configuration such as the firewall. The basic install is finished \o/

Upgrade OS (buildworld)

I usually run -STABLE on my hosts, which means I need to build and install a new world and kernel. I also like having rctl available on my jail hosts, so I can limit jail resources in all kinds of neat ways. Finally, I need the built world to populate ezjail's basejail.

Note: I will need to update the host and the jails many times during the lifespan of this server, which is likely more than 2-3 years. As new security problems are found, or features are added that I want, I will update host and jails. There is a section about staying up to date later in this page. This section (the one you are reading now) only covers the OS update I run right after installing the server.

Fetching sources

I make a copy of the example config file, and set the servername to cvsup.de.FreeBSD.org using sed:

$ sudo cp /usr/share/examples/cvsup/stable-supfile /etc/
$ sudo sed -i "" "s/=CHANGE_THIS/=cvsup.de/g" /etc/stable-supfile
$ sudo csup -L 2 /etc/stable-supfile
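
To see what the sed expression above does without touching a file, it can be fed a single line. The sample line is an assumption about what the example supfile contains, and the function name set_cvsup_host is made up for this sketch.

```sh
# Same substitution as above, applied to stdin instead of editing in place.
set_cvsup_host() {
    sed "s/=CHANGE_THIS/=cvsup.de/g"
}

# Assumed sample line from the example supfile:
echo '*default host=CHANGE_THIS.FreeBSD.org' | set_cvsup_host
# prints: *default host=cvsup.de.FreeBSD.org
```

Only the CHANGE_THIS placeholder is replaced, so the .FreeBSD.org suffix already present in the file completes the server name.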

Create kernel config

After the sources finish downloading, I create a new kernel config file /etc/TYKJAIL with the following content:

include GENERIC
ident TYKJAIL

#rctl
options RACCT
options RCTL

#dtrace
options KDTRACE_HOOKS        # all architectures - enable general DTrace hooks
options DDB_CTF              # all architectures - kernel ELF linker loads CTF data
options KDTRACE_FRAME        # amd64 - ensure frames are compiled in
makeoptions DEBUG="-g"       # amd64? - build kernel with gdb(1) debug symbols
makeoptions WITH_CTF=1

I then create a symlink in the kernel config directory that points to the file in /etc/:

$ ln -s /etc/TYKJAIL /usr/src/sys/amd64/conf/
$ ls -l /usr/src/sys/amd64/conf/TYKJAIL 
lrwxr-xr-x  1 root  wheel  9 Jul 22 16:14 /usr/src/sys/amd64/conf/TYKJAIL -> /etc/TYKJAIL

Building world and kernel

Finally I start the build. I use -j12 because I have 12 cores in this system. Check the number of cores in your system with sysctl:

$ sysctl hw.ncpu
hw.ncpu: 12
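
Instead of hardcoding the job count, it could be derived from the same sysctl. A small sketch; the function name ncpu is made up, and the getconf fallback is only there so the snippet also works on systems without hw.ncpu.

```sh
# Print the number of CPUs. `sysctl -n hw.ncpu` is the FreeBSD way; the
# getconf fallback is an assumption for non-FreeBSD systems.
ncpu() {
    n=$(sysctl -n hw.ncpu 2>/dev/null) || n=
    [ -n "$n" ] || n=$(getconf _NPROCESSORS_ONLN)
    echo "$n"
}

# Possible use:
#   make -j"$(ncpu)" buildworld
```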

To build the new system:

$ cd /usr/src/
$ sudo make -j12 buildworld && sudo make -j12 buildkernel KERNCONF=TYKJAIL && sudo make installkernel KERNCONF=TYKJAIL && date

After the build finishes, reboot and run mergemaster, installworld, and mergemaster again:

$ cd /usr/src/
$ sudo mergemaster -pFUi && sudo make installworld && sudo mergemaster -FUi

DO NOT OVERWRITE /etc/group AND /etc/master.passwd AND OTHER CRITICAL FILES!

Reboot after the final mergemaster completes, and boot into the newly built world.

Preparing ezjail

ezjail needs to be installed and a bit of configuration is also needed, in addition to bootstrapping /usr/jails/basejail and /usr/jails/newjail.

Installing ezjail

First I install /usr/ports/sysutils/ezjail using Portmaster:

$ sudo portmaster /usr/ports/sysutils/ezjail

Configuring ezjail

Then I edit the ezjail config file /usr/local/etc/ezjail.conf and add/change these two lines near the bottom:

ezjail_use_zfs="YES"
ezjail_jailzfs="cryptopool/jails"

This makes ezjail use separate ZFS datasets under cryptopool/jails for the basejail and newjail, as well as for each jail created with the -c zfs option.

Bootstrapping ezjail

Finally I populate basejail and newjail from the world I built earlier:

$ sudo ezjail-admin update -i

The last line of the output is a message saying:

Note: a non-standard /etc/make.conf was copied to the template jail in order to get the ports collection running inside jails.

This is because ezjail defaults to symlinking the ports collection in the same way it symlinks the basejail. I prefer having a separate ports collection in each of my jails though, so I remove the symlink and make.conf from newjail:

$ sudo rm /usr/jails/newjail/etc/make.conf /usr/jails/newjail/usr/ports

ZFS goodness

Note that ezjail has created two new ZFS datasets to hold basejail and newjail:

$ zfs list -r cryptopool/jails
NAME                        USED  AVAIL  REFER  MOUNTPOINT
cryptopool/jails            129M  2.52T    31K  /usr/jails
cryptopool/jails/basejail   128M  2.52T   128M  /usr/jails/basejail
cryptopool/jails/newjail    816K  2.52T   816K  /usr/jails/newjail

Adding -c zfs to the ezjail-admin create command will make ezjail create a separate ZFS dataset for each jail. More on that later.

Configuration

This section outlines what I do to further prepare the machine to be a nice ezjail host.

Firewall

One of the first things I do is enable the pf firewall from OpenBSD. I add the following to /etc/rc.conf to enable pf at boot time:

[root@ ~]# grep pf /etc/rc.conf 
pf_enable="YES"
pflog_enable="YES"
[root@ ~]#

I also create a very basic /etc/pf.conf:

[root@ ~]# cat /etc/pf.conf 
### macros
if="em0"
table <portknock> persist

#external addresses
tykv4="a.b.c.d"
tykv6="2002:ab:cd::/48"
table <allowssh> { $tykv4,$tykv6 }

#local addresses
glasv4="w.x.y.z"

### scrub
scrub in on $if all fragment reassemble


################
### filtering
### block everything
block log all

################
### skip loopback interface(s)
set skip on lo0


################
### icmp6                                                                                                        
pass in quick on $if inet6 proto icmp6 all icmp6-type {echoreq,echorep,neighbradv,neighbrsol,routeradv,routersol}


################
### pass outgoing
pass out quick on $if all


################
### pass incoming ssh and icmp
pass in quick on $if proto tcp from { <allowssh> } to ($if) port 22
pass in quick on $if inet proto icmp all icmp-type { 8, 11 }

################
### pass ipv6 fragments (hack to workaround pf not handling ipv6 fragments)
pass in on $if inet6
block in log on $if inet6 proto udp
block in log on $if inet6 proto tcp
block in log on $if inet6 proto icmp6
block in log on $if inet6 proto esp
block in log on $if inet6 proto ipv6

To load pf without rebooting I run the following:

[root@ ~]# kldload pf
[root@ ~]# kldload pflog
[root@ ~]# pfctl -f /etc/pf.conf && sleep 60 && pfctl -d
No ALTQ support in kernel
ALTQ related functions disabled

I get no prompt after this because pf has cut my SSH connection. If I did everything right I can SSH back in; if not, I just wait 60 seconds, after which pf is disabled again. Once I am back in, I reattach to the screen session I ran this in and press control-c, so the "sleep 60" is interrupted and pf is not disabled. Neat little trick for when you want to avoid locking yourself out :)
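
The same apply, wait, roll back pattern can be wrapped in a small function. This is a sketch under stated assumptions: the name try_then_rollback and its argument order are made up here, and the commands are passed as strings and run with sh -c.

```sh
# Hypothetical helper generalising the trick above: run APPLY, wait DELAY
# seconds, then run ROLLBACK. Interrupting the sleep with control-c keeps
# the applied state; doing nothing reverts it. If APPLY fails, neither the
# sleep nor ROLLBACK runs.
try_then_rollback() {
    delay=$1
    apply=$2
    rollback=$3
    sh -c "$apply" && sleep "$delay" && sh -c "$rollback"
}

# Possible use inside screen, with a parse-only check first (pfctl -n
# reads the ruleset without loading it):
#   pfctl -nf /etc/pf.conf &&
#       try_then_rollback 60 'pfctl -f /etc/pf.conf' 'pfctl -d'
```

Running it inside screen matters, since the rollback has to survive the SSH connection being cut.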

Replacing sendmail with Postfix

I always replace Sendmail with Postfix on every server I manage. See Replacing_Sendmail_With_Postfix for more info.

Listening daemons

When you add an IP alias for a jail, any daemons listening on * will also listen on the jail's IP, which is not what I want. For example, I want the jail's sshd, not the host's sshd, to be able to listen on port 22 on the jail's IP. Check for listening daemons like so:

$ sockstat -l46
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS      
root     master     1554  12 tcp4   *:25                  *:*
root     master     1554  13 tcp6   *:25                  *:*
root     sshd       948   3  tcp6   *:22                  *:*
root     sshd       948   4  tcp4   *:22                  *:*
root     syslogd    789   6  udp6   *:514                 *:*
root     syslogd    789   7  udp4   *:514                 *:*
[tykling@glas ~]$

This tells me that I need to change Postfix, sshd and syslogd to stop listening on all IP addresses.

Postfix

The defaults in Postfix are really nice on FreeBSD, and most of the time a completely empty config file is fine for a system mailer (sendmail replacement). However, to make Postfix stop listening on port 25 on all IP addresses, I do need one line in /usr/local/etc/postfix/main.cf:

$ cat /usr/local/etc/postfix/main.cf
inet_interfaces=localhost
$

sshd

To make sshd stop listening on all IP addresses I uncomment and edit the ListenAddress line in /etc/ssh/sshd_config:

$ grep ListenAddress /etc/ssh/sshd_config 
ListenAddress x.y.z.226
#ListenAddress ::

(IP address obfuscated..)

syslogd

I don't need my syslogd to listen on the network at all, so I add the following line to /etc/rc.conf:

$ grep syslog /etc/rc.conf 
syslogd_flags="-ss"

Restarting services

Finally I restart Postfix, sshd and syslogd:

$ sudo /etc/rc.d/syslogd restart
Stopping syslogd.
Waiting for PIDS: 789.
Starting syslogd.
$ sudo /etc/rc.d/sshd restart
Stopping sshd.
Waiting for PIDS: 948.
Starting sshd.
$ sudo /usr/local/etc/rc.d/postfix restart
postfix/postfix-script: stopping the Postfix mail system
postfix/postfix-script: starting the Postfix mail system

A check with sockstat reveals that no more services are listening on all IP addresses:

$ sockstat -l46
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS      
root     master     1823  12 tcp4   127.0.0.1:25          *:*
root     master     1823  13 tcp6   ::1:25                *:*
root     sshd       1617  3  tcp4   x.y.z.226:22         *:*
$
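
Spotting wildcard listeners by eye gets tedious once more services are running. A small filter over the sockstat output can do it instead. This is a sketch that assumes the column layout shown above (local address in the sixth column); the function name wildcard_listeners is made up.

```sh
# Hypothetical helper: read `sockstat -l46` output on stdin and print the
# command and local address of every socket bound to the wildcard address.
wildcard_listeners() {
    awk 'NR > 1 && $6 ~ /^\*:/ { print $2, $6 }' | sort -u
}

# Possible use:
#   sockstat -l46 | wildcard_listeners
# No output means nothing is listening on all addresses.
```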

Network configuration

Network configuration is a big part of any jail setup. If I have enough IP addresses (IPv4 and IPv6) I can just add IP aliases as needed. If I only have one or a few IPv4 addresses I will need to use RFC 1918 addresses for the jails. In that case, I create a new loopback interface, lo1, and add the IP aliases there. I then use the pf firewall to redirect incoming traffic to the right jail, depending on the port in use.

IPv4

If RFC 1918 jails are needed, I add the following to /etc/rc.conf to create the lo1 interface on boot:

### lo1 interface for ipv4 rfc1918 jails
cloned_interfaces="lo1"

Once the lo1 interface is created, or if it isn't needed, I am ready to start adding IP aliases for jails as needed.
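
For illustration, this is roughly how jail aliases on lo1 might look in /etc/rc.conf. The 10.0.0.x addresses are placeholders picked for this sketch, one per jail, and the alias numbers are assumed to run consecutively from 0.

```sh
# /etc/rc.conf (illustrative only; the addresses are placeholder
# RFC 1918 addresses, not taken from the setup described on this page)
cloned_interfaces="lo1"
ifconfig_lo1_alias0="inet 10.0.0.1 netmask 255.255.255.255"
ifconfig_lo1_alias1="inet 10.0.0.2 netmask 255.255.255.255"

# The same alias could be added at runtime with:
#   ifconfig lo1 alias 10.0.0.1 netmask 255.255.255.255
```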

IPv6

On the page Hetzner_ipv6 I've explained how to make IPv6 work on a Hetzner server where the supplied IPv6 default gateway is outside the IPv6 subnet assigned.

When basic IPv6 connectivity works, I am ready to start adding IP aliases for jails as needed.

Allow ping from inside jails

I add the following to /etc/sysctl.conf so the jails are allowed to send ICMP pings. This enables raw socket access, which can be a security issue if you have untrusted root users in your jails. Use with caution.

#allow ping in jails
security.jail.allow_raw_sockets=1

Tips & tricks

Get jail info out of top

To make top show a column with the jail id of the jail each process is running in, I need to pass the -j flag to top. Since this is a multi-CPU server, I also like giving top the -P flag to get a separate line of CPU stats per core. Finally, I like -a to get the full command line of the running processes. I add the following to my .bashrc in my homedir on the jail host:

alias top="nice top -j -P -a"

...this way I don't have to remember passing -j -P -a to top every time. Also, I've been told to run top with nice to limit the cpu used by top itself. I took the advice so the complete alias looks like above.

Staying up-to-date

I update my ezjail hosts and jails to track -STABLE regularly. This section describes the procedure I use.

Updating the jail host

First I update world and kernel of the jail host like I normally would. This is described earlier in this guide, see Ezjail_host#Building_world_and_kernel.

Updating ezjail's basejail

To update ezjail's basejail located in /usr/jails/basejail, I run the following commands:

ezjail-admin update -i
rm /usr/jails/newjail/etc/make.conf

Running mergemaster in the jails

Finally, to run mergemaster in all jails I use the following small script snippet. Run it in bash as root:

for jail in $(jls -n jid); do
    # jls -n prints name=value pairs; keep only the value
    JID=$(echo "$jail" | cut -d "=" -f 2)
    echo "Processing jail id $JID"
    JPATH=$(jls -j "$JID" -n path | cut -d "=" -f 2)
    mount_nullfs /usr/src "$JPATH/usr/src"
    mount_nullfs /usr/obj "$JPATH/usr/obj"
    jexec "$JID" mergemaster -FUi
    umount "$JPATH/usr/src"
    umount "$JPATH/usr/obj"
done

NOTE THAT THIS ONLY HANDLES RUNNING JAILS. YOU CANNOT RUN MERGEMASTER IN A STOPPED JAIL!

To restart all jails I run:

ezjail-admin restart
