EGOPOLY

Topics include: programming, Apple, Unix, gadgets, large-scale web sites and other nerdy stuff.

Creating a kickstart install server for Fedora Core 4

2006-03-13 16:44:56

We have a lot of servers at our engineering lab at work. It's not a lot by Google standards (Google is rumored to have at least 100,000 and maybe 1,000,000 servers worldwide), but we have something like 25 machines and no dedicated sysadmins. Several of our programmers and release engineers sort of share the burden of setting up and deploying machines.

Back in the early days of our company (in 2004), we had about 10 servers, and they were all the same hardware. Exactly the same. So we would build a master disk and clone it with "dd". That made creating machines very fast, but it broke down as we collected more diverse hardware. And building the master disk was a manual process.

"Kickstart" is a Red Hat feature that allows you to script the install procedure in the Red Hat system installer, anaconda. In theory, you should be able to put a new server on the network, do a "network boot" (PXE boot) from a master install server that has the Linux images you care about (Fedora Core, in our case), and then run a scripted install without an administrator babysitting the process. Then it is only a matter of assigning a real fixed IP address to that machine and putting it on the network.

It sounded really easy, so I figured I would read a man page, set up a bootp or tftp server, write the install script, and be done in no time. It took quite a bit longer than that, and was quite a bit more complicated. Here's what I learned.

Three Things: PXE, pxelinux, Kickstart

The first thing I learned was that there is no single "kickstart network boot installation server." There are three components:

  1. PXE, a networking standard that allows a computer on a LAN to get an IP address, find out where some bootable images live, and start a boot. The image could be Linux or any other OS.
  2. pxelinux, a Linux boot loader that works within the PXE standard.
  3. kickstart, the Red Hat/Fedora feature that enables a scripted install from a distribution on the network (via FTP or HTTP).

I tried really hard to see if it was possible to do the last part without the first two, just because I am a lot more comfortable setting up an HTTP server and all that other stuff. But if there is a way to do it (and remain completely independent of a boot CD-ROM or floppy), I could not figure it out.

PXE

PXE (Preboot Execution Environment) is a standard driven by Intel and other vendors. You can read all about it on Wikipedia. The short of it is that it is tied in to DHCP, so you basically need to configure a DHCP server on your LAN so that machines can connect, get an IP address, and be told where to get a bootable image of some sort. Since there generally already is a DHCP server on one's network, and it generally is some kind of firewall/router/whatever (at least on small networks like ours), one doesn't really want to go mucking with it. The upshot is that one really needs to create a separate LAN, on its own switch or hub, with its own DHCP server.

So I took our "master install" server, which has the FC4 distribution on it and two Ethernet ports, and connected its second network port to an empty switch. I stuck a label on the switch that said "INSTALL NETWORK 192.168.0.0/24" so no one would plug anything else into it. [Our regular LAN at the office is 10.0.0.0/22.] Then I did some reading and searching, and cobbled together this dhcpd.conf, which makes the DHCP server on this master machine listen only to requests on the install network, and not on the "real" network of the other card. It is very bad to have two DHCP servers on the same LAN giving out addresses. Very bad.

# DHCP Server Configuration file.
ddns-update-style interim;
ignore client-updates;
local-address 192.168.0.1;

subnet 192.168.0.0 netmask 255.255.255.0 {
    option routers                  192.168.0.1;
    option subnet-mask              255.255.255.0;
    option nis-domain               "example.com";
    option domain-name              "example.com";
    option domain-name-servers      192.168.0.1;
    option time-offset              -18000; # Eastern Standard Time
    option ntp-servers              192.168.0.1;
    option netbios-name-servers     192.168.0.1;
    option broadcast-address        192.168.0.255;
    allow booting;
    allow bootp;
    range dynamic-bootp 192.168.0.128 192.168.0.199;
    default-lease-time 3600;
    max-lease-time 4800;
}

group {
    next-server 192.168.0.1;
    filename "pxelinux.0";
    # Only install one machine at a time.
    # This is because the kickstart doesn't
    # really work with dynamic IP, as far as I can tell.
    host install1 {
        hardware ethernet 00:11:43:59:88:6C;
        fixed-address 192.168.0.201;
        option host-name "install1";
    }
}
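Typos in the subnet block, such as a broadcast-address that falls outside the subnet, are easy to miss. Here is a minimal Python sketch, using only the standard ipaddress module, that checks the values from the dhcpd.conf above against the 192.168.0.0/24 subnet. The check_subnet function and its rules are purely an illustration; dhcpd itself does not run these checks.

```python
import ipaddress

# Values from the dhcpd.conf subnet and host declarations above.
SUBNET = ipaddress.ip_network("192.168.0.0/24")
ROUTER = ipaddress.ip_address("192.168.0.1")
RANGE_START = ipaddress.ip_address("192.168.0.128")
RANGE_END = ipaddress.ip_address("192.168.0.199")
FIXED = ipaddress.ip_address("192.168.0.201")


def check_subnet(subnet, broadcast, router, range_start, range_end, fixed):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The broadcast address is fully determined by the network and netmask.
    if broadcast != subnet.broadcast_address:
        problems.append(
            f"broadcast-address should be {subnet.broadcast_address}, not {broadcast}"
        )
    # Every address handed out or advertised must live inside the subnet.
    for name, addr in (("router", router), ("range start", range_start),
                       ("range end", range_end), ("fixed-address", fixed)):
        if addr not in subnet:
            problems.append(f"{name} {addr} is outside {subnet}")
    if range_start > range_end:
        problems.append("dynamic range start is after its end")
    # A fixed address inside the dynamic pool could be leased to another client.
    if range_start <= fixed <= range_end:
        problems.append(f"fixed-address {fixed} overlaps the dynamic range")
    return problems


print(check_subnet(SUBNET, ipaddress.ip_address("192.168.0.255"),
                   ROUTER, RANGE_START, RANGE_END, FIXED))  # prints []
```

For instance, passing 192.168.1.255 as the broadcast address returns a single problem saying it should be 192.168.0.255.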

The key part of this dhcpd.conf is the "filename" directive in the group block. It tells the booting machine (the one with the given MAC address) to ask the TFTP server at 192.168.0.1 (the next-server) for the file called pxelinux.0. This is the boot loader that you can then use to do different things: you could boot a Knoppix-style live Linux at this point, or you could just run an installer, which is what I wanted to do.
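For the curious, the request the PXE firmware sends at this point is a plain TFTP read request (RFC 1350) to UDP port 69 on the next-server. Below is a minimal Python sketch that builds and parses that packet. It is for illustration only and is not needed to run the install server. PXE clients often append option fields such as tsize or blksize (RFC 2347/2348) after the mode, and parse_rrq simply ignores them.

```python
import struct

OPCODE_RRQ = 1   # RFC 1350 opcode for a read request
TFTP_PORT = 69   # well-known TFTP server port


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """RRQ layout: 2-byte big-endian opcode, filename, NUL byte, mode, NUL byte."""
    return (struct.pack("!H", OPCODE_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def parse_rrq(packet: bytes) -> tuple[str, str]:
    """Return (filename, mode); any option fields after the mode are ignored."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OPCODE_RRQ:
        raise ValueError(f"not a read request (opcode {opcode})")
    fields = packet[2:].split(b"\0")
    if len(fields) < 3:
        raise ValueError("truncated read request")
    return fields[0].decode("ascii"), fields[1].decode("ascii")


packet = build_rrq("pxelinux.0")
print(packet)
print(parse_rrq(packet))
# A client would send it with something like:
#   socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, ("192.168.0.1", TFTP_PORT))
```

"octet" is the binary transfer mode, which is what you want for a boot loader image.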

pxelinux

To get pxelinux going, you need to set up a TFTP server on your master install machine. The root of the tftp server should have an arrangement of files like this:

(screenshot lost to the ages)

This is basically the default pxelinux arrangement. pxelinux is part of the syslinux package, which you can install on your server with "yum install syslinux". You can then copy most of the above hierarchy from /usr/share/system-config-netboot/ to your TFTP root (by default, /tftpboot).

The interesting files in the above are:

  1. fc4: this is nothing more than a copy of the Linux kernel from the FC4 install CD, found at ./isolinux/vmlinuz.
  2. fc4-initrd: this is a copy of the file ./isolinux/initrd.img from the FC4 install CD.
  3. memtest86: this is another kernel from the FC4 install CD. It's just a diagnostic image that tests the RAM on your machine and doesn't harm anything. It's a good idea to make this the default boot image (more on that below), so that you don't accidentally do a complete FC4 install on some random machine that happens to be connected to your install network and netboots.
  4. pxelinux.cfg/default: this is where most of the magic happens. This file declares the boot/install actions you want.
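Staging these files can be scripted. Here is a minimal Python sketch that copies the installer kernel and initrd under the names used above, copies pxelinux.0, and creates the pxelinux.cfg directory. It assumes the FC4 CD (or ISO) is mounted somewhere readable and that you know where your syslinux package put pxelinux.0. The paths in the commented example call are placeholders. memtest86 and pxelinux.cfg/default are not handled here.

```python
import shutil
from pathlib import Path

# Source path on the install CD -> file name in the TFTP root, as described above.
CD_FILES = {
    "isolinux/vmlinuz": "fc4",
    "isolinux/initrd.img": "fc4-initrd",
}


def stage_tftp_root(cd_root: Path, pxelinux_bin: Path, tftp_root: Path) -> list[Path]:
    """Copy pxelinux.0, the FC4 installer kernel and initrd into tftp_root."""
    tftp_root.mkdir(parents=True, exist_ok=True)
    # pxelinux looks for its configuration files in this subdirectory.
    (tftp_root / "pxelinux.cfg").mkdir(exist_ok=True)
    copied = [Path(shutil.copy2(pxelinux_bin, tftp_root / "pxelinux.0"))]
    for src, dest in CD_FILES.items():
        copied.append(Path(shutil.copy2(cd_root / src, tftp_root / dest)))
    return copied


# Example call with placeholder paths; adjust for your machine:
#   stage_tftp_root(Path("/mnt/cdrom"), Path("/usr/lib/syslinux/pxelinux.0"), Path("/tftpboot"))
```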

Here is the pxelinux.cfg default file I came up with. It's working for me, but I think there are still some things wrong with it.

# show the boot prompt, so the user can select "ks" or "memtest86"
prompt 1

# make memtest be the default, so people don't erase their OS by accident
default memtest86
# timeout is in tenths of a second: 10000 is 1000 seconds (about 17 minutes)
# to wait at the prompt before booting the default
timeout 10000

# harmless kernel
label memtest86
    kernel memtest86

# kickstart installer kernel
label ks
    kernel fc4
    append text initrd=fc4-initrd ramdisk_size=8192 ip=192.168.0.50 netmask=255.255.255.0 gateway=192.168.0.1 dns=192.168.0.1 ks=http://192.168.0.1/linux/isolinux/ks.cfg ksdevice=eth0

The interesting part of this file is the "ks" section. It points to the FC4 installer kernel and the initial ramdisk image, assigns a specific IP address for the installer to use, and specifies a kickstart file. This is where I have run into quite a bit of trouble. Supposedly, you can say "ip=dhcp" instead of a fixed address. But whenever I do this, the installer can't get its DHCP address for some reason. It just hangs at the IP address screen and stays there until I put in a fixed IP. For now, I am living with this sub-optimal config, but it does work, and it is unattended except when you start and finish the install.

Update: It actually turned out to be the managed switch I was using for my install network. A spanning-tree timing issue caused the DHCP requests from the install target to time out (with spanning tree enabled and default timers, a switch port doesn't forward traffic for roughly 30 seconds after the link comes up).

I just put in a cheap, unmanaged switch instead, and DHCP works fine now!
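With DHCP working on the install network, the fixed ip=, netmask=, gateway= and dns= options in the ks label should, in principle, be replaceable by ip=dhcp. Here is a small, hypothetical Python helper (ks_label is not part of any tool) that renders either variant of that label. The other options are the same as in the file above, so regenerating pxelinux.cfg/default when switching is easy.

```python
def ks_label(kernel, initrd, ks_url, ip=None, netmask=None, gateway=None,
             dns=None, ksdevice="eth0"):
    """Render a pxelinux 'ks' label; ip=None asks the installer to use DHCP."""
    opts = ["text", f"initrd={initrd}", "ramdisk_size=8192"]
    if ip is None:
        opts.append("ip=dhcp")
    else:
        # A static setup needs the full set of network parameters.
        if None in (netmask, gateway, dns):
            raise ValueError("a static ip also needs netmask, gateway and dns")
        opts += [f"ip={ip}", f"netmask={netmask}", f"gateway={gateway}", f"dns={dns}"]
    opts += [f"ks={ks_url}", f"ksdevice={ksdevice}"]
    return f"label ks\n    kernel {kernel}\n    append {' '.join(opts)}\n"


print(ks_label("fc4", "fc4-initrd", "http://192.168.0.1/linux/isolinux/ks.cfg"))
```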

Kickstart

I had a very hard time finding good, up-to-date, comprehensive documentation on Kickstart. The best I could find was older Red Hat documentation, even though the feature is clearly in FC4. From that, and a nice article at Fedora News (dead link, sorry), I was able to cobble together this script.

# Install a fresh system rather than upgrade an existing system
install
# Perform the kickstart installation in text mode
text
# Install from an installation tree on a remote server via FTP or HTTP
url --url http://192.168.0.1/linux/
# Sets the language to use during installation
lang en_US
# Sets the language(s) to install on the system
langsupport en_US
# Sets system keyboard type
keyboard us
# auto mouse
mouse
# Sets up the authentication options for the system
authconfig --enableshadow --enablemd5
# Sets the system time zone
timezone America/New_York
# Configures network information for the system
#network --device eth0 --bootproto bootp # doesn't seem to work
#network --device eth0 --bootproto dhcp  # doesn't seem to work
network --device eth0 --bootproto static --ip=192.168.0.51 --netmask=255.255.255.0 --gateway=192.168.0.1 --nameserver=192.168.0.1
# Sets the system's root password to "whatever"
rootpw whatever
# no firewall
firewall --disabled
# No SE Linux!
selinux --disabled
# Specifies how the GRUB boot loader should be installed
bootloader --location mbr --append quiet
# Removes partitions from the system, prior to creation of new partitions
clearpart --all --initlabel
part swap --recommended
part /boot --fstype ext3 --size 100
part /     --fstype ext3 --size 1024 --grow
# Package Selection
%packages
@ base
# Post-installation Script
%post
rpm --import /usr/share/rhn/RPM-GPG-KEY-fedora
# magic config
wget http://192.168.0.1/kickstart/post_install.sh
sh post_install.sh

The main point of the above is to answer every question the installer would otherwise ask interactively, and to install only the bare-minimum package group ("base"). Then comes the magic escape hatch, the last two lines of the file: we fetch a shell script called post_install.sh from our master server and run it. It can do anything we want, but mainly it:

  1. installs the individual packages we want, from a local yum server so it is fast.
  2. fetches and installs any extras we want, like Java or our own scripts.
  3. creates the basic user accounts and groups we want, and points the machine at our LDAP server.
  4. does anything else you can write a script for.

That's all there is. I would like to figure out the DHCP thing, so I could boot/install 4 or 5 (or 50) machines in parallel. But for now, this is pretty good.
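For parallel installs, one hypothetical approach is to declare one host entry per target machine inside the group block of dhcpd.conf, since only hosts in that group get the next-server and filename settings. The Python helper below (host_blocks and the install1, install2, ... naming scheme are illustrations only) renders such entries with consecutive fixed addresses starting at 192.168.0.201, outside the dynamic range. This assumes the installer and ks.cfg also use DHCP. The fixed IPs in the current pxelinux and kickstart files would otherwise clash between machines.

```python
import ipaddress


def host_blocks(macs, first_ip="192.168.0.201", prefix="install"):
    """Render one dhcpd host declaration per MAC address, with consecutive fixed addresses."""
    start = ipaddress.ip_address(first_ip)
    blocks = []
    for i, mac in enumerate(macs):
        name = f"{prefix}{i + 1}"
        blocks.append(
            f"host {name} {{\n"
            f"    hardware ethernet {mac};\n"
            f"    fixed-address {start + i};\n"
            f'    option host-name "{name}";\n'
            "}"
        )
    return "\n".join(blocks)


# The second MAC address is made up for the example.
print(host_blocks(["00:11:43:59:88:6C", "00:11:43:59:88:6D"]))
```

Paste the output inside the group block, in place of the single install1 entry.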