8         Cluster Administration

               By  Gary Stiehr

8.1      Management Node

A management node can be used to access the consoles of all of the cluster nodes and to monitor their status.  This is particularly useful (even necessary) in the event that a node loses network connectivity or if a node stops responding altogether.

8.1.1      KVM Switches

Keyboard/Video/Mouse (KVM) switches are one method used to simplify access to the consoles of multiple cluster nodes.  With this method, keyboard, video, and mouse cables are run from the back of each node to a KVM switch.  A second set of keyboard, video, and mouse cables is then run from a designated port on the KVM switch to the management node.  A special keyboard sequence brings up an on-screen listing of the systems available on the KVM switch, from which the administrator can choose.

The advantages of using a KVM switch are that there are usually no modifications needed to the operating system or BIOS, there is less of a need to understand terminal emulation, and there are no complications with properly viewing long or scrolling messages (as can sometimes happen with serial consoles).

Disadvantages of using a KVM switch include increased cable management, although some KVM switches use an adapter to combine the keyboard/video/mouse cables into one cable.  Also, KVM switches may not support remote access.  This capability is usually found only in “enterprise” level KVM switches, which are more expensive.

8.1.2      Serial Port Concentrators

A serial port concentrator, also known as a serial port multiplexer, acts as a sort of switch for all of the serial connections in use (either for an Emergency Management Port (EMP) or for serial consoles).  The serial port concentrator allows you to log into it and access any one of the connected systems.  You will need a serial port concentrator if you are planning on using an EMP or serial console for your systems.  Usually, there is not much involved in setting up a serial port concentrator, although the interface and associated commands will vary from vendor to vendor.  Unfortunately, serial port concentrators are sometimes used for other purposes as well, such as terminal servers, and so they may include a lot of functionality that you do not need.  This can be confusing as you read through the documentation on these products or try to find one to purchase.

8.1.3      Serial Consoles

A serial console provides access to a cluster node’s console by using one of its serial ports.  With this method, a cable is run from a serial port on each node to a serial port concentrator.  A cable is then run from a port on the serial port concentrator to a serial port on your management node.  You also have the option to run a cable from the serial port concentrator to a network switch in order to reach the concentrator over the network.  Unlike using a KVM, you will need to make changes to the configuration of your system to use a serial console.

Advantages of this method are the ease of cable management (just one cable per system).  Also, many port concentrators allow remote access.

Disadvantages of using a serial console include the number of changes required to the operating system and the BIOS.

In order to properly interact with your system via a serial console from boot time to login, you will need to address four distinct stages of the startup:

1.      The system loads the BIOS.

2.      The BIOS transfers control to the boot loader.

3.      The boot loader loads the kernel.

4.      The kernel starts user-space processes that provide a login prompt for connections over the serial ports.

As you start to modify your system’s settings to enable serial console support, it will be very helpful if you keep your keyboard and monitor attached until after you have completed the entire process.  This way you will be able to see how each component (i.e., the BIOS, boot loader, kernel, and OS) is behaving.  For example, once you enable serial port redirection in your BIOS, your BIOS may allow you to interact with it over the serial port in addition to the attached keyboard/monitor.  However, your boot loader, once you have instructed it to use the serial port, may only allow you to interact with it over the serial port (you may not see anything on an attached monitor).  Knowing these behaviors can help you troubleshoot serial console problems and avoid confusion when you see some output on your serial console only and some on both your serial console and an attached monitor.

8.1.3.1     Serial Consoles and the BIOS

In order for a serial connection to effectively replace a directly attached keyboard/monitor, you will need to be able to interact with your cluster node’s BIOS via the serial port.  Unfortunately, many computer systems do not have a feature allowing you to redirect your BIOS interaction through a serial port.  Many “server” computer systems offered by various vendors do provide a system BIOS that supports serial consoles; however, this is not always the case.  You will need to check your BIOS’ support for serial consoles.  If your BIOS does not support console redirection, you will not be able to interact with your BIOS without using, for instance, an attached monitor (perhaps via a KVM).

If your BIOS supports it, enable console redirection in your BIOS.  The procedure for doing this will vary from BIOS to BIOS but should be just like changing any other BIOS setting: find the appropriate menu and then select the entries related to serial port redirection.  If you have a choice, pick the type of terminal emulation to use (e.g., vt102, ansi, etc.).  This must match the terminal emulation used by your terminal application (e.g., minicom).  The emulation you use does not matter much as long as the sending and receiving ends agree on it.  Each component (including the serial port concentrator) will also need to agree on the speed of the serial connection (e.g., 9600 or 115200 baud).  Higher speeds give a more responsive console, but the speed you can use is limited by the slowest maximum speed among the components involved.
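
The speed rule above amounts to taking the lowest of the maximum speeds supported by the components in the chain.  The following is a minimal sketch in shell, assuming you have looked up each component's maximum speed by hand and that every component also supports the standard speeds below its maximum.  The function name common_speed is ours and is not part of any tool.

```shell
# common_speed: print the highest speed usable by every component,
# i.e. the smallest of the per-component maximum speeds given as arguments.
# (Hypothetical helper for illustration only.)
common_speed() {
    min=""
    for speed in "$@"; do
        if [ -z "$min" ] || [ "$speed" -lt "$min" ]; then
            min=$speed
        fi
    done
    echo "$min"
}

# Example: BIOS up to 115200, concentrator up to 38400, terminal up to 115200
common_speed 115200 38400 115200   # prints 38400
```

Whatever value you arrive at is the one to enter in the BIOS, the serial port concentrator, and your terminal application.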

At this point, if you reboot the system and monitor the serial console, you should see the BIOS output and be able to enter your BIOS by hitting the appropriate keys.  Each vendor may supply you with special key combinations to substitute for keys that are not supported by your terminal emulation (e.g., F10, F11, F12 for vt100).  Depending on your BIOS, you may also see output on an attached monitor.  Take note of your BIOS’ behavior for future reference.  After the BIOS is done loading, however, no more output will be sent to your serial console.  When the BIOS transfers control to the boot loader, you will only be able to see/interact with the boot loader on an attached monitor.  The same goes for the kernel and OS.  We will configure these components to use the serial console in the following sections.

8.1.3.2     Serial Consoles and the Boot Loader and Linux Kernel

When the BIOS has finished its tasks, it transfers control of the cluster node to the boot loader.  The purpose of the boot loader is to load a kernel.  You generally see a message from the boot loader (e.g., “lilo:”) that you can use to select the kernel to use.  You need to be able to view these messages (and provide input) via the serial port.  The boot loader can be configured to do this.

Once you select a kernel (or the default kernel is loaded), the kernel will print out a lot of information as it probes and initializes your system.  The Linux kernel allows you to redirect these messages to a serial port.  This is done by passing command line arguments to the kernel.  You can do this by instructing the boot loader to do so.

As a result, to allow interaction with the boot loader and kernel over a serial port, we need to make changes to the boot loader’s configuration file.  We present the changes needed for two popular boot loaders: LILO and GRUB.

8.1.3.2.1   LILO

To redirect LILO’s output to a serial port, you will make changes in the global section of /etc/lilo.conf.  We will also add (or modify) an append statement for the kernel in the global section of /etc/lilo.conf.  If for some reason you do not want all kernels to redirect their output, add an append statement for the appropriate image section(s) instead of in the global section of /etc/lilo.conf.

To interact with LILO via the serial console, you will need to make the following changes to the global section of /etc/lilo.conf:

# Comment out message=/boot/message if /boot/message is not a text file
# message=/boot/message

# Add the line that tells LILO to use the serial port for its output and input.
#     The first parameter is the serial port used,
#     e.g. for ttyS0, enter 0; for ttyS1, enter 1
#     The second parameter is the speed to use (see the lilo.conf(5)
#     man page for other possible parameters).  Use the highest
#     speed supported by your components for best results.
serial=0,9600

To instruct the kernel to send its output to the serial console, add the following line to the global section of /etc/lilo.conf:

# The kernel sends its output to all consoles defined below.
#  However, the last one defined is the one the kernel sets
#  as the default console for stdout (where user programs send
#  their output by default).  So you will want to list the serial
#  console as the last console.  Substitute the correct serial console
#  below (if you use ttyS1, replace all ttyS0 below).
append="console=tty0 console=ttyS0,9600"

Your changes will not be effective until you run the following command to install the new boot record:

/sbin/lilo
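
The serial port and speed appear in both of the settings above and should agree with each other.  As a small sketch, the hypothetical helper below (lilo_serial_lines is our own name, not a LILO command) prints both lines from a single port number and speed so that they cannot drift apart.

```shell
# lilo_serial_lines UNIT SPEED: print the two lilo.conf settings from this
# section for serial port ttyS<UNIT> at SPEED baud, so that both lines name
# the same port and speed. (Hypothetical helper for illustration only.)
lilo_serial_lines() {
    printf 'serial=%s,%s\n' "$1" "$2"
    printf 'append="console=tty0 console=ttyS%s,%s"\n' "$1" "$2"
}

# Example: second serial port (ttyS1) at 115200 baud
lilo_serial_lines 1 115200
```

The output is meant to be reviewed and pasted into the global section of /etc/lilo.conf by hand; the sketch does not edit the file or run /sbin/lilo for you.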

At this point, if you reboot your system and monitor the serial console, you will see the BIOS output (as configured in the last section) followed by the lilo: prompt.  You should be able to press the shift key and select a kernel.  If you can do this, the boot loader is successfully using the serial console.  Once a kernel is selected, you will see the message indicating that the kernel is being loaded.  After this one line is finished, the kernel takes over and all of the kernel’s probe/initialization messages should be displayed on the serial console.  If you see the line stating that the kernel is loading, but then nothing else happens, you have configured the boot loader correctly but there is a problem with the configuration of the kernel’s serial port redirection.  As mentioned earlier, it would be good to have a monitor/keyboard attached during this process.  In this case, you can at least see that the system is still booting (if the messages are appearing on the monitor) even if you do not see anything on the serial console.  In addition, you will need the keyboard/monitor to log onto the system (unless you log in over the network) until you configure your system for login over the serial console (discussed in a following section). 

If you have successfully redirected the output from your kernel, you will see all of its messages over the serial console.  After the kernel finishes all of its tasks, it starts the first user-space process, init.  The init process is responsible for, among other things, starting the processes that provide you with a login prompt.  However, we have not yet taken the steps necessary to provide a login prompt to users connecting via a serial port.  This will be covered in a later section.  Even though you will not be able to log in over the serial console yet, you will be able to see, over the serial console, all of the output from init and all of the other startup processes since the kernel has set the default console to the serial port we specified in the boot loader’s configuration file.
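
If the kernel messages do not appear on the serial console, it can help to confirm which console arguments the kernel actually received.  On a running Linux system these are recorded in /proc/cmdline.  The sketch below (last_console is a hypothetical helper name, and the sample command line is a placeholder) prints the last console= argument, which is the one the kernel uses as its default console.

```shell
# last_console: print the device named by the last console= argument
# on a kernel command line read from standard input.
# (Hypothetical helper for illustration only.)
last_console() {
    tr ' ' '\n' | sed -n 's/^console=//p' | tail -n 1
}

# On a running system (logged in over the network or at the keyboard):
#   last_console < /proc/cmdline
# Here we feed it a sample command line instead.
echo "ro root=/dev/hda1 console=tty0 console=ttyS0,9600" | last_console
# prints ttyS0,9600
```

If this prints nothing, or prints tty0, the append statement was not applied or lists the consoles in the wrong order.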

8.1.3.2.2   GRUB

To get GRUB to use a serial console, add the following lines to the GRUB configuration file:

# The serial line gives GRUB information about the serial port
#     unit=0 corresponds to ttyS0 and unit=1 to ttyS1
serial --unit=0 --speed=9600

# The terminal line tells GRUB which consoles are available
#     serial is the default (since it is listed first)
#     and console indicates the keyboard/monitor
#     The timeout is how long to wait for input from one of the
#     listed consoles before using the default
terminal --timeout=10 serial console

# Comment out the splashimage line (the graphical interface to GRUB)
# splashimage=<path to image>

To configure your kernel to use a serial console, modify your kernel lines in the GRUB configuration file as follows:

# The kernel sends its output to all consoles defined below.
#  However, the last one defined is the one the kernel sets
#  as the default console for stdout (where user programs send
#  their output by default).  So you will want to list the serial
#  console as the last console.  Substitute the correct serial console
#  below (if you use ttyS1, replace all ttyS0 below).

kernel <existing kernel parameters> console=tty0 console=ttyS0,9600

At this point, if you reboot your system and monitor the serial console, you will see the output from your BIOS (as configured in the last section) and you will receive the prompt from GRUB to select a kernel.  If you are able to do this, you have configured the boot loader correctly.  After GRUB has loaded the kernel, control of the system is handed over to the kernel.  Thus you should start to see all of the probe/initialization messages from the kernel over the serial console.  If you can use the GRUB menu to select a kernel over the serial console, but you do not see the kernel probe/initialization messages over the serial console, you have configured the boot loader correctly but there is a problem with the configuration of the kernel’s serial port redirection.  As mentioned earlier, it would be good to have a monitor/keyboard attached during this process.  In this case, you can at least see that the system is still booting (if the messages are appearing on the monitor) even if you do not see anything on the serial console.  In addition, you will need the keyboard/monitor to log onto the system (unless you log in over the network) until you configure your system for login over the serial console (discussed in a following section).

If you have successfully redirected the output from your kernel, you will see all of its messages over the serial console.  After the kernel finishes all of its tasks, it starts the first user-space process, init.  The init process is responsible for, among other things, starting the processes that provide you with a login prompt.  However, we have not yet taken the steps necessary to provide a login prompt to users connecting via a serial port.  This will be covered in a later section.  Even though you will not be able to log in over the serial console yet, you will be able to see, over the serial console, all of the output from init and all of the other startup processes since the kernel has set the default console to the serial port we specified in the boot loader’s configuration file.
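
One common cause of seeing the GRUB menu but no kernel messages is that the serial line and the kernel lines name different ports.  The following sketch checks for that mismatch.  The function name grub_serial_consistent is hypothetical, the sample kernel line is a placeholder, and the location of your GRUB configuration file is something you will need to supply yourself.

```shell
# grub_serial_consistent FILE: succeed if the unit on the "serial" line
# matches the ttyS number in every serial console= kernel argument in FILE.
# (Hypothetical helper for illustration only.)
grub_serial_consistent() {
    # Ignore comment lines so that examples in comments are not counted
    conf=$(grep -v '^[[:space:]]*#' "$1")
    unit=$(printf '%s\n' "$conf" |
        sed -n 's/^serial .*--unit=\([0-9][0-9]*\).*/\1/p' | head -n 1)
    [ -n "$unit" ] || return 1
    # Collect the distinct ttyS numbers used in console= arguments
    ports=$(printf '%s\n' "$conf" | grep -o 'console=ttyS[0-9][0-9]*' |
        sed 's/console=ttyS//' | sort -u)
    [ "$ports" = "$unit" ]
}

# Demonstration against a temporary copy of the settings from this section
sample=$(mktemp)
cat > "$sample" <<'EOF'
serial --unit=0 --speed=9600
terminal --timeout=10 serial console
kernel /vmlinuz ro console=tty0 console=ttyS0,9600
EOF
if grub_serial_consistent "$sample"; then echo "consistent"; else echo "mismatch"; fi
rm -f "$sample"
```

The check fails if there is no serial line, if no kernel line redirects to a serial port, or if the kernel lines use a port other than the one GRUB uses.  It does not compare speeds.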

8.1.3.3     Serial Consoles and Logging In

When the system starts to boot the OS, the init process is started.  The init process will read through the file /etc/inittab to do a number of essential startup tasks.  One of these tasks is to start “getty” processes [1] to wait for users to log on.  A getty process displays the “login:” prompt where you enter your username.  The getty process, in turn, starts up a login process that prompts you for your password.

In order to log in over a serial console, a getty process needs to be waiting for someone to connect via the serial port.  In Linux, the device names for the serial ports are /dev/ttyS0, /dev/ttyS1, etc.  We can instruct the init process to start a getty process for this purpose by adding a line to the /etc/inittab file as follows:

# Run gettys in standard runlevels

S0:12345:respawn:/sbin/agetty -h ttyS0 9600 vt100
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2

 

The lines starting with “1:” and “2:” above are examples of the getty processes that are used by an attached keyboard/monitor.  We added another line starting with “S0:” to start a getty process to monitor the serial port ttyS0.  “S0” is used as the label since we used ttyS0 in the example.  You might want to use “S1” if you use ttyS1.  The field after the label (“12345”) indicates that this getty process should be started in runlevels 2, 3, 4 and 5 as well as runlevel 1 (single-user mode).  “respawn” indicates that when the process is killed it should be restarted by the init process.  The final field specifies that we should use the agetty program to monitor for connections.  The agetty program is a getty process that is appropriate for monitoring serial lines.  As indicated above, you will at least need to specify the serial port to use, the speed at which to communicate, and the terminal emulation to use.  In our example, these parameters are ttyS0, 9600 baud, and vt100, respectively.  Consult the agetty man page for other parameters that might be appropriate for you.  Once you have made this change, tell the init process to re-read /etc/inittab (type telinit q to do this).
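
To double-check the entry you just added, you can pull the runlevels field out of /etc/inittab for the serial port in question.  The sketch below uses awk to split each line on colons.  The function name getty_runlevels is hypothetical.

```shell
# getty_runlevels PORT: read inittab-format text on standard input and print
# the runlevels field of each non-comment entry whose command mentions PORT.
# (Hypothetical helper for illustration only.)
getty_runlevels() {
    awk -F: -v port="$1" '
        /^[[:space:]]*#/ { next }                      # skip comment lines
        NF >= 4 && index($4, port) > 0 { print $2 }    # field 2 = runlevels
    '
}

# Usage on a real system:  getty_runlevels ttyS0 < /etc/inittab
# Here we feed it the example lines from this section instead.
printf '%s\n' \
    '# Run gettys in standard runlevels' \
    'S0:12345:respawn:/sbin/agetty -h ttyS0 9600 vt100' \
    '1:2345:respawn:/sbin/mingetty tty1' | getty_runlevels ttyS0
# prints 12345
```

No output means that no getty is configured for that port, so no login prompt will appear on the serial console.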

The file /etc/securetty lists all of the consoles from which root can log in.  Thus, now that you have created a getty process to monitor the serial line, you will want to add the name of this serial line to the list in /etc/securetty.  For example, if you started a getty process on ttyS0, you will add ttyS0 on its own line in /etc/securetty.
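
If you are making this change on many nodes, it is convenient to do it in a way that can safely be repeated.  The following sketch appends the serial line only when it is not already listed.  The function name allow_root_tty is hypothetical, and the demonstration runs against a temporary file rather than the real /etc/securetty.

```shell
# allow_root_tty FILE TTY: append TTY to FILE unless an identical line
# is already present, so that running it twice does not add duplicates.
# (Hypothetical helper for illustration only.)
allow_root_tty() {
    grep -qxF "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}

# Usage on a real system (as root):  allow_root_tty /etc/securetty ttyS0
# Demonstration against a temporary file:
demo=$(mktemp)
printf 'tty1\ntty2\n' > "$demo"
allow_root_tty "$demo" ttyS0
allow_root_tty "$demo" ttyS0   # second call changes nothing
cat "$demo"
rm -f "$demo"
```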

At this point, if you reboot the system and monitor the serial console, you will see the messages from your BIOS; you will be able to interact with your boot loader; you will see all of the kernel’s probing/initialization messages; you will see all of the output from your startup scripts.  Finally, the getty process you just created will display the “login:” prompt to you and the normal login process will continue.  Both normal users and root should be able to log in via the serial console.  You should now be able to completely control your system over a serial connection from BIOS configuration to login.

8.1.4      Remote Management Port

A remote management port is a special connection on a system that allows certain services, such as power cycling, to take place remotely.  Many vendors offer proprietary cards to insert into your system to obtain a remote management port.  Some motherboards also provide an Emergency Management Port (EMP) that is usually accessed through one of the serial ports.  The method you use to access these cards will vary by vendor.  With systems that have an EMP, a cable can be run from the appropriate serial port to a serial port concentrator.  Whichever form of remote management port is used, it is an extremely useful feature for ensuring that a system can be managed remotely.  A serial console is helpful only as long as the system has not locked up.  Once the system has locked up, remote access to the serial console will not help; you will need a way to remotely cycle the power on the system or to run other diagnostics (which may be vendor specific).

The advantages of using a remote management port are clear.  You will have the ability to remotely cycle the power and possibly perform other diagnostics on a system regardless of the state of the OS.

If you choose to or need to use a vendor-supplied card, the disadvantage will be the cost of these cards.

8.1.5      SNMP

Simple Network Management Protocol (SNMP) is a very useful tool that is used by many vendors to report various statistics and information about their hardware (and software).  At the management stations, you may find it helpful to use SNMP to collect various vital statistics about your cluster systems.  This will give you a visual indicator of any problems that may be occurring on a particular system in the cluster.  The information you can collect will depend on the information provided by your vendor.  Usually only server products will provide useful information whereas systems designed for home use may not provide any information.  There are other products, such as BigBrother, that can monitor your systems using simple methods, such as pinging your systems, and indicating any systems which may be having problems.
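
The simplest form of the ping-based monitoring mentioned above can be sketched in a few lines of shell.  This is only an illustration of the idea, not how any particular monitoring product works.  The function name down_nodes, the PROBE variable, and the node names are all hypothetical, and the default probe assumes the ping command shipped with most Linux distributions.

```shell
# PROBE is the command used to test one node; it must exit 0 when the node
# answers. The default sends a single ping and waits up to 2 seconds.
# Override PROBE to use a different check.
PROBE=${PROBE:-"ping -c 1 -W 2"}

# down_nodes NODE...: print the name of every node that fails the probe.
# (Hypothetical helper for illustration only.)
down_nodes() {
    for node in "$@"; do
        # $PROBE is intentionally unquoted so that its options are split
        $PROBE "$node" >/dev/null 2>&1 || echo "$node"
    done
}

# Usage (node names are placeholders):  down_nodes node01 node02 node03
```

Run from the management node at regular intervals, a check like this tells you only that a node has stopped answering.  SNMP is what lets you see why, provided your hardware reports the relevant information.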

 

8.2 User Accounts

Methods for creating user accounts and propagating them
across the cluster will be discussed. This includes an
introduction to the authentication process (PAM), password
files, and NIS(+). Other methods such as LDAP will only be given
an overview with no implementation details.

8.2.1 Authentication Process
8.2.2 NIS and NIS+
8.2.3 Overview of LDAP

8.3 Software and Configuration File Consistency

Maintaining copies of software and configuration files
across nodes can consume a considerable amount of time and
effort.

8.3.1 Cfengine

8.4 Node Cloning

Different strategies for cloning the nodes will be
introduced. This includes the Kickstart utility from Red Hat,
dump utilities, tar, SystemImager, and Ghost. The creation of a
boot disk that has the correct utilities for network building or
restores of compute nodes will be given. Other methodologies
such as bootp and the Intel network card boot utility will be
presented.

8.4.1 Boot Disk Creation
8.4.2 Kickstart
8.4.3 Filesystem dump utility
8.4.4 GNU tar
8.4.5 SystemImager
8.4.6 Ghost

8.5 High Availability

Failover of cluster resources will be presented in this
chapter.

8.5.1 Data and System Backup
8.5.2 Availability of headnode
8.5.3 Compute Node Failure

8.6 Security

For a system that will be connected to the Internet, it is
very important that certain steps be followed to ensure the
security of the cluster. SSH, tcp_wrappers, inetd, and auditing
of system daemons will be introduced.



[1] There are a number of actual programs that are referred to as “getty” processes, such as getty, agetty, and mingetty.  We will refer to the use of any one of these as the use of a getty process.