Adding wdt support to Linux-Pebble on Soekris systems.
------------------------------------------------------
Last updated 4/28/2003
Feedback to nospam-pebble@voipexperts.com

Why bother?
-----------
It's nice to know that if the software does kick the bucket, as unlikely
as that might be, I have another layer of defense for getting the system
back up. And these Soekris boxes tend to end up in remote, unpleasant
places that no one wants to visit at 3am.

Thoughts from someone else on this:

From http://www.linuxfocus.org/English/July2002/article239.shtml

"What is a watchdog?

A watchdog in computer terms is a very reliable hardware which ensures
that the computer is always running. You find such devices in the Mars
Pathfinder (who wants to send a person to the mars to press the reset
button?) or in some extra expensive servers.

The idea behind such a watchdog is very simple: The computer has to "say
hello" from time to time to the watchdog hardware to let it know that it
is still alive. If it fails to do that then it will get a hardware
reset.

Note that a normal Linux server should be able to run uninterrupted for
several month, in average probably 1-2 years without locking up. If you
have machine that locks up every week then there is something else wrong
and a watchdog is not the solution. You should check for defect RAM (see
memtest86.com) overheated CPUs, too long IDE cables ...

If Linux is so reliable that it will run for a year without any problems
then why do you need a watchdog? Well the answer is simple to: make it
even more reliable. There is as well a human problem related to that. A
server that made no trouble for a year is basically unknown to the
service personal. If it fails then nobody knows where it is? It might as
well lock up just before Christmas when everybody is at home. In all
such cases a watchdog can be very useful.

A watchdog does however not solve all of the problems. It is no
protection against defect hardware.
If you include a watchdog in your server then you should also ensure
that you have well dimensioned (probably not the latest BIOS bugs and
chipset bugs, properly cooled hardware)."

Steps to add support for ELAN SC520 Watchdog timer.
---------------------------------------------------

*** Note: if you google for "Elan SC520 watchdog Linux" you will end up
at "http://open.iliad.fr/~clecourt/wdtsc520/", which is another kernel
driver for the watchdog timer. I took a simpler approach, which is to
use the SC520 wdt driver included in the 2.4.20 kernel. I am not sure
whether this driver is problematic or not, but it worked in my limited
testing.

*** Remember to remount the filesystem read-write before you start
(run /usr/local/sbin/remountrw).

1. Compile the wdt driver found in the standard Linux 2.4.20 kernel.

I have the Linux 2.4.20 kernel source on my server system that I use as
my general workbench. I found the driver in the kernel configuration
under "Character Devices | Watchdog Cards | AMD "Elan" SC520 Watchdog".
After compiling the modules I ended up with the file

    linux-2.4.20/drivers/char/sc520_wdt.o

Copy this module to your Soekris system, into the directory

    /lib/modules/2.4.20-pebble/kernel/drivers/char/

If you want my compiled module, grab it from
http://voipexperts.com/sc520_wdt.o

2. Configure Pebble to load the module.

Edit the /etc/modules file, adding sc520_wdt to the end of the file.
This will cause the module to be loaded at boot time. Then run

    update-modules

This will update the module dependencies so the module can be found at
boot time.

Note! The watchdog is not running yet; only the driver has been
installed.

3. Install the watchdog daemon.

A userspace daemon is required as the last building block. The kernel
module handles communication directly with the hardware wdt. The driver
will remind the hardware wdt on a regular basis that the system is
healthy. It will NOT activate the hardware wdt until the userspace
daemon starts. Communication between the daemon and the driver is
handled via the /dev/watchdog device.
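
As a rough illustration of what "communication via /dev/watchdog" means,
here is a minimal shell sketch of a heartbeat loop. This is NOT how the
watchdog daemon is implemented; it only assumes that the driver follows
the usual Linux convention that opening the device arms the timer and
any write to it counts as a heartbeat. The function name feed_watchdog
and its three arguments are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: feed_watchdog DEVICE INTERVAL_SECONDS BEATS
# Opens DEVICE once, writes one byte per heartbeat, then closes it.
# Assumption: any write to the watchdog device is treated as a keepalive.
feed_watchdog() {
    dev=$1
    interval=$2
    beats=$3
    i=0
    # The redirection on the loop keeps the device open for the whole
    # run, the way a daemon would hold it open.
    while [ "$i" -lt "$beats" ]; do
        printf '.'          # one byte = one heartbeat
        i=$((i + 1))
        sleep "$interval"
    done > "$dev"
}

# Example (do NOT run against the real device unless you want a reboot
# when the loop ends):
#   feed_watchdog /dev/watchdog 10 6
```

Be careful trying this on a real Soekris box: once the device has been
opened and the loop stops, expect the same reset you get from killing
the watchdog daemon. Pointing it at an ordinary file is a harmless way
to see what it does.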
Once the daemon starts, the driver module will expect to hear from the
daemon on a regular basis, or it will stop informing the hardware wdt
that all is well. Once the wdt notices the missing heartbeats from the
kernel, the wdt will reset the system.

The watchdog daemon we are going to use is simply called watchdog (and
so is the Debian package).

    apt-get install watchdog

Edit the daemon configuration file /etc/watchdog.conf and uncomment the
line

    watchdog-device = /dev/watchdog

That's the last step.

*** Remember to remount the filesystem read-only when you are done
updating the system (run /usr/local/sbin/remountro).

Now when the system restarts the following messages will appear:

    sc520_wdt: WDT driver for SC520 initialised.
    sc520_wdt: Watchdog timer is now enabled.

The first message indicates the driver module has loaded. The second
message means the userspace daemon has told the driver module to
activate the hardware wdt. Now if your system hangs or the watchdog
daemon dies, the hardware wdt will reset the system.

If both messages do not appear, you're not done.

Read the man page for the watchdog daemon; it can do all sorts of other
nifty things.

Test it by killing the watchdog daemon. Try this from the console:

    pebble:~# ps -ef|grep watchdog
    root       126     1  0 07:59 ?              00:00:00 /usr/sbin/watchdog
    root       239   228  0 08:07 /rw/dev/ttyp0  00:00:00 grep watchdog
    pebble:~# kill 126
    pebble:~# sc520_wdt: device file closed unexpectedly. Will not stop the WDT!

    POST: 012345d6789bcefgh
    Pri Mas  SanDisk SDCFB-256  LBA 980-16-32  251 Mbyte
    PXE-M00: BootManage UNDI, PXE-2.0 (build 082)

    Slot   Vend Dev  ClassRev Cmd  Stat CL LT HT  Base1    Base2   Int
    -------------------------------------------------------------------
    0:00:0 1022 3000 06000000 0006 2280 00 00 00 00000000 00000000 00
    0:17:0 104C AC51 06070000 0107 0210 10 3F 82 A0000000 020000A0 10
    0:17:1 104C AC51 0607iajklnopq,,,tvwxy

Good luck,
- Dustin -
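
P.S. If the two sc520_wdt messages do not show up, the three things set
up in the steps above can be checked with a few small shell helpers. This
is only a sketch: the function names are made up for this example, and it
assumes the kernel messages can be read with dmesg and that the files
look the way the steps describe.

```shell
#!/bin/sh
# Hypothetical helpers for double-checking the setup described above.

# True if FILE has an uncommented line naming MODULE (step 2).
modules_file_lists() {
    grep -Eq "^[[:space:]]*$2[[:space:]]*(#.*)?\$" "$1"
}

# True if FILE has the watchdog-device line uncommented (step 3).
conf_has_watchdog_device() {
    grep -Eq '^[[:space:]]*watchdog-device[[:space:]]*=[[:space:]]*/dev/watchdog[[:space:]]*$' "$1"
}

# True if the log text on stdin contains both driver messages.
wdt_messages_present() {
    log=$(cat)
    printf '%s\n' "$log" | grep -Fq 'sc520_wdt: WDT driver for SC520 initialised' &&
    printf '%s\n' "$log" | grep -Fq 'sc520_wdt: Watchdog timer is now enabled'
}

# Example use on the Soekris box:
#   modules_file_lists /etc/modules sc520_wdt  || echo "module not listed"
#   conf_has_watchdog_device /etc/watchdog.conf || echo "device line still commented"
#   dmesg | wdt_messages_present                || echo "wdt not enabled"
```

If only the first message is present, the driver loaded but the daemon
never opened /dev/watchdog, so look at /etc/watchdog.conf first.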