[PLUG] server crash

Anthony Schlemmer aschlemm at attbi.com
Mon May 6 19:41:32 UTC 2002


Yeah, the auto shutdown thing needs to be tested very well before turning 
it loose on a critical system. Even with the little programs I wrote 
myself, I saw cases where once in a while /proc/sys/dev/sensors would 
report a high temperature on one reading and then be back to normal the 
next time the temperature was checked. 

I had to change my monitor to detect 3 high temperatures before it would 
go off. I'm always a bit suspicious since I have an Asus A7V133 mobo 
with the AS99127F chip. Asus won't release any specs on the thing, so 
these values are a "best effort". I've compared what I get from the 
"sensors" program with what the CMOS setup shows me and the values are 
very similar: the CPU temperature in the CMOS setup is within about 2 
degrees Celsius of what "sensors" reports. That's pretty close IMHO.
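
The "3 high temperatures before it goes off" idea can be sketched roughly 
like this. It's only an illustration in Python, not the shell script or C 
program mentioned in this thread. The function name first_trip, the 70 
degree limit, the sample values, and the assumption that the 3 high 
readings have to be consecutive are all made up for the example.

```python
from typing import Iterable, Optional


def first_trip(readings: Iterable[float], limit: float,
               needed: int = 3) -> Optional[int]:
    """Return the index of the reading that trips the alarm, or None.

    Assumption: the alarm trips only after `needed` consecutive readings
    above `limit`; any reading at or below the limit resets the count.
    """
    streak = 0
    for i, temp in enumerate(readings):
        if temp > limit:
            streak += 1
            if streak >= needed:
                return i
        else:
            # A normal reading means the earlier spike was probably bogus.
            streak = 0
    return None


if __name__ == "__main__":
    # Hypothetical samples: one spurious spike, then a sustained overheat.
    samples = [41.0, 42.0, 78.0, 42.5, 71.0, 73.5, 76.0, 79.0]
    print(first_trip(samples, limit=70.0))  # prints 6
```

The single spike at index 2 is ignored because the next reading is normal 
again; only the run starting at index 4 trips the alarm. A real monitor 
would feed this from whatever it polls and decide for itself what to do 
when it trips.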

Tony

On Sunday 05 May 2002 19:22 pm, Tyler F. Creelan wrote:
> > Do any of these sensor daemons allow you to set them up so in the
> > case of a high temperature a log entry is made to the syslog and
> > then the system is shut down to protect against any damage? This
> > would be really
>
> The "sensord" package does some of that:
>
>  "This package contains a daemon that logs hardware health status to
> the system log with optional warnings on potential system problems."
>
> Maybe it has an option which will also shut the system down, or this
> could be configured manually. Sounds like a cool idea; but if there
> was a bug, the system might keep shutting itself down, and this might
> prove difficult to fix. :)
>
>
> Tyler
>
> -------
> Package: sensord
> Priority: extra
> Section: utils
> Installed-Size: 144
> Maintainer: David Z Maze <dmaze at debian.org>
> Architecture: i386
> Source: lm-sensors
> Version: 2.6.3-5
> Replaces: lm-sensors (<< 2.5.4-8)
> Depends: lm-sensors, libc6 (>= 2.2.4-4), libsensors1
> Filename: pool/main/l/lm-sensors/sensord_2.6.3-5_i386.deb
> Size: 30326
> MD5sum: 2cf59d11cdd908741a81947f669b7a9d
> Description: Hardware sensor information logging daemon
>  Lm-sensors is a hardware health monitoring package for Linux. It
> allows you to access information from temperature, voltage, and fan
> speed sensors. It works with most newer systems.
>  .
>  This package contains a daemon that logs hardware health status to
> the system log with optional warnings on potential system problems.
>  .
>  You will need lm-sensors and i2c kernel modules to use this package.
>
> -----
>
> On Sun, 5 May 2002, Anthony Schlemmer wrote:
> > Do any of these sensor daemons allow you to set them up so in the
> > case of a high temperature a log entry is made to the syslog and
> > then the system is shut down to protect against any damage? This
> > would be really useful in the event of a fan failure and a
> > subsequent overheat condition.
> >
> > Since all of the sensor stuff can be gotten as text data from the
> > "/proc/sys/dev/sensors" directory, I've toyed with both a shell
> > script and C program which monitors the temperature values by
> > polling at a specified interval. I know some of the vendors refuse
> > to release any specs for their monitor hardware and so the
> > lm_sensors team can only do a "best effort" on getting some of
> > these values. I don't like reinventing the wheel and so if there's
> > some sort of monitoring daemon that can handle an automatic
> > shutdown on a fan and/or overheat failure I would be interested.
> >
> > Tony
> >
> > On Sunday 05 May 2002 17:10 pm, Tyler F. Creelan wrote:
> > > On Sun, 5 May 2002, Josh Orchard wrote:
> > > > Does linux have a CPU temp. gauge?
> > >
> > > lm-sensors will do this. If lm-sensors is already set up, gkrellm
> > > should detect it and display the information. If you don't use
> > > gkrellm there are other frontends for it, e.g.:
> > >
> > > $ apt-cache search lm-sensors
> > > i2c-source - sources for drivers for the i2c bus
> > > ksensors - lm-sensors frontend for KDE
> > > libsensors-dev - Lm-sensors development kit
> > > libsensors1 - Library to read temperature/voltage/fan sensors
> > > lm-sensors - Utilities to read temperature/voltage/fan sensors
> > > lm-sensors-source - Kernel drivers to read temperature/voltage/fan sensors (source)
> > > sensord - Hardware sensor information logging daemon
> > > wmsensors - WindowMaker dock applet for lmsensors
> > >
> > > On Sun, 5 May 2002, Josh Orchard wrote:
> > > > I don't think the computer is being used that hard.  It is
> > > > mostly a webserver and email server for a variety of people. 
> > > > But the CPU usage is very small.  Only during a few high times
> > > > does it go up. Or when I'm compiling. :-)
> > > >
> > > > But I have now set up netsaint on my other linux box to
> > > > monitor the server with the problem.  I did this so I could see
> > > > what is happening and if the server goes down.
> > > >
> > > > So, if I don't see anything in the logs I will have a problem
> > > > starting to figure it out?  It has run fine for over a year.  It
> > > > is a 2.2 kernel and may go to the 2.4 for other reasons.  Don't
> > > > see how that would affect it. I have most updates on all my
> > > > programs so feel good there.
> > > >
> > > > Maybe it got too hot.  I don't know.  Does linux have a CPU
> > > > temp. gauge?
> > > >
> > > > Josh
> > > >
> > > > > I have a client with a dell server that does the same.  It is
> > > > > because these guys keep the machine running 98% on both procs
> > > > > 24/7.  It has to finally die.  By that point all logging is
> > > > > stopped.  The best I can do is keep top running 24/7 from
> > > > > home and show them the screenshot when it dies.  We are
> > > > > > upgrading the box so hopefully the problem will go away too.  
> > > > > It took about three screenshots of top to finally convince
> > > > > them it was working too hard.   This box is still up too,
> > > > > just no services work including keyboard attached.  I showed
> > > > > them the nice command and it seems to be well for the moment.
> > > > >  Are you working yours too hard too?
> > > > >
> > > > >
> > > > > On Fri, 3 May 2002 23:32:16 -0700 (PDT)
> > > > > "Josh Orchard" <josh at emediatedesigns.com> spewed into the
> > > > > bitstream:
> > > > >
> > > > > > ~My server stopped responding a while back and I was locked
> > > > > > ~out of the box. All services were gone.  Since I was unable
> > > > > > ~to go to the box I had it rebooted and all is fine.
> > > > > > ~
> > > > > > ~What I would like to do now is find out why.
> > > > > > ~
> > > > > > ~I have gone through all the log files I thought would tell
> > > > > > ~me something and the best I find is that right before it
> > > > > > ~stopped responding someone was getting mail.  pop3.
> > > > > > ~
> > > > > > ~Is there some place other than
> > > > > > ~messages/security/maillog/boot.log that I could look that
> > > > > > ~could help me understand why all services stopped? I was
> > > > > > ~told the power was on to the box but it wasn't responding.
> > > > > > ~Thanks.
> > > > > > ~
> > > > > > ~Josh
> > > > > > ~
> > > > > > ~P.S.  It is a RH 7.0 Machine if that helps. Kernel
> > > > > > ~2.2.19-7.0-12.  I know there is an update to -16 but that
> > > > > > ~will be this weekend.
> > > > > > ~
> > > > > ~
> > > > > ~
> > > > > ~
> > > > > ~
> > > > > ~_______________________________________________
> > > > > ~PLUG mailing list
> > > > > ~PLUG at lists.pdxlinux.org
> > > > > ~http://lists.pdxlinux.org/mailman/listinfo/plug
> > > > > ~
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > >       .--.
> > > > >
> > > > >      |o_o |     Michael H. Collins
> > > > >      |
> > > > >      |:_/ |     Admiral, Penguinista Navy
> > > > >
> > > > >     //   \ \    http://www.linuxlink.com
> > > > >    (|     | )   http://kpig.com
> > > > >   /'\_   _/`\   http://kuro5hin.org
> > > > >   \___)=(___/
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> > --
> > Anthony Schlemmer
> > aschlemm at attbi.com
> >
> > >>>>This machine was last rebooted: 6:12, days users hours ago<<
> >
>

-- 
Anthony Schlemmer
aschlemm at attbi.com
>>>>This machine was last rebooted: 13:09, days users hours ago<<




