[PLUG] Unexpected machine reboot

King Beowulf kingbeowulf at gmail.com
Thu Apr 18 01:33:15 UTC 2013


On 04/17/2013 03:24 PM, Dale Snell wrote:

> My understanding is that lm-sensors is not a kernel module _per
> se_.  Rather, it talks to modules already in the kernel in order
> to read the data out of the various hardware info & control chips.
> So it wouldn't show up in an lsmod listing.
> 
> Hope this helps.
> 
> --Dale
> 

Yes, lm_sensors uses libsensors to expose sensor kernel modules to
various apps.  Some apps don't use it and read the data directly from
/proc/*
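
For the curious, here is a minimal sketch of what "reading the data
directly" can look like. It assumes a kernel that publishes readings
under /sys/class/hwmon (the sysfs hwmon layout) rather than /proc, with
temp*_input files holding millidegrees Celsius. The function name and
the base-directory argument are made up for illustration.

```python
import glob
import os


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {"<chip>/<file>": degrees_C} for every temp*_input under base.

    Assumption: sensors are exposed via the sysfs hwmon layout, where
    temp*_input files contain an integer in millidegrees Celsius.
    """
    temps = {}
    for chip_dir in sorted(glob.glob(os.path.join(base, "hwmon*"))):
        # The "name" file holds the driver/chip name, if present.
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)
        for path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            try:
                with open(path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now; skip it
            temps[f"{chip}/{os.path.basename(path)}"] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    for label, deg in read_hwmon_temps().items():
        print(f"{label}: {deg:.1f} C")
```

If this prints nothing, no sensor driver has registered with hwmon yet,
which is the same situation "sensors" would complain about.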

Make sure you have libsensors installed.  It drives me nuts that Debian
and clones split packages into <name> and <name>-dev. Idiots.

> sensors-detect runs but hangs the whole machine -- at least the mouse 
> and keyboard -- when it tries to read the nVidia temperature.

Yes, this can happen when it probes for incompatible sensors, or chips
that advertise as a sensor but are not. For nvidia, you need to read the
temp with:

nvidia-settings -q GPUCoreTemp -t
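
If you want to log that reading over time, a small wrapper along these
lines might do. It is only a sketch: it assumes the terse (-t) output is
nothing but whole numbers separated by whitespace, one per GPU or
sensor, and the function names are invented for the example.

```python
import subprocess


def parse_gpu_temp(output):
    """Parse terse query output: whitespace-separated integers."""
    return [int(value) for value in output.split()]


def gpu_core_temps():
    """Run the nvidia-settings query above and return the temperatures.

    Needs a running X session with the proprietary nVidia driver.
    """
    result = subprocess.run(
        ["nvidia-settings", "-q", "GPUCoreTemp", "-t"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_temp(result.stdout)


if __name__ == "__main__":
    print(gpu_core_temps())
```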

NOTE: you do not usually run sensors-detect in a terminal with X active.
Get thee to a proper command line!

>> Is the CPU heatsink loose?
> 
> I'll have to check that, but I don't think so.
> 
>> When was the last time you reseated it with fresh thermal paste?
> 
> Never. I haven't heard of this being recommended/necessary.
>  
Until we can read the CPU temp, we can skip this for now. Yes: heat
sinks do come loose OFTEN. I've had to reseat them after cross country
moves. Or when I overclock due to a wicked gaming addiction...

> Okay, I've done some googling, and found a few things.  According to
> Gateway's web site[1], the motherboard in your box is an Intel
> D945GPBG1. 

That mobo is a 945G chipset with ICH7: very common and very well
supported on Linux, iirc. ICH7 is the I/O chip and should be similar to
the ICH5 and ICH9 I have floating around (I do mostly AMD style and
Slackware...). "lsmod" should show something similar to:

Module                  Size  Used by
lm85                   14453  0
hwmon_vid               2304  1 lm85
i915                  372988  1
processor              23020  0
thermal_sys            12122  2 processor,video
hwmon                   1033  2 thermal_sys,lm85
i2c_algo_bit            4543  1 i915
i2c_i801                6952  0
i2c_core               16454  6 i2c_i801, i2c_algo_bit, drm, \
drm_kms_helper, i915, lm85

Now, "lm85" was detected by sensors-detect for my thermal chip, and I
added it via /etc/modprobe.d as it is not autodetected. You will have a
different one.  "i915" is for on-board video since this is a small server.
The rest should be similar to yours.

hwmon, processor (or something related to CPU), i2c_core (linked to
i2c_*) are absolutely essential.  i2c_i801 is for my mobo; there should
be something similar for yours.
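
To check a listing without eyeballing it, something like the following
sketch could help. The ESSENTIAL tuple is just my reading of the
paragraph above, and the function names are hypothetical. Keep in mind
that drivers compiled into the kernel never show up in lsmod, so a
"missing" module is a hint, not proof.

```python
import subprocess

# Modules the post calls essential. The i2c bus driver (i2c_i801 here)
# and the sensor chip driver (lm85 here) vary by board, so they are
# left out on purpose.
ESSENTIAL = ("hwmon", "i2c_core", "processor")


def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blanks, the "Module Size Used by" header, and wrapped
        # continuation lines (their second column is not a number).
        if len(fields) >= 2 and fields[1].isdigit():
            names.add(fields[0])
    return names


def missing_essentials(lsmod_text, required=ESSENTIAL):
    """Return the required modules that do not appear in the listing."""
    loaded = loaded_modules(lsmod_text)
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    text = subprocess.run(["lsmod"], capture_output=True, text=True).stdout
    print("missing:", missing_essentials(text) or "none")
```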

Now, note also that sensors-detect will try to load some modules, such
as "i2c-dev" and "cpuid" so these must be present in your kernel modules
package. Once everything is loaded, using the default config files, run
"sensors -s" ONCE as root (or *ugh* sudo) and then you can run "sensors"
to display all the goodies.

To summarize your reboot issue:

1. Something fishy with your Ubuntu install - or Canonical buggered your
kernel (no messages or syslog? The Horror!)
2. CPU or chipset overheating

AND..we haven't even gotten to these yet:

3. Bad power supply. lm_sensors will read the voltages.....or check with
a voltmeter.
4. Bad RAM.  One module may be flaky.  Clean or swap.
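
For item 3, a rough way to judge the voltages once you have them is
sketched below. It assumes the usual ATX tolerance of 5% on the +3.3V,
+5V and +12V rails; the rail labels and function name are made up, and
the labels "sensors" prints depend on your chip and config. Software
readings can also be mis-scaled, so trust the voltmeter if they disagree.

```python
# Nominal rail voltage and allowed tolerance as a fraction of nominal.
# Assumption: the common ATX figure of +/-5% for the positive rails.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
}


def out_of_spec(readings, rails=ATX_RAILS):
    """Return {rail: measured} for readings outside the tolerance band."""
    bad = {}
    for rail, measured in readings.items():
        if rail not in rails:
            continue  # unknown label; ignore it
        nominal, tolerance = rails[rail]
        if abs(measured - nominal) > nominal * tolerance:
            bad[rail] = measured
    return bad


if __name__ == "__main__":
    # Example numbers typed in by hand, not real measurements.
    print(out_of_spec({"+12V": 11.2, "+5V": 5.1, "+3.3V": 3.3}))
```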

Easy check of RAM: Pull them out, check for discoloration, deformities
on the chips.  Dust out the sockets with compressed air.  Then clean the
RAM module pins either with a PCB cleaner, or rub gently with a rubber
eraser. I use a Staedtler Mars Plastic eraser brick I've had for decades.
Firmly but carefully reseat the modules.

If you are local, this might be a task for PLUG's Sunday Q&A if there is
one this week.

Have fun.
Ed