[PLUG] Unexpected machine reboot
King Beowulf
kingbeowulf at gmail.com
Thu Apr 18 01:33:15 UTC 2013
On 04/17/2013 03:24 PM, Dale Snell wrote:
> My understanding is that lm-sensors is not a kernel module _per
> se_. Rather, it talks to modules already in the kernel in order
> to read the data out of the various hardware info & control chips.
> So it wouldn't show up in an lsmod listing.
>
> Hope this helps.
>
> --Dale
>
Yes, lm_sensors uses libsensors to expose the sensor kernel modules to
various apps. Some apps don't use it and read the data directly from
/proc/* (or /sys/class/hwmon/* on current kernels).
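For the "read it directly" route, a minimal Python sketch might look like the following. It assumes the hwmon sysfs layout (a "name" file and tempN_input files in millidegrees Celsius under /sys/class/hwmon/hwmonN); some kernels put those files under a device/ subdirectory instead, which this sketch does not handle. The function name and the root parameter are made up for illustration.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, file_name): degrees_C} for every readable temp input."""
    temps = {}
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)  # no name file: fall back to hwmonN
        for path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            try:
                with open(path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor present but unreadable; skip it
            temps[(chip, os.path.basename(path))] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    for (chip, name), deg in read_hwmon_temps().items():
        print(f"{chip} {name}: {deg:.1f} C")
```

If this prints nothing, no sensor driver is loaded yet, which is the same thing an empty "sensors" run would tell you.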
Make sure you have libsensors installed. It drives me nuts that Debian
and clones split packages into <name> and <name>-dev. Idiots.
> sensors-detect runs but hangs the whole machine -- at least the mouse
> and keyboard -- when it tries to read the nVidia temperature.
Yes, this can happen when it probes for incompatible sensors, or chips
that advertise as a sensor but are not. For nvidia, you need to read the
temp with:
nvidia-settings -q GPUCoreTemp -t
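If you want to log that reading from a script, a rough Python wrapper could look like this. It runs exactly the command above and assumes the terse (-t) output is a bare integer per line; it also assumes nvidia-settings is installed and can reach a running X server. The function names are invented for this sketch.

```python
import subprocess


def parse_gpu_temp(output):
    """Parse terse (-t) output: one bare integer per line, first line wins."""
    lines = output.strip().splitlines()
    if not lines:
        raise ValueError("no temperature in nvidia-settings output")
    return int(lines[0].strip())


def gpu_core_temp():
    """Run the nvidia-settings query and return the GPU core temp in degrees C."""
    result = subprocess.run(
        ["nvidia-settings", "-q", "GPUCoreTemp", "-t"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_temp(result.stdout)
```

Calling gpu_core_temp() in a loop with a sleep gives you a crude temperature log to compare against the time of the next reboot.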
NOTE: you do not usually run sensors-detect in a terminal with X active.
Get thee to a proper command line!
>> Is the CPU heatsink loose?
>
> I'll have to check that, but I don't think so.
>
>> When was the last time you reseated it with fresh thermal paste?
>
> Never. I haven't heard of this being recommended/necessary.
>
Until we can read the CPU temp, we can skip this for now. Yes: heat
sinks do come loose OFTEN. I've had to reseat them after cross country
moves. Or when I overclock due to a wicked gaming addiction...
> Okay, I've done some googling, and found a few things. According to
> Gateway's web site[1], the motherboard in your box is an Intel
> D945GPBG1.
That mobo is a 945G chipset with ICH7, very common and very well
supported on Linux, iirc. ICH7 is the I/O chip and should be similar to
the ICH5 and ICH9 I have floating around (I do mostly AMD style and
Slackware...)
"lsmod" should show something similar to:
Module                  Size  Used by
lm85                   14453  0
hwmon_vid               2304  1 lm85
i915                  372988  1
processor              23020  0
thermal_sys            12122  2 processor,video
hwmon                   1033  2 thermal_sys,lm85
i2c_algo_bit            4543  1 i915
i2c_i801                6952  0
i2c_core               16454  6 i2c_i801,i2c_algo_bit,drm,\
                                drm_kms_helper,i915,lm85
Now, "lm85" was detected by sensors-detect for my thermal chip, and I
added it via /etc/modprobe.d as it is not autodetected. You will have a
different one. "i915" is for on-board video since this is a small server.
The rest should be similar to yours.
hwmon, processor (or something related to CPU), and i2c_core (linked to
i2c_*) are absolutely essential. i2c_i801 is for my mobo; there should be
something similar for yours.
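To check your own lsmod listing against that short list, something like the Python sketch below would do. The default "required" tuple is just the three names mentioned above and is an assumption: your kernel may build some of them in rather than load them as modules, or use a different CPU-related module name. The function names are made up for illustration.

```python
def parse_lsmod(text):
    """Return {module: [users]} from lsmod output (header line skipped)."""
    modules = {}
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3 or fields[0] == "Module":
            continue
        users = fields[3] if len(fields) > 3 else ""
        modules[fields[0]] = [u.strip() for u in users.split(",") if u.strip()]
    return modules


def missing_modules(text, required=("hwmon", "processor", "i2c_core")):
    """List the required module names that do not appear in the lsmod output."""
    loaded = parse_lsmod(text)
    return [name for name in required if name not in loaded]
```

Feed it the text of "lsmod" (for example via subprocess.run(["lsmod"], capture_output=True, text=True).stdout). An empty list means all three are loaded as modules.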
Now, note also that sensors-detect will try to load some modules, such
as "i2c-dev" and "cpuid" so these must be present in your kernel modules
package. Once everything is loaded, using the default config files, run
"sensors -s" ONCE as root (or *ugh* sudo) and then you can run "sensors"
to display all the goodies.
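Once "sensors" prints something, you can pull the temperatures out of it with a few lines of Python. This is only a sketch: it assumes temperature lines look like "Label:  +45.0°C  (high = ...)", which is the usual human-readable layout but varies by chip and config, and it keeps only the last reading when two chips share a label. The function name is invented here.

```python
import re

# Label, colon, then a signed number followed by an optional degree sign and "C".
TEMP_LINE = re.compile(r"^\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?)\s*°?C")


def parse_sensors_temps(text):
    """Return {label: degrees_C} for every line that looks like a temperature."""
    temps = {}
    for line in text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            temps[match.group(1).strip()] = float(match.group(2))
    return temps
```

Fan (RPM) and voltage (V) lines do not end in "C" after the number, so they are skipped.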
To summarize your reboot issue:
1. Something fishy with your Ubuntu install - or Canonical buggered your
kernel (no messages or syslog? The Horror!)
2. CPU or chipset overheating
AND... we haven't even gotten to these yet:
3. Bad power supply. lm_sensors will read the voltages... or check with a
voltmeter.
4. Bad RAM. One module may be flaky. Clean or swap.
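For item 3, a quick way to judge the numbers you read (from lm_sensors or a voltmeter) is to compare them against the nominal rail voltages. The Python sketch below assumes the usual ATX tolerance of +/-5% on the +3.3V, +5V and +12V rails; that figure comes from the ATX spec, not from this thread, and the names are made up for illustration. Keep in mind that lm_sensors voltages can be mis-scaled without a proper config for your board, so a voltmeter reading is the one to trust.

```python
# Nominal voltage and allowed fractional deviation per rail (assumed ATX values).
ATX_RAILS = {"+3.3V": (3.3, 0.05), "+5V": (5.0, 0.05), "+12V": (12.0, 0.05)}


def out_of_tolerance(readings, rails=ATX_RAILS):
    """Return {rail: measured} for readings outside the allowed band."""
    bad = {}
    for rail, measured in readings.items():
        nominal, tol = rails[rail]
        if abs(measured - nominal) > nominal * tol:
            bad[rail] = measured
    return bad
```

For example, out_of_tolerance({"+12V": 11.2, "+5V": 5.1}) flags only the +12V rail, since 11.2 V is more than 0.6 V below nominal.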
Easy check of RAM: Pull them out, check for discoloration, deformities
on the chips. Dust out the sockets with compressed air. Then clean the
RAM module pins either with a PCB cleaner, or rub gently with a rubber
eraser. I use a Staedtler Mars Plastic eraser brick I've had for decades.
Firmly but carefully reseat the modules.
If you are local, this might be a task for PLUG's Sunday Q&A if there is
one this week.
Have fun.
Ed