[PLUG] smartmontools test accuracy?

Larry Brigman larry.brigman at gmail.com
Fri Apr 30 19:56:23 UTC 2010


On Fri, Apr 30, 2010 at 11:25 AM, Dale Snell <ddsnell at verizon.net> wrote:
> On Thu, 29 Apr 2010 14:00:08 -0700
> Scott Garman <sgarman at zenlinux.com> wrote:
>
>> I have a server with a SATA drive in it which is filling the syslogs
>> with the following error:
>>
>> [455740.180143] sd 2:0:0:0: [sda] Add. Sense: ATA pass through information available
>> [457540.199345] sd 2:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
>> [457540.199360] Descriptor sense data with sense descriptors (in hex):
>> [457540.199367]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
>> [457540.199384]         00 4f 00 c2 40 50
>>
>> smartctl -a shows some enormous values for the following parameters:
>>
>> Raw_Read_Error_Rate: 179400719
>> Seek_Error_Rate: 7232305
>> Hardware_ECC_Recovered: 179400719
>>
>> However, I have just run the short and long tests from smartmontools
>> and they both passed without errors.
>>
>> So yes, I'm going to replace the drive immediately. But it surprised
>> me that the SMART tests would pass. I'm curious if anyone else has
>> run into this, and why it is the case?
>>
>> Scott
>
> Is that a Seagate drive, by any chance?  I have a Seagate FreeAgent Go
> USB drive that has similar behavior.  Nothing shows up
> in /var/log/messages, but both  smartctl -a  and Palimpsest show huge
> numbers for raw read error, seek, and hardware ECC correction rates.
> Yet both programs give the drive a healthy status.  For the record,
> here's some of the output of  smartctl -a :
>
> smartctl 5.39.1 2010-01-28 r3054 [x86_64-redhat-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Momentus 5400.5 series
> Device Model:     ST9320320AS
> Serial Number:    5SX43J0Z
> Firmware Version: BS04
> User Capacity:    320,072,933,376 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4
> Local Time is:    Fri Apr 30 10:44:03 2010 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> .
> .
> .
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>  1 Raw_Read_Error_Rate     0x000f   114   100   006    Pre-fail  Always       -       81947045
>  7 Seek_Error_Rate         0x000f   062   060   030    Pre-fail  Always       -       1636116
> 195 Hardware_ECC_Recovered  0x001a   054   048   000    Old_age   Always       -       81947045
>
> SMART Error Log Version: 1
> No Errors Logged
>
> (Sorry for the long lines, that's what smartctl outputs.)
>
> Since the drive is backed up, and not really vital (it's got my music
> collection), I'm just going to keep using it 'til it dies.  I am
> puzzled as to why it isn't flagged as dying, though.
>

The raw value is just a count of the times things needed to be
recovered.  It is not processed.
The more you use a drive, the larger these numbers will get, as there
are always recoverable errors.

For each attribute, if the type is Pre-fail and the VALUE column gets
down to the THRESH column (VALUE counts down), then
the drive is a SMART failure.  If it gets to this value within the
warranty period, you can get the drive replaced.
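
To make the VALUE-versus-THRESH check concrete, here is a rough Python
sketch that reads attribute lines in the column layout shown in Dale's
smartctl -a output and reports how far each one is from its threshold.
The names parse_attr_line and is_failing are my own, not part of
smartmontools, and treating a THRESH of 0 as "never fails" is an
assumption on my part.

```python
from typing import NamedTuple, Optional


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int      # normalized VALUE column (counts down)
    worst: int      # WORST column
    thresh: int     # THRESH column
    attr_type: str  # "Pre-fail" or "Old_age"
    raw: str        # RAW_VALUE column, kept as text


def parse_attr_line(line: str) -> Optional[SmartAttr]:
    """Parse one attribute row; return None for headers and other lines.

    Assumes the column order from the quoted output:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    fields = line.split()
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    return SmartAttr(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        attr_type=fields[6],
        raw=" ".join(fields[9:]),
    )


def is_failing(attr: SmartAttr) -> bool:
    """True if a Pre-fail attribute has counted down to its threshold.

    Assumption: a threshold of 0 means the attribute can never trip.
    """
    return (
        attr.attr_type == "Pre-fail"
        and attr.thresh > 0
        and attr.value <= attr.thresh
    )


# The three rows quoted earlier in this thread.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   114   100   006    Pre-fail  Always       -       81947045
  7 Seek_Error_Rate         0x000f   062   060   030    Pre-fail  Always       -       1636116
195 Hardware_ECC_Recovered  0x001a   054   048   000    Old_age   Always       -       81947045
"""

if __name__ == "__main__":
    for row in SAMPLE.splitlines():
        attr = parse_attr_line(row)
        if attr is None:
            continue
        status = "FAILING" if is_failing(attr) else "ok"
        margin = attr.value - attr.thresh
        print(f"{attr.name}: value={attr.value} thresh={attr.thresh} "
              f"margin={margin} {status}")
```

Note that the huge raw numbers never enter the check; only the
normalized VALUE and THRESH columns do, which is why both drives in
this thread still pass.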

SMART is designed to provide predictive failure warnings so that you
can still recover your data before
the drive totally fails.  It mostly works for things that degrade, like
head/disk interfaces.  It doesn't do anything
for catastrophic failures that don't show degrading trends in
performance or that drop to the floor.


