[PLUG] Sun

Steve Bonds 1s7k8uhcd001 at sneakemail.com
Tue Aug 13 23:32:23 UTC 2002


On Tue, 13 Aug 2002, Anthony Schlemmer aschlemm at attbi.com wrote:

> When I was working a contract at Boeing, we had our, as of yet unused,
> production server die several times from hardware failures. Generally
> I've been pleased with Sun's reliability and this was the first time I
> had ever seen a hardware failure on a Sun machine.

Let me guess-- ECache parity error?  Several of my sysadmin buddies and I
all had a number of these failures with the Sun 400/450MHz SPARC
processors.  Sun kept coming out to replace them, and the engineers
definitely knew that something was wrong, but the problems persisted
without any official comment from Sun.

This was across 6 different companies with systems purchased over about a
2 year timespan (1999-2001).  I personally witnessed the failure of about
10 CPUs with the same problem over those two years, and among my friends
this count goes into the hundreds.

My guess is a major engineering defect, but again, Sun never confessed.  I
do know that their external cache (the ECache in the above error) is not
ECC protected, so it's quite possible that they just had a nasty run of
chips with higher-than-normal alpha decay in there.

  -- Steve





