no easy answer (was Re: [PLUG] MySQL v. PostgreSQL -- perhaps an easy answer)

Kyle Hayes kyle at silverbeach.net
Thu Jan 15 12:37:02 UTC 2004


This one is getting out of hand.  What was the PLUG non-technical list?

On Thursday 15 January 2004 08:09, Randal L. Schwartz wrote:
> >>>>> "Kyle" == Kyle Hayes <kyle at silverbeach.net> writes:
>
> Kyle> We repeatedly evaluated Postgres and stuck with MySQL because we
> Kyle> kept finding that Postgres docs said that until the DB was down,
> Kyle> the data on disk wasn't coherent :-( If this has changed, I'd
> Kyle> love to know about it.  There appeared to be nothing like FLUSH
> Kyle> TABLES that actually gave us a coherent disk image of the data.
>
> You aren't supposed to touch the disks while the system is up.
> Instead, perform a dump from the client interface, which does it
> within a transaction so the data is consistent.

Several of the Oracle backup tools do not go through Oracle to dump data, but 
directly hit the raw partitions while Oracle is running.  They do talk to 
Oracle sufficiently to do some locking.  I'm not familiar enough with Sybase/
DB2 etc. to comment on those.  The argument sounds suspiciously like "that's 
not a feature that databases should have, so you don't need it."  I'm not 
supposed to read the disks?  Why not?  I can in Oracle.  I can in MySQL.

We had several conversations directly with the LVM folks at Sistina (now part 
of Red Hat) and did our own tests because we were worried about LVM's 
stability.  For the 2.4 kernels, LVM has worked flawlessly for years for what 
we were doing.  For 2.6, I'd probably use EVMS on top of device mapper.  But, 
it will be a while before I move any servers to 2.6.   It needs time to 
mature, though it seems better than 2.4 was initially!

In terms of stability and other problems we found that we had the following 
ordered list of problems:

1) human error (admins doing stupid human tricks)
2) human error (coders doing stupid human tricks)
3) human error (users doing things that triggered point 2)
4) kernel bugs (subtle and rare disk corruption is a really ugly thing)
5) glibc bugs (thread bugs and various leaks)
6) network hardware problems (failed switches, NICs and the like)
7) disk hardware problems
8) SCSI controller hardware problems
9) Colo problems (the idiots had multiple 4 hour power outages and our UPSes 
eventually failed, the colo is gone now)
10) MySQL problems (the ELT and mutex problems)

99% of our problems were caused by human error.  The data corruption thing had 
us stumped for a long time.  We had to keep failing over to the slave 
machines because the master would eventually corrupt the database and we 
couldn't figure out why the master would corrupt but not the slave.  Finally 
we tracked it down to a kernel bug that would eventually write bad data to a 
disk under very high loads.  Once we updated the kernel, the problem went 
away.

> Kyle> MySQL introduced subselects in version 4.1 which is now in beta
> Kyle> I think.  Version 5.0 was just released in an alpha form and it
> Kyle> has stored procedures.  The MySQL guys have a really good track
> Kyle> record when it comes to versions being stable when they claim
> Kyle> they are.  In six months or a year when 5.0 is stable, I really
> Kyle> wouldn't see any functional reason to use Postgres for anything.
>
> I would.  I care about my data.  See the reasons that even
> MySQL+InnoDB is not ACID compliant at
> <http://sql-info.de/mysql/gotchas.html>.

Note that some of these have been addressed in 4.1.x.  Since SAP is working 
with MySQL to make sure that SAP works on top of MySQL (by changing MySQL, 
not SAP!), I am sure that anything like this that remains will be gone by the 
time that 5.0 is stable.  After all, SAP works against "real" databases and 
does require foreign key constraints etc. all working correctly.

Your argument is interesting because I said that I thought that by 5.0 stable, 
MySQL would be ready, but you are arguing that since MySQL isn't ready now, 
it isn't going to be ready?  That's an interesting way to convince me :-)  
Arguing about the future is always a mess.  Perhaps all the core MySQL 
developers will suddenly decide to become Hari Krishnas :-)

SAP wants to be able to ship with a DB for free that they don't have to 
maintain.  They've said this and done a deal with MySQL AB to provide said 
DB.  They even gave MySQL AB the source for SAP DB in the bargain.  I have 
worked in Fortune 100 companies and SAP is a big deal.  It puts huge stresses 
on any DB.  If MySQL is ready for SAP by 5.0 (this is what MySQL AB has said 
on the mailing list in the past), then it is going to need all the features 
and behavior that SAP needs and those are legion.  This is why I think that 
by version 5.0 stable MySQL will be able to compete with Postgres on every 
feature point and then some.  Do you disagree with this and if so, why?  I 
don't pretend to have deep inner knowlege of the workings of the Postgres 
community.

Side note:

How come SAP isn't going with Postgres?  I don't mean this as a rhetorical 
question, I'm genuinely curious.  Red Had ships it as their flagship DB.  SAP 
should be a relatively easy port.  It works on DB2, Oracle etc. and those 
have all kinds of ugly incompatibilities.  Why did SAP go with MySQL when it 
was so clearly so far from adequate?  It seems like it would have been a much 
faster path to throw a few developers at Postgres and get SAP working on 
Postgres in a few months rather than the multi-year path that SAP chose.  
Ever since the MySQL/SAP announcement, I've been wondering what made SAP do 
this.

Does anyone know?  It seems very counter intuitive.

> Kyle> We needed things that Postgres could definitely not do at the
> Kyle> time (and it appears may still not be able to do) thus that was
> Kyle> the deciding factor for us.
>
> Like?

Fast backups and snapshots, a DB that could stay up for months at a time (this 
started before Postgres 7.0 was debugged) etc.  One of the people we had on 
staff had used Postgres for several years whereas the rest of us were just 
learning MySQL, so it wasn't a knowledge problem.  We were leaning strongly 
toward Postgres and kept finding things that did not work for us.  If we were 
to do the evaluation now, it would be a much closer contest.  I'd previously 
used Oracle and was dragged kicking and screaming to use MySQL.  

Another poster mentioned using Python/etc. as scripting languages in the DB. I 
cannot possibly see how this is a selling point to me until that is part of a  
standard DB definition.  Those scripts will work fine with Firebird and SAP 
DB (now MaxSQL), right?  If not, they are just proprietary extensions that I 
cannot use.  I don't use AUTO_INCREMENT fields and TIMESTAMP fields in MySQL 
either for the same reason.  As handy as they are, I don't use ENUM fields.  
I try to keep any proprietary functionality out of my apps where at all 
possible.  I want to be able to switch databases.

Another mention was made of MySQL falling over on large DBs or high volumes.  
That doesn't wash with my personal experiences at all.  Perhaps the apps I've 
done just happened to be a good fit for MySQL? I won't argue that very large 
database would stress MySQL to the breaking point, but I would argue that 
Postgres isn't appropriate for the sizes I think are large.  See below.

I won't claim to have done every kind of application on a database.   I did 
multiple web ecommerce systems, VoIP data collection backends (lots of data 
flowing during a call), live customer call reports, real-time resource 
allocation for calls, daily, weekly, monthly, quarterly and yearly roll-up 
reports (both scheduled and live), ad-hoc web-based query systems, and 
real-time system quality monitoring using MySQL.  I also worked on multiple 
replication and hot-failover systems, again all using MySQL.  The apps were 
written primarily in Perl with a few large chunks in Java.  The apps ranged 
from standalone multithreaded servers to multi-process XML-RPC servers to CGI 
scripts.  Clearly there are all kinds of apps that I haven't worked on with 
MySQL!

We've had multi GB databases (up to about 45GB) without any particular 
problems or slow downs.  We did do tuning of MySQL as we grew the databases 
and found various bottlenecks, but it wasn't much change from the default 
configuration.  We added a few indexes as we added new kinds of queries.  We 
had systems that pumped data into MySQL (i.e. heavily weighted toward 
INSERTs, not SELECTs) with upwards of 300 simultaneous connections on single 
processor Pentium III 500 machines without problems.    We did this for 
years.  The raw speed of INSERTs was never particularly an issue with either 
database.  I do not believe that I've argued anywhere above that MySQL was 
particularly faster than Postgres. There are too many other factors in a full 
system to say categorically that one database is faster than another.

I worked in a large pharmaceutical distribution company that had a pricing 
table that was 250M rows and over 4.5TB of data.  That project used Oracle 
and I would not use MySQL or Postgres for it.  This was a while ago, but the 
maxed out Sun E10Ks were struggling on some of the queries.  Is that large 
data?

I hope I've made it clear why a statement like "MySQL doesn't do well with 
large data or high loads" is not going to be very convincing to me without 
some empirical data to back it up.  I'm not trying to start a "mine is bigger 
than yours" argument here, but I would like to inject my actual experience.  
If anyone has experience that does not agree with mine, I'd honestly like to 
know.  Under what circumstances was the slowdown seen?  What kinds of queries 
were being done?  What was the data size?  Was was the query size?  Was the 
database local or remote?  What kind of OS was used?  What kind of disk 
subsystem was used?  SMP or not?  What was the hardware in the server?

I want to expand my knowlege of DB systems and their quirks.  My experience 
does not agree with the statements that have been made about MySQL and so I'm 
stating my experience.  While I find that web reports can be interesting, 
very few reports detail exact configurations, applications, query mixes etc.  
Both MySQL and Postgres have far too many "ra ra ra" partisans and those 
people just annoy those who are trying to make a realistic decision based on 
realistic data.  My kill file had a fair number of entries just for people on 
the main MySQL mailing list.  You'd think that MySQL was going to save 
mankind according to some of those people :-)

I _have_ seen many apps that do cartesian joins, don't have indexes on fields 
being used in WHERE clauses etc., that run like snot on a glacier in winter.  
But, I think that is simply bad coding.  I don't care what the DB is behind 
the curtain if the code is like that.  I see all too many projects with those 
kinds of flaws.  Do some speed tests with your application and your usage 
patterns.  We did.

This discussion is rapidly turning into a "vi vs. emacs or Perl vs. Python" 
type of thread :-)  There's religion and there's religion.  For what I was 
doing, MySQL made sense and Postgres didn't.  Postgres isn't the best 
solution in all cases.  Neither is MySQL.  The beauty of open source is that 
you can chose.

That said, I think that politics and expediency make more technologies gain 
large marketshare than any technical reason and I feel that my opinion is 
well supported by history.  That's why if MySQL and Postgres were public 
companies, I'd be buying MySQL stock.  I'd also buy Linux stock over *BSD 
stock.  It is not that I think that MySQL and Linux are necessarily better 
technologies.  It is that I think that they are the technologies I'm going to 
be dealing with in the future.  Politics and expediency.

I've been burned too many times rooting for things that are technically better 
and seeing them lose to think that anything will win on technical merits 
alone anymore.  VHS vs. BetaMax anyone?  OS/2 vs. DOS/Windows?  Mac vs. 
Windows? x86 vs. 68k?  IDE vs. SCSI?  There are lots of examples of this :-)

Being burned a lot has made me lose my religion pretty much completely.  I 
gave my Amiga away a long time ago :-)

My opinions shift with the changing projects.  If this discussion had happened 
two weeks ago, I would have plugged Postgres.  Since the MySQL 5.0 
announcement surprized me (I thought it wouldn't be out in alpha until early 
2005), I've changed my mind.

Neither project is standing still and both continue to gain marketshare.  I 
have not seen projected figures for Postgres usage.  MySQL AB's marketing 
department doesn't hesitate to use 4M (don't know if that is sites or seats 
or installs) as the number.  How they've measured that is anyone's guess.  I 
have no idea if it is correct or not.

Does PLUG have a plug-flame list?  If so, this should probably move over 
there :-)

Best,
Kyle

P.S. Randal pointed out SQLLite (two "L"s?).  It is a very nice little project 
and one I hope to add to my tool box in the near future.  I can think of a 
bunch of cases where I wished I had had something like it.





More information about the PLUG mailing list