[PLUG-TALK] SSD lstat performance questions and seeking hard proof

Richard plug at hackhawk.net
Fri Nov 25 22:11:59 UTC 2016


Hello,

I am seeking advice before moving forward with a potentially large
investment.  I don't want to make such a large purchase unless I'm
certain it will solve what I perceive to be my biggest problem right
now.  I figured there would be a plethora of expertise on this list.  :-)

I'm considering switching from network storage over NFS shares (SAS 15k
RAID 5, 10 spindles) to solid state drives directly attached to the
server.  But alas, the SSDs are extremely expensive, and I'm not sure
how to verify that they're going to improve things for me.  I can only
surmise that they will.

Here is what I've found by running strace on some of my larger web-based
PHP applications.  As one example, I've got one WordPress install that
opens 1,000+ PHP files.  The strace output shows 6,000+ lstat calls
across all of these files, and it takes roughly 4 seconds to get through
all of them.  Not being super knowledgeable about interpreting strace
logs, I wonder whether those 4 seconds are mostly disk (or network)
latency, or whether some large percentage is attributable to CPU and
memory as the files are processed/compiled/interpreted.  My monitoring
of memory and CPU has not revealed anything significant.
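
In case it helps frame the question, here's the kind of strace run I'm
planning to try next (the script path is a placeholder for my real one;
as I understand it, -c prints a per-syscall summary at exit, -w makes
that summary wall-clock time on newer strace versions, and -T appends
the time spent inside each individual call):

  # Summary table: counts, errors, and time per syscall.  With -w the
  # "time" column is wall-clock latency rather than kernel CPU time,
  # which is what matters for NFS round trips.
  strace -f -c -w php /var/www/site/index.php > /dev/null

  # Per-call timing, limited to the metadata syscalls in question; -T
  # appends <seconds> to each line, so slow lstats stand out.
  strace -f -T -e trace=lstat,stat,open php /var/www/site/index.php \
      > /dev/null 2> trace.log
  grep lstat trace.log | sort -t'<' -k2 -rn | head

If the summed lstat time accounts for most of the 4 seconds, that would
point at storage latency; if not, the time is going to the interpreter.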

I have some suspicion that by switching from the network storage to
directly attached SSDs, I will reduce my example app's response time by
2 or more seconds.  And, if this is true, then I would happily spend
that $10k+ and change direction in how I've been managing my network.
However, if the payoff only turns out to be 1 second or less shaved off
the response time, then it's not really worth the investment to me.

How might someone go about getting hard data on such a thing?  Is there
such a thing as an open-source lab where someone like me can come in
and run a real-world test that specifically applies to my particular
situation?  If I were to buy a new car, I'd expect to test drive the
thing.  Well, can I do the same with a $10k+ server investment?  Sadly,
my experience tells me no.  But I figured I'd ask others anyway.
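
The closest thing to a test drive I can think of is copying the document
root to any locally attached disk I already have (even a cheap consumer
SSD in a desktop) and timing the same metadata-heavy walk against both
copies.  Something like this, with placeholder paths, flushing caches
first so the second run isn't just served from RAM:

  # Run as root: flush page/dentry/inode caches so both runs actually
  # hit the storage rather than memory.
  sync; echo 3 > /proc/sys/vm/drop_caches

  # Same stat-heavy workload against the NFS mount and a local copy;
  # the difference in wall time bounds what SSDs could buy me for this
  # access pattern.
  time find /nfs/wordpress -name '*.php' -exec stat {} + > /dev/null
  time find /local/wordpress-copy -name '*.php' -exec stat {} + > /dev/null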

One test that surprised me was mounting ramdisks over 4 of the most
highly accessed folders/files of this web application.  It resulted in
virtually no improvement.  It had me wondering if the lstats still have
to touch the root partition to do their work: even though file reads
might be faster from a ramdisk, perhaps the lstats still run against
the root partition, which is on an NFS share.  Does that make sense to
anyone here who might be in the know?  Anyway, I need to know whether
it's the processing/compiling that is the bottleneck, or the lstats, or
some combination of the two.  I don't want to just guess about it.
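
My rough understanding is that resolving a path looks up every directory
component along the way, so if the parent directories (or the symlinks
pointing into the ramdisk) still live on NFS, each lookup still pays a
network round trip and only the final read comes from RAM.  A test I'm
considering to confirm that, with placeholder paths (GNU stat without
-L doesn't follow symlinks, so it issues lstat underneath, and strace -T
shows the time spent in each call):

  # As root: mount a small tmpfs, copy one hot file in, and compare
  # per-call lstat latency on the two paths.
  mkdir -p /mnt/ramtest
  mount -t tmpfs -o size=64m tmpfs /mnt/ramtest
  cp /nfs/site/wp-load.php /mnt/ramtest/

  strace -T -e trace=lstat stat /nfs/site/wp-load.php 2>&1 | grep lstat
  strace -T -e trace=lstat stat /mnt/ramtest/wp-load.php 2>&1 | grep lstat

If the NFS-path lstat is consistently slower even though it's a single
syscall, that would suggest the kernel's component-by-component lookup
over NFS is where the time goes, and would explain why my ramdisk test
didn't help.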

For the record, I know that I can improve this application's performance
with caching mechanisms.  I've already proven this to be true.  The
problem is that I'm trying to increase performance across the board for
everyone on my servers.  I don't want to force caching on my customers,
as that comes with an entirely different set of problems.
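
That said, one transparent knob I do plan to check first is PHP's
internal realpath cache, which caches resolved paths and has
historically defaulted to a very small size; if a 1,000-file WordPress
request blows through it, PHP repeats lstats it could have skipped.
It isn't page caching, so customers never see it.  Something like this
(the directive names are real php.ini settings, the values are just
illustrative):

  # Check what PHP is currently using.
  php -r 'echo ini_get("realpath_cache_size"), " / ", ini_get("realpath_cache_ttl"), "\n";'

  # Then in php.ini, and reload php-fpm/Apache:
  #   realpath_cache_size = 4M
  #   realpath_cache_ttl  = 300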

Thanks in advance for any advice.  And...  Happy Thanksgiving and Black
Friday.
Richard





