[PLUG-TALK] SSD lstat performance questions and seeking hard proof

Richard plug at hackhawk.net
Tue Nov 29 20:14:52 UTC 2016


For anyone who is interested: average load times (from StatusCake) for
three configurations, 1-day averages.

Network Storage (NFS:SAS:10k) : 4.68 seconds
Direct Attached SATA 7.2k     : 3.76 seconds
Linode.com (SSD)              : 2.11 seconds

Now, since the Linode configuration is not on my own equipment, I can't
know for certain how much is attributable to the SSDs and how much
might be attributable to other areas like network latency or CPU/memory
differences.  But all configurations are pretty similar, with the
exception of the storage types used.  Also, that NFS solution is
competing with nearly two dozen other VMs, so perhaps the performance
would be better if it were the only VM using that network storage.  But
then, wouldn't it also be likely that linode.com is sharing the SSD
drives with other VMs?  It's not like they're giving me dedicated
physical SSDs all to myself.
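
One way I can think of to take the CPU and network differences out of
the picture is to run the exact same metadata microbenchmark on all
three boxes against a copy of the same document root.  Here is a rough,
untested sketch in Python; the /var/www/example path is just a stand-in
for wherever the site actually lives:

    # Rough lstat/read microbenchmark to compare the storage back ends
    # directly.  Run the same script on each box against a copy of the
    # same document root.
    import os
    import sys
    import time

    docroot = sys.argv[1] if len(sys.argv) > 1 else "/var/www/example"

    # Collect every file under the document root, roughly the set of
    # files the WordPress install ends up touching.
    paths = []
    for root, dirs, files in os.walk(docroot):
        for name in files:
            paths.append(os.path.join(root, name))

    # Metadata latency only: lstat() each file.
    start = time.perf_counter()
    for p in paths:
        try:
            os.lstat(p)
        except OSError:
            pass
    lstat_secs = time.perf_counter() - start

    # Data latency and throughput: read each file in full.
    start = time.perf_counter()
    for p in paths:
        try:
            with open(p, "rb") as f:
                f.read()
        except OSError:
            pass
    read_secs = time.perf_counter() - start

    print("%d files" % len(paths))
    print("lstat total: %.3fs (%.0f us/call)"
          % (lstat_secs, lstat_secs * 1e6 / max(len(paths), 1)))
    print("read  total: %.3fs" % read_secs)

Running it twice on each box should also show how much the caches are
hiding; on my own hardware I can drop the page cache between runs
(echo 3 > /proc/sys/vm/drop_caches, as root) to see the cold numbers.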

The Linode/SSD solution has consistently loaded in half the time of the
NFS mount points.

So far I'm leaning toward investing in at least some SSDs.  I'm
thinking the potential performance and stability gains for me and my
customers will be well worth the investment.  In combination with
caching, I think I can get my average load time down below 1 second,
at least for the example website that I'm using.

Richard

On 11/25/2016 2:11 PM, Richard wrote:
> Hello,
>
> I am seeking advice before moving forward with a potentially large
> investment.  I don't want to make such a large purchase unless I'm
> absolutely certain it's going to solve what I perceive to be my
> biggest problem right now.  I figured there would be a plethora of
> expertise on this list.  :-)
>
> I'm considering switching from network storage of NFS shares (SAS 15k
> RAID 5, 10 spindles) to solid state drives directly connected to the
> server.  But alas, the SSDs are extremely expensive, and I'm not sure
> how to go about ensuring they're going to improve things for me.  I
> can only surmise that they will.
>
> Here is what I've found by running strace on some of my larger
> web-based PHP applications.  As one example, I've got one WordPress
> install that opens 1,000+ PHP files.  The strace is showing 6,000+
> lstat operations across all of these files, and it is taking roughly
> 4 seconds to get through all of this.  Not being super knowledgeable
> about interpreting strace logs, I do wonder if the 4 seconds is mostly
> related to disk latency, or if some large percentage of it is
> attributable to CPU and memory as the files are
> processed/compiled/interpreted.  My monitoring of memory and CPU has
> not revealed anything significant.
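>
> One thing I may try next is re-running the trace with strace -f -T -o
> trace.log (the -T flag makes strace append the time spent inside each
> call), and then totaling up the syscall time with a small script.
> Rough, untested Python sketch:
>
>     # Tally time spent inside syscalls from an strace log captured
>     # with something like:  strace -f -T -o trace.log php index.php
>     # -T appends the time spent in each call, e.g. "... = 0 <0.000213>"
>     import re
>     import sys
>     from collections import defaultdict
>
>     log = sys.argv[1] if len(sys.argv) > 1 else "trace.log"
>
>     # optional pid prefix, syscall name, args, trailing <seconds>
>     line_re = re.compile(r"^(?:\d+\s+)?(\w+)\(.*<([\d.]+)>\s*$")
>
>     totals = defaultdict(float)
>     counts = defaultdict(int)
>     with open(log) as f:
>         for line in f:
>             m = line_re.match(line)
>             if not m:
>                 continue  # signals, unfinished/resumed lines, etc.
>             name, secs = m.group(1), float(m.group(2))
>             totals[name] += secs
>             counts[name] += 1
>
>     print("time inside syscalls: %.3fs" % sum(totals.values()))
>     for name in sorted(totals, key=totals.get, reverse=True)[:10]:
>         print("  %-12s %6d calls  %10.3fs"
>               % (name, counts[name], totals[name]))
>
> If the lstat/open totals account for most of the ~4 seconds, the
> storage is the bottleneck; if the syscall time is small, the time is
> going into the PHP interpreter itself and faster disks won't help
> much.  strace -c -f gives a similar per-syscall summary without the
> script.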
>
> I have some suspicion that by switching from the network storage to
> directly attached SSDs, I will reduce my example app's response time
> by 2 or more seconds.  And if this is true, then I would happily spend
> that $10k+ and switch directions in how I've been managing my network.
> However, if the payoff only turns out to be 1 second or less shaved
> off the response time, then it's not really worth the investment to me.
>
> How might someone go about getting hard data on such a thing?  Is there
> such a thing as an open-source lab where someone like me can come in
> and run a real-world test that specifically applies to my particular
> situation?  If I were to buy a new car, I'd expect to test drive the
> thing.  Well, can I do the same thing with a $10k+ server investment?
> Sadly, my experience tells me no.  But I figured I'd ask others anyway.
>
> One test that surprised me was when I mounted ramdisks for 4 of the
> most highly accessed folders/files of this web application.  It
> resulted in virtually no improvement.  It had me wondering if the
> lstats are still having to access the root partition for their work:
> even though file read performance might be improved by switching to a
> ramdisk, perhaps the lstats still have to run against the root
> partition, which is on an NFS network share.  Does that make sense to
> anyone here who might be in the know?  Anyway, I need to know whether
> it's the processing/compiling that is the bottleneck, or the lstats,
> or some combination of the two.  I don't want to just guess about it.
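>
> To test that hunch, I could time an lstat() on each component of a
> path that ends inside the ramdisk and see where the microseconds go.
> Path resolution walks every directory along the way, and those parent
> directories still live on the NFS mount.  Rough, untested Python
> sketch (the sample path is made up):
>
>     # Time lstat() on every prefix of a sample path to see whether the
>     # directory components that still live on the NFS root are where
>     # the time goes, even when the leaf directory sits on a ramdisk.
>     import os
>     import sys
>     import time
>
>     sample = sys.argv[1] if len(sys.argv) > 1 \
>         else "/var/www/example/wp-includes/version.php"
>
>     prefix = ""
>     for part in sample.strip("/").split("/"):
>         prefix += "/" + part
>         start = time.perf_counter()
>         try:
>             os.lstat(prefix)
>         except OSError as err:
>             print("%-60s (%s)" % (prefix, err))
>             break
>         took = (time.perf_counter() - start) * 1e6
>         print("%-60s %8.1f us" % (prefix, took))
>
> If the NFS-backed components dominate, that would explain why the
> ramdisk made no difference.  Repeated runs should also show how much
> the NFS attribute cache is hiding on the warm path.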
>
> For the record, I know that I can improve this application's
> performance with caching mechanisms.  I've already proven this to be
> true.  The problem is that I'm trying to increase performance across
> the board for everyone on my servers.  I don't want to force caching
> on my customers, as that comes with an entirely different set of
> problems.
>
> Thanks in advance for any advice.  And...  Happy Thanksgiving and Black
> Friday.
> Richard
>
>
>
> _______________________________________________
> PLUG-talk mailing list
> PLUG-talk at lists.pdxlinux.org
> http://lists.pdxlinux.org/mailman/listinfo/plug-talk
