[PLUG-TALK] SSD lstat performance questions and seeking hard proof
Richard
plug at hackhawk.net
Mon Nov 28 17:46:22 UTC 2016
Thanks for the response Tomas.
It's good to hear about some real usage scenarios, and to know that
SSDs may not necessarily result in a significant improvement.
Knowing that Linode uses internal SSDs is promising as a test scenario.
I use them for a backup name server, and so far they have been pretty
awesome. It wouldn't take much for me to set up a test web server over
there too.
Thanks again
Richard
On 11/27/2016 8:33 PM, Tom wrote:
> Hi Richard,
>
> Linode instances come with internal SSD storage. Perhaps that would be
> a good target for your experiment. Check it out.
>
> I feel that you do have a lot going on over that NFS share, with 15
> VMs sharing it. I used a similar architecture, and I also felt that a
> web app + DB bottleneck was disk bound. I did a lot of experiments,
> including replacing NFS with internal disks and SAS SSDs.
>
> Unfortunately, in my case it DID NOT lead to a significant enough
> improvement in application response time. Local disks, and especially
> SSDs, made the OSes way more responsive, but not the web application.
> My conclusion at the time was that Linux caching can be quite good at
> speeding up network access to a lot of small files, as long as the
> throughput is small enough - see your RAM disk experiment. Anyway, my
> root cause was DB related. In the end it was resolved via an
> application architecture change influenced by a "real DBA".
>
> Tomas
>
> On Fri, 2016-11-25 at 17:31 -0800, Richard Powell wrote:
>> On 11/25/2016 3:17 PM, Chris Schafer wrote:
>>>
>>> I feel like you aren't completely describing the architecture.
>>>
>>
>> Well, that's true. I was just trying to provide what I thought was
>> most relevant. :-)
>>
>>> It seems like there is some virtualization. A NAS. Networking of
>>> unknown configuration.
>>>
>>
>> Yes. I'm using VMware ESXi 5.5. The primary drives for the VMs are
>> being served up from NFS shares on a 10-spindle RAID 5 array of 15k
>> SAS drives. The shares are served over an internal 1Gb network.
>> There is a 50GB write-cache SSD on that network storage device, but
>> that doesn't help with the reads at all.
>>
>>> 10k is a lot of onboard SSD.
>>>
>>
>> Indeed it is. It's not just the storage, though. That's for a
>> completely new server that includes 256GB of RAM and two 10-core
>> processors (2.4GHz). It includes roughly 9TB of usable SSD storage
>> in RAID 6.
>>
>>> Also this seems like you are doing a lot of things on this array.
>>>
>>
>> Mostly just shared hosting, but spread across multiple VMs. All of
>> the VMs have their primary hard drives on that same storage array;
>> there are approximately 15 VMs being served up from it.
>>
>>> Given that, the mix could have a significant effect. You could
>>> probably test on AWS instances using different storage types before
>>> jumping in.
>>>
>>
>> I'm curious. How could AWS simulate the scenario of having SSDs
>> directly installed on a server running ESXi, and also loading the
>> VMs' files from that same SSD storage? I mean, perhaps I could use
>> an AWS scenario to compare performance to my own. But that wouldn't
>> necessarily tell me how switching to directly connected SSDs will
>> affect my current situation.
>>
>> Thanks for the response.
>> Richard
>>
>>
>>
>>>
>>> On Nov 25, 2016 3:15 PM, "Richard" <plug at hackhawk.net
>>> <mailto:plug at hackhawk.net>> wrote:
>>>> Hello,
>>>>
>>>> I am seeking advice before moving forward with a potentially large
>>>> investment. I don't want to make such a large purchase unless I'm
>>>> absolutely certain it's going to solve what I perceive to be my
>>>> biggest problem right now. I figured there would be a plethora of
>>>> expertise on this list. :-)
>>>>
>>>> I'm considering switching from network storage on NFS shares (SAS
>>>> 15k RAID 5, 10 spindles) to solid-state drives directly connected
>>>> to the server. But alas, the SSDs are extremely expensive, and I'm
>>>> not sure how to go about ensuring they're going to improve things
>>>> for me. I can only surmise that they will.
>>>>
>>>> Here is what I've found by running strace on some of my larger
>>>> web-based PHP applications. As one example, I've got one WordPress
>>>> install that opens 1,000+ PHP files. The strace is showing 6,000+
>>>> lstat operations across all of these files, and it is taking
>>>> roughly 4 seconds to get through all of this. Not being super
>>>> knowledgeable about interpreting strace logs, I do wonder if the 4
>>>> seconds is mostly disk latency, or if some large percentage of it
>>>> is attributable to CPU and memory as the files are
>>>> processed/compiled/interpreted. My monitoring of memory and CPU
>>>> has not revealed anything significant.
>>>>
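>>>> One way to split that 4 seconds apart: strace's -T flag prints the
>>>> time spent inside each syscall, and -c prints a per-syscall
>>>> summary, so the wall-clock time attributable to lstat can be read
>>>> straight out of the trace. As a rough cross-check, here is a
>>>> minimal Python sketch (the default path is just a placeholder)
>>>> that walks a tree and times every lstat itself, so a first and a
>>>> second pass can be compared:
>>>>
>>>> #!/usr/bin/env python3
>>>> # Rough lstat latency probe: walk a tree, time each lstat, print totals.
>>>> # The default path is a placeholder -- point it at the real docroot.
>>>> import os
>>>> import sys
>>>> import time
>>>>
>>>> def time_lstats(root):
>>>>     total = 0.0
>>>>     count = 0
>>>>     for dirpath, dirnames, filenames in os.walk(root):
>>>>         for name in dirnames + filenames:
>>>>             path = os.path.join(dirpath, name)
>>>>             start = time.perf_counter()
>>>>             try:
>>>>                 os.lstat(path)
>>>>             except OSError:
>>>>                 continue
>>>>             total += time.perf_counter() - start
>>>>             count += 1
>>>>     return count, total
>>>>
>>>> if __name__ == "__main__":
>>>>     root = sys.argv[1] if len(sys.argv) > 1 else "/var/www/example-site"
>>>>     for label in ("first pass", "second pass"):
>>>>         count, total = time_lstats(root)
>>>>         print(f"{label}: {count} lstat calls, {total:.3f}s total, "
>>>>               f"{1000 * total / max(count, 1):.3f} ms/call")
>>>>
>>>> If the first (cold-ish cache) pass is slow and the second pass is
>>>> near-instant, the latency is in the storage path rather than in PHP.
>>>>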
>>>> I have some suspicion that by switching from the network storage
>>>> to directly attached SSDs, I will reduce my example app's response
>>>> time by 2 or more seconds. And, if this is true, then I would
>>>> happily spend that $10k+ and switch directions in how I've been
>>>> managing my network. However, if the payoff only turns out to be 1
>>>> second or less shaved off the response time, then it's not really
>>>> worth the investment to me.
>>>>
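>>>> As a purely illustrative back-of-envelope check (the per-call
>>>> latencies here are assumptions, not measurements): 6,000 lstat
>>>> calls at ~0.6 ms each over NFS would be about 3.6 seconds, while
>>>> the same 6,000 calls at ~0.05 ms against local SSDs would be about
>>>> 0.3 seconds. If the measured per-call numbers land in that
>>>> ballpark, a 2-3 second reduction is plausible; if the lstats are
>>>> already being answered from cache in microseconds, the 4 seconds
>>>> is mostly PHP CPU time and the SSDs won't buy much.
>>>>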
>>>> How might someone go about getting hard data on such a thing? Is there
>>>> such a thing as an open source lab available where someone like me can
>>>> come in and run a real world test that specifically applies to my
>>>> particular situation? If I were to buy a new car, I'd expect to test
>>>> drive the thing. Well, can I do the same thing with a $10k+ server
>>>> investment? Sadly my experience tells me no. But I figured I'd ask
>>>> others anyway.
>>>>
>>>> One test that surprised me was when I mounted ramdisks for 4 of
>>>> the most highly accessed folders/files of this web application. It
>>>> resulted in virtually no improvement. It had me wondering whether,
>>>> even though file read performance might be improved by switching
>>>> to a ramdisk, the lstats are still having to run against the root
>>>> partition, which is on an NFS network share. Does that make sense
>>>> to anyone here who might be in the know? Anyway, I need to know
>>>> whether the processing/compiling is the bottleneck, or the lstats
>>>> are the bottleneck, or some combination of the two. I don't want
>>>> to just guess about it.
>>>>
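>>>> For what it's worth, an lstat of a full path makes the kernel
>>>> resolve every parent directory component, and on NFS those lookups
>>>> can trigger attribute revalidation over the wire even when the
>>>> leaf file itself sits on a ramdisk, so the hypothesis is at least
>>>> plausible. A minimal sketch to probe it (the sample path is a
>>>> placeholder; explicit per-component lstats only approximate what
>>>> the kernel does internally):
>>>>
>>>> #!/usr/bin/env python3
>>>> # Time an lstat of each component of a path, to see whether the
>>>> # parent directories (on the NFS root) dominate the lookup cost.
>>>> import os
>>>> import sys
>>>> import time
>>>>
>>>> def component_times(path):
>>>>     parts = os.path.abspath(path).split(os.sep)
>>>>     current = os.sep
>>>>     results = []
>>>>     for part in parts:
>>>>         if not part:
>>>>             continue
>>>>         current = os.path.join(current, part)
>>>>         start = time.perf_counter()
>>>>         os.lstat(current)
>>>>         results.append((current, (time.perf_counter() - start) * 1000))
>>>>     return results
>>>>
>>>> if __name__ == "__main__":
>>>>     target = sys.argv[1] if len(sys.argv) > 1 else "/var/www/example-site/index.php"
>>>>     for p, ms in component_times(target):
>>>>         print(f"{ms:8.3f} ms  {p}")
>>>>
>>>> Running it against one of the hot files, before and after the
>>>> ramdisk mount, should show whether the remaining time is in the
>>>> NFS-backed parents or in PHP itself.
>>>>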
>>>> For the record, I know that I can improve this application's
>>>> performance with caching mechanisms. I've already proven this to
>>>> be true. The problem is that I'm trying to increase performance
>>>> across the board for everyone on my servers. I don't want to force
>>>> caching on my customers, as that comes with an entirely different
>>>> set of problems.
>>>>
>>>> Thanks in advance for any advice. And... Happy Thanksgiving and Black
>>>> Friday.
>>>> Richard
>>>>
>>>>
>>>>
>
>
> _______________________________________________
> PLUG-talk mailing list
> PLUG-talk at lists.pdxlinux.org
> http://lists.pdxlinux.org/mailman/listinfo/plug-talk