[PLUG-TALK] SSD lstat performance questions and seeking hard proof
Richard
plug at hackhawk.net
Mon Nov 28 17:46:22 UTC 2016
Thanks for the response Tomas.
It's good to hear about some real usage scenarios, and to know that
SSDs may not necessarily result in a significant improvement.
Knowing that Linode uses internal SSDs is promising as a test scenario.
I use them for a backup name server, and so far they have been pretty
awesome. It wouldn't take much for me to set up a test web server over
there too.
Thanks again
Richard
On 11/27/2016 8:33 PM, Tom wrote:
> Hi Richard,
>
> Linode instances come with internal SSD storage. Perhaps that would be
> a good target for your experiment. Check it out.
>
> I feel that you do have a lot going on over that NFS share, with 15
> VMs sharing it. I used a similar architecture, and I also felt that a
> web app + DB bottleneck was disk bound. I did a lot of experiments,
> including replacing NFS with internal disks and SAS SSDs.
>
> Unfortunately, in my case it DID NOT lead to a significant enough
> improvement in application response time. Local disks, and especially
> SSDs, made the OSes way more responsive, but not the web application.
> My conclusion at the time was that Linux caching can be quite good at
> speeding up network access to a lot of small files, as long as the
> throughput is small enough - see your RAM disk experiment. Anyway, my
> root cause was DB related. In the end it was resolved via an
> application architecture change influenced by a "real DBA".
>
> Tomas
>
> On Fri, 2016-11-25 at 17:31 -0800, Richard Powell wrote:
>> On 11/25/2016 3:17 PM, Chris Schafer wrote:
>>>
>>> I feel like you aren't completely describing the architecture.
>>>
>>
>> Well, that's true. I was just trying to provide what I thought was
>> most relevant. :-)
>>
>>> It seems like there is some virtualization. A NAS. Networking of
>>> unknown configuration.
>>>
>>
>> Yes. I'm using VMware ESXi 5.5. The primary drives for the VMs are
>> being served up from NFS shares on a 10-spindle RAID 5 array of 15k
>> SAS drives. The shares are served over an internal 1Gb network.
>> There is a 50GB write-cache SSD on that network storage device, but
>> that doesn't help with the reads at all.
>>
>>> 10k is a lot of onboard SSD.
>>>
>>
>> Indeed it is. It's not just the storage, though. That's for a
>> completely new server that includes 256GB of RAM and two 10-core
>> processors (2.4GHz). It includes roughly 9TB of usable SSD storage
>> in RAID 6.
>>
>>> Also this seems like you are doing a lot of things on this array.
>>>
>>
>> Mostly just shared hosting, but spread across multiple VMs. All of
>> the VMs have their primary hard drives on that same storage array;
>> there are approximately 15 VMs being served up from it.
>>
>>> Given that, the mix could have a significant effect. You could
>>> probably test on AWS instances using different storage types before
>>> jumping in.
>>>
>>
>> I'm curious. How could AWS simulate the scenario of having SSDs
>> directly installed on a server running ESXi, and also loading the
>> VMs' files from that same SSD storage? I mean, perhaps I could use
>> an AWS scenario to compare performance to my own. But that wouldn't
>> necessarily tell me how switching to directly connected SSDs will
>> affect my current situation.
>>
>> Thanks for the response.
>> Richard
>>
>>
>>
>>>
>>> On Nov 25, 2016 3:15 PM, "Richard" <plug at hackhawk.net
>>> <mailto:plug at hackhawk.net>> wrote:
>>>> Hello,
>>>>
>>>> I am seeking advice before moving forward with a potentially large
>>>> investment. I don't want to make such a large purchase unless I'm
>>>> absolutely certain it's going to solve what I perceive to be my
>>>> biggest problem right now. I figured there would be a plethora of
>>>> expertise on this list. :-)
>>>>
>>>> I'm considering switching from network storage on NFS shares (SAS
>>>> 15k RAID 5, 10 spindles) to solid-state drives directly connected
>>>> to the server. But alas, the SSDs are extremely expensive, and I'm
>>>> not sure how to go about ensuring they're going to improve things
>>>> for me. I can only surmise that they will.
>>>>
>>>> Here is what I've found by running strace on some of my larger
>>>> web-based PHP applications. As one example, I've got one WordPress
>>>> install that opens 1,000+ PHP files. The strace is showing 6,000+
>>>> lstat operations across all of these files, and it is taking
>>>> roughly 4 seconds to get through all of this. Not being super
>>>> knowledgeable about interpreting strace logs, I do wonder if the 4
>>>> seconds is mostly disk latency, or if some large percentage of it
>>>> is attributable to CPU and memory as the files are
>>>> processed/compiled/interpreted. My monitoring of memory and CPU
>>>> has not revealed anything significant.
>>>>
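>>>> One way to split that 4 seconds apart: strace's -T flag prints the
>>>> time spent inside each syscall, and -c prints a per-syscall
>>>> summary, so the wall-clock time attributable to lstat can be read
>>>> straight out of the trace. As a rough cross-check, here is a
>>>> minimal Python sketch (the default path is just a placeholder)
>>>> that walks a tree and times every lstat itself, so a first and a
>>>> second pass can be compared:
>>>>
>>>> #!/usr/bin/env python3
>>>> # Rough lstat latency probe: walk a tree, time each lstat, print totals.
>>>> # The default path is a placeholder -- point it at the real docroot.
>>>> import os
>>>> import sys
>>>> import time
>>>>
>>>> def time_lstats(root):
>>>>     total = 0.0
>>>>     count = 0
>>>>     for dirpath, dirnames, filenames in os.walk(root):
>>>>         for name in dirnames + filenames:
>>>>             path = os.path.join(dirpath, name)
>>>>             start = time.perf_counter()
>>>>             try:
>>>>                 os.lstat(path)
>>>>             except OSError:
>>>>                 continue
>>>>             total += time.perf_counter() - start
>>>>             count += 1
>>>>     return count, total
>>>>
>>>> if __name__ == "__main__":
>>>>     root = sys.argv[1] if len(sys.argv) > 1 else "/var/www/example-site"
>>>>     for label in ("first pass", "second pass"):
>>>>         count, total = time_lstats(root)
>>>>         print(f"{label}: {count} lstat calls, {total:.3f}s total, "
>>>>               f"{1000 * total / max(count, 1):.3f} ms/call")
>>>>
>>>> If the first (cold-ish cache) pass is slow and the second pass is
>>>> near-instant, the latency is in the storage path rather than in PHP.
>>>>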
>>>> I have some suspicion that by switching from the network storage
>>>> to directly attached SSDs, I will reduce my example app's response
>>>> time by 2 or more seconds. And, if this is true, then I would
>>>> happily spend that $10k+ and switch directions in how I've been
>>>> managing my network. However, if the payoff only turns out to be 1
>>>> second or less shaved off the response time, then it's not really
>>>> worth the investment to me.
>>>>
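>>>> As a purely illustrative back-of-envelope check (the per-call
>>>> latencies here are assumptions, not measurements): 6,000 lstat
>>>> calls at ~0.6 ms each over NFS would be about 3.6 seconds, while
>>>> the same 6,000 calls at ~0.05 ms against local SSDs would be about
>>>> 0.3 seconds. If the measured per-call numbers land in that
>>>> ballpark, a 2-3 second reduction is plausible; if the lstats are
>>>> already being answered from cache in microseconds, the 4 seconds
>>>> is mostly PHP CPU time and the SSDs won't buy much.
>>>>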
>>>> How might someone go about getting hard data on such a thing? Is there
>>>> such a thing as an open source lab available where someone like me can
>>>> come in and run a real world test that specifically applies to my
>>>> particular situation? If I were to buy a new car, I'd expect to test
>>>> drive the thing. Well, can I do the same thing with a $10k+ server
>>>> investment? Sadly my experience tells me no. But I figured I'd ask
>>>> others anyway.
>>>>
>>>> One test that surprised me was when I mounted ramdisks for 4 of
>>>> the most highly accessed folders/files of this web application. It
>>>> resulted in virtually no improvement. It had me wondering whether,
>>>> even though file read performance might be improved by switching
>>>> to a ramdisk, the lstats are still having to run against the root
>>>> partition, which is on an NFS network share. Does that make sense
>>>> to anyone here who might be in the know? Anyway, I need to know
>>>> whether the processing/compiling is the bottleneck, or the lstats
>>>> are the bottleneck, or some combination of the two. I don't want
>>>> to just guess about it.
>>>>
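>>>> For what it's worth, an lstat of a full path makes the kernel
>>>> resolve every parent directory component, and on NFS those lookups
>>>> can trigger attribute revalidation over the wire even when the
>>>> leaf file itself sits on a ramdisk, so the hypothesis is at least
>>>> plausible. A minimal sketch to probe it (the sample path is a
>>>> placeholder; explicit per-component lstats only approximate what
>>>> the kernel does internally):
>>>>
>>>> #!/usr/bin/env python3
>>>> # Time an lstat of each component of a path, to see whether the
>>>> # parent directories (on the NFS root) dominate the lookup cost.
>>>> import os
>>>> import sys
>>>> import time
>>>>
>>>> def component_times(path):
>>>>     parts = os.path.abspath(path).split(os.sep)
>>>>     current = os.sep
>>>>     results = []
>>>>     for part in parts:
>>>>         if not part:
>>>>             continue
>>>>         current = os.path.join(current, part)
>>>>         start = time.perf_counter()
>>>>         os.lstat(current)
>>>>         results.append((current, (time.perf_counter() - start) * 1000))
>>>>     return results
>>>>
>>>> if __name__ == "__main__":
>>>>     target = sys.argv[1] if len(sys.argv) > 1 else "/var/www/example-site/index.php"
>>>>     for p, ms in component_times(target):
>>>>         print(f"{ms:8.3f} ms  {p}")
>>>>
>>>> Running it against one of the hot files, before and after the
>>>> ramdisk mount, should show whether the remaining time is in the
>>>> NFS-backed parents or in PHP itself.
>>>>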
>>>> For the record, I know that I can improve this application's
>>>> performance with caching mechanisms. I've already proven this to
>>>> be true. The problem is that I'm trying to increase performance
>>>> across the board for everyone on my servers. I don't want to force
>>>> caching on my customers, as that comes with an entirely different
>>>> set of problems.
>>>>
>>>> Thanks in advance for any advice. And... Happy Thanksgiving and Black
>>>> Friday.
>>>> Richard
>>>>
>>>>
>>>>
>
>
> _______________________________________________
> PLUG-talk mailing list
> PLUG-talk at lists.pdxlinux.org
> http://lists.pdxlinux.org/mailman/listinfo/plug-talk