[PLUG] tracking/accounting for memory use
Larry Brigman
larry.brigman at gmail.com
Mon Dec 14 17:59:53 UTC 2009
On Fri, Dec 11, 2009 at 4:13 PM, Russell Senior
<russell at personaltelco.net> wrote:
>
> I've got a problem tracking down where memory is disappearing to on an
> embedded Linux platform. I know the basics of caches and buffers,
> have looked at /proc/meminfo and /proc/slabinfo, and roughly
> understand how slabs work. However, I don't know which numbers in
> /proc/meminfo are supposed to add up to what, in a way that would
> give me clues about where the memory is going.
>
> A little more detail. We've got a bunch of Netgear WGT634U devices
> running a customized OpenWrt scattered around town and we collect data
> from them every 5 minutes or so via SNMP. We have Cacti graphs of
> memory utilization, e.g.:
>
> https://personaltelco.net/graphs/graph.php?action=view&local_graph_id=253&rra_id=all
>
> We are running NoCatAuth (a captive portal system that uses Perl),
> OpenVPN, OLSRd and SNMPd. The WGT634U has 32 meg of RAM. After a
> reset, we usually have about 18 megabytes in free+cache+buffers. Over
> time, that total tends to degrade down to about 14 meg (or less), at
> which time we become more susceptible to running out of memory during
> forking (e.g. the NoCatAuth software forks 10 processes for every
> authorization). Typically, we see the failure in NoCat which causes
> it to die (breaking our node in the process), but sometimes other
> programs die instead. Usually, the system stays running, we get
> alerted, and we can log in and fix it, but with 30-50 of these and a
> failure rate of about once a week per node, we have to fix a few
> every day, which is annoying.
>
> When I have looked, it did not *seem* that our userspace programs were
> growing fast enough to account for the degradation in
> free+cache+buffers, but maybe I was looking at the wrong thing.
> BusyBox ps is only giving me VSZ and not RSS. Can someone suggest a
> robust way of tracking memory usage? I am particularly interested in
> figuring out and accounting for what userspace is using, and what the
> kernel is using, to see what exactly is growing so that I can make it
> stop!
>
> Grateful for any pointers. Thanks!
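The free+cache+buffers total graphed above can be read straight from /proc/meminfo. A minimal sketch in Perl, assuming the usual "Name: value kB" line format; parse_meminfo and free_cache_buffers_kb are hypothetical helper names. It also prints Slab (the kernel's slab allocator total), so a shrinking free+cache+buffers can be set against growing kernel slab use, which /proc/slabinfo then breaks down per cache.

```perl
#!/usr/bin/perl
# Sketch (not from the original thread): log MemFree + Buffers + Cached
# from /proc/meminfo, plus the kernel's Slab total, all in kB.
use strict;
use warnings;

# Parse /proc/meminfo-style text into a hash ref of field name => kB.
# Assumes lines of the form "MemFree:   12345 kB".
sub parse_meminfo {
    my ($text) = @_;
    my %kb;
    for my $line (split /\n/, $text) {
        $kb{$1} = $2 if $line =~ /^([^:]+):\s+(\d+)/;
    }
    return \%kb;
}

# Sum the fields that make up "free+cache+buffers"; missing fields count as 0.
sub free_cache_buffers_kb {
    my ($kb) = @_;
    my $sum = 0;
    $sum += $kb->{$_} // 0 for qw(MemFree Buffers Cached);
    return $sum;
}

# Only report when /proc/meminfo is actually readable (i.e. on Linux).
if (open(my $fh, '<', '/proc/meminfo')) {
    my $kb = parse_meminfo(do { local $/; <$fh> });
    close($fh);
    printf "free+cache+buffers: %d kB\n", free_cache_buffers_kb($kb);
    printf "Slab: %d kB\n", $kb->{Slab} // 0;
}
```

Run periodically alongside the SNMP poll, the two numbers show whether lost free+cache+buffers is reappearing as slab (kernel) or elsewhere.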
I collect information on the memory usage of long-running processes, to
look for memory leaks, by reading /proc/$pid/statm.
Here is a snippet of Perl code that expects an array of process names
to look for and stores the information in an RRD database.
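use strict;
use warnings;
use RRDs;

my @progs = qw(snmpd olsrd openvpn);   # example list of process names
# create_rrd() is a local helper (not shown) that creates $prog.rrd.
# `ps -C ... -o pid=` needs a procps-style ps.
foreach my $prog (@progs) {
    foreach my $prog_pid (`ps -C $prog -o pid=`) {
        next unless $prog_pid =~ /(\d+)/;
        my $file = "/proc/$1/statm";
        # Skip processes that exited after ps ran.
        open(my $fh, '<', $file) or next;
        while (<$fh>) {
            # statm fields, in pages: size resident shared text lib data dt.
            # Record size, resident, shared, text and data; skip lib and dt.
            if (/(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\d+\s+(\d+)/) {
                my $rrd = "$prog.rrd";
                create_rrd($rrd) unless -w $rrd;
                RRDs::update($rrd, "N:$1:$2:$3:$4:$5");
                my $err = RRDs::error;
                die "Error updating rrd file ($rrd): $err\n" if $err;
            }
        }
        close($fh);
    }
}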
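Since BusyBox ps reports only VSZ, the same statm file can also stand in for RSS directly. A minimal sketch, assuming statm's standard seven page-count fields and taking the page size from POSIX sysconf (falling back to 4096 bytes if that fails); parse_statm is a hypothetical helper name. It walks /proc and prints VSZ and RSS in kB per process. Note that "resident" includes shared pages, so summing it over all processes overcounts shared libraries.

```perl
#!/usr/bin/perl
# Sketch: print VSZ and RSS (kB) for every process from /proc/<pid>/statm,
# for systems where ps shows only VSZ. statm values are page counts.
use strict;
use warnings;
use POSIX qw(sysconf _SC_PAGESIZE);

my @STATM_FIELDS = qw(size resident shared text lib data dt);

# Parse one statm line into a hash ref of field name => kB.
# Returns undef if the line has fewer than seven fields.
sub parse_statm {
    my ($line, $page_kb) = @_;
    my @pages = split ' ', $line;
    return unless @pages >= @STATM_FIELDS;
    my %kb;
    @kb{@STATM_FIELDS} = map { $_ * $page_kb } @pages[0 .. $#STATM_FIELDS];
    return \%kb;
}

my $page_kb = (sysconf(_SC_PAGESIZE) || 4096) / 1024;

for my $dir (glob '/proc/[0-9]*') {
    my ($pid) = $dir =~ m{(\d+)$};
    open(my $fh, '<', "$dir/statm") or next;   # process may have exited
    my $line = <$fh>;
    close($fh);
    next unless defined $line;
    my $kb = parse_statm($line, $page_kb) or next;
    printf "%6d VSZ %7d kB  RSS %7d kB\n", $pid, $kb->{size}, $kb->{resident};
}
```

Logging the resident column over time, rather than VSZ, is what shows whether a userspace process is actually holding on to more RAM.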
More information about the PLUG mailing list