[PLUG] tracking/accounting for memory use

Larry Brigman larry.brigman at gmail.com
Mon Dec 14 17:59:53 UTC 2009


On Fri, Dec 11, 2009 at 4:13 PM, Russell Senior
<russell at personaltelco.net> wrote:
>
> I've got a problem tracking down where memory is disappearing to on an
> embedded linux platform.  I know basically about caches and buffers
> and such and have looked at /proc/meminfo and /proc/slabinfo and kind
> of understand about how slabs work.  However, I don't know what
> numbers in /proc/meminfo are supposed to add up to what in a way that
> is going to give me clues where the memory is disappearing to.
>
> A little more detail.  We've got a bunch of Netgear WGT634U devices
> running a customized OpenWrt scattered around town and we collect data
> from them every 5 minutes or so via SNMP.  We have Cacti graphs of
> memory utilization, e.g.:
>
>  https://personaltelco.net/graphs/graph.php?action=view&local_graph_id=253&rra_id=all
>
> We are running NoCatAuth (a captive portal system that uses Perl),
> OpenVPN, OLSRd and SNMPd.  The WGT634U has 32 meg of RAM.  After a
> reset, we usually have about 18 megabytes in free+cache+buffers.  Over
> time, that total tends to degrade down to about 14 meg (or less), at
> which time we become more susceptible to running out of memory during
> forking (e.g. the NoCatAuth software forks 10 processes for every
> authorization).  Typically, we see the failure in NoCat which causes
> it to die (breaking our node in the process), but sometimes other
> programs die instead.  Usually, the system stays running and we get
> alerted and we can log in and fix it, but with 30-50 of these and a
> 1/week failure rate, we have to fix a few every day, which is
> annoying.
>
> When I have looked, it did not *seem* that our userspace programs were
> growing fast enough to account for the degradation in
> free+cache+buffers, but maybe I was looking at the wrong thing.
> Busybox ps is only giving me VSZ and not RSS.  Can someone suggest a
> robust way of tracking memory usage?  I am particularly interested in
> figuring out and accounting for what userspace is using, and what the
> kernel is using, to see what exactly is growing so that I can make it
> stop!
>
> Grateful for any pointers.  Thanks!

I collect memory usage info on long-running processes to look for memory
leaks by using /proc/$pid/statm.
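
For reference, statm is a single line of seven counters, all in pages:
size, resident, shared, text, lib, data and dt (see proc(5)). The second
one is the RSS that Busybox ps does not show. A minimal sketch of turning
one line into kilobytes follows. The parse_statm name and the 4 kB default
page size are my assumptions, not part of the script, so check the page
size on the target.

```perl
use strict;
use warnings;

# Field order in /proc/<pid>/statm; every value is a page count.
my @STATM_FIELDS = qw(size resident shared text lib data dt);

# Parse one statm line into a hash of kilobyte values.
# $page_kb defaults to 4 (an assumption; verify it on the target).
# Returns undef if the line does not have exactly seven fields.
sub parse_statm {
    my ($line, $page_kb) = @_;
    $page_kb = 4 unless defined $page_kb;
    my @pages = split ' ', $line;
    return unless @pages == @STATM_FIELDS;
    my %kb;
    @kb{@STATM_FIELDS} = map { $_ * $page_kb } @pages;
    return \%kb;
}

# Example: report this script's own memory use, if /proc is available.
if (open(my $fh, '<', "/proc/$$/statm")) {
    my $line = <$fh>;
    close($fh);
    my $mem = parse_statm($line);
    printf "size=%d kB resident=%d kB\n", $mem->{size}, $mem->{resident}
        if $mem;
}
```

Graphing resident rather than size is probably the more useful choice for
leak hunting, since size also counts pages that were mapped but never
touched.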

Here is a snippet of Perl code that expects an array of process names
to look for and stores the info in an RRD database.

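use RRDs;

# @progs holds the process names to watch; create_rrd() is not shown here.
foreach my $prog (@progs) {
	foreach my $prog_pid ( `ps -C $prog -o pid=` ) {
	   if ( $prog_pid =~ /(\d+)/ ) {
		my $file = "/proc/$1/statm";
		# The process may have exited since ps ran, so skip it quietly.
		open(my $prog_mem, '<', $file) or next;
		while ( <$prog_mem> ) {
			# statm fields (in pages): size resident shared text lib data dt
			if ( /(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\d+\s+(\d+)/ ) {
			  my $rrd = "$prog.rrd";
			  create_rrd($rrd) if not -w $rrd;
			  RRDs::update($rrd, "N:$1:$2:$3:$4:$5");
			  my $err = RRDs::error;
			  die "Error updating rrd file($rrd): $err\n" if $err;
			}
		}
		close($prog_mem);
	   }
	}
}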
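
One caveat for the OpenWrt boxes: the `ps -C $prog -o pid=` call is
procps syntax, and a Busybox ps may not accept it. A rough sketch of
getting the same PID list straight from /proc is below. The pids_by_name
function and its $proc_root parameter are hypothetical names of mine (the
parameter only exists so it can be pointed at a test directory). Note that
the kernel truncates the name in /proc/<pid>/status to 15 characters.

```perl
use strict;
use warnings;

# Return the PIDs (sorted numerically) whose command name, taken from
# the "Name:" line at the top of /proc/<pid>/status, equals $prog.
# Returns an empty list if $proc_root cannot be read.
sub pids_by_name {
    my ($prog, $proc_root) = @_;
    $proc_root = '/proc' unless defined $proc_root;
    my @pids;
    opendir(my $dh, $proc_root) or return;
    for my $entry (grep { /^\d+$/ } readdir $dh) {
        # A process can exit between readdir and open; just skip it.
        open(my $fh, '<', "$proc_root/$entry/status") or next;
        my $first = <$fh>;
        close($fh);
        next unless defined $first && $first =~ /^Name:\s+(.*)$/;
        push @pids, $entry if $1 eq $prog;
    }
    closedir($dh);
    my @sorted = sort { $a <=> $b } @pids;
    return @sorted;
}
```

With that in place, the inner loop could iterate over
pids_by_name($prog) instead of the ps output.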


