Finding memory issues in PHP programs - Part 2

22‑02‑2019 Tom den Braber 8 min.

The cause of a memory issue is often hard to find. In the previous post, we looked at two methods for finding the culprits of a memory issue. In case you missed it, check it out!

In this post, we will look at another tool for finding memory leaks: php-memprof, created by Arnaud Le Blanc.

Another profiler... why?

Each profiler has its own characteristics. In the previous post, we saw that the Xdebug profiler generates a 'cumulative memory profile': it shows the sum of all the allocations made by a certain function over the lifetime of the program. This makes the generated profile sometimes hard to interpret. With the profiles generated by Xdebug, you cannot see if the memory is already released at the time the profile is generated.  However, such a situation could very well indicate a memory leak. php-memprof is made to fill in this gap. It provides information which can be used to find whether a function 'leaks' memory.

Setting up php-memprof

The installation of php-memprof can be done using PECL (pecl install memprof), or manually. After downloading the source, make sure you have libJudy installed and run the following commands in the source directory.

phpize
./configure
make
make install

Make sure to load the extension (set extension=memprof.so in your php.ini or run your script with php -dextension=memprof.so my_script.php). For my own analyses, I use a Docker image that has the extension already installed and loaded by default.

Note that running Xdebug and php-memprof alongside will not work. To be able to run php-memprof, Xdebug has to be disabled.

Running example

As our running example, we will use a simple array-based cache implementation. The store function is an excellent example of (valid) code that keeps memory allocated, even after its termination.

<?php 
class ArrayBasedCache {
	private $cache = [];

	public function store(string $key, $data) {
		$this->cache[ $key ] = $data;
	}

	public function has(string $key) {
		return isset($this->cache[ $key ]);
	}

	public function get(string $key) {
		if ($this->has($key) === false) {
			throw new LogicException("Tried to get '$key' from cache, but it does not exist. has(\$key) should be checked first!");
		}
		return $this->cache[ $key ];
	}

	public function clear() {
		$this->cache = [];
	}
}

Using php-memprof

To signal php-memprof that it should start profiling, the memprof_enable() function has to be called. After the call to this function, php-memprof starts tracking memory allocations. This implies that memory allocated before memprof_enable() was called will not be taken into account.

The information that php-memprof gathered can be dumped using one of the following functions:

  • memprof_dump_array() dumps the information in an array format
  • memprof_dump_callgrind(resource $stream) dumps the information to the given $stream in the callgrind format, which can be analysed using KCachegrind or qCachegrind (as shown in the previous post)
  • memprof_dump_pprof(resource $stream) dumps the information to the given $stream in the pprof format

Let's run the following code:

<?php 
$cache = new ArrayBasedCache;

memprof_enable();
printf("A:\n%s\n", json_encode(memprof_dump_array(), JSON_PRETTY_PRINT));

$cache->store("a", 123);

printf("B:\n%s\n", json_encode(memprof_dump_array(), JSON_PRETTY_PRINT));
memprof_disable();

The JSON-encoded output of this script is as follows:

A:

{
    "memory_size": 0,
    "blocks_count": 0,
    "memory_size_inclusive": 0,
    "blocks_count_inclusive": 0,
    "calls": 1,
    "called_functions": []
}

B:

{
    "memory_size": 103,
    "blocks_count": 3,
    "memory_size_inclusive": 479,
    "blocks_count_inclusive": 5,
    "calls": 1,
    "called_functions": {
        "ArrayBasedCache::store": {
            "memory_size": 376,
            "blocks_count": 2,
            "memory_size_inclusive": 376,
            "blocks_count_inclusive": 2,
            "calls": 1,
            "called_functions": []
        }
    }
}

The first thing to note is that the dumped structure is recursive. The top-level fields indicate the data from the main program, that is the code that is executed outside any function.
In the called_functions field, we can see the callees from the overarching context. For each of these called functions, the same fields are present again.
Let's go over all the fields that we see in the output to find out what they mean:

  • memory_size: the number of bytes allocated from the current context;
  • blocks_count: the number of memory blocks allocated from the current context;
  • memory_size_inclusive: the number of bytes allocated by the function itself plus all the memory that was allocated by any of the callees of the current context;
  • blocks_count_inclusive: same as for memory_size_inclusive, but expressed in blocks;
  • calls: the number of times this function was called from the overarching context;
  • called_functions: a dictionary with the keys being function-names and the fields being all the fields that are in this table. This field represents the call-stack.

We can see that the A-dump does not contain information about the creation of the ArrayBasedCache itself: this call happened before the enabling of php-memprof and is therefore not taken into consideration. As the A-dump is generated immediately after enabling php-memprof, we see that the dump has no information about any allocated memory.
The B-dump shows that ArrayBasedCache::store was called once from the 'main' program. We can also see that ArrayBasedCache::store has a memory_size of 376 bytes, eventhough its execution was already terminated by the time the B-dump was generated. These bytes are still in memory because the store method added data to the local $cache field, and did not remove it.

Let's see another example:

<?php 
$cache = new ArrayBasedCache;

memprof_enable();

$cache->store("a", 123);
$cache->clear();

json_encode(memprof_dump_array(), JSON_PRETTY_PRINT);
memprof_disable();

and its output:

{
    "memory_size": 103,
    "blocks_count": 3,
    "memory_size_inclusive": 103,
    "blocks_count_inclusive": 3,
    "calls": 1,
    "called_functions": {
        "ArrayBasedCache::store": {
            "memory_size": 0,
            "blocks_count": 0,
            "memory_size_inclusive": 0,
            "blocks_count_inclusive": 0,
            "calls": 1,
            "called_functions": []
        },
        "ArrayBasedCache::clear": {
            "memory_size": 0,
            "blocks_count": 0,
            "memory_size_inclusive": 0,
            "blocks_count_inclusive": 0,
            "calls": 1,
            "called_functions": []
        }
    }
}

After calling $cache->clear(), all items in the cache are removed. The allocation that was done by $cache->store(...) is thereby deleted, and we can see in the dump that there is indeed no memory allocated by ArrayBasedCache::store at the time the dump is made.

As you can imagine, the array output will become very large for larger programs. For small, isolated scripts the memory_dump_array works fine. For larger programs, I find it way more useful to look at a visualisation of the output of the memory_dump_callgrind function.

Wrapping up

In this post, we looked at another profiler, which approaches memory profiling from a different angle as compared to the Xdebug profiler we looked at last time. This two-part series is meant to help you find the causes of memory issues faster and with a more structured process. Several techniques, like logging memory usage, using profilers, and profile analysis tools have been discussed. Do you miss an important technique that should have been discussed in this series? Let us know!


 

Photo by Daan Mooij on Unsplash

php memory leak memory profiling learning

Deel deze blog