Finding memory issues in PHP programs

11‑10‑2018 Tom den Braber 8 min.

"Fatal error: Allowed memory size of 2097152 bytes exhausted (tried to allocate 528384 bytes)."

If this error sounds familiar, this post is for you. The problem with this message is that it tells you very little: it does not say where all that memory was allocated. Locating the places where a lot of memory is consumed in a large and complex system is not easy. Luckily, there are tools that can help you find the problematic code. In this post, we will cover two methods for finding places in your program where a lot of memory is allocated.

Running example

We will be using the following code as our running example. The purpose of the code is finding Nemo. There are two functions: one that reads a file in which Nemo could be located, and another that tries to find a line whose content is equal to 'nemo'. The problem with this snippet is that it sometimes consumes too much memory: not always, but with certain files the program crashes.

<?php
function fetch_data_from_file(string $file_path) : iterable {
	$resource = fopen($file_path, 'r');
	$lines = [];
	while (($line = fgets($resource)) !== false) {
		$lines[] = trim($line);
	}
	return $lines;
}

function finding_nemo(string $filepath) : int {
	$lines = fetch_data_from_file($filepath);
	foreach ($lines as $line_number => $line) {
		if ($line === "nemo") {
			return $line_number;
		}
	}
	return -1; //nemo not found
}

finding_nemo("the_sea.txt");

None of the techniques and tools described here are needed to solve this particular issue (can you already spot the problem?), but the example lets us see how they work in practice.

memory_get_usage()

PHP has two functions which can tell you something about the memory usage of your program: memory_get_usage and memory_get_peak_usage.
memory_get_usage tells you how much memory is in use at the moment of the function call. memory_get_peak_usage returns the maximum number of bytes allocated by the program up to the function call. Both functions take one optional boolean argument, $real_usage, which defaults to false. If $real_usage is set to true, the function returns the total amount of memory that PHP has actually obtained from the operating system, some of which might not (yet) be in use by your program. If it is set to false, it returns only the number of bytes that are actually in use by the program. Because memory is requested from the operating system in blocks, which are not fully used all of the time, the following always holds: memory_get_usage(true) >= memory_get_usage(false).
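To make the difference between the two modes concrete, here is a minimal sketch. The variable names and the size of the array are arbitrary choices for this illustration, and the exact numbers printed depend on your PHP version and platform.

```php
<?php
// Sketch: compare both modes of memory_get_usage() around one allocation.
$usedBefore = memory_get_usage(false); // bytes in use by the script
$realBefore = memory_get_usage(true);  // bytes obtained from the operating system

$data = range(1, 100000);              // allocate a sizeable array

$usedAfter = memory_get_usage(false);
$realAfter = memory_get_usage(true);

printf("in use:    %d -> %d bytes\n", $usedBefore, $usedAfter);
printf("allocated: %d -> %d bytes\n", $realBefore, $realAfter);
printf("peak:      %d bytes\n", memory_get_peak_usage(false));
```

The "allocated" figures typically move in larger steps than the "in use" figures, because PHP requests memory from the operating system in blocks.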

An advantage of using these functions is that they are really easy to use. One of the possible ways of finding your memory leak is scattering calls to memory_get_usage all over your code, and logging its output. You can then try to find a pattern: where does the memory usage increase?
A drawback of these functions is that their use is limited: they do not show which functions or classes are using all that memory.

Let's use these functions to get an idea of where our current problem might reside. In the example below, I use marker characters like 'A', 'B', etc. to be able to track a log entry back to a location in the code. Another option is to include 'magic constants' like __FILE__ and __LINE__ in your log output.

<?php
function fetch_data_from_file(string $file_path) : iterable {
	error_log(sprintf("A: %d bytes used\n", memory_get_usage()));
	/** original code... **/
	error_log(sprintf("B: %d bytes used\n", memory_get_usage()));
	return $lines;
}

function finding_nemo(string $filepath) : int {
	error_log(sprintf("C: %d bytes used\n", memory_get_usage()));
	/** original code... **/
	error_log(sprintf("D: %d bytes used\n", memory_get_usage()));
	return -1; //nemo not found
}

finding_nemo("the_sea.txt");

When we run our example now, we have the following log output:

C: 406912 bytes used
A: 406912 bytes used
B: 3599952 bytes used
D: 3591384 bytes used

That's interesting: up to marker A, there is no problem. Between markers A and B, the memory usage suddenly increases. These markers correspond to the start and end of the fetch_data_from_file function, so that function is our prime suspect. Let's try to confirm this hypothesis using another technique.
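A side note on the markers before moving on: instead of hand-picked letters, the log line can carry the file and line number of the call site. The sketch below is one possible way to do that. The helper name memory_marker is hypothetical and not part of the original example; it uses debug_backtrace so that the call site does not have to pass __FILE__ and __LINE__ itself.

```php
<?php
// Hypothetical helper (not from the original example): formats the current
// memory usage together with the file and line of the caller.
function memory_marker(string $label = ''): string {
    // Frame 0 describes the call to memory_marker() itself, so its 'file'
    // and 'line' entries point at the place where the helper was invoked.
    $frame = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 1)[0];

    return sprintf(
        "%s%s:%d: %d bytes used",
        $label === '' ? '' : $label . ' ',
        basename($frame['file']),
        $frame['line'],
        memory_get_usage()
    );
}

error_log(memory_marker('before'));
$buffer = str_repeat('x', 1024 * 1024); // allocate roughly 1 MiB
error_log(memory_marker('after'));
```

Each log entry can then be traced back to its location without maintaining a list of marker letters.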

Xdebug profiler

As a PHP programmer, you have probably heard of (and used) Xdebug. If you haven't, check it out and make sure to install it. What you might not know is that it also comes with a profiler: a tool which provides insight into the run-time behaviour of a program. This profiler is much more sophisticated than the PHP functions mentioned earlier: instead of only telling you how much memory is used, it also shows which functions are actually allocating memory. This is an advantage over the previous technique: if you have no clue where to look for your memory problem, you would otherwise have to scatter a huge number of calls to memory_get_usage all over your codebase. Before you can use the profiler, some things need to be configured in your php.ini. Note that most of these options cannot be set at run time using ini_set.
First, you have to enable the profiler. This can be done in two ways: either by using xdebug.profiler_enable = 1 or by using xdebug.profiler_enable_trigger = 1. When using the first option, a profile is generated for every run of your program. The second option only creates a profile if a GET/POST variable or cookie with the name XDEBUG_PROFILE is set. You also have to tell Xdebug where to store the generated files, using xdebug.profiler_output_dir.
There are more things to configure, but with these settings you are already good to go.
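As a rough sketch, the relevant part of a php.ini could look like the fragment below. The output directory is an assumed path, so pick one that exists and is writable on your system. In the Xdebug 2 documentation the trigger setting is spelled xdebug.profiler_enable_trigger and the trigger variable XDEBUG_PROFILE. Xdebug 3 renamed these settings; the assumed equivalents are listed as comments.

```ini
; Xdebug 2.x: only profile requests that carry the XDEBUG_PROFILE trigger
xdebug.profiler_enable_trigger = 1
; Assumed path for this sketch
xdebug.profiler_output_dir = /tmp/xdebug-profiles

; Xdebug 3.x equivalents (settings were renamed):
; xdebug.mode = profile
; xdebug.start_with_request = trigger
; xdebug.output_dir = /tmp/xdebug-profiles
```

For a command-line script such as our example, passing the settings with php -d (for instance -d xdebug.profiler_enable=1) should also work, without touching php.ini.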

Now, run the script again with the Xdebug profiler enabled. If we look into our configured output directory, we can find the generated profile there. However, before we can open it, we need another tool: qCachegrind for Windows or kCachegrind for Linux. I will use qCachegrind for now.

When opening the profile with qCachegrind you will see something like the picture below.

Make sure you select 'Memory' in the dropdown menu at the top of the window, as opposed to 'Time' (this option can be useful if performance is an issue).
When looking at the 'callee map' of the {main} entry in the function list, you can see by the size of the blocks how the called functions have allocated memory. The larger blocks are the most interesting: these are the functions that allocate the most memory. Each called function is located inside the caller in the callee map.
In the 'Flat Profile' section on the left, you can see a list of functions. For each function, there is an 'Incl.' and a 'Self' column. 'Incl.' indicates the amount of memory allocated by this function, including all the memory which is allocated by its callees. 'Self' shows the memory which is allocated by the function itself.
The most interesting functions to look at are those with a relatively high value in the 'Self' column.
As we can see, there are two functions which take up a lot of memory themselves: php:fgets and php:trim. But wait... trim() only trims one line at a time, and fgets only reads one line at a time, right? Why are these functions using so much memory? Here we arrive at one of the drawbacks of the Xdebug profiler: it generates a 'cumulative memory profile', i.e. when a function is called multiple times, it shows the sum of all the memory that was used across those calls.
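The difference between a cumulative figure and the actual peak can be illustrated with a small sketch. This is not how Xdebug measures memory internally; it only mimics the idea of summing per-call allocations. The function name and the sizes are made up for the illustration.

```php
<?php
// Illustration only: sums the memory allocated inside each call and
// compares that sum with how much the real peak usage grew.
function allocate_and_discard(int $bytes): int {
    $before = memory_get_usage();
    $chunk = str_repeat('x', $bytes);          // allocated inside this call
    $allocated = memory_get_usage() - $before;
    return $allocated;                         // $chunk is freed on return
}

$peakBefore = memory_get_peak_usage();
$cumulative = 0;
for ($i = 0; $i < 100; $i++) {
    $cumulative += allocate_and_discard(100 * 1024);
}
$peakGrowth = memory_get_peak_usage() - $peakBefore;

printf("sum over all calls: %d bytes\n", $cumulative);
printf("growth of the peak: %d bytes\n", $peakGrowth);
```

The sum over all calls comes out far larger than the growth of the peak, because each chunk is released before the next one is allocated. A cumulative profile shows the first kind of number.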

Although the Xdebug profile has its drawbacks, it enables us to see where (potentially) a lot of memory is allocated. It supports our hypothesis that the fetch_data_from_file function has a problem, as this function calls two PHP functions which together allocate a lot of memory.

Fixing the script

Note that a profiler, or logging memory usage, will almost never tell you exactly what or where your memory problem is. Manual analysis will always be part of your debugging process. However, the tools do help you to build an idea of where the problem might be. At this point, we know which function likely has a problem. Upon closer analysis of the fetch_data_from_file function, we can see that it uses an array to buffer the complete file. If the file is large, the program will run out of memory. Now we have enough information to fix it.
Let's work with the assumption that fetch_data_from_file is also used elsewhere, and that its behaviour should not change. Luckily, there is a solution for this problem: we do not actually have to load the complete file.

A relatively simple way to work around this problem is to use a Generator. This excellent post describes the concept in more detail.

In short, a Generator enables you to write a basic iterator, where you control what information needs to be in memory. When looping over an iterator, the loop is in control of when it fetches the next item from the iterator. As the iterator knows how to fetch the next item, it does not necessarily need to have all items in memory. In this example, this means that there will only be one line of the file in memory at a time.
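The point that the loop decides when the next item is fetched can be seen in a small, self-contained sketch. The function numbered_items and the $produced log are hypothetical names, used only for this illustration.

```php
<?php
// Hypothetical example: a generator that records every value it produces.
function numbered_items(int $count, array &$produced): Generator {
    for ($i = 1; $i <= $count; $i++) {
        $produced[] = $i; // runs only when the consumer asks for item $i
        yield $i;
    }
}

$produced = [];
foreach (numbered_items(1000000, $produced) as $item) {
    if ($item === 3) {
        break; // stop early: items 4 and up are never created
    }
}

echo count($produced), " items were produced\n"; // prints "3 items were produced"
```

Even though the generator could produce a million items, only the three that the loop actually requested ever existed.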

Let's look at our running example again, this time with a Generator:

<?php
function fetch_data_from_file(string $file_path) : iterable {
	$resource = fopen($file_path, 'r');
	while (($line = fgets($resource)) !== false) {
		yield trim($line);
	}
}

function finding_nemo(string $filepath) : int {
	$lines = fetch_data_from_file($filepath);
	foreach ($lines as $line_number => $line) {
		if ($line === "nemo") {
			return $line_number;
		}
	}
	return -1; //nemo not found
}

Interestingly, the finding_nemo function did not have to change: foreach loops have no problem with Generators. The fetch_data_from_file function did change: it now contains the yield statement.

When we log the memory usage for this piece of code, we can see that the usage stays low. However, because Xdebug generates a cumulative memory profile, the Xdebug profile will look more or less the same. This happens because, in total, the fetch_data_from_file function still allocates the same amount of memory. The difference is that the function now frees its allocated memory sooner, leading to a memory usage that is overall much lower than in the previous version. This is one of the drawbacks of using the Xdebug profiler. In a follow-up post, I'll show how to use php-memory-profiler, which generates a different kind of memory profile.
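One way to check that the usage stays low is to measure both versions side by side with memory_get_usage. The sketch below is a self-contained approximation: the function names, the temporary file and its contents are assumptions for this demo, and the generator version additionally closes the file handle in a finally block.

```php
<?php
// Sketch: compare memory growth of an array-based and a generator-based reader.
function read_into_array(string $path): array {
    $handle = fopen($path, 'r');
    $lines = [];
    while (($line = fgets($handle)) !== false) {
        $lines[] = trim($line);
    }
    fclose($handle);
    return $lines;
}

function read_lazily(string $path): Generator {
    $handle = fopen($path, 'r');
    try {
        while (($line = fgets($handle)) !== false) {
            yield trim($line);
        }
    } finally {
        fclose($handle); // also runs when the consumer stops early
    }
}

// Returns the largest growth in memory usage observed while iterating.
function max_growth(iterable $lines, int $baseline): int {
    $max = 0;
    foreach ($lines as $line) {
        $max = max($max, memory_get_usage() - $baseline);
    }
    return $max;
}

// Build a throwaway input file of 50,000 identical lines.
$path = tempnam(sys_get_temp_dir(), 'sea');
file_put_contents($path, str_repeat("just another fish in the sea\n", 50000));

$baseline = memory_get_usage();
$arrayGrowth = max_growth(read_into_array($path), $baseline);

$baseline = memory_get_usage();
$generatorGrowth = max_growth(read_lazily($path), $baseline);

unlink($path);

printf("array: %d bytes, generator: %d bytes\n", $arrayGrowth, $generatorGrowth);
```

The array version has to hold every line at once, while the generator version only needs room for the current line and the file buffer, so its growth should be a small fraction of the array's.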

Conclusion

In this post, we saw two methods of locating places in your PHP program where a lot of memory is allocated: first by using PHP's memory_get_usage function, and then by generating a memory profile using Xdebug and analyzing it with qCachegrind. One thing to keep in mind is that there is no tool or technique available which will definitively point to the problem. As such, your debugging process will always at least partly consist of manual analysis. In the next post, I'll show how php-memory-profiler can help you to find memory leaks in your program.

