Minimize memory usage when dealing with large files

If we need to parse a large file, e.g. a CSV of more than 10 Mbytes containing millions of rows, it is tempting to reach for the file or file_get_contents functions and end up hitting the memory_limit setting with an

Allowed memory size of XXXXX bytes exhausted

error. Consider the following code (top-1m.csv has exactly 1 million rows and is about 22 Mbytes in size):

var_dump(memory_get_usage(true)); // baseline memory usage
$arr = file('top-1m.csv');        // loads every row of the file into an array
var_dump(memory_get_usage(true)); // memory usage after loading

This outputs:

int(262144)
int(210501632)

because the interpreter had to hold all one million rows in the $arr array at once, so it consumed roughly 200 Mbytes of RAM. Note that we haven’t even done anything with the contents of the array yet.
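
For comparison, here is a minimal sketch (assuming the same top-1m.csv file; the exact figures will vary by PHP version and platform) illustrating that reading the file into a single string with file_get_contents costs roughly the file size, while splitting it into a per-row array with file adds substantial per-element overhead on top of the raw data:

$before = memory_get_usage(true);

// One big string: cost is roughly the file size (~22 Mbytes).
$contents = file_get_contents('top-1m.csv');
var_dump(memory_get_usage(true) - $before);

unset($contents); // free the string before the next measurement

// One array element per row: adds per-element overhead on top of the raw data.
$arr = file('top-1m.csv');
var_dump(memory_get_usage(true) - $before);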

Now consider the following code:

var_dump(memory_get_usage(true));
$index = 1;
if (($handle = fopen("top-1m.csv", "r")) !== FALSE) {
    // fgetcsv reads a single row per call, so only one row is in memory at a time
    while (($row = fgetcsv($handle, 1000, ",")) !== FALSE) {
        file_put_contents('top-1m-reversed.csv', $index . ',' . strrev($row[1]) . PHP_EOL, FILE_APPEND);
        $index++;
    }
    fclose($handle);
}
var_dump(memory_get_usage(true));

which outputs:

int(262144)
int(262144)

so memory usage does not grow at all, yet we parse the whole CSV and save it to another file, reversing the value of the 2nd column. That’s because fgetcsv reads only one row at a time and $row is overwritten on every iteration of the loop.
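
As a further refinement, a sketch along the following lines (assuming the same file names as above) keeps the output file open for the whole run and writes each row with fputcsv, which avoids reopening the file on every iteration the way file_put_contents with FILE_APPEND does; memory usage stays just as flat because only the current row is ever held in memory:

$in = fopen('top-1m.csv', 'r');
$out = fopen('top-1m-reversed.csv', 'w');
if ($in !== FALSE && $out !== FALSE) {
    $index = 1;
    while (($row = fgetcsv($in, 1000, ',')) !== FALSE) {
        // only the current row lives in memory; write it out immediately
        fputcsv($out, [$index, strrev($row[1])]);
        $index++;
    }
    fclose($in);
    fclose($out);
}

Note that fputcsv quotes fields when necessary and appends the newline itself, so the output formatting may differ slightly from the file_put_contents version.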
