How to Make a Disk Cache Using PHP


If you have a busy PHP-driven website and don’t want to query the database on every request when every visitor sees the same data, you can solve the problem by letting the first visitor generate a cache that subsequent visitors read from.

Keeping that cache for a few minutes stops your database from being overloaded, makes the site faster for visitors, and can lower your hosting bill at the end of the month.

Explaining with code

Let’s say that you have a function that gets called each time someone visits your website:

function heavyDatabaseFunctionCall() {
  // $DB is assumed to be a mysqli connection created elsewhere in the script
  global $DB;

  $data = array();

  // This SQL has indexes but the DB server gets hit hard
  $sql = "SELECT mt.*, amt.reviews FROM massive_table mt
          JOIN another_massive_table amt ON amt.mid=mt.rid
          WHERE mt.status='1'
          GROUP BY mt.active
          LIMIT 100";

  // get the data from the server
  $res = $DB->query($sql);
  if ($res && $res->num_rows > 0) {
    // fetch every row as an associative array
    $data = $res->fetch_all(MYSQLI_ASSOC);
  }

  // return the data
  return $data;
}

The function above queries the database server on every call and returns the rows if there are any.

In this case study, we assume that every visitor sees the same data, so one cached copy can be served to all of them. We still want to refresh that copy every few minutes in case the database changes. The expiry time can be tuned later; for now, we will use 5 minutes (300 seconds).
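The expiry rule boils down to comparing the cache file’s modification time with the current time. As a minimal sketch, here is a hypothetical helper, `isCacheFresh()` (not part of the original code), that takes plain Unix timestamps so the arithmetic can be checked without touching the disk:

```php
<?php
// Hypothetical helper: returns true when a cache entry written at $mtime
// is still younger than $ttl seconds at time $now.
function isCacheFresh(int $mtime, int $now, int $ttl): bool
{
    // Age of the cache entry in seconds
    $age = $now - $mtime;
    return $age < $ttl;
}

// With a 300-second (5-minute) expiry:
var_dump(isCacheFresh(1000, 1299, 300)); // bool(true): 299 seconds old
var_dump(isCacheFresh(1000, 1300, 300)); // bool(false): exactly 300 seconds old
```

In a real request, `$mtime` would come from `filemtime($cachefile)` and `$now` from `time()`. This is the same test as the `time() - $cachetime < filemtime($cachefile)` condition used in the rewritten function, just rearranged.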

How to approach this problem

A simple approach is to let the first visitor who calls this function save $data to a file on disk. Anyone else who calls the function before the cache expires then reads the data back from that file instead of querying the database.
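The read-or-regenerate idea can also be written once as a reusable helper. This is a minimal sketch, not the article’s code: `cacheRemember()` and its callback parameter are hypothetical names, and the closure in the usage example stands in for the real database query.

```php
<?php
// Hypothetical generic helper: return cached data from $cachefile if it is
// younger than $ttl seconds; otherwise call $producer, save its result as
// JSON, and return it.
function cacheRemember(string $cachefile, int $ttl, callable $producer): array
{
    // Clear PHP's stat cache for this file so is_file()/filemtime() are current
    clearstatcache(true, $cachefile);

    if (is_file($cachefile) && time() - filemtime($cachefile) < $ttl) {
        $data = json_decode((string) file_get_contents($cachefile), true);
        // Only trust the file if it decoded to an array
        if (is_array($data)) {
            return $data;
        }
    }

    // Cache missing, stale, or unreadable: do the expensive work
    $data = $producer();

    // LOCK_EX stops two writers from interleaving their output
    file_put_contents($cachefile, json_encode($data), LOCK_EX);

    return $data;
}

// Usage: the closure below stands in for the expensive database query.
$file = sys_get_temp_dir() . '/cached-example.json';
if (is_file($file)) {
    unlink($file); // start from an empty cache for this demo
}

$calls = 0;
$producer = function () use (&$calls): array {
    $calls++;
    return ['rows' => [1, 2, 3]];
};

$first = cacheRemember($file, 300, $producer);  // runs the producer
$second = cacheRemember($file, 300, $producer); // served from disk
echo $calls, PHP_EOL; // 1
```

Passing the expensive work in as a callback keeps the caching logic in one place, so other slow functions can reuse it with their own file name and expiry.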

To do this, we can create a directory called ./cache/, make sure the user PHP runs as can write to it, and store our data there.
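If you would rather not create the directory by hand, PHP can create it on demand. Below is a minimal sketch with a hypothetical `ensureCacheDir()` helper; the 0775 permissions are an assumption (and are further reduced by the process umask), so adjust them for your server.

```php
<?php
// Hypothetical helper: make sure the cache directory exists and that PHP
// can write to it, creating it (and any missing parents) if needed.
function ensureCacheDir(string $dir): string
{
    // mkdir() can fail if another request created the directory first,
    // so re-check is_dir() before treating the failure as an error.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create cache directory: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Cache directory is not writable: $dir");
    }
    return $dir;
}

// Demo in the system temp directory; in the article's setup you would pass
// __DIR__ . '/cache' so the folder sits next to the script.
$cacheDir = ensureCacheDir(sys_get_temp_dir() . '/php-disk-cache-demo');
echo $cacheDir, PHP_EOL;
```

Using `__DIR__` rather than a bare `./cache/` avoids depending on the current working directory, which can differ between the web server and command-line scripts.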

Solving with code

If we were to rewrite the function, we could do something like this:

function heavyDatabaseFunctionCall() {
  // $DB is assumed to be a mysqli connection created elsewhere in the script
  global $DB;

  $data = array();

  // set some variables for the cache
  $cachename = "heavyDatabaseFunctionCall";
  // keep the file in the writable ./cache/ directory next to this script
  $cachefile = __DIR__.'/cache/cached-'.$cachename.'.json';
  $cachetime = 300; // 5 minutes

  // if the cache file is present
  // ..and is not over 5 mins old
  // ..then use it instead of the database call!
  if (file_exists($cachefile) && time() - $cachetime < filemtime($cachefile)) {
    // get the file
    $json = file_get_contents($cachefile);
    // convert the json to an array (empty array if it can't be decoded)
    $data = json_decode($json, true) ?? array();
  } else {
    // This SQL has indexes but the DB server gets hit hard
    $sql = "SELECT mt.*, amt.reviews FROM massive_table mt
            JOIN another_massive_table amt ON amt.mid=mt.rid
            WHERE mt.status='1'
            GROUP BY mt.active
            LIMIT 100";

    // get the data from the server
    $res = $DB->query($sql);
    if ($res && $res->num_rows > 0) {
      // fetch every row as an associative array
      $data = $res->fetch_all(MYSQLI_ASSOC);

      // encode the array to json
      $json = json_encode($data);
      // save the json to disk; LOCK_EX stops two requests writing at once
      file_put_contents($cachefile, $json, LOCK_EX);
    }
  }

  // return the data
  return $data;
}

With only a few extra lines, the function now runs the SQL only when the cache file is missing or more than five minutes old, instead of on every call.
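One caveat with the function above: a visitor can read the cache file while another request is still writing it and get truncated JSON. A common remedy, shown here as a sketch rather than part of the original approach, is to write to a temporary file in the same directory and then `rename()` it over the cache file. On POSIX filesystems that rename replaces the file atomically, so readers see either the old contents or the new ones. `writeCacheAtomically()` is a hypothetical name.

```php
<?php
// Hypothetical helper: write $data as JSON to $cachefile without readers
// ever seeing a half-written file.
function writeCacheAtomically(string $cachefile, array $data): bool
{
    $json = json_encode($data);
    if ($json === false) {
        return false; // nothing sensible to cache
    }

    // The temporary file must be in the same directory (same filesystem)
    // for rename() to be an atomic replace.
    $tmp = tempnam(dirname($cachefile), 'cache-');
    if ($tmp === false) {
        return false;
    }

    if (file_put_contents($tmp, $json) === false) {
        unlink($tmp);
        return false;
    }

    // Swap the finished file into place in a single step
    if (!rename($tmp, $cachefile)) {
        unlink($tmp);
        return false;
    }
    return true;
}

// Usage
$file = sys_get_temp_dir() . '/cached-heavyDatabaseFunctionCall.json';
$ok = writeCacheAtomically($file, [['id' => 1, 'reviews' => 12]]);
var_dump($ok); // bool(true)
```

In the rewritten function you would call this in place of `file_put_contents($cachefile, $json, ...)`. It does not stop several requests from regenerating an expired cache at the same moment; that would need a lock around the database call itself.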

Production-grade thinking

There are more “production-grade” ways to build a caching system, especially once you have to deal with multiple web servers or more complex setups. Common options include a Redis or Memcached layer, or a Varnish HTTP cache in front of your web servers.

This tutorial offers a simpler alternative for smaller setups that may not need the overhead of those larger solutions.