Getting nodes and display a list

This article focuses on one of the main reasons applications are rejected, that it acquires a list of nodes and renders some sort of output to the browser. It's a common task often seen and often done totally wrong.

So, by example, lets look at a snippet of often seen code that's reviewed and rejected and then we'll break it down and examine each point.

  1. function foo() {
  2.   $sql = "SELECT nid, title FROM {node} WHERE type = 'foo'";
  3.   $result = db_query($sql);
  4.   while ($row = db_fetch_array($r)) {
  5.     $output .= '<li><a href="node/'.$row['nid'].'">'. $row['title'].'</li>';
  6.   }
  7.   return $output;
  8. }

Yes, shocking, but unfortunately common. So what's wrong here? First, lets get the "niggles" out of the way.

In the above snippet, $output is returned but in fact, $output may not even exist if no rows where found. In this case PHP would return NULL so no big deal there. However, depending on the error_reporting level of PHP this is basically what's known as an E_ALLi violation. You have effectively invoked PHP's error system and generated a warning. These warnings can be annoying, esp for developers who run their dev sites with a low warning threshold. Another point is that even if no PHP message is generated, you've invoked PHP's error system anyway costing the CPU precise cycles just to determine if a warnings is even required. And if a warning is generated it's often logged to disk, more IO you could have avoided. Don't do it. Define a variable before you test or return it.

Let's move on to some of the real problems here.

  1.  $sql = "SELECT nid, title FROM {node} WHERE type = 'foo'";

In this SQL code there are a number of problems. First and most obvious is the missing status test. The above code will select both published and unpublished nodes. To correct this, we simply add the status test thus:-

  1.  $sql = "SELECT nid, title FROM {node} WHERE status = 1 AND type = 'foo'";

But, yes, there's more! Why is "title" being selected? On the face of it it seems fairly obvious, the title is going to be required to build the links in the code that followed. Well, that's all well and fine unless of course other modules in the system want to know that you did something with a node, even if it was just to display a title. This part isn't exactly crucial to operation of your module, but you do cut out the chance for other modules to interact with you. Imagine for example another module actually changed the title in some subtle way. You never give it a chance by selecting the title directly and displaying it.

Lets look at the "best practice" Drupal way of doing this. We'll re-write the code to cover what we have discussed so far:-

  1. function foo() {
  2.   $sql = "SELECT nid FROM {node} WHERE status = 1 AND type = 'foo'";
  3.   $result = db_query($sql);
  4.   while ($row = db_fetch_array($r)) {
  5.     $node = node_load($row['nid']);
  6.     $output .= '<li><a href="node/'.$row['nid'].'">'.$node->title .'</li>';
  7.   }
  8.   return $output;
  9. }

Looking better (but still with problems we'll discuss shortly). Here, the use of Drupal's API node_load()< function we now get a properly populated node and all other modules have had their chance to do what ever it is they want with it. So, we're almost there. But this code still has an access by-pass problem as well as others yet to discuss.

The main problem we still have in terms of access is node visibility. Modules such as OG and Simple Node Access can only do their work if you play along with Drupal's node access system. To put this in simple terms, you need to allow Drupal to alter your SQL so that it follows Drupal's access restriction system. It's very simple do implement. Firstly, the SQL should be formatted correctly and secondly it needs to be passed through Drupal's db_rewrite_sql()< function. So lets add this in and see where we are:-

  1. function foo() {
  2.   $sql = "SELECT n.nid FROM {node} n WHERE n.status = 1 AND n.type = 'foo'";
  3.   $result = db_query(db_rewrite_sql($sql));
  4.   while ($row = db_fetch_array($r)) {
  5.     $node = node_load($row['nid']);
  6.     $output .= '<li><a href="node/'.$row['nid'].'">'.$node->title .'</li>';
  7.   }
  8.   return $output;
  9. }

Notice that the standard table "short name" is used. For the node table it's simply n. Each field is prefixed with this so as to avoid any table/field ambiguity when the SQL engine parses the SQL (as Drupal may add tables/fields of it's own and we need to avoid SQL parse problems). Then, the SQL is run through db_rewrite_sql() to let Drupal do it's work.

Point to note, this system is also employed when selecting taxonomy terms.

OK, we've made progress but there's still some "awfulness" we need to clean up on. Let's first look at the glaring
XSS< problem and despatch it to the "I won't do that again, honest" queue. This is so common a problem that we'll fix the above code just to remove it in an easy to understand way.

Basically, Drupal works on a simply filter philosophy, input is unfiltered (excepting for SQL injection protection) and output is filtered. So, in the above, it should now be obvious that $node->title went into the database unfiltered yet is now on it's way to the browser also unfiltered. This is so easily fixed thus:-

  1. function foo() {
  2.   $sql = "SELECT n.nid FROM {node} n WHERE n.status = 1 AND n.type = 'foo'";
  3.   $result = db_query(db_rewrite_sql($sql));
  4.   while ($row = db_fetch_array($r)) {
  5.     $node = node_load($row['nid']);
  6.     $title = check_plain($node->title);
  7.     $output .= '<li><a href="node/'.$row['nid'].'">'.$title .'</li>;
  8.  }
  9.  return $output;
  10. }

The simple addition of check_plain()< just spared all your potential users from some potential awfulness in their browser. Always remember, when handling data ask yourself "where did this data come from?". If it's remotely at all possible it came from somewhere you have no clue, then it came from an "untrusted source". Always filter untrusted data with one of the check_*() or filter_xss()< functions. In short, read the many guides about XSSi. You'll be surprised how common it is missed, especially when it comes to a node or comment title. But look at all your data types. Think them through carefully.

OK, so we're getting better but we are not there yet. So what's next? The above is output links. But the links are generated hard coded. There's no opportunity for path aliases to be used or other goodness. So lets introduce the l()< function and work it's magic.

  1. function foo() {
  2.   $sql = "SELECT n.nid FROM {node} n WHERE n.status = 1 AND n.type = 'foo'";
  3.   $result = db_query(db_rewrite_sql($sql));
  4.   while ($row = db_fetch_array($r)) {
  5.     $node = node_load($row['nid']);
  6.     $link = l($node->title, 'node/'. $row['nid']);
  7.     $output .= '<li>'. $link .'</li>';
  8.   }
  9.   return $output;
  10. }

You may now be asking "where did check_plain go? Hasn't this reintroduced the previous XSSi problem?". Well, no. The l()< function will actually filter your link anchor text for you. A number of Drupal's functions will do this along with Drupal's FormAPI.

Well, the function above is basically returning a list of links. It's a string contain HTML. Now, the HTML may be fine for you and your site. In your setting you can do what you like, adjust to suit as you like. But when going to the world of Contrib and sharing your module, others may well want to adjust the HTML your module returns. Here, their only solution is to actually alter you modules code. Now, that's not good on many levels. Easiest to understand is "what do these people do when you produce a new version?", Hack you code again? That just isn't thoughtful of you! So how do we produce HTML that behaves with interoperability in mind? That's where Drupal's theme system comes into play.

There are a number of ways to work this one. I'm going to choose the simplest and use an internal Drupal theme function designed to format lists. Here's the final theme friendly version:-

  1. function foo() {
  2.   $links = array<();
  3.   $sql = "SELECT n.nid FROM {node} n WHERE n.status = 1 AND n.type = 'foo'";
  4.   $result = db_query(db_rewrite_sql($sql));
  5.   while ($row = db_fetch_array($r)) {
  6.     $node = node_load($row['nid']);
  7.     $links[] = l($node->title, 'node/'. $row['nid']);
  8.   }
  9.   return theme('item_list', $links);
  10. }

And, well, there it is. Something that looks a lot better than when we started. There are other things that are not covered here like "what string is returned if no rows selected?" etc. But that's more down to your own personal best practice at coding.