This is something wonderfully straight forward in putting together a really nice SELECT statement in SQL. There’s all your data just flowing back out at you in the order you asked for. Slap together your HTML around it with PHP and you’ve got yourself a dynamic web page.
If you’ve been doing this PHP thing for more than a project or two you have certainly noticed the mess that is the database functions. An entire stack of functions for MySQL, Microsoft SQL, ODBC, PostgreSQL, etc. Sure, PHP is rich in interfaces to a wide variety of database platforms, but trying to change which platform your using can be a little nightmare. Were you supposed to run mysql_query() or pg_query()? Like many of you, I put together a little set of abstraction tools that we’re pretty much a set of functions that added a layer between the code that needed the data and the database itself. Now that PDO is on the scene it would seem that these kinds of custom libraries are no longer needed.
There is still a place for creating your own libraries though. Implementing tools that push the SQL into the background and allow you to focus on your logic and presentation. In this article I will share some methods I use to abstract UPDATES and INSERTS.
Not So Friendly SQL
Most of the web projects I work on require a lot of data lookups from a database. This really hits home with the publicly viewable pages where data lookups are critical to performance. On the other hand, the data needs to get in there somehow in the first place. The need for content management screens has always been a part of any dynamic web site.
As a developer I really used to hate dealing with forms. Those darn things are just filled with bugs the moment you start in to them. A few of the pitfalls in dealing with them…
- Setting the max length on input fields
- Checkboxes that don’t pass anything along if not checked
- Basic validation
- Filling in values for editing
- Does the database need to UPDATE or INSERT
This list could go on for a while. I used to deal with these things manually, and without exception I always ran into issues with the UPDATE and INSERT actions. I’d quote an integer, or forget to quote a string, or some other typo would stop the process. I think I have spent more time debugging forms than any other kind of code. Then when you consider what a nasty set of security glitches can happen when a user gets to enter data into your code… yuck!
It Was Always Right There
So one day I got to looking at the database tables that I was using on a project. It sort of just hit me that almost everything I needed to know about my forms was just sitting there in the database. I created those tables to reflect the kind of data I would need to display, and the forms to get that data in there. Every field in the table had a clue as to what I was going to do with that data.
It was one of those huge, “well duh!” moments. Something so obvious that I’ve been overlooking it for far too long. So how do I put this to use?
I can tell you right now that this approach isn’t a catch all for every kind of data updating. If your site is getting hammered with updates and inserts you most likely don’t want any kind of abstraction going on. You certainly don’t want to waste time looking in the database to see what data needs to go in. You’ll want to have hard coded what fields are getting data. To make what I’m about to discuss work you need to run an additional query on the database to figure out what goes where, then you can follow that up with putting data in.
For the bulk of the work I do I’m usually not that concerned with squeezing the last performance drop out of putting data into the database. Usually there are only a handful of people who are authorized to enter data, so I’m not looking at a bottleneck for most data writing. A small performance hit is acceptable here for some big integrity and simplification gains.
What You Know About Your Data
I mostly work with MySQL for my web site needs. Despite some very good arguments against using them, I almost always have an auto incrementing field acting as my primary index. Again, this is not for a site getting hammered with updates. The only type of table I wouldn’t have a primary index for would one that I flush the entire table out before importing new data. A feed from another site or alternate data source for example. It’s safe to say that every other table in use will have that primary index field in there.
I also know that if my updating routine is not passed a value for this primary index, then I’m looking at performing an INSERT. When an index is passed along I know an UPDATE is needed, since I wouldn’t have had that index value unless a record already existed. Pretty basic stuff.
Every other field in a table is going to be of some “type”. This could be a varchar, integer, or any other valid MySQL field type. From the stand point of figuring out how to dynamically put together the SQL from PHP I don’t need to get that specific. I do need to know some basics though.
- I need to know if the type should be expecting a string or not, since I need to know if quotes around the value are needed.
- If a string is of a particular size, like a varchar(120), then I’ll want to make sure I don’t try to put more than the value needed.
- Integer and float types better not be getting letters stuffed in there.
- A float can have a decimal, an integer can’t.
- All strings should be run through the appropriate escape routines so invalid characters don’t muck up the SQL.
- Dates and times need a bit of special handling to make sure they’re in the correct format.
- Which field is the primary key.
From that criteria I determined that what I needed was 4 basic field types that my code could utilize, rather than trying to have a special case for every kind of MySQL type. Also, this allows for abstracting to other databases that have an even larger variety of field types. The types in question are:
- string
- integer
- float
- date
To handle the conversion between the MySQL field type and the one that I will be using I just put together an associative array to handle this.
$dataTypes = array("tinyint" => "integer", "smallint" => "integer",
"mediumint" => "integer", "int" => "integer",
"bigint" => "integer",
"float" => "float", "decimal" => "float",
"double" => "float",
"varchar" => "string", "text" => "string",
"char" => "string",
"datetime" => "date", "date" => "date",
"time" => "date");
I realize there are more field types then this, but these are the ones I personally utilize most often. Besides, I’m not going to do ALL the work for you. What fun would that be?
Looking Up What To Look Up
So this article doesn’t turn into a full blown book here I’m going to focus in on how to do this with those old style MySQL functions. I’m also going to put this together with a set of functions, though this really works better within the framework of an object. You’re going to want to tweak this to your needs, and there’s certainly more work than just throwing PDO at it to fully abstract multiple databases.
I’m going to assume that you’ve already established a connection to your database and have a resource pointer for it. I’ll call it $dbPtr here. This also assumes you have a table with a primary index and some fields of various types. We’ll just call this “InfoTable”.
With all the disclaimers out of the way, let’s get that field information from the table, which we’ll be using later. To do this I’m going to run a query out to the database asking for all the columns, or fields, from the specified table. This is probably the fastest way to get everything we need out of there since we’re only making one request to the database server. I will then examine each field as it comes out to determine which of my four basic field types each one of these fits into.
When field types come out of MySQL they may contain an attribute, like “varchar(120)”. From the perspective of what I’m trying to do here these are two valuable pieces of information, so I need extract that number as the “size” of the field and get “varchar” by itself so I just have the type. Once the type is known, I then look it up in my $dataTypes array for my more generic type.
function dbFields_mysql($table)
{
global $dbPtr;
$fieldList = array();
$fieldIndex = "";
$dataTypes = array("tinyint" => "integer", "smallint" => "integer",
"mediumint" => "integer", "int" => "integer",
"bigint" => "integer",
"float" => "float", "decimal" => "float",
"double" => "float",
"varchar" => "string", "text" => "string",
"char" => "string",
"datetime" => "date", "date" => "date",
"time" => "date");
// Go out and hunt down all that field information
$query = "SHOW COLUMNS FROM $table";
$queryResource = mysql_query($query, $dbPtr);
// Run through it all and fill in the fieldList and fieldIndex vars
while ( $row = mysql_fetch_assoc($queryResource) ) {
$field = $row["Field"];
$prop = $row["Type"];
$key = $row["Key"];
// Look for a value stored in parenthesis
$sizeStart = strpos($prop, "(") + 1;
$sizeLength = strpos($prop, ")") - $sizeStart;
if ($sizeStart > 1) { // We have a parenthesis
$type = substr($prop, 0, $sizeStart - 1);
$size = substr($prop, $sizeStart, $sizeLength);
} else { // No parenthesis, so set the size to zero
$type = $prop;
$size = 0;
}
// Look for a space in the Type, as that is used to specify unsigned
// or some other type of attribute for this field.
if ( strpos($prop, " ") ) {
$attrib = substr($prop, strpos($prop, " ") + 1);
} else {
$attrib = FALSE;
}
// Assign the generic data type as the type we'll be storing in the
// returned array.
if ( array_key_exists($type, $dataTypes) ) {
$type = $dataTypes[$type]; // Assign a generic data type
}
// Assign what we've discoverd to the fieldList array
$fieldList[$field]["type"] = $type;
$fieldList[$field]["size"] = $size;
$fieldList[$field]["attrib"] = $attrib;
// Note if this is the primary key or not
if ($key == "PRI") {
$fieldList[$field]["key"] = TRUE;
} else {
$fieldList[$field]["key"] = FALSE;
}
}
// Send it all back to the caller
return $fieldList;
}
Using What The Database Told Us
Thanks to that dbFields_mysql() function we now know what is allowed to go into the database. We know the type, the size, and the attribute of each field. We even know which field is the primary index.
Although this post is about simplifying the construction of INSERT and UPDATE operations, you should also consider what else you can do with this. You could put together routines that know exactly what the maximum length of an input field should be on a form. Upon submitting you can use this information to validate that numbers are where they need to be, and strings where they need to be. All of this is now dynamically figured out, greatly reducing the amount of code needed to run basic validation. This may be a fine subject for a new article, but for now let’s get back to slapping some data into the database.
Normally you would take all the information coming in from a form submission and go about constructing your SQL with that. New records would get all the INSERT syntax, where existing records get all that UPDATE syntax. Of course, neither of these two look anything like the other. You would either create 2 sections of code to handle each, or perhaps utilize the INSERT ON DUPLICATE KEY UPDATE utility. When I’m trying to get data stored I just don’t want to deal with the hassle of the syntax intricacies involved with any of this. I’ve got this data that needs to go in these fields, just do it already!
What if we could just match up the data to the fields, then let some other bit of code deal with putting together the SQL. How about a nice dbUpdate() function? We’d have to tell it what table we want to update, as well as the data itself. Also need to give it the value of our primary index when we’re doing an edit. Okay, I can deal with that:
$tableData = array("name" => $name,
"address" => $address,
"state" => $state,
"zip" => $zip,
"salary" => $salary,
"friend" => $friend);
$indexValue = 3;
$tableName = "InfoTable";
// All the data is together, so update the database
$returnIndex = dbUpdate($tableName, $tableData, $indexValue);
This is pretty much all the code you’d need to update this person in our make believe table. No need to check for strings being properly escaped, or whether quotes are needed. Even check boxes are dealt with properly. Leave out the $indexValue, or set it to zero, and it runs an INSERT automatically. It’s also far easier to read than the SQL you would normally use, thus making typos a little less likely. Even a typo won’t stop this from working, as we know what fields are allowed. We can weed out fields that don’t really exist.
Constructing The SQL
Now that you’ve seen what an INSERT or UPDATE would look like using this dbUpdate() routine, let’s get into actually coding some of this up. Be warned up front, I have not thrown in all of the data validation that can be done here. My intention is to show the technique, not put together a ready to use package.
function dbUpdate($table, $fldValArray, $indexValue=0)
{
global $dbPtr;
$q = "";
$indexValue = intval($indexValue); // Make sure this is an integer
// Make sure we've at least got the minimum stuff to run this
if (strlen($table) < 2 or count($fldValArray) == 0) {
return FALSE;
}
$fieldList = dbFields_mysql($table); // Load up all the stuff we need to know
// about the fields in this table.
if (count($fieldList) < 1) { // doesn't look like we have a field list
return FALSE;
}
if ($indexValue != 0) { // Do we need to know which is the primary key?
// Yes we do. Find it!
$primaryKey = "";
foreach ($fieldList as $field => $value) {
if ($fieldList[$field]["key"]) {
$primaryKey = $key;
}
}
if ( !$primaryKey ) { // An index value, but no primary key... ERROR!
return FALSE;
}
}
if ($indexValue > 0) { // With an index value we need to UPDATE the table
$q = "UPDATE $table\n";
$q .= "SET\n";
foreach ($fldValArray as $field => $value) {
if ( array_key_exists($field, $fieldList) ) { // Only run for valid fields
switch ( $fieldList[$field][$type] ) { // Prepare the value based on the type of field
case "string":
$value = strval($value);
$value = trim($value);
if ( $fieldList[$field]["size"] > 0 ) { // Chop it down to size
$value = substr($value, 0, $fieldList[$field]["size"]);
}
// Make sure we've got a clean input
$value = mysql_real_escape_string($value, $dbPtr);
$q .= "$field = \"$value\",\n";
break;
case "date":
$value = strval($value);
$value = trim($value);
$q .= "$field = \"$value\",\n";
break;
case "integer":
$value = intval($value);
$q .= "$field = $value,\n";
break;
case "float":
$value = floatval($value);
$q .= "$field = $value,\n";
break;
default:
$q .= "$field = \"$value\",\n";
}
}
}
$q = substr($q, 0, -2); // Strip the last comma and line feed off
$q .= "\nWHERE ".$primaryKey." = ".$indexValue;
$qr = mysql_query($q, $dbPtr); // Run the update
} else { // No id passed, so put together an INSERT
$fldStr = "("; // Begin the field section of the INSERT
$valStr = "("; // Begin the values section
foreach ($fldValArray as $field => $value) {
if ( array_key_exists($field, $fieldList) ) { // Only run for valid fields
$fldStr .= $field.", ";
switch ( $fieldList[$field][$type] ) {
case "string":
$value = strval($value);
$value = trim($value);
if ( $fieldList[$field]["size"] > 0 ) {
$value = substr($value, 0, $fieldList[$field]["size"]);
}
$value = mysql_real_escape_string($value, $dbPtr);
$valStr .= "\"$value\", ";
break;
case "date":
$value = strval($value);
$value = trim($value);
$valStr .= "\"$value\", ";
break;
case "integer":
$value = intval($value);
$valStr .= "$value, ";
break;
case "float":
$value = floatval($value);
$valStr .= "$value, ";
break;
default:
$valStr .= "$value, ";
}
}
}
$fldStr = substr($fldStr, 0, -2).")"; // Strip comma and add a closing ")"
$valStr = substr($valStr, 0, -2).")"; // Strip comma and add a closing ")"
$q = "INSERT INTO $table\n";
$q .= $fldStr."\nVALUES\n".$valStr;
$qr = mysql_query($q, $dbPtr); // Run the INSERT
$indexValue = mysql_insert_id(); // Pull in the primary key ID
}
return $indexValue;
}
So let’s walk this through step by step.
The indexValue defaults to 0 so that can be an optional argument. I then make sure that this is an integer no matter what.
So things don’t totally go awry I check that a table and some data has been passed on in here.
I then populate my fieldList array with the dbFields_mysql() function shown earlier. That fieldList array has everything I need to know about getting this SQL together. I then make sure something is loaded in that array before I let this go any further. There may be a typo with the table name.
I only need to know which field is the primay key if I’m going to update this record, which means I only need to hunt it down if indexValue has a value. If it does, I then store the name of this field as $primaryKey.
The next big section is broken into two segments depending on whether an index value exists. The first section puts together and updates the database. The second does almost the same thing for an insert. Both sections utilize a switch/case setup to determine what kind of clean up and formatting is needed for each field. If the field this function has been asked to update doesn’t exist, it doesn’t add it to the SQL.
Whether an UPDATE or an INSERT take place you always get back the value for the primary index.
What Is Missing
This isn’t exactly the code I use in production since I have further abstraction going on within a class usually. I didn’t want to get into the full framework I use right here, so I presented a procedural approach to this. Utilizing an object oriented approach allows me to store the field list and primary index as member data, so I normally don’t have to dig that kind of stuff out.
The worst bug in this version is that it doesn’t take into account what to do if none of the fields being requested to update exist. If you do run into a situation like this I would recommend that you stop trying to type with your elbows and start working with your fingers. Seriously though, a solid function should anticipate this and deal with it.
You could also work in much tighter controls on numeric and date fields to further validate what is going in to make sure the values are not jumping out of bounds and are in the correct format. I left that out of there in this example to keep things brief.
Does It Work?
Since putting this together into a class I’ve found that I rarely have to write INSERT or UPDATE syntax. I’ve also spent very little time having to debug why an update to the database has failed. These functions are tightly aligned to how I put together my database tables which stay pretty consistent between projects, so it’s worked out very nicely for me. Yes, there is a slight performance cost, as I have already discussed, but not one that’s noticeable under a light updating load.
To apply this to other database engines you will want to do more than just use a simple abstraction layer like PDO. Each one has different field types and ways in which to extract the field information from them. Still, there’s no reason why you couldn’t duplicate this effort for another engine with a simple wrapper function to decide which bit of code needs to run for which engine. Once you have established a call to user defined functions as your means to talk to your database, you won’t have to go revisit that code again.
Summary
I hope I’ve provided to you a different way of approaching getting data into your database. By creating functions or methods for handling this task I believe you will simplify your code, provide for better data integrity, and generally make the world a more wonderful place.
[...] ArtoHeikinaro wrote an interesting post today onHere’s a quick excerptThis is something wonderfully straight forward in putting together a really nice SELECT statement in SQL. There’s all your data just flowing back out at you in the order you asked for. Slap together your HTML around it with PHP and you’ve got yourself a dynamic web page. If you’ve been doing this PHP thing for more than a project or two you have certainly noticed the mess that is the database functions. An entire stack of functions for MySQL, Microsoft SQL, ODBC, PostgreSQL, etc. Sure, PHP is rich in interfaces to a wide variety of database platforms, but trying to change which platform your using can be a little nightmare. Were you supposed to run mysql_query() or pg_query()? Like many of you, I put together a little set of abstraction tools that we’re pretty much a set of functions that added a layer between the code that needed the data […] [...]
Pingback by A Different Kind Of DB Abstraction — March 8, 2008 @ 5:52 pm |
I found your site on google blog search and read a few of your other posts. Keep up the good work. Just added your RSS feed to my feed reader. Look forward to reading more from you.
- Sue.
Comment by Sue Massey — March 8, 2008 @ 6:00 pm |