At one point or another, all content management systems (CMS) come down to some kind of datatype. You have to be able to set a field to a string, or an integer, or whatever, and then enforce and manage that piece of data. The idea is that you take these datatypes and glue them together to form classes of objects.
(Note: “field” and “datatype” are two different things. A “datatype” is a description of a type of data: string, datetime, etc. A field is a named instance of a datatype: title, body, author, etc. You may have a dozen fields in a database, all of the “varchar” datatype.)
In a lot of systems that are specific to one type of content, the fields are known to the system. Movable Type, for instance, knows that the title field is going to be a string datatype, so it can handle it as such.
But what about systems that allow developers to create their own classes by gluing datatypes together? (See “Open and Closed Content Management” for some more insight here.) How do you validate an object if it could be made up of any kind of data?
The answer is to make the fields smart: don’t do their work for them; make them do it themselves. Then all you need is some “controller” code that only needs to know one thing: what fields do I need to order around? The controller doesn’t know or care how each field carries out the actions it’s told to perform. That’s the field’s responsibility.
I’ve done this on a fairly large scale, and it’s worked beautifully.
The Field Object
Each field in my database corresponds to a “Field” object instance. This object knows its datatype: whether it’s a string or an integer or whatever. (Technically speaking, it could extend or mix in a “Datatype_String” or “Datatype_Integer” class in order to get all the methods it needs. In my case, there’s a “meta table” in the database that describes all the fields of the various content objects and what the datatypes are for each.)
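Here’s a minimal Python sketch of the mix-in idea, assuming a hypothetical meta table that maps each field name to a datatype name. The classes `DatatypeString` and `DatatypeInteger` echo the names above, but `Field`, `make_field`, and the `validate`/`to_storable` methods are made up for illustration; they aren’t the methods of the system described in this post.

```python
class DatatypeInteger:
    """Mix-in supplying integer behavior (echoes "Datatype_Integer")."""

    def validate(self, raw) -> bool:
        try:
            int(raw)
        except (TypeError, ValueError):
            return False
        return True

    def to_storable(self, value):
        return int(value)


class DatatypeString:
    """Mix-in supplying string behavior (echoes "Datatype_String")."""

    def validate(self, raw) -> bool:
        return isinstance(raw, str)

    def to_storable(self, value):
        return value  # strings are stored essentially unchanged


class Field:
    """Behavior shared by every field, whatever its datatype."""

    def __init__(self, name: str):
        self.name = name


# Hypothetical lookup from datatype names (as stored in the meta table) to mix-ins.
DATATYPE_MIXINS = {"string": DatatypeString, "integer": DatatypeInteger}


def make_field(name: str, datatype: str) -> Field:
    """Build a Field subclass at runtime that mixes in the right datatype."""
    mixin = DATATYPE_MIXINS[datatype]
    cls = type(f"{name.title()}Field", (mixin, Field), {})
    return cls(name)


# Hypothetical rows from the meta table: (field name, datatype name).
meta_rows = [("title", "string"), ("year", "integer")]
fields = [make_field(name, datatype) for name, datatype in meta_rows]
```

Because the class is assembled from the meta table at runtime, a new field needs no new code as long as its datatype already exists.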
Here are some methods I’ve put on my “Field” class which correspond to stuff every self-respecting field needs to be able to do for itself:
Making These Methods Do Something Useful
Consider the moment an edit form is submitted. Using our Field class, the controlling code (the code that fields the inbound request) just needs to know what fields are expected, then iterate over them and:
The key here is that the controlling code doesn’t know how any of this happens — all that is handled inside the various Field objects. It just needs to issue the order to each field in turn.
There actually may be more actions that need to be taken (you could have a “Log Change” method, for instance), but that doesn’t matter — the point is that the controller has elevated itself above the dirty work of doing these things. It just sits back and tells the fields to do it to themselves, and since the fields know what datatype they are, they know how to do it.
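A rough sketch of that controlling code in Python might look like this. The method names (`retrieve_from_post`, `validate`, `store`) and the two field classes are hypothetical stand-ins, assuming the POST arrives as a plain dict of strings; the point is only that `handle_edit_form` never looks inside a field.

```python
class TextField:
    """Hypothetical string field."""

    def __init__(self, name: str, required: bool = False):
        self.name = name
        self.required = required
        self.value = ""

    def retrieve_from_post(self, post: dict) -> None:
        self.value = post.get(self.name, "").strip()

    def validate(self):
        """Return an error message, or None if the value is acceptable."""
        if self.required and not self.value:
            return f"{self.name} is required"
        return None

    def store(self):
        return self.value


class IntField:
    """Hypothetical integer field."""

    def __init__(self, name: str):
        self.name = name
        self.value = ""

    def retrieve_from_post(self, post: dict) -> None:
        self.value = post.get(self.name, "").strip()

    def validate(self):
        try:
            int(self.value)
        except ValueError:
            return f"{self.name} must be an integer"
        return None

    def store(self):
        return int(self.value)


def handle_edit_form(fields, post):
    """The controller: it knows which fields exist, not how they work."""
    errors = []
    for field in fields:
        field.retrieve_from_post(post)
        error = field.validate()
        if error:
            errors.append(error)
    if errors:
        return None, errors
    # Every field validated, so ask each one for its storable value.
    return {field.name: field.store() for field in fields}, []
```

Adding a new kind of field means writing one new class with those three methods; `handle_edit_form` doesn’t change.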
Lame analogy: Instead of tying the shoes of every kindergartner in the classroom, you teach them how to tie their own shoes (which could be different for each one, given sandals, boots, velcro straps, etc.), then you just walk down the row and tell each one: “Tie your shoes.”
A Note on Validation
Field types can actually be classified into “Data Types” and “Value Types.”
Data Type is the storage type of the underlying database column, be it varchar, int, datetime, whatever. This is the value that the machine cares about. Value Type is the functional value of the field: what the field represents. This is the value that the human cares about.
For instance —
A field may have a Data Type of “int,” meaning it needs to be an integer to fit into the database. However, it might have a Value Type of “year,” meaning it’s made up of exactly four numerals. Value Type is necessarily more restrictive than Data Type: all years are valid integers, but not all integers are valid years.
This means that validation should run from Value Type to Data Type: if the Value Type validation passes (the value is exactly four numerals), the Data Type validation can safely be assumed to pass as well.
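As a sketch, assuming each field carries an optional Value Type check and a required Data Type check (both hypothetical helpers here), the “year” example could be validated like this. The `assert` just spells out the claim above: four numerals always parse as an integer.

```python
import re


def validate_int(raw: str) -> bool:
    """Data Type check: will the database accept this as an integer?"""
    try:
        int(raw)
    except ValueError:
        return False
    return True


# ASCII digits only; \d would also accept other Unicode digits.
YEAR_PATTERN = re.compile(r"[0-9]{4}")


def validate_year(raw: str) -> bool:
    """Value Type check: exactly four numerals."""
    return YEAR_PATTERN.fullmatch(raw) is not None


def validate_field(raw, value_type_check, data_type_check) -> bool:
    """Run the stricter Value Type check when there is one; else fall back."""
    if value_type_check is not None:
        ok = value_type_check(raw)
        # A passing Value Type implies a passing Data Type.
        assert not ok or data_type_check(raw)
        return ok
    return data_type_check(raw)
```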
Complex Datatypes and Serialization
Datatypes can be “simple” or “complex.” A simple type is data that was really meant to be stored in a single database field (a string, for instance). A complex type is data that wasn’t meant to be stored in a single field. Since you don’t want to hack the data model for every new datatype, you need a way to force-fit the data into one field.
For instance, an “image” datatype could actually store several things (“sub-fields,” if you will):
All this information needs to be “rolled up” into a string of XML or YAML or whatever so you can insert it. This is part of the Store method — each field knows how to serialize itself — the “string” datatype returns its value pretty much unchanged, while the “image” datatype does some serious serialization acrobatics before returning the “storable” representation of itself.
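A minimal sketch of the Store side in Python, using XML via the standard library’s `xml.etree.ElementTree`. The `ImageValue` class, its `alt_text` sub-field, and the `store`/`load` method names are assumptions for illustration; `file` and `caption` are the only sub-fields named in this post.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ImageValue:
    file: str
    caption: str
    alt_text: str = ""  # hypothetical extra sub-field


class StringDatatype:
    def store(self, value: str) -> str:
        return value  # simple type: goes into the column unchanged


class ImageDatatype:
    SUBFIELDS = ("file", "caption", "alt_text")

    def store(self, value: ImageValue) -> str:
        """Roll the sub-fields up into one XML string for a single column."""
        root = ET.Element("image")
        for name in self.SUBFIELDS:
            ET.SubElement(root, name).text = getattr(value, name)
        return ET.tostring(root, encoding="unicode")

    def load(self, stored: str) -> ImageValue:
        """Reverse of store(): parse the XML back into an ImageValue."""
        root = ET.fromstring(stored)
        return ImageValue(**{name: root.findtext(name, default="") for name in self.SUBFIELDS})
```

ElementTree handles escaping, so a caption like “Gumby & Pokey” survives the round trip.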
(Additionally, what’s great about making the fields responsible is that the “Render for Edit” method can render an input widget however it needs to be — in this case, it would actually return a chunk of HTML with four or five input fields that would have names like “image[file]” and “image[caption]”. Then the “Retrieve from POST” method would know how to gather all these from the POST and “reconstitute” them into an object.)
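Here’s a rough sketch of the “Render for Edit” and “Retrieve from POST” pair for the image field. It assumes the POST body has already been parsed into a flat dict keyed by the literal input names (PHP, for one, would hand you a nested array instead), and the sub-field list beyond `file` and `caption` is hypothetical.

```python
from html import escape

# Hypothetical sub-fields; only "file" and "caption" come from the text.
SUBFIELDS = ("file", "caption", "alt_text")


def render_for_edit(field_name: str, value: dict) -> str:
    """Return one text input per sub-field, named like image[caption]."""
    inputs = []
    for sub in SUBFIELDS:
        name = escape(f"{field_name}[{sub}]", quote=True)
        current = escape(value.get(sub, ""), quote=True)
        inputs.append(f'<input type="text" name="{name}" value="{current}">')
    return "\n".join(inputs)


def retrieve_from_post(field_name: str, post: dict) -> dict:
    """Collect the field_name[...] entries from a flat POST dict into one value."""
    prefix = f"{field_name}["
    value = {}
    for key, raw in post.items():
        if key.startswith(prefix) and key.endswith("]"):
            sub = key[len(prefix):-1]
            if sub in SUBFIELDS:  # ignore anything the field doesn't expect
                value[sub] = raw
    return value
```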
The one problem with serialization is pretty obvious: marshalled data is tough to search with SQL. Don’t forget that you’re denormalizing your data, and that has consequences. What if you wanted to find all images with “gumby” in the caption? That’s tough to do when the caption is buried in an XML string (unless your database can run XPath queries against a field, in which case you’re in great shape). We talked about this same thing here, where I called it “data globbing.”
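To make the search problem concrete, here’s a small sketch using Python’s built-in `sqlite3` with an in-memory database; the table, column, and rows are made up. A `LIKE` over the serialized column can’t tell a caption from a file name, so the only accurate answer comes from pulling rows back and parsing each blob in application code.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, image TEXT)")
conn.executemany(
    "INSERT INTO content (id, image) VALUES (?, ?)",
    [
        (1, "<image><file>beach.jpg</file><caption>Gumby at the beach</caption></image>"),
        (2, "<image><file>gumby.jpg</file><caption>Sunset</caption></image>"),
    ],
)

# Naive SQL: matches "gumby" anywhere in the blob, file name included.
naive = [row[0] for row in conn.execute(
    "SELECT id FROM content WHERE image LIKE ? ORDER BY id", ("%gumby%",)
)]

# Accurate but slow: fetch every row and parse the XML in application code.
exact = [
    row_id
    for row_id, blob in conn.execute("SELECT id, image FROM content ORDER BY id")
    if "gumby" in (ET.fromstring(blob).findtext("caption") or "").lower()
]
```

`naive` comes back with both rows (SQLite’s `LIKE` is case-insensitive for ASCII, and row 2 merely has “gumby” in its file name), while `exact` finds only row 1. On a large table, that full scan is the price of denormalizing.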
Making It All Work
Yes, this is complicated, but it all comes together beautifully in that fateful moment when you have to create a new datatype. Like this —
Say you have a system to track all the dates you’ve ever been on (hint: this means you’re a loser). Within this system, you have a “Date” object. This object has fields for “Partner,” “Date and Time,” “How Fun Was It,” etc.
You decide you want a field for “Movie We Went To” and you want to track this information (these “sub-fields”):
Now, normally “Movie” would become a new class in the system, and a “Date” object would store a reference to a “Movie” object. However, if you’ve built your datatype right, it can all be stored and managed as a field on another object; you just need to honor the contracts of the methods we’ve discussed:
Remember that these methods are all running alongside the other fields. This field is called in turn as the controller iterates over the fields, each one validating its contents in its own special way.
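A minimal sketch of the “Movie We Went To” field in Python, assuming hypothetical sub-fields (title, theater, rating), a hypothetical `retrieve_from_post`/`validate`/`store` contract shared by every field, and JSON as the serialization format. The movie field sits in the same list as an ordinary “Partner” string field, and the controller treats them identically.

```python
import json


class TextField:
    """Hypothetical plain string field, e.g. "Partner"."""

    def __init__(self, name: str):
        self.name = name
        self.value = ""

    def retrieve_from_post(self, post: dict) -> None:
        self.value = post.get(self.name, "").strip()

    def validate(self):
        return None  # any string is acceptable

    def store(self) -> str:
        return self.value


class MovieField:
    """Hypothetical complex field: a whole movie kept in one column."""

    SUBFIELDS = ("title", "theater", "rating")  # hypothetical sub-fields

    def __init__(self, name: str):
        self.name = name
        self.value = {}

    def retrieve_from_post(self, post: dict) -> None:
        # Gather inputs named like movie[title], movie[theater], ...
        self.value = {sub: post.get(f"{self.name}[{sub}]", "").strip() for sub in self.SUBFIELDS}

    def validate(self):
        if not self.value["title"]:
            return f"{self.name}: title is required"
        if self.value["rating"] not in ("", "1", "2", "3", "4", "5"):
            return f"{self.name}: rating must be 1 to 5"
        return None

    def store(self) -> str:
        # Serialize the sub-fields into a single storable string (JSON here).
        return json.dumps(self.value, sort_keys=True)


def save(fields, post):
    """Controller: order each field around without knowing what it is."""
    for field in fields:
        field.retrieve_from_post(post)
    errors = [err for err in (field.validate() for field in fields) if err]
    if errors:
        return None, errors
    return {field.name: field.store() for field in fields}, []
```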
I’ve used this theory on a sizable system and it works beautifully. I implemented it when I was confronted by a boss who would march down to my desk at any moment and say “I want a field for X” and expect that field to be in the interface and fully functional by the time he got back to his desk (in fact, I wrote this post about just that problem).
One table in this system has grown to 108 fields. This is a lot of fields for a database table, but it’s manageable because the fields all have datatypes, and the datatypes know how to manage themselves. So when the boss wants to store something new, I just add that field to the database table, then add a record to the “fields” table which describes (1) the name of the field, and (2) the datatype of the field.
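Here’s a sketch of that workflow using Python’s `sqlite3`, assuming a hypothetical `content` table and a small hypothetical registry of datatypes. `add_field` adds the column and the matching row in the “fields” table in one transaction; `load_fields` is what the controlling code would read to learn which fields exist and what datatype each one is.

```python
import sqlite3

# Hypothetical registry: datatype name -> SQL column type.
DATATYPES = {"string": "TEXT", "integer": "INTEGER"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE fields (name TEXT PRIMARY KEY, datatype TEXT NOT NULL)")


def add_field(conn, name: str, datatype: str) -> None:
    """Add a column to the content table and describe it in the fields table."""
    if datatype not in DATATYPES:
        raise ValueError(f"unknown datatype: {datatype}")
    if not name.isidentifier():
        # Column names can't be bound as SQL parameters, so only accept plain identifiers.
        raise ValueError(f"bad field name: {name}")
    with conn:  # commit both statements together, or roll back on error
        conn.execute(f"ALTER TABLE content ADD COLUMN {name} {DATATYPES[datatype]}")
        conn.execute("INSERT INTO fields (name, datatype) VALUES (?, ?)", (name, datatype))


def load_fields(conn) -> dict:
    """Controller side: which fields exist, and what datatype is each one?"""
    rows = conn.execute("SELECT name, datatype FROM fields ORDER BY name")
    return {name: datatype for name, datatype in rows}
```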
The rest is pretty simple. So long as the field knows where to put its data and how to take care of itself, there’s not much to it and you can start writing new datatypes for anything that may come along.
Great idea. If there is a way you can publish some of your works, that will be really helpful.
Great article :) Only two things scared me:
However it might have a Value Type of “year,” meaning it’s comprised of exactly four numerals.
Looks like another millennium bug, which will reveal itself at year 9999 :D
One table in this system has grown to 108 fields
Man! 108 fields?! :o Why? Wasn't it possible to break it for some smaller [read: easier to maintain and comprehend] tables? :|