Databases / XML

RSS feed for this category only.

MySql dips its toe into closed source

MySQL reserves features for paying customers; open-source community up in arms: This is interesting, and perhaps not surprising given the acquisition.

Open-source darling MySQL is facing a new uprising within its customer base over plans disclosed this week to reserve some key upcoming features, and their source code, for paying users of its namesake database.

Officials at Sun Microsystems Inc., which acquired MySQL in February, confirmed that new online backup capabilities now under development will be offered only to MySQL Enterprise customers — not to the much larger number of users of the free MySQL Community edition.

This will have very little effect on the Web development community, since we’re not the ones to use the high-end features anyway. It’s more targeted at making sure enterprise customers are truly paying for the product.

There’s really two ways this can develop:

  1. From here on out, all “enterprise-y” features are only released to the Enterprise version of the product
  2. There’s some delay, with these features eventually filtering to the Community version after some arbitrary time period has elapsed

Try as I might, I just can’t fault MySQL for this. Read this Wired article about the acquisition:

One analyst thinks MySQL’s annual revenue may come in at $70 million to $85 million.

“Less than 1 percent of MySQL users pay,” wrote Global Equities Research analyst Trip Chowdhry. “Users are in the habit of getting free stuff and it’s impossible to break the habit… We don’t see Sun (being) able to monetize it.”

Open source is wonderful, but there comes a point where a company has to turn a profit. Sun didn’t cough up $1 billion just for fun.

Amazon SimpleDB

Amazon SimpleDB™- Limited Beta: Amazon has announced SimpleDB, which allows you to build a database “in the cloud” and query it from anywhere. It’s not SQL, however, and it’s not relational — it’s “structured data.”

Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers.

Pricing is $0.14 per “machine hour consumed,” and about the same per GB data transfer both in and out.

Fiction Markup Language Redux

Fiction Markup Language: Years ago I wrote about an idea called Fiction Markup Language. I said:

When is someone going to come up with Fiction Markup Language — an XML spec solely for annotating fiction?

Well, today I got an invite to a Google Group on this. Russell Warner is actually giving it a go, as detailed here. He seems to have a good start on it as well. There’s a snippet of markup in there which looks really promising.

i-am-a-dummy

Here’s a command line switch for MySQL that’s both funny and pretty handy:

--i-am-a-dummy  Synonym for option --safe-updates, -U.

Yes, there is a MySQL switch called “i-am-a-dummy”. It’s an alias of another switch, which provides more information:

--safe-updates  Only allow UPDATE and DELETE that uses keys.

Now, that’s handy. This means you couldn’t accidentally wipe out your table because the system won’t process DELETEs or UPDATEs that don’t specify primary key values. Smart. It strikes me that the alias should be “i-am-not-a-dummy.”

PHPRunner 3.1

PHPRunner 3.1 released: Sergei has released the latest version of PHPRunner. I’m a big fan of this software — it’s so handy for quick database interfaces, and I’ve actually used it in production a few times, which is saying something because I’m a snob.

Look at it like this, PHPRunner is a lot like phpMyAdmin, but you can lock it down harder and pretty it up for non-database types.

Virtual Tables as Join Fodder

Databases should have some built-in “virtual” tables filled with universal sequential data, against which you can join during queries. For instance, there should be a table called “dates” which is just a single-field table with a sequential list of dates, stetching into eternity in either direction.

How would you use this? Say I wanted to get a list of the number of comments posted every day on this blog. I could just use COUNT and GROUP BY to roll them up, but if there were no comments on a particular date, then that date just doesn’t show up (since there’s noting to count).

To get a “0” on date where there were no comments, I would need to join my original query on a table of dates. As it sits, I need to create this table and populate it and maintain it myself. Why couldn’t this be built into a database platform?

There should be a table for numbers too. I was thinking about letters of the alphabet (but then you get into localization issues), days of the week, and months of the year. However, you can create those and maintain them easily, since they’re finite. Dates and numbers stretch into infinity in both directions, so they’re harder to deal with.

Does something like this exist on any platforms that you know of?

Group_Concat in MySQL

Quick Merge in MySQL: I wished for this functionality in SQL Server for years. There are so many situations when this would come in handy.

Using the GROUP_CONCAT function though, I no longer need to loop through them. I concatenate the list on the MySQL side, which returns me one record with one field.

XML Cooktop

I’m doing a project with Ektron CMS400.Net right now, and that system relies pretty heavily on XSL, which I hate. I really think XSL is a tool of devil, meant just to drive me mad. But I digress.

I needed a good little XSL authoring app. XMLSpy is the absolute gold standard in this space, but it’s also $400, which I didn’t really want to pay. Additionally, it’s huge. Hugely good, but bloated beyond belief.

The solution was a neat little open-source app called XML Cooktop. It was exactly what I needed: an interface that takes an XML file and an XSL file, and lets you transform the two with the click of a button. It shows you the raw HTML (well, XML — whatever) it produces, as well as an embedded view of how the HTML would look in a browser (IE, mind you, but that’s what I’m developing for in this case anyway).

Additionally, it has a nice little XPath debugger, kind of like The Regex Coach is for regular expressions.

It’s lightweight, free, and does one thing really well. Exactly what I needed.

The MSDE

One of our clients has started working with some software that uses the MSDE — the Microsoft SQL Desktop Engine. This is a stripped down, black-box version of SQL Server for people that need a database server but don’t want to pay for SQL Server nor need all its super-powers.

MSDE is stripped down in two ways.

  1. It comes with no client tools, like Enterprise Manager or Query Analyzer. It’s supposed to be a black box which your software knows how to operate, but you can’t see into.
  2. It’s performance limited. It has a “workload governor” that “stalls” the app for a few milliseconds whenever it’s fielding more than 5 - 8 connections (the exact number depends on the version).

The second limitation there may not be a big deal for an office scenario. In our case, the client has a dozen users that may or may not be connected at any given time, so the scalability limitations are never an issue.

The first limitation can suck, however. I don’t like black boxes, especially when it comes to data storage. I want to know what’s going on inside my database. To do this, I found a few tools that will do just this for the MSDE, and bring it ever so closer to SQL Server.

  1. DbaMgr gives you a really nice graphic interface into your MSDE install, including databases, tables, users, and access to automated backup. It’s free.
  2. msde.biz is a site that sells two tools and gives one away. MSDE Query is free, and lets you run ad hoc SQL on your install, a la Query Analyzer. MSDE Backup does just that for $18, and MSDE Admin gives you a little Enterprise Manager goodness for $22.
  3. SQL Buddy doesn’t appear to be under active development anymore, but it’s open source.
  4. MSDE Manager might be a Cadillac of MSDE tools at $79. The screenshots look great.

Microsoft has a whole list of “MSDE partners” here.

So would the MSDE work for Web sites? I don’t know — it’s certainly a step up from Access, but I don’t use Access anymore either.

Another question about MSDE: with the advent of SQL Server Express — the free version of SQL Server — is the MSDE still going to hang around? Or are its days numbered? I can’t find anything on “MSDE 2005,” and if you Google for that, Microsoft has purchased AdWords to direct you to SQL Server Express, which would seem to indicate that it’s a dead product.

Finally, for the record, while I’ve never been a big fan of Microsoft for server technologies, I’ve always liked SQL Server. My first official Microsoft training was an MCDBA track on SQL Server 7.0, and I’ve used the information I learned from that class six years ago over and over again since then.

Microsoft may not do everything right with servers, but SQL Server is a fine product, no matter how much you hate Redmond.

Book on Hierarchies in Databases

Joe Celko’s Trees and Hierarchies in SQL for Smarties: A book devoted to nothing but representing hierachies and trees in relational (SQL) databases. Am I a dork because I want it?

In this book, Celko illustrates several major approaches to representing trees and hierarchies and related topics that should be of interest to the working database programmer. These topics include hierarchical encoding schemes, graphs, IMS, binary trees, and more.

I paged through it at the bookstore. It’s heavy on the nested set theory, rather than parent-child.

Oracle 10g Express Edition

There can be no doubt now: Oracle is trying to destroy MySQL. Consider:

  1. They buy Innobase, the company responsible for the back end data format that makes MySQL go.
  2. They forge a partnership with Zend, the company responsible for PHP, which is the lanuguage most often used with MySQL on the Web.
  3. Now, word via Slashdot and CNet, is that they’re going to offer a free “Express Edition” of their 10g database.

They’re not even trying to hide the fact that they’re gunning for MySQL:

The new low-end edition is aimed squarely at free and open-source alternatives to Oracle’s namesake database, said Andrew Mendelsohn, senior vice president of Oracle’s server technologies division.

Here’s my prediction for a secondary effect —

This action will ultimately lower Oracle DBA salaries. Oracle always had a huge barrier to entry in its price and infrastructure requirements, so you didn’t run it unless you were in a big enterprise. Consequently, there are less Oracle DBAs in the world than other database platforms.

But now we’re going to see a huge wave of people who know how to make Oracle go on a small scale. Companies will start hiring these people for less than a full-blown Oracle DBA salary, and either use them for smaller Oracle installments, or “train them up.”

Oracle Buys Innobase

Oracle buys Innobase. MySQL between rock and hard place?: This is interesting. Oracle bought Innobase, which is a close partner of MySQL, that company having used InnoDB as a storage architecture for years.

MySQL is now faced with the prospect of licensing technology they cannot ship without from their biggest rival. Interestingly, there’s always been once piece of the InnoDB puzzle that’s not available under the GPL: the InnoDB Hot Backup Tool. Without it, database administrators cannot backup their InnoDB tables without shutting down MySQL or at least locking out all transactions.

Oracle just bought themselves a whole lot of leverage with MySQL AB and a talented team of database engineers to boot.

I’ve always wondered why MySQL AB didn’t buy Innobase Oy years ago. It always made complete sense from where I sat. But I’m hardly an insider when it comes to the relationship between those companies. Needless to say, that relationship just got far more “interesting.”

Fabian Pascal Loves Me

DATABASE DEBUNKINGS - HOME: My dreams have come true — I am Fabian Pascal’s “Quote of the Week.” He took this quote from the rip roarin’ argument we had a few weeks ago. I said:

What I need in a database is place where I can persist and retrieve data. Those are my requirements — let me persist data somewhere, then let me get it back when I need it, in the form that I need it. Relational databases — whoops, I mean SQL databases — have allowed me to do this for years. And how exactly can that be so bad?

Dear Fabian uses these quotes as a supposed method of pointing out stupidity, but I utterly and completely stand by everything in the quote above. What I said was true then, it is now, and it will be in the future no matter how much Fabian wishes it not to be so.

Something even cooler: I’m number six on Google for “Fabian Pascal.” Fabian has been fantastic for my traffic.

(Note: the quote rotates, so the link above won’t show the quote forever. Eventually we’ll be relegated to the past quotes page)

Database Platform Comparison

Comparison of relational database management systems: I found this Wikipedia page at the bottom of the SQL comparison I linked to yesterday. It’s a comparison of major database platforms and their support of various features. It doesn’t get into the SQL like the other page, but there are more platforms to look at and more high-level features covered.

I’m surprised by how bare-bones MySQL is. I knew it was stripped down, but it looks worse when you see it laid out like this.

Common Database SQL Support

Comparison of different SQL implementations: This is a phenomenal page listing how the different database server platforms handle standard SQL tasks, for instance row limited, ORDER BY, NATURAL JOINS, etc. Extremely well-researched.

The following tables compare how different DBMS products handle various SQL (and related) features. If possible, the tables also state how the implementations should do things, according to the SQL standard.

Just reading this will teach you something you didn’t know about your preferred platform.

Fabian Pascal is Smarter Than Me

The legendary Fabian Pascal showed up yesterday to tell us all that we were stupid over in the relational data model post. Specifically, his comment was:

None of you know the relational model, which is why you think current products are relational, which they are not.

This whole thread is nonsense, because you never defined what “best” means, and “best for what?”.

He has a point, certainly, even if he was a jerk about it. I went to his site — dbdebunk.com — to see what he was talking about, and it seemes Pascal is a serious database bada**. He’s old school, from the street, yo. And all us playa’s are just frontin’ ‘bout our database skillz. He’s the Herbert Kornfield of relational databases.

Some quotes from the site:

[Current DBMS] deficiencies are, it seems to me, directly due to the widespread lack of understanding (not least on the part of vendors), of fundamental database principles.

A lot of what is being said, written, or done in the database management field—or whatever is left of it—by vendors, the trade press and “experts” is irrelevant, misleading, or outright wrong. While this is to a degree true of computing in general, in the database field the problems are so acute that, claims to the contrary notwithstanding, technology is actually regressing!

He’s written some books, along with his partner, C. J. Date. Date, for his part, wrote “An Introduction to Database Systems” which is one of the seminal books on the subject.

I’m going to sit down and do some reading over the weekend to try and figure out where I’m so mistaken, but it seems that most of the site consists of making fun of people who aren’t as smart as Pascal and Date — which is about everybody, and certainly myself, I’ll freely admit. Their quotes page is full of people they apparently consider to be idiots.

I can’t put it any better than Pascal’s own Wikipedia page:

He is known for his extremely vitriolic criticisms of almost all DBMS vendors, users, and experts, on the basis that they do not subscribe to a pure form of the relational model of database management

Inheritance in PostgreSQL

PostgreSQL 8: Inheritance: Joseph Scott — a champion of PostgreSQL — pointed this little feature out in a comment to my post on the relational data model.

It’s a method in PostgreSQL of subclassing tables. You can create a table that would have a one-to-one relationship with a “parent” table if the relationship was done manually via foreign keys. But Postgres maintains this relationship internally with no visible reference in either table to the other table — it’s like creating a view for each parent-child instance.

Here’s the example from the page:

CREATE TABLE cities (  
name            text,  
population      float,  
altitude        int     -- (in ft)  
);  

CREATE TABLE capitals (  
state           char(2)  
) INHERITS (cities);

So I can insert stuff into “capitals,” referencing fields in “cities.” “Capitals” will contain all the fields that “cities” does, plus its own fields — it’s a subclass, automatically represented in SQL and the underlying table structure.

I’ve heard that DB2 does this too. It seems like it’d be very nice and save lots of code. This makes me want to create my object model in stored procs instead of PHP or whatever. Perhaps that’s unworkable in practice, but it would be more elegant from a data perspective.

Is the Relational Model the Best Model?

Is the relational model of data storage the best, most effiicient way to store data? I’m talking about the traditional database model of tables, fields, row, foreign keys, etc.

What are the other ways? There’s object oriented, where you have a table of classes and attributes, object instances and properties (the “key-value” table). Then there’s XML and the document-centric model. I’m sure there are others, but I’ve worked with these two so they jump out at me.

Sometimes I think that these methods are just ways to impersonate the relational model without the problems of a rigid data structure. Because, in my experience, the relational model is, without a doubt, the easiest to store data in and query, primarily because SQL has been designed around it.

In contrast, the object-oriented method is inferior in terms of inserting and updating data (you have one table row for every object property, so you have to do multiple updates and inserts) and in terms of querying data (you have to use temporary tables like crazy to iterate through different filters to get your results).

It’s easy to store things in XML, but searching has problems, primarily that there aren’t good and simple systems around for managing the data in aggregate. Yes, XPath will let you get information from a single XML document, but what if you have 10,000 XML documents in a directory? Find me all with a “author” tag equal to “Hemingway” — this is trivial with a relational database, but with XML, not so much. (Are there good systems for this yet? I’ve been complaining about this for a long time.)

So why do we ever entertain storage methods other than relational? One of the big reasons, I think, is that other methods make it easy to maintain data models. The object-oriented method, for instance, lets you create new “classes” by inserting table rows, rather than by changing the actual table structure. And XML is even more flexible. For all its advantages, relational databases have rigid structures.

So, again I ask: is the relational model the best way of storing data? If maintaining the data model (designing and changing the defined tables and columns) was abstracted away to be utterly trivial, would you consider anything else? Have you had a decision recently where you specifically chose another method over the relational method? Why? What was the application that made you think another way of doing things was better?

This question has come up in my mind because I keep running into systems that don’t do things relationally, and instead have many different ways for impersontating the relational model. For example —

I was on a conference call the other day with a vendor who sells an enterprise-level CMS that relies heavily on XML. I was trying to get him to explain to me how I could create a “book” object (for example), and just link to an “author” object instead of embedding the author data in the book object. I found myself saying, “…you know, like if I just had a foreign key in a relational database.”

Relational data models are great reference implementations because all developers understand them. And do we understand them simply because they’re generally the best way to do things? I’m curious.

Databases and Trust

Here’s something I’ve learned over the years: when modeling data to build a database, be very careful what fields you decide to include. Don’t throw in extraneous fields just because “someone might want to store that piece of information someday, and it’s no big deal to include it…”

It is a big deal, and here’s why —

When building and populating a domain of data, users need to trust it. They need to believe that the data in the data store is good, solid data. If you’re stingy with your fields, and you only include the fields that people are going to use, then most of the data will be good data, users will know this, and they’ll trust it.

However, if you just start throwing in fields willy-nilly, and a lot of them don’t get used, your users are going to develop a trust problem. One day, one of them is going to depend on that field — they’re going to do a search on it, or they’re going to pull a report based on it, and their search or their report is going to suck because no one uses the field. Then they start to wonder if they should trust any of the data…

Here’s an example —

You work for a construction firm, and you’re building a comprehensive database of subcontractors in the metropolitan area — electricians, plumbers, etc. Your boss gets angry one day because he’s been let down by a subcontractor one too many times, so he asks you to add a field for “Reliability” to the subcontractor database.

You think this is a silly idea, but who cares, right? If they use it, fine. If not, no harm. You add the field, and a few people use it to quantify a few subcontractors’ reliability. Most don’t use it, but a few do, and your boss is happy.

However, one day a VP in your firm searches for subcontractors with a “Reliability” rating of “High,” and he only gets back the precious few subcontractors for whom the field has been set. He trusts this data for a while, but then slowly realizes that he’s only getting a list of seven plumbers when he should be getting a list of 100. This is because there are 93 plumbers for which no Reliability rating has been set.

Your database has just been wounded. This VP is now suspect of your data. It’s let him down once, and when he begins to understand the mechanics of why, he’s going to wonder what other fields in the database are unreliable too. Since it’s hard to get an “overhead view” of a database, whenever he gets back results that he doesn’t expect, he’s going to start thinking, “Well, I bet this is another one of those situations. This database sucks. I’m going to keep my own subcontractor list in Excel.” And he will.

This is how databases begin the downhill slide into non-use. The usage and adoption of your database depends on your users trusting it. If they start to distrust it, you’re in a world of hurt. So be stingy with the fields you add — defend the integrity of your database — so you don’t make it easy for your data to spoil.

Firebird Database Server Penetration

‘Most important ever’ MySQL reaches beta: I found this stat in this article about MySQL 5.0, and I’m amazed by it:

[MySQL] accounted for 40 percent of open source database deployments, while Firebird and PostgreSQL accounted for 39 percent and 11 percent of deployments respectively.

Firebird has a 39% share of open-source database deployments? One percent behind MySQL? That’s amazing, as I thought it was still a hidden gem.

We’ve talked about Firebird here and here. Great database platform, by all appearances. I’m moving an enterprise system I built off of SQL Server, and I made the decision to put it on Firebird, so know that you’ll be hearing much more about it in the coming months.