Programming and Web Development


The Peril of Self-Replicating Hyperlinks

I built an intranet for a client. One of the functional items is a viewer into an Exchange calendar. We use a handy third-party component to display the contents of an Exchange public folder on a page.

The month and year to be viewed is driven off the querystring. Something like:

/month.aspx?m=11&y=2010

So you can look at any month by writing your own querystring. We check for valid input and everything, but so long as you enter a valid month and year in the querystring, you can (could) look up any logical month in existence, as far ahead or behind as you want.

Each month has helpful “Next” and “Previous” links on it that form the URL for the next or previous month.

Sadly, we’re also indexing the intranet via a Google Mini.

Astute readers will see the problem here…

Two things happened:

  1. The number of pages in the Mini spiked. The client was suddenly hitting their document limit. They only had about 10,000 actual pages of content, but the Mini was claiming it had indexed four or five times that number.

  2. We started to get reports about odd months being returned in search results. Months like “November 2609” for example…

The Mini’s crawler, bless its heart, was dutifully following the “Next” and “Previous” links in the calendar into infinity in either direction. It was, in effect, inventing its own URLs…forever. Every new page in the calendar gave it a new URL it hadn’t seen before. The Mini’s crawler had fallen down the rabbit hole.

Easy problem to fix, but an embarrassing oversight nonetheless. We now drop the “Next” and “Previous” links at 24 months out in either direction, and we throw a 410 for anything outside those bounds in the past, and a 404 for anything outside those bounds in the future.
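The fix is easy to sketch. Below is a minimal Python version. It assumes a 24-month window measured from the current month. It also returns a 400 for malformed input, which is an assumption: the post only says input is validated, not how. The function names are hypothetical.

```python
from datetime import date

WINDOW_MONTHS = 24  # the 24-month cutoff described above

def month_index(year: int, month: int) -> int:
    """Collapse (year, month) into one running month count."""
    return year * 12 + (month - 1)

def calendar_status(year: int, month: int, today: date) -> int:
    """HTTP status for /month.aspx?m=...&y=... relative to today."""
    if not 1 <= month <= 12:
        return 400  # assumption: malformed input gets a 400
    offset = month_index(year, month) - month_index(today.year, today.month)
    if offset < -WINDOW_MONTHS:
        return 410  # too far in the past: gone for good
    if offset > WINDOW_MONTHS:
        return 404  # too far in the future: not found
    return 200

def nav_links(year: int, month: int, today: date) -> dict:
    """Emit Next/Previous links only while they stay inside the window."""
    links = {}
    idx = month_index(year, month)
    now = month_index(today.year, today.month)
    if idx - 1 >= now - WINDOW_MONTHS:
        y, m = divmod(idx - 1, 12)
        links["previous"] = f"/month.aspx?m={m + 1}&y={y}"
    if idx + 1 <= now + WINDOW_MONTHS:
        y, m = divmod(idx + 1, 12)
        links["next"] = f"/month.aspx?m={m + 1}&y={y}"
    return links
```

Dropping the links stops the crawler from discovering new URLs. The status codes tell it to forget the ones it already found.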

I just checked today, and the number of pages in the Mini came down 2,000 yesterday, as it rechecks out-of-bounds URLs and gets back 410s and 404s.

I wonder how many sites on the public Internet have this same problem? I wonder if crawlers have any logic to detect this?

410 Gone

HTTP Error 410: Gone: I found this page today when searching for a refresher on the 410 status code. It means “gone.” Forever. Not just “not found” right now, but forever more. Gone, baby.

We should use status code 410 more.

As far as I can tell, it’s the forgotten stepchild of status code 404 (Not Found). Status code 410 means Gone, as in, a resource used to exist at this location, but now it’s gone. Not only is it gone, but I don’t know (or I don’t want to tell you) where it went. If I knew where it went, and I wanted to tell you, I would use status code 301 (Moved Permanently), and any smart client would simply redirect to the new address. But 410 means Resource gone, no forwarding address. Train gone sorry.
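Here is a minimal Python sketch of how a server might choose between these codes for a requested path. The lookup tables and paths are hypothetical. A real site would keep them somewhere persistent.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical data: pages still served, pages that moved, pages removed.
LIVE: Set[str] = {"/index.html"}
MOVED: Dict[str, str] = {"/old-feed.xml": "https://example.com/feed.xml"}
REMOVED: Set[str] = {"/comments/1234.xml"}

def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header or None) for a requested path."""
    if path in LIVE:
        return 200, None
    if path in MOVED:
        return 301, MOVED[path]  # moved, and we say where: clients should follow
    if path in REMOVED:
        return 410, None         # existed once, gone for good, no forwarding address
    return 404, None             # not found (and maybe never existed)
```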

Sam Ruby brings up a good point about when it would be a great time to use 410.

[…] you would think that any decent aggregator would respect a 410, wouldn’t you?

I’ve long since removed my scraped feed, and marked it gone.

Despite this, a number of aggregators continue to relentlessly poll for changes.
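What Sam is asking for isn’t complicated. Below is a minimal Python sketch of an aggregator’s bookkeeping, using a hypothetical Subscription record. It stops polling permanently on a 410, switches to the new address on a 301, and treats everything else, including 404, as possibly temporary.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Subscription:
    url: str
    active: bool = True

def handle_poll_result(sub: Subscription, status: int,
                       location: Optional[str] = None) -> Subscription:
    """Update a feed subscription after one poll, honoring permanent statuses."""
    if status == 410:
        sub.active = False   # Gone: never poll this feed again
    elif status == 301 and location:
        sub.url = location   # Moved Permanently: poll the new URL from now on
    # Anything else (200, 304, 404, 5xx, ...) leaves the subscription alone;
    # a 404 may be temporary, so the aggregator keeps trying.
    return sub

def due_for_poll(subs: List[Subscription]) -> List[Subscription]:
    """Only active subscriptions get polled."""
    return [s for s in subs if s.active]
```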

I had this same problem when I removed the RSS feeds for comments on individual posts:

[…] this brings up the question of how to notify people that I’m doing this. How do you notify RSS users? The solution is pretty simple I guess — I’m going to tack an entry to the top of this XML file that just explains that we’re not maintaining these feeds anymore, rebuild them all one last time, and then leave the files out there long enough for everyone to ping them again.

But when do you pull the files? When I delete these 4,600 XML files, do I redirect requests for them to another feed with a single entry that explains that they’re gone? Are there best practices for this?

stackoverflow.com

stackoverflow.com: With both Spolsky and Atwood involved, I suspect this will be huge.

Jeff Atwood and I decided to do something about it. We’re starting to build a programming Q&A site that’s free. Free to ask questions, free to answer questions, free to read, free to index, built with plain old HTML, no fake rot13 text on the home page, no scammy google-cloaking tactics, no salespeople, no JavaScript windows dropping down in front of the answer asking for $12.95 to go away. You can register if you want to collect karma and win valuable flair that will appear next to your name, but otherwise, it’s just free.

The no registration thing fits. Joel’s forums have been no-registration for years, and it’s worked out great — good content, good answers, very little cruft.

Coding Horror Donates to Open Source

Donating $5,000 to .NET Open Source: Last year, Jeff Atwood from Coding Horror promised to donate some of his advertising revenue to open-source projects. He followed through on that today with a $5,000 check to the ScrewTurn Wiki project.

This is like one of those giant promotional checks you see on TV; it’s for promotional purposes only. There’s no real check. The actual money is being sent via wire transfer to Dario Solera, the ScrewTurn Wiki project coordinator. What’s Dario going to do with this money? You’ll have to ask him. That’s not for me to decide. There are no strings attached to this money of any kind. I trust the judgment of a fellow programmer to run their project as they see fit.

This is kind of cool for a number of reasons. First, it’s just neat. Second, he singles out an open-source, .Net project, which isn’t common. His rationale:

[…] open source projects are treated as second-class citizens in the Microsoft ecosystem. Many highly popular open source projects have contributed so much to the .NET community, and they’ve gotten virtually no support at all from Microsoft in return.

Would you like a side of irritation with that?

Getting into Momofuku Ko: There’s apparently a super-hot restaurant in Manhattan that only takes reservations on the Net, no exceptions. Open tables are posted at 10 a.m. and are often gone in 2 or 3 seconds. Anything happening that fast on the Web invites concurrency issues:

When the pick-a-time page is downloaded by a particular browser, it’s based on the information the web server had when it sent the page out. The page sits unchanged on your computer — it doesn’t know anything about how many reservations the web server has left to dole out — until the person clicks on a time.

High-falutin’ New York diners are pissed.

In a nutshell, the would-be patron said (and I’m paraphrasing here), “your system is unfair and broken,” and the folks at Ko replied, “sorry, that’s how the internet works”. The comments on the post are both fascinating and disappointing, with many people attempting to debunk Ko’s seemingly lame excuse of, well, that’s how the internet works.

An exchange between the restaurant and an irritated diner appears in a New York Times blog, and you have to sympathize with the diner who just doesn’t understand how the Web works, and can’t really be expected to.

While I understand your point, the fact that you can get in to the calendar (if you try for days and days and finally get lucky), try to select a time and then not have any available is ludicrous. Once into the calendar, you should have a lock on whatever times are available.

In his understanding, pulling the calendar up in a browser is like being at the head of a line, and no one behind you can do anything until you make a choice. Sadly, this isn’t how it works.
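Below is a minimal Python sketch of why this happens. It uses a hypothetical in-memory ReservationBook, not Ko’s actual system. The page a diner sees is only a snapshot, and the real check happens when the click reaches the server. The lock makes “is it still open?” and “claim it” a single step, so only one of two diners who saw the same table gets it.

```python
import threading

class ReservationBook:
    """Server-side inventory of open tables."""

    def __init__(self, open_slots):
        self._open = set(open_slots)
        self._lock = threading.Lock()

    def snapshot(self):
        """What gets rendered into the pick-a-time page when it is downloaded."""
        with self._lock:
            return sorted(self._open)

    def book(self, slot):
        """Claim a slot only if it is still open when the click arrives."""
        with self._lock:
            if slot in self._open:
                self._open.remove(slot)
                return True
            return False  # someone else's click got here first

book = ReservationBook(["6:00", "7:00"])
page_a = book.snapshot()   # diner A loads the calendar
page_b = book.snapshot()   # diner B loads the same calendar
book.book("7:00")          # A clicks first and gets the table
book.book("7:00")          # B's page still shows 7:00, but the claim fails
```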

Single Site Browsers

Bridging Desktop And Web Applications - A Look At Mozilla Prism: Here’s an article about what are being called “Single Site Browsers” (SSBs), or little standalone browsers that let you browse and interact with a single Web app in a desktop app-ish environment.

Surf to Gmail, for instance, choose “Convert to Application” in the Firefox menu, and a shortcut with the Gmail icon appears on your desktop. Clicking the icon launches Gmail in its own window. Extensive customization options are available to add things like dock badges, system tray icons and popup notifications. Web developers can add special hooks to their code so that these bells and whistles are automatically included whenever someone spins the app off onto their desktop. Prism is still very much a work-in-progress, but it has already met with some early success; recent Yahoo acquisition Zimbra, for example, is using it to deliver a desktop version of their popular web-based mail client.

I’d like to take this second to say that I wrote about this years ago in a post called Owning the Container:

In a browser, remember, your page is only in the viewable area because the user has put it there. Your page can leave just as quickly — the user could reload it, they could click on a bookmark, etc.

A browser is a container. Your app is poured into one page at a time, and can just as easily get poured out, sometimes at inopportune times.

This leads to times when you need your app to do just a little bit more than a Web app can do. These are the times when you think, “Should I do this as a compiled, installed app instead?” But that’s a big leap — there really needs to be a middle ground.

Good to see some progress is being made here — the article profiles Mozilla’s Prism (which I use for Gmail), and several others.

(You can actually approximate the Prism experience with regular Firefox. It’s not hard to make Firefox launch under a separate user profile in which you shut off all the toolbars, etc.)

What I’d like to see in these apps is a way to give Web sites “superpowers” that they wouldn’t normally have, like access to your local file system or registry. While the security implications are daunting, this is one of the last miles to bridging the gap between local and remote applications.

PAMP is like LAMP, but for phones

PHP Apps on Mobile using PAMP: This looks really cool.

If you’d like to work on your favorite PHP apps on your S60 phones, here’s PAMP - Personal Apache, MySQL and PHP. This is implemented on the Symbian OS using Open C, which is a set of industry-standard POSIX and middleware C libraries for S60.

.Net Coders = American Tourists?

Are .NET Developers the American Tourists of the Software Industry?: This is an awfully good post that examines just why we all hate .Net developers.

The same segment of the software industry that dislikes Microsoft also views developers who use Microsoft tools and languages as inherently less skilled and less capable.

And this is why, apparently:

Americans are inherently annoying because we rarely invest any effort into learning anything about the external world. […] I think a similar dynamic occurs with .NET developers who are so busy drinking from the firehose at Microsoft that they forget about the rest of the development world entirely.

I haven’t seen that with any of the .Net coders I know personally, but .Net in general does seem to incubate a certain amount of fanboy attitude about stuff. You see a lot of ignorant forum posts like, “Why would anyone use anything else but .Net?”

Think back to our post about ASP.Net Web Forms. I said, and still believe:

[…] if you’ve never done any Web development except ASP.Net using Web Forms, then there’s a lot you missed about Web development.

[…] I’ve always maintained that there’s a difference between a “Web Developer” and a “[insert platform here] Developer”. If you’ve never done any work outside of ColdFusion, then you’re not a Web Developer, you’re a ColdFusion Developer. This is fine if you work in an exclusively ColdFusion shop, but you’re depriving yourself of a lot of learning by not digging into other languages and the core protocols in general.

Wiki Markup is Doomed

Wiki markup has no future: I really agree with this. Even though I love Markdown (indeed, Blend wrote the Markdown extension for eZ publish), there’s a large group of users who will never warm up to it and really shouldn’t have to.

Ok, I’m going to confront the elephant in the room: wiki markup has no future. I know I’m going to burnt at the stake by all the wiki fanatics, but let me give a few reasons…

[…] Back in the bad old days, you needed to know all those strange HTML tags in order to publish a web page. Recognising that this wasn’t desirable, we’ve worked very hard to develop publishing tools (FrontPage, Dreamweaver, content management systems) that eliminate the need for this knowledge.

[…] So along comes wikis, and voila, a new set of markup to learn!

DreamSpark

Microsoft DreamSpark: Microsoft is trying to hook new developers young by getting them in college with free development tools.

DreamSpark is simple, it’s all about giving students Microsoft professional-level developer and design tools at no charge so you can chase your dreams and create the next big breakthrough in technology - or just get a head start on your career.

The page gets a little condescending:

Now remember these are professional tools. This means they are pretty big files so make sure you have the bandwidth and space to bring them to your machine.

Don’t run with scissors, kids.

When Ethics and Code Quality Collide

Business software is Messy and Ugly: This post tosses out an interesting point.

One of the developers asked the question point blank: “What do you do when your managers tell you to make a mess?” I responded: “You don’t take it. Behave like a doctor who’s hospital administrator has just told him that hand-washing is too expensive, and he should stop doing it.”

This post extends the point a bit further.

Where I think the advice fits is where third parties — like users — are the ones who suffer the consequences. Doctors wash their hands because their patients’ health is at risk if they don’t. If a hospital administrator asks for a cover page on the TPS reports, a doctor cannot argue that she will not do it because it is inefficient. If it won’t kill a patient, the doctor must accommodate the administrator’s decision.

In the case of software development, if I am asked to develop software that is insecure and places private user data at risk, I will make the personal choice of saying no.

Put another way, crappy code becomes an ethical issue when third parties are involved. As a programmer, is it your responsibility to refuse an order that would injure a third party? How sharp or blurry is that line?

Parrot

Parrot virtual machine: I found this in this post about the future of Perl 6. Parrot is something like Java’s JVM or .Net’s CLR, but for multiple, dynamic, open-source languages.

Parrot is a register-based virtual machine being developed using the C programming language and intended to run dynamic languages efficiently. It uses just-in-time compilation for speed to reduce the interpretation overhead.

There’s discussion of having PHP, Ruby, Python, and Perl compile down to Parrot, so you could have one part of your app written in Ruby that calls a function library written in some other language.
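In a register-based VM, instructions name their source and destination registers directly. A stack-based VM like the JVM or the CLR instead pushes operands onto a stack and pops them off. Below is a toy Python interpreter that shows the idea. The instruction set and register names are made up and are not Parrot’s.

```python
def run(program):
    """Execute (op, dest, a, b) tuples against a dict of named registers."""
    regs = {}
    for op, dest, a, b in program:
        if op == "set":
            regs[dest] = a                  # load a constant
        elif op == "add":
            regs[dest] = regs[a] + regs[b]  # operands named explicitly, no stack
        elif op == "mul":
            regs[dest] = regs[a] * regs[b]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs

# (2 + 3) * 4
program = [
    ("set", "r0", 2, None),
    ("set", "r1", 3, None),
    ("add", "r2", "r0", "r1"),
    ("set", "r3", 4, None),
    ("mul", "r4", "r2", "r3"),
]
print(run(program)["r4"])  # 20
```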

IBM Thinks LAMP Apps Need to "Grow Up"

At the Rational User Conference in Vegas, a guy from IBM got all condescending about LAMP.

Businesses that run on the Linux, Apache, MySQL, Perl/PHP/Python (LAMP) model will have to “grow up” to avoid reliability issues in future, an IBM executive said.

According to Daniel Sabbah, general manager of IBM’s Rational division, LAMP — the popular Web development stack — works well for basic applications but lacks the ability to scale.

Oh, good Lord, are we still arguing about this? Seriously? I thought anyone who brought this up again was just stupid by definition, but I will defer further comments to Ryan Tomayko, who gets medieval on IBM’s supposed nirvana of the “physical three-tiered app.”

Great, right? Well, no. It turns out this is a horrible, horrible, horrible way of building large applications and no one has ever actually implemented it successful. If anyone has implemented it successfully, they immediately shat their pants when they realized how much surface area and moving parts they would then be keeping an eye on.

It’s a great read, from start to finish. Joe summed it up nicely with this:

It’s not like no one has even gone to a Web site and been greeted with “null pointer exception.”

The Hacks Behind Comet

The Future of Comet: Part 1, Comet Today: Remember Comet? This came close on the heels of Ajax, and is its twin. Whereas Ajax allows the browser to “reach back” to the server and say “hey, something happened,” Comet does the same for the server. It can now “reach forward” and notify the browser of something.

This article is a good introduction to the complexity and technology behind Comet. Honesty reigns supreme:

Comet is a giant hack. Even when it works flawlessly and efficiently, bringing us real-time interactive applications, deftly weaving around firewalls and browsers, avoiding unfriendly side effects, and cutting latency to near zero, it does so using mechanisms unforeseen by browser vendors, and unspecified by web standards.

It’s a good look at ingenuity, and there’s some information about how a standard is trying to develop around it.
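One common Comet technique is long polling. The browser makes a request, and the server doesn’t answer until it has something to say. Below is a minimal in-process Python sketch of the server side, using asyncio. The Channel class is hypothetical, and in a real server each poll() would be an HTTP request held open.

```python
import asyncio

class Channel:
    """Server-side message log that long-polling clients wait on."""

    def __init__(self):
        self.messages = []
        self._new = asyncio.Event()

    def publish(self, msg):
        self.messages.append(msg)
        self._new.set()              # wake every request currently hanging open
        self._new = asyncio.Event()  # fresh event for the next round of waiters

    async def poll(self, since, timeout=30.0):
        """Hold the 'request' open until there is a message newer than `since`."""
        if len(self.messages) <= since:
            try:
                await asyncio.wait_for(self._new.wait(), timeout)
            except asyncio.TimeoutError:
                pass                 # nothing new; the client simply polls again
        return self.messages[since:]

async def demo():
    ch = Channel()
    waiting = asyncio.create_task(ch.poll(0, timeout=1.0))  # browser "connected"
    await asyncio.sleep(0)           # let the poll start waiting
    ch.publish("new chat message")   # server-side event "reaches forward"
    return await waiting

print(asyncio.run(demo()))  # ['new chat message']
```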

How Many Can You Name?

Naming HTML elements from memory gets harder the more you use CSS, because CSS really scales down the number of HTML tags you use. I forgot very few of the really easy ones; about half of the tags I missed, I had never heard of before.


(In case the image doesn’t show, I named 49 HTML elements in five minutes on this quiz.)

Note: if you cut and paste the HTML they provide for the image above, beware: they sneak a text ad in at the bottom. Mine was for “Fort Worth Dating.” Tricky.

What is the future of JavaScript?

Mozilla, Microsoft reps argue over the future of web scripting: Interesting article on the future of JavaScript. Do you release a new version with incremental improvements, or do you scrap it in favor of something more robust like Python or Ruby?

Critics like Microsoft and Yahoo argue that certain characteristics of the language (particularly the prototype-oriented object model) make it impossible to add modern language features to ECMAScript without dramatically increasing the complexity of the language, breaking existing code, and creating new interoperability problems. Such critics believe that the focus should be on improving interoperability between existing ECMAScript 3 implementations and that modern scripting capabilities would be best provided by using a completely different scripting language.

Making Your App Faster on the Front End

Best Practices for Speeding Up Your Web Site: This is a phenomenal page at the Yahoo! Developers Network about how to speed up pages on the front end.

The “front end” means everything after code execution is complete on the server. As developers, we tend to concentrate on server-side rendering, but we’re missing over half the problem. The front end encompasses things like client-side caching, page rendering speed, etc.

This gets ignored too much. I just completed a CMS install using a system that’s fairly slow. But by spending three or four hours really tuning the page delivery, I was able to make up a lot of time lost to server-side execution. In particular, I managed to get the browser to consistently cache 90% of the total page weight. That helps more than you know.
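Most of that caching comes down to response headers. Below is a minimal Python sketch of one such policy. It assumes static files get a new name whenever their content changes, which is what makes a year-long max-age safe, and that HTML is always revalidated. The extension list is an assumption, not the configuration from that CMS install.

```python
import os

# Assumed policy: versioned static assets are cached for a year.
LONG_LIVED = {".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".ico"}
ONE_YEAR = 365 * 24 * 60 * 60  # seconds

def cache_headers(path: str) -> dict:
    """Pick a Cache-Control header based on the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in LONG_LIVED:
        return {"Cache-Control": f"public, max-age={ONE_YEAR}"}
    # no-cache still allows caching, but the browser must revalidate first
    return {"Cache-Control": "no-cache"}
```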

Here’s an example of the tips in the article:

The second problem caused by scripts is blocking parallel downloads. The HTTP/1.1 specification suggests that browsers download no more than two components in parallel per hostname. If you serve your images from multiple hostnames, you can get more than two downloads to occur in parallel. (I’ve gotten Internet Explorer to download over 100 images in parallel.) While a script is downloading, however, the browser won’t start any other downloads, even on different hostnames.
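The multiple-hostnames trick only helps the cache if each image always comes from the same hostname. If a host is picked at random, the browser downloads the same image twice under two URLs. Below is a minimal Python sketch that hashes the path to choose a host. The hostnames are hypothetical. Browser connection limits have risen since this was written, and under HTTP/2 sharding can work against you, so measure before adopting it.

```python
import zlib

# Hypothetical asset hostnames; each must serve the same files.
HOSTS = ["img1.example.com", "img2.example.com", "img3.example.com"]

def asset_url(path: str) -> str:
    """Deterministically map a path to one host so its URL never changes."""
    shard = zlib.crc32(path.encode("utf-8")) % len(HOSTS)
    return f"http://{HOSTS[shard]}{path}"
```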

In particular, the ETag header is something I’ve seen but never quite understood. Using it, the browser can ask the server whether the page it has in cache has changed, and keep using the cached copy if it hasn’t. Rather than relying on Last-Modified dates (hard with dynamically generated pages), the server derives a tag from the content itself, such as a checksum.
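Below is a minimal Python sketch of that round trip. It assumes the server hashes the rendered body with SHA-256, which is one common choice; HTTP treats the tag as an opaque string. The server answers a matching If-None-Match request header with 304 Not Modified and an empty body.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """A strong ETag derived from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body), honoring the If-None-Match request header."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            return 304, {"ETag": etag}, b""  # the browser's cached copy is still good
    return 200, {"ETag": etag}, body
```

The server still has to build the page to hash it. This approach saves bandwidth and transfer time, not server-side execution.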

The New York Times and Open Source

New York Times opens up code: This is kind of cool. It’s good to see an organization like the New York Times contribute some stuff to open source.

The New York Times likes open source — so much so that, as it gradually moves more of its print operations online, it is nurturing a Web development team that has released two of its own open source projects.

Understand, however, that the two things they did contribute are fairly technical. One is an XSL caching extension for PHP, and the other is a database abstraction layer used to replicate databases using JSON over HTTP.

ASP.Net and the Confusion of GET and POST

My loathing for ASP.Net has been well-known in these pages, but part of me has made peace with it. There are some things about ASP.Net that I very much like, and I promise I’ll post about them one day.

Today ain’t that day.

I will never accept the stupidity of the server-side FORM tag and the confusion of GET and POST. Never. It drives me nuts — ASP.Net encourages you to POST everything, and that’s just not right. It’s not the intention of the HTTP spec, and it kills usability in a lot of situations.

Case in point — the Applebees menu system displayed above.

I’m having a lunch meeting at my office today for my church IT committee. We have an Applebees on the next block, and I wanted to email everyone a link to the menu for this restaurant. But I can’t.

Problem is, the developer who built their site used a DataGrid control to display the restaurant listings, and he bound the link buttons in it to server-side events. So those buttons in the picture are actual form buttons which POST data, not simple hyperlinks that link to an “email-able” URL. So when I click the buttons, I get a generic URL that’s useless without POST data…which I can’t send in an email.

Here’s the URL:

http://www.applebees.com/MenuLanding.aspx

If you click that, it will just loop back to the search form (unless you’ve been there before, because it appears to save your last restaurant as a cookie or session value).

The thing is, I know to check this. I know to look in the URL and make sure it’s valid on its own. But how many other people do? I wonder how many people every single day email that URL to someone and get a response like, “That URL you sent didn’t work for me…”

Why, oh why, do people do this? Oh yeah, because ASP.Net encourages you to do it.

(Sad thing is, making this particular situation GET-friendly wouldn’t have been much more difficult. It appears to be a cross-page postback anyway, so how hard would it have been to have just put the menu ID in the querystring and make the link an actual hyperlink? It’s not like it’s posting back to itself.)
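Here is roughly what the GET-friendly version could look like, as a Python sketch. The restaurant identifier travels in the querystring, so the link works on its own in an email. The `restaurant` parameter name is hypothetical; the real site’s internal identifiers aren’t known.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

BASE = "http://www.applebees.com/MenuLanding.aspx"

def menu_url(restaurant_id: str) -> str:
    """Build a bookmarkable, emailable link (hypothetical 'restaurant' parameter)."""
    return BASE + "?" + urlencode({"restaurant": restaurant_id})

def restaurant_from_url(url: str) -> Optional[str]:
    """What the menu page would read on a plain GET: no POST data or session needed."""
    values = parse_qs(urlparse(url).query).get("restaurant")
    return values[0] if values else None
```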

Yes, I know, other languages encourage you to do stupid things too, but that stupidity is usually confined to the server side and the code, where it doesn’t affect the user so much. The difference here is that ASP.Net infiltrates the client way too damn much, and stuff like this results.

Microsoft Contributes PHP Extension

SQL Server 2005 Driver for PHP: Lo and behold, Microsoft has written a PHP extension. It’s a data access component for SQL Server…which PHP already has, so I guess I’m not sure how it improves things, but it’s pretty cool coming from Microsoft, anyway.

The SQL Server Driver for PHP is designed to enable reliable, scalable integration with SQL Server for PHP applications deployed on the Windows platform. The Driver for PHP is a PHP 5 extension that allows the reading and writing of SQL Server data from within PHP scripts.

Now, add this together with FastCGI support Microsoft announced earlier this year, and you have a compelling development platform on Windows (…but, we already had that, so I don’t know how this makes things better…still, it’s the thought that counts).