Gadgetopia: Web Site Management

This channel has its own RSS feed at this link.

Gadgetopia Channel

Web Site Management

May 27

Mint

Mint: A Fresh Look at your Site: This is just a quick shoutout for Mint – no, not the financial management app, I mean the Web analytics app.

Mint is an extensible, self-hosted web site analytics program.

I have a client with an extensive intranet.  So, Google Analytics was out because no one can get outside the firewall.  What to do?

They started looking at some “official” intranet analytics packages, but most of them had price tags over $20K.  So, I talked them into Mint for the heady price of $30.  We loaded it up with a half-dozen plugins (I think one of them cost $15), and they’ve had a blast with it for the last 18 months.

The depth of the plugins is great (plugins are called “pepper” – pepper-mint…get it?).  We’ve installed:

And about a half-dozen other ones.  In particular, we didn’t install this one, but I want to just because of the name:

Holy Crap! is a pane-less Mint pepper that sends an email alert when your website receives a referral from the front page (or popular section) of traffic generating sites like del.icio.us, Digg, or…

We even wrote our own plugin.  They wanted to ignore traffic from a couple workstations, so I wrote a plugin that did reverse DNS on every incoming IP and eliminated those coming from a list of hostnames the client manages in their CMS (every five minutes, a job runs which gets an XML feed from Ektron, then writes this to a PHP array, which is brought into the plugin code via “require” – Rube Goldberg can suck it!).
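The real plugin was PHP running inside Mint, but the filtering idea is easy to sketch in Python. Everything named here is an assumption made for illustration: the hostnames, the `IGNORED_HOSTNAMES` set (standing in for the array generated from the CMS feed), and both function names. None of it is Mint's API.

```python
import socket
from typing import Callable, Iterable, Optional

# Hypothetical ignore list; in the setup described above this would be
# regenerated from the CMS feed every few minutes.
IGNORED_HOSTNAMES = {"kiosk-01.intranet.example", "qa-box.intranet.example"}


def reverse_dns(ip: str) -> Optional[str]:
    """Return the hostname for an IP address, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname.lower()
    except OSError:
        return None


def should_ignore(
    ip: str,
    ignored: Iterable[str],
    resolver: Callable[[str], Optional[str]] = reverse_dns,
) -> bool:
    """True when the visitor's IP resolves to a hostname on the ignore list."""
    hostname = resolver(ip)
    return hostname is not None and hostname in {h.lower() for h in ignored}
```

The resolver is passed in as a parameter so the check can be exercised without touching real DNS, for example `should_ignore("10.0.0.5", IGNORED_HOSTNAMES, resolver=my_fake_lookup)`.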

The API is flexible enough that you could write essentially anything.  Browse the list of third-party plugins sometime, and you’ll get a feel for what people can come up with.

If I have a gripe, it’s that it’s a single user system and swears it will never be anything but (they’re big on the “less is more” philosophy that often winds up being “less is…less”). That’s a drag because with multiple users and permissions on certain reports, they could charge 10x what they do and still do great business in the smaller intranet market.  Positioned right, my client would have gladly paid $1,000 for something like this.

Another gripe that a lot of people could have (on the intranet side) is that it’s PHP/MySQL.  By the grace of God, my client happened to have a RedHat VM lying around we could use, but a lot of companies wouldn’t.

So, if Google Analytics has you feeling a little constrained and you like to tinker, try Mint out.  With the plugin API and some ingenuity, you can get some amazing control and understanding of your site traffic.


Jun 4

Improvements to the Robots.txt Protocol

One Standard Fits All: Robots Exclusion Protocol for Yahoo, Google and Microsoft: Google, Yahoo, and Microsoft have gotten together and actually agreed on extensions to the REP — the Robots Exclusion Protocol, otherwise known as your robots.txt file.

For instance, they’re going to allow a new META tag: NOSNIPPET:

Tells a crawler not to display snippets in the search results for a given page.

How about NOARCHIVE:

Tells a search engine not to show a “cached” link for a given page.

Plus, you can have wildcards in URL patterns in robots.txt now, which is something people have been after for years.
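To get a feel for what the wildcards buy you, here is a rough Python sketch of how a crawler might test a path against Disallow rules. It assumes the two extensions the engines documented: "*" matches any run of characters and a trailing "$" pins the rule to the end of the URL. The function names and sample rules are made up for illustration, and it ignores Allow lines and rule precedence entirely.

```python
import re
from typing import Sequence


def rule_to_regex(rule: str) -> "re.Pattern":
    """Translate a robots.txt path rule into a compiled regular expression.

    '*' matches any run of characters and a trailing '$' anchors the rule
    to the end of the URL; everything else is a literal prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: Sequence[str]) -> bool:
    """True if any non-empty Disallow rule matches from the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)
```

With the hypothetical rules `["/*.pdf$", "/private*/"]`, a request for `/docs/report.pdf` is blocked, while `/docs/report.pdf?x=1` is not, because the "$" requires the URL to end in ".pdf".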

And Yahoo has taken it one step further with what I absolutely think needs to be done for every search engine. You can put a CSS class called “robots-nocontent” on any element, and Yahoo will strip that element’s content before indexing.

[…] webmasters can now mark parts of a page with a ‘robots-nocontent’ tag which will indicate to our crawler what parts of a page are unrelated to the main content and are only useful for visitors. We won’t use the terms contained in these special tagged sections as information for finding the page or for the abstract in the search results.

This has been available in localized search systems for years. I’ve used it with both Swish-E and the Google Mini. It’s a great way to make sure that search engines don’t hit on irrelevant content, but instead focus on the core content of the page.
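For a local indexer, the stripping step might look something like the sketch below. This is not how Yahoo, Swish-E, or the Google Mini actually do it; it is a minimal illustration using Python's standard HTML parser, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IndexableText(HTMLParser):
    """Collect page text, skipping any element marked robots-nocontent."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0   # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.skip_depth and tag not in VOID_TAGS:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html: str) -> str:
    """Return the text an indexer would keep after dropping excluded blocks."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Feed it a page whose navigation sits in `<div class="nav robots-nocontent">` and only the text outside that div comes back. It assumes reasonably well-formed markup; unclosed tags inside an excluded block would throw the depth counter off.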

But, Joe brought up this point: shouldn’t navigation count? If I have a term in my nav, shouldn’t I get credit for this? If this is true — which it probably is — where do you draw the line? How do you decide what parts of the page are not “index worthy”?

Additionally (and cynically), why would you care about this for public search? The most basic SEO strategy is one of selfishness: you want every search hit, regardless of how relevant it is. Excluding content on your page just out of altruism, or a desire to make general search results better, is just not likely.

What Yahoo needs to do is provide a benefit to doing this. If they explained that by doing this, you’re increasing keyword intensity by removing garbage words and thus making your keywords a larger proportion of total words, perhaps that would help. But there has to be an SEO advantage or no one is going to do this.
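The arithmetic behind that pitch is simple enough to sketch. The word lists below are invented, and whether any engine actually weighs keyword proportion this way is pure assumption on my part; the point is only to show how the ratio moves when the excluded words stop counting.

```python
def keyword_share(words, keyword):
    """Fraction of the words on a page that are the given keyword."""
    if not words:
        return 0.0
    hits = sum(1 for word in words if word.lower() == keyword.lower())
    return hits / len(words)


# Hypothetical page: 12 words of navigation plus 8 words of real content.
nav_words = ("home about archives contact feed search "
             "tags links blogroll login register help").split()
main_words = "mint analytics review mint plugins for intranet sites".split()

share_with_nav = keyword_share(nav_words + main_words, "mint")   # 2 of 20 words
share_without_nav = keyword_share(main_words, "mint")            # 2 of 8 words
```

In this made-up case the keyword goes from 10% of the page to 25% once the navigation is excluded.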

Via David Gammel


Nov 17

Making Wordpress Zippy

Over the last couple of weeks, we’ve had fun working with the gang at Federated Media to put together their Holiday Gadget Guide. Deane’s one of the contributing authors, so I’m sure he’ll post a little more about it later on.

It was a fun little site to put together and our team had a good time, but reality hit hard last night when BoingBoing posted it, traffic poured in, and everything slowed to a crawl. It became obvious that The Long Tail you hear about on blog posts can be connected to a very large dog, and Wordpress wasn’t keeping up with demand.

Enter a truly excellent Wordpress plugin, WP-Cache2. WP-Cache2 installs into Wordpress, walks you through all of the setup via the admin interface, and provides a friendly, easy-to-use caching system for Wordpress. With that, plus a few server tweaks, we were able to get things humming along again in no time. Definitely one for the bookmarks file if you run a Wordpress site and ever worry about a SlashDotting.


Aug 28

The Pointlessness of Page Views

Pageviews are Obsolete: It’s about time that this point is evangelized. Page views are a fairly pointless measurement these days with the advent of Ajax, RSS, and widely varying site designs which can have dramatic effects on how “hungry” a site is for page view stats.

But Ajax is only part of the reason pageviews are obsolete. Another one is RSS. About half the readers of this blog do so via RSS. I can know how many subscribers I have to my feed, thanks to Feedburner. And I can know how many times my feed is downloaded, if I wanted to dig into my server logs.

But what do you replace this statistic with? Via Boing Boing.


Nov 21

Silktide Sitescore v 1.7.2

Silktide’s Sitescore is kind of a neat tool. Plug in your website, and it gives you a 1-10 score on…

How well marketed, and popular the website is.
How well designed and built the website is.
How accessible the website is, particularly to those with disabilities.
How satisfying the website is likely to be.

Gadgetopia did pretty well with 8.2 points. Marketing was 9.6; design, 9.8; and experience, 9.7. The one aspect that hurt us was accessibility, which topped out at 5.6 points.

This website appears to be in violation of the British Disability Discrimination Act. All pages were found in violation of the current W3C Web Content Accessibility Guidelines.

This website is probably unlawful in Britain from the 1st October 2004. The British Disability Discrimination Act makes it unlawful to discriminate against a disabled person by refusing to provide any service provided to members of the public - including websites.

Careful, or we might find ourselves locked up in the Tower of London. Since Silktide is a web development business, I’m sure they’d be willing to help us fix this little deficiency, for a tidy sum.


Nov 14

Google Analytics

Google Analytics: All your traffic are belong to us.

Google Analytics tells you everything you want to know about how your visitors found you and how they interact with your site. You’ll be able to focus your marketing resources on campaigns and initiatives that deliver ROI, and improve your site to convert more visitors.

Via Joseph Scott.


Nov 4

Spiders are Stupid

I’ve been monitoring the 404s on this site. I changed our URL pattern a while back, so I have a page that catches all the 404s and resolves the old pattern against the new one, then redirects. Anything that doesn’t resolve gets logged and I have an RSS feed where I can watch them all.

Which brings me to my point: Web spiders are pretty stupid. Ninety-nine percent of 404s to this site are from spiders. They’re looking for URLs that:

  • …they couldn’t possibly have derived from any other page on the site.
    Oftentimes they screw up relative vs. absolute URLs. I usually go check, just in case I forgot to put “http://” in front of something, but I usually find everything is in order and it must just be the spider that’s confused.
  • …existed a long, long time ago.
    I still get spiders coming in for pages with URLs that haven’t been around for three years. They must have them stored somewhere because every once in a while I’ll get about 300 consecutive requests from the same spider for the same old pattern, like it was reading them from a file somewhere.
  • …are obviously munged.
    Spiders truncate a lot, or insert random spaces in URLs. I finally modified my lookup script to remove spaces from the target URL first, and, if it can’t find what they want, try to match what they ask for at the front of a string, so I can catch truncations.
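The cleanup for munged URLs can be sketched roughly like this. It is a Python approximation rather than the actual lookup script: the function name is invented, and the rule that a truncated request only counts when exactly one known URL starts with it is my own assumption to keep the guess safe.

```python
from typing import Iterable, Optional


def resolve_request(requested: str, known_urls: Iterable[str]) -> Optional[str]:
    """Map a possibly munged request path onto a known URL, if one fits."""
    # Step 1: drop stray spaces, whether literal or percent-encoded.
    cleaned = requested.replace(" ", "").replace("%20", "")
    if not cleaned:
        return None
    known = list(known_urls)
    if cleaned in known:
        return cleaned
    # Step 2: treat the request as a truncation. Accept it only when exactly
    # one known URL starts with what was asked for, to avoid wild guesses.
    candidates = [url for url in known if url.startswith(cleaned)]
    return candidates[0] if len(candidates) == 1 else None
```

A request for `/2005/07/09/IsPerl StillRelevant.html` resolves once the space is removed, and a request cut off partway through a file name resolves as long as only one real URL begins that way.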

I’ve also noticed a lot of one-off spiders that I’ve never seen before. They come out of colleges a lot, it seems.

And, of course, there are hack attempts galore. Trying to hack the XMLRPC vulnerability that was revealed a few months ago is pretty common, and I get scads of long, long requests for things in “_vti” directories.

That said, monitoring your 404s is a really handy thing to do as it alerts you to a lot of problems. We have over 4,500 entries now, and by watching bad requests, I find out all the time about bad links, missing images, etc. It’s really a good, simple way to give you an extra leg up on fighting content rot.

But don’t think the spiders are the smart ones. You’d think since they were programmed by (supposed) professionals, and have everything in a database somewhere, that they’d be pretty on top of things. My experience, however, indicates that a bunch of two-year-olds mashing on the keyboard would probably come up with more valid URLs than your average Web spider.


Oct 3

Gadgetopia Screen Resolution Stats

I’m using a new stat tracking app called Mint. It has a plug-in called “User Agent 007” (ho, ho — what wit) that captures browser stats. Interesting stats:

  • almost 90% of Gadgetopia visitors are at 1024 x 768 or greater
  • almost 20% of visitors are at higher resolution than 1024 x 768

Here’s the entire list of resolutions and their penetration. At what point do you abandon the 800 x 600 crowd and start using the extra space for those that have it, I wonder?

But even with more space, you always run into the problem that “screen resolution” and “browser viewing area” are two very different things. With the sidebars that people run these days, you don’t get anywhere near the full width to work with. I have my resolution at 1024 right now, but I think I use 200 pixels in the bookmark sidebar in Firefox.


Sep 13

Email All Your Users Day Re-visited

Email All Your Users Day: I posted this two years ago today. I still think this is a great idea.

[…] I hereby proclaim December 1 as “Email All Your Users Day.” On that day, everyone who runs a service that has user accounts should email ALL their users to remind them they have an account, what the account is for, and where the login screen is. The member can then decide what he or she wants to do with it.


Jul 18

Do Anonymous Domain Registration Outfits Actually Work?

About anonymity …: Think you’re safe if you register your domain name “anonymously”? Apparently not:

Despite paying Domains by Proxy an additional fee to register foetry.com anonymously, they responded to a letter from a personal injury lawyer, and canceled my registration without notifying me of a complaint. Let that sink in: a personal injury lawyer’s letter is all it took for DBP to cancel my anonymity. Furthermore, the attorney’s ignorance of Internet Law didn’t even faze Domains by Proxy. (I have a copy of the attorney’s letter and I know more about Internet law than he).

Boise State University Professor Janet Holmes simply hired the lawyer to write a letter. That’s all. There was no subpoena. No chance of a case against me. Domains by Proxy never emailed me and never telephoned. They simply canceled the anonymity and my confidential information suddenly became available. My initials, my address, and my phone number became freely available to anyone with an internet connection.

Via Metafilter.


Jul 18

Robots.txt Survey

Robots.txt, The Big Crawl: These guys grabbed 75,000 robots.txt files, and found a few problems:

[…] we found a wide array of problems with people’s robots.txt files. We found more than 5% of the robots.txt used bad style and up to 2% were so badly formed that they would not be recognized by any spider.

One of the most common mistakes is backwards syntax […] A large number of people had multiple directories per line […] Another common mistake is editing your robots.txt in DOS mode

Not only do they tell you the problems they found, but they explain how various spiders would interpret the problems. Some of the “problems” are correct per the spec, but spiders don’t always follow the spec…
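If you want to check your own file for the mistakes mentioned in that excerpt, a quick lint pass might look like the sketch below. The excerpt doesn't spell out what counted as “backwards syntax,” so this assumes two readings of it: a Disallow line that shows up before any User-agent line, or a User-agent value that looks like a path. The function name and messages are mine, not the survey's.

```python
def lint_robots_txt(raw: bytes) -> list:
    """Return (line_number, message) pairs for a few common mistakes.

    Line number 0 is used for problems that apply to the whole file.
    """
    problems = []
    if b"\r" in raw:
        problems.append((0, "file contains carriage returns (DOS line endings)"))
    seen_user_agent = False
    text = raw.decode("utf-8", "replace")
    for number, line in enumerate(text.splitlines(), 1):
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            seen_user_agent = True
            if value.startswith("/"):
                problems.append((number, "User-agent value looks like a path"))
        elif field == "disallow":
            if not seen_user_agent:
                problems.append((number, "Disallow appears before any User-agent"))
            if len(value.split()) > 1:
                problems.append((number, "more than one path on a Disallow line"))
    return problems
```

A clean two-line file (`User-agent: *` followed by `Disallow: /private/`) comes back with an empty list. As the survey notes, a spider may still cope with some of what this flags.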


Jul 17

Unresolved 404 Patterns

I changed the URL scheme of this Web site over the weekend. I had been meaning to do it for a while, but some problems with Movable Type 3.2 kind of forced the issue. (I have got to stop rushing into every beta that presents itself…)

To make everything backwards compatible, I built a simple redirect system — I have a table in the database with every single permalink from the old site (all 9,000 of them — including entry RSS feeds and category pages) mapped to every single new URL.

If someone looks for a page which has moved, the 404 page does a lookup on this table, “resolves” the old URL against a new one, then redirects with a “301 Moved Permanently.” It seems to work well.
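Here is a minimal sketch of that lookup, written in Python with an in-memory SQLite table standing in for the real database. The table name, the function names, and the sample old-to-new pairing are all made up for illustration; the actual handler lives in the site's 404 page.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_redirect_table(pairs) -> sqlite3.Connection:
    """Create an in-memory table mapping each old permalink to its new URL."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE redirects (old_url TEXT PRIMARY KEY, new_url TEXT NOT NULL)"
    )
    db.executemany("INSERT INTO redirects (old_url, new_url) VALUES (?, ?)", pairs)
    return db


def handle_missing_page(
    db: sqlite3.Connection, path: str, unresolved: List[str]
) -> Tuple[int, Optional[str]]:
    """Decide what the 404 handler sends back for a path that has no page."""
    row = db.execute(
        "SELECT new_url FROM redirects WHERE old_url = ?", (path,)
    ).fetchone()
    if row is not None:
        return 301, row[0]      # 301 Moved Permanently, Location: new URL
    unresolved.append(path)     # a genuine 404: keep it for later review
    return 404, None
```

A path found in the table gets back a 301 and its new location; anything else returns a 404 and lands in the `unresolved` list, which is where the interesting stuff turns up.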

A side benefit of this system is that I can watch for “unresolved 404s,” meaning 404s that were not in my lookup table — a genuine 404, if you will. I’ve noticed some interesting phenomena:

  • I get hammered by referrer spam. We’ve talked about this before: this is spam created by a bot hitting any page with a fake referrer string in the hopes that you’re displaying your referrers on your site (a la Dean Allen’s Refer or similar tool).

    This results in fully half the unresolved 404s on this site coming from casino bots hitting URLs that are three years old. I know they’re that old because they use the very first URL scheme I had for this site — the default Movable Type archives URL: “archives/000355.html”, etc.

    They must be working off a very old list of URLs, which I find quite funny, and quite interesting. Why would they keep an old list of URLs lying around? Why not just re-spider? Do spammers sell lists of URLs like they do lists of emails?

  • Browsers and spiders sometimes mangle HREFs. I see impossible URLs that can only result from a mis-interpretation of the HREF in the link. IE 5.x on the Mac, for example, has problems with background images coded in CSS. I see that browser try to get this a lot:

    /'/bin/images/header.jpg'

    It's just mangled the URL of the image.

    Others, however, are more mysterious. Just two minutes ago, a spider tried to access a URL that it could only have hit if it missed the leading "/" in the HREF. Coming from this page...

    /2005/07/09/IsPerlStillRelevant.html

    …the spider tried to hit:

    /2005/07/09/4131

    I just checked that page and there’s no way it pulled that URL out of the code. The correct URL was…

    /4131

    But the URL it bounced off of could have only happened if it had a bug of some kind or if the HTML got mangled on the way down.

    I also get hits to things like this:

    /2005/04/15/EasyJavaScriptAutocompleteI

    No mystery here — that’s just a truncated version of this:

    /2005/04/15/EasyJavaScriptAutocompleteIntellisenseScript.html

    Truncation, it seems, happens a lot. The Ask Jeeves/Teoma spider, for instance, has been trying all day to get at URLs that are all truncated at 39 characters. Add “http://www.gadgetopia.com/” to that, and you get 64 characters.

    Why is this, I wonder? Was that the size of the database field they stored the URL in? More importantly, does it explain why I’ve never done so well in that index? I’m wondering now if my previously-long URLs have hurt my engine placement in other indexes besides Google.

  • As implied by the preceding two points, the vast majority of 404s are from bots. I’m sure this is true for all sites, but I never realized it so much until now.

  • Hack attempts abound. There are lots of attempts to hit DLL files in the (non-existent) “MSOffice” and “_vti/” directories. These are people trying to hack Outlook Web Access and various Web-enabled Microsoft Office technologies.

  • Spiders don’t crawl and index in the same pass. I changed the URL pattern late Friday night, then changed my mind about which pattern to use when I woke up the next morning. This means the site was accessible under a certain pattern for about eight hours.

    In the following 48 hours, I saw attempt after attempt by bots to get to files under that pattern. This tells me that a crawler made a pass at the site during that eight hour window and stored the URLs it found. Then an indexer used that list to come back through the site a day later and index the text (sadly, in this case, the pattern had changed — I’ve since put in a RewriteRule to catch those).
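Two of the oddities in the mangled-HREF bullet above are easy to reproduce with Python's standard URL handling. The first shows what happens when a client loses the leading "/" of a root-relative link; the second checks the truncation arithmetic against the truncated path quoted above, which happens to be 39 characters long counting its leading slash. The variable names are mine, and the domain is written without a trailing slash so the slash isn't counted twice.

```python
from urllib.parse import urljoin

page = "http://www.gadgetopia.com/2005/07/09/IsPerlStillRelevant.html"

# The link as written in the page is root-relative.
correct = urljoin(page, "/4131")
# Drop the leading slash and the link resolves against the page's directory.
mangled = urljoin(page, "4131")

# Truncation check: path length alone, then with the scheme and host in front.
truncated_path = "/2005/04/15/EasyJavaScriptAutocompleteI"
path_length = len(truncated_path)
full_length = len("http://www.gadgetopia.com" + truncated_path)
```

Here `mangled` comes out as `http://www.gadgetopia.com/2005/07/09/4131`, which is exactly the URL the spider asked for, and `full_length` is 64, a suspiciously round number for a database field.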


Apr 1

AdSense and Borders

If you have Google AdSense on your site, here is the best piece of advice I can give you: don’t put borders around your ads. I had a border around my skyscraper banner on the right here, so it sat in its own little box.

A friend told me to take the border off. I figured it couldn’t hurt to try it, so I made the border white, so it just fades into the background. Nothing else was changed. I did it in the middle of the month, so the first half was with the border, the second half without.

The result in terms of clickthrough rate? A seventy percent increase.


Mar 7

Email All Your Users Day Re-Visited

Email All Your Users Day: This is one of those posts that I think never got the attention it deserved. I still maintain that this is a good idea, and now that Gadgetopia has some more traffic, I re-submit it to the blogosphere. Who’s with me?

I hereby proclaim December 1 as “Email All Your Users Day.” On that day, everyone who runs a service that has user accounts should email ALL their users to remind them they have an account, what the account is for, and where the login screen is. The member can then decide what he or she wants to do with it.


Jan 27

Snazzy 404s

404’s 4 U: This post over at Metafilter has some great links to various 404s. I liked the Zork one.

404 Research Lab. Not that I’m sorry for the double post, but I was inspired by this 404 and went searching for some more. Some of them are funny, some let you play games, some are just creepy.

