<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" 
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">

  <channel>
    <title>Gadgetopia: Web Site Management</title>
    <link>http://www.gadgetopia.com/Categories/Web Site Management.html</link>
    <description>This is a sub-feed of the main Gadgetopia RSS feed. This feed displays entries from the "Web Site Management" category.  The main Gadgetopia feed is available at http://www.gadgetopia.com/index.xml.</description>
    <dc:language>en-us</dc:language>
    <dc:creator>deane@deanebarker.net</dc:creator>
    <dc:rights>Copyright 2008</dc:rights>
    <dc:date>2008-06-04T11:35:04-06:00</dc:date>
    <admin:generatorAgent rdf:resource="http://www.movabletype.org/?v=3.35" />
    <admin:errorReportsTo rdf:resource="mailto:deane@deanebarker.net"/>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>


    <item>
      <title>Improvements to the Robots.txt Protocol</title>
      <link>http://gadgetopia.com/post/6417</link>
      <description><![CDATA[<p><a title="Yahoo! Search Blog: One Standard Fits All: Robots Exclusion Protocol for Yahoo!, Google and Microsoft" href="http://www.ysearchblog.com/archives/000587.html">One Standard Fits All: Robots Exclusion Protocol for Yahoo, Google and Microsoft</a>: Google, Yahoo, and Microsoft have gotten together and actually agreed on extensions to the REP &#8212; the Robots Exclusion Protocol, otherwise known as your robots.txt file.</p>

<p>For instance, they&#8217;re going to allow a new META tag: NOSNIPPET:</p>

<blockquote>
  <p>Tells a crawler not to display snippets in the search results for a given page.</p>
</blockquote>

<p>How about NOARCHIVE:</p>

<blockquote>
  <p>Tells a search engine not to show a &#8220;cached&#8221; link for a given page.</p>
</blockquote>

<p>Plus, you can have wildcards in URL patterns in robots.txt now, which is something people have been after for years.</p>

<p>And Yahoo has taken it one step further with what I absolutely think needs to be done for every search engine.  You can put a CSS class called &#8220;robots-nocontent&#8221; on any element, and Yahoo will strip that element&#8217;s content before indexing.</p>

<blockquote>
  <p>[&#8230;] webmasters can now mark parts of a page with a &#8216;robots-nocontent&#8217; tag which will indicate to our crawler what parts of a page are unrelated to the main content and are only useful for visitors. We won&#8217;t use the terms contained in these special tagged sections as information for finding the page or for the abstract in the search results.</p>
</blockquote>

<p>This has been available in localized search systems for years.  I&#8217;ve used it with both <a href="http://swish-e.org/">Swish-E</a> and the <a href="http://www.google.com/enterprise/mini/">Google Mini</a>.  It&#8217;s a great way to make sure that search engines don&#8217;t hit on irrelevant content, but instead focus on the core content of the page.</p>

<p>But, Joe brought up this point: shouldn&#8217;t navigation count?  If I have a term in my nav, shouldn&#8217;t I get credit for this?  If this is true &#8212; which it probably is &#8212; where do you draw the line?  How do you decide what parts of the page are not &#8220;index worthy&#8221;?</p>

<p>Additionally (and cynically), why would you care about this for public search?  The most basic SEO strategy is one of selfishness &#8212; you want every search hit, regardless of how relevant it is.  Excluding content from your page purely out of altruism, or a desire to make general search results better, is just not likely.</p>

<p>What Yahoo needs to do is provide a benefit to doing this.  If they explained that by doing this, you&#8217;re increasing keyword intensity by removing garbage words and thus making your keywords a larger proportion of total words, perhaps that would help.  But there has to be an SEO advantage or no one is going to do this.</p>

<p>Via <a href="http://highcontext.com/">David Gammel</a></p>
]]></description>
      <guid isPermaLink="false">6417@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="Yahoo! Search Blog: One Standard Fits All: Robots Exclusion Protocol for Yahoo!, Google and Microsoft" href="http://www.ysearchblog.com/archives/000587.html">One Standard Fits All: Robots Exclusion Protocol for Yahoo, Google and Microsoft</a>: Google, Yahoo, and Microsoft have gotten together and actually agreed on extensions to the REP &#8212; the Robots Exclusion Protocol, otherwise known as your robots.txt file.</p>

<p>For instance, they&#8217;re going to allow a new META tag: NOSNIPPET:</p>

<blockquote>
  <p>Tells a crawler not to display snippets in the search results for a given page.</p>
</blockquote>

<p>How about NOARCHIVE:</p>

<blockquote>
  <p>Tells a search engine not to show a &#8220;cached&#8221; link for a given page.</p>
</blockquote>

<p>Plus, you can have wildcards in URL patterns in robots.txt now, which is something people have been after for years.</p>
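
<p>To make the wildcard idea concrete, here is a rough Python sketch that turns a robots.txt path pattern into a regular expression.  It assumes two special characters, "*" for any run of characters and a trailing "$" for end of URL; the function names and sample paths are invented for illustration, and real crawlers may well differ in the details.</p>

```python
import re

def robots_pattern_to_regex(pattern):
    """Compile a robots.txt path pattern into an anchored regex.

    Assumed semantics: '*' matches any run of characters and a
    trailing '$' pins the match to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then rejoin the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path, disallow_pattern):
    """True if the path falls under the given Disallow pattern."""
    return robots_pattern_to_regex(disallow_pattern).match(path) is not None

if __name__ == "__main__":
    print(is_disallowed("/archive/2008/feed.xml", "/*.xml$"))
    print(is_disallowed("/archive/2008/feed.xml.bak", "/*.xml$"))
    print(is_disallowed("/private/notes.html", "/private*"))
```

<p>Without the "$", a pattern is treated as a prefix, which is how plain robots.txt rules have always behaved.</p>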

<p>And Yahoo has taken it one step further with what I absolutely think needs to be done for every search engine.  You can put a CSS class called &#8220;robots-nocontent&#8221; on any element, and Yahoo will strip that element&#8217;s content before indexing.</p>

<blockquote>
  <p>[&#8230;] webmasters can now mark parts of a page with a &#8216;robots-nocontent&#8217; tag which will indicate to our crawler what parts of a page are unrelated to the main content and are only useful for visitors. We won&#8217;t use the terms contained in these special tagged sections as information for finding the page or for the abstract in the search results.</p>
</blockquote>

<p>This has been available in localized search systems for years.  I&#8217;ve used it with both <a href="http://swish-e.org/">Swish-E</a> and the <a href="http://www.google.com/enterprise/mini/">Google Mini</a>.  It&#8217;s a great way to make sure that search engines don&#8217;t hit on irrelevant content, but instead focus on the core content of the page.</p>

<p>But, Joe brought up this point: shouldn&#8217;t navigation count?  If I have a term in my nav, shouldn&#8217;t I get credit for this?  If this is true &#8212; which it probably is &#8212; where do you draw the line?  How do you decide what parts of the page are not &#8220;index worthy&#8221;?</p>

<p>Additionally (and cynically), why would you care about this for public search?  The most basic SEO strategy is one of selfishness &#8212; you want every search hit, regardless of how relevant it is.  Excluding content from your page purely out of altruism, or a desire to make general search results better, is just not likely.</p>

<p>What Yahoo needs to do is provide a benefit to doing this.  If they explained that by doing this, you&#8217;re increasing keyword intensity by removing garbage words and thus making your keywords a larger proportion of total words, perhaps that would help.  But there has to be an SEO advantage or no one is going to do this.</p>
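
<p>The arithmetic behind that pitch is simple enough to sketch in Python.  The page below is entirely hypothetical (four words of real content, six words of navigation and footer), and "keyword share" is just my name for keywords divided by total words, not a metric any search engine has confirmed using.</p>

```python
def keyword_share(words, keywords):
    """Fraction of the words on a page that are target keywords."""
    if not words:
        return 0.0
    hits = sum(1 for word in words if word.lower() in keywords)
    return hits / len(words)

if __name__ == "__main__":
    # Hypothetical page: real content plus navigation/footer words.
    content = "cheap widget reviews widget".split()
    chrome = "home about contact login privacy terms".split()
    keywords = {"widget"}

    print(keyword_share(content + chrome, keywords))  # whole page
    print(keyword_share(content, keywords))           # chrome excluded
```

<p>With the navigation counted, the keyword is 2 words out of 10; with it excluded, 2 out of 4.</p>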

<p>Via <a href="http://highcontext.com/">David Gammel</a></p>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2008-06-04T11:35:04-06:00</dc:date>
    </item>

    <item>
      <title>Making Wordpress Zippy</title>
      <link>http://gadgetopia.com/post/5639</link>
      <description><![CDATA[<p><img alt="wpcache2.gif" src="http://www.gadgetopia.com/images/wpcache2-thumb.gif" width="200" height="85" style="border: 1px solid black; margin: 1em; float:right;" />Over the last couple of weeks, we've had fun working with the gang at <a href="http://federatedmedia.net">Federated Media</a> to put together their <a href="http://holidaygadgetguide.federatedmedia.net/">Holiday Gadget Guide</a>. Deane's one of the contributing authors, so I'm sure he'll post a little more about it later on. </p>

<p>It was a fun little site to put together and our team had a good time, but reality hit hard last night when BoingBoing <a href="http://www.boingboing.net/2006/11/16/federated_medias_hol.html">posted it</a>,  traffic poured in, and everything slowed to a crawl. It became obvious that <a href="http://en.wikipedia.org/wiki/The_Long_Tail">The Long Tail</a> you hear about on blog posts can be connected to a very large dog, and Wordpress wasn't keeping up with demand.</p>

<p>Enter <a href="http://mnm.uib.es/gallir/wp-cache-2/">a truly excellent Wordpress Plugin, WP-Cache2</a>. WP-Cache2 installs into Wordpress, walks you through all of the setup via the admin interface, and provides a friendly, easy-to-use caching system for Wordpress. With that, plus a few server tweaks, we were able to get things humming along again in no time. Definitely one for the bookmarks file if you run a Wordpress site and ever worry about a SlashDotting.</p>
]]></description>
      <guid isPermaLink="false">5639@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><img alt="wpcache2.gif" src="http://www.gadgetopia.com/images/wpcache2-thumb.gif" width="200" height="85" style="border: 1px solid black; margin: 1em; float:right;" />Over the last couple of weeks, we've had fun working with the gang at <a href="http://federatedmedia.net">Federated Media</a> to put together their <a href="http://holidaygadgetguide.federatedmedia.net/">Holiday Gadget Guide</a>. Deane's one of the contributing authors, so I'm sure he'll post a little more about it later on. </p>

<p>It was a fun little site to put together and our team had a good time, but reality hit hard last night when BoingBoing <a href="http://www.boingboing.net/2006/11/16/federated_medias_hol.html">posted it</a>,  traffic poured in, and everything slowed to a crawl. It became obvious that <a href="http://en.wikipedia.org/wiki/The_Long_Tail">The Long Tail</a> you hear about on blog posts can be connected to a very large dog, and Wordpress wasn't keeping up with demand.</p>

<p>Enter <a href="http://mnm.uib.es/gallir/wp-cache-2/">a truly excellent Wordpress Plugin, WP-Cache2</a>. WP-Cache2 installs into Wordpress, walks you through all of the setup via the admin interface, and provides a friendly, easy-to-use caching system for Wordpress. With that, plus a few server tweaks, we were able to get things humming along again in no time. Definitely one for the bookmarks file if you run a Wordpress site and ever worry about a SlashDotting.</p>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2006-11-17T11:07:50-06:00</dc:date>
    </item>

    <item>
      <title>The Pointlessness of Page Views</title>
      <link>http://gadgetopia.com/post/5491</link>
      <description><![CDATA[<p><a title="evhead: Pageviews are Obsolete" href="http://evhead.com/2006/08/pageviews-are-obsolete.asp">Pageviews are Obsolete</a>: It's about time that this point is evangelized.  Page views are a fairly pointless measurement these days with the advent of Ajax, RSS, and widely varying site designs which can have dramatic effects on how "hungry" a site is for page view stats.</p>

<blockquote>
  <p>But Ajax is only part of the reason pageviews are obsolete. Another one is RSS. About half the readers of this blog do so via RSS. I can know how many subscribers I have to my feed, thanks to Feedburner. And I can know how many times my feed is downloaded, if I wanted to dig into my server logs. </p>
</blockquote>

<p>But what do you replace this statistic with?  Via <a href="http://www.boingboing.net/">Boing Boing</a>.</p>
]]></description>
      <guid isPermaLink="false">5491@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="evhead: Pageviews are Obsolete" href="http://evhead.com/2006/08/pageviews-are-obsolete.asp">Pageviews are Obsolete</a>: It's about time that this point is evangelized.  Page views are a fairly pointless measurement these days with the advent of Ajax, RSS, and widely varying site designs which can have dramatic effects on how "hungry" a site is for page view stats.</p>

<blockquote>
  <p>But Ajax is only part of the reason pageviews are obsolete. Another one is RSS. About half the readers of this blog do so via RSS. I can know how many subscribers I have to my feed, thanks to Feedburner. And I can know how many times my feed is downloaded, if I wanted to dig into my server logs. </p>
</blockquote>

<p>But what do you replace this statistic with?  Via <a href="http://www.boingboing.net/">Boing Boing</a>.</p>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2006-08-28T23:33:44-06:00</dc:date>
    </item>

    <item>
      <title>Silktide Sitescore v 1.7.2</title>
      <link>http://gadgetopia.com/post/4640</link>
      <description><![CDATA[<p><a href="http://www.silktide.com/tools/sitescore">Silktide's Sitescore</a> is kind of a neat tool. Plug in your website, and it gives you a 1-10 score on...</p>

<blockquote>

How well marketed, and popular the website is.<br>
How well designed and built the website is.<br>
How accessible the website is, particularly to those with disabilities.<br>
How satisfying the website is likely to be.<br>

</blockquote>

<p>Gadgetopia did pretty well with 8.2 points. Marketing was 9.6; design, 9.8; and experience, 9.7. The one aspect that hurt us was accessibility, which topped out at 5.6 points. </p>

<blockquote>

This website appears to be in violation of the British Disability Discrimination Act. All pages were found in violation of the current <a href="http://www.w3.org/TR/WAI-WEBCONTENT/">W3C Web Content Accessibility Guidelines.</a><br>
<br>
This website is probably unlawful in Britain from the 1st October 2004. The British Disability Discrimination Act makes it unlawful to discriminate against a disabled person by refusing to provide any service provided to members of the public - including websites. 

</blockquote>

<p>Careful, or we might find ourselves locked up in the Tower of London. Since Silktide is a web development business, I'm sure they'd be willing to help us fix this little deficiency, for a tidy sum.</p>
]]></description>
      <guid isPermaLink="false">4640@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a href="http://www.silktide.com/tools/sitescore">Silktide's Sitescore</a> is kind of a neat tool. Plug in your website, and it gives you a 1-10 score on...</p>

<blockquote>

How well marketed, and popular the website is.<br>
How well designed and built the website is.<br>
How accessible the website is, particularly to those with disabilities.<br>
How satisfying the website is likely to be.<br>

</blockquote>

<p>Gadgetopia did pretty well with 8.2 points. Marketing was 9.6; design, 9.8; and experience, 9.7. The one aspect that hurt us was accessibility, which topped out at 5.6 points. </p>

<blockquote>

This website appears to be in violation of the British Disability Discrimination Act. All pages were found in violation of the current <a href="http://www.w3.org/TR/WAI-WEBCONTENT/">W3C Web Content Accessibility Guidelines.</a><br>
<br>
This website is probably unlawful in Britain from the 1st October 2004. The British Disability Discrimination Act makes it unlawful to discriminate against a disabled person by refusing to provide any service provided to members of the public - including websites. 

</blockquote>

<p>Careful, or we might find ourselves locked up in the Tower of London. Since Silktide is a web development business, I'm sure they'd be willing to help us fix this little deficiency, for a tidy sum.</p>
]]></content:encoded>
      <dc:subject>Web Design and Usability</dc:subject>
      <dc:date>2005-11-21T00:26:14-06:00</dc:date>
    </item>

    <item>
      <title>Google Analytics</title>
      <link>http://gadgetopia.com/post/4621</link>
      <description><![CDATA[<p><a title="Google Analytics" href="http://www.google.com/analytics/">Google Analytics</a>: All your traffic are belong to us.</p>

<blockquote>
  <p>Google Analytics tells you everything you want to know about how your visitors found you and how they interact with your site. You'll be able to focus your marketing resources on campaigns and initiatives that deliver ROI, and improve your site to convert more visitors.</p>
</blockquote>

<p>Via <a href="http://joseph.randomnetworks.com/archives/2005/11/13/google-analyticts/">Joseph Scott</a>.</p>
]]></description>
      <guid isPermaLink="false">4621@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="Google Analytics" href="http://www.google.com/analytics/">Google Analytics</a>: All your traffic are belong to us.</p>

<blockquote>
  <p>Google Analytics tells you everything you want to know about how your visitors found you and how they interact with your site. You'll be able to focus your marketing resources on campaigns and initiatives that deliver ROI, and improve your site to convert more visitors.</p>
</blockquote>

<p>Via <a href="http://joseph.randomnetworks.com/archives/2005/11/13/google-analyticts/">Joseph Scott</a>.</p>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-11-14T08:27:13-06:00</dc:date>
    </item>

    <item>
      <title>Spiders are Stupid</title>
      <link>http://gadgetopia.com/post/4580</link>
      <description><![CDATA[<p>I've been monitoring the 404s on this site.  I changed our URL pattern a while back, so I have a page that catches all the 404s and resolves the old pattern against the new one, then redirects.  Anything that doesn't resolve gets logged and I have an RSS feed where I can watch them all.</p>

<p>Which brings me to my point: Web spiders are pretty stupid.  Ninety-nine percent of 404s to this site are from spiders.  They're looking for URLs that:</p>

<ul>
<li><em>...they couldn't possibly have derived from any other page on the site.</em> <br />
Oftentimes they screw up relative vs. absolute URLs.  I usually go check, just in case I forgot to put "http://" in front of something, but I usually find everything is in order and it must just be the spider that's confused.</li>
<li><em>...existed a long, long time ago.</em> <br />
I still get spiders coming in for pages with URLs that haven't been around for three years.  They must have them stored somewhere because every once in a while I'll get about 300 consecutive requests from the same spider for the same old pattern, like it was reading them from a file somewhere.</li>
<li><em>...are obviously munged.</em> <br />
Spiders truncate a lot, or insert random spaces in URLs.  I finally modified my lookup script to remove spaces from the target URL first and, if it still can't find what they want, to try matching what they asked for against the front of each known URL, so I can catch truncations.</li>
</ul>

<p>I've also noticed a lot of one-off spiders that I've never seen before.  They come out of colleges a lot, it seems.</p>

<p>And, of course, there are hack attempts galore.  Trying to hack <a href="http://www.gadgetopia.com/post/4092">the XMLRPC vulnerability</a> that was revealed a few months ago is pretty common, and I get scads of long, long requests for things in <a href="http://www.gadgetopia.com/post/2918">"_vti" directories</a>.</p>

<p>That said, monitoring your 404s is a really handy thing to do as it alerts you to a <em>lot</em> of problems.  We have over 4,500 entries now, and by watching bad requests, I find out all the time about bad links, missing images, etc.  It's really a good, simple way to give you an extra leg up on <a href="http://www.gadgetopia.com/post/2618">fighting content rot</a>.</p>

<p>But don't think the spiders are the smart ones.  You'd think since they were programmed by (supposed) professionals, and have everything in a database somewhere, that they'd be pretty on top of things.  My experience, however, indicates that a bunch of two-year-olds mashing on the keyboard would probably come up with more valid URLs than your average Web spider.</p>
]]></description>
      <guid isPermaLink="false">4580@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p>I've been monitoring the 404s on this site.  I changed our URL pattern a while back, so I have a page that catches all the 404s and resolves the old pattern against the new one, then redirects.  Anything that doesn't resolve gets logged and I have an RSS feed where I can watch them all.</p>

<p>Which brings me to my point: Web spiders are pretty stupid.  Ninety-nine percent of 404s to this site are from spiders.  They're looking for URLs that:</p>

<ul>
<li><em>...they couldn't possibly have derived from any other page on the site.</em> <br />
Oftentimes they screw up relative vs. absolute URLs.  I usually go check, just in case I forgot to put "http://" in front of something, but I usually find everything is in order and it must just be the spider that's confused.</li>
<li><em>...existed a long, long time ago.</em> <br />
I still get spiders coming in for pages with URLs that haven't been around for three years.  They must have them stored somewhere because every once in a while I'll get about 300 consecutive requests from the same spider for the same old pattern, like it was reading them from a file somewhere.</li>
<li><em>...are obviously munged.</em> <br />
Spiders truncate a lot, or insert random spaces in URLs.  I finally modified my lookup script to remove spaces from the target URL first and, if it still can't find what they want, to try matching what they asked for against the front of each known URL, so I can catch truncations.</li>
</ul>
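
<p>The space-stripping and truncation fallback described in the last item can be sketched roughly like this.  It is Python for illustration only, not the actual lookup script; the function name, the sample URLs, and the rule of only accepting an unambiguous prefix match are all my assumptions.</p>

```python
def resolve_request(requested, known_urls):
    """Return the known URL a mangled request most likely meant, or None.

    known_urls is a hypothetical stand-in for the site's real URL list.
    """
    # Step 1: spiders insert stray spaces, so drop them first.
    cleaned = requested.replace(" ", "")
    if not cleaned:
        return None
    if cleaned in known_urls:
        return cleaned
    # Step 2: treat the request as a truncation and look for known URLs
    # that begin with it. Only accept an unambiguous match (assumption).
    candidates = [url for url in known_urls if url.startswith(cleaned)]
    if len(candidates) == 1:
        return candidates[0]
    return None

if __name__ == "__main__":
    known = {"/post/4580", "/post/4427", "/categories/web-site-management"}
    print(resolve_request("/post/45 80", known))
    print(resolve_request("/categories/web-s", known))
    print(resolve_request("/post/4", known))
```

<p>A truncation like "/post/4" matches more than one entry, so this sketch gives up rather than guess.</p>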

<p>I've also noticed a lot of one-off spiders that I've never seen before.  They come out of colleges a lot, it seems.</p>

<p>And, of course, there are hack attempts galore.  Trying to hack <a href="http://www.gadgetopia.com/post/4092">the XMLRPC vulnerability</a> that was revealed a few months ago is pretty common, and I get scads of long, long requests for things in <a href="http://www.gadgetopia.com/post/2918">"_vti" directories</a>.</p>

<p>That said, monitoring your 404s is a really handy thing to do as it alerts you to a <em>lot</em> of problems.  We have over 4,500 entries now, and by watching bad requests, I find out all the time about bad links, missing images, etc.  It's really a good, simple way to give you an extra leg up on <a href="http://www.gadgetopia.com/post/2618">fighting content rot</a>.</p>

<p>But don't think the spiders are the smart ones.  You'd think since they were programmed by (supposed) professionals, and have everything in a database somewhere, that they'd be pretty on top of things.  My experience, however, indicates that a bunch of two-year-olds mashing on the keyboard would probably come up with more valid URLs than your average Web spider.</p>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-11-04T06:52:13-06:00</dc:date>
    </item>

    <item>
      <title>Gadgetopia Screen Resolution Stats</title>
      <link>http://gadgetopia.com/post/4427</link>
      <description><![CDATA[<p>I'm using a new stat tracking app called <a href="http://www.haveamint.com/">Mint</a>.  It has a plug-in called "User Agent 007" (ho, ho -- what wit) that captures browser stats.  Interesting stats:</p>

<ul>
<li>almost 90% of Gadgetopia visitors are at 1024 x 768 or greater</li>
<li>almost 20% of visitors are at higher resolution than 1024 x 768</li>
</ul>

<p>Here's the <a href="http://www.gadgetopia.com/images/mint_resolution_stats.jpg">entire list of resolutions</a> and their penetration.  At what point do you abandon the 800 x 600 crowd and start using the extra space for those that have it, I wonder?</p>

<p>But even with more space, you always run into the problem that "screen resolution" and "browser viewing area" are two very different things.  With the sidebars that people run these days, you don't get anywhere near the full width to work with.  I have my resolution at 1024 right now, but I think I use 200 pixels in the bookmark sidebar in Firefox.</p>
]]></description>
      <guid isPermaLink="false">4427@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p>I'm using a new stat tracking app called <a href="http://www.haveamint.com/">Mint</a>.  It has a plug-in called "User Agent 007" (ho, ho -- what wit) that captures browser stats.  Interesting stats:</p>

<ul>
<li>almost 90% of Gadgetopia visitors are at 1024 x 768 or greater</li>
<li>almost 20% of visitors are at higher resolution than 1024 x 768</li>
</ul>

<p>Here's the <a href="http://www.gadgetopia.com/images/mint_resolution_stats.jpg">entire list of resolutions</a> and their penetration.  At what point do you abandon the 800 x 600 crowd and start using the extra space for those that have it, I wonder?</p>

<p>But even with more space, you always run into the problem that "screen resolution" and "browser viewing area" are two very different things.  With the sidebars that people run these days, you don't get anywhere near the full width to work with.  I have my resolution at 1024 right now, but I think I use 200 pixels in the bookmark sidebar in Firefox.</p>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-10-03T12:58:09-06:00</dc:date>
    </item>

    <item>
      <title>Email All Your Users Day Re-visited</title>
      <link>http://gadgetopia.com/post/4333</link>
      <description><![CDATA[<p><a title="Email All Your Users Day | Gadgetopia" href="http://www.gadgetopia.com/post/1030">Email All Your Users Day</a>: I posted this two years ago today.  I still think this is a great idea.</p>

<blockquote>
  <p>[...] I hereby proclaim December 1 as "Email All Your Users Day." On that day, everyone who runs a service that has user accounts should email ALL their users to remind them they have an account, what the account is for, and where the login screen is. The member can then decide what he or she wants to do with it.</p>
</blockquote>
]]></description>
      <guid isPermaLink="false">4333@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="Email All Your Users Day | Gadgetopia" href="http://www.gadgetopia.com/post/1030">Email All Your Users Day</a>: I posted this two years ago today.  I still think this is a great idea.</p>

<blockquote>
  <p>[...] I hereby proclaim December 1 as "Email All Your Users Day." On that day, everyone who runs a service that has user accounts should email ALL their users to remind them they have an account, what the account is for, and where the login screen is. The member can then decide what he or she wants to do with it.</p>
</blockquote>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-09-13T13:10:11-06:00</dc:date>
    </item>

    <item>
      <title>Do Anonymous Domain Registration Outfits Actually Work?</title>
      <link>http://gadgetopia.com/post/4139</link>
      <description><![CDATA[<p><a title="Foetry Forum V.2 :: View topic - About anonymity . . ." href="http://foetry.com/newbb/viewtopic.php?p=906&amp;">About anonymity . . .</a>: Think you're safe if you register your domain name "anonymously"?  Apparently not:</p>

<blockquote>
  <p>Despite paying Domains by Proxy an additional fee to register foetry.com anonymously, they responded to a letter from a personal injury lawyer, and canceled my registration without notifying me of a complaint. Let that sink in: a personal injury lawyer's letter is all it took for DBP to cancel my anonymity. Furthermore, the attorney's ignorance of Internet Law didn't even faze Domains by Proxy. (I have a copy of the attorney's letter and I know more about Internet law than he).</p>
  
  <p>Boise State University Professor Janet Holmes, simply hired the lawyer to write a letter. That's all. There was no subpoena. No chance of a case against me. Domains by Proxy never emailed me and never telephoned. They simply canceled the anonymity and my confidential information suddenly became available. My initials, my address, and my phone number became freely available to anyone with an internet connection. </p>
</blockquote>

<p>Via <a href="http://www.metafilter.org">Metafilter</a>.</p>
]]></description>
      <guid isPermaLink="false">4139@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="Foetry Forum V.2 :: View topic - About anonymity . . ." href="http://foetry.com/newbb/viewtopic.php?p=906&amp;">About anonymity . . .</a>: Think you're safe if you register your domain name "anonymously"?  Apparently not:</p>

<blockquote>
  <p>Despite paying Domains by Proxy an additional fee to register foetry.com anonymously, they responded to a letter from a personal injury lawyer, and canceled my registration without notifying me of a complaint. Let that sink in: a personal injury lawyer's letter is all it took for DBP to cancel my anonymity. Furthermore, the attorney's ignorance of Internet Law didn't even faze Domains by Proxy. (I have a copy of the attorney's letter and I know more about Internet law than he).</p>
  
  <p>Boise State University Professor Janet Holmes, simply hired the lawyer to write a letter. That's all. There was no subpoena. No chance of a case against me. Domains by Proxy never emailed me and never telephoned. They simply canceled the anonymity and my confidential information suddenly became available. My initials, my address, and my phone number became freely available to anyone with an internet connection. </p>
</blockquote>

<p>Via <a href="http://www.metafilter.org">Metafilter</a>.</p>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-07-18T17:56:59-06:00</dc:date>
    </item>

    <item>
      <title>Robots.txt Survey</title>
      <link>http://gadgetopia.com/post/4137</link>
      <description><![CDATA[<p><a title="Robots.txt, The Big Crawl" href="http://www.searchengineworld.com/misc/robots_txt_crawl.htm">Robots.txt, The Big Crawl</a>: These guys grabbed 75,000 robots.txt files, and found a few problems:</p>

<blockquote>
  <p>[...] we found a wide array of problems with people's robots.txt files. We found more than 5% of the robots.txt used bad style and up to 2% were so badly formed that they would not be recognized by any spider.</p>
  
  <p>One of the most common mistakes is backwards syntax [...] A large number of people had multiple directories per line [...] Another common mistake, is editing your robots.txt in DOS mode</p>
</blockquote>

<p>Not only do they tell you the problems they found, but they explain how various spiders would interpret the problems.  Some of the "problems" are correct per the spec, but spiders don't always follow the spec...</p>
]]></description>
      <guid isPermaLink="false">4137@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="Robots.txt, The Big Crawl" href="http://www.searchengineworld.com/misc/robots_txt_crawl.htm">Robots.txt, The Big Crawl</a>: These guys grabbed 75,000 robots.txt files, and found a few problems:</p>

<blockquote>
  <p>[...] we found a wide array of problems with people's robots.txt files. We found more than 5% of the robots.txt used bad style and up to 2% were so badly formed that they would not be recognized by any spider.</p>
  
  <p>One of the most common mistakes is backwards syntax [...] A large number of people had multiple directories per line [...] Another common mistake, is editing your robots.txt in DOS mode</p>
</blockquote>

<p>Not only do they tell you the problems they found, but they explain how various spiders would interpret the problems.  Some of the "problems" are correct per the spec, but spiders don't always follow the spec...</p>
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-07-18T00:00:38-06:00</dc:date>
    </item>

    <item>
      <title>Unresolved 404 Patterns</title>
      <link>http://gadgetopia.com/post/4133</link>
      <description><![CDATA[<p>I changed the URL scheme of this Web site over the weekend.  I had been meaning to do it for a while, but some problems with <a href="http://www.gadgetopia.com/post/4074">Movable Type 3.2</a> kind of forced the issue. (I have <em>got</em> to stop rushing into every beta that presents itself...)</p>

<p>To make everything backwards compatible, I built a simple redirect system -- I have a table in the database with every single permalink from the old site (all 9,000 of them -- including entry RSS feeds and category pages), each mapped to its new URL.</p>

<p>If someone looks for a page which has moved, the 404 page does a lookup on this table, "resolves" the old URL against a new one, then redirects with a "301 Moved Permanently."  It seems to work well.</p>

<p>A side benefit of this system is that I can watch for "unresolved 404s," meaning 404s that were not in my lookup table -- a genuine 404, if you will.  I've noticed some interesting phenomena:</p>

<ul>
<li><p><strong>I get hammered by referrer spam.</strong>  We've <a href="http://www.gadgetopia.com/post/1863">talked about this before</a> -- this is spam created by a bot hitting any page with a fake referrer string in the hopes that you're displaying your referrers on your site (a la Dean Allen's <a href="http://www.textism.com/tools/refer/">Refer</a> or similar tool).</p>

<p>This results in fully half the unresolved 404s on this site coming from casino bots hitting URLs that are three years old.  I know they're that old because they use the very first URL scheme I had for this site -- the default Movable Type archives URL: "archives/000355.html", etc.</p>

<p>They must be working off a very old list of URLs, which I find quite funny, and quite interesting. Why would they keep an old list of URLs lying around?  Why not just re-spider?  Do spammers sell lists of URLs  like they do lists of emails?</p></li>
<li><p><strong>Browsers and spiders sometimes mangle HREFs.</strong>  I see impossible URLs that can only result from a misinterpretation of the HREF in the link.  IE 5.x on the Mac, for example, has problems with background images coded in CSS.  I see that browser try to get this a lot:</p>

<p><code>/'/bin/images/header.jpg'</code></p>

<p>It's just mangled the URL of the image.</p>

<p>Others, however, are more mysterious.  Just two minutes ago, a spider tried to access a URL that it could only have hit if it missed the leading "/" in the HREF.  Coming from this page...</p>

<p><code>/2005/07/09/IsPerlStillRelevant.html</code></p>

<p>...the spider tried to hit:</p>

<p><code>/2005/07/09/4131</code></p>

<p>I just checked that page and there's <em>no way</em> it pulled that URL out of the code.  The correct URL was...</p>

<p><code>/4131</code></p>

<p>But the URL it requested could only have come about if the spider had a bug of some kind or if the HTML got mangled on the way down.</p>

<p>I also get hits to things like this:</p>

<p><code>/2005/04/15/EasyJavaScriptAutocompleteI</code></p>

<p>No mystery here -- that's just a truncated version of this:</p>

<p><code>/2005/04/15/EasyJavaScriptAutocompleteIntellisenseScript.html</code></p>

<p>Truncation, it seems, happens a lot.  The Ask Jeeves/Teoma spider, for instance, has been trying all day to get at URLs that are all truncated at 39 characters.  Add "http://www.gadgetopia.com" to the front of that (the 39 characters already include the leading slash), and you get 64 characters.</p>

<p>Why is this, I wonder?  Was that the size of the database field they stored the URL in?  More importantly, does it explain why I've never done so well in that index?  I'm wondering now if my previously-long URLs have hurt my engine placement in other indexes besides Google.</p></li>
<li><p>As implied by the preceding two points, <strong>the vast majority of 404s are from bots.</strong>  I'm sure this is true for all sites, but I never realized it so much until now.</p></li>
<li><p><strong>Hack attempts abound.</strong>  There are lots of attempts to hit DLL files in the (nonexistent) "MSOffice" and "_vti/" directories.  These are people trying to hack Outlook Web Access and various Web-enabled Microsoft Office technologies.</p></li>
<li><p><strong>Spiders don't crawl and index in the same pass.</strong> I changed the URL pattern late Friday night, then changed my mind about which pattern to use when I woke up the next morning.  This means the site was accessible under a certain pattern for about eight hours.</p>

<p>In the following 48 hours, I saw attempt after attempt by bots to get to files under that pattern.  This tells me that a crawler made a pass at the site during that eight-hour window and stored the URLs it found.  Then an indexer used that list to come back through the site a day later and index the text (sadly, in this case, the pattern had changed -- I've since put in a RewriteRule to catch those).</p></li>
</ul>
]]></description>
      <guid isPermaLink="false">4133@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p>I changed the URL scheme of this Web site over the weekend.  I had been meaning to do it for a while, but some problems with <a href="http://www.gadgetopia.com/post/4074">Movable Type 3.2</a> kind of forced the issue. (I have <em>got</em> to stop rushing into every beta that presents itself...)</p>

<p>To make everything backwards compatible, I built a simple redirect system -- I have a table in the database with every single permalink from the old site (all 9,000 of them -- including entry RSS feeds and category pages), each mapped to its new URL.</p>

<p>If someone looks for a page which has moved, the 404 page does a lookup on this table, "resolves" the old URL against a new one, then redirects with a "301 Moved Permanently."  It seems to work well.</p>
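
<p>For illustration, here is a minimal sketch of that lookup in Python. It is not the code this site actually runs: the dictionary stands in for the database table, and the paths, the <code>handle_404</code> name, and the in-memory log are all made up for the example.</p>

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the database table: old path -> new path.
# These rows are invented for the example, not real site data.
REDIRECTS: Dict[str, str] = {
    "/old/example.html": "/post/1",
    "/old/another.html": "/post/2",
}


def handle_404(
    path: str, redirects: Dict[str, str], unresolved: List[str]
) -> Tuple[int, Optional[str]]:
    """Decide what the 404 page should do with a path that matched no real page.

    Returns (status_code, location). A known old path gets a 301 to its
    new home; anything else stays a 404 and is recorded for later review.
    """
    new_path = redirects.get(path)
    if new_path is not None:
        return 301, new_path
    unresolved.append(path)
    return 404, None


unresolved_log: List[str] = []
print(handle_404("/old/example.html", REDIRECTS, unresolved_log))
print(handle_404("/no/such/page", REDIRECTS, unresolved_log))
print(unresolved_log)
```

<p>Paths that miss the table are recorded rather than dropped, so they can be reviewed later.</p>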

<p>A side benefit of this system is that I can watch for "unresolved 404s," meaning 404s that were not in my lookup table -- a genuine 404, if you will.  I've noticed some interesting phenomena:</p>

<ul>
<li><p><strong>I get hammered by referrer spam.</strong>  We've <a href="http://www.gadgetopia.com/post/1863">talked about this before</a> -- this is spam created by a bot hitting any page with a fake referrer string in the hopes that you're displaying your referrers on your site (a la Dean Allen's <a href="http://www.textism.com/tools/refer/">Refer</a> or similar tool).</p>

<p>This results in fully half the unresolved 404s on this site coming from casino bots hitting URLs that are three years old.  I know they're that old because they use the very first URL scheme I had for this site -- the default Movable Type archives URL: "archives/000355.html", etc.</p>

<p>They must be working off a very old list of URLs, which I find quite funny, and quite interesting. Why would they keep an old list of URLs lying around?  Why not just re-spider?  Do spammers sell lists of URLs  like they do lists of emails?</p></li>
<li><p><strong>Browsers and spiders sometimes mangle HREFs.</strong>  I see impossible URLs that can only result from a misinterpretation of the HREF in the link.  IE 5.x on the Mac, for example, has problems with background images coded in CSS.  I see that browser try to get this a lot:</p>

<p><code>/'/bin/images/header.jpg'</code></p>

<p>It's just mangled the URL of the image.</p>

<p>Others, however, are more mysterious.  Just two minutes ago, a spider tried to access a URL that it could only have hit if it missed the leading "/" in the HREF.  Coming from this page...</p>

<p><code>/2005/07/09/IsPerlStillRelevant.html</code></p>

<p>...the spider tried to hit:</p>

<p><code>/2005/07/09/4131</code></p>

<p>I just checked that page and there's <em>no way</em> it pulled that URL out of the code.  The correct URL was...</p>

<p><code>/4131</code></p>

<p>But the URL it requested could only have come about if the spider had a bug of some kind or if the HTML got mangled on the way down.</p>

<p>I also get hits to things like this:</p>

<p><code>/2005/04/15/EasyJavaScriptAutocompleteI</code></p>

<p>No mystery here -- that's just a truncated version of this:</p>

<p><code>/2005/04/15/EasyJavaScriptAutocompleteIntellisenseScript.html</code></p>

<p>Truncation, it seems, happens a lot.  The Ask Jeeves/Teoma spider, for instance, has been trying all day to get at URLs that are all truncated at 39 characters.  Add "http://www.gadgetopia.com" to the front of that (the 39 characters already include the leading slash), and you get 64 characters.</p>

<p>Why is this, I wonder?  Was that the size of the database field they stored the URL in?  More importantly, does it explain why I've never done so well in that index?  I'm wondering now if my previously-long URLs have hurt my engine placement in other indexes besides Google.</p></li>
<li><p>As implied by the preceding two points, <strong>the vast majority of 404s are from bots.</strong>  I'm sure this is true for all sites, but I never realized it so much until now.</p></li>
<li><p><strong>Hack attempts abound.</strong>  There are lots of attempts to hit DLL files in the (nonexistent) "MSOffice" and "_vti/" directories.  These are people trying to hack Outlook Web Access and various Web-enabled Microsoft Office technologies.</p></li>
<li><p><strong>Spiders don't crawl and index in the same pass.</strong> I changed the URL pattern late Friday night, then changed my mind about which pattern to use when I woke up the next morning.  This means the site was accessible under a certain pattern for about eight hours.</p>

<p>In the following 48 hours, I saw attempt after attempt by bots to get to files under that pattern.  This tells me that a crawler made a pass at the site during that eight-hour window and stored the URLs it found.  Then an indexer used that list to come back through the site a day later and index the text (sadly, in this case, the pattern had changed -- I've since put in a RewriteRule to catch those).</p></li>
</ul>
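
<p>As a rough sketch of how truncated requests like the one above could be spotted among the unresolved 404s, the Python below checks whether a requested path is a proper prefix of exactly one known path. The function name and the single-entry list of known paths are assumptions for the example; the length check counts the host prefix without a trailing slash, since the path already starts with one.</p>

```python
from typing import Iterable, Optional

# Host prefix without a trailing slash; the path supplies its own leading "/".
SITE_PREFIX = "http://www.gadgetopia.com"


def find_truncated_source(requested: str, known_paths: Iterable[str]) -> Optional[str]:
    """Return the single known path that `requested` is a proper prefix of.

    Returns None when nothing matches or when the match is ambiguous.
    """
    matches = [p for p in known_paths if p.startswith(requested) and p != requested]
    return matches[0] if len(matches) == 1 else None


# Illustrative data: the one full path quoted in this post.
known = ["/2005/04/15/EasyJavaScriptAutocompleteIntellisenseScript.html"]
hit = "/2005/04/15/EasyJavaScriptAutocompleteI"

print(find_truncated_source(hit, known))
print(len(hit), len(SITE_PREFIX + hit))
```

<p>Requiring a unique match keeps a very short prefix from being blamed on the wrong page.</p>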
]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-07-17T08:45:25-06:00</dc:date>
    </item>

    <item>
      <title>AdSense and Borders</title>
      <link>http://gadgetopia.com/post/3733</link>
      <description><![CDATA[<p>If you have Google AdSense on your site, here is the best piece of advice I can give you: <em>don't put borders around your ads</em>.  I had a border around my skyscraper banner on the right here, so it sat in its own little box.</p>

<p>A friend told me to take the border off.  I figured it couldn't hurt to try it, so I made the border white, so it just fades into the background.  Nothing else was changed. I did it in the middle of the month, so the first half was with the border, the second half without.</p>

<p>The result in terms of clickthrough rate? A <em>seventy percent increase</em>.</p>]]></description>
      <guid isPermaLink="false">3733@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p>If you have Google AdSense on your site, here is the best piece of advice I can give you: <em>don't put borders around your ads</em>.  I had a border around my skyscraper banner on the right here, so it sat in its own little box.</p>

<p>A friend told me to take the border off.  I figured it couldn't hurt to try it, so I made the border white, so it just fades into the background.  Nothing else was changed. I did it in the middle of the month, so the first half was with the border, the second half without.</p>

<p>The result in terms of clickthrough rate? A <em>seventy percent increase</em>.</p>]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-04-01T09:25:10-06:00</dc:date>
    </item>

    <item>
      <title>Email All Your Users Day Re-Visited</title>
      <link>http://gadgetopia.com/post/3636</link>
      <description><![CDATA[<p><a title="Email All Your Users Day | Gadgetopia" href="http://www.gadgetopia.com/2003/09/13/EmailAllYourUsersDay.html">Email All Your Users Day</a>: This is one of those posts that I think never got the attention it deserved.  I still maintain that this is a good idea, and now that Gadgetopia has some more traffic, I re-submit it to the blogosphere.  Who's with me?</p>

<blockquote>

<p>I hereby proclaim December 1 as "Email All Your Users Day." On that day, everyone who runs a service that has user accounts should email ALL their users to remind them they have an account, what the account is for, and where the login screen is. The member can then decide what he or she wants to do with it.</p>

</blockquote>]]></description>
      <guid isPermaLink="false">3636@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="Email All Your Users Day | Gadgetopia" href="http://www.gadgetopia.com/2003/09/13/EmailAllYourUsersDay.html">Email All Your Users Day</a>: This is one of those posts that I think never got the attention it deserved.  I still maintain that this is a good idea, and now that Gadgetopia has some more traffic, I re-submit it to the blogosphere.  Who's with me?</p>

<blockquote>

<p>I hereby proclaim December 1 as "Email All Your Users Day." On that day, everyone who runs a service that has user accounts should email ALL their users to remind them they have an account, what the account is for, and where the login screen is. The member can then decide what he or she wants to do with it.</p>

</blockquote>]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-03-07T22:03:10-06:00</dc:date>
    </item>

    <item>
      <title>Snazzy 404s</title>
      <link>http://gadgetopia.com/post/3480</link>
      <description><![CDATA[<p><a title="404's 4 U | Metafilter" href="http://www.metafilter.com/mefi/39059">404's 4 U</a>: this post over at Metafilter has some great links to various 404s.  I liked the <a href="http://www.gadgetopia.com/2004/10/02/ZorkReturns.html">Zork</a> one.</p>

<blockquote>404 Research Lab. Not that I'm sorry for the double post, but I was inspired by this 404 and went searching for some more. Some of them are funny, some let you play games, some are just creepy.</blockquote>]]></description>
      <guid isPermaLink="false">3480@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="404's 4 U | Metafilter" href="http://www.metafilter.com/mefi/39059">404's 4 U</a>: this post over at Metafilter has some great links to various 404s.  I liked the <a href="http://www.gadgetopia.com/2004/10/02/ZorkReturns.html">Zork</a> one.</p>

<blockquote>404 Research Lab. Not that I'm sorry for the double post, but I was inspired by this 404 and went searching for some more. Some of them are funny, some let you play games, some are just creepy.</blockquote>]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-01-27T20:50:52-06:00</dc:date>
    </item>

    <item>
      <title>Official MT Hosting</title>
      <link>http://gadgetopia.com/post/3475</link>
      <description><![CDATA[<p><a title="movabletype.org : Movable Type Hosting Partner Program" href="http://www.movabletype.org/movable_type_hosting_partner_program.shtml">Movable Type Hosting Partner Program</a>: Movable Type has two "hosting partners" offering pre-installed MT hosting: $5.95 and $9.95 a month.</p>]]></description>
      <guid isPermaLink="false">3475@http://gadgetopia.com/</guid>
      <content:encoded><![CDATA[<p><a title="movabletype.org : Movable Type Hosting Partner Program" href="http://www.movabletype.org/movable_type_hosting_partner_program.shtml">Movable Type Hosting Partner Program</a>: Movable Type has two "hosting partners" offering pre-installed MT hosting: $5.95 and $9.95 a month.</p>]]></content:encoded>
      <dc:subject>Web Site Management</dc:subject>
      <dc:date>2005-01-26T13:28:09-06:00</dc:date>
    </item>


  </channel>
</rss>