Sure ... and Apple have said they are investigating ways to avoid this problem. It seems that the problem is, at least in part, driven by the compromise between a fast search service and a deep search service; in other words, they err on the side of timing out with no results from the cache rather than searching the deep database. There is no bug report, just a zero result, and this result was reproduced by several list members searching for their own catalogues, which were known to be present in iTunes. I'm not sure the other sites you mention have the same priorities when they consider their design compromises. Or perhaps it's just that no-one notices :-) It nonetheless represents an opportunity for market manipulation by database owners, which is anathema to the notion of "search neutrality", a premise of the Long Tail theory and a cherished ideal among web activists. Even worse, the example of "The Lord's Prayer" demonstrates how, even when the search system is working correctly in a technical sense, this effect works to the benefit of hegemons and to the detriment of new entrants and niche artists.

Cheers,
Hughie

----- Original Message -----
From: <elw@stderr.org>
To: <>; "Hugemusic" <hmusic@ozemail.com.au>
Sent: Monday, April 30, 2007 11:12 AM
Subject: Re: [Air-l] net neutrality by another name, oecd report
Large music databases (iTunes, CDBaby, Amazon, etc.) use a caching device to improve the efficiency of their searches. This device makes the most popular search results more readily available than the rest, which works well for most searches, because search popularity follows a Zipf distribution. However, it significantly disadvantages less-popular artists and results, to the point of virtual exclusion. For example, in iTunes searches for versions of "The Lord's Prayer", the versions by more popular artists drown out the versions by less popular artists ...
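To put rough numbers on the Zipf point, here is a minimal Python sketch. It assumes query popularity follows a Zipf law with exponent s = 1 and that the cache holds exactly the k most popular distinct search terms; the function names and the 100,000-term vocabulary are hypothetical, and whatever it prints is illustrative, not a measurement from iTunes, CDBaby or Amazon.

```python
def zipf_weights(n: int, s: float = 1.0) -> list[float]:
    """Normalised Zipf probabilities for popularity ranks 1..n: p(k) is proportional to 1 / k**s."""
    raw = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def cache_hit_rate(n_queries: int, cache_size: int, s: float = 1.0) -> float:
    """Share of search traffic answered by a cache holding the `cache_size` most popular terms."""
    weights = zipf_weights(n_queries, s)
    return sum(weights[:cache_size])


if __name__ == "__main__":
    n = 100_000  # hypothetical number of distinct search terms
    for k in (100, 1_000, 10_000):
        share = cache_hit_rate(n, k)
        print(f"caching the top {k:,} of {n:,} terms serves {share:.1%} of searches; "
              f"the other {n - k:,} terms are never in the cache")
```

Because the head of the curve carries a large share of the traffic, a small cache answers many searches, while the long tail of rarely searched titles is essentially never cached. Whether those tail searches succeed then depends entirely on what the site does on a cache miss.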
That's likely a bug in their particular software implementation, not a bug in the intellectual idea of database caches.
What *should* happen, if your search produces no hits in the cache, is that the site's software issues a far more expensive search against the whole dataset. If you're seeing results truncated because a particular search term is unpopular, it seems likely that the site operators would appreciate a reproducible bug report.
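The miss handling described here is the usual cache-aside pattern. A minimal Python sketch, assuming a plain dict in place of a real cache such as memcached and a hypothetical `full_search` callable standing in for the expensive query against the whole catalogue; the `fallback=False` switch models giving up with zero results on a miss, as reported for iTunes in this thread. `CachedSearch` and the toy catalogue are likewise hypothetical.

```python
from typing import Callable


class CachedSearch:
    """Cache-aside search: serve a term from the cache if present, else query the full dataset.

    `full_search` is a hypothetical stand-in for the expensive query over the whole
    catalogue; the dict stands in for a real cache such as memcached.
    """

    def __init__(self, full_search: Callable[[str], list[str]], fallback: bool = True) -> None:
        self.full_search = full_search
        self.fallback = fallback  # False models "give up on a cache miss"
        self.cache: dict[str, list[str]] = {}

    def search(self, term: str) -> list[str]:
        key = term.strip().lower()  # normalise so trivially different spellings share an entry
        if key in self.cache:  # hit: the cheap path
            return self.cache[key]
        if not self.fallback:  # miss with no fallback: the user just sees zero results
            return []
        results = self.full_search(key)  # miss: the expensive search over every record
        self.cache[key] = results  # store it so the next identical search is a hit
        return results


if __name__ == "__main__":
    catalogue = [
        "The Lord's Prayer (popular artist)",
        "The Lord's Prayer (niche artist)",
        "Unrelated track",
    ]

    def scan(term: str) -> list[str]:
        """Brute-force scan of the whole (toy) catalogue."""
        return [title for title in catalogue if term in title.lower()]

    print(CachedSearch(scan).search("The Lord's Prayer"))                  # both versions
    print(CachedSearch(scan, fallback=False).search("The Lord's Prayer"))  # [] on a cold cache
```

Caching the result of the full search means only the first lookup for a rare term pays the full cost. A real deployment would also need expiry and invalidation when the catalogue changes, which this sketch leaves out.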
Several sites (LiveJournal, CiteULike, etc.) use memcached for this sort of database caching, and it doesn't seem to exhibit the 'ill' misbehavior that you're describing.
--e