Yahoo reports (link here) that the World Association of Newspapers is spearheading an initiative to create the “Automated Content Access Protocol” and enforce the terms under which search engines access online and printed content. Newspapers and magazines are said to be particularly concerned about the loss of subscription and advertising income to the internet.
"What is required is a standardized way of describing the permissions which apply to a Web site or Web page so that it can be decoded by a dumb machine without the help of an expensive lawyer."
"In one example of how ACAP would work, a newspaper publisher could grant search engines permission to index its site, but specify that only select ones display articles for a limited time after paying a royalty."
The dilemma for ACAP's members is that they benefit from the traffic search engines send to their sites. The solution for some has been to deny access to full articles unless a fee is paid or the reader is a subscriber.
A work in progress.
Boot on the other foot?
Instead of demanding payment for access, how about inviting it?
"If you've enjoyed reading these news articles and would like to see more, click here to pay 1 cent for each of our next articles. You can stop at any time."
This is the boot switch:
Copyright business model:
"If you produce some money, we'll give you some articles"
Copyleft business model:
"If you produce some articles, we'll give you some money"
The newspapers have got to recognise that people still want to read news, but coming up with ever more esoteric ways of hiding it from people isn't going to work. Copyright is dead. DRM is deader.
I've seen more and more "bait and switch" tactics being used online by news and similar organizations. They actually make Google complicit: they let the googlebot "view" anything, but real browsers are shown a snippet and shaken down for dough. The result is that Google's index gets polluted with results pointing to pages you can't actually view! (Remember, 99% of the people on earth don't have a credit card, so only the remaining 1% can view those pages at all, and even they have to pay.)
Users of Google then get scammed, and Google itself becomes less useful. Somehow they also convince Google's automation not to cache their pages. You can see this has happened when the Google hit is accompanied by excerpt text that isn't anywhere on the page you land on by following the link: they are serving different content to the googlebot than to Firefox, MSIE, or whatever.
Of course, as soon as I can find a way I'm going to set up Firefox so I can conveniently toggle it between normal behavior and spoofing headers to appear to be the googlebot. Nobody blocks access by the googlebot ... well, almost nobody, and nobody with any brains.
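Here is a rough sketch of the underlying trick, assuming the site keys its cloaking off the User-Agent header alone (the URL is a placeholder, the user-agent strings are just examples, and sites that also check the source IP won't be fooled):

    # Sketch: fetch the same URL twice, once as a normal browser and once
    # claiming to be the googlebot, and see whether different content comes back.
    # Assumes the site cloaks purely on the User-Agent header.
    import urllib.request

    URL = "http://example.com/some-article"  # placeholder

    BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.8) Gecko/20060101 Firefox/1.5"
    GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"

    def fetch(url, user_agent):
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    if fetch(URL, BROWSER_UA) != fetch(URL, GOOGLEBOT_UA):
        print("Different content served: the site is cloaking.")
    else:
        print("Same content either way.")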
Neo:
When you figure out how to spoof googlebot, please let the rest of us know how!
-- Michael Chermside
Eh -- it should be as simple as setting an appropriate user-agent string. Unless they check the source IP or something as well.
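For what it's worth, a site that does check the source IP can catch a spoofed user-agent fairly easily: reverse-resolve the connecting IP, check that the hostname is under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A rough sketch, purely illustrative:

    # Sketch of the source-IP check that defeats a spoofed Googlebot user-agent:
    # reverse DNS the IP, require a Google-owned hostname, then forward DNS the
    # hostname and require it to map back to the same IP.
    import socket

    def is_real_googlebot(ip):
        try:
            host = socket.gethostbyaddr(ip)[0]              # reverse lookup
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            forward_ips = socket.gethostbyname_ex(host)[2]  # forward lookup
        except socket.gaierror:
            return False
        return ip in forward_ips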
Or we can complain to Google. They shouldn't allow this to occur -- the policy should be "content behind a registration wall or paywall does not get indexed". Organic search engine hits should just work, not lead to a tollbooth instead of the page they appeared to link to. Only paid-placement links should be allowed to point at a tollbooth.