11/29/2007
News Web sites propose
system for more control over content use by search engines
By ANICK JESDANUN
AP Internet Writer
NEW YORK (AP) -- Leading news organizations and other publishers
have proposed changing the rules that tell search engines
what they can and can't collect when scouring the Web, saying
the revisions would give site owners greater control over
their content.
Google Inc., Yahoo Inc. and other top search companies now
voluntarily respect a Web site's wishes as stated in a document
known as "robots.txt," which a search engine's indexing
software, called a crawler, knows to look for on a site.
Under the existing 13-year-old technology, a site can block
indexing of individual Web pages, specific directories or
the entire site. Some search engines have added their own
commands to the rules, but they're not universally observed.
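The three kinds of blocking described above can be illustrated with a short sketch that uses the robots.txt parser in Python's standard library. The crawler name "ExampleBot", the example.com addresses and the rules themselves are made up for illustration and do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one directory and one individual page.
PARTIAL_RULES = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/story.html
"""

# Hypothetical rules: block the entire site.
FULL_BLOCK_RULES = """\
User-agent: *
Disallow: /
"""


def build_parser(text):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


partial = build_parser(PARTIAL_RULES)
full_block = build_parser(FULL_BLOCK_RULES)

# A well-behaved crawler asks before fetching each address.
news_allowed = partial.can_fetch("ExampleBot", "http://example.com/news/index.html")
directory_allowed = partial.can_fetch("ExampleBot", "http://example.com/private/memo.html")
page_allowed = partial.can_fetch("ExampleBot", "http://example.com/drafts/story.html")
site_allowed = full_block.can_fetch("ExampleBot", "http://example.com/news/index.html")

print(news_allowed, directory_allowed, page_allowed, site_allowed)
```

Nothing in the file enforces these answers. A crawler that never checks the rules can still fetch the pages, which is why compliance is described as voluntary.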
The Automated Content Access Protocol proposal, unveiled Thursday
by a consortium of publishers at the global headquarters of
The Associated Press, seeks to have those extra commands —
and more — apply across the board.
With the ACAP commands, sites could try to limit how long
search engines retain copies in their indexes, for instance,
or tell the crawler not to follow any of the links that appear
within a Web page.
If accepted by search engines, publishers say they would be
willing to make more of their copyright-protected materials
available online. But Web surfers also could find that sites
disappear from search engines more quickly, or that the smaller
versions of images called thumbnails go missing if sites ban such
presentations.
"Robots.txt was created for a different age," said
Gavin O'Reilly, president of the World Association of Newspapers,
one of the organizations behind the proposal. "It works
well for search engines but doesn't work for content creators."
As with the current robots.txt, ACAP's use would be voluntary,
so search engines ultimately would have to agree to recognize
the new commands. So far, none of the leading ones have.
Search engines also could ignore the new commands and leave
it to courts to resolve any disputes.
Robots.txt was developed in 1994 following concerns that some
crawlers were taxing Web sites by visiting them too many times
too quickly. Although the system has never been sanctioned
by any standards body, major search engines have voluntarily
complied.
As search engines expanded to offer services for displaying
news and scanning printed books, news organizations and book
publishers began to complain that their content was being
lifted from their sites and displayed on those of the search
engines.
News publishers had complained that Google was posting their
news summaries, headlines and photos without permission. Google
claimed that "fair use" provisions of copyright
laws applied, though it eventually settled a lawsuit with
Agence France-Presse and agreed to pay the AP without a lawsuit
being filed. Financial terms haven't been disclosed.
The proposed extensions partly grew out of those disputes.
Leading the ACAP effort were groups representing publishers
of newspapers, magazines, online databases, books and journals.
The AP is one of dozens of organizations that have joined
ACAP, and O'Reilly said those members collectively represent
some 18,000 publications.
AP Chief Executive Tom Curley said the news cooperative spends
hundreds of millions of dollars annually covering the world
— and in many cases its employees risk their lives doing
so. Technologies such as ACAP, he said, are important to protect
the AP's original news reports from sites that distribute
them without permission.
"The free riding deprives AP of economic returns on its
investments," he said.
The new ACAP commands will use the same robots.txt file that
search engines now recognize. ACAP developers tested their
system with French search engine Exalead Inc. but had only
informal discussions with others. Google, Yahoo and Microsoft
Corp. sent representatives to Thursday's announcement but
made no public promises to use ACAP.
Google spokeswoman Jessica Powell said the company supports
all efforts to bring Web sites and search engines together
but needed to evaluate ACAP to ensure it can meet the needs
of millions of Web sites — not just those of a single
community.
Joseph Siino, a senior vice president at Yahoo, said other
technological initiatives exist, including the Sitemaps system
for Web sites to tell search engines about which Web pages
to index and how often they change.
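For comparison, a Sitemaps file is an XML list of page addresses, each with an optional hint about how often the page changes. The sketch below reads such a list with Python's standard library. The two example.com entries are invented for illustration, and the function name read_sitemap is a hypothetical helper, not part of any search engine's software.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap with two pages and their change-frequency hints.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <changefreq>hourly</changefreq>
  </url>
  <url>
    <loc>http://example.com/archive/2007.html</loc>
    <changefreq>yearly</changefreq>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return (address, change frequency) pairs listed in a sitemap."""
    root = ET.fromstring(xml_text)
    pages = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        freq = url.findtext("sm:changefreq", namespaces=NS)
        pages.append((loc, freq))
    return pages


pages = read_sitemap(SITEMAP_XML)
for loc, freq in pages:
    print(loc, freq)
```

Where robots.txt tells crawlers what to stay away from, a sitemap works in the other direction by pointing crawlers toward pages the site wants indexed.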
The marketplace — and not any one group — "is
ultimately going to dictate what's the right solution,"
Siino said. "Our industry is a rapidly evolving industry.
No one will want to endorse any particular solution prematurely."
———
On the Net:
http://www.the-acap.org