WebWatch

Google bots are crawling in a new way

On the hunt for HTML forms…

Tags: search, google

By Stephen Shankland

Published: 16 April 2008 08:55 BST

Google's search bots, which constantly scour the web for new pages, have begun a new, more active phase of their indexing work.

In a blog post last week, Jayant Madhavan and Alon Halevy of Google's crawling and indexing team said the company has begun an experiment in which its indexing software enters text into website forms to see what previously undiscovered pages appear.

The post said: "In the past few months, we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. This experiment is part of Google's broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines."

The new Google indexing practice involves only "high quality" websites, and it honours 'robots.txt' files and other standard mechanisms for warding off indexing software.
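As a rough illustration of how a crawler can consult a site's 'robots.txt' rules before requesting a URL that a form would generate, here is a minimal sketch using Python's standard urllib.robotparser module. The function name may_submit_form, the agent name "ExampleBot" and the example.com URLs are hypothetical, and nothing here is taken from Google's own software.

```python
from urllib.robotparser import RobotFileParser


def may_submit_form(robots_txt, user_agent, form_url):
    """Return True if the given robots.txt text lets user_agent fetch form_url.

    Hypothetical helper for illustration only: it parses the robots.txt
    text that the caller has already downloaded and applies its rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, form_url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /search\n"
    # A form that submits to /search is off limits under these rules.
    print(may_submit_form(rules, "ExampleBot", "http://example.com/search?q=cars"))
    # Pages outside the disallowed path remain fetchable.
    print(may_submit_form(rules, "ExampleBot", "http://example.com/about"))
```

A site owner who does not want form results crawled could therefore list the form's target path in a Disallow rule.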

To decide what words to "type" into the forms, the indexing software samples from among words on the web page with the form, Google said.
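The word-sampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's method: the names VisibleTextParser, candidate_words and sample_form_inputs are hypothetical, as are the four-letter minimum word length and the use of uniform random sampling.

```python
import random
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def candidate_words(html, min_length=4):
    """Return the sorted, de-duplicated words found in the page's visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z]+", text)
    return sorted({w for w in words if len(w) >= min_length})


def sample_form_inputs(html, k=3, seed=None):
    """Pick up to k words from the page to try as form inputs (assumed uniform sampling)."""
    words = candidate_words(html)
    rng = random.Random(seed)
    return rng.sample(words, min(k, len(words)))


if __name__ == "__main__":
    page = (
        "<html><body><h1>Used car listings</h1>"
        "<form action='/search'><input name='q'></form>"
        "<script>var hidden = 1;</script></body></html>"
    )
    print(sample_form_inputs(page, k=2, seed=0))
```

Each sampled word would then be submitted through the form, and any result pages not seen before become candidates for indexing.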

The technology appears to be related to that of Transformic, a company Google acquired, according to a blog post by Anand Rajaraman, who was involved with the technology earlier in his career while working for Halevy.
