
Taking it off the menu for web searchers...
By Joris Evers
Published: 14 July 2006 08:40 BST
Researchers at Microsoft have developed a tool to scrub search engines of websites that pollute search results and, ultimately, to help clean the web of spam.
Called Strider Search Defender, the tool is designed to dig out web pages that serve as fronts for spam sites, according to a paper published by Microsoft researchers on Thursday. These pages typically reside on blogging sites and other services that provide free web space, the researchers said.
Spammers soil the web with countless links to their spam fronts in order to gain a higher ranking in search engines. Yi-Min Wang, principal researcher at Microsoft, said in an interview: "By cleaning up web search, hopefully we can discourage spammers from cluttering the web with spam."
Microsoft's tool doesn't find spam the traditional way, by analysing a site's content. Instead, it turns the spammers' activities against them by using search engines to find links to potential spam pages. These links, known as "comment spam", are often posted as comments on blogs, in online discussion forums and in guestbooks.
Search Defender starts with a list of confirmed spam web addresses. A component called Spam Hunter runs those addresses through search engines, using the "link:" query operator, to find pages that link to the spam sites. Additional spam URLs found on those pages are, in turn, run through Spam Hunter, resulting in a long list of potential spam sites.
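The expansion loop described above amounts to a breadth-first crawl over backlinks. The Python sketch that follows is an illustration under stated assumptions, not Microsoft's code: `backlinks()` is a hypothetical stand-in for a search engine's "link:" query, `outlinks()` is a hypothetical helper that lists the URLs a page links to, and the toy link tables are invented data.

```python
from collections import deque
from typing import Callable, Iterable


def hunt_spam(
    seeds: Iterable[str],
    backlinks: Callable[[str], list[str]],
    outlinks: Callable[[str], list[str]],
    max_urls: int = 10_000,
) -> set[str]:
    """Expand confirmed spam URLs into a larger set of *candidate* spam URLs.

    backlinks(url): hypothetical stand-in for a "link:url" search query,
                    returning pages that link to url.
    outlinks(page): hypothetical helper returning the URLs linked from page.
    max_urls:       soft cap; expansion stops once roughly this many
                    candidates have been collected.
    """
    candidates: set[str] = set(seeds)
    queue = deque(candidates)
    while queue and len(candidates) < max_urls:
        spam_url = queue.popleft()
        # Pages linking to a known spam URL (e.g. comment-spammed blogs)...
        for linking_page in backlinks(spam_url):
            # ...tend to link to other spam URLs too; queue any new ones.
            for url in outlinks(linking_page):
                if url not in candidates:
                    candidates.add(url)
                    queue.append(url)
    return candidates


# Toy in-memory "web" (invented data, for illustration only).
LINKS_TO = {
    "spam1.example": ["blog-a.example/post"],
    "spam2.example": ["forum.example/thread"],
}
LINKS_FROM = {
    "blog-a.example/post": ["spam1.example", "spam2.example"],
    "forum.example/thread": ["spam2.example", "spam3.example"],
}

found = hunt_spam(
    ["spam1.example"],
    backlinks=lambda u: LINKS_TO.get(u, []),
    outlinks=lambda p: LINKS_FROM.get(p, []),
)
print(sorted(found))  # ['spam1.example', 'spam2.example', 'spam3.example']
```

Because comment-spammed pages also link to legitimate sites, the output of a loop like this is only a candidate list, which is why a separate filtering step is needed.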
Another Microsoft research project, Strider URL Tracer, then filters out false positives and compiles a list of web pages that redirect to spam sites. URL Tracer actually visits each of the web addresses found by Spam Hunter to see whether it redirects to a secondary spam page.
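The filtering step can be sketched in the same hedged way. This Python example assumes a hypothetical `final_destination()` helper that visits a URL and reports where it finally lands (a real tool would need a browser to catch script-driven redirects), plus a hand-picked set of known spam domains. The article does not say how URL Tracer decides that a destination is spam, so that test is an assumption here.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit


def confirm_spam_fronts(
    candidates: set[str],
    final_destination: Callable[[str], Optional[str]],
    spam_domains: set[str],
) -> dict[str, str]:
    """Keep only candidates that redirect to a known spam domain.

    final_destination(url): hypothetical helper that visits url and returns
    the URL it finally lands on, or None if the page could not be loaded.
    spam_domains: assumed list of known spam hosts (illustrative only).
    Returns a mapping of spam-front URL -> landing URL.
    """
    fronts: dict[str, str] = {}
    for url in sorted(candidates):
        landing = final_destination(url)
        if landing is None or landing == url:
            continue  # unreachable or no redirect: treat as a false positive
        host = urlsplit(landing).hostname or ""
        if host in spam_domains:
            fronts[url] = landing
    return fronts


# Invented redirect observations, for illustration only.
REDIRECTS = {
    "http://front.example/": "http://pills.example/buy",
    "http://innocent.example/": "http://innocent.example/",
    "http://moved.example/": "http://new-home.example/",
}

fronts = confirm_spam_fronts(
    set(REDIRECTS) | {"http://dead.example/"},
    final_destination=REDIRECTS.get,
    spam_domains={"pills.example"},
)
print(fronts)  # {'http://front.example/': 'http://pills.example/buy'}
```

Pages that do not redirect, or that redirect somewhere harmless, drop out of the list; only the redirecting fronts remain.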
Wang said: "We use search engines to find them. Spammers are basically telling us: here are my spam URLs."
Spammers use various online services to host spam fronts, including free web hosting providers such as Angelfire, Tripod and Yahoo!'s GeoCities, Microsoft said. Blogging services are also frequently abused; Google's Blogger, at blogspot.com, is especially popular, according to the research report.
The report said: "Our preliminary investigation shows that spam blogs hosted on blogspot.com appear to be particularly widely spammed and effective against search engines."
Microsoft's researchers are working with the MSN Search team to see how the search service could be cleaned up, Wang said. Additionally, he called on the web community, especially the operators of blog and free hosting sites, to co-operate to combat the web spam problem.
Wang added: "In the end it is all about protecting the search engines. Because if the spam doesn't show up in any search engine result, the spammer will not receive traffic."
Joris Evers writes for CNET News.com
Copyright ©1995-2008 CNET Networks, Inc. All rights reserved.