The recent C2C battle between Baidu and Taobao has been intense. Taobao blocked Baidu's web crawler, and Sohu Blogs, 51.com, Xiaonei (a social networking site), and Hainei soon followed suit in blocking Baidu's spider. The blocking is done through robots.txt, the plain-text file at a site's root that tells crawlers which paths they may fetch. Have you ever thought about checking these sites' robots.txt files for yourself?
If you're curious (or just plain bored), here are the addresses where you can take a look:
Taobao: http://www.taobao.com/robots.txt
-----------------------------------------------------Evil dividing line------------------------
User-agent: Baiduspider
Disallow: /
User-agent: baiduspider
Disallow: /
-----------------------------------------------------Evil dividing line------------------------
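Whether a given crawler is actually shut out by rules like these can be checked with Python's standard `urllib.robotparser` module. Here is a small sketch feeding it the Taobao rules quoted above (the URLs are just examples, not anything the file itself mentions):

```python
from urllib import robotparser

# The rules Taobao serves (quoted above): Baiduspider is blocked everywhere.
rules = [
    "User-agent: Baiduspider",
    "Disallow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Baiduspider is denied the entire site...
print(rp.can_fetch("Baiduspider", "http://www.taobao.com/"))  # False
# ...while a crawler not named in the file falls back to "allowed".
print(rp.can_fetch("Googlebot", "http://www.taobao.com/"))    # True
```

Note the default: a crawler that matches no `User-agent` line (and finds no `User-agent: *` section) is allowed everything, which is why only Baidu is affected.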
Xiaonei: http://www.xiaonei.com/robots.txt
-----------------------------------------------------Evil dividing line------------------------
# Robots.txt file from http://www.xiaonei.com
# All robots will spider the domain
User-agent: BaiduSpider
Disallow: /
-----------------------------------------------------Evil dividing line------------------------
Hainei: http://www.hainei.com/robots.txt
That one is fierce: its robots.txt declares every directory off-limits to every search engine crawler…
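The Hainei file itself isn't quoted here, but a blanket ban of that sort is conventionally written with the wildcard user-agent, along these lines:

```
User-agent: *
Disallow: /
```

`*` matches every crawler, and `Disallow: /` covers every path under the site root, so nothing is left for any spider to index.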