Article ID: 219
Created On: 4/29/2011
Modified: 4/29/2011

How to write robots.txt to control search engine spiders

What is a Web Robot?
A robot is a program that automatically traverses the Web’s hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. (From: robotstxt.org)
A web robot is sometimes also called a web crawler, web spider, or web wanderer.
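
To make the definition above concrete, here is a minimal sketch of that traversal in Python. It does not fetch anything from the network: the LINKS table is a made-up, in-memory stand-in for a small site, and the function name crawl is only illustrative. It also uses a queue instead of literal recursion, which visits the same set of pages.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": [],
}


def crawl(start, links):
    """Visit every page reachable from start and return them in visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        # Follow each referenced page once, skipping pages already seen.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


print(crawl("/", LINKS))
```

A real robot would download each page and extract its links instead of reading them from a table, but the visiting logic is the same.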

What do robots do?
Once a robot has scanned your site, your site will probably be indexed by the search engine. Most of the time, these robots are programs written by search engines like Google, Yahoo, Alexa, MSN, etc.

What is the use of robots.txt?
robots.txt is a simple text file used to control how search engine spiders or crawlers go through your site and which spiders are not allowed to visit it. The file must be named robots.txt (plural); robots do not look for a file named robot.txt.

Example of a robots.txt file

User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow:
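
If you want to check how rules like the ones above are interpreted, one option is Python's standard urllib.robotparser module. The following is a minimal sketch, not the way any particular search engine evaluates the file. The helper name is_allowed, the test URL and the "Googlebot" agent string are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article, held in a string for testing.
EXAMPLE_RULES = """\
User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(rules, agent, url):
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Illustrative URL on the placeholder domain used in this article.
page = "http://www.yourdomain.com/index.html"
print(is_allowed(EXAMPLE_RULES, "EmailSiphon", page))  # named robot, blocked
print(is_allowed(EXAMPLE_RULES, "Googlebot", page))    # falls under "*", allowed
```

An empty Disallow: line means nothing is disallowed, so every robot that is not named in an earlier block may visit the whole site.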

Where should I place my robots.txt?
Place it in the root directory of your site, so that it is reachable at http://www.yourdomain.com/robots.txt
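
Because the file always sits at the root of the host, a robot can work out its address from any page URL on the site. A small sketch of that step, assuming a hypothetical helper named robots_url and an illustrative page address:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yourdomain.com/blog/post.html?id=1"))
```

A robots.txt placed in a subdirectory, for example /blog/robots.txt, is not found this way and is ignored by robots.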

What should I write in robots.txt to prevent robots from scanning my site?

User-agent: *
Disallow: /