What is a Web robot?
A robot is a program that automatically traverses the Web’s hypertext
structure by retrieving a document, and recursively retrieving all
documents that are referenced. (From: robotstxt.org)
A Web robot is sometimes also called a web crawler, web spider, or web wanderer.
What does a robot do?
Once a robot has scanned your site, your site will probably be indexed by
the search engine. Most of the time, these robots are programs
written by search engines like Google, Yahoo, Alexa, MSN, etc.
What is the use of robots.txt?
robots.txt (plural; robots do not look for a file named robot.txt) is just
a simple text file that tells search engine spiders or crawlers how they
should go through your site and which spiders are not allowed to visit it.
Well-behaved robots follow these rules, but the file cannot force a robot
to obey them.
Example of a robots.txt
User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow:
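To see how a robot might read rules like these, here is a minimal sketch using Python's standard urllib.robotparser module. The rules are a shortened copy of the example, and the robot name "SomeSearchBot" and the page URL are made up for illustration only.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the example rules: one blocked robot, everyone else allowed.
RULES = """\
User-agent: EmailCollector
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical page on the site (assumption, not a real URL).
page = "http://www.yourdomain.com/index.html"

# EmailCollector has its own record that disallows everything.
collector_allowed = parser.can_fetch("EmailCollector", page)

# "SomeSearchBot" is an invented name; it falls through to the "*" record.
other_allowed = parser.can_fetch("SomeSearchBot", page)

print("EmailCollector allowed:", collector_allowed)
print("SomeSearchBot allowed:", other_allowed)
```

A robot named in its own record is checked against that record, and every other robot falls back to the "User-agent: *" record.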
Where should I place my robots.txt?
Place it in the top-level directory of your site, so that it can be reached at http://www.yourdomain.com/robots.txt
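Because the file always sits at the top level of the site, its address can be worked out from any page URL. The sketch below shows one way to do that in Python; the helper name robots_url and the sample page URL are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside the site.
print(robots_url("http://www.yourdomain.com/blog/post.html?id=7"))
```

A robots.txt placed in a subdirectory such as /blog/robots.txt would not be found this way, which is why the top-level location matters.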
What should I write in robots.txt to prevent robots from scanning my site?
User-agent: *
Disallow: /
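The single "/" is what separates blocking every robot from allowing every robot. As a rough check, the sketch below runs both variants through Python's standard urllib.robotparser; the helper name is_allowed, the robot name "AnyBot", and the page URL are invented for this example.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = "User-agent: *\nDisallow: /"
ALLOW_ALL = "User-agent: *\nDisallow:"


def is_allowed(rules, agent, url):
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robot name and page (assumptions for illustration).
page = "http://www.yourdomain.com/index.html"
print("Disallow: /  ->", is_allowed(BLOCK_ALL, "AnyBot", page))
print("Disallow:    ->", is_allowed(ALLOW_ALL, "AnyBot", page))
```

An empty Disallow line means nothing is off limits, so it is worth double-checking the slash before uploading the file.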