You can specify which sections of your site you would like search engines and web crawlers to index, and which sections they should ignore. To do this, you add directives to a robots.txt file and place that file in your document root directory.
The directives used in a robots.txt file are straightforward and easy to understand. The most commonly used directives are User-agent, Disallow, and Crawl-delay. Here are some examples:
User-agent: *
Disallow:
In this example, the asterisk wildcard in the User-agent directive matches every crawler, and the empty Disallow directive allows access to every file on the site.
User-agent: *
Disallow: /
In this example, all crawlers are instructed to ignore all files on the site.
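The User-agent directive can also name a specific crawler rather than using the wildcard. The following sketch uses a hypothetical crawler name, ExampleBot, to block only that crawler while allowing all others; each User-agent group is separated by a blank line:

User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: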
User-agent: *
Disallow: /scripts/
In this example, all crawlers are instructed to ignore the /scripts/ directory and everything in it.
User-agent: *
Disallow: /documents/index.html
In this example, all crawlers are instructed to ignore the documents/index.html file.
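You can also list several Disallow lines under a single User-agent group to block more than one path at a time. For instance, this sketch (which adds a hypothetical /private/ directory to the paths used above) hides two directories and one file from all crawlers while leaving the rest of the site crawlable:

User-agent: *
Disallow: /scripts/
Disallow: /private/
Disallow: /documents/index.html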
User-agent: *
Crawl-delay: 30
In this example, all crawlers are instructed to wait at least 30 seconds between successive requests to the web server. Note that Crawl-delay is not part of the original robots.txt standard, and some crawlers ignore it.
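If you want to check how a crawler that honors these directives would interpret your file, Python's standard urllib.robotparser module can fetch a robots.txt file and answer allow/deny questions. The following is a minimal sketch; http://www.example.com is a placeholder for your own domain, and the expected results in the comments assume the /scripts/ rule shown above:

from urllib import robotparser

# Point the parser at the site's robots.txt file.
# http://www.example.com is a placeholder; substitute your own domain.
rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()  # fetch and parse the file

# Ask whether a generic crawler ("*") may fetch a given path.
print(rp.can_fetch("*", "http://www.example.com/scripts/app.js"))  # False if /scripts/ is disallowed
print(rp.can_fetch("*", "http://www.example.com/index.html"))      # True if the path is not disallowed

# Report the Crawl-delay value declared for a generic crawler, if any.
print(rp.crawl_delay("*"))  # e.g. 30, or None if no Crawl-delay is set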
For more information about the robots.txt file, please visit http://www.robotstxt.org.