A robots.txt file lives at the root of your site.
So, for site www.example.com, the robots.txt file lives at www.example.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard.
A robots.txt file consists of one or more rules. Each rule blocks (or allows) access for a given crawler to a specified file path in that website.
For example, one rule can block a crawler named Googlebot from crawling http://example.com/nogooglebot/ or any subdirectories, while another allows all other crawlers access to the entire site.
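A minimal sketch of such a file (the blocked directory and the sitemap URL are illustrative):

# Block Googlebot from /nogooglebot/ and everything under it.
User-agent: Googlebot
Disallow: /nogooglebot/

# Allow all other crawlers to access the entire site. Omitting this group
# would have the same effect, because full access is the default.
User-agent: *
Allow: /

# Optionally point crawlers at the sitemap.
Sitemap: http://www.example.com/sitemap.xml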
We will provide a more detailed example later.
Here are some basic guidelines for robots.txt files. We recommend that you read the full robots.txt syntax, because it has some subtle behavior that you should understand.
You can use almost any text editor to create a robots.txt file. The text editor should be able to create standard UTF-8 text files; don't use a word processor, because word processors often save files in a proprietary format and can add unexpected characters, such as curly quotes, which can cause problems for crawlers.
Format and location rules:

- The robots.txt file must be located at the root of the website host that it applies to. For instance, to control crawling on all URLs below http://www.example.com/, the robots.txt file must be located at http://www.example.com/robots.txt. It cannot be placed in a subdirectory (for example, at http://example.com/pages/robots.txt). If you're unsure about how to access your website root, or need permissions to do so, contact your web hosting service provider. If you can't access your website root, use an alternative blocking method such as meta tags.
- A robots.txt file can apply to subdomains (for example, http://website.example.com/robots.txt) or to non-standard ports (for example, http://example.com:8181/robots.txt).
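Taken together, here is a sketch of where a robots.txt file may and may not live (all hosts and paths are illustrative):

http://www.example.com/robots.txt       valid: root of the www.example.com host
http://example.com/pages/robots.txt     invalid: cannot be placed in a subdirectory
http://website.example.com/robots.txt   valid: root of the website.example.com subdomain
http://example.com:8181/robots.txt      valid: root of the host on a non-standard port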
Rules are case-sensitive. For instance, Disallow: /file.asp applies to http://www.example.com/file.asp, but not to http://www.example.com/FILE.asp.
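A quick sketch of this behavior (the file name is illustrative):

# Blocks /file.asp for all crawlers; /FILE.asp and /File.asp remain crawlable.
User-agent: *
Disallow: /file.asp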
The following directives are used in robots.txt files:
User-agent: [Required, one or more per group] The name of a search engine robot (web crawler software) that the rule applies to. This is the first line for any rule. Most Google user-agent names are listed in the Web Robots Database or in the Google list of user agents. Using an asterisk (*) as in the example below matches all crawlers except the various AdsBot crawlers, which must be named explicitly. Examples:
# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /

# Example 2: Block Googlebot and AdsBot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

# Example 3: Block all but AdsBot crawlers
User-agent: *
Disallow: /
Disallow: [At least one or more Disallow or Allow entries per rule] A directory or page, relative to the root domain, that should not be crawled by the user agent. If a page, it should be the full page name as shown in the browser; if a directory, it should end in a / mark. Supports the * wildcard for a path prefix, suffix, or entire string.
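For instance, blocking a single page versus a whole directory might look like this (paths illustrative):

User-agent: *
# Block one specific page.
Disallow: /useless_file.html
# Block a directory and all of its contents (note the trailing /).
Disallow: /private/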
Allow: [At least one or more Disallow or Allow entries per rule] A directory or page, relative to the root domain, that should be crawled by the user agent just mentioned. This is used to override Disallow to allow crawling of a subdirectory or page in a disallowed directory. If a page, it should be the full page name as shown in the browser; if a directory, it should end in a / mark. Supports the * wildcard for a path prefix, suffix, or entire string.
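For instance, a sketch in which Allow re-opens one subdirectory of an otherwise blocked directory (paths illustrative):

User-agent: *
# Block the /archive/ directory...
Disallow: /archive/
# ...except /archive/current/, which may still be crawled.
Allow: /archive/current/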
Sitemap: [Optional, zero or more per file] The location of a sitemap for this website. Must be a fully-qualified URL; Google doesn't assume or check http/https/www/non-www alternates. Sitemaps are a good way to indicate which content Google should crawl, as opposed to which content it can or cannot crawl. Examples:
Sitemap: https://example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap.xml
Other rules are ignored.
A robots.txt file consists of one or more groups, each beginning with a User-agent line that specifies the target of the group. Here is a file with two groups; inline comments explain each group:
# Block googlebot from example.com/directory1/... and example.com/directory2/...
# but allow access to directory2/subdirectory1/...
# All other directories on the site are allowed by default.
User-agent: googlebot
Disallow: /directory1/
Disallow: /directory2/
Allow: /directory2/subdirectory1/

# Block the entire site from anothercrawler.
User-agent: anothercrawler
Disallow: /