How to create a simple robots.txt file for your website
robots.txt files tell search engines what they are and aren't allowed to crawl and index on your website. By using the robots.txt file properly, we can control how our websites are crawled and indexed.
All major search engines check whether a robots.txt file is present and adhere to any special rules you set for them; spam bots and the like may ignore the file.
Benefits of using a robots.txt file
- Can prevent duplicate content from being indexed
- Saves bandwidth and keeps logs clean
- Prevents the server from repeatedly returning 404 Not Found errors when search engines request a robots.txt file on each visit
- Prevents system folders and irrelevant files from showing up in Google and other search engines
- Can prevent confidential documents or images from being indexed
- Focuses crawler attention on the pages you do want indexed
- Can make restricted pages of your website harder to find via search engines
How to create and upload a robots.txt file
- Create a blank document called robots in Notepad or any plain-text editor and save it with a .txt extension (robots.txt). Avoid rich-text formats such as Word documents, which add hidden formatting.
- Write or copy in the code you want, and save.
- Upload the robots.txt file to the root directory of your website via FTP
(usually this is the public_html or www folder) and you're done.
Allowing Robots Access to Everything
User-agent: *
Disallow:
This instructs search engines and robots that they can crawl and index all content.
Denying Robots Access to Everything
User-agent: *
Disallow: /
This instructs search engines and robots that they may not crawl or index any content.
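If you want to double-check these rules before uploading, Python's built-in urllib.robotparser can evaluate them locally. This is just a verification aid, not something that goes in the file itself; the bot name and URL are placeholders.

```python
from urllib import robotparser

# Parse the "allow everything" rules (empty Disallow)
allow_all = robotparser.RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

# Parse the "deny everything" rules (Disallow: /)
deny_all = robotparser.RobotFileParser()
deny_all.parse(["User-agent: *", "Disallow: /"])

print(allow_all.can_fetch("AnyBot", "http://example.com/page.html"))  # True
print(deny_all.can_fetch("AnyBot", "http://example.com/page.html"))   # False
```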
Basic Template for a simple robots file
User-agent: *
Disallow: /cgi-bin/
Disallow: /*.js$
Disallow: /*.css$

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
How Robot file rules work
User-agent: *
User-agent essentially means "the following rules apply to search engines and robots called:", and * matches anything, so in this case we are targeting all user agents.
Disallow: /cgi-bin/
Disallow: /*.js$
Disallow: /*.css$
By default, everything will be accessible to the robots that crawl your website, so we generally only need to specify where they don’t need to go. In most cases you don’t want CSS, JavaScript or system files indexed and showing up in Google.
The Disallow parameter is used to indicate locations that may not be accessed.
A Disallow value with a slash on both ends (such as /cgi-bin/) means that folder, all of its subfolders, and any URL containing that path segment may not be crawled.
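You can confirm this prefix behaviour with a quick local check using Python's urllib.robotparser (example.com stands in for your own domain):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /cgi-bin/"])

# The folder itself and everything beneath it is blocked...
print(rp.can_fetch("AnyBot", "http://example.com/cgi-bin/form.pl"))  # False
# ...while the rest of the site stays crawlable.
print(rp.can_fetch("AnyBot", "http://example.com/about.html"))       # True
```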
Disallow: /*.css$ means that any file with that extension may not be crawled. This works for any file type; for example, you could prevent confidential PDFs (.pdf) or images (.jpg, .gif) from being accessed by Google. Note that the * and $ wildcards are extensions honoured by Google, Bing and other major crawlers, not part of the original robots.txt standard.
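Python's standard-library robotparser treats these wildcard patterns literally, so as a rough illustration of how Google-style matching works, here is a hypothetical helper that translates such a pattern into a regular expression (a sketch, not Google's actual implementation):

```python
import re

def wildcard_match(pattern: str, path: str) -> bool:
    """Sketch of Google-style robots.txt wildcard matching:
    '*' matches any run of characters, '$' anchors the end of the URL."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"        # wildcard: any characters
        elif ch == "$":
            regex += "$"         # anchor: URL must end here
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

print(wildcard_match("/*.css$", "/styles/main.css"))      # True
print(wildcard_match("/*.css$", "/styles/main.css?v=2"))  # False
```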
# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*
Using # at the start of a line is called commenting; it helps us humans make notes to remember what we did, and tells robots to ignore the line.
Here we target the Googlebot-Image user agent specifically, letting it know that it may access and index all images with Allow: /* and that nothing is specifically forbidden by leaving Disallow blank. Note that not all robots support Allow.
The reason we target the image bot and AdSense bot after targeting all user agents is to make sure an earlier rule isn't blocking them. For example, you might block everything in the /news/ section of your website from being indexed (text, files, images) but still want the images to be indexed and AdSense to keep working.
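This grouping behaviour can also be checked locally with urllib.robotparser; the /news/ rule below is just an illustrative example:

```python
from urllib import robotparser

rules = [
    "User-agent: *",
    "Disallow: /news/",
    "",
    "User-agent: Googlebot-Image",
    "Disallow:",
    "Allow: /",
]
rp = robotparser.RobotFileParser()
rp.parse(rules)

# The image bot's own group overrides the catch-all group...
print(rp.can_fetch("Googlebot-Image", "http://example.com/news/photo.jpg"))  # True
# ...while other bots still obey the general /news/ block.
print(rp.can_fetch("SomeOtherBot", "http://example.com/news/article.html"))  # False
```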
Referencing a sitemap in your simple robots file
Sitemap: http://www.put-your-domain-name-here.com/sitemap.xml
If you have a sitemap you can inform the search engines where it is here, it’s generally a good idea to keep your sitemap in the root directory of your website. [Create a sitemap]
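If you use Python 3.8 or later, urllib.robotparser can read the Sitemap line back out, which is a handy way to check that the file parses as expected (the domain below is a placeholder):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",
    "Sitemap: http://www.example.com/sitemap.xml",
])

# site_maps() returns the list of declared sitemap URLs (Python 3.8+)
print(rp.site_maps())  # ['http://www.example.com/sitemap.xml']
```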
Denying access to search result pages
Disallow: /*?*
Disallow: /*?
By preventing URLs that contain question marks from being indexed, we can stop search result pages from being indexed and let the robots focus on your content and products.
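Again, Python's urllib.robotparser does not understand these wildcard rules, so as a rough illustration, here is how a Google-style crawler would classify some hypothetical URLs under Disallow: /*? (the paths are made up for the example):

```python
import re

# Google-style reading of "Disallow: /*?": any URL whose path-plus-query
# contains a question mark (i.e. carries a query string) is blocked.
blocked = re.compile(r"/.*\?")

urls = [
    "/search?q=widgets",      # blocked: search result page
    "/products?page=2",       # blocked: paginated listing
    "/products/widget.html",  # allowed: plain content page
]
for url in urls:
    print(url, "blocked" if blocked.match(url) else "allowed")
```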
Content Management Systems And Robots
If you're running a recent version of WordPress, you can use plugins to generate your robots.txt and sitemap files and automate SEO to an extent.
Other content management systems may also have plugins or addons that can help generate or maintain robots files, however because of the way each CMS is structured it’s not possible to make a robots file that can cater for everything.
Common Mistakes & Things To Remember
- The robots file is case sensitive, and the file name is all lowercase: robots.txt
- Keep to one rule per line
- Old rules may be overwritten by newer rules declared after them
- Anyone may view your robots.txt file, so be careful of what you include
Robots.txt Downloads
- Blank robots.txt template
- Example robots.txt file
Are there any other great robots.txt rules that you use on your websites?