How to create a simple robots.txt file for your website

A robots.txt file tells search engines which parts of your website they are and aren’t allowed to crawl. By using the robots.txt file properly, we can control the way our websites are crawled and, in turn, what gets indexed.

All major search engines check whether a robots.txt file is present and adhere to any special rules you have set for them. Spam bots and the like may simply ignore this file.

Benefits of using a robots.txt file

Can prevent duplicate content from being indexed
Saves bandwidth and keeps logs clean
Stops the server from sending a 404 (file not found) error every time a search engine requests the robots.txt file
Can prevent system folders and irrelevant files from showing up in Google and other search engines
Can prevent confidential documents or images from being indexed
Lets robots spend their visits on the pages that you do want indexed
Can prevent users from finding restricted pages of your website via search engines

How to create and upload a robots.txt file

Create a blank document called robots in Notepad or another text editor and save it as plain text (*.txt). Most word processing programs can also save as plain text.
Write or copy in the rules you want, and save.
Upload the robots.txt file to the root directory of your website via FTP (usually this is the public_html or www folder) and you’re done.
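
If you would rather generate the file with a script than with a text editor, the steps above can be sketched in Python. This is only a minimal sketch: the function name write_robots_txt and the RULES constant are made up for illustration, and uploading the result to your server is still up to you.

```python
from pathlib import Path

# Example rules (an assumption for this sketch): allow every robot everywhere.
RULES = "User-agent: *\nDisallow:\n"


def write_robots_txt(folder: str, content: str) -> Path:
    """Save content as a plain-text file named robots.txt (all lowercase) in folder."""
    target = Path(folder) / "robots.txt"
    # newline="\n" keeps the line endings exactly as written in content.
    with open(target, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
    return target
```

Calling write_robots_txt("public_html", RULES) would write the file into a local public_html folder, assuming that folder already exists.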

Allowing Robots Access to Everything

User-agent: *
Disallow:

This instructs search engines and robots that they can crawl and index all content.

Denying Robots Access to Everything

User-agent: *
Disallow: /

This instructs search engines and robots that they may not crawl or index any content.
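
One way to check the allow-everything and deny-everything files before uploading them is Python's built-in urllib.robotparser module. The sketch below is an illustration only: the helper can_crawl and the robot name ExampleBot are made up, and real search engines may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(can_crawl(ALLOW_ALL, "/about.html"))  # True
print(can_crawl(DENY_ALL, "/about.html"))   # False
```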

Basic Template for a simple robots file

User-agent: *
Disallow: /cgi-bin/
Disallow: /*.js$
Disallow: /*.css$
# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*
# Google AdSense
User-agent: Mediapartners-Google
Disallow:
Allow: /*

How Robot file rules work

User-agent: *

User-agent is basically a synonym for "Search engines and robots called:", and * means anything, so in this case we are targeting all User-agents.

Disallow: /cgi-bin/
Disallow: /*.js$
Disallow: /*.css$

By default, everything will be accessible to the robots that crawl your website, so we generally only need to specify where they don’t need to go. In most cases you don’t want CSS, JavaScript or system files indexed and showing up in Google.

The Disallow parameter is used to indicate locations that may not be accessed.

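A Disallow path with a slash on both ends, such as /cgi-bin/, means that the folder and all of its subfolders and files may not be crawled. The rule applies to any URL path that begins with it.

Disallow: /*.css$ means that any file with that extension may not be crawled. The * matches any run of characters and the $ marks the end of the URL. The same pattern can be used for any file type: for example, you could prevent confidential PDFs (.pdf) or images (.jpeg, .gif) from being accessed by Google. Note that not all robots support these wildcards.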
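
To see how these patterns behave, here is a rough Python sketch of the wildcard matching described above (the style Google documents: * matches anything, $ marks the end of the URL, and everything else is a prefix match). The function names rule_to_regex and is_blocked are made up for this example, and a real crawler's matching may differ in the details.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a robots.txt path rule into a compiled regular expression."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    # Without '$' the rule is a prefix match, so the end is left open.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    """Return True if the Disallow rule matches the URL path."""
    return rule_to_regex(rule).match(path) is not None


print(is_blocked("/cgi-bin/", "/cgi-bin/script.pl"))    # True
print(is_blocked("/cgi-bin/", "/blog/cgi-bin/"))        # False
print(is_blocked("/*.css$", "/styles/main.css"))        # True
print(is_blocked("/*.css$", "/styles/main.css?v=2"))    # False
```

The last line shows why the $ matters: a stylesheet requested with a query string no longer ends in .css, so the rule does not apply to it.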

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

Starting a line with # is called commenting. It helps us humans make notes to remember what the hell we did, and tells robots to ignore the line.

Here we are targeting the Googlebot-Image User-agent specifically. Allow: /* lets it know that it may access and index all images, and leaving Disallow blank says that nothing is specifically forbidden. Note that not all robots support Allow.

The reason we specifically target the image bot and AdSense bot after we target all User-agents is to make sure that they are not blocked by a previous rule. For example, you could prevent everything in the /news/ section of your website (text, files, images) from being indexed, but still want images to be indexed and AdSense to keep working.
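
The /news/ example can be tried out with Python's built-in urllib.robotparser. Treat this as a sketch under assumptions: ExampleBot is a made-up robot name standing in for "everyone else", and this parser is not guaranteed to behave exactly like Google's.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /news/

# Google Image
User-agent: Googlebot-Image
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A robot with no group of its own falls back to the * rules.
print(parser.can_fetch("ExampleBot", "/news/photo.jpg"))       # False
# Googlebot-Image has its own group, so the /news/ rule does not apply to it.
print(parser.can_fetch("Googlebot-Image", "/news/photo.jpg"))  # True
```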

Referencing a sitemap in your simple robots file

Sitemap: http://www.put-your-domain-name-here.com/sitemap.xml

If you have a sitemap, you can tell the search engines where it is with this line. It’s generally a good idea to keep your sitemap in the root directory of your website.
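
To confirm that a Sitemap line is being picked up, Python's urllib.robotparser can list the sitemaps it finds. This small sketch assumes Python 3.8 or newer, which is when the site_maps() method was added, and reuses the placeholder domain from above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.put-your-domain-name-here.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file has none.
print(parser.site_maps())
```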

Denying access to search result pages

Disallow: /*?*
Disallow: /*?

By preventing URLs that contain question marks from being crawled, we can keep search result pages out of the index and let the robots focus on your content/products.
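
For robots that treat rules as prefix matches with * as a wildcard, the two lines above should block the same URLs, because anything after the question mark is matched either way. The Python sketch below illustrates this with a simplified matcher. The function name matches and the sample URLs are made up, and the $ end marker is deliberately left out.

```python
import re


def matches(rule: str, url_path: str) -> bool:
    """Prefix-match url_path against rule, treating '*' as a wildcard."""
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.match(pattern, url_path) is not None


URLS = ["/search?q=robots", "/shop?page=2&sort=price", "/?s=sitemap", "/products/shoes"]

for url in URLS:
    # Both rules give the same answer for every URL in the list.
    print(url, matches("/*?", url), matches("/*?*", url))
```

Only /products/shoes, which has no question mark, is left open to crawling by either rule.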

Content Management Systems And Robots

If you’re running a recent version of WordPress, there are fantastic plugins that can generate your robots and sitemap files and automate SEO to an extent.

Other content management systems may also have plugins or add-ons that can help generate or maintain robots files. However, because each CMS is structured differently, it’s not possible to make one robots file that caters for everything.

Common Mistakes & Things To Remember

Paths in the robots file are case sensitive, and the file name is all lowercase: robots.txt
Keep to one rule per line
When rules conflict, the winner depends on the robot: some use the first rule that matches, while Google uses the most specific (longest) matching path
Anyone may view your robots.txt file, so be careful of what you include
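
The case sensitivity point is easy to trip over, so here is a quick check using Python's built-in urllib.robotparser. The /private/ folder and the ExampleBot name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /private/"])

# The rule only covers the exact spelling used in the file.
print(parser.can_fetch("ExampleBot", "/private/report.html"))  # False
print(parser.can_fetch("ExampleBot", "/Private/report.html"))  # True
```

If your server answers to both spellings, each one needs its own Disallow line.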

Are there any other great robots.txt rules that you use on your websites?

Comments

Mayank Patel

September 15, 2012 at 9:17 am #4

Thanks for such good information!

Indian restaurant

March 22, 2013 at 8:09 am #5

Wow!
I was looking for a simple way to create a “robots.txt” file for my website and was wondering how to create the “TXT” file. Now I have solved it. Thanks

Raviraja

June 26, 2013 at 8:25 am #6

Hi, I have created a robots.txt file according to your suggestion, but I have a question: how can I place it in the blog’s root directory? Kindly help me place the robots.txt file in the root directory. I created my blog on Blogger.

Thanks
Ravi

Resep masakan

October 2, 2013 at 3:31 am #7

I still haven’t figured out what I should do with robots.txt, because Google said that any changes to robots.txt will not be saved. So how do I use robots.txt if the hosting and platform are both Google’s? Thanks
