TipsAndTreats.com

Robots Text File Tips

The robots.txt file implements the robots exclusion standard. Web crawlers/robots read it to learn which files and directories you want them to stay OUT of on your site. Compliance is voluntary: not all crawlers/bots follow the exclusion standard, and some will continue crawling your site anyway ("Bad Bots"). We block them by IP exclusion.
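The article does not say how the IP exclusion is set up, and it is usually done in the web server or firewall configuration. Purely as an illustration of the idea, here is a minimal Python sketch. The `BLOCKED_NETWORKS` list and the `is_blocked` helper are hypothetical names, and the addresses are reserved documentation ranges, not real bot addresses.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# standing in for whatever addresses your own logs point to.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole range
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.44", "198.51.100.17", "203.0.113.9"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

A check like this would run before a request is served, so a bot that ignores robots.txt never reaches your pages.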

This is a very simple overview of robots.txt basics for webmasters. For a complete and thorough lesson, visit Robotstxt.org.

Last Updated - 12th November 2005

To see the proper format for a fairly standard robots.txt file, look directly below. That file should be at the root of the domain, because that is where the crawlers expect it to be, not in some secondary directory.

Below is the proper format for a robots.txt file ----->

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /

--------> End of robots.txt file

This tiny text file is saved as a plain text document and ALWAYS with the name "robots.txt" in the root of your domain.

A quick review of the information in the robots.txt file above follows. "User-agent: msnbot" is the crawler from MSN, Slurp is from Yahoo and Teoma is from Ask Jeeves. The others listed are "Bad" bots that crawl very fast and to nobody's benefit but their own, so we ask them to stay out entirely. The asterisk (*) is a wildcard that means "all" crawlers/spiders/bots should stay out of the files or directories listed in that group.

Bots given the instruction "Disallow: /" should stay out entirely. Those given "Crawl-delay: 10" are the ones that crawled our site too quickly, causing it to bog down and overuse the server resources. Google crawls more slowly than the others and doesn't require that instruction, so it is not specifically listed in the above robots.txt file. The Crawl-delay instruction is only needed on very large sites with hundreds or thousands of pages. The wildcard asterisk (*) group applies to every crawler, bot and spider that has no section of its own, including Googlebot. A bot that does have its own named section follows that section instead of the * section, so repeat any Disallow lines there if you want them to apply to that bot too.
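If you want to check how a rule-following crawler would read the file above, Python's standard `urllib.robotparser` module can parse it. This is a sketch, not a guarantee of how any particular search engine behaves. The paths `/cgi-bin/script.pl` and `/index.html` are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this article, copied as-is.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    # Googlebot has no section of its own, so the * group applies to it.
    print(parser.can_fetch("Googlebot", "/cgi-bin/script.pl"))  # False
    print(parser.can_fetch("Googlebot", "/index.html"))         # True
    # aipbot is asked to stay out entirely.
    print(parser.can_fetch("aipbot", "/index.html"))            # False
    # msnbot has its own section containing only a crawl delay.
    print(parser.crawl_delay("msnbot"))                         # 10
    print(parser.can_fetch("msnbot", "/cgi-bin/script.pl"))     # True
```

The last line is worth noticing: according to this parser, msnbot's own section replaces the wildcard group rather than adding to it.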

The bots we gave the "Crawl-delay: 10" instruction were requesting as many as 7 pages every second, so we asked them to slow down. The number is in seconds, and you can change it to suit your server capacity, based on their crawling rate. Ten seconds between page requests is far more leisurely and stops them from asking for more pages than your server can dish up.
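To put those two rates side by side, here is a rough back-of-the-envelope calculation. It assumes the bot keeps up its rate around the clock (real crawlers rarely do) and that it honors Crawl-delay at all. The helper name `max_requests_per_day` is our own.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def max_requests_per_day(crawl_delay_seconds: int) -> int:
    """Upper bound on daily requests when one request is allowed per delay."""
    return SECONDS_PER_DAY // crawl_delay_seconds


# A bot sustaining 7 pages every second, all day long.
unthrottled_per_day = 7 * SECONDS_PER_DAY

# The same bot limited to one page every 10 seconds.
throttled_per_day = max_requests_per_day(10)

if __name__ == "__main__":
    print(unthrottled_per_day)                       # 604800
    print(throttled_per_day)                         # 8640
    print(unthrottled_per_day // throttled_per_day)  # 70
```

Under these assumptions, a ten second delay cuts the worst-case load from that bot by a factor of 70.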

(You can discover how fast robots and spiders are crawling by looking at your raw server logs, which show the pages requested by precise times, to within a hundredth of a second. The logs are available from your web host, or ask your web or IT person. If you have server access, the logs can be found in the root directory, and you can usually download compressed server log files by calendar day right off your server. You'll need a utility that can expand compressed files to open and read those plain text raw server log files.)
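Reading logs by eye gets tedious, so here is a small Python sketch that counts the most requests each user agent made in any single second. It assumes your logs are in the common "combined" format with one-second timestamps, which may not match your server. The sample lines, the "ExampleBot" agent and the helper names are invented for illustration.

```python
import gzip
import re
from collections import Counter

# Matches the timestamp, request and user agent in a combined-format line.
LOG_LINE = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def read_log(path):
    """Yield lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle


def peak_requests_per_second(lines):
    """Map each user agent to the most requests it made in any one second."""
    per_second = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            per_second[(match["agent"], match["timestamp"])] += 1
    peaks = {}
    for (agent, _timestamp), count in per_second.items():
        peaks[agent] = max(peaks.get(agent, 0), count)
    return peaks


# Invented sample lines standing in for a real log file.
SAMPLE = [
    '203.0.113.9 - - [12/Nov/2005:10:15:32 +0000] "GET /a.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [12/Nov/2005:10:15:32 +0000] "GET /b.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [12/Nov/2005:10:15:33 +0000] "GET /c.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [12/Nov/2005:10:15:32 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(peak_requests_per_second(SAMPLE))
```

For a real file you would pass `read_log("access.log.gz")` instead of `SAMPLE`, where the file name is whatever your host uses.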

To see the contents of any robots.txt file, just type /robots.txt after any domain name in your browser's address bar. If the site has that file up, you will see it displayed as a text file in your web browser. Click on the link below to see that file for Amazon.com:

http://www.Amazon.com/robots.txt

You can see the contents of any website robots.txt file that way.
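The same lookup can be done from a script. The sketch below builds the robots.txt address for a site and, if you call `fetch_robots`, downloads the file. Both function names are our own, and the example assumes plain http when no scheme is given.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site: str) -> str:
    """Return the robots.txt URL for a bare domain or any page URL on it."""
    if "://" not in site:
        site = "http://" + site  # assumption: default to plain http
    parts = urlsplit(site)
    # robots.txt always lives at the root, so drop any path or query.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site: str, timeout: float = 10.0) -> str:
    """Download a site's robots.txt and return it as text (needs network)."""
    with urlopen(robots_url(site), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(robots_url("www.Amazon.com"))
    print(robots_url("http://www.Amazon.com/some/page.html?x=1"))
```

Both calls print http://www.Amazon.com/robots.txt, the link shown above.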

Most webmasters instruct the bots to stay out of "image" directories and the "cgi-bin" directory, as well as any directories containing private or proprietary files intended only for users of an intranet or password protected sections of the site. Clearly, you should direct the bots to stay out of any private areas that you don't want indexed by the search engines.
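If you keep a list of directories you want crawlers to skip, a few lines of Python can turn it into robots.txt text. This is only a sketch: `build_robots_txt` is a hypothetical helper, and the "/intranet/" directory is an invented example of a private area.

```python
def build_robots_txt(groups):
    """Render (user_agent, directives) pairs as robots.txt text.

    Each directive is a (field, value) tuple such as ("Disallow", "/cgi-bin/").
    """
    blocks = []
    for user_agent, directives in groups:
        lines = [f"User-agent: {user_agent}"]
        lines += [f"{field}: {value}" for field, value in directives]
        blocks.append("\n".join(lines))
    # Groups are separated by a blank line, as in the example file.
    return "\n\n".join(blocks) + "\n"


# "/intranet/" is a made-up private directory for illustration.
PRIVATE_DIRS = ["/cgi-bin/", "/images/", "/intranet/"]

robots_text = build_robots_txt(
    [("*", [("Disallow", directory) for directory in PRIVATE_DIRS])]
)

if __name__ == "__main__":
    print(robots_text, end="")
```

Save the printed text as a plain text file named robots.txt in the root of your domain.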

The importance of robots.txt is rarely discussed by average webmasters. This should be standard knowledge for webmasters at substantial companies.

The search engine spiders really do want your guidance, and this tiny text file is the best way to give crawlers and bots a clear signpost: one that warns off trespassers and protects private property, and that warmly welcomes invited guests, such as the big three search engines, while asking them nicely to stay out of private areas.



Disclaimer: The Robots Text File Tips / Information presented and opinions expressed herein are those of the authors and do not necessarily represent the views of TipsAndTreats.com and/or its partners.

Copyright © TipsAndTreats.com. All rights reserved.
Reproduction in whole or in part, in any form or medium, without written permission is prohibited.
© Tips and Treats. An information-based website (2005-2012)