# Allow all for Googlebot
# User-agent: Googlebot
# Disallow:

# Disallow all for Googlebot-image
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Crawl-delay: 180 # Once in 180 secs
Disallow: /cgi-bin/
Disallow: cgi-bin
Disallow: /tmp/
Disallow: tmp
Disallow: _ #phpBB #wiki

# Known Search Engine bots / spiders
#
# Inktomi
# User-agent: Slurp

# WiseNut/LookSmart
User-agent: ZyBorg
Disallow: /

# Fast/AllTheWeb
User-agent: fast
Disallow: /

# OpenFind
User-agent: Openbot
Disallow: /

# Alta Vista
User-agent: Scooter

## Ban Spambots in .htaccess
#
# SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
# SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
# SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
# SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
# SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
# SetEnvIfNoCase User-Agent "^Teleport" bad_bot
# SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
# SetEnvIfNoCase User-Agent "^LinkWalker" bad_bot
# SetEnvIfNoCase User-Agent "^Zeus" bad_bot
#
#
# Order Allow,Deny
# Allow from all
# Deny from env=bad_bot
#
#
# Use robots.txt to control what you want robots.txt-compliant 'good' spiders to *fetch*;
# in other words, bandwidth control. If they are polite enough to ask, give them a polite
# robots.txt reply. I specifically said 'fetch' here, because that is all it accomplishes.
# Some search engines, including some of the majors, don't need to fetch a page to list it in
# their results; they can create a search result from links on other sites pointing to your
# page and the link text associated with those links. I find this annoying, but that leads us to...
#
# Use the on-page meta-robots tag to control what search engines *list* in their results.
# (If you mark a page as "noindex", you must still allow it to be fetched in robots.txt;
# otherwise the spider can't fetch the page and read its robots meta tag.)
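#
# For instance, a page that robots.txt lets spiders fetch but that you don't want listed
# could carry this tag inside its <head>. This is a minimal sketch showing only "noindex";
# other directives such as "nofollow" can be added to the same attribute, comma-separated.
#
# <meta name="robots" content="noindex">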
#
# Use .htaccess to stop rogue spiders that don't fetch robots.txt, or that fetch it and ignore it,
# and to ensure that good spiders don't wander into forbidden territory due to a bug in their
# code or an error in your robots.txt or on-page meta-robots tags.
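#
# The Order/Allow/Deny lines in the spambot example above are Apache 2.2 syntax; Apache 2.4
# accepts them only when mod_access_compat is loaded. A rough 2.4 equivalent using
# mod_authz_core, assuming the same bad_bot variable is set by the SetEnvIfNoCase lines and
# offered as a sketch rather than a tested config, would be:
#
# <RequireAll>
#     Require all granted
#     Require not env bad_bot
# </RequireAll>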