Site and robots
Your Salesforce Site is for Customers, not for bots!
You have created a pretty Salesforce Site to support your business, and you want to keep it cost effective. The platform has limits in terms of CPU time, bandwidth, and page views per day; if you reach them, you will probably have to pay to raise them (for instance by moving from an Enterprise Edition to an Unlimited Edition).
That resource consumption should be useful, meaning it should serve your expected visitors. Did you know that most visits are not real human visits?
The web is not a beautiful place populated only by friendly people. A large part of its traffic comes from machines: "bots" (robots) that download pages for good or bad reasons, and every page they fetch from your site consumes part of your available resources. The real issue is the ratio between humans and bots: on a website with no protection in place, you will get more traffic from bots than from humans.
How do you optimize resources? The first step is to prevent crawling by bad bots. Of course, you will have to decide which bots are "good" and which are "bad". For instance, Google, Bing, Baidu and a few others crawl the web so that you appear in search results; don't block them, since they bring you real visitors. On the other side, some bots crawl the web to collect content that will be resold: they consume your resources and you get nothing in return, so stop them. There are even bots that harvest email addresses from your pages, or probe for security issues (such as a form that is not protected by a captcha). You absolutely need to block those.
The quick win is that Salesforce provides a simple, standard way to tell bots they are not welcome: a file called "robots.txt" (the file is shared by all your Salesforce Sites). You just have to list the bots and the access rules that apply to them. Keep in mind that truly bad bots don't read this file, so it will not block them.
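To give an idea of the syntax (a sketch only: the bot name "BadBot" and the paths are purely illustrative, build your real list from your own traffic logs), each group starts with a User-agent line naming a bot, followed by the Disallow rules that apply to it:

# Well-behaved bots: crawl everything except a private path (example path)
User-agent: *
Disallow: /private/

# One specific crawler blocked entirely (illustrative name)
User-agent: BadBot
Disallow: /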
By default, Salesforce blocks all bots for non-production orgs (Developer Edition, etc.), but you absolutely need to define a robots.txt for your production org. The syntax is quite simple; the hard part is the content: how can you know which robots to put in the file? The following content is a Visualforce page that you add to your org and then reference in your Salesforce Site configuration, and voila! Taking five minutes to do this can save a lot of money.
<apex:page contentType="text/plain" showHeader="false">
User-agent: *
Disallow: /
</apex:page>
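If you want to be more selective than blocking everyone, a common pattern is to whitelist the search engines you care about and disallow everything else. The sketch below assumes Googlebot, Bingbot and Baiduspider are the crawlers you want to keep; adjust the list to your own audience:

<apex:page contentType="text/plain" showHeader="false">
# Search engines we want: an empty Disallow means no restriction
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: Baiduspider
Disallow:

# Everyone else: keep out
User-agent: *
Disallow: /
</apex:page>

The whitelist approach keeps the file short: any new or unknown bot falls into the final catch-all group and is disallowed by default.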