Website owners have far more control over how search engines see their site than they may realize. Left to their own devices, search engines, or rather their robots (crawlers), work their way through the web by following every link they can find. When a search engine’s robot arrives on your site, it will try to reach every page and file it can discover. That includes your main website, directories of images (and their subdirectories), and even data directories. It was rumored at one stage that Google’s robots had read and stored personal information about customers and employees that businesses had left on their websites.
One of the biggest problems facing search engines now is the sheer amount of information online. There are billions of pages, and probably far more images and videos. Most businesses store images on their websites, and given how poorly most businesses maintain their sites, many of those images are bound to be outdated. Yet a search engine’s robot will still crawl through them all. Search engines are struggling with the volume of data on the web, and they now recognize that crawling has to be limited somewhere. Google, for example, asks website owners who want to limit crawling to use a plain-text file called robots.txt (the name must be all lowercase), which tells robots what they may and may not crawl.
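To show what such instructions look like, here is a minimal sketch that assumes a hypothetical site at www.example.com keeping its outdated images under a hypothetical /images/old/ directory. The rules are checked with Python’s standard urllib.robotparser module, which reads robots.txt lines and answers whether a named robot may fetch a URL. Real search engines use their own parsers, so treat this as an illustration of the file format rather than of any one engine’s behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot ("*") is asked to skip the
# old image archive; the rest of the site stays crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/old/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

for url in (
    "https://www.example.com/images/old/logo-2009.png",
    "https://www.example.com/images/logo.png",
    "https://www.example.com/about/",
):
    verdict = "crawl" if parser.can_fetch("Googlebot", url) else "skip"
    print(verdict, url)
```

A Disallow value is matched as a path prefix, so everything under /images/old/ is skipped while /images/logo.png is still crawled.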
A well-optimized robots.txt file brings website owners several benefits. The first is bandwidth: the longer a search engine’s robot spends crawling your site, the more bandwidth it consumes, so limiting its scope reduces the bandwidth used.
More important are the SEO benefits of using a robots.txt file. If you use a content management system such as WordPress, the same content can be stored in several places at once: in the monthly archives, on tag pages, and on category pages, on top of the ‘original’ page. This is known as duplicate content: the same content reachable at several different addresses. A robots.txt file can tell a search engine’s robot not to crawl those supplementary copies, which reduces duplicate content and points the robot at the original page instead. Strictly speaking, robots.txt controls crawling rather than indexing, so a blocked address can still show up in search results if other pages link to it.
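As a sketch of the duplicate-content case, assume a hypothetical WordPress-style site whose tag and category archives live under /tag/ and /category/ (WordPress’s usual bases when “pretty” permalinks are enabled, but check your own site’s settings). The monthly archives are deliberately left alone here: with date-based permalinks, a prefix rule such as /2024/03/ would block the posts themselves as well as the archive page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a WordPress-style site: tag and category
# archives (duplicates of the posts) are blocked; the posts are not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = {
    "original post": "https://www.example.com/2024/03/spring-sale/",
    "tag archive": "https://www.example.com/tag/sales/",
    "category archive": "https://www.example.com/category/news/",
}
for label, url in urls.items():
    verdict = "crawl" if parser.can_fetch("Googlebot", url) else "skip"
    print(f"{label}: {verdict}")
```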
You can also use the robots.txt file to bar robots from any area of your website that you don’t want crawled. Because rules are addressed to robots by name, you can even bar a single search engine, for example Google, while allowing every other search engine in.
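A minimal sketch of barring one search engine, assuming Google’s crawler identifies itself as Googlebot and using Bingbot (Bing’s crawler) to stand in for “every other search engine”. The empty Disallow line in the catch-all group means nothing is off limits for the remaining robots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Google's crawler ("Googlebot") is barred from
# the whole site; every other robot gets an empty Disallow, which means
# "nothing is off limits".
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.example.com/products/"
for robot in ("Googlebot", "Bingbot"):
    verdict = "crawl" if parser.can_fetch(robot, page) else "skip"
    print(f"{robot}: {verdict}")
```

A robot follows the group that names it and falls back to the `*` group only when no group matches, which is why Googlebot is shut out here while Bingbot is not.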
Used wisely, a robots.txt file lets you fine-tune how your website is crawled: search engines can reach the content you want found in search results while staying away from content that is of no value to searchers. It is not a way to protect private content, though. The file itself is publicly readable and obeying it is voluntary, so sensitive material needs proper access control. Once you have created a robots.txt file, upload it to the root directory of your website, so that it is served from an address such as https://www.example.com/robots.txt; robots do not look for it in subdirectories. Reputable search engines check for it when they visit a website, and they generally follow the directions provided.
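Because robots only look for the file at the top level of a host, the address they check can be derived from any page URL. The sketch below uses a hypothetical helper, robots_txt_url, built on Python’s urllib.parse.urlsplit. It keeps the scheme, host and port and replaces the whole path, which also shows that a separate host such as shop.example.com needs its own robots.txt.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the address where robots look for robots.txt for page_url's site.

    The file is only fetched from the top level ("root") of the host,
    so a copy uploaded to a subdirectory such as /blog/ is ignored.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the whole path.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_txt_url("https://www.example.com/blog/2024/spring-sale/?ref=home"))
print(robots_txt_url("http://shop.example.com:8080/cart"))
```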