Bad Bots! Why Your Website Should Have Protection Against Them
Bad Bots! But Not The Russian Sort
If you’ve paid attention to the media and politics recently, you’ve probably heard allegations about bots, Russian bots specifically, that apparently exist to influence US (and now the upcoming Canadian) elections. We have no opinion on those bots, which may affect Twitter and Facebook, and to be honest, we couldn’t care less. But we are concerned about a different kind of bot, one we think is genuinely bad, and if you have a website, you might want to give these bots some attention as well, namely by outright blocking as many of them as you can.
What Are Good Vs. Bad Website Bots?
Search Engines – The Welcome (Usually) Bots
Okay, so what are these bots that we are talking about? There are actually some “good” bots that we welcome to our websites: the crawlers run by the main search engines (Google, Bing, DuckDuckGo, Yandex) that come along and go through our web pages. What they are doing is reading the content on websites and then, according to their own algorithms, figuring out what your website’s pages are about and how to rank them in their search results.
These crawlers come to your website and generally crawl your site’s pages at a rate that does not impose heavy bandwidth use at any one time. And usually, we want them to visit and completely index our public-facing website pages. There are some situations when we don’t want them to visit and/or index pages (when a site is under development, for example), and they generally “obey” directives that we place in a file called “robots.txt” in the website’s home directory, or in a web page’s “robots” meta tag.
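To give a quick, hedged illustration of those directives, a minimal robots.txt might look something like the following (the “/staging/” path is purely a hypothetical example of an area you would not want crawled):

    # robots.txt - placed in the website's home (root) directory
    # Ask all crawlers to stay out of a hypothetical /staging/ area
    User-agent: *
    Disallow: /staging/

The per-page equivalent is a “robots” meta tag placed in the page’s head section, for example:

    <!-- Ask well-behaved crawlers not to index this page or follow its links -->
    <meta name="robots" content="noindex, nofollow">

Keep in mind that these are polite requests only: the good bots honour them, while many of the bad bots described below simply ignore them.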
Nasty Bots – The Ones We Don’t Want
The internet has thousands of bots that go out and try to find content on websites, but some go even further. Some are very bad and actively look for websites and pages that have vulnerable URLs, for example. We don’t want these types of bots visiting us. Then there are other types of bots that might provide a service to others (often a paid service) while using up your bandwidth and server resources. They are not paying you for the privilege of visiting your website and consuming resources you are paying for, yet they are reselling the data that they discover to others.
Some of these bots are horrible and will crawl multiple URLs at once! Some are so nasty that they have hundreds of different IPs associated with them, and often dozens of those IPs are crawling your site all at the same time.
The act of crawling your site like this is not much different than if a human were to load your website pages over and over: it requires the transfer of data, consuming bandwidth, and it requires resources on the server that your site is hosted on.
The purpose of this article is not to get into all the different types of bad bots that exist, and some people even quibble with our characterization of them as being “bad.” But many of you may, out of interest, have gone to a website like Ahrefs or Majestic to view the backlink profile of a competitor’s website, or even your own.
How do Ahrefs and Majestic obtain this data in the first place?
By sending out bots to crawl websites. They crawl as many websites as they can, storing data about links and anchor text, in order to sell that data to others who will pay for it. But they never pay you any fee for the bandwidth and server resources they use up collecting it. So why let them have this data? We see no good reason for it, and in fact see many good reasons to just block them completely.
What Bad Bots Can Cause
Bots Can Cause Site Speed Issues
As mentioned above, when a bot comes to visit your site and begins to “crawl” its URLs (the individual website pages), it causes essentially the same bandwidth and server resource use as a human visitor would. One could argue that bots only download the “text” of pages and don’t cause any transfer of image files, but this is not always the case at all. In addition, most sites today are database driven, so in order to actually serve the content at a URL the database server is also forced to do work, and CPU and RAM are consumed.
What this means is that if you have bots crawling your website, they can slow your website down considerably for the real visitors that you want to be visiting your site! Who wants that?
(By the way, if interested, you can read our in-depth discussion of site speed issues here).
Bad Bots Can Facilitate Your Website Being Hacked
Security is a very important part of what we do here, and ensuring updates are applied as soon as possible after they are released is a high priority. We don’t recommend “automatic” updates for a variety of reasons, which you can read about here – Dangers Of WordPress Auto Updates – and even if your site is “auto-updated,” there can still be a period of time during which your site is running a vulnerable version of WordPress Core (or some plugins) before it actually gets updated.
Many bad bots are coded specifically to crawl websites looking for vulnerable versions of software, including URLs that might be exploitable. As soon as the bot discovers a potential vulnerability, the action might begin: attempts to take advantage of the vulnerability in some way, for whatever purpose(s) the bot writer has in mind.
So obviously, this is not something we want visiting our websites! And we’re sure you don’t want it either!
So How Do You Block These Bad Bots In The First Place?
Actually, many hosting companies have their own methods of blocking some types of malicious traffic in the first place, but these are not always effective. One method that you can use (let your web designer/maintainer know about this and send them the link to this article – https://ianscottgroup.com/bad-bots-why-your-website-should-have-protection-against-them/) is the .htaccess file in your website’s root directory. Again, we’re not going to discuss everything an .htaccess file does, but one of its abilities is to act as a kind of “firewall” for your website, refusing traffic based on a number of criteria including IP address and other “footprints.”
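As a rough sketch of what this can look like (not a complete blocklist, and assuming your host runs Apache with mod_rewrite enabled), a few lines like the following would return a “403 Forbidden” to crawlers whose user-agent strings match AhrefsBot or MJ12bot (Majestic’s crawler):

    # Refuse requests from a couple of example bad-bot user agents.
    # AhrefsBot and MJ12bot are shown only as illustrations; a real
    # blocklist contains hundreds of entries.
    <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|MJ12bot) [NC]
        RewriteRule .* - [F,L]
    </IfModule>

The [NC] flag makes the match case-insensitive, and [F] tells the server to refuse the request outright rather than serve the page.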
There are a number of security experts who work very hard and then share and volunteer information on the footprints of bad bots, so we can all work together to keep website traffic clean and discourage these things from visiting our sites. One great list that is already “coded” correctly for an .htaccess file, and is regularly updated, is this one (click here).
If you are comfortable doing so, you can simply copy what is there and paste it into your own .htaccess file (be sure to back the file up before you change it) and then save it. Of course, like anything else, you should test and monitor your website and its performance after making any changes.
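One simple way to test that a block is working (a hypothetical example; substitute your own domain and one of the user-agent strings from the list you pasted in) is to request a page with curl while pretending to be a blocked bot, and confirm you get a 403 while a normal request still succeeds:

    # Pretend to be AhrefsBot - should now return "403 Forbidden"
    curl -I -A "AhrefsBot" https://www.example.com/

    # A normal request - should still return "200 OK"
    curl -I https://www.example.com/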
By blocking bad bots, you’ll be helping to reduce unnecessary website traffic, and, as a big plus to you, doing more to keep your website fast!
“I’m Uncomfortable Doing This Myself”
That’s fine – we can help you out. Even many of today’s website “designers” have no clue about these things and don’t know what can be accomplished with the .htaccess file. But we’ve been doing this since 1997, and we can help you speed up your website while ensuring security and reliability. Contact us now.
Thanks for this! I see all these bots in my AWStats and I can see that combined they are taking a lot of bandwidth. So this is helpful.
One question: is there a danger of blocking Google and Bing with the code you linked to? I looked at it all quickly, but there is so much. Thanks in advance for the answer and advice.
Hi Jim,
Sorry for the late reply. Been busy around here!
Great question. No, there is no danger of blocking major search engines such as Google and Bing with the code we linked to.