Fighting unwanted bots

November 1, 2014 at 12:49 am

Many webmasters have no idea how many bots visit their websites.
It’s not only the bots of popular search engines, but thousands of automated crawlers browsing your sites and content for many different reasons.
Most often they just eat your bandwidth, sometimes crawling through multiple pages within one second.

For every real visit there can be as many as 20 bots spidering your site.
Banning them by IP is just a waste of time. First, most webmasters are not able to catch such IPs; second, bots change IPs constantly.

One of many solutions is to block unwanted bots by hostname.
It’s pretty easy to look up the visitor’s hostname and, if it matches a known bot, send the bot to some other location.

We keep a list of bot hostnames in an array, listed one by one and separated by commas. Then we compare the current visitor’s hostname against that list. Each entry is matched as a substring, so a short entry such as 'hosting' matches any hostname that contains it.
Here’s a PHP example with a long list of unwanted bots and code that redirects to Wikipedia’s definition of a bot.

$hosts=array('fastwebserver.de','videotron.ca','hipuu.com.py','hosting','11.com','eonet.ne.jp','cloud-ips.com','192-99-149.net','broadband.corbina.ru','ahrefs.com','webazilla.com','getisys.net','steadfastdns.net','res.bhn.net','loadtime.net','weservit.nl','keepvid','rdns.100tb.com','sysms.net','reverse.wowrack.com','ovh.net','unmetered.com','startdedicated.com','configcenter.info','ionic.pl','gameservers.net','meanpathbot.com','sistrix.net','serverloft.com','wordpress.com','digitalfiretruck','datapoint.ru','kyivstar','bise.eu','fetcher','archive.org','yandex','stratoserver.net','sitetruth.com','baidu.com','infiumhost.com','steephost.net','estpak.ee','reverse.gdsz.cncnet.net','serverloft.eu','rdns.rogmeo.com','grapeshot.co.uk','reverse.softlayer.com','rdsnet.ro','leaseweb.com','hide.services','clients.your-server.de','proxad.net','dedicatedpanel.com','amazonaws.com','codesearch','42.0.7','robot.acoon.de','exabot','buyurl','monitorengine.com','kimsufi.com','vodafone.in','seokicks.de','seomastering.com');

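// Visitor's IP address as reported by the web server.
$user_ip = $_SERVER['REMOTE_ADDR'];
// Reverse DNS lookup; gethostbyaddr() returns the IP itself when no hostname is found.
$hostname = strtolower((string) gethostbyaddr($user_ip));
foreach ($hosts as $hs) {
	// Substring match: redirect when the hostname contains a listed entry.
	if (strpos($hostname, strtolower($hs)) !== false) {
		header('Location: http://en.wikipedia.org/wiki/Internet_bot');
		die();
	}
}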

It’s best to place this PHP code at the top, before the site’s own code begins.
For dynamic scripts, a good place is the script’s PHP configuration file.
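
One possible way to arrange this, as a rough sketch only: move the check into its own file and include it at the very top of the entry script. The file name bot_filter.php and the function names is_unwanted_host() and block_unwanted_bots() are made up for this illustration; they are not part of any library.

```php
<?php
// bot_filter.php (hypothetical file name, an assumption for this sketch)

// Returns true when $hostname contains any entry from $hosts (case-insensitive).
function is_unwanted_host(string $hostname, array $hosts): bool
{
    $hostname = strtolower($hostname);
    foreach ($hosts as $fragment) {
        // Skip empty entries so they can never match every hostname.
        if ($fragment !== '' && strpos($hostname, strtolower($fragment)) !== false) {
            return true;
        }
    }
    return false;
}

// Looks up the visitor's hostname and redirects when it matches the list.
function block_unwanted_bots(array $hosts): void
{
    $ip = $_SERVER['REMOTE_ADDR'] ?? '';
    if ($ip === '') {
        return; // no visitor address, e.g. when run from the command line
    }
    $hostname = @gethostbyaddr($ip);
    if ($hostname === false) {
        return; // malformed address, nothing to compare
    }
    if (is_unwanted_host($hostname, $hosts)) {
        header('Location: http://en.wikipedia.org/wiki/Internet_bot');
        exit;
    }
}
```

The entry script would then start with require_once __DIR__ . '/bot_filter.php'; followed by block_unwanted_bots($hosts); using the $hosts array from the listing above. The call has to come before any output is sent, because header() only works while the response headers have not gone out yet.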

Of course, only a small percentage of bots can be identified by hostname.
But even with the small list of hosts provided above, thousands of bots can be blocked successfully, and there’s no limit on adding new hosts to the list.
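
Since the list tends to grow, it may be more convenient to keep it outside the code. The sketch below assumes a plain-text file with one hostname entry per line and lines starting with # treated as comments; the function name load_host_list() and that file layout are assumptions made for this example.

```php
<?php
// Loads hostname entries from a text file, one per line (hypothetical helper).
// Returns an empty array when the file cannot be read.
function load_host_list(string $path): array
{
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    $hosts = [];
    foreach ($lines as $line) {
        $line = strtolower(trim($line));
        // Ignore blank lines and comment lines starting with '#'.
        if ($line !== '' && $line[0] !== '#') {
            $hosts[] = $line;
        }
    }
    // Drop duplicates and reindex the array from zero.
    return array_values(array_unique($hosts));
}
```

With this in place, $hosts = load_host_list(__DIR__ . '/bad_hosts.txt'); could replace the hard-coded array (bad_hosts.txt is again just an example name), and a new bot is blocked by adding one line to that file.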