Detecting bots from User Agent

Detecting bots is often important, and one approach is to inspect the User Agent.

Keywords to find bots


Many bots identify themselves with "bot", "crawler" or the function they perform, such as "Analytics". By searching the User Agent for these keywords, you can detect a good portion of bots. Many sites define bots more specifically, but we have decided to match certain keywords more broadly. So instead of looking for googlebot or bingbot, we just check whether "bot" exists.

[snippet]$botUserAgentPattern = '/bot|crawl|slurp|spider|mediapartners|sistrix|summify|analyzer|archiver|webmon|httrack|censysinspect|zgrab|survey|cURL|http|libwww|l9tcpid|bing|google|facebook|coccoc|research|biglotron|GRequests|teoma|convera|gigablast|ptst|Cloudflare|\.com|\.org|python|WhatsApp|speedy|fluffy|bibnum\.bnf|findlink|panscient|IOI|ips-agent|expanseinc|findthatfile|ec2linkfinder|yeti|Aboundex|placid|yanga|Voyager|postrank|CyberPatrol|page2rss|linkdex|ezooms|heritrix|wget|wp_is_mobile|sogou|wotbox|ichiro|drupact|coccoc|integromedb|robot|\.infoproximic|changedetection|WeSEE:Search|SEO|Scaper|binlar|\.net|\.app|AddThis|lipperhey|Qwantify|BUbiNG|ltx71|index|ADmantX|Expanse|java|Request-Promise/i'; // Bot keywords (case-insensitive)[/snippet]
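To illustrate the broad-keyword idea, here is a minimal sketch. The helper name `matches_bot_keyword` and the shortened pattern are assumptions made for this example only (the full list above works the same way), and the User Agent strings are just typical samples.

```php
<?php
// Hypothetical helper for illustration: a shortened version of the keyword list.
function matches_bot_keyword(string $userAgent): bool
{
    // One broad keyword such as "bot" covers Googlebot, bingbot and many others.
    $broadPattern = '/bot|crawl|spider/i';
    return preg_match($broadPattern, $userAgent) === 1;
}

// Sample User Agent strings (illustrative).
$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

$results = [];
foreach ($samples as $userAgent) {
    $results[] = matches_bot_keyword($userAgent);
    echo (end($results) ? 'bot:     ' : 'not bot: ') . $userAgent . PHP_EOL;
}
```

Both crawler samples match the single keyword "bot" without being listed by name, while the browser sample does not match.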

Using it in PHP


Below is an example of how it can be implemented. We have also included a check for an empty user agent or Accept-Language header, since bots often leave these empty.

[snippet]function bot_detected() {

    // User agent (falls back to an empty string if the header is missing)
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';

    // Accepted languages (falls back to an empty string if the header is missing)
    $accept_language = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';

    // Bot keywords to look for (case-insensitive)
    $botUserAgentPattern = '/bot|crawl|slurp|spider|mediapartners|sistrix|summify|analyzer|archiver|webmon|httrack|censysinspect|zgrab|survey|cURL|http|libwww|l9tcpid|bing|google|facebook|coccoc|research|biglotron|GRequests|teoma|convera|gigablast|ptst|Cloudflare|\.com|\.org|python|WhatsApp|speedy|fluffy|bibnum\.bnf|findlink|panscient|IOI|ips-agent|expanseinc|findthatfile|ec2linkfinder|yeti|Aboundex|placid|yanga|Voyager|postrank|CyberPatrol|page2rss|linkdex|ezooms|heritrix|wget|wp_is_mobile|sogou|wotbox|ichiro|drupact|coccoc|integromedb|robot|\.infoproximic|changedetection|WeSEE:Search|SEO|Scaper|binlar|\.net|\.app|AddThis|lipperhey|Qwantify|BUbiNG|ltx71|index|ADmantX|Expanse|java|Request-Promise/i';

    // A keyword match, an empty user agent or an empty Accept-Language
    // header is treated as a bot; anything else is (most likely) not a bot
    return preg_match($botUserAgentPattern, $user_agent) === 1
        || empty($user_agent)
        || empty($accept_language);
}[/snippet]

Now you can easily call this function wherever you need it in your code.

[snippet]// Check if not bot
if (bot_detected() === false) {
    // User is not a bot
}

// Check if user is bot
if (bot_detected() === true) {
    // User is a bot
}
[/snippet]

Bad bots


Please note that many bad bots will try to hide that they are bots, so the User Agent alone is not enough to detect this kind of bot. The User Agent field is easy to manipulate, so a client can appear to be something other than what it actually is.

For good bots and services, you can detect most of them through the User Agent.
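As a rough illustration of why User Agent checks miss bad bots, the sketch below compares an honest script with one that copies a browser's headers. The helper `looks_like_bot` and its shortened pattern are assumptions for this example, not the full function from earlier in the article.

```php
<?php
// Hypothetical helper for illustration; the pattern is a shortened assumption,
// not the full keyword list.
function looks_like_bot(string $userAgent, string $acceptLanguage): bool
{
    $pattern = '/bot|crawl|spider|python|curl|wget/i';
    return preg_match($pattern, $userAgent) === 1
        || $userAgent === ''
        || $acceptLanguage === '';
}

// An honest script announces itself and sends no Accept-Language header.
$honest = looks_like_bot('python-requests/2.31.0', '');

// The same script can send browser-like headers instead and slip through.
$spoofed = looks_like_bot(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'en-US,en;q=0.9'
);

var_dump($honest);  // bool(true)
var_dump($spoofed); // bool(false)
```

The second call returns false even though it could come from the same script, which is why keyword matching works best for bots that identify themselves.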



Tags: #DetectingBots #UserAgent
