Detect social bots in Node Express
B

3

9

I'm trying to detect either of the following:

  • A specific list of bots (FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider)
  • Any bots that don't support the Crawlable Ajax Specification

I've seen similar questions (How to recognize Facebook User-Agent) but nothing that explains how to do this in Node and Express.

I need to do this in a format like this:

app.get("*", function(req, res){ 
  if (is one of the bots) //serve snapshot
  if (is not one of the bots) res.sendFile(__dirname + "/public/index.html");
});
Briarroot answered 25/3, 2015 at 16:51 Comment(0)
A
7

You can use the request.headers object to check whether the incoming request's User-Agent header identifies that bot. A simple example:

Node

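var http = require('http');

var server = http.createServer(function(req, res){

    // Node lowercases header names; the header may be absent, so default to ''
    var userAgent = req.headers['user-agent'] || '';

    // The crawler may append a URL after this token, so match the prefix
    if (userAgent.startsWith('facebookexternalhit/1.1')) {
        /* do something for the Facebook bot */
    }

});

server.listen(8080);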

Express

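var express = require('express');
var app = express();

app.get('/', function(req, res){

    // The header may be absent, so default to ''
    var userAgent = req.headers['user-agent'] || '';

    // The crawler may append a URL after this token, so match the prefix
    if (userAgent.startsWith('facebookexternalhit/1.1')) {
        /* do something for the Facebook bot */
    }

});

app.listen(8080);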
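
To cover the whole list from the question rather than a single bot, one option is a case-insensitive pattern test on the header. This is only a sketch: the helper name isSocialBot and the constant SOCIAL_BOT_PATTERN are made up for illustration, and the pattern simply reuses the bot names listed in the question, so check each crawler's documentation for its actual User-Agent values.

```javascript
// Hypothetical helper, not part of Node or Express.
// The pattern is built from the bot names given in the question.
var SOCIAL_BOT_PATTERN = /facebookexternalhit|linkedinbot|twitterbot|baiduspider/i;

function isSocialBot(userAgent) {
  // A missing header arrives as undefined, so fall back to an empty string.
  return SOCIAL_BOT_PATTERN.test(userAgent || '');
}

console.log(isSocialBot('facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)')); // true
console.log(isSocialBot('Mozilla/5.0 (Windows NT 10.0; Win64; x64)')); // false
```

Inside a route handler this would be called as isSocialBot(req.headers['user-agent']), serving the snapshot when it returns true and index.html otherwise.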
Ashtray answered 25/3, 2015 at 17:0 Comment(3)
OK thanks, but what are the bot-specific UAs for the list of social bots I posted? Briarroot
I'm not sure, but they're going to be the same regardless of the platform of your server. The link you provided lists one for Facebook. I'd recommend investigating that elsewhere. Ashtray
If you just google for the user agent of the bot you'll likely find results. Ashtray
A
11

You can check the User-Agent header in the request object and test its value for different bots.

As of now, Facebook says it has three types of User-Agent header values (see The Facebook Crawler), and Twitter has a User-Agent with versions (see Twitter URL Crawling & Caching). The example below should cover both bots.

Node

var http = require('http');
var server = http.createServer(function(req, res){

    // The header may be absent, so default to '' before calling string methods
    var userAgent = req.headers['user-agent'] || '';
    if (userAgent.startsWith('facebookexternalhit/1.1') ||
       userAgent === 'Facebot' ||
       userAgent.startsWith('Twitterbot')) {

        /* Do something for the bot */
    }
});

server.listen(8080);

Express

var express = require('express');
var app = express();

app.get('/', function(req, res){

    // The header may be absent, so default to '' before calling string methods
    var userAgent = req.headers['user-agent'] || '';
    if (userAgent.startsWith('facebookexternalhit/1.1') ||
       userAgent === 'Facebot' ||
       userAgent.startsWith('Twitterbot')) {

        /* Do something for the bot */
    }
});

app.listen(8080);
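
If several routes need the same check, it may be convenient to move it into a middleware function that flags the request once. The following is a rough sketch under that assumption: detectBot and the req.isBot property are illustrative names, not part of Express, and the pattern only covers the Facebook and Twitter values mentioned above. It is exercised here with a stub request object so it runs without a server.

```javascript
// Hypothetical middleware factory; detectBot and req.isBot are illustrative names.
function detectBot(botPattern) {
  return function (req, res, next) {
    // The header may be absent, so default to an empty string.
    var userAgent = req.headers['user-agent'] || '';
    // Use a pattern without the "g" flag so test() keeps no state between calls.
    req.isBot = botPattern.test(userAgent);
    next();
  };
}

// Try it out with a stub request instead of a real HTTP call.
var middleware = detectBot(/facebookexternalhit|facebot|twitterbot/i);
var fakeReq = { headers: { 'user-agent': 'Twitterbot/1.0' } };
middleware(fakeReq, {}, function () {
  console.log(fakeReq.isBot); // true
});
```

In an Express app this would be registered with app.use(detectBot(...)) before the routes, and each handler could then branch on req.isBot.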
Atwater answered 1/3, 2016 at 14:22 Comment(0)
P
2

This Node Express middleware analyzes a range of user agent strings and gives you a simple "bot==true" or "desktop==true" style check. I haven't used it, and the README sounds like it was just a trial project, so I don't know how well maintained it will be going forward, but it should detect all sorts of bots.

https://github.com/rguerreiro/express-device

Predetermine answered 19/5, 2016 at 23:41 Comment(1)
The npm package 'mobile-detect' also detects bots and is what I'm using in my projects: const MobileDetect = require('mobile-detect'); const md = new MobileDetect(req.headers['user-agent']); const isMobile = !!md.phone(); const isBot = md.is("Bot"); Predetermine
