How to know if a server is running Node.js?

I was wondering how w3techs knows whether a given server uses Node.js: http://w3techs.com/technologies/details/ws-nodejs/all/all

I'm guessing they look at specific HTTP headers.

For example: X-Powered-By: Express

But not every Node module generates such headers.
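A minimal Python sketch of the header check described above, for illustration only. The hint list is an assumption, not how w3techs works: Express's default "X-Powered-By: Express", Next.js's default "X-Powered-By: Next.js", and the "connect.sid" cookie that express-session uses unless renamed. The function names node_hints and fetch_headers are hypothetical.

```python
from urllib.request import Request, urlopen


def node_hints(headers):
    """Return human-readable hints that a response came from a Node.js app.

    `headers` maps header names to values; names are compared
    case-insensitively. An empty list means "no evidence", not "not Node.js".
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    hints = []

    # Express sends "X-Powered-By: Express" unless the app disables it;
    # Next.js sends "X-Powered-By: Next.js" by default.
    powered_by = lowered.get("x-powered-by", "").strip().lower()
    if powered_by == "express":
        hints.append("X-Powered-By: Express")
    elif powered_by.startswith("next.js"):
        hints.append("X-Powered-By: Next.js")

    # express-session names its cookie "connect.sid" unless configured otherwise.
    if "connect.sid=" in lowered.get("set-cookie", ""):
        hints.append("connect.sid session cookie (express-session default)")

    return hints


def fetch_headers(url):
    """Send a HEAD request and return the response headers, lower-cased.

    Repeated headers (e.g. several Set-Cookie lines) are joined with ", ".
    """
    merged = {}
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        for name, value in response.headers.items():
            key = name.lower()
            merged[key] = f"{merged[key]}, {value}" if key in merged else value
    return merged
```

Usage would look like print(node_hints(fetch_headers("https://example.com"))). An empty result proves nothing: many Express apps call app.disable('x-powered-by'), and a reverse proxy such as Nginx may rewrite or drop headers. The sketch also does not handle the HTTPError that urlopen raises for 4xx/5xx responses.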

Do you know of any other ways, or similar fingerprints generated by popular Node modules?

Larkin answered 22/9, 2014 at 15:36 Comment(6)
There are a lot of characteristics for fingerprinting servers, from the data they respond with down to the way packets are chunked up to be sent. But you're right that there is no foolproof way, especially with a Node.js server, which doesn't even have to run HTTP and is very flexible. – Pianoforte
So there is no solution? – Archi
Not really. They could also run something like Nginx in front, and you'd have an even harder time telling what the backend is written in. – Capuchin
@JessieA.Morris Can we assume that it's just a Node.js server, without Nginx or anything else in front? – Router
Even then, fingerprinting methods are good, but you're never going to be 100% sure. You could change the Node HTTP module to act like Apache, Java or some other server. You can't know with 100% certainty what a server is running via HTTP. – Capuchin
In 90% of cases where a front server is used with Node.js, it's Nginx. And Nginx only proxies data upstream to the Node.js server, right? Would it be possible for an outside system to see how the buffers are chunked? – Archi

As already pointed out in the comments by @brad, @jessie-a-morris and @tknew, there is no easy-to-use method that you could simply reuse in your own analyzer.

Quoting from w3techs's own information:

The FAQ at http://w3techs.com/faq, especially the section "How exactly does your website analyzer work?", explains:

...We search for specific patterns in the web pages that identify the usage of technologies, similarly to the way a virus scanner searches for patterns in a file to identify viruses. We use a combination of regular expressions and DOM traversal for this search. We have identified several thousand indicators for technology usage. These indicators have different priorities, and based on the presence or absence of specific combinations of indicators in a specific context, we come to our conclusions.

These are examples of the information used by the indicators:

  • HTML elements of web pages
  • Specific HTML tags, for example the generator meta tag
  • JavaScript code
  • CSS code
  • The URL structure of a site
  • Offsite links
  • HTTP headers, for example cookies
  • HTTP responses to specific requests, for example compression

A lot of research was necessary to build the analyzer, and we keep improving it all the time. We want it to be the best possible website analyzer...

and http://w3techs.com/disclaimer points out that

...In order to obtain any information from websites, we rely on the websites themselves, their owners or their webmasters to provide such information. Some websites are more open to sharing this type of information than others. Some technologies may provide more means to reveal information about their usage than others...

and more caveats such as "we may not", "in some cases", "some technologies" and "inaccurate results" follow.
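w3techs does not publish its indicators, so the following Python sketch only illustrates the general idea from the FAQ excerpt quoted above: regular expressions applied to different parts of a response, each carrying a priority, combined into a verdict. The three indicators, their weights, the 0.8 threshold and the noisy-OR combination (1 - prod(1 - weight)) are all illustrative assumptions, and the names Indicator, INDICATORS and score are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """One hypothetical fingerprint: a regex applied to one part of a response."""
    name: str
    source: str    # "headers" or "body"
    pattern: str   # regular expression, matched case-insensitively per line
    weight: float  # how strongly a match suggests Node.js (illustrative value)


# Illustrative indicators only; these are NOT w3techs's actual rules.
INDICATORS = [
    Indicator("express-powered-by", "headers", r"^x-powered-by:\s*express\b", 0.9),
    Indicator("express-session-cookie", "headers", r"^set-cookie:\s*connect\.sid=", 0.7),
    Indicator("next-data-script", "body", r'<script id="__NEXT_DATA__"', 0.8),
]


def score(headers, body, indicators=INDICATORS, threshold=0.8):
    """Combine matched indicators into a (verdict, matched_names) pair.

    Weights are combined as 1 - prod(1 - weight) ("noisy-OR"), so several
    weak hints can add up while the confidence stays between 0 and 1.
    """
    # Render headers as "Name: value" lines so header regexes can use ^.
    header_text = "\n".join(f"{k}: {v}" for k, v in headers.items())
    texts = {"headers": header_text, "body": body}
    matched = [
        ind for ind in indicators
        if re.search(ind.pattern, texts[ind.source], re.IGNORECASE | re.MULTILINE)
    ]
    miss_probability = 1.0
    for ind in matched:
        miss_probability *= 1.0 - ind.weight
    confidence = 1.0 - miss_probability
    return confidence >= threshold, [ind.name for ind in matched]
```

Here the session cookie alone (0.7) stays below the threshold, while the cookie plus a __NEXT_DATA__ script (1 - 0.3 * 0.2 = 0.94) crosses it, mirroring the FAQ's point that conclusions come from combinations of indicators rather than any single one.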

Alansen answered 23/9, 2014 at 17:41 Comment(2)
While this does answer the question, I imagine the OP was looking for an in-depth analysis of the methods. Good job, though! – Router
@SleepDeprivedBulbasaur I totally agree that my answer does not "explain" anything. At first I just meant to comment under the OP's question ("What have you tried already? Did you ask w3techs how they do it?"), but it would have been hidden under the "show x more comments" button. Publishing w3techs's several thousand prioritized, probabilistic indicators would be the only acceptable answer to this strange (in the sense of: how is that useful?) question. – Alansen
