How does Google Know you are Cloaking?

I can't seem to find any information on how Google determines whether you are cloaking your content. From a technical standpoint, how do you think they are determining this? Are they sending in user agents other than Googlebot and comparing the results? Do they have a team of human beings comparing? Or can they somehow tell that you checked the user agent and executed a different code path because you saw "Googlebot" in the name?

It's in relation to this question on legitimate URL cloaking for SEO. If the textual content is exactly the same but the rendering differs (1995-style HTML vs. Ajax vs. Flash), is there really a problem with cloaking?
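For reference, the "different code path based on the user agent" described above might look something like this on the server. This is a hypothetical sketch of cloaking itself (all names are my own, not from any real site):

```python
# Hypothetical sketch of user-agent cloaking: the server picks a
# content variant depending on whether the request claims to be Googlebot.
def choose_content(user_agent):
    """Return a page variant based on the claimed user agent string."""
    if "googlebot" in user_agent.lower():
        # The crawler sees a plain, 1995-style HTML page with all the text.
        return "<html><body>Plain HTML with all the text.</body></html>"
    # Regular visitors get the Ajax rendering of (supposedly) the same text.
    return "<html><body><div id='app'>Loads content via Ajax.</div></body></html>"
```

The question, in effect, is how Google could tell that both branches exist when it only ever sees the first one under its own user agent.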

Thanks for your input on this one.

Tombola answered 10/12, 2009 at 2:51 Comment(1)
Voting to close as off-topic: migrate to Webmasters.SE! – Luminance

As far as I know, how Google prepares search engine results is secret and constantly changing. Spoofing different user agents is easy, so they might do that. They might also, in the case of JavaScript, actually render partial or entire pages. "Do they have a team of human beings comparing?" That's doubtful. A lot has been written on Google's crawling strategies, including this, but if humans are involved, they're only called in for specific cases. I doubt even that: any person-power is probably spent tweaking the crawling engine.

Lessor answered 10/12, 2009 at 3:3 Comment(0)

Google looks at your site while presenting user agents other than Googlebot.

Jutta answered 10/12, 2009 at 2:54 Comment(5)
They do? And does this other user agent still identify itself as some kind of robot? If not, would that not be very sneaky on Google's part? – Nigrosine
Even different user agents can't help Google tell if a browser has used the z-index to overlay a div to hide certain content from view - does this qualify as "cloaking"? – Masquerade
@jdk: Google has created a browser with a rendering engine. They very well could tell. – Styrene
Okay then, that's kind of what I posted below as a solution - I wasn't sure if my understanding really met the definition of cloaking, but it appears it does, or is close enough. – Masquerade
@Thilo: Sneaky? I guess different people will have different takes, but I think it is OK as long as they respect robots.txt. – Predecease
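The probe this answer describes can be simulated in a few lines with no network I/O. In this toy sketch, the `fetch` callable stands in for a real HTTP client, and `fake_fetch` plays the role of a cloaking server; all names are my own invention:

```python
def cloak_check(fetch, url):
    """Fetch the same URL claiming to be a crawler and then a browser,
    and report whether the returned content differs.
    `fetch(url, user_agent)` is a stand-in for a real HTTP client."""
    as_bot = fetch(url, "Googlebot/2.1 (+http://www.google.com/bot.html)")
    as_user = fetch(url, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
    return as_bot != as_user  # True -> the site serves user-agent-dependent content

# Toy server that cloaks based on the claimed user agent:
def fake_fetch(url, user_agent):
    return "crawler page" if "Googlebot" in user_agent else "visitor page"
```

Of course, a cloaker could also key off the requester's IP range rather than the user agent, which is why any real check would have to come from unannounced addresses.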

See page 11 of the Google Chrome comic book, where it describes (better than in layman's terms) how a Google tool can take a schematic of a web page. They could be using this or similar technology for Google search indexing and cloak detection; at least, that would be another good use for it.

[Image: page from the Google Chrome comic book]

Masquerade answered 10/12, 2009 at 3:1 Comment(3)
Can you explain a little how this (which is about automated testing of a rendering engine) relates to cloak detection? – Nigrosine
I'm speculating the technology could be repackaged as "what the browser thinks it's displaying" and applied to what the Googlebot actually scrapes. It wouldn't be unlike TestSwarm for jQuery (testswarm.com), but Google would use server farms for it. Yah, it's out there, but it has shreds of viability. – Masquerade
My explanation is probably not very clear, but basically I'm saying that if Google (via Chrome) can create technology that demonstrates a difference between what a web browser "thinks" it sees and what is actually seen, then it is not infeasible that they also have other technologies comparing the "thinking" vs. "seeing" web world. – Masquerade

In reality, many of Google's algorithms are trivially reversed and are far from rocket science. In the case of so-called "cloaking detection", all of the previous guesses are on the money (apart from, somewhat ironically, John K lol). If you don't believe me, set up some test sites (inputs) and some 'cloaking test cases' (further inputs), submit your sites to uncle Google (processing), and test your non-assumptions via pseudo-advanced human-based cognitive correlationary quantum perceptions (<-- btw, I made that up for entertainment value (and now I'm nesting parentheses to really mess with your mind :)) AKA "checking Google results to see if you are banned yet" (outputs). Loop until enlightenment == True (noob!) lol

Onset answered 7/12, 2010 at 0:53 Comment(0)

Google does hire contractors (indirectly, through an outside agency, for very low pay) to manually review documents returned as search results and judge their relevance to the search terms, quality of translations, etc. I highly doubt that this is their only tool for detecting cloaking, but it is one of them.

Gym answered 22/8, 2011 at 9:14 Comment(0)

A very simple test would be to compare the file size of a web page as Googlebot saw it against the file size of the same page scanned by a Google alias that looks like a normal user.

This would flag most suspect candidates for closer examination.
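This screening idea can be sketched as a simple threshold check. The 10% tolerance below is an arbitrary illustration, not a known Google value:

```python
def size_flag(bot_bytes, user_bytes, tolerance=0.10):
    """Flag a page when the response size seen by the crawler and the
    size seen by a browser-like fetch differ by more than `tolerance`
    (expressed as a fraction of the larger response)."""
    bigger = max(bot_bytes, user_bytes)
    if bigger == 0:
        return False  # two empty responses: nothing to compare
    return abs(bot_bytes - user_bytes) / bigger > tolerance
```

A size check is cheap enough to run on every crawled page, which fits the "screen first, examine closely later" pipeline this answer suggests; small differences (ads, timestamps) stay under the tolerance.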

Upholster answered 13/5, 2016 at 14:28 Comment(0)

They call your page using tools like curl and construct a hash of the page fetched without the Googlebot user agent, then construct another hash of the page fetched with the Googlebot user agent. The two hashes must be similar; they have algorithms to compare the hashes and decide whether it's cloaking or not.
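A plain cryptographic hash changes completely on any one-character edit, so for "the hashes must be similar" a similarity-preserving scheme such as SimHash is the usual fit: near-identical pages yield fingerprints with a small Hamming distance. A minimal illustration of that idea (my own sketch, not anything Google has published):

```python
import hashlib

def simhash(text, bits=64):
    """Similarity-preserving fingerprint: documents sharing most of
    their tokens produce hashes that differ in only a few bits."""
    weights = [0] * bits
    for token in text.lower().split():
        # Hash each token to a 64-bit integer (md5 here just for determinism).
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Each output bit is the majority vote of that bit across all tokens.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Comparing `hamming(simhash(page_as_bot), simhash(page_as_user))` against a threshold would then distinguish cosmetic differences (ads, timestamps) from wholesale content swaps.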

Literacy answered 19/7, 2017 at 13:7 Comment(1)
Technically your answer is correct, but it would be better to provide links backing the information you have provided, to improve the quality of this answer. – Meteoritics

© 2022 - 2024 — McMap. All rights reserved.