Match pattern for all Google search pages

I'm developing an extension which will perform a certain action on all Google search URLs - but not on other websites or Google pages. In natural language the match pattern is:

  • Any protocol ('*://')
  • Any subdomain or none ('www' or '')
  • The domain string must equal 'google'
  • Any TLD including three-letter TLDs (e.g. '.com') and multi-part country TLDs (e.g. '.co.uk')
  • The first 8 characters after the host must equal '/search?'

Many people say 'to match all Google search pages, use "*://*.google.com/search?*"', but this is untrue, as it will not match national TLDs such as google.co.uk.

I tried putting a wildcard in place of the TLD, but the following code does not work at all:

chrome.webRequest.onBeforeRequest.addListener(
  function(details) {
    alert('This never happens');
  }, {
    urls: [
        "*://*.google.*/search?*",
        "*://google.*/search?*",
    ],
    types: ["main_frame"]
  },
  ["blocking"]
);

Using "*://*.google.com/search?*" as the match pattern does work, but I fear I would need a list of every single Google localisation for that to be an effective strategy.

Elderberry answered 19/5, 2014 at 21:49 Comment(3)
You can get that list as a plain text file from google.com/supported_domains. It currently has 192 entries. Kassandrakassaraba
Good answer, but there seems to be an upper limit of a few dozen on the number of match patterns you can have. 192 entries is several times more match patterns than the listener will accept... :-( Elderberry
Here's an interesting idea. Split your domain list and register several (identical) listeners. That would bypass the restriction. Franzen
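A minimal sketch of the splitting idea from the last comment. The helper names `chunk` and `registerInChunks` and the chunk size of 20 are assumptions for illustration; the actual limit on patterns per listener (if any) is not documented in this thread, so treat the number as a placeholder.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: call `register` once per slice of patterns.
// `register` is passed in so the chunking can be exercised outside a browser.
function registerInChunks(patterns, size, register) {
  const groups = chunk(patterns, size);
  groups.forEach((urls) => register(urls));
  return groups.length;
}

// Inside an extension background page (assumption: the webRequest API and the
// matching host permissions are available), each slice gets its own listener.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  const handler = (details) => {
    console.log("Google search request:", details.url);
  };
  // Short excerpt; a real extension would build this from the full domain list.
  const patterns = ["*://*.google.com/search?*", "*://*.google.co.uk/search?*"];
  registerInChunks(patterns, 20, (urls) =>
    chrome.webRequest.onBeforeRequest.addListener(
      (details) => handler(details), // fresh wrapper per slice, same behaviour
      { urls, types: ["main_frame"] }
    )
  );
}
```

Each listener does the same work, so the only cost of this approach is a few extra registrations.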

Unfortunately, match patterns do not allow wildcards for TLDs for security reasons.

You cannot use wildcard match patterns like http://google.*/* to match TLDs (like http://google.es and http://google.fr) due to the complexity of actually restricting such a match to only the desired domains.

For the example of http://google.*/*, the Google domains would be matched, but so would http://google.someotherdomain.com. Additionally, many sites do not own all of the TLDs for their domain. For an example, assume you want to use http://example.*/* to match http://example.com and http://example.es, but http://example.net is a hostile site. If your extension has a bug, the hostile site could potentially attack your extension in order to get access to your extension's increased privileges.

You should explicitly enumerate the TLDs that you wish to run your extension on.
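The over-matching described in the quoted passage can be illustrated with a naive translation of a host wildcard into a regular expression. The helper `naiveHostGlobToRegExp` is hypothetical and is not how Chrome implements match patterns; it only shows why a trailing wildcard cannot tell a TLD from an arbitrary domain.

```javascript
// Hypothetical helper: naive translation of a host glob into a RegExp.
// This is NOT Chrome's matcher; it only illustrates the over-matching problem.
function naiveHostGlobToRegExp(glob) {
  // Escape regex metacharacters, then let each "*" match any run of characters.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

const hostPattern = naiveHostGlobToRegExp("google.*");

// The first two hosts are wanted; the third is an unrelated domain that
// happens to have a "google" subdomain, yet it passes the same test.
const hosts = ["google.es", "google.co.uk", "google.someotherdomain.com"];
for (const host of hosts) {
  console.log(host, hostPattern.test(host));
}
```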

A slightly unrealistic option would be to list all variants with all national TLDs.

Edit: thanks to an incredibly helpful comment by rsanchez, here's an up-to-date list of all Google domain variants which makes this approach viable.

A realistic option is to inject into a larger set of pages (for instance, all pages), then analyze the URL (with a regexp, for example) and only execute if it matches the pattern you are looking for. Yes, it will be a scarier permissions warning, and you will have to explain it to your users.
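A rough sketch of that second option, checking the host against an explicit allowlist rather than a "google.*" regexp. The function name `isGoogleSearchUrl` is hypothetical, the `GOOGLE_DOMAINS` set is only a short excerpt (a real extension would load the full list), and accepting only the bare domain or a "www." prefix is a simplifying assumption.

```javascript
// Assumption: a short excerpt of Google's domains; a real extension would
// include every entry from the full domain list.
const GOOGLE_DOMAINS = new Set(["google.com", "google.co.uk", "google.de", "google.fr"]);

// Hypothetical helper: true only for /search URLs on an allowlisted Google host.
function isGoogleSearchUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  // Accept the bare domain or a "www." prefix; other subdomains are rejected here.
  const host = url.hostname.replace(/^www\./, "");
  if (!GOOGLE_DOMAINS.has(host)) return false;

  // pathname excludes the query string, so "/search?q=..." has pathname "/search".
  return url.pathname === "/search";
}

console.log(isGoogleSearchUrl("https://www.google.co.uk/search?q=test"));
console.log(isGoogleSearchUrl("https://www.google.someattacker.com/search?q=test"));
```

Because the comparison is an exact set lookup on the host, a domain such as google.someattacker.com cannot slip through the way it would with a wildcard.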

Franzen answered 20/5, 2014 at 6:44 Comment(6)
Explicitly enumerating all national TLDs is not slightly unrealistic; it's impossible, as there is an upper limit of a few dozen on the number of match patterns the listener will accept. Thanks both for the good answers though! Elderberry
Oops, I just noticed you're trying to block requests and not to inject scripts. I didn't really answer your question properly. For your case, you'll just have to implement the filtering yourself, in the listener handler. It's inefficient, but you can at least salvage it with a match pattern like "*://*/search?*"; that will be vastly better than no filter at all. You'll also need host permissions, though; maybe there is no such limitation there. Franzen
In the end I set the extension permissions and background page match pattern to "http://*/search?*" and "https://*/search?*". Then I used the URI.js library (which the extension was already using for other purposes) to manually ensure that the domain was like google.*. Elderberry
Does that even work in permissions? They expect hosts. Also, you can use "*://" for protocol to avoid duplication. Franzen
Also, you do NOT want to test for "google.*", for the exact same reasons it's not allowed in match patterns. Check against that domain list. Franzen
Yes, I liked your original answer, and my code specifically checks against the eventuality you described. I should have been more specific: using URI.js we first find the national TLD. Then we find the domain. If domain+tld == the URI host part (e.g. if google+.co.uk == google.co.uk) then we assume the site is authorised. Thus google+.com != the actual google.someattacker.com. You're right that a more elegant solution would use the domain list though. Elderberry

Source: https://mcmap.net/q/972345/-how-to-load-a-content-script-on-all-google-international-pages

I was wondering the same thing and found a similar question with a better solution, which uses the "include_globs" parameter.

"matches":        ["http://*/*", "https://*/*"],
"include_globs":  ["http://www.google.*/*", "https://www.google.*/*"],
Beason answered 21/12, 2016 at 23:27 Comment(1)
here is the link to Match patterns and globs Alate

You can use match-pattern arrays of arbitrary length (though it slows down the browser when using more than 1000 or so). For your convenience, here is an updated list:

  "matches": [
    "*://*.google.com/*",
    "*://*.google.ad/*",
    "*://*.google.ae/*",
    "*://*.google.com.af/*",
    "*://*.google.com.ag/*",
    "*://*.google.com.ai/*",
    "*://*.google.al/*",
    "*://*.google.am/*",
    "*://*.google.co.ao/*",
    "*://*.google.com.ar/*",
    "*://*.google.as/*",
    "*://*.google.at/*",
    "*://*.google.com.au/*",
    "*://*.google.az/*",
    "*://*.google.ba/*",
    "*://*.google.com.bd/*",
    "*://*.google.be/*",
    "*://*.google.bf/*",
    "*://*.google.bg/*",
    "*://*.google.com.bh/*",
    "*://*.google.bi/*",
    "*://*.google.bj/*",
    "*://*.google.com.bn/*",
    "*://*.google.com.bo/*",
    "*://*.google.com.br/*",
    "*://*.google.bs/*",
    "*://*.google.bt/*",
    "*://*.google.co.bw/*",
    "*://*.google.by/*",
    "*://*.google.com.bz/*",
    "*://*.google.ca/*",
    "*://*.google.cd/*",
    "*://*.google.cf/*",
    "*://*.google.cg/*",
    "*://*.google.ch/*",
    "*://*.google.ci/*",
    "*://*.google.co.ck/*",
    "*://*.google.cl/*",
    "*://*.google.cm/*",
    "*://*.google.cn/*",
    "*://*.google.com.co/*",
    "*://*.google.co.cr/*",
    "*://*.google.com.cu/*",
    "*://*.google.cv/*",
    "*://*.google.com.cy/*",
    "*://*.google.cz/*",
    "*://*.google.de/*",
    "*://*.google.dj/*",
    "*://*.google.dk/*",
    "*://*.google.dm/*",
    "*://*.google.com.do/*",
    "*://*.google.dz/*",
    "*://*.google.com.ec/*",
    "*://*.google.ee/*",
    "*://*.google.com.eg/*",
    "*://*.google.es/*",
    "*://*.google.com.et/*",
    "*://*.google.fi/*",
    "*://*.google.com.fj/*",
    "*://*.google.fm/*",
    "*://*.google.fr/*",
    "*://*.google.ga/*",
    "*://*.google.ge/*",
    "*://*.google.gg/*",
    "*://*.google.com.gh/*",
    "*://*.google.com.gi/*",
    "*://*.google.gl/*",
    "*://*.google.gm/*",
    "*://*.google.gp/*",
    "*://*.google.gr/*",
    "*://*.google.com.gt/*",
    "*://*.google.gy/*",
    "*://*.google.com.hk/*",
    "*://*.google.hn/*",
    "*://*.google.hr/*",
    "*://*.google.ht/*",
    "*://*.google.hu/*",
    "*://*.google.co.id/*",
    "*://*.google.ie/*",
    "*://*.google.co.il/*",
    "*://*.google.im/*",
    "*://*.google.co.in/*",
    "*://*.google.iq/*",
    "*://*.google.is/*",
    "*://*.google.it/*",
    "*://*.google.je/*",
    "*://*.google.com.jm/*",
    "*://*.google.jo/*",
    "*://*.google.co.jp/*",
    "*://*.google.co.ke/*",
    "*://*.google.com.kh/*",
    "*://*.google.ki/*",
    "*://*.google.kg/*",
    "*://*.google.co.kr/*",
    "*://*.google.com.kw/*",
    "*://*.google.kz/*",
    "*://*.google.la/*",
    "*://*.google.com.lb/*",
    "*://*.google.li/*",
    "*://*.google.lk/*",
    "*://*.google.co.ls/*",
    "*://*.google.lt/*",
    "*://*.google.lu/*",
    "*://*.google.lv/*",
    "*://*.google.com.ly/*",
    "*://*.google.co.ma/*",
    "*://*.google.md/*",
    "*://*.google.me/*",
    "*://*.google.mg/*",
    "*://*.google.mk/*",
    "*://*.google.ml/*",
    "*://*.google.com.mm/*",
    "*://*.google.mn/*",
    "*://*.google.ms/*",
    "*://*.google.com.mt/*",
    "*://*.google.mu/*",
    "*://*.google.mv/*",
    "*://*.google.mw/*",
    "*://*.google.com.mx/*",
    "*://*.google.com.my/*",
    "*://*.google.co.mz/*",
    "*://*.google.com.na/*",
    "*://*.google.com.nf/*",
    "*://*.google.com.ng/*",
    "*://*.google.com.ni/*",
    "*://*.google.ne/*",
    "*://*.google.nl/*",
    "*://*.google.no/*",
    "*://*.google.com.np/*",
    "*://*.google.nr/*",
    "*://*.google.nu/*",
    "*://*.google.co.nz/*",
    "*://*.google.com.om/*",
    "*://*.google.com.pa/*",
    "*://*.google.com.pe/*",
    "*://*.google.com.pg/*",
    "*://*.google.com.ph/*",
    "*://*.google.com.pk/*",
    "*://*.google.pl/*",
    "*://*.google.pn/*",
    "*://*.google.com.pr/*",
    "*://*.google.ps/*",
    "*://*.google.pt/*",
    "*://*.google.com.py/*",
    "*://*.google.com.qa/*",
    "*://*.google.ro/*",
    "*://*.google.ru/*",
    "*://*.google.rw/*",
    "*://*.google.com.sa/*",
    "*://*.google.com.sb/*",
    "*://*.google.sc/*",
    "*://*.google.se/*",
    "*://*.google.com.sg/*",
    "*://*.google.sh/*",
    "*://*.google.si/*",
    "*://*.google.sk/*",
    "*://*.google.com.sl/*",
    "*://*.google.sn/*",
    "*://*.google.so/*",
    "*://*.google.sm/*",
    "*://*.google.sr/*",
    "*://*.google.st/*",
    "*://*.google.com.sv/*",
    "*://*.google.td/*",
    "*://*.google.tg/*",
    "*://*.google.co.th/*",
    "*://*.google.com.tj/*",
    "*://*.google.tk/*",
    "*://*.google.tl/*",
    "*://*.google.tm/*",
    "*://*.google.tn/*",
    "*://*.google.to/*",
    "*://*.google.com.tr/*",
    "*://*.google.tt/*",
    "*://*.google.com.tw/*",
    "*://*.google.co.tz/*",
    "*://*.google.com.ua/*",
    "*://*.google.co.ug/*",
    "*://*.google.co.uk/*",
    "*://*.google.com.uy/*",
    "*://*.google.co.uz/*",
    "*://*.google.com.vc/*",
    "*://*.google.co.ve/*",
    "*://*.google.vg/*",
    "*://*.google.co.vi/*",
    "*://*.google.com.vn/*",
    "*://*.google.vu/*",
    "*://*.google.ws/*",
    "*://*.google.rs/*",
    "*://*.google.co.za/*",
    "*://*.google.co.zm/*",
    "*://*.google.co.zw/*",
    "*://*.google.cat/*"
  ],

To recreate, you can use the command

curl https://www.google.com/supported_domains | sed 's!\(.*\)!"*://*\1/*",!g'
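
If you would rather build the list in JavaScript, the same transformation can be sketched as below. The function name `toMatchPatterns` is hypothetical, and the input format (one entry per line, each starting with a dot, such as ".google.co.uk") is an assumption inferred from the sed expression above.

```javascript
// Hypothetical helper: convert the text of the supported_domains file
// (assumed: one entry per line, each starting with a dot) into match patterns.
function toMatchPatterns(supportedDomainsText) {
  return supportedDomainsText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines, e.g. the trailing one
    .map((domain) => "*://*" + domain + "/*");
}

// Short sample standing in for the downloaded file contents.
const sample = ".google.com\n.google.ad\n.google.co.uk\n";
console.log(JSON.stringify(toMatchPatterns(sample), null, 2));
```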
Bundy answered 18/9, 2017 at 10:2 Comment(0)
