Robots.txt disallow path with regular expression

Does robots.txt accept regular expressions? I have many URLs in this format:

https://example.com/view/99/title-sample-text
ID ----------------------^
Title ----------------------^

I used this:

Disallow: /view

But it looks like this is not working, because Google has indexed more pages. So I want to do this with a regex, something like this:

Disallow: /view/([0-9]+)/([^/]*)

But is this a correct or valid format for robots.txt?

Brookins asked 5/12, 2017 at 18:55 Comment(10)
I'm voting to close this question as off-topic because it is about SEO. – Hoopes
@JohnConde Only programming-related SEO questions are acceptable on Stack Overflow. – Brookins
This is programming!! – Brookins
@Brookins No, it's not. You're not writing code, you're writing a config file. – Knopp
So what is programming about SEO, then?! @PaulTomblin Anyway, where should I ask this? – Brookins
Well, you could do what I did and spend 3 seconds typing "how to write a robots.txt file" into Google. – Knopp
@PaulTomblin I know how to write a robots.txt file! User-agent, Allow, Disallow, etc. The question is how to use regular expressions in robots.txt. – Brookins
This question, which I found in the related section, at least answers that it's possible. – Laine
@Brookins And if you'd done that Google search I suggested, you would have seen that you can't use regular expressions, but Google and some of the other crawlers recognize * globs. – Knopp
Config is code. It is machine-interpreted. It's just not Turing-complete (in most cases). – Willettawillette

You can use a wildcard ...

User-agent: *
Disallow: /view/*

See https://webmasters.stackexchange.com/questions/72722/can-we-use-regex-in-robots-txt-file-to-block-urls
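
As an aside: robots.txt rules are plain prefix matches, so Disallow: /view/ on its own already covers every URL under /view/ for compliant crawlers. Google's documentation lists exactly two pattern operators on top of that: * for any sequence of characters and $ to anchor the end of the URL; full regular expressions are not supported. If you want to sanity-check a rule locally, below is a minimal Python sketch of that wildcard matching. The helper name and sample paths are just illustrations, and it deliberately ignores the Allow/Disallow longest-match precedence that a real parser applies.

import re

# Illustrative helper, not part of any library: translate a robots.txt
# path pattern into a Python regex. '*' matches any character sequence,
# a trailing '$' anchors the end of the URL, everything else is literal.
def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/view/*")
for path in ["/view/99/title-sample-text", "/view", "/views/1", "/about"]:
    print(path, "->", "blocked" if rule.match(path) else "allowed")

Running this prints "blocked" only for /view/99/title-sample-text, which matches the URL shape from the question.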

Hope this helps.

Leoine answered 5/12, 2017 at 19:05 Comment(0)