Strip parent categories from url

I'm struggling to fix an issue with 301 redirects in .htaccess. I have moved a site from an old domain to a new domain, and the basic domain-level redirect works with a 301, like so:

Redirect 301 / https://newdomain.com

On the old site child category URLs are like this:

olddomain.com/product-category/parent-cat1/parent-cat2/child-cat

or

olddomain.com/product-category/parent-cat1/child-cat

or

olddomain.com/product-category/child-cat

Whereas on the new site they are:

newdomain.com/product-category/child-cat

Unfortunately, the redirected child category URLs result in 404s on the new site, because the parent categories are carried over in the URL. Is there any way to strip the parent categories (which vary in both name and number) from the URL?

Eft answered 21/5, 2017 at 14:26 Comment(2)
"/parent-cat/parent-cat/" - Are these two instances of parent-cat the same? Or is that really /parent-cat1/parent-cat2/? You say the number of parent-cat can vary... from 1 to how many? What characters are part of the product-category and child-cat? - Bandwagon
Sorry for not being clearer. No, they would be different parent categories. I will edit the question to clarify this. There is no limit on how deeply the product categories can be nested, but practically speaking it isn't more than 5 or 6 levels. Alphanumeric characters and hyphens. Thanks - Eft

Try including the following RedirectMatch directive before your existing Redirect directive:

RedirectMatch 302 ^/([\w-]+)/(?:[\w-]+/)+([\w-]+)$ https://newdomain.com/$1/$2

The RedirectMatch directive is complementary to the Redirect directive; both are part of mod_alias. The difference is that RedirectMatch uses a regex to match the URL-path, whereas Redirect uses simple prefix-matching.

This assumes that the path segments (i.e. "product-category", "parent-cat" and "child-cat") consist of just the characters a-z, A-Z, 0-9, _ and - (hyphen). The pattern needs to be as specific as possible so as not to match "too much". One or more "parent-cat" segments are required, so a URL with no parent category does not match this directive and falls through to the existing Redirect.

$1 is a backreference to the first captured group in the pattern, i.e. the leading ([\w-]+), which is the product-category. $2 is a backreference to the second captured group, i.e. the ([\w-]+) at the end of the pattern, which is the child-cat. The (?:...) group in the middle is a non-capturing group, so no backreference applies to it.
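The pattern can be checked outside Apache before deploying it. The following is a minimal sketch, assuming Python's `re` module behaves like Apache's PCRE engine for this particular pattern (it should, since only basic syntax is used). The names `PATTERN`, `TARGET` and `redirect_target` are hypothetical and exist only for this illustration.

```python
import re

# Same regex as in the RedirectMatch directive above.
# re.ASCII restricts \w to [A-Za-z0-9_], matching the stated assumption.
PATTERN = re.compile(r"^/([\w-]+)/(?:[\w-]+/)+([\w-]+)$", re.ASCII)

# Stand-in for the substitution https://newdomain.com/$1/$2
TARGET = "https://newdomain.com/{0}/{1}"


def redirect_target(url_path):
    """Return the URL the rule would redirect to, or None if it does not match."""
    match = PATTERN.match(url_path)
    if match is None:
        return None  # In Apache, the next directive (the plain Redirect) would apply.
    # group(1) corresponds to $1 (product-category), group(2) to $2 (child-cat).
    return TARGET.format(match.group(1), match.group(2))


if __name__ == "__main__":
    samples = [
        "/product-category/parent-cat1/parent-cat2/child-cat",
        "/product-category/parent-cat1/child-cat",
        "/product-category/child-cat",
    ]
    for path in samples:
        print(path, "->", redirect_target(path))
```

The first two sample paths map to the same child category URL, while the third has no parent category, so the function returns None for it.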

This is a 302 (temporary) redirect. Change it to a 301 only once you have confirmed that it works. It is easier to test with 302s because browsers do not cache them by default, whereas 301s are cached persistently. Since your earlier 301 may already be cached, make sure your browser cache is clear before testing.
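One way to rule out the browser cache while testing is to request the old URL from a script and read the raw response. The following is a minimal sketch using only the Python standard library. `fetch_redirect` is a hypothetical helper name, and the host in the usage comment is the placeholder domain from the question.

```python
import http.client


def fetch_redirect(host, path, port=None, use_https=False):
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects and keeps no cache, so the
    result reflects what the server sends right now.
    """
    if use_https:
        conn = http.client.HTTPSConnection(host, port, timeout=10)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Example usage (placeholder host, not run here):
# status, location = fetch_redirect(
#     "olddomain.com", "/product-category/parent-cat1/parent-cat2/child-cat"
# )
# print(status, location)
```

A status of 302 with a Location ending in /product-category/child-cat would indicate that the RedirectMatch rule matched. A Location that still contains the parent categories would suggest that the plain Redirect, or some other directive, handled the request instead.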

Bandwagon answered 23/5, 2017 at 23:32 Comment(10)
Thanks @user82217. Cheers for the heads-up regarding 302s, I never knew that. It appears to be working fine at the moment; I just need to do a bit more testing. - Eft
To complicate matters, I never mentioned that the site is in a subdirectory, but this appears to work: RedirectMatch 302 ^/sub-dir/([\w-]+)/(?:[\w-]+/)+([\w-]+)$ newdomain.com/$1/$2 - Eft
That should be sufficient if the site is in a subdirectory. So, presumably your existing Redirect directive is really something like: Redirect 301 /subdir https://newdomain.com? - Bandwagon
Yes, that's correct. Apologies again, I should have been a lot clearer in my initial question. I think the bounty will be heading in your direction! - Eft
I have tested the above and it doesn't quite work for URLs with 2+ parent categories. It seems to work fine with 1 parent category and removes it as hoped. - Eft
It seems to work OK for me (I tried /product-category/foo/bar/baz/zop/child-cat and it successfully redirects to /product-category/child-cat at the newdomain). Make sure your browser cache is cleared. What happens exactly - literally nothing? Do you have any other directives in this .htaccess file on olddomain.com? What's the exact URL you are requesting? Maybe there are some "different" chars in the URL? - Bandwagon
There are hyphens "-" in the parent and child categories; would this cause an issue? I've tried different browsers, incognito windows etc. and it hasn't worked. The redirect is the first directive, so that should take complete precedence? - Eft
"The redirect is the first directive..." - Presumably you mean the RedirectMatch directive is first and Redirect is second? "..., so that should take complete precedence I think?" - Not necessarily. Not if you have directives from different modules. Different modules execute at different times, regardless of the apparent order of the directives in the config file. E.g. mod_rewrite (RewriteRule) executes before mod_alias (RedirectMatch), even if the RedirectMatch directive appears first. Hyphens are OK and are included in the above patterns. - Bandwagon
Sorry, yes, I meant the RedirectMatch. That is then followed by the Redirect. Oh right, I thought .htaccess directives were processed one after the other, so once the first redirect is carried out that would be it. Yes, there are other directives on the olddomain. Could I move this directive into a conf file to give it precedence? - Eft
It depends what these other directives are, and whether there are any other .htaccess files along the filesystem path. Do you have access to the server config? Are you still serving content from the site root? Please add the contents of this .htaccess file to your question. - Bandwagon
