Extracting movie name and year from string where year is optional

I'm missing a really obvious thing here, but I'm new to regex so be kind ;-)

I have a number of films in an arbitrary format that may or may not have the year attached.

My Movie Name 2010
Some.Other.Super.Cool.Movie
The~Third|Movie.2010

Now, using (.+)\W(\d{4}) I can extract the two movies with dates into two groups, one containing the name and the other the year, but the middle one gets ignored. I'm just a little unsure how to actually make the year segment optional.

Ideally ;-) I could use a single expression to return the names with \W converted into spaces, but that's a different conversation.

Thanks in advance

Operator answered 23/3, 2011 at 3:3 Comment(1)
How do you plan to handle movies where the movie name ends in 4 digits? For example, "Death Race 2000", which came out in 1975. If you have "Death Race 2000 1975" you're fine, but what about just "Death Race 2000"? Thomasson

Putting a ? after a group makes it optional, so in your case it goes after the (\d{4}):

(.+)\W(\d{4})?

If the year still ends up in the first group, that is because you are using greedy matching on (.+), and \W includes the newline character in its set (I think it does, at least). Strip your string of trailing whitespace, and if that doesn't work, make (.+) lazy with a ? of its own: (.+?). Also consider that \W may be the wrong delimiter for this problem.
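To see the greedy-plus-newline effect concretely, here is a minimal sketch. It assumes Python's re module and that the three titles are searched as one newline-separated string (the question doesn't say how the text was fed to the regex); text and PATTERN are just illustrative names.

```python
import re

# Assumption: the titles arrive as a single newline-separated string.
text = "My Movie Name 2010\nSome.Other.Super.Cool.Movie\nThe~Third|Movie.2010"

PATTERN = re.compile(r"(.+)\W(\d{4})?")

# Greedy (.+) runs to the end of each line, then \W consumes the newline,
# so on those lines the year stays inside group 1 and group 2 is empty.
results = [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]
for name, year in results:
    print(repr(name), repr(year))
```

Only the last line has no trailing newline, so only there is \W forced back onto the separator in front of the year.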

Adding $ to the end may also help, as that requires the digits to end the string if they can. Try lazy matching together with $:

(.+?)\W(\d{4})?$
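A quick way to check this is to run the pattern against each title separately. The sketch below assumes Python's re module and one title per string, stripped of surrounding whitespace; split_title and LAZY are made-up names for illustration.

```python
import re

titles = [
    "My Movie Name 2010",
    "Some.Other.Super.Cool.Movie",
    "The~Third|Movie.2010",
]

LAZY = re.compile(r"(.+?)\W(\d{4})?$")

def split_title(line):
    """Return (name, year) for one title, or None if the pattern fails."""
    m = LAZY.match(line.strip())
    return (m.group(1), m.group(2)) if m else None

lazy_results = [split_title(t) for t in titles]
for title, result in zip(titles, lazy_results):
    print(title, "->", result)
```

Note that \W is still mandatory in this pattern, so at least under Python's re a title that doesn't end in a separator (with or without a year after it) gets no match at all, and the separator probably needs to be made optional too.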
Scylla answered 23/3, 2011 at 3:6 Comment(1)
I actually did try this before, but RegexBuddy tells me that I'd end up with My Movie Name 2010, Some.Other.Super.Cool.Movie and The~Third|Movie in group 1 with blank, blank and 2010 in group 2?? Operator

? makes it optional, and that goes for the separator as well as the year:

(.+?)\W?(\d{4})?$
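Here is a small sketch of how this pattern behaves, assuming Python's re module and one title per string. The names parse_title and OPTIONAL are hypothetical, and the re.sub step that turns runs of \W into spaces is an extra added here to cover the clean-up the question mentions, not part of the pattern itself.

```python
import re

OPTIONAL = re.compile(r"(.+?)\W?(\d{4})?$")

def parse_title(line):
    """Split one title into (name, year); year is None when absent."""
    m = OPTIONAL.match(line.strip())
    if m is None:
        return None
    # Extra step (an assumption, not in the pattern): collapse each run
    # of non-word characters in the name into a single space.
    name = re.sub(r"\W+", " ", m.group(1)).strip()
    return name, m.group(2)

for title in ["My Movie Name 2010",
              "Some.Other.Super.Cool.Movie",
              "The~Third|Movie.2010",
              "Death Race 2000"]:
    print(title, "->", parse_title(title))
```

The last title shows the ambiguity raised in the comment on the question: a name that ends in four digits is indistinguishable from a name followed by a year, so "2000" is reported as the year.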
Schaab answered 23/3, 2011 at 3:10 Comment(0)
