sed - Back reference on match pattern does not work

I need to find dates in XML files that are in the format 2021-06-25T21:17:51Z and replace them with the format 2021-06-25T21:17:51.001Z.

I thought about using a regexp with sed, but back-references do not work.

1.xml could look like this, but the real files have many more fields, and some fields are already correct.

<Doc>
   <PUB_DATE>2021-06-25T21:17:51Z</PUB_DATE><!-- to change -->
   <DATE_COLLECT_100>2021-06-25T21:17:51Z</DATE_COLLECT_100><!-- to change -->

   <DATE_CREATION>2021-06-25T21:17:51.001Z</DATE_CREATION><!-- keep it like this -->
</Doc>

Desired output is

<Doc>
   <PUB_DATE>2021-06-25T21:17:51.001Z</PUB_DATE><!-- to change -->
   <DATE_COLLECT_100>2021-06-25T21:17:51.001Z</DATE_COLLECT_100><!-- to change -->

   <DATE_CREATION>2021-06-25T21:17:51.001Z</DATE_CREATION><!-- keep it like this -->
</Doc>

Here is my sed

$ sed -Ee 's#<(PUB_DATE|DATE_COLLECT_100){1}>([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2})Z</\1>#<\1>\2.001Z</\1>#' 1.xml

The regexp seems to be OK on regex101.

Here is a representation of it made with https://regexper.com

Are back-references allowed in sed when they are used in the search portion? Am I missing something about sed? Is there a bug?

Sed version: well... I don't know; sed --version, sed -v and man sed don't give it. I'm on OSX.

Olympus answered 24/1, 2022 at 18:29 Comment(1)
Your "keep it like this" line is exactly the same as if it were modified like the other lines, so it's not a good example to test a potential solution with: we can't tell from the output whether a script modified it or not. - Gainor

BSD (OSX) sed doesn't support the back-reference \1 inside the regex pattern.

Your choices are perl:

perl -pe 's#<(PUB_DATE|DATE_COLLECT_100)>(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z</\1>#<\1>\2.001Z</\1>#' 1.xml

Or else install GNU sed with Homebrew (brew install gnu-sed) and then use:

gsed -E 's#<(PUB_DATE|DATE_COLLECT_100)>([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2})Z</\1>#<\1>\2.001Z</\1>#' 1.xml
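
If you are not sure which sed you are running (the question mentions that sed --version prints nothing useful on OSX), a rough check along these lines may help. This is only a heuristic sketch: sed_flavor is a hypothetical helper name, and it assumes that GNU sed identifies itself on the first line of its --version output while BSD sed rejects the option.

```shell
#!/bin/sh
# Heuristic: GNU sed answers --version with a line starting with
# "sed (GNU sed)" (or "GNU sed" in older releases); BSD sed prints
# an error instead, which is discarded here.
sed_flavor() {
  first_line=$(sed --version 2>/dev/null | head -n 1)
  case $first_line in
    'sed (GNU sed)'*|'GNU sed'*) echo gnu ;;
    *) echo other ;;
  esac
}

sed_flavor
```

A result of "other" would suggest using perl or gsed for the command above.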
Capsaicin answered 24/1, 2022 at 18:38 Comment(2)
oh, OK! that's why it doesn't work. Thanks, I will go for perl then :) - Olympus
yes I know, but you were so fast that I had to wait a couple of minutes before I could accept the answer ^^ - Olympus

POSIX defines backreferences for BREs, not for EREs. You're calling sed with -E to enable EREs, so a backreference inside the pattern is undefined behavior per POSIX, and YMMV regarding what any given tool will do with it.
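
To illustrate the BRE side of that, here is a minimal sketch that keeps the backreference in the pattern but drops -E. BREs have no alternation, so it uses one -e expression per tag. The function name fix_dates and the variable ts are made up for this example; with a single literal tag name per expression the backreference is not strictly needed and is kept only to show the BRE syntax.

```shell
#!/bin/sh
# fix_dates [FILE...]: append .001 to second-precision timestamps in
# the two tags from the question, using only POSIX BRE syntax.
fix_dates() {
  # Timestamp without fractional seconds, written as a BRE
  # (intervals are \{n\}, groups are \( \)).
  ts='[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}'
  # \1 in the pattern refers back to the tag name matched by group 1.
  sed -e "s#<\(PUB_DATE\)>\($ts\)Z</\1>#<\1>\2.001Z</\1>#" \
      -e "s#<\(DATE_COLLECT_100\)>\($ts\)Z</\1>#<\1>\2.001Z</\1>#" "$@"
}

printf '%s\n' '<PUB_DATE>2021-06-25T21:17:51Z</PUB_DATE>' | fix_dates
```

Called as fix_dates 1.xml it reads the file; with no argument it reads standard input, as in the last line.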

You don't need a script that complicated to handle the input you show though, e.g. using any sed that supports EREs with a -E arg (e.g. GNU and BSD sed):

$ sed -E 's/(<(PUB_DATE|DATE_COLLECT_100)>.*:[0-9]+)Z/\1.001Z/' file
<Doc>
   <PUB_DATE>2021-06-25T21:17:51.001Z</PUB_DATE><!-- to change -->
   <DATE_COLLECT_100>2021-06-25T21:17:51.001Z</DATE_COLLECT_100><!-- to change -->

   <DATE_CREATION>2021-06-25T21:17:51.001Z</DATE_CREATION><!-- keep it like this -->
</Doc>

and if your real input is more complicated/variable than that then you should be using an XML-aware tool such as xmlstarlet instead of sed anyway.
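
Since the sample input can't show whether an already-correct value would be left alone, here is a small sketch that runs the same ERE command on lines where the difference is visible. The wrapper name add_millis and the .123 sample values are invented for this check; they are not from the question.

```shell
#!/bin/sh
# Same ERE substitution as above, wrapped so it can read stdin or files.
add_millis() {
  sed -E 's/(<(PUB_DATE|DATE_COLLECT_100)>.*:[0-9]+)Z/\1.001Z/' "$@"
}

# Line 1 should gain .001; line 2 already has fractional seconds and
# line 3 uses a tag outside the alternation, so both should pass
# through unchanged.
printf '%s\n' \
  '<PUB_DATE>2021-06-25T21:17:51Z</PUB_DATE>' \
  '<PUB_DATE>2021-06-25T21:17:51.123Z</PUB_DATE>' \
  '<DATE_CREATION>2021-06-25T21:17:51Z</DATE_CREATION>' | add_millis
```

The second line is skipped because the pattern requires the digits before Z to follow a colon directly, which is not the case once a fractional part is present.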

Gainor answered 24/8, 2024 at 12:9 Comment(0)
