I'm trying to parse an XML file and extract the data I need from it.
For example:
<about>
This is an XML file
that I want to
extract data from
</about>
<message>Hello, this is a message.</message>
<this>Blah</this>
<that>Blahh</that>
<person>
<name>Jack</name>
<age>27</age>
<email>[email protected]</email>
</person>
I'm having trouble getting the content within the <about> tags.
This is what I have so far:
(<\w*>)[\s*]?([\s*]?.*)(<\/\w*>)/m
I'm simply trying to extract the tag name and content, which is why I have the parentheses there, i.e. ($tag = $1) =~ s/[<>]// to get the tag name and $tagcontent = $2 to get the tag's contents. I'm using \s for the whitespace characters (space, tab, newline), ? because it may or may not occur, and * because it can occur any number of times.
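
To make the goal concrete, the extraction described above could look roughly like the sketch below. It is not the pattern shown earlier. It assumes the whole document is held in one string ($xml, a made-up sample here), that tags carry no attributes, and that same-named tags are not nested. The %content hash is likewise just an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical sample input; in practice this would be the slurped file.
my $xml = <<'XML';
<about>
This is an XML file
that I want to
extract data from
</about>
<message>Hello, this is a message.</message>
XML

my %content;

# <(\w+)> captures the tag name without the angle brackets,
# (.*?) lazily captures the content, and </\1> requires the matching close tag.
# The /s modifier lets . match newlines, so multi-line content is captured;
# /m only changes how ^ and $ behave.
while ($xml =~ m{<(\w+)>\s*(.*?)\s*</\1>}sg) {
    my ($tag, $tagcontent) = ($1, $2);
    $content{$tag} = $tagcontent;
    print "$tag: $tagcontent\n";
}
```

With this approach a nested element such as <person> would come back as a single string that still contains its child tags, so it only suits flat input.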
I tested this at http://www.regexe.com/, but had no luck getting it to match.