PHP regex code to extract FDF data

I am trying to parse an FDF file using PHP and regex, but I just can't get my head around regex. I am stuck parsing the file to generate an array.

%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/V ([email protected])
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

Current function (source:http://php.net/manual/en/ref.fdf.php)

function parse2($file) {
 if (!preg_match_all("/<<\s*\/V([^>]*)>>/x", $file,$out,PREG_SET_ORDER))
         return;
 for ($i=0;$i<count($out);$i++) {
         $pattern = "<<.*/V\s*(.*)\s*/T\s*(.*)\s*>>";
         $thing = $out[$i][1];
         if (eregi($pattern,$out[$i][0],$regs)) {
                 $key = $regs[2];
                 $val = $regs[1];
                 $key = preg_replace("/^\s*\(/","",$key);
                 $key = preg_replace("/\)$/","",$key);
                 $key = preg_replace("/\\\/","",$key);
                 $val = preg_replace("/^\s*\(/","",$val);
                 $val = preg_replace("/\)$/","",$val);
                 $matches[$key] = $val;
         }
 }
 return $matches;
}

Result:

Array
(
    [field_email)
    ] => [email protected])

    [field_name)
    ] => John)

    [field_reference)
    ] => )

)

Why does it include the ) and the newline? I know this problem is trivial for someone who understands regular expressions, so help would be appreciated.

Cablegram answered 10/8, 2013 at 12:42 Comment(0)

Description

Your initial expression simply finds the entire block of text that represents each key and value set. Then, in your cleanup section, you look for a close paren followed immediately by the end of the string (\)$), but I'm sure there are additional characters, such as the line break, between the close paren and the end of the string.
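
As a rough illustration of that point, here is a minimal sketch. It assumes the captured text ends with a Windows-style line break (\r\n), which is only a guess about the actual file; the variable names are made up for the example.

```php
<?php
// Assumption: the captured value ends with ")" followed by a CRLF line break.
$captured = "John)\r\n";

// Original cleanup: ")" must sit directly before the end of the string
// (or before a single final "\n"). The "\r" in between prevents a match.
$strict = preg_replace('/\)$/', '', $captured);

// Allowing optional trailing whitespace removes the paren and the line break.
$lenient = preg_replace('/\)\s*$/', '', $captured);

var_dump($strict === $captured); // bool(true): nothing was removed
var_dump($lenient);              // string(4) "John"
```

If the trailing characters in your file are different, the same idea applies: the cleanup pattern has to account for whatever sits between the paren and the end of the string.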

Instead I'd handle all this in one operation. This expression will:

  • find the field value
    • trim the surrounding parens off
    • and place into capture group 1
  • find the field name
    • trim the field_ substring off
    • trim the surrounding parens off
    • and place into capture group 2
  • require the options case-insensitive and multi-line

^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)
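
In PHP, the expression could be used roughly like this. It is a sketch only: the function name parseFdfFields and the shortened sample string are made up for the example, and it assumes every field name starts with field_ and that /V comes directly before /T, as in the file above.

```php
<?php
// Hypothetical helper; $fdf is the raw FDF text, e.g. from file_get_contents().
function parseFdfFields(string $fdf): array
{
    // i = case-insensitive, m = "^" matches at the start of every line.
    $pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)/im';

    $fields = [];
    if (preg_match_all($pattern, $fdf, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // $match[1] is the value, $match[2] the name without "field_".
            $fields[$match[2]] = $match[1];
        }
    }
    return $fields;
}

// Shortened sample in the same layout as the FDF file above.
$sample = "<<\n/V (John)\n/T (field_name)\n>> \n<<\n/V ()\n/T (field_reference)\n>>]\n";
print_r(parseFdfFields($sample)); // keys "name" and "reference"
```

Returning an empty array when nothing matches avoids the undefined $matches variable that the original function can return.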

Example

Sample Text

%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/V ([email protected])
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

Matches

[0][0] = /V ([email protected])
/T (field_email)
[0][1] = [email protected]
[0][2] = email

[1][0] = /V (John)
/T (field_name)
[1][1] = John
[1][2] = name

[2][0] = /V ()
/T (field_reference)
[2][1] = 
[2][2] = reference



Or

If you want to retain the field_ substring, you can simply remove it from the expression like so:

^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)

Pillar answered 10/8, 2013 at 14:23 Comment(2)
Adding the ims flags, the regex works perfectly in PHP: preg_match_all("/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)/ims", $file, $out, PREG_SET_ORDER); Also, in the meantime I found debuggex.com, which is great for debugging. (Cablegram)
Regexes are not great for parsing FDF; e.g. Chrome submits FDF as [<</T(field)/V(value)>>]. (Rearrange)
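
To cope with the compact form mentioned in the last comment, one option is to find each innermost << ... >> dictionary first and then look up /T and /V separately, so that neither their order nor the whitespace matters. This is only a sketch under simple assumptions: the function name parseFdfFieldsLoose is made up, and it does not handle escaped parentheses, hex strings, or values that contain < or >.

```php
<?php
// Hypothetical helper; a sketch, not a full FDF parser.
function parseFdfFieldsLoose(string $fdf): array
{
    $fields = [];
    // Grab each innermost << ... >> dictionary (no "<" or ">" inside).
    preg_match_all('/<<([^<>]*)>>/', $fdf, $dicts);
    foreach ($dicts[1] as $body) {
        // Look up /T and /V independently so their order does not matter.
        if (preg_match('/\/T\s*\(([^)]*)\)/', $body, $t)
            && preg_match('/\/V\s*\(([^)]*)\)/', $body, $v)) {
            $fields[$t[1]] = $v[1];
        }
    }
    return $fields;
}

$compact = '[<</T(field)/V(value)>>]';            // layout from the comment
$spaced  = "<<\n/V (John)\n/T (field_name)\n>>";  // layout from the question

print_r(parseFdfFieldsLoose($compact)); // key "field"
print_r(parseFdfFieldsLoose($spaced));  // key "field_name"
```

Dictionaries without both a /T and a /V entry, such as the trailer, are skipped.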
