Reading docx (Office Open XML) in PHP
E

7

8

I want to add a Word import function to our CMS. The only problem is that I cannot seem to find a good library for reading docx files (Word 2007).

Does anyone have recommendations? The library should be able to extract the content of the document and basic styling such as italic, bold, and superscript.

Thanks for your help

Endocardium answered 1/10, 2009 at 2:34 Comment(0)
D
2

Or, since you requested a library, you may want to look into something like Docvert. I was just looking around based on your question, and it's my favorite so far for PHP. You give it the location of the Word file, and it transforms the document into something simple that keeps the attributes.

Disciple answered 1/10, 2009 at 3:11 Comment(1)
Looks promising, but I would have to build an API around it. – Endocardium
D
11

docx files are actually just zip containers for the document's XML. Unzip the docx file, go to the word folder inside, and open document.xml, which holds the actual text. Things like fonts and styles live in other XML files in the docx container (word/styles.xml, for example), so you'll probably need to experiment a bit to work out how the parts match up (start with the XML namespaces, I bet).

But yes, unzip the file, then use SimpleXML to load it into something you can actually work with.
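The unzip-and-parse approach above can be sketched roughly as follows. This is a minimal illustration, not a complete importer: it uses PHP's bundled ZipArchive and DOM extensions (DOMXPath rather than SimpleXML, only because namespace handling is a little more direct), and the function names readDocumentXml, isOn, and documentXmlToHtml are made up for this example. It assumes formatting is applied directly on each run; bold or italic inherited from a named style in word/styles.xml is not resolved, and only top-level paragraphs are read (tables are skipped).

```php
<?php
// Sketch: pull word/document.xml out of a .docx and map bold, italic and
// superscript runs to simple HTML tags. All function names are illustrative.

const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/** Return the raw XML of word/document.xml from a .docx archive. */
function readDocumentXml(string $docxPath): string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found in archive');
    }
    return $xml;
}

/** True when a toggle property such as w:b or w:i is present and not switched off. */
function isOn(DOMXPath $xp, DOMNode $run, string $prop): bool
{
    $nodes = $xp->query("w:rPr/w:$prop", $run);
    if ($nodes->length === 0) {
        return false;
    }
    // A missing w:val attribute means "on"; getAttributeNS returns '' in that case.
    $val = $nodes->item(0)->getAttributeNS(W_NS, 'val');
    return !in_array($val, ['0', 'false', 'off'], true);
}

/** Convert the paragraphs and runs of document.xml into minimal HTML. */
function documentXmlToHtml(string $xml): string
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xp = new DOMXPath($dom);
    $xp->registerNamespace('w', W_NS);

    $html = '';
    foreach ($xp->query('//w:body/w:p') as $p) {        // top-level paragraphs only
        $html .= '<p>';
        foreach ($xp->query('w:r', $p) as $r) {          // runs share one set of formatting
            $text = '';
            foreach ($xp->query('w:t', $r) as $t) {
                $text .= $t->textContent;
            }
            if ($text === '') {
                continue;
            }
            $text = htmlspecialchars($text);
            if (isOn($xp, $r, 'b')) {
                $text = "<strong>$text</strong>";
            }
            if (isOn($xp, $r, 'i')) {
                $text = "<em>$text</em>";
            }
            $va = $xp->query('w:rPr/w:vertAlign', $r);
            if ($va->length > 0
                && $va->item(0)->getAttributeNS(W_NS, 'val') === 'superscript') {
                $text = "<sup>$text</sup>";
            }
            $html .= $text;
        }
        $html .= "</p>\n";
    }
    return $html;
}
```

Usage would look something like `echo documentXmlToHtml(readDocumentXml('report.docx'));`, where report.docx is a placeholder path. Splitting the zip step from the XML step keeps the conversion easy to try on a hand-written XML string.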

Disciple answered 1/10, 2009 at 3:2 Comment(3)
Thanks, but I am wondering whether someone has already written a library to do that. I would do some XSLT processing if I really had to. – Endocardium
See my other answer. The only thing I don't like about it is the lack of an easy-to-find API. – Disciple
The TbsZip class can read (and even edit) the contents of zip archives without any dependencies or temporary files. XML analysis can be done with several other tools. – Raver
I
4

There is a library for this, but it works with the Zend Framework; maybe it will help you. It is called phpLiveDocx: http://www.phplivedocx.org/downloads/ The library is licensed under the New BSD license.

Impervious answered 1/10, 2009 at 7:19 Comment(0)
P
4

PHPDocX PRO includes a TransformDoc class that can read .docx (zip) files and generate XHTML (or PDF) from them:

...
require_once 'phpdocx_pro/classes/TransformDoc.inc';
$doc = new TransformDoc();
$doc->setStrFile($file->filepath);
$doc->generateXHTML();
$html = $doc->getStrXHTML();
Petrol answered 9/6, 2011 at 18:0 Comment(0)
I
3

I have just found a library that supports both reading and writing. It is hosted on CodePlex at http://openxmlapi.codeplex.com and is licensed under GPLv2.

Impervious answered 2/10, 2009 at 13:32 Comment(0)
M
0

Convert the docx document to ODT using OpenOffice, then use eZ Components to do the parsing and import. They actually use this import in their CMS, eZ Publish.

Masera answered 20/1, 2010 at 11:20 Comment(0)
L
0

Here is a simple working solution I found

http://webcheatsheet.com/php/reading_the_clean_text_from_docx_odt.php

Lyrate answered 31/7, 2012 at 11:14 Comment(0)