Convert .odt .doc .ods files to .txt files
S

8

21

I want to convert all the .odt .doc .xls .pdf files to .txt files.

I want to convert these files to text files using a shell script or a Perl script.

Stearoptene answered 14/10, 2009 at 3:53 Comment(0)
J
29

There's a program for .odt files and the like:

odt2txt - available in the repos.
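For a whole directory, a loop along these lines might do. It is only a sketch: it assumes odt2txt is installed and prints the extracted text to standard output when given a file name, and convert_odt_dir is a name made up for this example, not something odt2txt provides.

```shell
#!/bin/sh
# convert_odt_dir: hypothetical helper, not part of odt2txt itself.
# Writes NAME.txt next to every NAME.odt in the given directory.
convert_odt_dir() {
    dir=${1:-.}
    for f in "$dir"/*.odt; do
        [ -e "$f" ] || continue          # the glob matched nothing
        out="${f%.odt}.txt"
        if odt2txt "$f" > "$out"; then
            echo "converted: $f"
        else
            rm -f "$out"                 # do not leave an empty file behind
            echo "failed: $f" >&2
        fi
    done
}

# Run on the directory given as the first argument, if any.
if [ "$#" -gt 0 ]; then
    convert_odt_dir "$1"
fi
```

Saved as, say, odt-all.sh, it would be called as "sh odt-all.sh ~/Documents".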

Jordanjordana answered 29/4, 2010 at 10:24 Comment(1)
This is a quite lightweight program and does the job pretty well. Thanks!Troupe
M
14
$ unoconv --format=txt document1.odt

Should produce document1.txt.
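To run that over many files, something like the following sketch could work. It assumes unoconv is on the PATH and writes the .txt next to each input, as in the command above. to_txt_unoconv is a hypothetical helper name. Only .odt and .doc are handled here, because I am not sure the txt format applies to spreadsheets.

```shell
#!/bin/sh
# to_txt_unoconv: hypothetical helper that calls unoconv once per file,
# so that one bad document does not stop the rest.
to_txt_unoconv() {
    status=0
    for f in "$@"; do
        case $f in
            *.odt|*.doc)
                unoconv --format=txt "$f" || {
                    echo "failed: $f" >&2
                    status=1
                }
                ;;
            *)
                echo "skipped (not .odt/.doc): $f" >&2
                ;;
        esac
    done
    return "$status"
}

# Convert the files named on the command line, if any.
if [ "$#" -gt 0 ]; then
    to_txt_unoconv "$@"
fi
```

Called with a glob (for example "sh to-txt.sh *.odt *.doc"), it returns a non-zero status if any conversion failed.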

Merocrine answered 20/7, 2010 at 19:45 Comment(1)
In case someone is wondering, in LibreOffice the following command works (sometimes): soffice --headless --convert-to txt *odt Source: [ask.libreoffice.org/en/question/1671/…Bangka
O
2

OpenOffice has a built-in document converter capable of handling a bunch of formats. Take a look at unoconv: http://dag.wieers.com/home-made/unoconv/

That being said, I have had some trouble getting it to work in the past. If you run into problems, take a look at similar programs for AbiWord (another open source word processor).

Omnivore answered 14/10, 2009 at 4:10 Comment(0)
R
1

It's certainly possible to do this, though there is something strange and impenetrable about the OO project and its documentation that makes things like this hard to research and follow. However, OO has the capability to convert all of those types, not just the OO native ones, and it can do it via two different forms of automatic control.

These are the two general approaches.

  1. You can start OO and tell it to execute a macro which does this job for you for a given file. You then just have to write the macro and a script to loop over your files. The syntax is something like

    $ oowriter -headless filename macro://dir/Standard.Module1.sMySub

  2. The other thing OO has is a network API. This is based on something called UNO.

    $ oowriter -accept=accept-string

    Notifies the OpenOffice.org software that upon the creation of
    "UNO Acceptor Threads", a "UNO Accept String" will be used.

You will need some sort of client library. I think they have one for Python at least. Using this technology a Python program or some other scripting language with an OO client library could drive the program and convert all the files. Since OO reads MSO, it should be able to do all of them.
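The "script to loop over your files" part can be sketched in shell. Note that this does not use a macro or a UNO client. It swaps in the --headless --convert-to switches of LibreOffice's soffice that are quoted in a comment on another answer, plus --outdir. convert_office_dir is a hypothetical name, and exporting spreadsheets as csv rather than txt is my assumption about what works for sheets. PDF input is not handled.

```shell
#!/bin/sh
# convert_office_dir SRC [OUT]: hypothetical helper.
# Converts the office files found directly in SRC and writes results to OUT.
convert_office_dir() {
    src=${1:-.}
    out=${2:-$src/txt}
    mkdir -p "$out" || return 1
    for f in "$src"/*; do
        case $f in
            *.odt|*.doc) fmt=txt ;;
            *.ods|*.xls) fmt=csv ;;   # assumption: CSV as the plain-text form of a sheet
            *) continue ;;
        esac
        [ -f "$f" ] || continue
        soffice --headless --convert-to "$fmt" --outdir "$out" "$f" ||
            echo "failed: $f" >&2
    done
}

# Usage: sh convert-office.sh SRC [OUT]
if [ "$#" -gt 0 ]; then
    convert_office_dir "$@"
fi
```

One file per soffice call is slower than passing them all at once, but it makes it easier to see which document failed.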

Residuum answered 14/10, 2009 at 4:18 Comment(2)
Hi, I could not follow you. Could you be more specific? Please help me, as I need to convert odt files to txt files as soon as possibleStearoptene
OK, I've updated my answer to make things clearer. I will add some more stuff here later today, come back in 6 or 12 hours...Residuum
C
1

For Word documents, you can try antiword, at least on Linux. It's a command-line utility that takes a Word document as an argument and writes the text from that document (as best as it can figure it out) to standard output. Maybe you can specify an output file too; I can't remember the details, as I haven't used it in a while. I'm not sure whether it can handle OO documents.
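Since the text goes to standard output, plain shell redirection is enough to get a file. A rough sketch, assuming antiword is installed; doc_to_txt is a name invented for this example:

```shell
#!/bin/sh
# doc_to_txt: hypothetical helper. For each FILE.doc given, redirect
# antiword's standard output into FILE.txt.
doc_to_txt() {
    for f in "$@"; do
        out="${f%.doc}.txt"
        if antiword "$f" > "$out"; then
            echo "converted: $f"
        else
            rm -f "$out"                 # drop the partial output
            echo "failed: $f" >&2
        fi
    done
}

# Convert the files named on the command line, if any.
if [ "$#" -gt 0 ]; then
    doc_to_txt "$@"
fi
```

For example: sh doc-to-txt.sh *.doc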

Competent answered 14/10, 2009 at 4:31 Comment(0)
A
1

Open the file in LibreOffice. Click "File", then "Save As", and scroll down to find the text option. Select it and the file will be saved as a text file.

FYI, I had an .odt file that was 339.2 KB in size. When I saved it as text, the file shrank to only 5.0 KB. That is another reason for saving your files as text files.

Adjudge answered 4/8, 2020 at 13:58 Comment(0)
M
0

For the Microsoft formats, look into the wvWare tools.
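As far as I know, the wv package ships a wvText command that takes an input .doc and an output file name. A small sketch under that assumption; wv_to_txt is a hypothetical wrapper name:

```shell
#!/bin/sh
# wv_to_txt SRC [DEST]: hypothetical wrapper around wvText.
# DEST defaults to SRC with .doc replaced by .txt.
wv_to_txt() {
    src=$1
    dest=${2:-${src%.doc}.txt}
    if [ ! -r "$src" ]; then
        echo "cannot read: $src" >&2
        return 1
    fi
    wvText "$src" "$dest"
}

# Usage: sh wv-to-txt.sh report.doc [report.txt]
if [ "$#" -gt 0 ]; then
    wv_to_txt "$@"
fi
```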

Mizell answered 14/10, 2009 at 4:38 Comment(0)
T
-1

Open the .ods file normally in LibreOffice

Highlight text to be converted

Open a terminal

Run vi

Press "i" to get insert mode

Press ctrl-shift-v

Done!

Need some formatting?

Save the file as filename

Get out of vi

Run:

$ cat filename | column > filename2

This worked in openSUSE running KDE

Substitute "kwrite" for "vi", if you want

Timothy answered 21/4, 2014 at 1:21 Comment(0)
