RTF to Plain Text in Java

4

12

How do you convert an RTF string to plain text in Java? The obvious answer is to use Swing's RTFEditorKit, and that seems to be the common answer around the Internet. However, the write method that claims to return plain text isn't actually implemented: in Java 6 it's hard-coded to just throw an IOException.

Coarsegrained answered 28/4, 2011 at 22:33 Comment(0)
C
21

I use Swing's RTFEditorKit in Java 6 like this:

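    import java.io.ByteArrayInputStream;
    import javax.swing.text.Document;
    import javax.swing.text.rtf.RTFEditorKit;

    RTFEditorKit rtfParser = new RTFEditorKit();
    Document document = rtfParser.createDefaultDocument();
    rtfParser.read(new ByteArrayInputStream(rtfBytes), document, 0);
    String text = document.getText(0, document.getLength());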

and that's working.
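For anyone who wants to try the snippet above in isolation, here is a minimal sketch that wraps it in a helper and feeds it a tiny in-memory RTF document. The class name RtfText, the method name toPlainText, and the sample input are my own assumptions for illustration, not part of the original answer. BadLocationException is rethrown as an IOException purely to keep the signature short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; the name RtfText is not from the original answer.
final class RtfText {

    // Parses RTF bytes with Swing's RTFEditorKit and returns the document text.
    static String toPlainText(byte[] rtfBytes) throws IOException {
        RTFEditorKit rtfParser = new RTFEditorKit();
        Document document = rtfParser.createDefaultDocument();
        try {
            rtfParser.read(new ByteArrayInputStream(rtfBytes), document, 0);
            return document.getText(0, document.getLength());
        } catch (BadLocationException e) {
            // Not expected when reading at offset 0 into a fresh document.
            throw new IOException("Could not read RTF into the document", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample input chosen for illustration only.
        byte[] rtf = "{\\rtf1\\ansi Hello, World!}".getBytes(StandardCharsets.US_ASCII);
        // trim() drops any surrounding whitespace the parser may add.
        System.out.println(toPlainText(rtf).trim());
    }
}
```

The helper takes bytes rather than a String because the answer's snippet reads from an InputStream. If you start from a String, you have to pick the encoding for getBytes yourself.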

Cunha answered 28/4, 2011 at 23:27 Comment(4)
Got it working using this solution, though it didn't work at first either. It turns out that my input data was invalid, and the conversion was failing silently and returning an empty string. - Coarsegrained
It works for me, but for some reason the text comes out with dropped characters. - Frill
It works fine on Windows, but on *nix platforms it depends on an X11 window server. - Kryska
It doesn't work with the "\line" code. - Visit
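Regarding the silent failure on invalid input mentioned in the comments: one way to surface it is to check that the data at least starts with the "{\rtf" header before handing it to the parser. This is only a rough sketch, and the names RtfGuard and looksLikeRtf are hypothetical. It checks the first five bytes and nothing else, so it will not catch a document that is malformed further in, and it rejects input with leading whitespace or a byte order mark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical guard; the class and method names are illustrative only.
final class RtfGuard {

    private static final byte[] RTF_HEADER = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    // Returns true when the data starts with the "{\rtf" header that opens an RTF document.
    static boolean looksLikeRtf(byte[] data) {
        if (data == null || data.length < RTF_HEADER.length) {
            return false;
        }
        for (int i = 0; i < RTF_HEADER.length; i++) {
            if (data[i] != RTF_HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] good = "{\\rtf1\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);
        byte[] bad = "Hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeRtf(good)); // true
        System.out.println(looksLikeRtf(bad));  // false
    }
}
```

Calling this before the conversion lets you throw or log a clear error instead of getting back an empty string.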
M
6

Try Apache Tika: http://tika.apache.org/0.9/formats.html#Rich_Text_Format

Mapel answered 28/4, 2011 at 22:45 Comment(2)
Tika uses 'RTFEditorKit' on the backend. - Weirick
Tika is just for plain text and metadata, am I right? - Consentaneous
H
2

You might consider RTF Parser Kit as a lightweight alternative to the Swing RTFEditorKit. The line below shows plain text extraction from an RTF file: the RTF is read from the input stream, and the extracted text is written to the output stream.

    new StreamTextConverter().convert(new RtfStreamSource(inputStream), outputStream, "UTF-8");

(full disclosure: I'm the author of RTF Parser Kit)

Harder answered 13/1, 2017 at 12:36 Comment(4)
Nice job! But "Kotlin: Unresolved reference: MyRtfListener" - Visit
Does it mean that I must implement IRtfListener myself? - Visit
@CatherineIvanova the one-line example above will extract plain text for you, so there is no need to implement the listener. I think your reference to MyRtfListener comes from the RTF Parser Kit readme, which illustrates the case where you'd provide your own listener. - Harder
@JonIles thanks for the parser project! Could you also take a look at question #74112241? - Kudva
M
0

Here is the full code to parse an RTF file and write it out as plain text:

    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.Document;
    import javax.swing.text.rtf.RTFEditorKit;

    public class rtfToJson {
        public static void main(String[] args) throws IOException, BadLocationException {
            RTFEditorKit rtf = new RTFEditorKit();
            Document doc = rtf.createDefaultDocument();

            // Parse the RTF file into the Swing document model; the reader is closed automatically.
            try (InputStreamReader in =
                    new InputStreamReader(new FileInputStream("C:\\SampleINCData.rtf"), "UTF-8")) {
                rtf.read(in, doc, 0);
            }
            String text = doc.getText(0, doc.getLength());

            // Write the extracted plain text; report success only if the write completed.
            try (FileWriter fw = new FileWriter("B:\\Sample INC Data.txt")) {
                fw.write(text);
                System.out.println("Success...");
            } catch (IOException e) {
                System.out.println(e);
            }
        }
    }
Martine answered 29/6, 2018 at 8:11 Comment(0)
