C# .net converting HTML to RTF [closed]
Asked Answered
B

4

15

There's another post, "HTML to RTF Converter for .NET", but are there any open source converters or tutorials? I don't want to use Sautinsoft. I think there is a solution on ExpertsExchange, but I would have to pay for it. Most of the search results on Google point to RTF-to-HTML converters, not HTML-to-RTF converters.

Battat answered 7/5, 2011 at 16:57 Comment(0)
H
5

The ExpertsExchange article is a poor one at best. Basically, the OP gave up because nobody could give a good answer. The thread links to a CodeProject article ( http://www.codeproject.com/KB/HTML/XHTML2RTF.aspx ) that shows how to convert HTML to RTF, but it isn't really a .NET solution and would need heavy adaptation.

From my experience, there isn't a good open source converter out there. The pieces all seem to be there, but someone still has to do the legwork of putting them together. So the immediate answer to your question is that no ready-made converter exists.

Hailstone answered 7/5, 2011 at 17:13 Comment(2)
I just went through this learning experience, and opted to use PERL which DOES have a good off the shelf, OSS, solution. (HTML::FormatRTF)Principally
@Jason D - Good to know.Hailstone
D
23

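Create a WebBrowser, load it with the HTML content, select all and copy. Then paste into a RichTextBox, and you have the RTF.

using System.Windows.Forms;

// Must run on an STA thread, and it overwrites the user's clipboard.
string html = "...."; // HTML content
RichTextBox rtbTemp = new RichTextBox();
WebBrowser wb = new WebBrowser();
wb.Navigate("about:blank");
// wb.Document can be null until the blank page has finished loading.
while (wb.ReadyState != WebBrowserReadyState.Complete)
    Application.DoEvents();

wb.Document.Write(html);
wb.Document.ExecCommand("SelectAll", false, null);
wb.Document.ExecCommand("Copy", false, null);

rtbTemp.SelectAll();
rtbTemp.Paste();

Now rtbTemp.Rtf holds the RTF converted from the HTML.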

Downwards answered 1/5, 2013 at 10:41 Comment(6)
@frenchone For these, you can use the Word interop, simulate paste into a Word document, then copy and paste into a richtextbox, then get the rtf.Downwards
Thanks for your comment, but we are trying to remove our "MS Office" dependency. Your solution looked simple even if it requires a WinForms reference (while our project is a console one). Too bad there's no dedicated system DLL to do the conversion, and that RichTextBox doesn't behave like WordPad, which gets the conversion right.Centaurus
@Centaurus Hyperlinks and tables will look better if you use richtextbox v5, instead of the default one in VS which is 4. But it won't fix the problem with images.Downwards
This is a very helpful answer.Carrico
This is clever. Very clunky, but still clever! Not sure it would scale...Mame
This does not work at all.Clair
I
13

TL;DR: I recommend using the OpenXml format and the HtmlToOpenXml NuGet package if possible.


Microsoft Word COM

I haven't looked into this option much, because my use case is server-side, where COM components are not a great choice.


XHTML2RTF

As @IAmTimCorey mentioned, you can use this CodeProject library.

Disadvantages are:

  • Limited supported HTML and CSS
  • Not really .NET
  • ...

Windows Forms Web Browser

As @Jerry mentioned, you can use the Windows Forms WebBrowser control.

Disadvantages are:

  • Reference to System.Windows.Forms
  • Uses copy & paste (problematic for multithreading)
  • Only works in an STA thread

Not supported features include:

  • Fonts
  • Colors
  • Numbered lists
  • Strikethrough (del element)
  • ...
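
Since the WebBrowser and clipboard approach only works on an STA thread, a console or server process has to create such a thread itself. The following is a minimal sketch of that idea, not a definitive implementation. StaRunner is a hypothetical helper name that does not come from any of the libraries mentioned here, and STA apartments are only available on Windows.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a delegate on a dedicated STA thread.
public static class StaRunner
{
    public static T Run<T>(Func<T> func)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            // Capture exceptions so they can be rethrown on the calling thread.
            try { result = func(); }
            catch (Exception ex) { error = ex; }
        });

        // The apartment state must be set before the thread is started.
        // STA is a Windows concept; other platforms do not support it.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join(); // block until the work has finished

        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);
        return result;
    }
}
```

It could be called as StaRunner.Run(() => ConvertWithWebBrowser(html)), where ConvertWithWebBrowser stands for your own method wrapping the WebBrowser and RichTextBox code. Each call blocks the caller and still goes through the shared clipboard, so concurrent conversions would need additional locking.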

DevExpress

Code sample by "Paul V" from the DevExpress Support Center (03.02.2015):

public String ConvertRTFToHTML(String RTF)
{   
    MemoryStream ms = new MemoryStream();
    StreamWriter writer = new StreamWriter(ms);
    writer.Write(RTF);
    writer.Flush();
    ms.Position = 0;
    String output = "";
    HtmlEditorExtension.Import(HtmlEditorImportFormat.Rtf, ms, (s, enumerable) => output = s);

    return output;
}

public String ConvertHTMLToRTF(String html)
{
    MemoryStream ms = new MemoryStream();
    var editor = new ASPxHtmlEditor { Html = html };

    editor.Export(HtmlEditorExportFormat.Rtf, ms);

    ms.Position = 0;
    StreamReader reader = new StreamReader(ms);

    return reader.ReadToEnd();
}

Or you could use the RichEditDocumentServer type as shown in this example.

It is unclear which HTML features are actually supported.

Disadvantages are:

  • Price
  • Quite a lot of references for one small thing
  • More?

Not supported features include:

  • Strikethrough (del element)

Sautinsoft

public string ConvertHTMLToRTF(string html)
{
    SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();
    return h.ConvertString(html);
}

public string ConvertRTFToHTML(string rtf)
{
    SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
    byte[] bytes = Encoding.ASCII.GetBytes(rtf);
    r.OpenDocx(bytes );
    return r.ToHtml();
}

More examples and configuration options can be found here and here.

The following are supported:

  • HTML 3.2
  • HTML 4.01
  • HTML 5
  • CSS
  • XHTML

Disadvantages are:

  • I'm not sure how active the development is
  • Price

Usage knowledgebase:


DIY

If you only need to support limited functionality, you could write your own converter. I would not recommend this if the feature set you need is large. (Sautinsoft claims to have written over 20,000 lines of code.)

I have a small sample project here, but it is only for educational purposes in its current state.
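
To give an idea of what a DIY converter involves, here is a minimal sketch that handles only bold, italic, underline, line breaks and paragraph ends. It is not the sample project mentioned above. TinyHtmlToRtf and ToRtf are hypothetical names, the Arial font table is an arbitrary assumption, and every other tag (as well as CSS, lists, tables and images) is silently dropped.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical, deliberately tiny HTML-to-RTF converter for illustration only.
public static class TinyHtmlToRtf
{
    // Matches either a tag (group "tag" holds its name, with a leading '/'
    // for closing tags) or a run of text between tags (group "text").
    private static readonly Regex Token = new Regex(
        @"<(?<tag>/?[a-zA-Z][a-zA-Z0-9]*)[^>]*>|(?<text>[^<]+)");

    public static string ToRtf(string html)
    {
        // Minimal RTF header with a single font (assumed: Arial).
        var rtf = new StringBuilder(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0 ");
        foreach (Match m in Token.Matches(html))
        {
            if (m.Groups["text"].Success)
            {
                // Decode entities such as &amp; before escaping for RTF.
                AppendEscaped(rtf, WebUtility.HtmlDecode(m.Groups["text"].Value));
                continue;
            }
            switch (m.Groups["tag"].Value.ToLowerInvariant())
            {
                case "b": case "strong": rtf.Append(@"\b "); break;
                case "/b": case "/strong": rtf.Append(@"\b0 "); break;
                case "i": case "em": rtf.Append(@"\i "); break;
                case "/i": case "/em": rtf.Append(@"\i0 "); break;
                case "u": rtf.Append(@"\ul "); break;
                case "/u": rtf.Append(@"\ulnone "); break;
                case "br": rtf.Append(@"\line "); break;
                case "/p": rtf.Append(@"\par "); break;
                // Any other tag is ignored.
            }
        }
        return rtf.Append('}').ToString();
    }

    private static void AppendEscaped(StringBuilder rtf, string text)
    {
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
                rtf.Append('\\').Append(c);          // RTF special characters
            else if (c == '\r' || c == '\n')
                rtf.Append(' ');                     // HTML treats line breaks as whitespace
            else if (c < 128)
                rtf.Append(c);                       // plain ASCII passes through
            else
                rtf.Append(@"\u").Append(unchecked((short)c)).Append('?'); // Unicode escape with '?' fallback
        }
    }
}
```

For example, TinyHtmlToRtf.ToRtf("<p>Hello <b>World</b></p>") returns {\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0 Hello \b World\b0 \par }. A regex tokenizer like this breaks on comments, scripts and malformed markup, which is one reason the effort grows quickly once more of HTML has to be covered.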


OpenXml

If the OpenXml format is also OK for your use case, you can use the HtmlToOpenXml NuGet package. It's free and supported all the features I tested the other solutions against.

The project is based on the Open XML SDK by Microsoft and seems active.

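using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml;

// Place this method inside a class; it returns the bytes of a .docx file.
public static byte[] ConvertHtmlToOpenXml(string html)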
{
    using (var generatedDocument = new MemoryStream())
    {
        using (var package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
        {
            var mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            var converter = new HtmlConverter(mainPart);
            converter.ParseHtml(html);

            mainPart.Document.Save();
        }

        return generatedDocument.ToArray();
    }
}

Interdisciplinary answered 9/4, 2018 at 9:11 Comment(2)
Great answer. In the OpenXml section you create a docx file. Is there a chance to get an example that produces RTF with OpenXml?Allcot
That's the catch: OpenXml doesn't support RTF. So if you can, I would recommend not using RTF.Interdisciplinary
T
2

There seems to be a new open source solution based on a WPF RichTextBox. The only caveat is that its core only supports STA-threaded applications, so to use it in, for example, ASP.NET you need to call it on an STA thread (there is a sample for that in the writeup).

It is confirmed to work in VSTO add-ins (e.g. for Outlook's RTFBody).

Nuget: https://www.nuget.org/packages/MarkupConverter/

Project: https://github.com/figuemon/MarkupConverter

Writeup: https://code.msdn.microsoft.com/Converting-between-RTF-and-aaa02a6e

Territory answered 25/3, 2019 at 13:3 Comment(1)
I was using MarkupConverter which works nicely but it was having strange effects on my app when users' screens were set to a scale of anything greater than 100%. So far this is working for me, nice job!Canvass
