HTML to RTF Converter for .NET [closed]

I've already seen lots of posts on the site for RTF to HTML and some other posts talking about some HTML to RTF converters, but I'm really trying to get a full breakdown of what is considered the most widely used commercial product, open source product or if people recommend going home grown. Apologies if you consider this a duplicate question, but I'm trying to create a product matrix to see what is the most viable for our application. I also think this would be helpful for others.

The converter would be used in an ASP.NET 2.0 application (we're upgrading to 3.5 shortly but still sticking with WebForms) using SQL Server 2005 (soon 2008) as the DB.

From reading a few posts, SautinSoft appears to be popular as a commercial component. Are there other commercial components that you'd recommend for converting HTML to RTF? Price does matter, but even if it's a little on the expensive side, please list it.

For open source, I read that OpenOffice.org can be run as a service so that it can convert files. However, this appears to be Java-based only. I imagine I'd need some kind of interop to use it? What .NET open source components, if any, are out there for converting HTML to RTF?

For home grown, is XSLT over XHTML the way to go? If so, what component do you recommend for generating the XHTML? Otherwise, what other home-grown avenues do you recommend?

Also, please note that I currently don't care so much about RTF to HTML. If a commercial component offers this and the price is still the same, fine, otherwise please don't mention it.

Mechanician answered 11/1, 2010 at 15:42 Comment(7)
Could I get more background on the technical task at hand? Basically, why are you doing this? What program is going to view the RTF end-product? - Aphrodisia
@Albert. Data is pulled from a DB to generate an RTF report. All the RTF formatting is currently done in the report (hard-coded... ouch!) based on a spec, but in a few instances the client wants to format some sections, so we'll give them a rich text editor in the web app, and when they save, I'll convert it to a chunk of formatted RTF that will be pulled from the DB and inserted into the report. - Mechanician
Um... I'm totally confused. I'm trying to understand the data flow and conversion here. So far I have the following: DB -> RTF -> RTF* -> DB. But that doesn't make sense, as it would seem to imply you have an RTF parser that can grep and dump to the DB. Unless you mean the DB holds RTF data? - Aphrodisia
@Aphrodisia - Non-RTF data is currently stored in the database. When we generate a report, we format the data from the database via a C# class that formats it as RTF. The fields that I'll be adding to the report are going to be stored in the database already formatted as RTF. To store them as RTF, I need to convert the HTML from the RTE that's being posted back into RTF. Clear? :) - Mechanician
Oh, okay. Here's a solution which you'll hate as it involves time/money, but I think it would be better: dump RTF for DOCX. There are many tools for DOCX: Microsoft OOXML SDK v2.0 (msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx) and Aspose.Words for .NET (aspose.com). These tools would simplify your life, as you would avoid HTML completely. In fact, there are a few companies offering web-based DOCX editing. Hopefully, dumping changes back into the DB would be simple (well, okay, simpler). Again, you'll probably hate this approach. - Aphrodisia
I hear ya, but the client currently demands RTF. Maybe in the future this could be a way to go. - Mechanician
BTW, for reference, 90% or more of the SautinSoft posts on this site are by the company themselves or their shills. They seem to luuuurve them some astroturfing. It's worked, apparently -- someone thinks they're actually legit. There are currently 7 posts mentioning SautinSoft, though, not counting the dozens to hundreds that have been deleted as spam. - Precursor

I would recommend doing it yourself, as the task is not really that complex. Firstly, the easiest way to convert one XML format into another is with XSLT; RTF itself isn't XML, but an XSLT stylesheet with text output can still emit it from XHTML input. Running XSLT transforms in C# is super easy.
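
To make the XSLT route concrete, here is a minimal sketch using .NET's `XslCompiledTransform`; it is not the code from the MSDN post. It assumes the input is already well-formed XHTML in the `http://www.w3.org/1999/xhtml` namespace without named entities such as `&nbsp;` (e.g. Tidy output), and it only knows `p`, `b`/`strong`, `i`/`em` and `br`. The class names and the `urn:example:rtf-escape` namespace are made up for illustration, and `DtdProcessing` requires .NET 4 or later.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Extension object called from the stylesheet as rtf:Escape(string(.)).
public sealed class RtfEscaper
{
    public string Escape(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}') sb.Append('\\').Append(c);
            else if (c == '\r' || c == '\n' || c == '\t') sb.Append(' '); // plain whitespace in HTML
            else if (c < 0x80) sb.Append(c);
            else sb.Append("\\u").Append((short)c).Append('?'); // signed 16-bit code unit + fallback
        }
        return sb.ToString();
    }
}

public static class XhtmlToRtf
{
    const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:h='http://www.w3.org/1999/xhtml' xmlns:rtf='urn:example:rtf-escape'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:text>{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\uc1 </xsl:text>
    <xsl:apply-templates select='h:html/h:body/node()'/>
    <xsl:text>}</xsl:text>
  </xsl:template>
  <xsl:template match='h:p'>
    <xsl:text>\pard </xsl:text><xsl:apply-templates/><xsl:text>\par </xsl:text>
  </xsl:template>
  <xsl:template match='h:b|h:strong'>
    <xsl:text>{\b </xsl:text><xsl:apply-templates/><xsl:text>}</xsl:text>
  </xsl:template>
  <xsl:template match='h:i|h:em'>
    <xsl:text>{\i </xsl:text><xsl:apply-templates/><xsl:text>}</xsl:text>
  </xsl:template>
  <xsl:template match='h:br'><xsl:text>\line </xsl:text></xsl:template>
  <!-- Drop indentation whitespace between block elements. -->
  <xsl:template match='h:body/text()[not(normalize-space())]'/>
  <!-- All other text is escaped by the C# extension object. -->
  <xsl:template match='text()'>
    <xsl:value-of select='rtf:Escape(string(.))'/>
  </xsl:template>
</xsl:stylesheet>";

    // Compiled once; Transform is thread-safe after Load, which suits ASP.NET.
    static readonly XslCompiledTransform Xslt = LoadStylesheet();

    static XslCompiledTransform LoadStylesheet()
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(reader);
        return xslt;
    }

    public static string ToRtf(string xhtml)
    {
        XsltArgumentList args = new XsltArgumentList();
        args.AddExtensionObject("urn:example:rtf-escape", new RtfEscaper());

        // Skip a DOCTYPE (as emitted by Tidy) instead of failing on it.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore;

        using (StringWriter output = new StringWriter())
        using (XmlReader input = XmlReader.Create(new StringReader(xhtml), settings))
        {
            Xslt.Transform(input, args, output);
            return output.ToString();
        }
    }
}
```

Escaping is done in C# rather than in XSLT 1.0 because character-by-character replacement in pure XSLT 1.0 needs recursive templates; any other element simply falls through to its text.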

Here is a good MSDN blog post to get you started. Mike even mentions that it was easier to do this by hand than to deal with a third party.

Actually, I already answered this question here. Guess that makes this a duplicate.

Dynamism answered 13/1, 2010 at 15:35 Comment(2)
@Ty - I have no problems going custom, just wondering what you'd recommend for converting to XHTML if the HTML isn't perfect. - Mechanician
@nickyt Messed-up HTML would make this job a real pain. I've done some apps where the HTML/RTF was controlled, but if you are going to see bold tags, strong tags, and sometimes tags that are not closed, you might need to look at a two-stage approach where you normalize the data first and then convert. I don't think you need to worry about XHTML. - Dynamism
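
Stage one of that two-stage approach might look like the following hypothetical sketch. It assumes the markup has already been made well-formed (for example by an HTML cleanup tool such as Tidy) and only collapses synonymous tags, so the converter in stage two has fewer cases to handle; `HtmlNormalizer` and its synonym list are illustrative, not from any library.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical stage-one normalizer: rewrites synonymous tags to one canonical name.
public static class HtmlNormalizer
{
    static readonly Dictionary<string, string> Synonyms = new Dictionary<string, string>
    {
        { "strong", "b" },
        { "em", "i" },
        { "strike", "s" }
    };

    public static void Normalize(XmlDocument doc)
    {
        // Snapshot the elements first: GetElementsByTagName returns a live list,
        // and replacing nodes while iterating it could skip elements.
        List<XmlElement> elements = new List<XmlElement>();
        foreach (XmlNode node in doc.GetElementsByTagName("*"))
            elements.Add((XmlElement)node);

        foreach (XmlElement element in elements)
        {
            string canonical;
            if (!Synonyms.TryGetValue(element.LocalName.ToLowerInvariant(), out canonical))
                continue;

            // Same prefix and namespace, new local name; attributes and children move across.
            XmlElement replacement = doc.CreateElement(element.Prefix, canonical, element.NamespaceURI);
            foreach (XmlAttribute attribute in element.Attributes)
                replacement.Attributes.Append((XmlAttribute)attribute.CloneNode(true));
            while (element.FirstChild != null)
                replacement.AppendChild(element.FirstChild); // moves the child node
            element.ParentNode.ReplaceChild(replacement, element);
        }
    }
}
```

For example, `<p><strong>a<em class="k">b</em></strong>c</p>` becomes `<p><b>a<i class="k">b</i></b>c</p>`, so the converter only needs rules for `b` and `i`.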

For what it's worth, and in no particular order:

A while ago I wanted to export to RTF and then import from RTF, with the RTF in question being manipulated in MS Word in between.

The first problem is that RTF is not an open standard. It is an internal MS format, so they alter it as and when they like and do not generally worry about compatibility. Currently the versions of RTF run from 1.3 to 1.9, and they are all different. Internally, measurements are in twips (1/20 of a point, i.e. 1/1440 inch), just for good measure.
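
Since twips catch people out, a small conversion helper can save some head-scratching. This is a hypothetical sketch (`RtfUnits` is not from any library): lengths such as the left indent `\liN` are in twips, while the font size `\fsN` is in half-points.

```csharp
using System;

// Hypothetical helper. 1 twip = 1/20 point = 1/1440 inch; \fs uses half-points.
public static class RtfUnits
{
    public const int TwipsPerPoint = 20;
    public const int TwipsPerInch = 1440;

    public static int PointsToTwips(double points)
    {
        return (int)Math.Round(points * TwipsPerPoint, MidpointRounding.AwayFromZero);
    }

    public static int InchesToTwips(double inches)
    {
        return (int)Math.Round(inches * TwipsPerInch, MidpointRounding.AwayFromZero);
    }

    // 1 inch = 2.54 cm, so 1 cm is roughly 567 twips.
    public static int CentimetresToTwips(double centimetres)
    {
        return InchesToTwips(centimetres / 2.54);
    }

    // \fsN takes half-points: a 12 pt font is \fs24.
    public static string FontSize(double points)
    {
        return "\\fs" + (int)Math.Round(points * 2, MidpointRounding.AwayFromZero);
    }

    // \liN takes twips: a half-inch left indent is \li720.
    public static string LeftIndent(double inches)
    {
        return "\\li" + InchesToTwips(inches);
    }
}
```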

I bought the O'Reilly pocket guide on the subject, which helped, and read a lot of the MS documentation, which is good, but there is a lot of it, with plenty for each version.

Because of the way RTF is encoded, using regex to manipulate it is incredibly hard work and needs careful handling and concentration to test and get working. I used a Mac editor with built-in regex support so I could steadily test each section and build it into the code.
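
An alternative to regex, and not what this answer used, is a small hand-written tokenizer that splits RTF into group braces, control words (with their optional numeric parameter), control symbols and text runs. The sketch below is hypothetical and deliberately incomplete: for example, it does not treat `\bin` binary data or `\*` destinations specially.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public enum RtfTokenKind { GroupStart, GroupEnd, ControlWord, ControlSymbol, Text }

public sealed class RtfToken
{
    public readonly RtfTokenKind Kind;
    public readonly string Value;   // word name, symbol character or text run
    public readonly int? Parameter; // numeric parameter, or the hex value of \'hh

    public RtfToken(RtfTokenKind kind, string value, int? parameter)
    {
        Kind = kind; Value = value; Parameter = parameter;
    }
}

public static class RtfTokenizer
{
    public static List<RtfToken> Tokenize(string rtf)
    {
        List<RtfToken> tokens = new List<RtfToken>();
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < rtf.Length)
        {
            char c = rtf[i];
            if (c == '\r' || c == '\n') { i++; continue; } // raw line breaks are ignored in RTF
            if (c != '{' && c != '}' && c != '\\') { text.Append(c); i++; continue; }

            if (text.Length > 0) { tokens.Add(new RtfToken(RtfTokenKind.Text, text.ToString(), null)); text.Length = 0; }
            if (c == '{') { tokens.Add(new RtfToken(RtfTokenKind.GroupStart, "{", null)); i++; continue; }
            if (c == '}') { tokens.Add(new RtfToken(RtfTokenKind.GroupEnd, "}", null)); i++; continue; }

            i++; // skip the backslash
            if (i >= rtf.Length) break;
            if (IsAsciiLetter(rtf[i]))
            {
                int start = i;
                while (i < rtf.Length && IsAsciiLetter(rtf[i])) i++;
                string name = rtf.Substring(start, i - start);

                // Optional signed decimal parameter, e.g. \fs24 or \li-360.
                int paramStart = i;
                int digits = (i < rtf.Length && rtf[i] == '-') ? i + 1 : i;
                int end = digits;
                while (end < rtf.Length && rtf[end] >= '0' && rtf[end] <= '9') end++;
                int? parameter = null;
                if (end > digits)
                {
                    parameter = int.Parse(rtf.Substring(paramStart, end - paramStart),
                        NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);
                    i = end;
                }
                // A single space after a control word is a delimiter, not text.
                if (i < rtf.Length && rtf[i] == ' ') i++;
                tokens.Add(new RtfToken(RtfTokenKind.ControlWord, name, parameter));
            }
            else if (rtf[i] == '\'' && i + 2 < rtf.Length)
            {
                // \'hh: a character given as two hex digits in the document's code page.
                int value;
                bool ok = int.TryParse(rtf.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                    CultureInfo.InvariantCulture, out value);
                tokens.Add(new RtfToken(RtfTokenKind.ControlSymbol, "'", ok ? value : (int?)null));
                i += 3;
            }
            else
            {
                // Other control symbols such as \{, \}, \\ or \~.
                tokens.Add(new RtfToken(RtfTokenKind.ControlSymbol, rtf[i].ToString(), null));
                i++;
            }
        }
        if (text.Length > 0) tokens.Add(new RtfToken(RtfTokenKind.Text, text.ToString(), null));
        return tokens;
    }

    static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

With a token stream like this, "find the bold runs" becomes a walk that tracks `\b`/`\b0` and group nesting, which is much easier to test than a regex that has to account for braces and delimiters.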

Because of the number of versions there is also a lot of incompatibility between them, but there is also a lot of commonality, and in the end it was neither very hard nor very easy to get where I wanted (after about a week's reading and a week's coding), producing a really simple version.

I never found a commercial solution, but budget meant I had to have a free one, so that cut a lot out. If you do buy, take great care in choosing one to make sure it does what you want and has support.

I don't think it matters much whether you are coming from HTML, XML or XHTML (I was converting CSV formats); the hard part is the RTF.

I am not sure whether I would advise DIY or buying. Probably DIY on balance, but your own circumstances will dictate that.

Edit: One more thing: going from content to RTF is easier than the other way round.
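
To illustrate why that direction is easier: writing RTF only needs a fixed header plus escaping of `\`, `{`, `}` and non-ASCII characters. The following is a hypothetical sketch (the `SimpleRtfWriter` name and layout are illustrative) that turns rows of CSV-style fields into tab-separated paragraphs with a bold header row.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SimpleRtfWriter
{
    // \ { } are backslash-escaped; non-ASCII becomes \uN? where N is the signed
    // 16-bit UTF-16 code unit and '?' is the fallback declared by \uc1.
    public static string Escape(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '\\':
                case '{':
                case '}':
                    sb.Append('\\').Append(c);
                    break;
                case '\t':
                    sb.Append("\\tab ");
                    break;
                case '\n':
                    sb.Append("\\line ");
                    break;
                case '\r':
                    break; // the following \n produces the line break
                default:
                    if (c < 0x80) sb.Append(c);
                    else sb.Append("\\u").Append((short)c).Append('?');
                    break;
            }
        }
        return sb.ToString();
    }

    // One paragraph per row, fields separated by tabs, first row in bold.
    public static string BuildDocument(IList<string[]> rows)
    {
        StringBuilder sb = new StringBuilder("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\uc1\n");
        for (int r = 0; r < rows.Count; r++)
        {
            sb.Append("\\pard ");
            if (r == 0) sb.Append("\\b ");
            for (int f = 0; f < rows[r].Length; f++)
            {
                if (f > 0) sb.Append("\\tab ");
                sb.Append(Escape(rows[r][f]));
            }
            if (r == 0) sb.Append("\\b0");
            sb.Append("\\par\n");
        }
        sb.Append('}');
        return sb.ToString();
    }
}
```

Reading RTF back, by contrast, means coping with every construct Word might emit, not just the handful you chose to write.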

BTW, I'm not criticising MS for the RTF versions; hey, it's theirs and proprietary, so they can do what they like.

Plowshare answered 13/1, 2010 at 18:28 Comment(0)

I just came across this WYSIWYG rich text editor (RTE) for the web that also has an HTML to RTF converter: Cute Editor for .NET. Does anyone have any experience with this component? My main experience with web-based RTEs has been CKEditor (FCKeditor) and TinyMCE, but as far as I can tell, CKEditor and TinyMCE do not have HTML to RTF converters built in.

Mechanician answered 11/1, 2010 at 17:34 Comment(0)

Since I'm required to implement some mailmerge capabilities with rich-text formatting on a Web application, I thought it'd be nice to share my experiences.

Basically, I explored two alternatives:

  • using Google Docs API to leverage Google Docs capabilities
  • using XSLT, as shown in this essay

Google Docs API works well. Problem is, when you upload an HTML document with page breaks, like this:

<p style="page-break-before:always;display:none;"/>

and ask Google to convert the doc to RTF, you lose all the page breaks, which does not fit my requirements. However, if page breaks aren't an issue for you, you might check this solution out.

The XSLT solution works... sort of.

It works if you reference the MSXML3 COM object directly, bypassing the System.Xml classes; otherwise I couldn't make it work. Moreover, it seems to honor only basic formatting and tags, disregarding text color, size and the like. However, it honors page breaks. :-)
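
If you need colour, which the XSLT approach described here ignores, one way to extend a converter is to collect colours into the document's `{\colortbl ...}` header group and reference them from the body with `\cfN`, where index 0 is the reader's default colour. The `RtfColorTable` class below is a hypothetical sketch of that bookkeeping, assuming CSS-style `#rrggbb` input.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: colours are numbered \cf1, \cf2, ... in first-seen order;
// entry 0 of the table is left empty for the "auto" colour.
public sealed class RtfColorTable
{
    readonly List<int> _colors = new List<int>(); // stored as 0xRRGGBB

    // Maps "#rrggbb" to a \cfN control word (including its delimiter space).
    public string Foreground(string hex)
    {
        int rgb = int.Parse(hex.TrimStart('#'), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        int index = _colors.IndexOf(rgb);
        if (index < 0)
        {
            _colors.Add(rgb);
            index = _colors.Count - 1;
        }
        return "\\cf" + (index + 1).ToString(CultureInfo.InvariantCulture) + " ";
    }

    // The group to place in the document header, after the font table.
    public string ToRtf()
    {
        StringBuilder sb = new StringBuilder("{\\colortbl;");
        foreach (int rgb in _colors)
        {
            sb.AppendFormat(CultureInfo.InvariantCulture, "\\red{0}\\green{1}\\blue{2};",
                (rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF);
        }
        sb.Append('}');
        return sb.ToString();
    }
}
```

A `<span style="color:#ff0000">` would then become `{\cf1 ...}` in the body. Because the table is only complete once the whole body has been converted, the body has to be built before the header is written.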

Here's a quick library I wrote, using Tidy.NET to force the HTML into XHTML before the transformation. Hope it helps.

using System;
using System.IO;
using System.Text;

namespace ADDS.Mailmerge
{
    // Converts (X)HTML to RTF with the XHTML2RTF stylesheet via the MSXML COM
    // objects (requires a reference to MSXML). Tidy.NET first turns possibly
    // malformed HTML into well-formed XHTML.
    public class XHTML2RTF
    {
        // The compiled template is shared; each call creates its own DOM document
        // and processor, so concurrent requests on the singleton don't race.
        readonly MSXML2.XSLTemplate _xslTemplate;
        static XHTML2RTF instance = null;
        static readonly object padlock = new object();

        XHTML2RTF()
        {
            MSXML2.FreeThreadedDOMDocument xslDoc = new MSXML2.FreeThreadedDOMDocument();
            // XSLData.xhtml2rtf is a resource file containing the XSL for the transformation.
            // I got the XSL from here:
            // http://www.codeproject.com/KB/HTML/XHTML2RTF.aspx
            xslDoc.loadXML(XSLData.xhtml2rtf);
            _xslTemplate = new MSXML2.XSLTemplate();
            _xslTemplate.stylesheet = xslDoc;
        }

        public string ConvertToRTF(string xhtmlData)
        {
            try
            {
                // Stage 1: clean up the HTML into XHTML with Tidy.NET.
                TidyNet.Tidy tidy = new TidyNet.Tidy();
                tidy.Options.XmlOut = true;
                tidy.Options.Xhtml = true;
                string sXhtml;
                using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(xhtmlData)))
                using (MemoryStream output = new MemoryStream())
                {
                    TidyNet.TidyMessageCollection messages = new TidyNet.TidyMessageCollection();
                    tidy.Parse(input, output, messages);
                    sXhtml = Encoding.UTF8.GetString(output.ToArray());
                }

                // Stage 2: run the XSLT on a per-call document and processor.
                MSXML2.FreeThreadedDOMDocument xmlDoc = new MSXML2.FreeThreadedDOMDocument();
                if (!xmlDoc.loadXML(sXhtml))
                    throw new InvalidOperationException("Tidy output is not well-formed XML: " + xmlDoc.parseError.reason);

                MSXML2.IXSLProcessor processor = _xslTemplate.createProcessor();
                processor.input = xmlDoc;
                processor.transform();
                return processor.output.ToString();
            }
            catch (Exception exc)
            {
                throw new Exception("Error in XHTML conversion.", exc);
            }
        }

        public static XHTML2RTF Instance
        {
            get
            {
                lock (padlock)
                {
                    if (instance == null)
                    {
                        instance = new XHTML2RTF();
                    }
                    return instance;
                }
            }
        }
    }
}
Kauffmann answered 9/11, 2011 at 13:52 Comment(0)
