Insert string with special characters into RTF

How can I programmatically insert a string with special characters into RTF? I have an RTF template that I load into a string, and then I replace every $MY_VARIABLE$ with data. The data contains special characters such as 'ąęść', and in the resulting file these characters are replaced with '?'. Something is wrong with the encoding, but what?

My code looks like:

StreamReader reader = new StreamReader("template.rtf");
StringBuilder form = new StringBuilder(reader.ReadToEnd());
// here I replace variables in rtf with data
Encoding srcEncoding = new UTF8Encoding();
Encoding dstEncoding = new ASCIIEncoding();
byte[] utf = srcEncoding.GetBytes(form.ToString());
byte[] asci = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, utf);
return dstEncoding.GetString(asci);
Teaser answered 20/1, 2010 at 10:52 Comment(0)

Please, check the answer to this question.


Edited to Add

Since, as you say, the answer above covers converting RTF to plain text: according to the RTF Specification 1.6, you write \u261a to display ą, \u281e for ę, and so on.

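The syntax is \uNd, where N is the decimal Unicode value of the character (written as a signed 16-bit number, so values above 32767 become negative) and d is the ASCII approximation shown by readers that do not support Unicode.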
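
As a minimal sketch of that syntax in C# (the language of the question), the helper below builds the control word for a single character. The names `RtfUnicodeSketch` and `ToRtfUnicode` are made up for illustration, and the fallback is assumed to be a single non-digit ASCII character, since a digit would run into N.

```csharp
using System;
using System.Globalization;

public static class RtfUnicodeSketch
{
    // Builds \uN for one UTF-16 code unit, followed by the ASCII fallback
    // character that readers without Unicode support display instead.
    // Assumption: 'fallback' is a single non-digit ASCII character.
    public static string ToRtfUnicode(char c, char fallback = '?')
    {
        // RTF reads N as a signed 16-bit number, so code units above 32767
        // wrap around to negative values.
        short n = unchecked((short)c);
        return "\\u" + n.ToString(CultureInfo.InvariantCulture) + fallback;
    }
}
```

For example, `RtfUnicodeSketch.ToRtfUnicode('ą', 'a')` yields `\u261a` and `RtfUnicodeSketch.ToRtfUnicode('ę', 'e')` yields `\u281e`, matching the values above.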


Edited to Clarify

From what you say, you have some placeholders in the RTF, right?

What you need is a function that, when replacing the placeholders, adds the properly RTF-encoded characters.

After a little bit of research, I think you may use something like this:

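' Requires: Imports System.Globalization
Public Function GetRtfString(ByVal text As String) As String

  Dim sb As New Text.StringBuilder()
  For Each c As Char In text
    Dim code = Convert.ToInt32(c)
    If c = "\"c OrElse c = "{"c OrElse c = "}"c Then
      ' RTF syntax characters are escaped with a backslash.
      sb.Append("\"c).Append(c)
    ElseIf code >= &H20 AndAlso code < &H7F Then
      ' Printable ASCII passes through; encoding digits or spaces as \uN
      ' would let the fallback merge with N or be eaten as a delimiter.
      sb.Append(c)
    Else
      ' The fallback must be a single ASCII letter; otherwise use "?".
      Dim fallback = RemoveDiacritics(c.ToString())
      If fallback.Length <> 1 OrElse Not Char.IsLetter(fallback(0)) OrElse Convert.ToInt32(fallback(0)) >= &H80 Then fallback = "?"
      ' RTF reads N as a signed 16-bit number: values above 32767 go negative.
      If code > 32767 Then code -= 65536
      sb.AppendFormat(CultureInfo.InvariantCulture, "\u{0}{1}", code, fallback)
    End If
  Next
  Return sb.ToString()

End Function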

Public Function RemoveDiacritics(ByVal text As String) As String

  Dim formD = text.Normalize(System.Text.NormalizationForm.FormD)
  Dim sb As New Text.StringBuilder()

  For Each c As Char In formD
    If (CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.NonSpacingMark) Then
      sb.Append(c)
    End If
  Next

  Return sb.ToString().Normalize(System.Text.NormalizationForm.FormC)

End Function
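
To tie this back to the placeholder scenario in the question, here is a rough C# sketch of the whole flow: escape each value, then substitute it into the template. The names `RtfTemplateSketch`, `EscapeForRtf` and `FillTemplate` are hypothetical. It always uses '?' as the fallback, and line breaks are not translated to \line, so treat it as a starting point rather than a complete escaper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class RtfTemplateSketch
{
    // Escapes plain text so it can be placed inside RTF source.
    public static string EscapeForRtf(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // Characters with syntactic meaning in RTF get a backslash.
                sb.Append('\\').Append(c);
            }
            else if (c >= 0x20 && c < 0x7F)
            {
                // Printable ASCII is copied unchanged.
                sb.Append(c);
            }
            else
            {
                // Everything else becomes \uN (N as signed 16-bit decimal)
                // with '?' as the ASCII fallback.
                sb.Append("\\u")
                  .Append(unchecked((short)c).ToString(CultureInfo.InvariantCulture))
                  .Append('?');
            }
        }
        return sb.ToString();
    }

    // Replaces each placeholder (for example "$MY_VARIABLE$") in the RTF
    // template with the escaped form of its value.
    public static string FillTemplate(string template, IDictionary<string, string> values)
    {
        var result = new StringBuilder(template);
        foreach (var pair in values)
        {
            result.Replace(pair.Key, EscapeForRtf(pair.Value));
        }
        return result.ToString();
    }
}
```

Provided the template itself is plain ASCII, the filled document contains only 7-bit characters, so writing it with `File.WriteAllText(path, filled, Encoding.ASCII)` should no longer turn anything into '?'.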
Josh answered 20/1, 2010 at 11:14 Comment(2)
I have read this question. It applies to converting the other way round. - Teaser
I don't want to convert RTF to plain text. I want to load an RTF file, replace something and save the changes. The problem is that the text I want to insert is UTF-8 (I suppose) and RTF is ASCII, so at some point I have to change the encoding. I don't know when or how. - Teaser

I used the code sample from Paulo Santos's reply, but:
- in C#
- extended to escape the characters '{', '}', '\' and '\n'
- without the RemoveDiacritics() part, which was too complex for me; my shameful shortcut (always using '?' as the ASCII approximation) was enough for my needs (RTF in DevExpress's RichEditControl). It is certainly buggy, but it works with '€' or '因'.

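// Requires: using System; using System.Text;
public static string GetRtfEncoding(char c)
{
    // Escape the characters that have syntactic meaning in RTF.
    if (c == '\\') return "\\\\";
    if (c == '{') return "\\{";
    if (c == '}') return "\\}";
    if (c == '\n') return "\r\n\\line ";
    int intCode = Convert.ToInt32(c);
    if (char.IsLetter(c) && intCode < 0x80)
    {
        return c.ToString();
    }
    // RTF reads the \u argument as a signed 16-bit number, so code units
    // above 32767 must be written as negative values.
    if (intCode > 32767) intCode -= 65536;
    // '?' is the fallback shown by readers without Unicode support.
    return "\\u" + intCode + "?";
}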
public static string GetRtfString(string s)
{
    StringBuilder returned = new StringBuilder();
    foreach(char c in s)
    {
        returned.Append(GetRtfEncoding(c));
    }
    return returned.ToString();
}
Trilby answered 11/5, 2011 at 16:2 Comment(0)
