How to write UTF-8 characters to a PDF file using iTextSharp?
I have searched Google extensively but could not find a solution.

Any help is appreciated.

Here is my code:

protected void Page_Load(object sender, EventArgs e)
{
    StreamReader read = new StreamReader(@"D:\queryUnicode.txt", Encoding.Unicode);
    string str = read.ReadToEnd();

    Paragraph para = new Paragraph(str);

    FileStream file = new FileStream(@"D:\Query.pdf", FileMode.Create);

    Document pdfDoc = new Document();
    PdfWriter writer = PdfWriter.GetInstance(pdfDoc, file);

    pdfDoc.Open();
    pdfDoc.Add(para);
    pdfDoc.Close();

    Response.Write("Pdf file generated");
}
Barbican answered 24/5, 2011 at 12:21 Comment(3)
What problems are you seeing? If characters are missing, have a look here: #1322803 - Conjunction
Yes, the characters are missing in the PDF, but I have already seen and tried that link. When I downloaded the iTextSharp source code, it didn't contain the FactorySettings.cs file. Also, that answer uses "arial.ttf", and I want UTF-8 characters. - Barbican
Actually, the Notepad file from which I was reading the string was saved with ANSI encoding. When I saved it as UTF-8 instead, those characters started showing up in the PDF as &#230;. - Barbican
V
23

Are you converting HTML to PDF? If so, please say so in your question; if not, ignore this. I only ask because your last comment about getting &#230; makes me think so. If you are, check out this post: iTextSharp 5 polish character
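
If the string handed to iTextSharp already contains HTML character references (for example `&#230;` in place of `æ`), no font setting will change that; the references would have to be decoded first. A minimal sketch, assuming that is what is happening, using `System.Net.WebUtility.HtmlDecode` from the .NET base class library. The `EntityDemo` class and its `Decode` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Net;

public static class EntityDemo
{
    // Turns HTML character references such as "&#230;" back into
    // the characters they stand for.
    public static string Decode(string text)
    {
        return WebUtility.HtmlDecode(text);
    }

    public static void Main()
    {
        // Hypothetical input: letters that arrived as numeric references.
        string raw = "&#230; &#248; &#229;";

        // The decoded string is what would be passed to Phrase or Paragraph.
        string decoded = Decode(raw);
        Console.WriteLine(decoded.Length);
    }
}
```

The decoded string can then be passed to `Phrase` or `Paragraph` together with a font that covers those characters.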

Also, sometimes when people say "Unicode", what they really want is to get symbols like Wingdings into a PDF. If that is what you mean, check out this post, and note that Unicode and Wingdings symbols aren't related at all: Unicode symbols in iTextSharp

Here's a complete working example that shows two ways to write Unicode characters: one using the character itself and one using the C# escape sequence. Make sure to save your source file in an encoding that supports these characters, such as UTF-8. This sample uses iTextSharp 5.0.5.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create our document object
            Document Doc = new Document(PageSize.LETTER);

            //Create our file stream
            using (FileStream fs = new FileStream(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf"), FileMode.Create, FileAccess.Write, FileShare.Read))
            {
                //Bind PDF writer to document and stream
                PdfWriter writer = PdfWriter.GetInstance(Doc, fs);

                //Open document for writing
                Doc.Open();

                //Add a page
                Doc.NewPage();

                //Full path to the Unicode Arial file
                string ARIALUNI_TTF = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");

                //Create a base font object making sure to specify IDENTITY-H
                BaseFont bf = BaseFont.CreateFont(ARIALUNI_TTF, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

                //Create a specific font object
                Font f = new Font(bf, 12, Font.NORMAL);

                //Write some text, the last character is 0x0278 - LATIN SMALL LETTER PHI
                Doc.Add(new Phrase("This is a test ɸ", f));

                //Write some more text, the last character is 0x0682 - ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE
                Doc.Add(new Phrase("Hello\u0682", f));

                //Close the PDF
                Doc.Close();
            }
        }
    }
}

When working with iTextSharp you have to make sure that you're using a font that supports the Unicode code points that you want to use. You also need to specify IDENTITY-H when using your font. I don't completely know what it means but there's some talk about it here: iTextSharp international text
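
One way to check the first point before generating a PDF is to ask the font which code points it has glyphs for. The sketch below keeps the check independent of iTextSharp by taking the glyph lookup as a delegate. With iTextSharp 5 you would, I believe, pass `bf.CharExists` for a `BaseFont bf` created as in the example above; treat that method name as an assumption to verify against your version. `GlyphCheck` and `FindMissing` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class GlyphCheck
{
    // Returns the distinct code points in 'text' for which 'hasGlyph'
    // reports that no glyph is available. Control characters are skipped.
    public static List<int> FindMissing(string text, Func<int, bool> hasGlyph)
    {
        var missing = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsControl(text, i)) continue;

            int codePoint;
            if (char.IsSurrogatePair(text, i))
            {
                // Combine the two UTF-16 code units into one code point.
                codePoint = char.ConvertToUtf32(text[i], text[i + 1]);
                i++;
            }
            else
            {
                codePoint = text[i];
            }

            if (!hasGlyph(codePoint) && !missing.Contains(codePoint))
            {
                missing.Add(codePoint);
            }
        }
        return missing;
    }

    public static void Main()
    {
        // Stand-in lookup: pretend the font only covers U+0000..U+00FF.
        // With iTextSharp this would (by assumption) be bf.CharExists.
        Func<int, bool> hasGlyph = cp => cp <= 0xFF;

        foreach (int cp in FindMissing("\u00E6 \u00F8 \u00E5 \u2713", hasGlyph))
        {
            Console.WriteLine("No glyph for U+" + cp.ToString("X4"));
        }
    }
}
```

With the stand-in lookup, only U+2713 is reported as missing. If a real font reports missing code points for your text, switching to a font that covers them is the fix rather than changing the encoding.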

Volga answered 24/5, 2011 at 13:47 Comment(10)
@Chris, the characters you wrote, i.e. ɸ and \u0682, come out correctly, but the characters from my file still appear in code form, e.g. æ comes out as &#230; and ø comes out as &#248;. They display fine on the web page in the GridView, and I have set UTF-8 in the response content type. - Barbican
@Chris, if I write these characters in code, i.e. new Phrase("æ ø å", font), they come out fine. But I am reading the text from a file saved as UTF-8, converting it to a string using StreamReader, and then passing this string to the Phrase constructor. - Barbican
@Puneet Dudeja, you are talking about a GridView and also a text file; which are you working with? These are two separate things that you need to explain further in your question. For the text file, are you sure that it's UTF-8 encoded (have you checked it with a hex editor)? How are you fetching the text file: from the file system or from the web? For the GridView, how are you fetching that? Please edit your post above with some code so we can help you better. - Volga
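
The hex-editor check suggested in the comment above can also be approximated in code. A minimal sketch using only the .NET base class library: it looks for the UTF-8 byte order mark and then decodes strictly, so that invalid byte sequences (as in a file that is really ANSI-encoded) are reported instead of silently replaced. `Utf8Check` and its methods are hypothetical names; the default path is the one from the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8Check
{
    // True when the bytes start with the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    // True when the bytes decode as UTF-8 without any invalid sequence.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // Second argument (throwOnInvalidBytes) makes the decoder throw
        // instead of substituting U+FFFD for bad input.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : @"D:\queryUnicode.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine("UTF-8 BOM present: " + HasUtf8Bom(bytes));
        Console.WriteLine("Valid UTF-8: " + IsValidUtf8(bytes));
    }
}
```

Note that a pure ASCII file also passes the validity check, so the result is only informative when the file actually contains non-ASCII characters.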
@Chris, I have included the whole code in my question. This code also includes the last two lines of your example code, and those characters come out fine in the PDF. But the other characters in my text file (Swedish characters) come out as #encoded. Please help. - Barbican
@Puneet Dudeja, are you able to email me the contents of the file queryUnicode.txt? I understand if it's confidential, but it would help to see it. If you can, zip it and send it. Also, and this is true in general for debugging anything, it would help if you could remove any code above that isn't causing a problem. For instance, there's code that creates headers in a table; that can be removed when posted here because it's not part of the problem. In general, if you can get it down to the smallest amount of code that still breaks, we are more likely to be able to find the problem. - Volga
@Chris, I was shocked when I tried to reproduce the problem after isolating the minimum code required: it worked like a charm. The Swedish characters appear in the PDF, and without using any special Font object. But the environment was different; I tried it on my home machine. I don't know why it is not working at the office. I have posted that minimum code in my question. I will try again at the office, and if I get the problem again, I will mail the file to you (it's not confidential). Many thanks for your help. Please help me again if I get the problem on Monday. - Barbican
@Chris, also, how will I mail you the file? There is no way to send you a file on Stack Overflow. - Barbican
@Puneet Dudeja, I hope everything goes well for you. I'll be out Monday, so if you have any problems you probably won't hear from me until Tuesday. - Volga
Thanks, the BaseFont.IDENTITY_H is working for me. Kool beans! - Catechu
'Identity-H' is not recognized. - Margaux
E
0
public static Font GetArialUtf8Font(int fontSize = 9)
{
    string fontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIAL.TTF");

    //Create a base font object making sure to specify IDENTITY-H
    BaseFont bf = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

    //Create a specific font object
    return new Font(bf, fontSize, Font.NORMAL);
}
Easton answered 19/3, 2022 at 8:11 Comment(0)
K
0

The Arial and Calibri fonts did not work for me when testing with the check mark "✔" (U+2713) and the X mark "✘" (U+2718).
I had success with the Segoe UI Symbol font (seguisym.ttf), with the font embedded as well.

string fontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "seguisym.ttf");
BaseFont bf = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Katonah answered 23/4, 2024 at 14:17 Comment(0)
