iTextSharp international text

I have a table in an ASP.NET page and I'm trying to export it as a PDF file. A couple of international characters are not shown in the generated PDF file. Any suggestions?

Thanks in advance

Manure asked 13/11, 2009 at 7:50

The key to properly displaying alternate character sets (Russian, Chinese, Japanese, etc.) is to use IDENTITY_H encoding when creating the BaseFont.

' Load a TrueType font with Unicode (Identity-H) encoding and embed it in the PDF
Dim bfR As iTextSharp.text.pdf.BaseFont
bfR = iTextSharp.text.pdf.BaseFont.CreateFont("MyFavoriteFont.ttf", iTextSharp.text.pdf.BaseFont.IDENTITY_H, iTextSharp.text.pdf.BaseFont.EMBEDDED)

IDENTITY_H provides Unicode support for your chosen font, so you should be able to display any character that the font itself contains. I've used it for Russian, Greek, and all the different European language letters.

EDIT - 2013-May-28

This also works for v5.0.2 of iTextSharp.

EDIT - 2015-June-23

Given below is a complete code sample (in C#):

private void CreatePdf()
{
  // Text containing characters outside the basic Latin range
  string testText = "đĔĐěÇøç";
  string tmpFile = @"C:\test.pdf";
  // Full path to a TrueType font that actually contains the glyphs in testText
  string myFont = @"C:\<<valid path to the font you want>>\verdana.ttf";
  iTextSharp.text.Rectangle pgeSize = new iTextSharp.text.Rectangle(595, 792);
  iTextSharp.text.Document doc = new iTextSharp.text.Document(pgeSize, 10, 10, 10, 10);
  iTextSharp.text.pdf.PdfWriter wrtr;
  wrtr = iTextSharp.text.pdf.PdfWriter.GetInstance(doc,
      new System.IO.FileStream(tmpFile, System.IO.FileMode.Create));
  doc.Open();
  doc.NewPage();
  // IDENTITY_H: Unicode encoding (2-byte glyph indexes); EMBEDDED: embed (a subset of) the font
  iTextSharp.text.pdf.BaseFont bfR;
  bfR = iTextSharp.text.pdf.BaseFont.CreateFont(myFont,
    iTextSharp.text.pdf.BaseFont.IDENTITY_H,
    iTextSharp.text.pdf.BaseFont.EMBEDDED);

  iTextSharp.text.BaseColor clrBlack = 
      new iTextSharp.text.BaseColor(0, 0, 0);
  iTextSharp.text.Font fntHead =
      new iTextSharp.text.Font(bfR, 12, iTextSharp.text.Font.NORMAL, clrBlack);

  // The paragraph uses the Identity-H font; text added without a font falls back to Helvetica
  iTextSharp.text.Paragraph pgr = 
      new iTextSharp.text.Paragraph(testText, fntHead);
  doc.Add(pgr);
  // Closing the document finishes the PDF; by default the writer also closes the FileStream
  doc.Close();
}

This is a screenshot of the PDF file that is created:

[image: sample pdf]

An important point to remember is that if the font you have chosen does not support the characters you are trying to send to the PDF file, nothing you do in iTextSharp is going to change that. Verdana nicely displays the characters of all the European languages I know of. Other fonts may not be able to display as many characters.

Biodegradable answered 13/11, 2009 at 13:44
The second argument of BaseFont.CreateFont() is the encoding, and "Identity-H" is not a valid encoding body name. Are you sure about this? – Anthe
@ManitraAndriamitondra, this was valid for iTextSharp v4.1.2. I haven't used it in a while, so I'm not sure whether it is still valid for the current version. – Biodegradable
You have to specify the full path to the font file; you can't use the internal fonts. If you want Helvetica 12, for example, you need to use a TrueType equivalent such as Arial: BaseFont.CreateFont(@"C:\Windows\Fonts\arial.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED). If you use an internal font (BaseFont.HELVETICA, for example), you will get the error '"Identity-H" is not a valid encoding body name.' – Dribble
@Jovica, the đ character will be written to the PDF just fine as long as your chosen font supports it. See the edit to the answer above. – Biodegradable
Great, but it's better to replace the absolute path with a relative one, e.g. string fontpath = System.IO.Path.Combine(System.IO.Directory.GetParent(Directory.GetCurrentDirectory()).Parent.FullName, "Fonts\\verdana.ttf"); – Firewarden
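For reference, the same relative-path idea in Java, as a minimal sketch: it assumes, like the C# comment, that the font sits in a Fonts folder two directories above the working directory. The FontLocator class and fontPath method are hypothetical names used only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class FontLocator {
    // Same idea as the C# comment: go two directories up from startDir,
    // then into Fonts/<fontFile>. The "Fonts" layout is an assumption.
    static Path fontPath(Path startDir, String fontFile) {
        Path parent = startDir.toAbsolutePath().getParent();
        Path grandParent = parent == null ? null : parent.getParent();
        if (grandParent == null) {
            throw new IllegalArgumentException(startDir + " has fewer than two parent directories");
        }
        return grandParent.resolve("Fonts").resolve(fontFile);
    }

    public static void main(String[] args) {
        // Resolve relative to the current working directory
        Path cwd = Paths.get("").toAbsolutePath();
        try {
            System.out.println(fontPath(cwd, "verdana.ttf"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The explicit null checks replace the NullReferenceException the C# one-liner would throw when run too close to the filesystem root.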

There are two potential reasons characters aren't rendered:

  1. The encoding. As Stewbob pointed out, Identity-H is a great way to avoid the issue entirely, though it does require you to embed a subset of the font. This has two consequences.
    1. It increases the file size a bit compared with unembedded fonts.
    2. The font has to be licensed for embedded subsets. Most are, some are not.
  2. The font has to contain that character. If you ask for some Arabic ligatures out of a Cyrillic (Russian) font, chances aren't good that it'll be there. There are very few fonts that cover a variety of languages, and they tend to be HUGE. The biggest, most comprehensive font I've run into was "Arial Unicode MS", at over 23 megabytes.

That's another good reason to require embedding SUBSETS. Tacking on a few megabytes because you wanted to add a couple of Chinese glyphs is a bit steep.
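Conceptually, a subset only has to carry the glyphs for the distinct characters that actually appear in the text, which is why it stays small. A minimal Java sketch of that bookkeeping follows; SubsetPlan is a hypothetical name, and real subsetting also has to copy glyph outlines and rebuild the font tables, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SubsetPlan {
    // Returns the distinct code points used in the text, in ascending order.
    // Roughly, these are the characters whose glyphs an embedded subset must carry.
    static List<Integer> requiredCodePoints(String text) {
        TreeSet<Integer> used = text.codePoints()
                .boxed()
                .collect(Collectors.toCollection(TreeSet::new));
        return new ArrayList<>(used);
    }

    public static void main(String[] args) {
        // Two Chinese characters repeated, plus " PDF ", written with Unicode escapes
        String text = "\u4E2D\u6587 PDF \u4E2D\u6587";
        List<Integer> cps = requiredCodePoints(text);
        System.out.println(cps.size() + " distinct characters"); // 6 distinct characters
        for (int cp : cps) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

However large the full font is, the subset for this string needs only six character glyphs (plus whatever housekeeping glyphs the font format requires).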

If you're feeling paranoid, you can check your strings against a given BaseFont instance (which I believe takes the encoding into account as well) with myBaseFont.charExists(someChar) (CharExists in iTextSharp). If you have a font you're confident in, I wouldn't bother.
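A minimal sketch of that check, written against a plain IntPredicate so it runs with only the JDK. With iText 5 for Java you would presumably pass a method reference such as bf::charExists for a real BaseFont bf (this assumes iText 5's BaseFont.charExists(int) signature). The Latin-1-only predicate in main is a hypothetical stand-in for a real font, and GlyphCheck is an illustrative name.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

class GlyphCheck {
    // Returns the code points in text for which charExists is false,
    // without duplicates and in order of first appearance.
    static List<Integer> missing(String text, IntPredicate charExists) {
        return text.codePoints()
                .distinct()
                .filter(cp -> !charExists.test(cp))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical font that only covers Latin-1 (U+0000..U+00FF).
        // With iText 5 you would pass bf::charExists for a real BaseFont bf instead.
        IntPredicate latin1Font = cp -> cp <= 0xFF;
        // The test string from the first answer, written with Unicode escapes
        String text = "\u0111\u0114\u0110\u011B\u00C7\u00F8\u00E7";
        for (int cp : missing(text, latin1Font)) {
            System.out.printf("missing glyph for U+%04X%n", cp);
        }
    }
}
```

Taking a predicate instead of a BaseFont keeps the check testable without a font file, and the same method works for any font object exposing a per-character lookup.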

PS: There's another good reason that Identity-H requires an embedded subset. Identity-H reads the bytes from the content stream as glyph indexes. The order of glyphs can vary wildly from one font to the next, or even between versions of the same font. Relying on a viewer's system to have the EXACT same font is a bad idea, so it's illegal... particularly when Acrobat/Reader starts substituting fonts because it couldn't find the exact font you asked for and you didn't embed it.
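To make the PS concrete, here is a toy Java model of an Identity-H string: each character is written as a 2-byte, big-endian glyph index looked up in the font's own character-to-glyph table. The two cmap tables are made-up stand-ins for two different fonts (the glyph numbers are hypothetical), and the IdentityH class is an illustration, not iText code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Map;

class IdentityH {
    // Encodes text as Identity-H content-stream bytes: two bytes per character,
    // big-endian, holding the glyph index from the given font's cmap.
    // Characters the font lacks map to glyph 0 (.notdef).
    static byte[] encode(String text, Map<Integer, Integer> cmap) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        text.codePoints().forEach(cp -> {
            int gid = cmap.getOrDefault(cp, 0);
            out.write(gid >> 8);   // high byte
            out.write(gid & 0xFF); // low byte
        });
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical cmaps: the same letters sit at different glyph indexes
        Map<Integer, Integer> fontA = Map.of((int) 'A', 36, (int) 'B', 37);
        Map<Integer, Integer> fontB = Map.of((int) 'A', 3, (int) 'B', 4);

        byte[] bytes = encode("AB", fontA);
        System.out.println(Arrays.toString(bytes)); // [0, 36, 0, 37]
        // The stream stores only glyph numbers, so a viewer that substituted
        // fontB would draw glyphs 36 and 37 of fontB: the wrong shapes.
        System.out.println(Arrays.equals(bytes, encode("AB", fontB))); // false
    }
}
```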

Crutch answered 31/12, 2010 at 0:56

You can try setting the encoding for the font you are using. In Java, it would be something like this:

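// Built-in Standard 14 font with the Windows-1252 encoding; iText does not embed built-in fonts
BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);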

where BaseFont.CP1252 is the encoding. Look up the exact encoding you need for the characters you want to display.
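Before picking a single-byte encoding, it helps to check which of your characters it can represent at all. A minimal sketch using only the JDK's charset support; EncodingCheck is a hypothetical name, and Java's "windows-1252" charset is assumed to cover the same characters as iText's BaseFont.CP1252. Keep in mind that an encoding only decides which codes get written; it does not add glyphs to a font that lacks them.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.stream.Collectors;

class EncodingCheck {
    // Returns the characters of text that the named charset cannot represent.
    static String unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                missing.append(ch);
            }
        });
        return missing.toString();
    }

    public static void main(String[] args) {
        // Test string from the first answer: d-stroke, E-breve, D-stroke,
        // e-caron, C-cedilla, o-slash, c-cedilla
        String text = "\u0111\u0114\u0110\u011B\u00C7\u00F8\u00E7";
        String missing = unencodable(text, "windows-1252");
        System.out.println(missing.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "))); // U+0111 U+0114 U+0110 U+011B
    }
}
```

Only the three Western European letters survive Windows-1252 here, which is the situation where Identity-H with a suitable TrueType font is the more robust choice.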

Aleppo answered 13/11, 2009 at 7:54
This does not work for me. I was trying to print Greek symbols such as mu and sigma in an English (Roman font) document. I believe the internal fonts don't support Greek letters, regardless of the encoding. – Dribble

It is caused by iTextSharp's default font, Helvetica, which supports only the basic characters (or at least not all other characters).

There are actually 2 options:

  1. One is to rewrite the table content by hand in the code. This approach might look faster, but it requires any modification to the original table to be repeated in the code as well (breaking the DRY principle). In this case, you can easily set up the font as you wish.
  2. The other is to generate the PDF from the HTML extracted from HtmlEngine. This might sound more complicated (and it is); however, the working solution is much more flexible and universal. I struggled with special characters myself just a while ago and posted a fairly complete solution under another similar question here on Stack Overflow: https://mcmap.net/q/741233/-itextsharp-5-polish-character

Numerology answered 5/7, 2014 at 15:25

© 2022 - 2024 — McMap. All rights reserved.