Convert HTML to PDF in .NET [closed]
I

26

514

I want to generate a PDF by passing HTML contents to a function. I have made use of iTextSharp for this but it does not perform well when it encounters tables and the layout just gets messy.

Is there a better way?

Ive answered 19/2, 2009 at 10:21 Comment(6)
You can use GemBox.Document for this. Also here you can find a sample code for converting HTML file into a PDF file.Neuropath
Which version of iTextSharp do you use and could you share your html?Cypsela
Still no answer to my request for additional information. Please also add if you are using HTMLWorker or XMLWorker.Cypsela
What about .net core?Miley
Anvil provides a complete set of APIs for generating, filling and e-signing PDFs. They just launched an HTML to PDF endpoint useanvil.com/pdf-generation-apiDisencumber
Can we please reopen this one? Many new products provide this functionality, others are out of date. Without new answers, this can not be easily lined out. For 2022 I would recommend: github.com/hardkoded/puppeteer-sharp#generate-pdf-files Is well established, well maintained, simple to use, built on a solid basis etc.Scampi
B
274

EDIT: New Suggestion HTML Renderer for PDF using PdfSharp

(After trying wkhtmltopdf and suggesting to avoid it)

HtmlRenderer.PdfSharp is fully managed C# code, easy to use, thread safe and, most importantly, free (New BSD License).

Usage

  1. Download HtmlRenderer.PdfSharp nuget package.
  2. Use Example Method.

    public static Byte[] PdfSharpConvert(String html)
    {
        Byte[] res = null;
        using (MemoryStream ms = new MemoryStream())
        {
            var pdf = TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.A4);
            pdf.Save(ms);
            res = ms.ToArray();
        }
        return res;
    }
    

A very good alternative is a free version of iTextSharp

Up to version 4.1.6, iTextSharp was licensed under the LGPL, and versions up to 4.1.6 (there may also be forks) are available as packages and can be used freely. Of course, you can also use the continued, paid 5+ versions.

I tried to integrate wkhtmltopdf-based solutions into my project and ran into a bunch of hurdles.

I personally would avoid wkhtmltopdf-based solutions in hosted enterprise applications for the following reasons.

  1. First of all, wkhtmltopdf is implemented in C++, not C#, and you will experience various problems embedding it within your C# code, especially when switching between 32-bit and 64-bit builds of your project. I had to try several workarounds, including conditional project building, just to avoid "invalid format exceptions" on different machines.
  2. If you manage your own virtual machine, it's OK. But if your project runs within a constrained environment like Azure (where it is actually impossible, as mentioned by the TuesPechkin author) or Elastic Beanstalk, it's a nightmare to configure that environment just for wkhtmltopdf to work.
  3. wkhtmltopdf creates files on your server, so you have to manage user permissions and grant "write" access to the location where wkhtmltopdf is running.
  4. wkhtmltopdf runs as a standalone application, so it's not managed by your IIS application pool. You either have to host it as a service on another machine, or you will experience processing spikes and memory consumption on your production server.
  5. It uses temp files to generate the PDF, and in cases like AWS EC2, which has really slow disk I/O, that is a big performance problem.
  6. The most hated "Unable to load DLL 'wkhtmltox.dll'" error, reported by many users.

--- PRE Edit Section ---

For anyone who wants to generate PDFs from HTML in simpler applications or environments, I leave my old post as a suggestion.

TuesPechkin

https://www.nuget.org/packages/TuesPechkin/

or, especially for MVC web applications (though I think you can use it in any .NET application):

Rotativa

https://www.nuget.org/packages/Rotativa/

They both use the wkhtmltopdf binary to convert HTML to PDF. It uses the WebKit engine to render the pages, so it can also parse CSS style sheets.

They provide easy-to-use, seamless integration with C#.

Rotativa can also generate PDFs directly from any Razor view.

Additionally, for real-world web applications, they also manage thread safety, etc.

Belamy answered 11/8, 2015 at 14:35 Comment(21)
Thank you for updating your post. I'm going to give PdfSharp a try. You saved me a lot of time.Peninsula
PdfSharp is good in terms of performance, but it didn't render floats properly for me. Luckily, I could change the markup to use good old tables, PdfSharp handles them well.Sapotaceous
I tried using background color and it does not work with pdf sharp. Any workaround for this?Farnham
We tried HtmlRenderer. It was really quick when not loading any CSS. But when we tried to apply CSS (Bootstrap plus some bespoke), the CSS parsing took a while (which we could probably mitigate), and rendering was completely different to the web page.Petrolic
I am trying HtmlRenderer.PdfSharp library now. I am experiencing text being cut-off near page breaks. Has anyone have a fix for this?Sejant
@Sejant - looking at the source for HtmlRenderer.PdfSharp, there's no way to fix this - it just takes the total page height and clips into each PDF page, which is really unfortunate - it means multi-page PDFs with this library really can't be done.Illusive
How to render image on PDF file ?Estellaestelle
Can a very long scrolling page be captured as a single long page? Is there a PageSize.Scroll, say?Blackburn
BS. This creates an image of the HTML and adds the image into the pdf file. This is not a real PDF at all. Also, PDF is a vector graphics format - you can scroll near infinitely - of course except if the PDF consists of a raster graphic, which is what this library produces.Elimination
This library also has a dependency on GDI+, or x-server if you're running on Mono/Linux. So this is not useful for server environments...Sportswoman
Any idea if the resultant .pdf files are bloated? That is: is the generated .pdf close to the size it would have been if I coded it? Based on the other comments it seems it would be.Sublapsarianism
Any idea, how to set headers and footers in PdfSharp ?Effortful
I wish I could upvote this answer twice. The review of Wkhtmltopdf saved me a lot of time in particular.Stridulous
@Anestis Kivranoglou i have used pdf sharp on my project. But for html design with css, it cannot render the html. Instead it is only creating a blank pageIsobelisocheim
PDFSharp is rendering some pages as blank for us.Sublapsarianism
PDFSharp seems abandoned (the last version was 1.5.1 beta, which was updated in 2016). It seems like these days the best move is to use PDFSharpCore instead github.com/ststeiger/PdfSharpCoreLorrin
Code posted in Example Method won't work with latest version of HtmlRenderer.PdfSharpInsomnolence
For HtmlRenderer.PdfSharp in .NET Core there is a port on nuget called Polybioz.HtmlRenderer.PdfSharp.Core. It works well so far in my tests.Errolerroll
@Errolerroll 's suggestion of HtmlRenderer.PdfSharp works as long as you don't have tables in your html. If you do, I'd suggest using Select.HtmlToPdf.Herrod
@Herrod What problems are you running into with tables? If you mean it's not visible, set the width to 100% on your table markup. <table style='width:100%;border:1px solid gray'>Errolerroll
@Errolerroll Thank you for the advice. I'll try that for my next project. For now, I'm happy with SelectPdf!Herrod
B
129

Last Updated: October 2020

This is the list of options for HTML to PDF conversion in .NET that I have put together (some free, some paid).

If none of the options above help you, you can always search the NuGet packages:
https://www.nuget.org/packages?q=html+pdf

Boatyard answered 5/9, 2019 at 17:25 Comment(9)
have you tested any for performance ? we are looking to improve current conversion times and are exploring other libraries for these performance benefitsCherellecheremis
I have not done any performance comparison especially because is such a long list - maybe out there somebody has already done a "PDF generation .net libraries performance review" or similarBoatyard
Another wkhtmltopdf-based solution that will even work on Azure web services is the DinkToPdf fork: github.com/hakanl/DinkToPdf with nuget: nuget.org/packages/Haukcode.DinkToPdfMagistery
DinkToPdf is free and works in .net core. nuget.org/packages/DinkToPdfLiquidator
the problem with DinkToPDF is the project was I think already abandoned by it's owner. so it's basically hard to keep it maintain.Citizenship
@Citizenship there are plenty of options from the list ;-)Boatyard
update this list!! Also, check this solution: github.com/eKoopmans/html2pdf.js#getting-started It got me VERY far down the rabbithole, until .dotnet 6 broke it and I had to start again.Rosy
You could add Chrome headless for completeness. Even better - MS Edge has the same CLI printing abilities and should be present on any newer/non-EoL Windows (client) system!Prolate
github.com/hardkoded/puppeteer-sharp this library does HTML to pdf conversion just pretty nicelyBudwig
G
30

I highly recommend NReco, seriously. It has free and paid versions and is really worth it. It uses wkhtmltopdf in the background, but you just need one assembly. Fantastic.

Example of use:

Install via NuGet.

var htmlContent = String.Format("<body>Hello world: {0}</body>", DateTime.Now);
var pdfBytes = (new NReco.PdfGenerator.HtmlToPdfConverter()).GeneratePdf(htmlContent);

Disclaimer: I'm not the developer, just a fan of the project :)

Grume answered 23/4, 2015 at 19:53 Comment(12)
Looks indeed pretty useful. Worth noting that as of today (05/10/15), it's the most downloaded .NET wrapper for wkhtmltopdf (as a NuGet package).Geniculate
Tried it, unfortunately I couldn't make it work on azure's web pages.Haematosis
This library works fine when I run it locally on my machine, but on the hosting server, I am seeing the following error randomly. Pdf gets generated sometimes but sometimes it throws the following error. "Error. An error occurred while processing your request. Cannot generate PDF: (exit code: 1)"Sejant
wkhtmltopdf depends on GDI+, or x-server if you're running on Mono/Linux. So this is not useful for server environments...Sportswoman
Its good and working as expected but bit quality issue i see in my pdf , can we improve this ?Scram
@Nuzzolilo you need to use server-friendly wkhtmltopdf build that doesn't depend on X-server (available on official website).Cumbrous
@VitaliyFedorchenko I've since been laid off due to failure of that project but thanks for the suggestion lolSportswoman
@VitaliyFedorchenko I've treid Nreco today but I receive an error message saying I need a license, while I'm trying the free version. This is really painful because there's no way I buy a product without testing it before hand, not to mention my use case falls in those nreco theoretically supports for free.Dorr
@Dorr most likely you tried NReco.PdfGenerator.LT nuget package (which is compatible with .NET Core); it is available only for commercial users. However, you can contact support and get demo key for evaluation purposes, no problems with this.Cumbrous
@VitaliyFedorchenko Thank you for the answer. How else am I supposed to do it than with the nuget package ? NReco's website seems to say that it's the correct way... Is the core-compatible version ONLY available for paying customers ? If it's the case I'll just pass on it since we are building a small application for internal use.Dorr
It is great until you need support for CSS3, that's when things go south because under the hood it's based on wkhtmltopdf which still does not support all CSS3 constructs github.com/wkhtmltopdf/wkhtmltopdf/issues/3207.Vesting
I highly recommend NReco too. The free version works me really well, you can just add a reference to the dll and it work fine, no hassle at all. I was worried about using it with azure websites, but for now it seems working ok, even if it is based on wkhtmltopdf, After putting out the code to production and still seeing no error, i will pay them the asking price ($200), which is much lower then other ones like PrincePDF with its $3800 price.Recountal
T
26

Most HTML-to-PDF converters rely on IE to do the HTML parsing and rendering. This can break when the user updates IE. Here is one that does not rely on IE.

The code is something like this:

EO.Pdf.HtmlToPdf.ConvertHtml(htmlText, pdfFileName);

Like many other converters, you can pass text, file name, or Url. The result can be saved into a file or a stream.

Transudate answered 12/4, 2011 at 13:6 Comment(7)
Agreed- essential PDF was the best of over 10 that I tried. Support is fantastic.Anagrammatize
it is not useful because you must purchase the libraryMccowyn
d1jhoni1b, how does this make it not useful? If it is a pay-for tool, then it might be said to be expensive, but not useless on that criteria alone.Submediant
It's true EO.Pdf doesn't use IE. But it does seem to spawn 32 bit instances of a webkit browser in the background. Check your process list and you will see them as rundll32.exe instances pointing to the EO.PDF dll. So it still is a bit hacky in my opinion.Defeatism
@fubaar, your comment more look like an advertise than informational! EO is not free and even its price is much higher than other products and although it does not use IE to render html files, it still relies on a browser to do that.Valenza
It doesn't support media="print" which is really painful.Hexylresorcinol
Single developer licence for $650. That's costly.Ronnyronsard
K
24

For all those looking for a working solution in .NET 5 and above, here you go.

Here are my working solutions.

Using wkhtmltopdf:

  1. Download and install wkhtmltopdf latest version from here.
  2. Use the below code.
public static string HtmlToPdf(string outputFilenamePrefix, string[] urls,
    string[] options = null,
    string pdfHtmlToPdfExePath = @"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
{
    string urlsSeparatedBySpaces = string.Empty;
    try
    {
        //Determine inputs
        if ((urls == null) || (urls.Length == 0))
            throw new Exception("No input URLs provided for HtmlToPdf");
        else
            urlsSeparatedBySpaces = String.Join(" ", urls); //Concatenate URLs

        string outputFilename = outputFilenamePrefix + "_" + DateTime.Now.ToString("yyyy-MM-dd-HH-mm-ss-fff") + ".PDF"; // assemble destination PDF file name (HH = 24-hour clock, so morning and afternoon runs cannot collide)

        var p = new System.Diagnostics.Process()
        {
            StartInfo =
            {
                FileName = pdfHtmlToPdfExePath,
                Arguments = ((options == null) ? "" : string.Join(" ", options)) + " " + urlsSeparatedBySpaces + " " + outputFilename,
                UseShellExecute = false, // needs to be false in order to redirect output
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                RedirectStandardInput = true, // redirect all 3, as it should be all 3 or none
                WorkingDirectory = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location))
            }
        };

        p.Start();

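        // read the output here; stderr is read asynchronously so a full stderr pipe cannot block the stdout read
        var errorTask = p.StandardError.ReadToEndAsync();
        var output = p.StandardOutput.ReadToEnd();
        var errorOutput = errorTask.Result;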

        // ...then wait n milliseconds for exit (as after exit, it can't read the output)
        p.WaitForExit(60000);

        // read the exit code, close process
        int returnCode = p.ExitCode;
        p.Close();

        // if 0 or 2, it worked so return path of pdf
        if ((returnCode == 0) || (returnCode == 2))
            return outputFilename;
        else
            throw new Exception(errorOutput);
    }
    catch (Exception exc)
    {
        throw new Exception("Problem generating PDF from HTML, URLs: " + urlsSeparatedBySpaces + ", outputFilename: " + outputFilenamePrefix, exc);
    }
}
  3. Call the above method as HtmlToPdf("test", new string[] { "https://www.google.com" }, new string[] { "-s A5" });
  4. If you need to convert an HTML string to PDF, then tweak the above method and replace the Arguments of the Process StartInfo with the following (the leading /C suggests that FileName is then cmd.exe rather than wkhtmltopdf.exe): $@"/C echo | set /p=""{htmlText}"" | ""{pdfHtmlToPdfExePath}"" {((options == null) ? "" : string.Join(" ", options))} - ""C:\Users\xxxx\Desktop\{outputFilename}""";
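Rather than piping through echo, a sketch like the following writes the HTML straight to wkhtmltopdf's standard input, relying on the same "-" input argument that the tweak above already uses. The class and method names are hypothetical, the 60-second timeout simply mirrors the method above, and the UTF-8 input encoding is my assumption about what the HTML contains.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;

public static class WkHtmlStdin
{
    // Builds the start info; "-" tells wkhtmltopdf to read the page from stdin.
    public static ProcessStartInfo BuildStartInfo(string exePath, string outputFile, string[] options = null)
    {
        string opts = options == null ? "" : string.Join(" ", options);
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = $"{opts} - \"{outputFile}\"".Trim(),
            UseShellExecute = false, // required for redirection
            RedirectStandardInput = true,
            RedirectStandardError = true,
            // Assumption: the HTML is sent as UTF-8 without a byte order mark.
            StandardInputEncoding = new UTF8Encoding(false)
        };
    }

    public static void Convert(string html, string exePath, string outputFile, string[] options = null)
    {
        using (var p = Process.Start(BuildStartInfo(exePath, outputFile, options)))
        {
            // Drain stderr in the background so a full pipe cannot block the child.
            Task<string> errors = p.StandardError.ReadToEndAsync();

            p.StandardInput.Write(html);
            p.StandardInput.Close(); // signals end of input

            if (!p.WaitForExit(60000))
            {
                p.Kill();
                throw new TimeoutException("wkhtmltopdf did not exit within 60 seconds.");
            }

            // Same success codes as the method above (0 or 2).
            if (p.ExitCode != 0 && p.ExitCode != 2)
                throw new InvalidOperationException(errors.Result);
        }
    }
}
```

This avoids shell quoting problems with HTML that itself contains quotes or pipe characters, which the echo-based variant would have to escape.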

Drawbacks of this approach:

  1. The latest build of wkhtmltopdf as of posting this answer does not support the latest HTML5 and CSS3. Hence, if you try to export any HTML that uses CSS Grid, the output will not be as expected.
  2. You need to handle concurrency issues.
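One hedged way to deal with the concurrency point is to let only one conversion run at a time. The sketch below uses a SemaphoreSlim as a gate; ConversionGate and the delegate passed to it are hypothetical names, and the limit of one is an assumption you may want to raise.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ConversionGate
{
    // At most one conversion at a time; raise the counts if the server can take more.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    public static async Task<T> RunAsync<T>(Func<T> convert)
    {
        await Gate.WaitAsync();
        try
        {
            // Run the blocking conversion off the calling thread.
            return await Task.Run(convert);
        }
        finally
        {
            Gate.Release();
        }
    }
}
```

It could then wrap the method above, for example: string path = await ConversionGate.RunAsync(() => HtmlToPdf("test", new[] { "https://www.google.com" }));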

Using chrome headless:

  1. Download and install latest chrome browser from here.
  2. Use the below code.
var p = new System.Diagnostics.Process()
{
    StartInfo =
    {
        FileName = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
        Arguments = @"/C --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=""C:/Users/Abdul Rahman/Desktop/test.pdf"" ""C:/Users/Abdul Rahman/Desktop/grid.html""",
    }
};

p.Start();

// ...then wait n milliseconds for exit (as after exit, it can't read the output)
p.WaitForExit(60000);

// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();
  3. This will convert an HTML file to a PDF file.
  4. If you need to convert a URL to PDF, then use the following as the Arguments of the Process StartInfo:

@"/C --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=""C:/Users/Abdul Rahman/Desktop/test.pdf"" ""https://www.google.com""",

Drawbacks of this approach:

  1. This works as expected with the latest HTML5 and CSS3 features, and the output will be the same as what you see in the browser. However, when running this via IIS, you need to run the application pool of your application under the LocalSystem identity, or grant read/write access to IIS_IUSRS.

Using Selenium WebDriver:

  1. Install Nuget Packages Selenium.WebDriver and Selenium.WebDriver.ChromeDriver.
  2. Use the below code.
public async Task<byte[]> ConvertHtmlToPdf(string html)
{
    var directory = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.CommonDocuments), "ApplicationName");
    Directory.CreateDirectory(directory);
    var filePath = Path.Combine(directory, $"{Guid.NewGuid()}.html");
    await File.WriteAllTextAsync(filePath, html);

    var driverOptions = new ChromeOptions();
    // In headless mode, PDF writing is enabled by default (tested with driver major version 85)
    driverOptions.AddArgument("headless");
    using var driver = new ChromeDriver(driverOptions);
    driver.Navigate().GoToUrl(filePath);

    // Output a PDF of the first page in A4 size at 90% scale
    var printOptions = new Dictionary<string, object>
    {
        { "paperWidth", 210 / 25.4 },
        { "paperHeight", 297 / 25.4 },
        { "scale", 0.9 },
        { "pageRanges", "1" }
    };
    var printOutput = driver.ExecuteChromeCommandWithResult("Page.printToPDF", printOptions) as Dictionary<string, object>;
    var pdf = Convert.FromBase64String(printOutput["data"] as string);

    File.Delete(filePath);

    return pdf;
}

Advantage of this method:

  1. This just needs a NuGet installation and works as expected with the latest HTML5 and CSS3 features. The output will be the same as what you see in the browser.

Drawbacks of this approach:

  1. This approach needs latest chrome browser to be installed in the server where the app runs.
  2. If the chrome browser version in server is updated then Selenium.WebDriver.ChromeDriver Nuget package needs to be updated. Else this will throw run time error due to version mismatch.

The above drawbacks can be overcome if we run the app in Docker. All we need to do is install Chrome when building the app image using a Dockerfile.

With this approach, please make sure to add <PublishChromeDriver>true</PublishChromeDriver> in .csproj file as shown below:

<PropertyGroup>
  <TargetFramework>net5.0</TargetFramework>
  <LangVersion>latest</LangVersion>
  <Nullable>enable</Nullable>
  <PublishChromeDriver>true</PublishChromeDriver>
</PropertyGroup>

This will publish the chrome driver when publishing the project.

Here is the link to my working project repo - HtmlToPdf

Using window.print() in JavaScript to generate PDF from browser

If the users are using your app from browser then you can rely on JavaScript and use window.print() and necessary print media css to generate PDF from the browser. For example generating invoice from browser in an inventory app.

Advantage of this method:

  1. No dependency on any tools.
  2. PDF generated directly from HTML, CSS and JS in browser.
  3. Faster
  4. Supports all the latest CSS properties.

Drawbacks of this approach:

  1. In SPA like Blazor, we need to do some workaround with iframe to print sections of page.

I arrived at the above answer after spending almost 2 days on the available options, and finally implemented the Selenium-based solution, which is working. I hope this helps you and saves you time.

Ketene answered 27/3, 2021 at 20:4 Comment(12)
Have you ran any of these on Azure by chance? I will find out myself soon enough.Murrah
No I haven't tried. Please update here if you have tried this on AzureKetene
@Murrah did you get a chance to verify? Please share your feedback. And please upvote the answer if that helped you.Ketene
I will circle back around to answering this.Murrah
How about github.com/ststeiger/PdfSharpCore?Lorrin
@Lorrin I don't think PdfSharpCore supports HTML to PDF out of the box as stated in their wikiKetene
Using chrome headless works good; best way is simple code :)Zinovievsk
Does the selenium webdriver option require Chrome to be installed on the server hosting the application?Pancreas
@Pancreas yes. chrome needs to be installed on the server. Updated the answer with this info.Ketene
Headless Chrome works like a charm. And it's very simple to implement tooWadi
@KJ That depends on which OS the app is hosted or running.Ketene
With Selenium 4.11, it should be able to install even the browser automatically if need be. See the documentation about its manager.Separate
D
11

You can use Google Chrome print-to-pdf feature from its headless mode. I found this to be the simplest yet the most robust method.

var url = "https://mcmap.net/q/48109/-convert-html-to-pdf-in-net-closed";
var chromePath = @"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe";
var output = Path.Combine(Environment.CurrentDirectory, "printout.pdf");
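using (var p = new Process())
{
    p.StartInfo.FileName = chromePath;
    // quote the output path and URL so values containing spaces are passed as single arguments
    p.StartInfo.Arguments = $"--headless --disable-gpu --print-to-pdf=\"{output}\" \"{url}\"";
    p.Start();
    p.WaitForExit();
}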
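The snippet above prints a URL. If the input is an HTML string instead, one possible approach (a sketch, not something I have shown above) is to write the string to a temporary file and hand Chrome its file URI. ChromePdf, BuildArguments and ConvertHtml are hypothetical names, and the Chrome path is left to the caller.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ChromePdf
{
    // Quotes both values so paths containing spaces survive argument parsing.
    public static string BuildArguments(string outputPdf, string inputUri) =>
        $"--headless --disable-gpu --print-to-pdf=\"{outputPdf}\" \"{inputUri}\"";

    public static byte[] ConvertHtml(string html, string chromePath)
    {
        string id = Guid.NewGuid().ToString("N");
        string htmlPath = Path.Combine(Path.GetTempPath(), id + ".html");
        string pdfPath = Path.Combine(Path.GetTempPath(), id + ".pdf");

        // Written as UTF-8; the HTML should declare its charset accordingly.
        File.WriteAllText(htmlPath, html);
        try
        {
            using (var p = new Process())
            {
                p.StartInfo.FileName = chromePath;
                p.StartInfo.Arguments = BuildArguments(pdfPath, new Uri(htmlPath).AbsoluteUri);
                p.StartInfo.UseShellExecute = false;
                p.Start();
                p.WaitForExit();
            }
            return File.ReadAllBytes(pdfPath);
        }
        finally
        {
            File.Delete(htmlPath);
            File.Delete(pdfPath); // does nothing if Chrome never produced the file
        }
    }
}
```

The unique file names keep parallel requests from overwriting each other's input and output.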

Dorcia answered 28/1, 2020 at 7:31 Comment(6)
Hey, this is really cool for owned server and vps. Thanks for sharing.Deeplaid
In order to allow ASP.NET in IIS to run external program with write access permission, the application pool > advance settings > identity > set to "LocalSystem"Deeplaid
I love this approach, but how to handle if the request to url need more specific, such as header, cookie even post method?Cytolysis
can it handle html strings? instead of url.Citizenship
I have a problem. The pdf conversion is not fully load the page.Chaldron
@TấnNguyên You might have to set up your own web service that will do that stuff and then output the HTML contents, and then give Chrome your web service's URL.Korella
O
8

Quite likely most projects will wrap a C/C++ engine rather than implementing a C# solution from scratch. Try Project Gotenberg.

To test it

docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6

Curl sample

curl --request POST \
    --url http://localhost:3000/convert/url \
    --header 'Content-Type: multipart/form-data' \
    --form remoteURL=https://brave.com \
    --form marginTop=0 \
    --form marginBottom=0 \
    --form marginLeft=0 \
    --form marginRight=0 \
    -o result.pdf

C# sample.cs

using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.IO;
using static System.Console;

namespace Gotenberg
{
    class Program
    {
        public static async Task Main(string[] args)
        {
            try
            {
                var client = new HttpClient();            
                var formContent = new MultipartFormDataContent
                    {
                        {new StringContent("https://brave.com/"), "remoteURL"},
                        {new StringContent("0"), "marginTop" }
                    };
                var result = await client.PostAsync(new Uri("http://localhost:3000/convert/url"), formContent);
                await File.WriteAllBytesAsync("brave.com.pdf", await result.Content.ReadAsByteArrayAsync());
            }
            catch (Exception ex)
            {
                WriteLine(ex);
            }
        }
    }
}

To compile

csc sample.cs -langversion:latest -reference:System.Net.Http.dll && mono ./sample.exe
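
The sample above converts a URL. For raw HTML, Gotenberg 6 can, as far as I recall, also take the markup as a form file named index.html posted to /convert/html; treat that route and the "files" field name as assumptions to check against the Gotenberg documentation for your version. GotenbergHtml and its methods are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class GotenbergHtml
{
    private static readonly HttpClient Client = new HttpClient();

    // Wraps the HTML string as an uploaded file called index.html.
    public static MultipartFormDataContent BuildForm(string html)
    {
        var file = new ByteArrayContent(Encoding.UTF8.GetBytes(html));
        return new MultipartFormDataContent
        {
            { file, "files", "index.html" } // assumed field name and file name
        };
    }

    public static async Task<byte[]> ConvertAsync(string html, string baseUrl = "http://localhost:3000")
    {
        using (var form = BuildForm(html))
        using (var response = await Client.PostAsync(new Uri(baseUrl + "/convert/html"), form))
        {
            // Surface HTTP errors instead of saving an error body as a PDF.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```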
Osugi answered 8/5, 2020 at 19:35 Comment(0)
I
7

This is a free library and works very easily : OpenHtmlToPdf

string timeStampForPdfName = DateTime.Now.ToString("yyMMddHHmmssff");

string serverPath = System.Web.Hosting.HostingEnvironment.MapPath("~/FolderName");
string pdfSavePath = Path.Combine(@serverPath, "FileName" + timeStampForPdfName + ".FileExtension");


//OpenHtmlToPdf Library used for Performing PDF Conversion
var pdf = Pdf.From(HTML_String).Content();

//For writing to file from a ByteArray
File.WriteAllBytes(pdfSavePath, pdf.ToArray()); // Requires System.Linq
Investment answered 6/3, 2019 at 7:41 Comment(4)
That seems to be a Java library, not a .net/C# one.Prolate
@AndreasReiff nope, I put this snipped from .net code only.Investment
Ok, thank you, seems like there is a Java library by that name that is also on Github. There also is a nuget package with the same name though.Prolate
yup thats right @AndreasReiffInvestment
E
6

2018 update: let's use the standard HTML + CSS = PDF equation!

There is good news for HTML-to-PDF demands. As this answer showed, the W3C standard css-break-3 will solve the problem... It is a Candidate Recommendation with a plan to become a definitive Recommendation in 2017 or 2018, after tests.

There are also not-so-standard solutions, with plugins for C#, as shown by print-css.rocks.

Exponible answered 19/2, 2009 at 10:21 Comment(1)
The solutions linked by print-css.rocks cost $2,950.00 for PDFreactor, $3800 for Prince, and $5,000.00 for Antenna House Formatter V7. And Weasyprint appears to be for Python.Cromer
J
4

Below is an example of converting html + css to PDF using iTextSharp (iTextSharp + itextsharp.xmlworker)

using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;


byte[] pdf; // result will be here

var cssText = File.ReadAllText(MapPath("~/css/test.css"));
var html = File.ReadAllText(MapPath("~/css/test.html"));

using (var memoryStream = new MemoryStream())
{
        var document = new Document(PageSize.A4, 50, 50, 60, 60);
        var writer = PdfWriter.GetInstance(document, memoryStream);
        document.Open();

        using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
        {
            using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlMemoryStream, cssMemoryStream);
            }
        }

        document.Close();

        pdf = memoryStream.ToArray();
}
Justly answered 21/6, 2016 at 13:45 Comment(1)
Note that iTextSharp works with XHtml and is quite sensitive to the quality of your html. It would break, where SelectPdf and HiqPdf wouldn't.Quinn
P
3

It depends on any other requirements you have.

A really simple but not easily deployable solution is to use a WebBrowser control to load the HTML and then use its Print method to print to a locally installed PDF printer. There are several free PDF printers available, and the WebBrowser control is part of the .NET Framework.

EDIT: If your HTML is XHTML, you can use PDFizer to do the job.

Psychomancy answered 19/2, 2009 at 10:26 Comment(0)
T
3

It seems like so far the best free .NET solution is the TuesPechkin library which is a wrapper around the wkhtmltopdf native library.

I've now used the single-threaded version to convert a few thousand HTML strings to PDF files and it seems to work great. It's supposed to also work in multi-threaded environments (IIS, for example) but I haven't tested that.

Also since I wanted to use the latest version of wkhtmltopdf (0.12.5 at the time of writing), I downloaded the DLL from the official website, copied it to my project root, set copy to output to true, and initialized the library like so:

var dllDir = AppDomain.CurrentDomain.BaseDirectory;
Converter = new StandardConverter(new PdfToolset(new StaticDeployment(dllDir)));

The code above looks specifically for "wkhtmltox.dll", so don't rename the file. I used the 64-bit version of the DLL.

Make sure you read the instructions for multi-threaded environments, as you will have to initialize it only once per app lifecycle so you'll need to put it in a singleton or something.
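
As a rough sketch of the "singleton or something" part, Lazy<T> can guarantee that the initialization shown above runs only once per process. PdfConverterProvider is a hypothetical name, and the constructor expression is copied from my snippet above; the library's own instructions for multi-threaded hosts may call for a different converter type, so this only illustrates the once-only initialization.

```csharp
using System;
using System.Threading;
using TuesPechkin;

public static class PdfConverterProvider
{
    // ExecutionAndPublication: the factory runs once, even if several threads race for the first access.
    private static readonly Lazy<StandardConverter> LazyConverter =
        new Lazy<StandardConverter>(() =>
        {
            var dllDir = AppDomain.CurrentDomain.BaseDirectory;
            return new StandardConverter(new PdfToolset(new StaticDeployment(dllDir)));
        }, LazyThreadSafetyMode.ExecutionAndPublication);

    // Every caller in the application shares this one instance.
    public static StandardConverter Instance => LazyConverter.Value;
}
```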

Tobi answered 2/1, 2020 at 7:32 Comment(1)
wkhtmltopdf is great, but it is based on WebKit from around 2012, so doesn't support more modern HTML/CSS.Menhaden
W
2

I was also looking for this a while back. I ran into HTMLDOC http://www.easysw.com/htmldoc/ which is a free open source command line app that takes an HTML file as an argument and spits out a PDF from it. It's worked for me pretty well for my side project, but it all depends on what you actually need.

The company that makes it supplies the compiled binaries, but you are free to download and compile from source and use it for free. I managed to compile a pretty recent revision (for version 1.9) and I intend on releasing a binary installer for it in a few days, so if you're interested I can provide a link to it as soon as I post it.

HTMLDOC converts HTML and Markdown source files or web pages to EPUB, PostScript, or PDF files with an optional table of contents.

Edit (2/25/2014): Seems like the docs and site moved to https://www.msweet.org/htmldoc/

Edit (2022/3) Binaries are on github GPL2 licensed https://github.com/michaelrsweet/htmldoc

Walkway answered 8/5, 2009 at 16:26 Comment(4)
hi, can u provide a link and also a guide on how to use it with c# asp.net thanksPl
static.persistedthoughts.com/htmldoc_1.9.1586-setup.exe Be aware that this is a command line program. You have to execute it from within your application to get it to work. You can find the documentation for its arguments and caveats from Chapter 4 on: easysw.com/htmldoc/documentation.phpWalkway
I'm not sure how useful this would be nowadays, but if it helps you: dropbox.com/s/9kfn3ttoxs0fiar/htmldoc_1.9.1586-setup.exeWalkway
The website is no longer in operation.Competency
J
2

You can also check Spire; it allows you to convert HTML to PDF with this simple piece of code:

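string htmlCode = "<p>This is a p tag</p>";

// pdf, setting and htmlLayoutFormat are not declared in this snippet; they must be created beforehand
// use a single STA thread to generate the PDF from the HTML code above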
Thread thread = new Thread(() =>
{ pdf.LoadFromHTML(htmlCode, false, setting, htmlLayoutFormat); });
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
 
// Save the file to PDF and preview it.
pdf.SaveToFile("output.pdf");
System.Diagnostics.Process.Start("output.pdf");
Jameyjami answered 3/9, 2015 at 9:18 Comment(3)
Spire generates a PDF file that is just an image. Some of the css is not even correct, such as ignoring bold fonts.Quinn
See the response to my question regarding generating the PDFs as an image: e-iceblue.com/forum/nuget-pdf-as-non-image-t6710.htmlQuinn
Spire is the 4th one I've tried from this page and I think it's the best, thanks.Cromer
B
1

The best tool I have found and used for generating PDFs from HTML pages or views rendered with JavaScript and styles is PhantomJS.

Download the .exe file together with the rasterize.js script from the examples folder that ships with it, and put both inside your solution.

It even lets you download the file from code without opening it, and it also works when styles and especially jQuery are applied to the page.

The following code generates the PDF file:

public ActionResult DownloadHighChartHtml()
{
    string serverPath = Server.MapPath("~/phantomjs/");
    string filename = DateTime.Now.ToString("ddMMyyyy_HHmmss") + ".pdf"; // HH = 24-hour clock, so AM/PM names cannot clash
    string Url = "http://wwwabc.com";

    new Thread(new ParameterizedThreadStart(x =>
    {
        ExecuteCommand(string.Format("cd {0} & E: & phantomjs rasterize.js {1} {2} \"A4\"", serverPath, Url, filename));
                           //E: is the drive for server.mappath
    })).Start();

    var filePath = Path.Combine(Server.MapPath("~/phantomjs/"), filename);

    byte[] bytes = DoWhile(filePath); // polls until phantomjs has written the file

    Response.ContentType = "application/pdf";
    Response.AddHeader("content-disposition", "attachment;filename=Image.pdf");
    Response.OutputStream.Write(bytes, 0, bytes.Length);
    Response.End();
    return RedirectToAction("HighChart");
}



private void ExecuteCommand(string Command)
{
    try
    {
        ProcessStartInfo ProcessInfo;
        Process Process;

        // /C runs the command and then exits; /K would leave a cmd.exe process running for every request
        ProcessInfo = new ProcessStartInfo("cmd.exe", "/C " + Command);

        ProcessInfo.CreateNoWindow = true;
        ProcessInfo.UseShellExecute = false;

        Process = Process.Start(ProcessInfo);
    }
    catch { }
}


private byte[] DoWhile(string filePath)
{
    byte[] bytes = new byte[0];
    bool fail = true;

    while (fail)
    {
        try
        {
            // Throws while the file is missing or still held open by phantomjs, which triggers the retry below
            bytes = System.IO.File.ReadAllBytes(filePath);

            fail = false;
        }
        catch
        {
            Thread.Sleep(1000);
        }
    }

    System.IO.File.Delete(filePath);
    return bytes;
}
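The polling loop above can be avoided by starting phantomjs directly and waiting for the process to exit. What follows is a sketch under assumptions rather than a drop-in replacement: the class name, the method name and the timeout parameter are invented for illustration, and it assumes phantomjs.exe and rasterize.js sit in the same folder and that rasterize.js takes the URL, the output file name and the paper format as arguments, as in the command line used above.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; names are illustrative only.
public static class PhantomPdf
{
    public static byte[] Render(string phantomDir, string url, string fileName, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = Path.Combine(phantomDir, "phantomjs.exe"),
            Arguments = string.Format("rasterize.js \"{0}\" \"{1}\" A4", url, fileName),
            WorkingDirectory = phantomDir, // rasterize.js and the output file are resolved here
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Block until phantomjs is done instead of polling for the file.
            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                throw new TimeoutException("phantomjs did not finish in time.");
            }
        }

        // The process has exited, so the PDF is complete and no longer locked.
        string pdfPath = Path.Combine(phantomDir, fileName);
        byte[] bytes = File.ReadAllBytes(pdfPath);
        File.Delete(pdfPath);
        return bytes;
    }
}
```

Because the call returns only after the process has ended, no cmd.exe wrapper, background thread or retry loop is needed, and a hung render surfaces as a TimeoutException instead of an endless wait.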
Bly answered 19/2, 2009 at 10:21 Comment(1)
Can you share your full source code? I am new to C# so i am getting stuck even on the imports.Et
R
1

To convert HTML to PDF in C# use ABCpdf.

ABCpdf can make use of the Gecko or Trident rendering engines, so your HTML table will look the same as it does in Firefox and Internet Explorer.

There's an on-line demo of ABCpdf at www.abcpdfeditor.com. You could use this to check out how your tables will render first, without needing to download and install software.

For rendering entire web pages you'll need the AddImageUrl or AddImageHtml functions. But if all you want to do is simply add HTML styled text then you could try the AddHtml function, as below:

Doc theDoc = new Doc();
theDoc.FontSize = 72;
theDoc.AddHtml("<b>Some HTML styled text</b>");
theDoc.Save(Server.MapPath("docaddhtml.pdf"));
theDoc.Clear();

ABCpdf is commercial software; however, the standard edition can often be obtained for free under a special offer.

Rundown answered 2/2, 2010 at 6:16 Comment(3)
You should really write in all your answers that you work for websupergoo. From the faq: However, you must disclose your affiliation with the product in your answers. Also, if a huge percentage of your posts include a mention of your product, you're clearly here for the wrong reasons. All your answers have been about ABCpdfHauteur
Ouch! I suggested ABCpdf because it's a component I'm familiar with. If a large percentage of my posts relate to PDFs, it's only because I refrain from contributing to topics outside my areas of interest. Apologies.Rundown
In the posters defense, the website does make out the product to be pretty good.Competency
G
1

You need to use a commercial library if you need perfect HTML rendering in PDF.

ExpertPdf Html To Pdf Converter is very easy to use and it supports the latest HTML5/CSS3. You can either convert an entire URL to PDF:

using ExpertPdf.HtmlToPdf; 
byte[] pdfBytes = new PdfConverter().GetPdfBytesFromUrl(url);

or an HTML string:

using ExpertPdf.HtmlToPdf; 
byte[] pdfBytes = new PdfConverter().GetPdfBytesFromHtmlString(html, baseUrl);

Alternatively, you can save the generated PDF document directly to a stream or to a file on disk.

Gramicidin answered 14/11, 2014 at 14:8 Comment(8)
You dont have to use a commercial library if you need perfect html rendering in pdfFirmin
I'm beginning to believe this. I've tried 5 of the freebies and they all have one thing that ruins it for me. From choking to a page that is beyond a simple hello world, to looking awful - I think I'm going to have to cough up some money for a real converter. The samples of each of the commercial products actually works the way you'd expect the PDF to come out as.Bruckner
@Firmin - I would like to believe you. Perhaps you could share with us a link to whatever tools you are finding so good.Around
@PeterWone on the top there is many open source alternatives as you can see easily. Asking someone to share same things is just stealing time. But if you tried all of them and unsatisfied, i hope you will share your comments under them about what is not satisfied you and maybe then it would help to grow the knowledge.Firmin
@Firmin - Why repeat what others have already done? They fall into three categories: not really free, unacceptable dependencies like wkhtmltopdf or IE9, and the HTML Renderer for PDFSharp. HR for PDF# is the only one in pure C# and it does a horrible job of paginating - it renders one long page and cuts it up, often clipping through lines of text. If I can find the time to completely rewrite the renderer, HR for PDF# would win hands down: it's fast, free and has no dependencies. But that would be a whole new renderer, I fear.Around
@PeterWone if you are sure about this then why don't you share this as an asnwer instead of a comment?Firmin
@PeterWone because i worked with HTMLDOC(which is in answers) and it was worked perfect for me. I used it in C++ but not to hard to use it in C#. I think you are not talking with your experiences but with pre-acceptance which is not a right approach.Firmin
@Firmin - I had to search for HTMLDOC, it's way down with the low scoring answers. Your experience with it is encouraging, full source is available, the documentation seems quite good. But I strongly prefer a single-language solution if possible, and my C is too weak from long disuse. Otherwise you are right, it would be a good option.Around
M
1

If you want the user to download a PDF of the page rendered in the browser, the easiest solution is

window.print(); 

on the client side. It prompts the user to save a PDF of the current page. You can also customize the appearance of the PDF by linking a print stylesheet:

<link rel="stylesheet" type="text/css" href="print.css" media="print">

print.css is applied to the HTML while printing.

Limitation

You can't store the file on the server side. The user is prompted to print the page and then has to save it manually. The page must be rendered in a tab.

Mannerly answered 18/3, 2015 at 9:28 Comment(1)
dude so easy, least for my needs. Thanks!Parkin
S
1

As a representative of HiQPdf Software I believe the best solution is HiQPdf HTML to PDF converter for .NET. It contains the most advanced HTML5, CSS3, SVG and JavaScript rendering engine on the market. There is also a free version of the HTML to PDF library, which you can use to produce PDFs of up to 3 pages at no cost. The minimal C# code to produce a PDF as a byte[] from an HTML page is:

HtmlToPdf htmlToPdfConverter = new HtmlToPdf();

// set PDF page size, orientation and margins
htmlToPdfConverter.Document.PageSize = PdfPageSize.A4;
htmlToPdfConverter.Document.PageOrientation = PdfPageOrientation.Portrait;
htmlToPdfConverter.Document.Margins = new PdfMargins(0);

// convert HTML to PDF 
byte[] pdfBuffer = htmlToPdfConverter.ConvertUrlToMemory(url);

You can find more detailed examples both for ASP.NET and MVC in HiQPdf HTML to PDF Converter examples repository.

Supervision answered 2/12, 2016 at 11:48 Comment(3)
Produces decent results, but like SelectPdf, it can have a big hit on your build time and deploy package size. It was almost doubling my Visual Studio build time. I also had a hard time getting it to fill my page - the html was too small in the middle - in that respect SelectPdf did a better job.Quinn
page filling with HTML content depends on HtmlToPdf.BrowserWidth property. It is 1200 pixels by default but you can set it to 800 pixels and the HTML should fill very well the entire PDF page. You can find a live demo and sample code for this at hiqpdf.com/demo/HtmlFittingAndScalingOptions.aspxSupervision
No .NET Core support either.Frostwork
S
0

Instead of parsing HTML directly to PDF, you can create a Bitmap of your HTML page and then insert the Bitmap into your PDF, using for example iTextSharp.

Here's code that renders an HTML string to a Bitmap. I found it somewhere here on SO; if I find the source I'll link it.

public System.Drawing.Bitmap HTMLToImage(String strHTML)
{
    System.Drawing.Bitmap myBitmap = null;

    System.Threading.Thread myThread = new System.Threading.Thread(delegate()
    {
        // create a hidden web browser, which will navigate to the page
        System.Windows.Forms.WebBrowser myWebBrowser = new System.Windows.Forms.WebBrowser();
        // we don't want scrollbars on our image
        myWebBrowser.ScrollBarsEnabled = false;
        // don't let any errors shine through
        myWebBrowser.ScriptErrorsSuppressed = true;
        // let's load up that page!    
        myWebBrowser.Navigate("about:blank");

        // wait until the page is fully loaded
        while (myWebBrowser.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
            System.Windows.Forms.Application.DoEvents();

        myWebBrowser.Document.Body.InnerHtml = strHTML;

        // set the size of our web browser to be the same size as the page
        int intScrollPadding = 20;
        int intDocumentWidth = myWebBrowser.Document.Body.ScrollRectangle.Width + intScrollPadding;
        int intDocumentHeight = myWebBrowser.Document.Body.ScrollRectangle.Height + intScrollPadding;
        myWebBrowser.Width = intDocumentWidth;
        myWebBrowser.Height = intDocumentHeight;
        // a bitmap that we will draw to
        myBitmap = new System.Drawing.Bitmap(intDocumentWidth - intScrollPadding, intDocumentHeight - intScrollPadding);
        // draw the web browser to the bitmap
        myWebBrowser.DrawToBitmap(myBitmap, new System.Drawing.Rectangle(0, 0, intDocumentWidth - intScrollPadding, intDocumentHeight - intScrollPadding));
    });
    myThread.SetApartmentState(System.Threading.ApartmentState.STA);
    myThread.Start();
    myThread.Join();

    return myBitmap;
}
Sauer answered 11/6, 2014 at 9:22 Comment(1)
I do believe this is the ugliest approach ever. Really, who wants to lose text accessibility and possibility of copying text?Sorus
A
0

With the Winnovative HTML to PDF converter you can convert an HTML string in a single line:

byte[] outPdfBuffer = htmlToPdfConverter.ConvertHtml(htmlString, baseUrl);

The base URL is used to resolve images referenced by relative URLs in the HTML string. Alternatively, you can use full URLs in the HTML or embed the images as data URIs (an src attribute starting with data:image/png) in the image tags.
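Embedding an image as a data URI means base64-encoding its bytes into the src attribute, so the converter needs neither a base URL nor network access for that image. A small sketch using only the .NET base class library (the class and method names are hypothetical, and a PNG image is assumed):

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of any converter library.
public static class DataUriHelper
{
    // Builds a data URI such as "data:image/png;base64,...." from raw bytes.
    public static string ToDataUri(byte[] imageBytes, string mimeType)
    {
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes);
    }

    // Reads a PNG file and returns an <img> tag with the image embedded inline.
    public static string ImgTagFromFile(string pngPath)
    {
        return "<img src=\"" + ToDataUri(File.ReadAllBytes(pngPath), "image/png") + "\" />";
    }
}
```

The resulting tag can be placed directly into the HTML string that is passed to the converter. Keep in mind that base64 grows the data by roughly a third, so this suits small images better than large ones.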

In answer to the comment by user 'fubaar' about the Winnovative converter, a correction is necessary. The converter does not use IE as its rendering engine. It does not depend on any installed software, and its rendering is compatible with the WebKit engine.

Ashantiashbaugh answered 13/9, 2014 at 9:35 Comment(0)
C
0

PDFmyURL recently released a .NET component for web page / HTML to PDF conversion as well. This has a very user friendly interface, for example:

PDFmyURL pdf = new PDFmyURL("yourlicensekey");
pdf.ConvertURL("http://www.example.com", Application.StartupPath + @"\example.pdf");

Documentation: PDFmyURL .NET component documentation

Disclaimer: I work for the company that owns PDFmyURL

Chatman answered 8/9, 2015 at 11:33 Comment(0)
B
0

If you are already using the iTextSharp DLL, there is no need to add third-party DLLs (plugins). I think you are using HTMLWorker; use XMLWorker instead and you can easily convert your HTML to PDF.

Some CSS won't work; see the Supported CSS list.
For a full explanation with an example, see the Reference link.


        MemoryStream memStream = new MemoryStream();
        TextReader xmlString = new StringReader(outXml);
        using (Document document = new Document())
        {
            PdfWriter writer = PdfWriter.GetInstance(document, memStream);
            //document.SetPageSize(iTextSharp.text.PageSize.A4);
            document.Open();
            byte[] byteArray = System.Text.Encoding.UTF8.GetBytes(outXml);
            MemoryStream ms = new MemoryStream(byteArray);
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, ms, System.Text.Encoding.UTF8);
            document.Close();
        }

        Response.ContentType = "application/pdf";
        Response.AddHeader("content-disposition", "attachment;filename=" + filename + ".pdf");
        Response.Cache.SetCacheability(HttpCacheability.NoCache);
        Response.BinaryWrite(memStream.ToArray());
        Response.Flush(); // flush first: Response.End() aborts the request, so code after it never runs
        Response.End();
Boreal answered 24/2, 2016 at 7:57 Comment(1)
It is worth noting that iTextSharp uses the Affero General Public License which limits it for use only with other open source applications. That could be important for some people considering this option.Complainant
W
0

Another suggestion is to try the solution by https://grabz.it.

They provide a nice .NET API to capture screenshots and manipulate them in an easy and flexible way.

To use it in your app you will first need to get a key and secret and download the .NET SDK (it's free).

Now a short example of using it.

To use the API you will first need to create an instance of the GrabzItClient class, passing your application key and application secret from your GrabzIt account to the constructor, as shown in the below example:

//Create the GrabzItClient class
//Replace "APPLICATION KEY", "APPLICATION SECRET" with the values from your account!
private GrabzItClient grabzIt = GrabzItClient.Create("APPLICATION KEY", "APPLICATION SECRET");

Now, to convert the HTML to PDF all you need to do is:

grabzIt.HTMLToPDF("<html><body><h1>Hello World!</h1></body></html>");

You can convert to image as well:

grabzIt.HTMLToImage("<html><body><h1>Hello World!</h1></body></html>");     

Next you need to save the result. You can use one of the two save methods available: Save if a publicly accessible callback handler is available, and SaveTo if not. Check the documentation for details.

Whorl answered 15/6, 2017 at 11:18 Comment(0)
B
0

Another trick is to use the WebBrowser control. Below is my full working code.

Assigning Url to text box control in my case

protected void Page_Load(object sender, EventArgs e)
{
    txtweburl.Text = "https://www.google.com/";
}

Below is the code that generates the screenshot on a separate thread:

protected void btnscreenshot_click(object sender, EventArgs e)
{
    //  btnscreenshot.Visible = false;
    allpanels.Visible = true;
    Thread thread = new Thread(GenerateThumbnail);
    thread.SetApartmentState(ApartmentState.STA);
    thread.Start();
    thread.Join();
}

private void GenerateThumbnail()
{
    //  btnscreenshot.Visible = false;
    WebBrowser webrowse = new WebBrowser();
    webrowse.ScrollBarsEnabled = false;
    webrowse.AllowNavigation = true;
    string url = txtweburl.Text.Trim();
    webrowse.Navigate(url);
    webrowse.Width = 1400;
    webrowse.Height = 50000;

    webrowse.DocumentCompleted += webbrowse_DocumentCompleted;
    while (webrowse.ReadyState != WebBrowserReadyState.Complete)
    {
        System.Windows.Forms.Application.DoEvents();
    }
}

In the code below I save the screenshot and send it to the client as a PDF file once the page has loaded:

private void webbrowse_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // btnscreenshot.Visible = false;
    WebBrowser webrowse = sender as WebBrowser;

    // Render the browser control into a bitmap
    Bitmap bitmap = new Bitmap(webrowse.Width, webrowse.Height, PixelFormat.Format16bppRgb565);
    webrowse.DrawToBitmap(bitmap, webrowse.Bounds);

    // Save the screenshot; the image format now matches the .png extension
    string Systemimagedownloadpath = System.Configuration.ConfigurationManager.AppSettings["Systemimagedownloadpath"].ToString();
    string fullOutputPath = Systemimagedownloadpath + Request.QueryString["VisitedId"].ToString() + ".png";
    bitmap.Save(fullOutputPath, System.Drawing.Imaging.ImageFormat.Png);

    // Set the download headers before the PDF is written to the response
    Response.ContentType = "application/pdf";
    Response.AddHeader("content-disposition", "attachment;filename=ImageExport.pdf");
    Response.Cache.SetCacheability(HttpCacheability.NoCache);

    // Generate the PDF: one tall page that contains the screenshot
    Document pdfDoc = new Document(new iTextSharp.text.Rectangle(1100f, 20000.25f));
    PdfWriter writer = PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
    pdfDoc.Open();
    iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(fullOutputPath);
    img.ScaleAbsoluteHeight(20000);
    img.ScaleAbsoluteWidth(1024);
    pdfDoc.Add(img);
    pdfDoc.Close(); // closing the document writes the PDF to Response.OutputStream

    Response.End();
}

You can also refer to my older post for more information: Navigation to the webpage was canceled getting message in asp.net web form

Bedrail answered 29/8, 2019 at 11:56 Comment(0)
C
-1

Try the PDF Duo .Net component for converting HTML to PDF from an ASP.NET application without using additional DLLs.

You can pass the HTML string or file, or stream to generate the PDF. Use the code below (Example C#):

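string file_html = @"K:\hdoc.html";
string file_pdf = @"K:\new.pdf";
try
{
    DuoDimension.HtmlToPdf conv = new DuoDimension.HtmlToPdf();
    conv.OpenHTML(file_html);
    conv.SavePDF(file_pdf);
    textBox4.Text = "C# Example: Converting succeeded";
}
catch (Exception ex)
{
    // a try block needs a catch (or finally) to compile; report the failure
    textBox4.Text = "C# Example: Converting failed: " + ex.Message;
}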

Info + C#/VB examples you can find at: http://www.duodimension.com/html_pdf_asp.net/component_html_pdf.aspx

Chromous answered 1/8, 2009 at 16:43 Comment(1)
BitDefender reports: "Malware detected! Access to this page has been blocked.". I have no opinion on whether this report is genuine or a false positive.Ahumada
