wkhtmltopdf relative paths in HTML with redirected in/out streams won't work
I am using wkhtmltopdf.exe (version 0.12.0 final) from .NET C# to generate PDF files from HTML files.

My problem is getting JavaScript, stylesheets and images to work when the HTML specifies only relative paths. It works if I use absolute paths, but not with relative paths, which makes the whole HTML generation a bit too complicated. I have boiled what I do down to the following example:

string CMDPATH = @"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe";
string HTML = string.Format(
    "<div><img src=\"{0}\" /></div><div><img src=\"{1}\" /></div><div>{2}</div>",
    "./sohlogo.png",
    "./ACLASS.jpg",
    DateTime.Now.ToString());

WriteFile(HTML, "test.html");

Process p;
ProcessStartInfo psi = new ProcessStartInfo();

psi.FileName = CMDPATH;
psi.UseShellExecute = false;
psi.WorkingDirectory = AppDomain.CurrentDomain.BaseDirectory;
psi.CreateNoWindow = true;
psi.RedirectStandardInput = true;
psi.RedirectStandardOutput = true;
psi.RedirectStandardError = true;

psi.Arguments = "-q - -";

p = Process.Start(psi);

StreamWriter stdin = p.StandardInput;
stdin.AutoFlush = true;
stdin.Write(HTML);
stdin.Dispose();

MemoryStream pdfstream = new MemoryStream();
CopyStream(p.StandardOutput.BaseStream, pdfstream);
p.StandardOutput.Close();
pdfstream.Position = 0;

WriteFile(pdfstream, "test.pdf");

p.WaitForExit(10000);
int test = p.ExitCode;

p.Dispose();

I have tried relative paths like "./sohlogo.png" and simply "sohlogo.png". Both display correctly in the browser via the HTML file, but neither works in the PDF file. There is no data in the error stream.

The following commandline works like a charm with the relative paths:

"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" test.html test.pdf

I could really use some input at this stage, so any help is much appreciated!

Just for reference, the WriteFile and CopyStream methods look like this:

public static void WriteFile(MemoryStream stream, string path)
{
    using (FileStream writer = new FileStream(path, FileMode.Create))
    {
        byte[] bytes = stream.ToArray();
        writer.Write(bytes, 0, bytes.Length);
        writer.Flush();
    }
}

public static void WriteFile(string text, string path)
{
    using (StreamWriter writer = new StreamWriter(path))
    {
        writer.WriteLine(text);
        writer.Flush();
    }
}

public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[32768];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, read);
    }
}

EDIT: My Workaround for Neo Nguyen.

I could not get this to work with relative paths. So instead I wrote a method that prepends a root path to every path in the HTML. It solves my problem, so maybe it will solve yours:

/// <summary>
/// Prepends the basedir x in src="x" or href="x" to the input html text
/// </summary>
/// <param name="html">the initial html</param>
/// <param name="basedir">the basedir to prepend</param>
/// <returns>the new html</returns>
public static string MakeRelativePathsAbsolute(string html, string basedir)
{
    string pathpattern = "(?:href=[\"']|src=[\"'])(.*?)[\"']";

    // SM20140214: tested that both chrome and wkhtmltopdf.exe understands "C:\Dir\..\image.png" and "C:\Dir\.\image.png"
    //             Path.Combine("C:/
    html = Regex.Replace(html, pathpattern, new MatchEvaluator((match) =>
        {
            string path = match.Groups[1].Value;
            if (string.IsNullOrEmpty(path))
            {
                // Empty href/src: nothing to prepend, leave the attribute unchanged.
                return match.Groups[0].Value;
            }

            // Prepend the base directory, then encode spaces and '#'.
            string newpath = UrlEncode(Path.Combine(basedir, path));
            return match.Groups[0].Value.Replace(path, newpath);
        }));

    return html;
}

private static string UrlEncode(string url)
{
    url = url.Replace(" ", "%20").Replace("#", "%23");
    return url;
}

I tried different System.Uri.Escape*** methods like System.Uri.EscapeDataString(), but they encoded the URL too aggressively for wkhtmltopdf to understand it. For lack of time I just did the quick and dirty UrlEncode above.
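As a possible alternative to the hand-rolled encoding, here is a minimal sketch that resolves each reference with System.Uri instead of Path.Combine. The class name AssetUrl, the method Resolve and the base location file:///C:/Reports/ are made up for illustration, and I have not verified that every wkhtmltopdf version accepts file:/// URLs in src and href, so treat that as an assumption.

```csharp
using System;

public static class AssetUrl
{
    // Hypothetical helper: resolves an href/src value against a base directory URL.
    public static string Resolve(Uri baseDirectory, string reference)
    {
        // The Uri(Uri, string) constructor leaves absolute references
        // (for example http://...) untouched and resolves "./x" or "../x"
        // against the base.
        return new Uri(baseDirectory, reference).AbsoluteUri;
    }

    public static void Main()
    {
        // Assumed location; the trailing slash marks it as a directory.
        Uri baseDirectory = new Uri("file:///C:/Reports/");

        Console.WriteLine(Resolve(baseDirectory, "./sohlogo.png"));
        Console.WriteLine(Resolve(baseDirectory, "http://example.com/logo.png"));
    }
}
```

One caveat: in a relative reference a '#' starts a fragment, so file names that contain '#' would still need to be escaped before they are resolved this way.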

Elamitic answered 14/2, 2014 at 9:33 Comment(2)
Hey buddy, have you figured out a way? I'm currently in the same situation as you... :) - Lathing
Hi Neo. I edited my question to explain how I solved the problem. Hopefully you can use it for something. - Elamitic

According to the official command-line docs, there is an option called --cache-dir.

It seems to act as the working directory for resolving relative paths. I use it and it works with v0.12.3:

wkhtmltopdf /my/path/to/index.html test.pdf --cache-dir /my/path/to
Univalve answered 6/7, 2021 at 9:45 Comment(1)
--cache-dir worked for me. I did not have to update all html files to include absolute paths. Thanks! - Ingenuity

Looking quickly, I think the trouble might be with

psi.WorkingDirectory = AppDomain.CurrentDomain.BaseDirectory;

I think that is where the paths are pointing at. I'm assuming that

"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" test.html test.pdf

working means that your image referenced inside test.html as src="mlp.png" is at c:\Program Files\wkhtmltopdf\bin\mlp.png, right? I think that it works because your image file is in the same folder as wkhtmltopdf... so try setting the WorkingDirectory to that directory and see what happens.

Hairspring answered 14/2, 2014 at 12:39 Comment(2)
AppDomain.CurrentDomain.BaseDirectory is "C:\Development\Tests\wkhtmltopdftest\wkhtmltopdftest\bin\Debug". This is also the location where I run the command line expression, so I don't think this is the problem. - Elamitic
Maybe you could add the absolute paths to test.html and sohlogo.png, and also the place where you run the command, to the question; it would provide more info and deter wrong answers like mine :) - Hairspring

I use version 0.12.3 of wkhtmltopdf. Here you can use relative paths. As far as I could figure out, they are relative to the location of the source file. If you have your html like

/documentroot/tmp/myfile.html 

and your asset is something like

/documentroot/assets/logo.png

then the links should work with

"../assets/logo.png"
Merocrine answered 14/3, 2023 at 18:54 Comment(0)
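
Building on the observation above that relative paths resolve against the location of the source file, one option from C# is to write the HTML to a file next to the assets and pass that file as input, while still reading the PDF from stdout via "-" as in the question. This is only a sketch: the class PdfFromFile and its methods are hypothetical names, and it assumes the process may write a temporary file into the asset directory.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class PdfFromFile
{
    // Builds the argument string: quiet mode, HTML file as input, "-" for stdout.
    public static string BuildArguments(string htmlPath)
    {
        return "-q \"" + htmlPath + "\" -";
    }

    // Hypothetical helper: writes the HTML into assetDirectory, converts it,
    // and returns the PDF bytes read from the process's stdout.
    public static byte[] Convert(string exePath, string html, string assetDirectory)
    {
        string htmlPath = Path.Combine(assetDirectory, Guid.NewGuid().ToString("N") + ".html");
        File.WriteAllText(htmlPath, html);
        try
        {
            ProcessStartInfo psi = new ProcessStartInfo
            {
                FileName = exePath,
                Arguments = BuildArguments(htmlPath),
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true
            };

            using (Process p = Process.Start(psi))
            using (MemoryStream pdf = new MemoryStream())
            {
                // Drain stdout before waiting so the child cannot block on a full pipe.
                p.StandardOutput.BaseStream.CopyTo(pdf);
                p.WaitForExit();

                if (pdf.Length == 0)
                    throw new InvalidOperationException("wkhtmltopdf produced no output.");
                return pdf.ToArray();
            }
        }
        finally
        {
            // Remove the temporary HTML file whether or not the conversion succeeded.
            File.Delete(htmlPath);
        }
    }

    public static void Main()
    {
        // Shows only the argument string; Convert needs wkhtmltopdf installed.
        Console.WriteLine(BuildArguments(@"C:\Reports\test.html"));
    }
}
```

Only stdout is redirected here, so there is no unread error stream that could fill up and stall the process.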
