Convert PDF to clean SVG? [closed]
D

9

131

I'm attempting to convert a PDF to SVG. However, the converter I am currently using emits a path for every letter in every piece of text, so if I change the text in the resulting SVG source, it looks ugly.

I was wondering what the cleanest PDF to SVG converter is, ideally one that keeps text as text instead of generating paths where none are needed. As we know, PDF and SVG are fairly similar, so I assume there are some good converters out there.

Daisy answered 23/4, 2012 at 20:48 Comment(3)
PDF and SVG are similar in the sense that they are both vector-based formats. That's where the comparison ends I believe.Domett
I suppose they both use a lot of absolute positioning of text.Lurk
If anyone comes across this question: TeX.SE has a related question on conversion - How can I use TikZ to make standalone (SVG) graphics? - TeX - LaTeX Stack Exchange.Whap
B
87

Inkscape is used by many people on Wikipedia to convert PDF to SVG.

http://inkscape.org/

They even have a handy guide on how to do so!

http://en.wikipedia.org/wiki/Wikipedia:Graphic_Lab/Resources/PDF_conversion_to_SVG#Conversion_with_Inkscape

Batch answered 23/4, 2012 at 20:53 Comment(7)
Inkscape doesn't work too well, as it changes the text into paths, too. I also find that they often lose the font data, but don't seem to approximate a good, installed font. How does PDF display it if SVG can't?Daisy
That's a fair question. I am familiar with both formats, but I haven't done a lot of research into the topic. I may have a look into it. I think it may boil down to the way the two formats are built. SVG, for example, is built with XML, while PDF uses its own (non-XML) object format.Batch
Well, the reason I want this is because I want to be able to edit the text using PHP. I could do it directly with PDF, but PDF can't be inlined easily into HTML, whereas SVG can. I may just stick with PDF and convert it to JPG in PHP after editing its values.Daisy
That is probably your best option, sorry I couldn't be of more help to you. Good luck!Batch
@DanRedux: AFAIK, you can switch off the 'font texts to paths' conversion in Inkscape. On the Inkscape commandline you would enable this conversion by adding --export-text-to-path.Bullshit
Fonts look great in my PDF but when I export it to SVG they look really ugly. Is there a way to fix it? I'm using: inkscape -l out.svg in.pdfMame
It may be obvious but Illustrator can convert PDF to SVG. Came here, downloaded Inkscape then realized I had Illustrator. en.wikipedia.org/wiki/Wikipedia:Graphics_Lab/Resources/…Beckon
B
106

You can use Inkscape on the commandline only, without opening a GUI. Try this:

inkscape \
  --without-gui \
  --file=input.pdf \
  --export-plain-svg=output.svg 

For a complete list of all commandline options, run inkscape --help.
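
As a comment on this answer notes, --without-gui is deprecated in Inkscape 1.0.1 and later. Here is a minimal sketch of a wrapper that picks the flags to match the installed version. The function names inkscape_major and pdf_to_svg are made up for illustration, and the sketch assumes the first line of `inkscape --version` starts with "Inkscape" followed by the version number.

```shell
# Print the major version from an "inkscape --version" line,
# e.g. "Inkscape 1.0.1 (3bc2e813f5, 2020-09-07)" -> "1".
inkscape_major() {
    version=${1#Inkscape }            # drop the leading program name
    printf '%s\n' "${version%%.*}"    # keep everything before the first dot
}

# Convert one PDF to plain SVG with whichever Inkscape is on PATH.
pdf_to_svg() {
    input=$1
    output=$2
    major=$(inkscape_major "$(inkscape --version 2>/dev/null | head -n 1)")
    case "$major" in
        0)  # 0.x syntax, as in the command above
            inkscape --without-gui "--file=$input" "--export-plain-svg=$output"
            ;;
        *)  # 1.x syntax: no --without-gui; the output path goes in --export-filename
            inkscape --export-type=svg --export-plain-svg \
                "--export-filename=$output" "$input"
            ;;
    esac
}
```

Usage would be something like `pdf_to_svg input.pdf output.svg`. If the version cannot be detected, the sketch falls back to the 1.x syntax.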

Bullshit answered 24/4, 2012 at 0:4 Comment(7)
This removes space in text for me.Limemann
@MaxNoe: That's quite possible -- but then this is a "property" of how that particular PDF is constructed internally. For some explanations of the difficulties in recognizing and extracting "text" from PDFs, see my hand-coded PDF files (with the embedded comments) at GitHub. (Open them in a text editor of your choice as well as a PDF viewer and copy'n'paste text from the files.)Bullshit
Yeah, I think it has to do with the way TeX renders whitespace, as boxes.Limemann
--without-gui is deprecated at least with inkscape 1.0.1Foreandaft
With inkscape 1.0.1 (or higher) the command should be inkscape --export-type="svg" input.pdfPeepul
However, I found that pdf2svg (see the answer of pierre) produced better results than inkscapePeepul
Wow, this Inkscape tool is really nice. I didn't even know I had it installed already; it must come stock with Ubuntu 20.04. I was able to use it to export a diagram from OWASP Threat Dragon to Microsoft PowerPoint.Microcircuit
B
25

I am currently using PDFBox which has good support for graphic output. There is good support for extracting the vector strokes and also for managing fonts. There are some good tools for trying it out (e.g. PDFReader will display as Java Graphics2D). You can intercept the graphics tool with an SVG tool like Batik (I do this and it gives good capture).

There is no simple way to convert all PDF to SVG - it depends on the strategy and tools used to create the PDFs. Some text is converted to vectors and cannot be easily reconstructed - you have to install vector fonts and look them up.

UPDATE: I have now developed this into a package PDF2SVG which does not use Batik any more:

which has been tested on a range of PDFs. It produces SVG output consisting of

  • characters as one <svg:text> per character
  • paths as <svg:path>
  • images as <svg:image>

Later packages will (hopefully) convert the characters to running text and the paths to higher-level graphics objects

UPDATE: We can now re-create running text from the SVG characters. We've also converted diagrams to domain-specific XML (e.g. chemical spectra). See https://bitbucket.org/petermr/svg2xml-dev. It's still in Alpha, but is moving at a useful speed. Anyone can join in!

UPDATE. (@Tim Kelty) We are continuing to work on PDF2SVG and also downstream tools that do (limited) Java OCR and creation of higher-level graphics primitives (arrows, boxes, etc.) See https://bitbucket.org/petermr/imageanalysis https://bitbucket.org/petermr/diagramanalyzer https://bitbucket.org/petermr/norma and https://bitbucket.org/petermr/ami-core . This is a funded project to capture 100 million facts from the scientific literature (contentmine.org) much of which is PDF.

Brufsky answered 27/4, 2012 at 21:31 Comment(2)
Has that code moved from Bitbucket to somewhere else?Monosome
It's changed a lot. See github.com/petermr/ami3Brufsky
K
23

This topic is quite old, but here is a handy solution that I found:

http://www.cityinthesky.co.uk/opensource/pdf2svg/

It offers a tool, pdf2svg, which once installed does exactly this job from the command line. I've tested it with flawless results so far, including on PDFs containing bitmaps.

EDIT : My mistake, this tool also converts letters to paths, so it does not address the initial question. However it does a good job anyway, and can be useful to anyone who does not intend to modify the code in the svg file, so I'll leave the post.
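
For reference, a minimal sketch of how pdf2svg is typically invoked to write one SVG per page. The helper names svg_pattern_for and pdf_pages_to_svg are made up for illustration. The sketch relies on pdf2svg accepting a %d placeholder in the output name together with the page argument `all`; check the usage text of your installed version if in doubt.

```shell
# Build the per-page output pattern for a PDF,
# e.g. report.pdf -> report_%d.svg (pdf2svg fills in the page number).
svg_pattern_for() {
    printf '%s\n' "${1%.pdf}_%d.svg"
}

# Convert every page of the given PDF to its own SVG file.
pdf_pages_to_svg() {
    pdf2svg "$1" "$(svg_pattern_for "$1")" all
}
```

For a single page, the plain form is `pdf2svg input.pdf output.svg 3`, where the last argument is the page number.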

Kenlay answered 5/2, 2015 at 22:41 Comment(3)
On Ubuntu you can install it with: $ sudo apt-get install pdf2svgUltramarine
Though it converts letters to paths, the results are great. To make modifications, I used to edit the SVGs directly in a text editor. If you open them in Inkscape and save them as Inkscape SVG, the code looks cleaner and you get object IDs, which make it easy to find the entities you want to change.Ultramarine
You can install it on Mac with brew install pdf2svg.Conjoin
C
10

Here is the process that I ended up using. The main tool I used was Inkscape which was able to convert text alright.

  • used Adobe Acrobat Pro actions with JavaScript to split-up the PDF sheets
  • ran Inkscape Portable 0.48.5 from Windows Cmd to convert to SVG
  • made some manual edits to a particular SVG XML attribute I was having issues with by using Windows Cmd and Windows PowerShell

Separate Pages: Adobe Acrobat Pro with JavaScript

Using Adobe Acrobat Pro Actions (formerly Batch Processing), create a custom action to separate the PDF pages into separate files. Alternatively, you may be able to split up PDFs with Ghostscript.

Acrobat JavaScript Action to split pages

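/* Extract Pages to Folder */

// Strip the folder path and the .pdf extension from the document path.
var re = /.*\/|\.pdf$/ig;
var filename = this.path.replace(re, "");

// Save each page as <filename>_sNNN.pdf (three-digit, 1-based page number).
for (var i = 0; i < this.numPages; i++) {
    this.extractPages({
        nStart: i,
        nEnd: i,
        cPath: filename + "_s" + ("000000" + (i + 1)).slice(-3) + ".pdf"
    });
}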
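
For the Ghostscript alternative mentioned above, here is a rough sketch that I did not use in this workflow. It assumes `gs` is on the PATH and relies on the pdfwrite device writing one file per page when the output name contains a %d-style placeholder. The helper names page_template_for and split_pdf_pages are made up for illustration, and the output names follow the same _sNNN pattern as the Acrobat script.

```shell
# Build the per-page output template for a PDF,
# e.g. sheets.pdf -> sheets_s%03d.pdf (Ghostscript fills in the page number).
page_template_for() {
    printf '%s\n' "${1%.pdf}_s%03d.pdf"
}

# Split a PDF into single-page PDFs using the pdfwrite device.
split_pdf_pages() {
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        "-sOutputFile=$(page_template_for "$1")" "$1"
}
```

Calling `split_pdf_pages sheets.pdf` should then produce sheets_s001.pdf, sheets_s002.pdf, and so on.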

PDF to SVG Conversion: Inkscape with Windows CMD batch file

I used a Windows Cmd batch file to loop through all PDF files in a folder and convert them to SVG.

Batch file to convert PDF to SVG in current folder

:: ===== SETUP =====
@echo off
CLS
echo Starting SVG conversion...
echo.

:: setup working directory (if different)
REM set "_work_dir=%~dp0"
set "_work_dir=%CD%"

:: setup counter
set "count=1"

:: setup file search and save string
set "_work_x1=pdf"
set "_work_x2=svg"
set "_work_file_str=*.%_work_x1%"

:: setup inkscape commands
set "_inkscape_path=D:\InkscapePortable\App\Inkscape\"
set "_inkscape_cmd=%_inkscape_path%inkscape.exe"

:: ===== FIND FILES IN WORKING DIRECTORY =====
:: Output from DIR last element is single  carriage return character. 
:: Carriage return characters are directly removed after percent expansion, 
:: but not with delayed expansion.

pushd "%_work_dir%"
FOR /f "tokens=*" %%A IN ('DIR /A:-D /O:N /B %_work_file_str%') DO (
    CALL :subroutine "%%A"
)
popd

:: ===== CONVERT PDF TO SVG WITH INKSCAPE =====

:subroutine
echo.
IF NOT [%1]==[] (

    echo %count%:%1
    set /A count+=1

    start "" /D "%_work_dir%" /W "%_inkscape_cmd%" --without-gui --file="%~n1.%_work_x1%" --export-dpi=300 --export-plain-svg="%~n1.%_work_x2%"

) ELSE (
    echo End of output
)
echo.

GOTO :eof

:: ===== INKSCAPE REFERENCE =====

:: print inkscape help
REM "%_inkscape_cmd%" --help > "%~dp0\inkscape_help.txt"
REM "%_inkscape_cmd%" --verb-list > "%~dp0\inkscape_verb_list.txt"

Cleanup attributes: Windows Cmd and PowerShell

I realize that brute-force find-and-replace on SVG or XML tags and attributes is not best practice, because of potential variations in the markup, and that an XML parser should be used instead. However, I had two simple issues: on one drawing the stroke width was very small, and on another the font family was incorrectly identified. So I modified the previous Windows Cmd batch script to do a simple find and replace. The only changes were to the search string definitions and to the conversion line, which now calls a PowerShell command. The PowerShell command performs the find and replace and saves the modified file with an added suffix. I did find some other references that may be better suited to parsing or modifying the resulting SVG files if other minor cleanup is needed.

Modifications to manually find and replace SVG XML data

:: setup file search and save string
set "_work_x1=svg"
set "_work_x2=svg"
set "_work_s2=_mod"
set "_work_file_str=*.%_work_x1%"

powershell -Command "(Get-Content '%~n1.%_work_x1%') | ForEach-Object {$_ -replace 'stroke-width:0.06', 'stroke-width:1'} | ForEach-Object {$_ -replace 'font-family:Times Roman','font-family:Times New Roman'} | Set-Content '%~n1%_work_s2%.%_work_x2%'"
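
On a system without PowerShell, the same two substitutions could be sketched with sed. This is an untested-in-my-workflow equivalent, not what I ran. The function name fix_svg_attributes is made up for illustration, and the dot in 0.06 is escaped so that it matches only a literal dot.

```shell
# Apply the two substitutions from the PowerShell command to one SVG file
# and write the result next to it as <name>_mod.svg.
fix_svg_attributes() {
    input=$1
    output=${input%.svg}_mod.svg
    sed -e 's/stroke-width:0\.06/stroke-width:1/g' \
        -e 's/font-family:Times Roman/font-family:Times New Roman/g' \
        "$input" > "$output"
}
```

Like the PowerShell version, this is a plain text replacement and knows nothing about XML structure, so it is only suitable for simple, predictable attribute values.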

Hope this might help someone

References

Adobe Acrobat Pro Actions and JavaScript references to Separate Pages

GhostScript references to Separate Pages

Inkscape Command Line references for PDF to SVG Conversion

Windows Cmd Batch File Script references

XML tag/attribute replacement research

Casar answered 29/5, 2015 at 20:18 Comment(1)
Thanks. I modified your command-line and used for /l %i in (1,1,58) Do @%inkscape% --pdf-page %i ... to separate the pages and convert them directly in svgOdelia
C
9

If DVI to SVG is an option, you can also use dvisvgm to convert a DVI file to an SVG file. This works perfectly for instance for LaTeX formulas (with option --no-fonts):

dvisvgm --no-fonts input.dvi -o output.svg

There is also pdf2svg, which uses Poppler and Cairo to convert a PDF into SVG. When I tried this, the SVG rendered perfectly in Inkscape.

Christal answered 3/6, 2013 at 8:42 Comment(2)
I have a PDF which renders some LaTeX symbols from the skak package (chess pieces). This particular file isn't handled well by Inkscape, since the symbols become Arial letters... I got correct results with pdf2svg.Hamlet
For Windows systems there is a set of compiled binary tools here: Poppler for Windows.Fixity
R
7

Bash script to convert each page of a PDF into its own SVG file.

#!/bin/bash
#
#  Make one PDF per page using PDF toolkit (pdftk).
#  Convert each single-page PDF to SVG using Inkscape (0.x syntax).
#

inputPdf=$1
base=${inputPdf%.*}   # strip only the final extension, so ./my.file.pdf works

pageCnt=$(pdftk "$inputPdf" dump_data | grep NumberOfPages | cut -d " " -f 2)

for i in $(seq 1 "$pageCnt"); do
    echo "converting page $i..."
    pdftk "$inputPdf" cat "$i" output "${base}_${i}.pdf"
    inkscape --without-gui "--file=${base}_${i}.pdf" "--export-plain-svg=${base}_${i}.svg"
done

To generate PNG instead, use --export-png, and so on.

Rescissory answered 6/12, 2015 at 16:2 Comment(0)
S
1

I found that xfig did an excellent job:

pstoedit -f fig foo.pdf foo.fig
xfig foo.fig

Then export to SVG from xfig.

It did a much better job than Inkscape. Actually, it was probably pstoedit that did it.

Sparteine answered 14/3, 2014 at 14:20 Comment(2)
Links: pstoedit xfigRomeoromeon
I find that you need pdf2ps first.Whap
P
1

Here is a Node.js REST API for two PDF render scripts: https://github.com/pumppi/pdf2images

The scripts are pdf2svg and ImageMagick's convert.

Piscary answered 3/4, 2016 at 8:22 Comment(0)
