Converting all files in a folder to md using pandoc on Mac

5

28

I am trying to convert an entire directory from html into markdown. The directory tree is quite deep, with files nested two and three levels down.

In answering this question, John MacFarlane suggested using the following Makefile:

TXTDIR=sources
HTMLS=$(wildcard *.html)
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown, $(HTMLS))

.PHONY : all

all : $(MDS)

$(TXTDIR) :
	mkdir $(TXTDIR)

$(TXTDIR)/%.markdown : %.html $(TXTDIR)
	pandoc -f html -t markdown -s $< -o $@

Now, this doesn't seem to go inside subdirectories. Is there any easy way to modify this so that it will process the entire tree?

I don't need this to be in make. All I'm looking for is a way of getting a mirror of the initial directory where each html file is replaced by the output of running pandoc on that file.

(I suspect something along these lines should help, but I'm far from confident that I won't break things if I try to go at it on my own. I'm illiterate when it comes to GNU make.)

Biotype answered 30/9, 2014 at 17:9 Comment(2)
If you don't know make, maybe you could just write your own script in your favourite language, e.g. Python or Ruby? (Sorry to not be of more help right now.) – Cryptoclastic
Yeah, I may just try that instead. – Biotype
47

Since you mentioned you don't mind not using make, you can try bash.

I modified the code from this answer. Run it from the parent directory:

find ./ -iname "*.md" -type f -exec sh -c 'pandoc "${0}" -o "${0%.md}.pdf"' {} \;

It worked when I tested it, so it should work for you.

In response to the follow-up request "Any ideas how to specify the output folder?" (using html as the original file and md as the output):

find ./ -iname "*.html" -type f -exec sh -c 'pandoc "${0}" -o "./output/$(basename "${0%.html}").md"' {} \;

I have tested this and it works for me.
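
The command above writes every converted file directly into ./output, so files from different subfolders that share a name would overwrite each other. Here is a minimal sketch that mirrors the source tree instead, assuming bash. The function name mirror_html_to_md and the PANDOC override variable are hypothetical names I made up for illustration; they are not part of pandoc.

```shell
#!/usr/bin/env bash
# Sketch: mirror SRC into DEST, converting every .html file to .md.
# mirror_html_to_md and the PANDOC variable are hypothetical names for illustration.
mirror_html_to_md() {
  local src="${1%/}" dest="${2%/}"
  local converter="${PANDOC:-pandoc}"   # defaults to the real pandoc
  local file rel out
  # -print0 with read -d '' keeps filenames containing spaces intact
  find "$src" -type f -iname '*.html' -print0 |
  while IFS= read -r -d '' file; do
    rel="${file#"$src"/}"               # path relative to the source root
    out="$dest/${rel%.*}.md"            # same relative path, .md extension
    mkdir -p "$(dirname "$out")"        # create the mirrored subfolder
    "$converter" -f html -t markdown -s "$file" -o "$out"
  done
}
```

Called as mirror_html_to_md ./site ./output (both paths are placeholders), this should create ./output/a/b/page.md for ./site/a/b/page.html.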

Edit: As per a comment, the {} in a find -exec command is a placeholder that find replaces with the path of each file it finds. Here that path is handed to sh -c as its first argument, which the inline script reads as ${0}. The \; marks the end of the -exec command. See here for more explanation.
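
To see what the placeholder expands to without running pandoc, here is a small sketch that only prints each input path and the output path that would be derived from it. The helper name show_placeholder is hypothetical. It uses the common variant where an extra word (here sh) fills $0, so the found path arrives as $1.

```shell
#!/usr/bin/env bash
# Sketch: show how find's {} reaches the inline script.
# show_placeholder is a hypothetical helper; it only prints and never calls pandoc.
show_placeholder() {
  # The extra "sh" after the script fills $0, so the path substituted for {} becomes $1.
  find "$1" -type f -iname '*.html' \
    -exec sh -c 'printf "%s -> %s\n" "$1" "${1%.*}.md"' sh {} \;
}
```

Running show_placeholder . lists one "input -> output" pair per html file, which is a cheap way to check the paths before doing the real conversion.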

Nesto answered 10/10, 2014 at 16:37 Comment(6)
Thanks. Just to clarify: to get it to do what I want (viz. take html files and output md files) it should be find ./ -iname "*.html" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;, right? Any ideas how to specify the output folder? (As it is, it just puts the md file in the same folder as the corresponding html one.) – Biotype
This results in the following error on my machine: pandoc: : openFile: does not exist (No such file or directory). Files are found, but ${0} appears to be empty. – Tala
What does the ` {} \;` at the end do? – Emerson
Let me know if I should open a new question, but when I run this, pandoc can't find any included files. For example, I have html and markdown files with images in subfolders (relative to the .md file). <img src="Notes/Untitled.png"/> or ![Notes/Untitled.png](Notes/Untitled.png) gives the warning: Could not fetch resource 'Notes/Untitled.png': PandocResourceNotFound "Notes/Untitled.png". However, if I run pandoc directly on a markdown/html file, it can find the image in the subfolder. – Sacrarium
@SamTuke Make sure a folder named output exists, since the generated files are written to it. – Bridgework
It may break if you have spaces in your filenames. To avoid that, try for n in *.html; do pandoc "$n" -o "./output/${n%.html}.md"; done – Moffett
1

This is how I did it!

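# INPUT_FOLDER and DIR must be set before running this.
# -print0 with read -d '' keeps filenames containing spaces intact.
find "${INPUT_FOLDER}" -type f -name '*.md' -print0 |
while IFS= read -r -d '' item
do
  printf "   %s\n" "$item"
  # Create the parent folder of the output file so pandoc can write into it
  mkdir -p "$(dirname "${DIR}/build/$item")"
  pandoc "$item" -f markdown -t html -o "${DIR}/build/$item.html"
done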
Eagle answered 18/5, 2016 at 15:34 Comment(0)
1

I've created a Python script that converts all files under a folder tree that have a given suffix. It's called Pandoc-Folder. It might be useful, so I've put it on GitHub: https://github.com/andrewrproper/pandoc-folder

You can create a settings folder and file (YAML format), and then run it like this:

python pandoc-folder.py ./path/to/book/.pandoc-folder/settings-file.yml

There is an example-book folder, with matching .bat and .sh scripts that show how to convert the markdown from the example-book folder into a single output file.

I hope this might be useful to someone.

Megadeath answered 15/9, 2021 at 22:15 Comment(0)
0

John MacFarlane's answer is almost right. However, one needs to create the subfolder for pandoc, in case it doesn't exist. This is how I'd do it:

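TXTDIR=sources
# Unlike $(wildcard *.html), find also picks up files in subdirectories
HTMLS=$(patsubst ./%,%,$(shell find . -type f -name '*.html'))
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown,$(HTMLS))

.PHONY : all

all : $(MDS)

# No $(TXTDIR) prerequisite is needed: mkdir -p creates every missing folder
$(TXTDIR)/%.markdown : %.html
	mkdir -p $(dir $@)
	pandoc -f html -t markdown -s $< -o $@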
Tally answered 23/5, 2022 at 6:41 Comment(0)
0

This is a solution using ipython:

from pathlib import Path
# Collect every .html file under the current directory, recursively
files = list(Path('.').rglob('*.html'))
for f in files:
    out = f.with_suffix('.md')  # same folder and name, .md extension
    !pandoc -s "{f}" -o "{out}"

Note that you must run this from the top-level directory that contains the HTML files. Each Markdown file is saved next to its HTML source; change out if you want the output somewhere else.

Vibrant answered 10/12, 2022 at 17:52 Comment(0)
