Automatically Converting HTML to PDF on Mac
It’s amazing just how difficult it can be to automatically render HTML files to lightweight PDF documents. After a lot of frustration, I found a welcome solution in the URL2PDF utility.
I like using simple text-based formats (like Markdown and HTML) to write up documentation and guides, rather than anything fancy like ePub. They’re more portable, easier to transform, and play nicely with version control. End users, however, often prefer nicely formatted PDF guides.
The easiest way to generate an accurate PDF representation of an HTML document is to open it in a browser and use the built-in “Print to PDF” option. That doesn’t scale, though, and it doesn’t fit when you’d like to automate your documentation build process.
Finding a way to automate this conversion might seem like a pretty straightforward task - but I really struggled to find much that was suitable. I was after a simple command-line solution that I could drop into a Makefile build process and be done with it, but every solution I came across seemed to have problems. Here are a few things I tried:
wkhtmltopdf
The “de facto” command line utility for rendering HTML to PDF. It does render accurately - but does so by rendering pages as images inside the PDF! So, you get massive file sizes with none of the standard interactivity of PDFs (text selection, links, etc).
html2pdf
Another long-standing option, built on FPDF and PHP. It’s also far from ideal, with parsing problems and large output files.
Automator / AppleScript
I tried hacking together an “open page, Print as PDF” solution using OS X’s Automator (and when that failed, writing something more low-level in AppleScript), but all solutions suggested seemed very hacky (“click button 1 of dialog 2 of window 1 of window 2”) and running an Automator task isn’t exactly what you’d call a “command-line” process.
Finally, in desperation, I stumbled across Scott Garner’s URL2PDF utility. It’s a wrapper around Foundation’s native PDFDownloader class. It’s simple, open source and works wonders - after building the project and installing, conversion is now just:
url2pdf --url=file:///path/to/local/file.html
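If you want to run this over a whole folder of HTML files from a build step, something like the following might do. It’s only a rough sketch, not part of URL2PDF: it assumes `url2pdf` is installed and on your PATH, and it uses nothing beyond the `--url` flag shown above. The helper names `build_command` and `convert_all` are made up for illustration, and where the PDF ends up is left to the tool’s defaults.

```python
import subprocess
from pathlib import Path


def build_command(html_path):
    """Build the url2pdf invocation for one local HTML file.

    Assumption: only the --url flag from the post is used; no other
    url2pdf options are assumed to exist.
    """
    # absolute() makes the path absolute without resolving symlinks;
    # as_uri() produces a file:// URL and percent-encodes spaces etc.
    file_url = Path(html_path).absolute().as_uri()
    return ["url2pdf", "--url=" + file_url]


def convert_all(html_paths):
    """Run url2pdf once per file, stopping at the first failure."""
    for html_path in html_paths:
        # check=True raises CalledProcessError on a non-zero exit code,
        # which is usually what you want inside a build.
        subprocess.run(build_command(html_path), check=True)
```

From a Makefile target you could then call something like `convert_all(sorted(Path("docs").glob("*.html")))`, where `docs` stands in for wherever your HTML lives.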
URL2PDF is available on GitHub, or if you’d like to tip Scott for this awesome utility, you can grab it here.