How to save a web page as PDF, HTML or an image/screenshot (after running the JavaScript)

We need to save our web pages (of hikes, complete with maps) as PDFs, so they work offline, i.e. we need to save the page after all the JavaScript has run and the map tiles have loaded.

Previously, we used PhantomJS. It was designed for testing (save the page as HTML, and perform tests to make sure it has rendered correctly), but it also had a save-as-PDF facility. However, there were issues with the way the maps were displayed, and some inconsistencies with the CSS.

Now there is a much easier way. Run Google Chrome from the command line, in 'headless' mode, with the --print-to-pdf command line option. There is also a --dump-dom option (save as HTML) for testers, and a --screenshot option (save as an image) for documentation writers.

At the time of writing (July 2017), this works on Mac and Linux only - it's coming to Windows.

Step 1 : Install Chrome

On Ubuntu: https://www.ubuntuupdates.org/ppa/google_chrome

Step 2 : Run Google Chrome in 'headless mode' from the command line

google-chrome-stable --headless --disable-gpu --print-to-pdf=output.pdf https://www.example.com

Testing is easy: the PDF should look the same as print preview in Chrome. We previously had issues because the rendering engines in PhantomJS and Chrome are different. Now, if some CSS, like page-break-inside, is implemented in Chrome, it'll be implemented in the PDF as well. Happy days.
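
If you have a whole set of pages to save, the PDF command above can be wrapped in a small shell function. This is only a minimal sketch under assumptions, not the script we actually run: save_pdf is a hypothetical helper name, the CHROME variable is there just so the binary name can be overridden, and the URLs in the usage comment are placeholders.

```shell
#!/bin/sh
# Name of the Chrome binary. Assumption: the Ubuntu package installs it
# as google-chrome-stable; set CHROME beforehand if yours differs.
CHROME="${CHROME:-google-chrome-stable}"

# save_pdf URL OUTPUT
# Hypothetical helper: prints URL to the PDF file OUTPUT, using the
# same flags as the single command shown earlier.
save_pdf() {
    "$CHROME" --headless --disable-gpu --print-to-pdf="$2" "$1"
}

# Usage (placeholder URLs):
#   n=0
#   for url in https://www.example.com/hike1 https://www.example.com/hike2; do
#       n=$((n + 1))
#       save_pdf "$url" "hike$n.pdf"
#   done
```

Keeping the binary name in one variable means the same function should work on a Mac, where Chrome is not called google-chrome-stable.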

Or, for a screenshot:

google-chrome-stable --headless --disable-gpu --screenshot=output.png https://www.example.com

Or, for the HTML (the contents of the <body> tag):

google-chrome-stable --headless --disable-gpu --dump-dom https://www.example.com > example.html

So now there is no excuse for not testing. For example, I test that a particular map tile (https://maps.google.com/...png) is included in a page.
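
A check like that can be as simple as a grep over the HTML saved by the --dump-dom command. The sketch below is one possible way to do it, not necessarily how my own test is written: page_contains is a hypothetical helper name, and the tile URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# page_contains FILE NEEDLE
# Hypothetical helper: exits 0 if the literal string NEEDLE occurs
# anywhere in FILE, and non-zero otherwise.
#   -q  print nothing, just set the exit status
#   -F  treat NEEDLE as a fixed string, so the dots and slashes in a
#       URL are not interpreted as a regular expression
page_contains() {
    grep -qF -- "$2" "$1"
}

# Usage, after saving example.html with --dump-dom (placeholder tile URL):
#   if page_contains example.html 'https://tiles.example.com/1/2/3.png'; then
#       echo "PASS: map tile found"
#   else
#       echo "FAIL: map tile missing"
#   fi
```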

For more information

https://developers.google.com/web/updates/2017/04/headless-chrome

