Somewhat related: we were playing with generating PDFs of web pages that pull in a bunch of external content, and started trying out this library, which takes a page and inlines all of its assets (JS, CSS, images):
https://github.com/mitechie/python-webpage-inliner
That way you can download and store a single .html file and still get all the extra bits the site used (minus dynamically loaded content such as hover icons).
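
To give a rough idea of what "inlining" means here, below is a minimal stdlib-only sketch. It is not the library's actual code or API; the names `inline_assets` and `fetch` are made up for illustration, and the regexes assume simple, double-quoted attributes, so treat it as a toy rather than something robust.

```python
import base64
import mimetypes
import re
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch(url):
    """Download a URL and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as resp:
        return resp.read()


def inline_assets(html, base_url, fetch=fetch):
    """Return html with images, stylesheets and external scripts inlined.

    Illustrative only: regex-based, assumes double-quoted attributes.
    """

    def img(match):
        src = match.group(2)
        if src.startswith("data:"):
            return match.group(0)  # already inlined, leave untouched
        url = urljoin(base_url, src)
        mime = mimetypes.guess_type(url)[0] or "application/octet-stream"
        data = base64.b64encode(fetch(url)).decode("ascii")
        # Swap the src value for a base64 data: URI
        return f"{match.group(1)}data:{mime};base64,{data}{match.group(3)}"

    def css(match):
        url = urljoin(base_url, match.group(1))
        return "<style>" + fetch(url).decode("utf-8", "replace") + "</style>"

    def js(match):
        url = urljoin(base_url, match.group(1))
        return "<script>" + fetch(url).decode("utf-8", "replace") + "</script>"

    html = re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', img, html)
    html = re.sub(
        r'<link\b(?=[^>]*\brel="stylesheet")[^>]*?\bhref="([^"]+)"[^>]*>',
        css,
        html,
    )
    html = re.sub(
        r'<script\b[^>]*?\bsrc="([^"]+)"[^>]*>\s*</script>', js, html
    )
    return html
```

Something like `inline_assets(fetch(url).decode("utf-8"), url)` would then give you one self-contained string to write to disk. Anything the page loads later via JavaScript is still missed, same as noted above.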

