Blogger scraper tidied up and documented.

Simon Brooke 2019-04-30 20:11:44 +01:00
parent cb801b193f
commit 80cc2e4335
6 changed files with 87 additions and 7 deletions

@@ -1,9 +1,5 @@
# Introduction to html-to-md
TODO: write [great documentation](http://jacobian.org/writing/what-to-write/)
## Introduction
The itch I'm trying to scratch at present is to transform
[Blogger.com](http://www.blogger.com)'s dreadful tag-soup markup into Markdown,
but my architecture for doing this is to build a completely general [HT|SG|X]ML
@@ -45,6 +41,21 @@ This will read (X)HTML from `url` and write Markdown to `output-file`. If
(def md (html-to-md url))
```
If you are specifically scraping [Blogger.com](https://www.blogger.com/)
pages, you may *try* the following recipe:
```clojure
(require '[html-to-md.core :refer [blogger-to-md]])
(blogger-to-md url output-file)
```
It works for my Blogger pages. However, I'm not sure to what extent the
skinning of Blogger pages is pure CSS (in which case my recipe should work
for yours) and to what extent it's HTML templating (in which case it
probably won't). Results are not guaranteed; if it doesn't work, you get to
keep all the pieces.
## Extending the transformer
In principle, the transformer can transform any [HT|SG|X]ML markup into any