diff --git a/README.md b/README.md
index 0223912..2f50ecd 100644
--- a/README.md
+++ b/README.md
@@ -4,82 +4,9 @@
 A Clojure library designed to convert ([Enlive](https://github.com/cgrand/enlive)ned) HTML to markdown; but, more generally, a framework for [HT|SG|X]ML transformation.

-## Introduction
+[Documentation is here](https://simon-brooke.github.io/html-to-md/). In
+particular, please read the
+[introduction](https://simon-brooke.github.io/html-to-md/intro.html), which
+contains everything you want to know.

-The itch I'm trying to scratch at present is to transform
-[Blogger.com](http://www.blogger.com)'s dreadful tag-soup markup into markdown;
-but my architecture for doing this is to build a completely general [HT|SG|X]ML
-transformation framework and then specialise it.
-
-**WARNING:** this is presently alpha-quality code, although it does have fair
-unit test coverage.
-
-## Usage
-
-To use this library in your project, add the following leiningen dependency:
-
-    [org.clojars.simon_brooke/html-to-md "0.3.0"]
-
-To use it in your namespace, require:
-
-    [html-to-md.core :refer [html-to-md]]
-
-For default usage, that's all you need. To play more sophisticated tricks,
-consider:
-
-    [html-to-md.transformer :refer [transform process]]
-    [html-to-md.html-to-md :refer [markdown-dispatcher]]
-
-The intended usage is as follows:
-
-```clojure
-(require '[html-to-md.core :refer [html-to-md]])
-
-(html-to-md url output-file)
-```
-
-This will read (X)HTML from `url` and write Markdown to `output-file`. If
-`output-file` is not supplied, it will return the markdown as a string:
-
-```clojure
-(require '[html-to-md.core :refer [html-to-md]])
-
-(def md (html-to-md url))
-```
-
-If you are specifically scraping [blogger.com](https://www.blogger.com/")
-pages, you may *try* the following recipe:
-
-```clojure
-(require '[html-to-md.core :refer [blogger-to-md]])
-
-(blogger-to-md url output-file)
-```
-
-It works for my blogger pages. However, I'm not sure to what extent the
-skinning of blogger pages is pure CSS (in which case my recipe should work
-for yours) and to what extent it's HTML templating (in which case it
-probably won't). Results not guaranteed, if it doesn't work you get to
-keep all the pieces.
-
-## Extending the transformer
-
-In principle, the transformer can transform any [HT|SG|X]ML markup into any
-other, or into any textual form. To extend it to do something other than
-markdown, supply a **dispatcher**. A dispatcher is essentially a function of one
-argument, a [HT|SG|X]ML tag represented as a Clojure keyword, which returns
-a **processor,** which should be a function of two arguments, an element assumed
-to have that tag, and a dispatcher. The processor should return the value that
-you want elements of that tag transformed into.
-
-Obviously it is convenient to write dispatchers as maps, but it isn't required
-that you do so: anything which, given a keyword, will return a processor, will
-work.
-
-## License
-
-Copyright © 2019 Simon Brooke
-
-Distributed under the Eclipse Public License either version 1.0 or (at
-your option) any later version.
diff --git a/docs/html-to-md.blogger-to-md.html b/docs/html-to-md.blogger-to-md.html
index e14c706..56285f1 100644
--- a/docs/html-to-md.blogger-to-md.html
+++ b/docs/html-to-md.blogger-to-md.html
@@ -1,3 +1,3 @@
-html-to-md.blogger-to-md documentation

html-to-md.blogger-to-md

Convert blogger posts to Markdown format, omitting all the Blogger chrome and navigation.

blogger-dispatcher

Adaptation of markdown-dispatcher, q.v., with the :table and :html dispatches overridden.

blogger-scraper

(blogger-scraper e d)

Processor which scrapes the actual post content out of a Blogger page. NOTE: This was written to scrape my Blogger pages; yours may be different!

image-table-processor

(image-table-processor e d)

Blogger’s horrible tag soup wraps images in tables. Is this table such a table? If so, extract the image from it and process it to markdown; otherwise, fall back on what markdown-dispatcher would do with the table (which is currently nothing, but that will change).

\ No newline at end of file
+html-to-md.blogger-to-md documentation

html-to-md.blogger-to-md

Convert blogger posts to Markdown format, omitting all the Blogger chrome and navigation.

blogger-dispatcher

Adaptation of markdown-dispatcher, q.v., with the :table and :html dispatches overridden.
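
Since a dispatcher may be written as a map from tag keyword to processor, overriding a dispatch can be as simple as an `assoc`. The following is a minimal sketch of that idea only: `base-dispatcher`, `adapted-dispatcher` and the three processors are hypothetical stand-ins, not the library's own definitions.

```clojure
(require '[clojure.string :as str])

;; Hypothetical processors: each takes an element `e` and a dispatcher `d`.
(defn para [e d] (str/join (:content e)))   ; concatenate string children
(defn base-table [e d] nil)                 ; base behaviour: emit nothing
(defn custom-table [e d] "TABLE")           ; overriding behaviour

;; A stand-in base dispatcher, written as a map from tag to processor.
(def base-dispatcher
  {:p     para
   :table base-table})

;; The adapted dispatcher keeps every base entry except :table.
(def adapted-dispatcher
  (assoc base-dispatcher :table custom-table))

;; Look up the processor for :table, then call it on an element.
((adapted-dispatcher :table) {:tag :table :content []} adapted-dispatcher)
```

Because maps are functions of their keys in Clojure, `(adapted-dispatcher :table)` returns the processor directly; any other function from keyword to processor would serve equally well.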

blogger-scraper

(blogger-scraper e d)

Processor which scrapes the actual post content out of a Blogger page. NOTE: This was written to scrape my Blogger pages; yours may be different!

image-table-processor

(image-table-processor e d)

Blogger’s horrible tag soup wraps images in tables. Is this table such a table? If so, extract the image from it and process it to markdown; otherwise, fall back on what markdown-dispatcher would do with the table (which is currently nothing, but that will change).
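
The test-then-fall-back behaviour described above might look roughly like this. It is a sketch under stated assumptions, not the library's implementation: elements are assumed to be Enlive-style maps with `:tag`, `:attrs` and `:content` keys, any table containing an `:img` is treated as an image table, and `find-image`, `image->markdown` and `table-or-image` are hypothetical names.

```clojure
;; Assumes Enlive-style elements: maps with :tag, :attrs and :content keys.
(defn find-image
  "Return the first :img element nested anywhere inside `e`, or nil."
  [e]
  (first (filter #(= :img (:tag %))
                 (tree-seq map? :content e))))

(defn image->markdown
  "Render an :img element as a Markdown image reference."
  [img]
  (str "![" (get-in img [:attrs :alt] "") "]("
       (get-in img [:attrs :src]) ")"))

(defn table-or-image
  "If table `e` wraps an image, render the image; otherwise call `fallback`."
  [e fallback]
  (if-let [img (find-image e)]
    (image->markdown img)
    (fallback e)))
```

Here `fallback` stands in for whatever the base dispatcher would do with an ordinary table; passing `(constantly nil)` reproduces the "currently nothing" behaviour.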

\ No newline at end of file
diff --git a/docs/html-to-md.core.html b/docs/html-to-md.core.html
index 7f95066..c9fdebe 100644
--- a/docs/html-to-md.core.html
+++ b/docs/html-to-md.core.html
@@ -1,3 +1,3 @@
-html-to-md.core documentation

html-to-md.core

Top level functions intended for very simple use.

blogger-to-md

(blogger-to-md url)
(blogger-to-md url output)

Transform the Blogger post referenced by url into Markdown, and write it to output, if supplied. NOTE: This was written to scrape my Blogger pages; yours may be different!

html-to-md

(html-to-md url)
(html-to-md url output)

Transform the HTML document referenced by url into Markdown and write it to output, if supplied; otherwise return the Markdown as a string.

\ No newline at end of file
+html-to-md.core documentation

html-to-md.core

Top level functions intended for very simple use.

blogger-to-md

(blogger-to-md url)
(blogger-to-md url output)

Transform the Blogger post referenced by url into Markdown, and write it to output, if supplied. NOTE: This was written to scrape my Blogger pages; yours may be different!

html-to-md

(html-to-md url)
(html-to-md url output)

Transform the HTML document referenced by url into Markdown and write it to output, if supplied; otherwise return the Markdown as a string.

\ No newline at end of file
diff --git a/docs/html-to-md.html-to-md.html b/docs/html-to-md.html-to-md.html
index 73138d9..6be170d 100644
--- a/docs/html-to-md.html-to-md.html
+++ b/docs/html-to-md.html-to-md.html
@@ -1,3 +1,3 @@
-html-to-md.html-to-md documentation

html-to-md.html-to-md

Transform general HTML to Markdown, as faithfully as is reasonably possible.

markdown-a

(markdown-a e d)

Process the anchor element e into markdown, using dispatcher d.

markdown-br

(markdown-br e d)

Process the line-break element e, so beloved of tag-soupers, into markdown.

markdown-code

(markdown-code e d)

Process the code or samp element e into markdown, using dispatcher d.

markdown-default

(markdown-default e d)

Process an element e for which we have no other function into markdown, using dispatcher d.

markdown-dispatcher

A dispatcher for transforming (X)HTML into Markdown.

markdown-div

(markdown-div e d)

Process the division element e into markdown, using dispatcher d.

markdown-em

(markdown-em e d)

Process the emphasis element e into markdown, using dispatcher d.

markdown-h1

(markdown-h1 e d)

Process the header element e into markdown, with level 1, using dispatcher d.

markdown-h2

(markdown-h2 e d)

Process the header element e into markdown, with level 2, using dispatcher d.

markdown-h3

(markdown-h3 e d)

Process the header element e into markdown, with level 3, using dispatcher d.

markdown-h4

(markdown-h4 e d)

Process the header element e into markdown, with level 4, using dispatcher d.

markdown-h5

(markdown-h5 e d)

Process the header element e into markdown, with level 5, using dispatcher d.

markdown-h6

(markdown-h6 e d)

Process the header element e into markdown, with level 6, using dispatcher d.

markdown-header

(markdown-header e d level)

Process the header element e into markdown, with level level, using dispatcher d.

markdown-html

(markdown-html e d)

Process this HTML element e into markdown, using dispatcher d.

markdown-img

(markdown-img e d)

Process this image element e into markdown, using dispatcher d.

markdown-ol

(markdown-ol e d)

Process this ordered list element e into markdown, using dispatcher d.

markdown-omit

(markdown-omit e d)

Don’t process the element e into markdown, but return nil.

markdown-pre

(markdown-pre e d)

Process the preformatted element e into markdown, using dispatcher d.

markdown-strong

(markdown-strong e d)

Process the strong emphasis element e into markdown, using dispatcher d.

markdown-ul

(markdown-ul e d)

Process this unordered list element e into markdown, using dispatcher d.

\ No newline at end of file
+html-to-md.html-to-md documentation

html-to-md.html-to-md

Transform general HTML to Markdown, as faithfully as is reasonably possible.

markdown-a

(markdown-a e d)

Process the anchor element e into markdown, using dispatcher d.

markdown-br

(markdown-br e d)

Process the line-break element e, so beloved of tag-soupers, into markdown.

markdown-code

(markdown-code e d)

Process the code or samp element e into markdown, using dispatcher d.

markdown-default

(markdown-default e d)

Process an element e for which we have no other function into markdown, using dispatcher d.

markdown-dispatcher

A dispatcher for transforming (X)HTML into Markdown.

markdown-div

(markdown-div e d)

Process the division element e into markdown, using dispatcher d.

markdown-em

(markdown-em e d)

Process the emphasis element e into markdown, using dispatcher d.

markdown-h1

(markdown-h1 e d)

Process the header element e into markdown, with level 1, using dispatcher d.

markdown-h2

(markdown-h2 e d)

Process the header element e into markdown, with level 2, using dispatcher d.

markdown-h3

(markdown-h3 e d)

Process the header element e into markdown, with level 3, using dispatcher d.

markdown-h4

(markdown-h4 e d)

Process the header element e into markdown, with level 4, using dispatcher d.

markdown-h5

(markdown-h5 e d)

Process the header element e into markdown, with level 5, using dispatcher d.

markdown-h6

(markdown-h6 e d)

Process the header element e into markdown, with level 6, using dispatcher d.

markdown-header

(markdown-header e d level)

Process the header element e into markdown, with level level, using dispatcher d.
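
To illustrate how a processor of this shape fits the dispatcher contract (a dispatcher maps a tag keyword to a processor; a processor takes an element and a dispatcher), here is a minimal sketch. It assumes Enlive-style element maps; `render-children`, `header-sketch`, `em-sketch` and `sketch-dispatcher` are hypothetical names, and the library's real header processor may well differ in whitespace and detail.

```clojure
(require '[clojure.string :as str])

(defn render-children
  "Render each child of `e`: strings pass through unchanged, child
  elements are handed to the processor that `d` returns for their tag."
  [e d]
  (str/join
    (map (fn [c] (if (string? c) c ((d (:tag c)) c d)))
         (:content e))))

(defn header-sketch
  "Render `e` as a Markdown header at `level`, using dispatcher `d`."
  [e d level]
  (str (str/join (repeat level "#")) " " (render-children e d) "\n"))

(defn em-sketch
  "Render `e` as Markdown emphasis, using dispatcher `d`."
  [e d]
  (str "*" (render-children e d) "*"))

;; A two-entry dispatcher; fixed-level processors just close over the level.
(def sketch-dispatcher
  {:em em-sketch
   :h2 (fn [e d] (header-sketch e d 2))})

(header-sketch {:tag :h2 :content ["Hello " {:tag :em :content ["world"]}]}
               sketch-dispatcher
               2)
;; => "## Hello *world*\n"
```

Note that this sketch has no default processor, so a child whose tag is missing from the dispatcher would fail; a fuller version would fall back on something like a default processor.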

markdown-html

(markdown-html e d)

Process this HTML element e into markdown, using dispatcher d.

markdown-img

(markdown-img e d)

Process this image element e into markdown, using dispatcher d.

markdown-ol

(markdown-ol e d)

Process this ordered list element e into markdown, using dispatcher d.

markdown-omit

(markdown-omit e d)

Don’t process the element e into markdown, but return nil.

markdown-pre

(markdown-pre e d)

Process the preformatted element e into markdown, using dispatcher d.

markdown-strong

(markdown-strong e d)

Process the strong emphasis element e into markdown, using dispatcher d.

markdown-ul

(markdown-ul e d)

Process this unordered list element e into markdown, using dispatcher d.

\ No newline at end of file
diff --git a/docs/html-to-md.transformer.html b/docs/html-to-md.transformer.html
index 64d28e8..5867b5f 100644
--- a/docs/html-to-md.transformer.html
+++ b/docs/html-to-md.transformer.html
@@ -1,6 +1,6 @@
-html-to-md.transformer documentation

html-to-md.transformer

The transformation engine itself, which is far more general than a Markdown generator. It isn’t as general as XSLT, but it can nevertheless do a great deal of transformation on [HT|SG|X]ML documents.

+html-to-md.transformer documentation

html-to-md.transformer

The transformation engine itself, which is far more general than a Markdown generator. It isn’t as general as XSLT, but it can nevertheless do a great deal of transformation on [HT|SG|X]ML documents.

Terminology

In this documentation the following terminology is used: