html-to-md/doc/intro.md

3.2 KiB

Introduction to html-to-md

TODO: write great documentation

Introduction

The itch I'm trying to scratch at present is to transform Blogger.com's dreadful tag-soup markup into markdown; but my architecture for doing this is to build a completely general [HT|SG|X]ML transformation framework and then specialise it.

WARNING: this is presently alpha-quality code, although it does have fair unit test coverage.

Usage

To use this library in your project, add the following leiningen dependency:

[org.clojars.simon_brooke/html-to-md "0.1.0"]

To use it in your namespace, require:

[html-to-md.core :refer [html-to-md]]

For default usage, that's all you need. To play more sophisticated tricks, consider:

[html-to-md.transformer :refer [transform process]]
[html-to-md.html-to-md :refer [markdown-dispatcher]]

The intended usage is as follows:

(require '[html-to-md.core :refer [html-to-md]])

(html-to-md url output-file)

This will read (X)HTML from url and write Markdown to output-file. If output-file is not supplied, it will return the markdown as a string:

(require '[html-to-md.core :refer [html-to-md]])

(def md (html-to-md url))

Extending the transformer

In principle, the transformer can transform any [HT|SG|X]ML markup into any other, or into any textual form. To extend it to do something other than markdown, supply a dispatcher. A dispatcher is essentially a function of one argument, a [HT|SG|X]ML tag represented as a Clojure keyword, which returns a processor, which should be a function of two arguments, an element assumed to have that tag, and a dispatcher. The processor should return the value that you want elements of that tag transformed into.

Thus the html-to-md.html-to-md namespace comprises a number of processor functions, such as this one:

(defn markdown-a
    "Process the anchor element `e` into markdown, using dispatcher `d`."
    [e d]
    (str
        "["
        (s/trim (apply str (process (:content e) d)))
        "]("
        (-> e :attrs :href)
        ")"))

and a dispatcher map:

(def markdown-dispatcher
    "A despatcher for transforming (X)HTML into Markdown."
    {:a markdown-a
     :b markdown-strong
     :br markdown-br
     :code markdown-code
     :body markdown-default
     :div markdown-div
     :em markdown-em
     :h1 markdown-h1
     :h2 markdown-h2
     :h3 markdown-h3
     :h4 markdown-h4
     :h5 markdown-h5
     :h6 markdown-h6
     :html markdown-html
     :i markdown-em
     :img markdown-img
     :ol markdown-ol
     :p markdown-div
     :pre markdown-pre
     :samp markdown-code
     :script markdown-omit
     :span markdown-default
     :strong markdown-strong
     :style markdown-omit
     :ul markdown-ul
     })

Obviously it is convenient to write dispatchers as maps, but it isn't required that you do so: anything which, given a keyword, will return a processor, will work.

License

Copyright © 2019 Simon Brooke simon@journeyman.cc

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.