diff --git a/README.md b/README.md
index 0223912..2f50ecd 100644
--- a/README.md
+++ b/README.md
@@ -4,82 +4,9 @@ A Clojure library designed to convert
 ([Enlive](https://github.com/cgrand/enlive)ned) HTML to markdown; but, more
 generally, a framework for [HT|SG|X]ML transformation.
 
-## Introduction
+[Documentation is here](https://simon-brooke.github.io/html-to-md/). In
+particular, please read the
+[introduction](https://simon-brooke.github.io/html-to-md/intro.html), which
+contains everything you want to know.
 
-The itch I'm trying to scratch at present is to transform
-[Blogger.com](http://www.blogger.com)'s dreadful tag-soup markup into markdown;
-but my architecture for doing this is to build a completely general [HT|SG|X]ML
-transformation framework and then specialise it.
-
-**WARNING:** this is presently alpha-quality code, although it does have fair
-unit test coverage.
-
-## Usage
-
-To use this library in your project, add the following leiningen dependency:
-
-    [org.clojars.simon_brooke/html-to-md "0.3.0"]
-
-To use it in your namespace, require:
-
-    [html-to-md.core :refer [html-to-md]]
-
-For default usage, that's all you need. To play more sophisticated tricks,
-consider:
-
-    [html-to-md.transformer :refer [transform process]]
-    [html-to-md.html-to-md :refer [markdown-dispatcher]]
-
-The intended usage is as follows:
-
-```clojure
-(require '[html-to-md.core :refer [html-to-md]])
-
-(html-to-md url output-file)
-```
-
-This will read (X)HTML from `url` and write Markdown to `output-file`. If
-`output-file` is not supplied, it will return the markdown as a string:
-
-```clojure
-(require '[html-to-md.core :refer [html-to-md]])
-
-(def md (html-to-md url))
-```
-
-If you are specifically scraping [blogger.com](https://www.blogger.com/")
-pages, you may *try* the following recipe:
-
-```clojure
-(require '[html-to-md.core :refer [blogger-to-md]])
-
-(blogger-to-md url output-file)
-```
-
-It works for my blogger pages. However, I'm not sure to what extent the
-skinning of blogger pages is pure CSS (in which case my recipe should work
-for yours) and to what extent it's HTML templating (in which case it
-probably won't). Results not guaranteed, if it doesn't work you get to
-keep all the pieces.
-
-## Extending the transformer
-
-In principle, the transformer can transform any [HT|SG|X]ML markup into any
-other, or into any textual form. To extend it to do something other than
-markdown, supply a **dispatcher**. A dispatcher is essentially a function of one
-argument, a [HT|SG|X]ML tag represented as a Clojure keyword, which returns
-a **processor,** which should be a function of two arguments, an element assumed
-to have that tag, and a dispatcher. The processor should return the value that
-you want elements of that tag transformed into.
-
-Obviously it is convenient to write dispatchers as maps, but it isn't required
-that you do so: anything which, given a keyword, will return a processor, will
-work.
-
-## License
-
-Copyright © 2019 Simon Brooke
-
-Distributed under the Eclipse Public License either version 1.0 or (at
-your option) any later version.
diff --git a/project.clj b/project.clj
index 578d571..797c09d 100644
--- a/project.clj
+++ b/project.clj
@@ -1,4 +1,4 @@
-(defproject html-to-md "0.3.0"
+(defproject html-to-md "0.4.0-SNAPSHOT"
   :description "Convert (Enlivened) HTML to markdown; but, more generally, a framework for [HT|SG|X]ML transformation."
   :url "https://github.com/simon-brooke/html-to-md"
   :license {:name "Eclipse Public License"
diff --git a/src/html_to_md/transformer.clj b/src/html_to_md/transformer.clj
index 5933b3c..445aba5 100644
--- a/src/html_to_md/transformer.clj
+++ b/src/html_to_md/transformer.clj
@@ -93,6 +93,4 @@
       (if url (transform url dispatcher)
         ;; otherwise, if s is not a URL, consider it as an HTML fragment,
         ;; parse and process it
-        (process (tagsoup/parser (java.io.StringReader s)) dispatcher)
-        )))
-
+        (process (tagsoup/parser (java.io.StringReader. s)) dispatcher))))
diff --git a/test/html_to_md/transformer_test.clj b/test/html_to_md/transformer_test.clj
new file mode 100644
index 0000000..48369a4
--- /dev/null
+++ b/test/html_to_md/transformer_test.clj
@@ -0,0 +1,10 @@
+(ns html-to-md.transformer-test
+  (:require
+   [clojure.test :as t :refer [deftest is testing]]
+   [html-to-md.html-to-md :refer [markdown-dispatcher]]
+   [html-to-md.transformer :refer [transform]]))
+
+(deftest transform-payload
+  (testing "String `obj` for: 3. A string representation of an (X)HTML fragment;"
+    (is (= '("\n# This is a header\n")
+           (transform "<h1>This is a header</h1>" markdown-dispatcher)))))