Compare commits
9 commits
| SHA1 |
|---|
| 7f2ccd2d29 |
| 08af659366 |
| ac80507b5f |
| 44b28902db |
| 4aa6bf978f |
| ebd6230bdb |
| f69fb619cb |
| 10d8574ace |
| 916c5d4f36 |
81
README.md
@@ -4,82 +4,9 @@ A Clojure library designed to convert
([Enlive](https://github.com/cgrand/enlive)ned) HTML to markdown; but, more
generally, a framework for [HT|SG|X]ML transformation.

## Introduction
[Documentation is here](https://simon-brooke.github.io/html-to-md/). In
particular, please read the
[introduction](https://simon-brooke.github.io/html-to-md/intro.html), which
contains everything you want to know.

The itch I'm trying to scratch at present is to transform
[Blogger.com](http://www.blogger.com)'s dreadful tag-soup markup into markdown;
but my architecture for doing this is to build a completely general [HT|SG|X]ML
transformation framework and then specialise it.

**WARNING:** this is presently alpha-quality code, although it does have fair
unit test coverage.

## Usage

To use this library in your project, add the following leiningen dependency:

[org.clojars.simon_brooke/html-to-md "0.3.0"]

To use it in your namespace, require:

[html-to-md.core :refer [html-to-md]]

For default usage, that's all you need. To play more sophisticated tricks,
consider:

[html-to-md.transformer :refer [transform process]]
[html-to-md.html-to-md :refer [markdown-dispatcher]]

The intended usage is as follows:

```clojure
(require '[html-to-md.core :refer [html-to-md]])

(html-to-md url output-file)
```

This will read (X)HTML from `url` and write Markdown to `output-file`. If
`output-file` is not supplied, it will return the markdown as a string:

```clojure
(require '[html-to-md.core :refer [html-to-md]])

(def md (html-to-md url))
```

If you are specifically scraping [blogger.com](https://www.blogger.com/")
pages, you may *try* the following recipe:

```clojure
(require '[html-to-md.core :refer [blogger-to-md]])

(blogger-to-md url output-file)
```

It works for my blogger pages. However, I'm not sure to what extent the
skinning of blogger pages is pure CSS (in which case my recipe should work
for yours) and to what extent it's HTML templating (in which case it
probably won't). Results not guaranteed, if it doesn't work you get to
keep all the pieces.

## Extending the transformer

In principle, the transformer can transform any [HT|SG|X]ML markup into any
other, or into any textual form. To extend it to do something other than
markdown, supply a **dispatcher**. A dispatcher is essentially a function of one
argument, a [HT|SG|X]ML tag represented as a Clojure keyword, which returns
a **processor,** which should be a function of two arguments, an element assumed
to have that tag, and a dispatcher. The processor should return the value that
you want elements of that tag transformed into.

Obviously it is convenient to write dispatchers as maps, but it isn't required
that you do so: anything which, given a keyword, will return a processor, will
work.
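
The dispatcher/processor contract described above can be illustrated with a small, self-contained sketch. It does not use html-to-md itself: `render`, `render-children`, `shout`, `paragraph` and `plain-text-dispatcher` are hypothetical names invented for this example, and it assumes Enlive-style elements, that is, maps with `:tag` and `:content` keys whose children are either strings or further elements.

```clojure
(require '[clojure.string :as str])

(declare render)

;; Hypothetical helper: render every child of an element and join the results.
(defn render-children [element dispatcher]
  (str/join (map #(render % dispatcher) (:content element))))

;; Processors: each is a function of an element and a dispatcher.
(defn shout [element dispatcher]
  (str/upper-case (render-children element dispatcher)))

(defn paragraph [element dispatcher]
  (str (render-children element dispatcher) "\n\n"))

;; A dispatcher written as a map from tag keyword to processor.
;; Maps are functions of their keys, so (plain-text-dispatcher :p) works.
(def plain-text-dispatcher
  {:p      paragraph
   :strong shout})

;; Hypothetical stand-in for the library's walker (an assumption, not its API):
;; strings pass through; elements go to whatever processor the dispatcher
;; returns for their tag, falling back to just rendering the children.
(defn render [node dispatcher]
  (if (string? node)
    node
    (let [processor (or (dispatcher (:tag node)) render-children)]
      (processor node dispatcher))))

(render {:tag :p
         :content ["Hello, " {:tag :strong :content ["world"]}]}
        plain-text-dispatcher)
;; => "Hello, WORLD\n\n"
```

Because the walker only ever calls the dispatcher with a keyword, a plain function such as `(fn [tag] (get plain-text-dispatcher tag render-children))` could be passed in place of the map.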

## License

Copyright © 2019 Simon Brooke <simon@journeyman.cc>

Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.

@@ -1,4 +1,4 @@
-(defproject html-to-md "0.3.0"
+(defproject html-to-md "0.4.0-SNAPSHOT"
   :description "Convert (Enlivened) HTML to markdown; but, more generally, a framework for [HT|SG|X]ML transformation."
   :url "https://github.com/simon-brooke/html-to-md"
   :license {:name "Eclipse Public License"

@@ -93,6 +93,4 @@
 (if url (transform url dispatcher)
 ;; otherwise, if s is not a URL, consider it as an HTML fragment,
 ;; parse and process it
-(process (tagsoup/parser (java.io.StringReader s)) dispatcher)
-)))
-
+(process (tagsoup/parser (java.io.StringReader. s)) dispatcher))))

10
test/html_to_md/transformer_test.clj
Normal file
@@ -0,0 +1,10 @@
+(ns html-to-md.transformer-test
+  (:require
+   [clojure.test :as t :refer [deftest is testing]]
+   [html-to-md.html-to-md :refer [markdown-dispatcher]]
+   [html-to-md.transformer :refer [transform]]))
+
+(deftest transform-payload
+  (testing "String `obj` for: 3. A string representation of an (X)HTML fragment;"
+    (is (= '("\n# This is a header\n")
+           (transform "<h1>This is a header</h1>" markdown-dispatcher)))))