Compare commits

4 commits

10 changed files with 88 additions and 23 deletions

@ -4,9 +4,82 @@ A Clojure library designed to convert
([Enlive](https://github.com/cgrand/enlive)ned) HTML to markdown; but, more
generally, a framework for [HT|SG|X]ML transformation.

[Documentation is here](https://simon-brooke.github.io/html-to-md/). In
particular, please read the
[introduction](https://simon-brooke.github.io/html-to-md/intro.html), which
contains everything you want to know.

## Introduction

The itch I'm trying to scratch at present is to transform
[Blogger.com](http://www.blogger.com)'s dreadful tag-soup markup into markdown;
but my architecture for doing this is to build a completely general [HT|SG|X]ML
transformation framework and then specialise it.

**WARNING:** this is presently alpha-quality code, although it does have fair
unit test coverage.

## Usage

To use this library in your project, add the following Leiningen dependency:

    [org.clojars.simon_brooke/html-to-md "0.3.0"]

To use it in your namespace, require:

    [html-to-md.core :refer [html-to-md]]

For default usage, that's all you need. To play more sophisticated tricks,
consider:

    [html-to-md.transformer :refer [transform process]]
    [html-to-md.html-to-md :refer [markdown-dispatcher]]

The intended usage is as follows:
```clojure
(require '[html-to-md.core :refer [html-to-md]])
(html-to-md url output-file)
```
This will read (X)HTML from `url` and write Markdown to `output-file`. If
`output-file` is not supplied, it will return the markdown as a string:
```clojure
(require '[html-to-md.core :refer [html-to-md]])
(def md (html-to-md url))
```
If you are specifically scraping [blogger.com](https://www.blogger.com/)
pages, you may *try* the following recipe:
```clojure
(require '[html-to-md.core :refer [blogger-to-md]])
(blogger-to-md url output-file)
```
It works for my Blogger pages. However, I'm not sure to what extent the
skinning of Blogger pages is pure CSS (in which case my recipe should work
for yours) and to what extent it's HTML templating (in which case it
probably won't). Results are not guaranteed; if it doesn't work, you get to
keep all the pieces.

## Extending the transformer

In principle, the transformer can transform any [HT|SG|X]ML markup into any
other markup, or into any textual form. To make it produce something other
than Markdown, supply a **dispatcher**. A dispatcher is essentially a function
of one argument, an [HT|SG|X]ML tag represented as a Clojure keyword, which
returns a **processor**. A processor should be a function of two arguments: an
element assumed to have that tag, and a dispatcher. It should return the value
that you want elements with that tag transformed into.
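
The contract can be sketched without the library at all. In the sketch below, `toy-process` is a hypothetical stand-in for `html-to-md.transformer/process` (it is not the library's implementation), elements are assumed to be Enlive-style maps with `:tag`, `:attrs` and `:content` keys, and text nodes are assumed to be plain strings. Each processor takes `(e d)`, the same two-argument shape as documented processors such as `image-table-processor`; the dispatcher is an ordinary map, since a Clojure map is itself a function of its keys.

```clojure
(ns example.toy-transform
  (:require [clojure.string :as str]))

;; HYPOTHETICAL stand-in for html-to-md.transformer/process, written only to
;; exercise the dispatcher/processor contract; it is not the library's code.
(defn toy-process
  "Transform `obj` (an Enlive-style element map, a string, or a sequence of
  these) using `dispatcher`, concatenating the results into one string."
  [obj dispatcher]
  (cond
    (string? obj)     obj
    (map? obj)        (if-let [processor (dispatcher (:tag obj))]
                        ;; a processor receives the element and the dispatcher
                        (processor obj dispatcher)
                        ;; no processor for this tag: just process its children
                        (toy-process (:content obj) dispatcher))
    (sequential? obj) (apply str (map #(toy-process % dispatcher) obj))
    :else             ""))

(defn- children
  "Process the content of element `e` with dispatcher `d`."
  [e d]
  (toy-process (:content e) d))

;; A dispatcher written as a map from tag keywords to processors.
(def plain-text-dispatcher
  {:h1 (fn [e d] (str (str/upper-case (children e d)) "\n"))
   :p  (fn [e d] (str (children e d) "\n"))
   :em (fn [e d] (str "_" (children e d) "_"))})

(def sample
  {:tag :div :attrs {}
   :content [{:tag :h1 :attrs {} :content ["Hello"]}
             {:tag :p :attrs {}
              :content ["It " {:tag :em :attrs {} :content ["works"]} "."]}]})

(toy-process sample plain-text-dispatcher)
;; => "HELLO\nIt _works_.\n"
```

In this toy, tags with no entry in the map (here `:div`) simply have their children processed; whether the real `process` falls back in the same way is not something this README states.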

Obviously it is convenient to write dispatchers as maps, but you don't have
to: anything that returns a processor when given a keyword will work.
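
A dispatcher can, for example, be a plain function. The sketch below is hypothetical (the names `heading-dispatcher`, `heading-processor` and `text-of` are illustrative, not part of the library) and assumes Enlive-style element maps (`:tag`, `:attrs`, `:content`). It routes all six heading tags to one processor, which a map could only express with six entries; the `"\n## ...\n"` output format is an assumption chosen for illustration.

```clojure
(ns example.heading-dispatcher)

(defn- text-of
  "HYPOTHETICAL helper: concatenate the string children of an Enlive-style
  element, ignoring any nested elements."
  [e]
  (apply str (filter string? (:content e))))

(defn heading-processor
  "Processor for :h1 to :h6 elements: emits a Markdown ATX heading whose
  level is the digit at the end of the tag name."
  [e _dispatcher]
  (let [level (Character/digit (last (name (:tag e))) 10)]
    (str "\n" (apply str (repeat level "#")) " " (text-of e) "\n")))

(defn heading-dispatcher
  "A dispatcher written as a plain function rather than a map: each tag from
  :h1 to :h6 yields heading-processor; any other tag yields nil."
  [tag]
  (when (and (keyword? tag) (re-matches #"h[1-6]" (name tag)))
    heading-processor))

;; Look up the processor for :h2, then call it with the element and dispatcher.
((heading-dispatcher :h2) {:tag :h2 :attrs {} :content ["Usage"]} heading-dispatcher)
;; => "\n## Usage\n"
```

The same idea extends to fallbacks: `(fn [tag] (or (overrides tag) (base tag)))`, with `overrides` and `base` as hypothetical dispatchers, consults one dispatcher before the other.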
## License
Copyright © 2019 Simon Brooke <simon@journeyman.cc>
Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.

@ -1,3 +1,3 @@
<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>html-to-md.blogger-to-md documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch current"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-blogger-dispatcher"><div class="inner"><span>blogger-dispatcher</span></div></a></li><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-blogger-scraper"><div class="inner"><span>blogger-scraper</span></div></a></li><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-image-table-processor"><div class="inner"><span>image-table-processor</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.blogger-to-md</h1><div class="doc"><div class="markdown"><p>Convert blogger posts to Markdown format, omitting all the Blogger chrome and navigation.</p></div></div><div class="public anchor" id="var-blogger-dispatcher"><h3>blogger-dispatcher</h3><div class="usage"></div><div class="doc"><div class="markdown"><p>Adaptation of <code>markdown-dispatcher</code>, q.v., with the <code>:table</code> and <code>:html</code> dispatches overridden.</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L38">view source</a></div></div><div class="public anchor" id="var-blogger-scraper"><h3>blogger-scraper</h3><div class="usage"><code>(blogger-scraper e d)</code></div><div class="doc"><div class="markdown"><p>Processor which scrapes the actual post content out of a blogger page. 
<em>NOTE:</em> This was written to scrape <em>my</em> blogger pages, yours may be different!</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L9">view source</a></div></div><div class="public anchor" id="var-image-table-processor"><h3>image-table-processor</h3><div class="usage"><code>(image-table-processor e d)</code></div><div class="doc"><div class="markdown"><p>Bloggers horrible tag soup wraps images in tables. Is this table such a table? If so extract the image from it and process it to markdown; otherwise, fall back on what <code>markdown-dispatcher</code> would do with the table (which is currently nothing, but that will change).</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L23">view source</a></div></div></div></body></html> <html><head><meta charset="UTF-8" /><title>html-to-md.blogger-to-md documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to 
html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch current"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-blogger-dispatcher"><div class="inner"><span>blogger-dispatcher</span></div></a></li><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-blogger-scraper"><div class="inner"><span>blogger-scraper</span></div></a></li><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-image-table-processor"><div class="inner"><span>image-table-processor</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.blogger-to-md</h1><div class="doc"><div class="markdown"><p>Convert blogger posts to Markdown format, omitting all the Blogger chrome and navigation.</p></div></div><div class="public anchor" id="var-blogger-dispatcher"><h3>blogger-dispatcher</h3><div 
class="usage"></div><div class="doc"><div class="markdown"><p>Adaptation of <code>markdown-dispatcher</code>, q.v., with the <code>:table</code> and <code>:html</code> dispatches overridden.</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L38">view source</a></div></div><div class="public anchor" id="var-blogger-scraper"><h3>blogger-scraper</h3><div class="usage"><code>(blogger-scraper e d)</code></div><div class="doc"><div class="markdown"><p>Processor which scrapes the actual post content out of a blogger page. <em>NOTE:</em> This was written to scrape <em>my</em> blogger pages, yours may be different!</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L9">view source</a></div></div><div class="public anchor" id="var-image-table-processor"><h3>image-table-processor</h3><div class="usage"><code>(image-table-processor e d)</code></div><div class="doc"><div class="markdown"><p>Bloggers horrible tag soup wraps images in tables. Is this table such a table? If so extract the image from it and process it to markdown; otherwise, fall back on what <code>markdown-dispatcher</code> would do with the table (which is currently nothing, but that will change).</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L23">view source</a></div></div></div></body></html>

@ -1,3 +1,3 @@
<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>html-to-md.core documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch current"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.core.html#var-blogger-to-md"><div class="inner"><span>blogger-to-md</span></div></a></li><li class="depth-1"><a href="html-to-md.core.html#var-html-to-md"><div class="inner"><span>html-to-md</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.core</h1><div class="doc"><div class="markdown"><p>Top level functions intended for very simple use.</p></div></div><div class="public anchor" id="var-blogger-to-md"><h3>blogger-to-md</h3><div class="usage"><code>(blogger-to-md url)</code><code>(blogger-to-md url output)</code></div><div class="doc"><div class="markdown"><p>Transform the Blogger post referenced by <code>url</code> into Markdown, and write it to <code>output</code>, if supplied. 
<em>NOTE:</em> This was written to scrape <em>my</em> blogger pages, yours may be different!</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/core.clj#L15">view source</a></div></div><div class="public anchor" id="var-html-to-md"><h3>html-to-md</h3><div class="usage"><code>(html-to-md url)</code><code>(html-to-md url output)</code></div><div class="doc"><div class="markdown"><p>Transform the HTML document referenced by <code>url</code> into Markdown, and write it to <code>output</code>, if supplied.</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/core.clj#L7">view source</a></div></div></div></body></html> <html><head><meta charset="UTF-8" /><title>html-to-md.core documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span 
class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch current"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.core.html#var-blogger-to-md"><div class="inner"><span>blogger-to-md</span></div></a></li><li class="depth-1"><a href="html-to-md.core.html#var-html-to-md"><div class="inner"><span>html-to-md</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.core</h1><div class="doc"><div class="markdown"><p>Top level functions intended for very simple use.</p></div></div><div class="public anchor" id="var-blogger-to-md"><h3>blogger-to-md</h3><div class="usage"><code>(blogger-to-md url)</code><code>(blogger-to-md url output)</code></div><div class="doc"><div class="markdown"><p>Transform the Blogger post referenced by <code>url</code> into Markdown, and write it to <code>output</code>, if supplied. 
<em>NOTE:</em> This was written to scrape <em>my</em> blogger pages, yours may be different!</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/core.clj#L15">view source</a></div></div><div class="public anchor" id="var-html-to-md"><h3>html-to-md</h3><div class="usage"><code>(html-to-md url)</code><code>(html-to-md url output)</code></div><div class="doc"><div class="markdown"><p>Transform the HTML document referenced by <code>url</code> into Markdown, and write it to <code>output</code>, if supplied.</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/core.clj#L7">view source</a></div></div></div></body></html>

File diff suppressed because one or more lines are too long

@ -1,6 +1,6 @@
<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>html-to-md.transformer documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2 current"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.transformer.html#var-process"><div class="inner"><span>process</span></div></a></li><li class="depth-1"><a href="html-to-md.transformer.html#var-transform"><div class="inner"><span>transform</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.transformer</h1><div class="doc"><div class="markdown"><p>The actual transformation engine, which is actually far more general than just something to generate <a href="https://daringfireball.net/projects/markdown/">Markdown</a>. It isnt as general as <a href="https://www.w3.org/standards/xml/transformation">XSL-T</a> but can nevertheless do a great deal of transformation on [HT|SG|X]ML documents.</p> <html><head><meta charset="UTF-8" /><title>html-to-md.transformer documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span 
class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2 current"><a href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.transformer.html#var-process"><div class="inner"><span>process</span></div></a></li><li class="depth-1"><a href="html-to-md.transformer.html#var-transform"><div class="inner"><span>transform</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.transformer</h1><div class="doc"><div class="markdown"><p>The actual transformation engine, which is actually far more general than just something to generate <a href="https://daringfireball.net/projects/markdown/">Markdown</a>. 
It isn't as general as <a href="https://www.w3.org/standards/xml/transformation">XSL-T</a> but can nevertheless do a great deal of transformation on [HT|SG|X]ML documents.</p>
<h2><a href="#terminology" name="terminology"></a>Terminology</h2>
<p>In this documentation the following terminology is used:</p>
<ul> <ul>

File diff suppressed because one or more lines are too long

@ -1,11 +1,11 @@
<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>Introduction to html-to-md</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 current"><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="document" id="content"><div class="doc"><div class="markdown"><h1><a href="#introduction-to-html-to-md" name="introduction-to-html-to-md"></a>Introduction to html-to-md</h1> <html><head><meta charset="UTF-8" /><title>Introduction to html-to-md</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 current"><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div 
class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="document" id="content"><div class="doc"><div class="markdown"><h1><a href="#introduction-to-html-to-md" name="introduction-to-html-to-md"></a>Introduction to html-to-md</h1>
<p>The itch I'm trying to scratch at present is to transform <a href="http://www.blogger.com">Blogger.com</a>'s dreadful tag-soup markup into markdown; but my architecture for doing this is to build a completely general [HT|SG|X]ML transformation framework and then specialise it.</p>
<p><strong>WARNING:</strong> this is presently alpha-quality code, although it does have fair unit test coverage.</p>
<h2><a href="#usage" name="usage"></a>Usage</h2>
<p>To use this library in your project, add the following Leiningen dependency:</p>
<pre><code>[org.clojars.simon_brooke/html-to-md "0.2.0"]
</code></pre>
<pre><code>[org.clojars.simon_brooke/html-to-md "0.3.0"]
</code></pre>
<p>To use it in your namespace, require:</p>
<pre><code>[html-to-md.core :refer [html-to-md]] <pre><code>[html-to-md.core :refer [html-to-md]]

@ -1,4 +1,4 @@
(defproject html-to-md "0.4.0-SNAPSHOT"
(defproject html-to-md "0.3.0"
:description "Convert (Enlivened) HTML to markdown; but, more generally, a framework for [HT|SG|X]ML transformation."
:url "https://github.com/simon-brooke/html-to-md"
:license {:name "Eclipse Public License" :license {:name "Eclipse Public License"

@ -93,4 +93,6 @@
(if url (transform url dispatcher)
;; otherwise, if s is not a URL, consider it as an HTML fragment,
;; parse and process it
(process (tagsoup/parser (java.io.StringReader. s)) dispatcher))))
(process (tagsoup/parser (java.io.StringReader s)) dispatcher)
)))
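
This hunk does not show how `url` is bound. One plausible approach, offered only as a sketch (the helpers `as-url` and `fragment-reader` are hypothetical, not taken from the library), is to try constructing a `java.net.URL` and treat a `MalformedURLException` as meaning the string is an HTML fragment. Note that Clojure's constructor-call syntax needs the trailing dot, as in `(java.io.StringReader. s)`; the form `(java.io.StringReader s)`, as the new side of this hunk reads here, does not call the constructor.

```clojure
(ns example.url-or-fragment)

(defn as-url
  "HYPOTHETICAL helper: return a java.net.URL if `s` parses as one,
  otherwise nil (as for an HTML fragment such as \"<h1>Hi</h1>\")."
  [^String s]
  (try
    (java.net.URL. s)
    (catch java.net.MalformedURLException _ nil)))

(defn fragment-reader
  "HYPOTHETICAL helper: wrap an HTML fragment string in a java.io.Reader.
  The trailing dot in `java.io.StringReader.` is what makes this a
  constructor call."
  [^String s]
  (java.io.StringReader. s))

;; A URL object is returned for a well-formed URL string (no network access)...
(as-url "https://github.com/simon-brooke/html-to-md")
;; ...and nil for an HTML fragment, which can instead be read as text.
(slurp (fragment-reader "<h1>This is a header</h1>"))
;; => "<h1>This is a header</h1>"
```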

@ -1,10 +0,0 @@
(ns html-to-md.transformer-test
(:require
[clojure.test :as t :refer [deftest is testing]]
[html-to-md.html-to-md :refer [markdown-dispatcher]]
[html-to-md.transformer :refer [transform]]))
(deftest transform-payload
(testing "String `obj` for: 3. A string representation of an (X)HTML fragment;"
(is (= '("\n# This is a header\n")
(transform "<h1>This is a header</h1>" markdown-dispatcher)))))