Compare commits
4 commits
| Author | SHA1 | Date |
|---|---|---|
| | cb99663861 | |
| | b00a4e4890 | |
| | 97351feafe | |
| | 30d0cbfeca | |
81
README.md
@@ -4,9 +4,82 @@ A Clojure library designed to convert
([Enlive](https://github.com/cgrand/enlive)ned) HTML to markdown; but, more
generally, a framework for [HT|SG|X]ML transformation.

[Documentation is here](https://simon-brooke.github.io/html-to-md/). In
particular, please read the
[introduction](https://simon-brooke.github.io/html-to-md/intro.html), which
contains everything you want to know.
## Introduction

The itch I'm trying to scratch at present is to transform
[Blogger.com](http://www.blogger.com)'s dreadful tag-soup markup into markdown;
but my architecture for doing this is to build a completely general [HT|SG|X]ML
transformation framework and then specialise it.

**WARNING:** this is presently alpha-quality code, although it does have fair
unit test coverage.

## Usage

To use this library in your project, add the following Leiningen dependency:

    [org.clojars.simon_brooke/html-to-md "0.3.0"]

To use it in your namespace, require:

    [html-to-md.core :refer [html-to-md]]

For default usage, that's all you need. To play more sophisticated tricks,
consider:

    [html-to-md.transformer :refer [transform process]]
    [html-to-md.html-to-md :refer [markdown-dispatcher]]

The intended usage is as follows:

```clojure
(require '[html-to-md.core :refer [html-to-md]])

(html-to-md url output-file)
```

This will read (X)HTML from `url` and write Markdown to `output-file`. If
`output-file` is not supplied, it will return the markdown as a string:

```clojure
(require '[html-to-md.core :refer [html-to-md]])

(def md (html-to-md url))
```

If you are specifically scraping [blogger.com](https://www.blogger.com/)
pages, you may *try* the following recipe:

```clojure
(require '[html-to-md.core :refer [blogger-to-md]])

(blogger-to-md url output-file)
```

It works for my blogger pages. However, I'm not sure to what extent the
skinning of blogger pages is pure CSS (in which case my recipe should work
for yours) and to what extent it's HTML templating (in which case it
probably won't). Results are not guaranteed; if it doesn't work, you get to
keep all the pieces.

## Extending the transformer

In principle, the transformer can transform any [HT|SG|X]ML markup into any
other, or into any textual form. To extend it to do something other than
markdown, supply a **dispatcher**. A dispatcher is essentially a function of one
argument, a [HT|SG|X]ML tag represented as a Clojure keyword, which returns
a **processor**. A processor should be a function of two arguments: an element
assumed to have that tag, and a dispatcher. The processor should return the
value that you want elements of that tag transformed into.

Obviously it is convenient to write dispatchers as maps, but it isn't required
that you do so: anything which, given a keyword, will return a processor, will
work.
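
As a rough illustration of the dispatcher/processor contract described above,
here is a minimal, self-contained sketch. It does not use html-to-md's own
engine: `process-node`, `process-children`, `shout-dispatcher`, `fn-dispatcher`
and `sample` are hypothetical names written for this example, and the element
shape (`:tag`, `:attrs`, `:content`) is assumed to follow Enlive's parsed-node
maps.

```clojure
(require '[clojure.string :as string])

(declare process-node)

(defn process-children
  "Process each child of element `e` with dispatcher `d`; join the results."
  [e d]
  (string/join (map #(process-node % d) (:content e))))

(defn process-node
  "Hypothetical stand-in for the library's engine. Strings pass through
  unchanged; an element is handed to the processor which the dispatcher
  returns for its tag. If there is no processor for the tag, the element's
  processed children are returned instead."
  [node d]
  (cond
    (string? node) node
    (map? node) (if-let [processor (d (:tag node))]
                  (processor node d)
                  (process-children node d))
    :else ""))

;; A dispatcher written as a map from tag keyword to processor.
(def shout-dispatcher
  {:h1 (fn [e d] (str "\n# " (string/upper-case (process-children e d)) "\n"))
   :em (fn [e d] (str "*" (process-children e d) "*"))})

;; A dispatcher need not be a map: any function from keyword to processor
;; (or nil) satisfies the same contract.
(defn fn-dispatcher
  [tag]
  (when (= tag :em)
    (fn [e d] (str "_" (process-children e d) "_"))))

;; An assumed Enlive-style element for <h1>Hello <em>world</em></h1>.
(def sample
  {:tag :h1
   :attrs nil
   :content ["Hello " {:tag :em :attrs nil :content ["world"]}]})

(process-node sample shout-dispatcher) ; => "\n# HELLO *WORLD*\n"
(process-node sample fn-dispatcher)    ; => "Hello _world_"
```

Because each processor receives the dispatcher as its second argument, it can
recurse into child elements with the same rules, or hand them to a different
dispatcher if a subtree needs special treatment.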

## License

Copyright © 2019 Simon Brooke <simon@journeyman.cc>

Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.
@@ -1,3 +1,3 @@
<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>html-to-md.blogger-to-md documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch current"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-blogger-dispatcher"><div class="inner"><span>blogger-dispatcher</span></div></a></li><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-blogger-scraper"><div class="inner"><span>blogger-scraper</span></div></a></li><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-image-table-processor"><div class="inner"><span>image-table-processor</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.blogger-to-md</h1><div class="doc"><div class="markdown"><p>Convert blogger posts to Markdown format, omitting all the Blogger chrome and navigation.</p></div></div><div class="public anchor" id="var-blogger-dispatcher"><h3>blogger-dispatcher</h3><div class="usage"></div><div class="doc"><div class="markdown"><p>Adaptation of <code>markdown-dispatcher</code>, q.v., with the <code>:table</code> and <code>:html</code> dispatches overridden.</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L38">view source</a></div></div><div class="public anchor" id="var-blogger-scraper"><h3>blogger-scraper</h3><div class="usage"><code>(blogger-scraper e d)</code></div><div class="doc"><div class="markdown"><p>Processor which scrapes the actual post content out of a blogger page. 
<em>NOTE:</em> This was written to scrape <em>my</em> blogger pages, yours may be different!</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L9">view source</a></div></div><div class="public anchor" id="var-image-table-processor"><h3>image-table-processor</h3><div class="usage"><code>(image-table-processor e d)</code></div><div class="doc"><div class="markdown"><p>Blogger’s horrible tag soup wraps images in tables. Is this table such a table? If so extract the image from it and process it to markdown; otherwise, fall back on what <code>markdown-dispatcher</code> would do with the table (which is currently nothing, but that will change).</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L23">view source</a></div></div></div></body></html>
<html><head><meta charset="UTF-8" /><title>html-to-md.blogger-to-md documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch current"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-blogger-dispatcher"><div class="inner"><span>blogger-dispatcher</span></div></a></li><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-blogger-scraper"><div class="inner"><span>blogger-scraper</span></div></a></li><li class="depth-1"><a href="html-to-md.blogger-to-md.html#var-image-table-processor"><div class="inner"><span>image-table-processor</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.blogger-to-md</h1><div class="doc"><div class="markdown"><p>Convert blogger posts to Markdown format, omitting all the Blogger chrome and navigation.</p></div></div><div class="public anchor" id="var-blogger-dispatcher"><h3>blogger-dispatcher</h3><div class="usage"></div><div class="doc"><div class="markdown"><p>Adaptation of <code>markdown-dispatcher</code>, q.v., with the <code>:table</code> and <code>:html</code> dispatches overridden.</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L38">view source</a></div></div><div class="public anchor" id="var-blogger-scraper"><h3>blogger-scraper</h3><div class="usage"><code>(blogger-scraper e d)</code></div><div class="doc"><div class="markdown"><p>Processor which scrapes the actual post content out of a blogger page. 
<em>NOTE:</em> This was written to scrape <em>my</em> blogger pages, yours may be different!</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L9">view source</a></div></div><div class="public anchor" id="var-image-table-processor"><h3>image-table-processor</h3><div class="usage"><code>(image-table-processor e d)</code></div><div class="doc"><div class="markdown"><p>Blogger’s horrible tag soup wraps images in tables. Is this table such a table? If so extract the image from it and process it to markdown; otherwise, fall back on what <code>markdown-dispatcher</code> would do with the table (which is currently nothing, but that will change).</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/blogger_to_md.clj#L23">view source</a></div></div></div></body></html>
@@ -1,3 +1,3 @@
<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>html-to-md.core documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch current"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.core.html#var-blogger-to-md"><div class="inner"><span>blogger-to-md</span></div></a></li><li class="depth-1"><a href="html-to-md.core.html#var-html-to-md"><div class="inner"><span>html-to-md</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.core</h1><div class="doc"><div class="markdown"><p>Top level functions intended for very simple use.</p></div></div><div class="public anchor" id="var-blogger-to-md"><h3>blogger-to-md</h3><div class="usage"><code>(blogger-to-md url)</code><code>(blogger-to-md url output)</code></div><div class="doc"><div class="markdown"><p>Transform the Blogger post referenced by <code>url</code> into Markdown, and write it to <code>output</code>, if supplied. <em>NOTE:</em> This was written to scrape <em>my</em> blogger pages, yours may be different!</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/core.clj#L15">view source</a></div></div><div class="public anchor" id="var-html-to-md"><h3>html-to-md</h3><div class="usage"><code>(html-to-md url)</code><code>(html-to-md url output)</code></div><div class="doc"><div class="markdown"><p>Transform the HTML document referenced by <code>url</code> into Markdown, and write it to <code>output</code>, if supplied.</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/core.clj#L7">view source</a></div></div></div></body></html>
<html><head><meta charset="UTF-8" /><title>html-to-md.core documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch current"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.core.html#var-blogger-to-md"><div class="inner"><span>blogger-to-md</span></div></a></li><li class="depth-1"><a href="html-to-md.core.html#var-html-to-md"><div class="inner"><span>html-to-md</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.core</h1><div class="doc"><div class="markdown"><p>Top level functions intended for very simple use.</p></div></div><div class="public anchor" id="var-blogger-to-md"><h3>blogger-to-md</h3><div class="usage"><code>(blogger-to-md url)</code><code>(blogger-to-md url output)</code></div><div class="doc"><div class="markdown"><p>Transform the Blogger post referenced by <code>url</code> into Markdown, and write it to <code>output</code>, if supplied. <em>NOTE:</em> This was written to scrape <em>my</em> blogger pages, yours may be different!</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/core.clj#L15">view source</a></div></div><div class="public anchor" id="var-html-to-md"><h3>html-to-md</h3><div class="usage"><code>(html-to-md url)</code><code>(html-to-md url output)</code></div><div class="doc"><div class="markdown"><p>Transform the HTML document referenced by <code>url</code> into Markdown, and write it to <code>output</code>, if supplied.</p></div></div><div class="src-link"><a href="https://github.com/simon-brooke/html-to-md/blob/master/src/html_to_md/core.clj#L7">view source</a></div></div></div></body></html>
File diff suppressed because one or more lines are too long
@@ -1,6 +1,6 @@
<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>html-to-md.transformer documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2 current"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.transformer.html#var-process"><div class="inner"><span>process</span></div></a></li><li class="depth-1"><a href="html-to-md.transformer.html#var-transform"><div class="inner"><span>transform</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.transformer</h1><div class="doc"><div class="markdown"><p>The actual transformation engine, which is actually far more general than just something to generate <a href="https://daringfireball.net/projects/markdown/">Markdown</a>. It isn’t as general as <a href="https://www.w3.org/standards/xml/transformation">XSL-T</a> but can nevertheless do a great deal of transformation on [HT|SG|X]ML documents.</p>
<html><head><meta charset="UTF-8" /><title>html-to-md.transformer documentation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2 current"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="html-to-md.transformer.html#var-process"><div class="inner"><span>process</span></div></a></li><li class="depth-1"><a href="html-to-md.transformer.html#var-transform"><div class="inner"><span>transform</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">html-to-md.transformer</h1><div class="doc"><div class="markdown"><p>The actual transformation engine, which is actually far more general than just something to generate <a href="https://daringfireball.net/projects/markdown/">Markdown</a>. It isn’t as general as <a href="https://www.w3.org/standards/xml/transformation">XSL-T</a> but can nevertheless do a great deal of transformation on [HT|SG|X]ML documents.</p>
<h2><a href="#terminology" name="terminology"></a>Terminology</h2>
<p>In this documentation the following terminology is used:</p>
<ul>
File diff suppressed because one or more lines are too long
@@ -1,11 +1,11 @@
<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>Introduction to html-to-md</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 current"><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="document" id="content"><div class="doc"><div class="markdown"><h1><a href="#introduction-to-html-to-md" name="introduction-to-html-to-md"></a>Introduction to html-to-md</h1>
<html><head><meta charset="UTF-8" /><title>Introduction to html-to-md</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 current"><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="document" id="content"><div class="doc"><div class="markdown"><h1><a href="#introduction-to-html-to-md" name="introduction-to-html-to-md"></a>Introduction to html-to-md</h1>
<p>The itch I’m trying to scratch at present is to transform <a href="http://www.blogger.com">Blogger.com</a>’s dreadful tag-soup markup into markdown; but my architecture for doing this is to build a completely general [HT|SG|X]ML transformation framework and then specialise it.</p>
<p><strong>WARNING:</strong> this is presently alpha-quality code, although it does have fair unit test coverage.</p>
<h2><a href="#usage" name="usage"></a>Usage</h2>
<p>To use this library in your project, add the following leiningen dependency:</p>
<pre><code>[org.clojars.simon_brooke/html-to-md "0.2.0"]
<pre><code>[org.clojars.simon_brooke/html-to-md "0.3.0"]
</code></pre>
<p>To use it in your namespace, require:</p>
<pre><code>[html-to-md.core :refer [html-to-md]]
@@ -1,4 +1,4 @@
(defproject html-to-md "0.4.0-SNAPSHOT"
(defproject html-to-md "0.3.0"
  :description "Convert (Enlivened) HTML to markdown; but, more generally, a framework for [HT|SG|X]ML transformation."
  :url "https://github.com/simon-brooke/html-to-md"
  :license {:name "Eclipse Public License"
@@ -93,4 +93,6 @@
(if url (transform url dispatcher)
;; otherwise, if s is not a URL, consider it as an HTML fragment,
;; parse and process it
(process (tagsoup/parser (java.io.StringReader. s)) dispatcher))))
(process (tagsoup/parser (java.io.StringReader s)) dispatcher)
)))

@@ -1,10 +0,0 @@
(ns html-to-md.transformer-test
  (:require
    [clojure.test :as t :refer [deftest is testing]]
    [html-to-md.html-to-md :refer [markdown-dispatcher]]
    [html-to-md.transformer :refer [transform]]))

(deftest transform-payload
  (testing "String `obj` for: 3. A string representation of an (X)HTML fragment;"
    (is (= '("\n# This is a header\n")
           (transform "<h1>This is a header</h1>" markdown-dispatcher)))))