<!DOCTYPE html PUBLIC "" "">
<html><head><meta charset="UTF-8" /><title>Introduction to html-to-md</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.3.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 current"><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="document" id="content"><div class="doc"><div class="markdown"><h1><a href="#introduction-to-html-to-md" name="introduction-to-html-to-md"></a>Introduction to html-to-md</h1>
<p>The itch I’m trying to scratch at present is to transform <a href="http://www.blogger.com">Blogger.com</a>’s dreadful tag-soup markup into Markdown; but my approach is to build a completely general [HT|SG|X]ML transformation framework and then specialise it for that job.</p>
<p><strong>WARNING:</strong> this is presently alpha-quality code, although it does have fair unit test coverage.</p>
<h2><a href="#usage" name="usage"></a>Usage</h2>
<p>To use this library in your project, add the following Leiningen dependency:</p>
<pre><code>[org.clojars.simon_brooke/html-to-md "0.3.0"]
</code></pre>
<p>To use it in your namespace, add the following to the <code>:require</code> clause of your <code>ns</code> form:</p>
<pre><code>[html-to-md.core :refer [html-to-md]]
</code></pre>
<p>For default usage, that’s all you need. To play more sophisticated tricks, consider also requiring:</p>
<pre><code>[html-to-md.transformer :refer [transform process]]
[html-to-md.html-to-md :refer [markdown-dispatcher]]
</code></pre>
<p>The intended usage is as follows:</p>
<pre><code class="clojure">(require '[html-to-md.core :refer [html-to-md]])

(html-to-md url output-file)
</code></pre>
<p>This will read (X)HTML from <code>url</code> and write Markdown to <code>output-file</code>. If <code>output-file</code> is not supplied, it will instead return the Markdown as a string:</p>
<pre><code class="clojure">(require '[html-to-md.core :refer [html-to-md]])

(def md (html-to-md url))
</code></pre>
<p>If you are specifically scraping <a href="https://www.blogger.com/">Blogger.com</a> pages, you may <em>try</em> the following recipe:</p>
<pre><code class="clojure">(require '[html-to-md.core :refer [blogger-to-md]])

(blogger-to-md url output-file)
</code></pre>
<p>It works for my Blogger pages. However, I’m not sure to what extent the skinning of Blogger pages is pure CSS (in which case my recipe should work for yours) and to what extent it’s HTML templating (in which case it probably won’t). Results are not guaranteed; if it doesn’t work, you get to keep all the pieces.</p>
<h2><a href="#extending-the-transformer" name="extending-the-transformer"></a>Extending the transformer</h2>
<p>In principle, the transformer can transform any [HT|SG|X]ML markup into any other, or into any textual form. To extend it to produce something other than Markdown, supply a <strong>dispatcher</strong>. A dispatcher is essentially a function of one argument, an [HT|SG|X]ML tag represented as a Clojure keyword, which returns a <strong>processor</strong>. A processor should be a function of two arguments: an element assumed to have that tag, and a dispatcher. It should return the value into which you want elements of that tag transformed.</p>
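<p>The dispatcher/processor pattern can be illustrated with a small, self-contained sketch. It assumes that an element is a map with <code>:tag</code>, <code>:attrs</code> and <code>:content</code> keys (<code>markdown-a</code> in this section reads the latter two) and that text nodes are plain strings. The names <code>walk</code>, <code>walk-content</code>, <code>md-p</code>, <code>md-em</code> and <code>toy-dispatcher</code> are hypothetical; this is not the library’s own <code>transform</code> or <code>process</code>.</p>

<pre><code class="clojure">(require '[clojure.string :as s])

;; Hypothetical walker illustrating the pattern; NOT the library's API.
(defn walk
  "Transform `node` using dispatcher `d`. Strings pass through
  unchanged; an element is handed to the processor `d` returns for
  its tag, together with `d` itself."
  [node d]
  (if (string? node)
    node
    (let [processor (d (:tag node))]
      (processor node d))))

(defn walk-content
  "Transform every child of element `e` with dispatcher `d` and
  concatenate the results."
  [e d]
  (apply str (map #(walk % d) (:content e))))

;; Two toy processors, each a function of an element and a dispatcher.
(defn md-em [e d] (str "*" (walk-content e d) "*"))
(defn md-p [e d] (str (s/trim (walk-content e d)) "\n\n"))

;; A map works as a dispatcher because Clojure maps are functions of
;; their keys.
(def toy-dispatcher {:p md-p :em md-em})

(walk {:tag :p :attrs nil
       :content ["Hello, " {:tag :em :attrs nil :content ["world"]} "!"]}
      toy-dispatcher)
;; => "Hello, *world*!\n\n"
</code></pre>

<p>In this sketch, a tag missing from the map makes <code>walk</code> look up <code>nil</code> as its processor, and calling it throws a <code>NullPointerException</code>; handling unknown tags gracefully is one reason to write a dispatcher as a function rather than a bare map.</p>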
<p>Thus the <code>html-to-md.html-to-md</code> namespace comprises a number of <em>processor</em> functions, such as this one:</p>
<pre><code class="clojure">(defn markdown-a
  "Process the anchor element `e` into markdown, using dispatcher `d`."
  [e d]
  (str
    "["
    ;; link text: the anchor's content, processed with `d` and trimmed
    (s/trim (apply str (process (:content e) d)))
    "]("
    ;; link target: the anchor's href attribute
    (-> e :attrs :href)
    ")"))
</code></pre>
<p>and a <em>dispatcher</em> map:</p>
<pre><code class="clojure">(def markdown-dispatcher
  "A despatcher for transforming (X)HTML into Markdown."
  {:a markdown-a
   :b markdown-strong
   :br markdown-br
   :code markdown-code
   :body markdown-default
   :div markdown-div
   :em markdown-em
   :h1 markdown-h1
   :h2 markdown-h2
   :h3 markdown-h3
   :h4 markdown-h4
   :h5 markdown-h5
   :h6 markdown-h6
   :html markdown-html
   :i markdown-em
   :img markdown-img
   :ol markdown-ol
   :p markdown-div
   :pre markdown-pre
   :samp markdown-code
   :script markdown-omit
   :span markdown-default
   :strong markdown-strong
   :style markdown-omit
   :ul markdown-ul})
</code></pre>
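<p>Because <code>markdown-dispatcher</code> is an ordinary map, one way to support a tag it does not cover (there is no <code>:blockquote</code> entry above, for example) is presumably to <code>assoc</code> a new processor onto it. The sketch below shows the idea on a stand-alone toy map so that it runs without the library; <code>render</code>, <code>render-content</code>, <code>md-div</code>, <code>base-dispatcher</code>, <code>markdown-blockquote</code> and <code>extended-dispatcher</code> are all hypothetical.</p>

<pre><code class="clojure">(require '[clojure.string :as s])

;; Hypothetical walker standing in for the library's own machinery:
;; strings pass through; elements go to the processor `d` returns.
(defn render [node d]
  (if (string? node)
    node
    ((d (:tag node)) node d)))

(defn render-content
  "Render every child of element `e` with dispatcher `d`, concatenated."
  [e d]
  (apply str (map #(render % d) (:content e))))

;; Toy processor for block-level elements: content plus a blank line.
(defn md-div [e d] (str (render-content e d) "\n\n"))

(def base-dispatcher {:div md-div :p md-div})

(defn markdown-blockquote
  "Render `e` as a Markdown blockquote: prefix each line with \"> \",
  and each blank line with a bare \">\"."
  [e d]
  (str (->> (s/split-lines (s/trim (render-content e d)))
            (map #(if (s/blank? %) ">" (str "> " %)))
            (s/join "\n"))
       "\n\n"))

;; assoc returns a new map; base-dispatcher itself is unchanged.
(def extended-dispatcher
  (assoc base-dispatcher :blockquote markdown-blockquote))

(render {:tag :blockquote :attrs nil
         :content [{:tag :p :attrs nil :content ["First line"]}
                   {:tag :p :attrs nil :content ["Second line"]}]}
        extended-dispatcher)
;; => "> First line\n>\n> Second line\n\n"
</code></pre>

<p>Since Clojure maps are immutable, the extended dispatcher leaves the original untouched, so both can be used side by side.</p>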
<p>It is convenient to write dispatchers as maps, since a Clojure map is itself a function of its keys, but you don’t have to: anything which, given a keyword, returns a processor will work.</p>
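<p>As a sketch of a dispatcher that is not a map, the hypothetical <code>fallback-dispatcher</code> below is an ordinary function: known tags get their processor, and any other tag falls back to rendering just the element’s content. The walker (<code>render</code>, <code>render-content</code>), <code>md-strong</code> and <code>known-processors</code> are illustrative stand-ins, not part of the library.</p>

<pre><code class="clojure">;; Hypothetical walker: strings pass through; elements go to the
;; processor that dispatcher `d` returns for their tag.
(defn render [node d]
  (if (string? node)
    node
    ((d (:tag node)) node d)))

(defn render-content
  "Render every child of element `e` with dispatcher `d`, concatenated.
  It has the processor signature, so it can double as a fallback."
  [e d]
  (apply str (map #(render % d) (:content e))))

(defn md-strong [e d] (str "**" (render-content e d) "**"))

(def known-processors {:b md-strong :strong md-strong})

(defn fallback-dispatcher
  "Return the processor for `tag`, or `render-content` for any tag
  not in `known-processors`."
  [tag]
  (get known-processors tag render-content))

;; :span is unknown here, so only its content is rendered.
(render {:tag :span :attrs nil
         :content ["Mind " {:tag :b :attrs nil :content ["the gap"]}]}
        fallback-dispatcher)
;; => "Mind **the gap**"
</code></pre>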
<h2><a href="#license" name="license"></a>License</h2>
<p>Copyright © 2019 Simon Brooke <a href="mailto:simon@journeyman.cc">simon@journeyman.cc</a></p>
<p>Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.</p></div></div></div></body></html>