html-to-md/docs/intro.html
Simon Brooke e066c033be Deliberately added generated documentation to the repo
To see if I can make documentation pages work on github.
2019-05-01 14:02:24 +01:00

<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>Introduction to html-to-md</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="css/highlight.css" /><script type="text/javascript" src="js/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a></h2><h1><a href="index.html"><span class="project-title"><span class="project-name">Html-to-md</span> <span class="project-version">0.2.0</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 current"><a href="intro.html"><div class="inner"><span>Introduction to html-to-md</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></div></li><li class="depth-2 branch"><a href="html-to-md.blogger-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>blogger-to-md</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.core.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>core</span></div></a></li><li class="depth-2 branch"><a href="html-to-md.html-to-md.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>html-to-md</span></div></a></li><li class="depth-2"><a 
href="html-to-md.transformer.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>transformer</span></div></a></li></ul></div><div class="document" id="content"><div class="doc"><div class="markdown"><h1><a href="#introduction-to-html-to-md" name="introduction-to-html-to-md"></a>Introduction to html-to-md</h1>
<p>The itch I'm trying to scratch at present is to transform <a href="http://www.blogger.com">Blogger.com</a>'s dreadful tag-soup markup into Markdown; but my architecture for doing this is to build a completely general [HT|SG|X]ML transformation framework and then specialise it.</p>
<p><strong>WARNING:</strong> this is presently alpha-quality code, although it does have fair unit test coverage.</p>
<h2><a href="#usage" name="usage"></a>Usage</h2>
<p>To use this library in your project, add the following Leiningen dependency:</p>
<pre><code>[org.clojars.simon_brooke/html-to-md "0.2.0"]
</code></pre>
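<p>As a minimal sketch of where that vector goes, assuming a hypothetical project called <code>my-app</code> (the project name, its version and the Clojure version are illustrative placeholders, not taken from this library), the <code>:dependencies</code> entry in <code>project.clj</code> might look like this:</p>
<pre><code class="clojure">(defproject my-app "0.1.0-SNAPSHOT"
  ;; `my-app` and the Clojure version are placeholders; only the
  ;; html-to-md coordinates come from the text above.
  :dependencies [[org.clojure/clojure "1.10.0"]
                 [org.clojars.simon_brooke/html-to-md "0.2.0"]])
</code></pre>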
<p>To use it in your namespace, require:</p>
<pre><code>[html-to-md.core :refer [html-to-md]]
</code></pre>
<p>For default usage, that's all you need. To play more sophisticated tricks, consider:</p>
<pre><code>[html-to-md.transformer :refer [transform process]]
[html-to-md.html-to-md :refer [markdown-dispatcher]]
</code></pre>
<p>The intended usage is as follows:</p>
<pre><code class="clojure">(require '[html-to-md.core :refer [html-to-md]])
(html-to-md url output-file)
</code></pre>
<p>This will read (X)HTML from <code>url</code> and write Markdown to <code>output-file</code>. If <code>output-file</code> is not supplied, it will return the markdown as a string:</p>
<pre><code class="clojure">(require '[html-to-md.core :refer [html-to-md]])
(def md (html-to-md url))
</code></pre>
<p>If you are specifically scraping <a href="https://www.blogger.com/">blogger.com</a> pages, you may <em>try</em> the following recipe:</p>
<pre><code class="clojure">(require '[html-to-md.core :refer [blogger-to-md]])
(blogger-to-md url output-file)
</code></pre>
<p>It works for my Blogger pages. However, I'm not sure to what extent the skinning of Blogger pages is pure CSS (in which case my recipe should work for yours) and to what extent it's HTML templating (in which case it probably won't). Results are not guaranteed; if it doesn't work, you get to keep all the pieces.</p>
<h2><a href="#extending-the-transformer" name="extending-the-transformer"></a>Extending the transformer</h2>
<p>In principle, the transformer can transform any [HT|SG|X]ML markup into any other, or into any textual form. To extend it to do something other than Markdown, supply a <strong>dispatcher</strong>. A dispatcher is essentially a function of one argument, an [HT|SG|X]ML tag represented as a Clojure keyword, which returns a <strong>processor</strong>. A processor should be a function of two arguments: an element assumed to have that tag, and a dispatcher. The processor should return the value that you want elements with that tag to be transformed into.</p>
<p>Thus the <code>html-to-md.html-to-md</code> namespace comprises a number of <em>processor</em> functions, such as this one:</p>
<pre><code class="clojure">(defn markdown-a
  "Process the anchor element `e` into markdown, using dispatcher `d`."
  [e d]
  (str
    "["
    (s/trim (apply str (process (:content e) d)))
    "]("
    (-&gt; e :attrs :href)
    ")"))
</code></pre>
<p>and a <em>dispatcher</em> map:</p>
<pre><code class="clojure">(def markdown-dispatcher
  "A despatcher for transforming (X)HTML into Markdown."
  {:a markdown-a
   :b markdown-strong
   :br markdown-br
   :code markdown-code
   :body markdown-default
   :div markdown-div
   :em markdown-em
   :h1 markdown-h1
   :h2 markdown-h2
   :h3 markdown-h3
   :h4 markdown-h4
   :h5 markdown-h5
   :h6 markdown-h6
   :html markdown-html
   :i markdown-em
   :img markdown-img
   :ol markdown-ol
   :p markdown-div
   :pre markdown-pre
   :samp markdown-code
   :script markdown-omit
   :span markdown-default
   :strong markdown-strong
   :style markdown-omit
   :ul markdown-ul})
</code></pre>
<p>Obviously it is convenient to write dispatchers as maps, but it isn't required that you do so: anything which, given a keyword, will return a processor, will work.</p>
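<p>To illustrate that last point, here is a minimal, self-contained sketch of a dispatcher written as a plain function, which strips markup down to text. It does not call the library: <code>process-sketch</code> is a hypothetical stand-in for the library's <code>process</code>, and the element shape (a map with <code>:tag</code>, <code>:attrs</code> and <code>:content</code> keys) is an assumption based on how the processor above reads <code>:content</code> and <code>:attrs</code>. All the names here (<code>text-default</code>, <code>text-omit</code>, <code>text-heading</code>, <code>plain-text-dispatcher</code>, <code>sample</code>) are invented for the example.</p>
<pre><code class="clojure">(require '[clojure.string :as s])

(declare process-sketch)

(defn text-default
  "Processor: concatenate the processed children of element `e`."
  [e d]
  (apply str (process-sketch (:content e) d)))

(defn text-omit
  "Processor: discard the element entirely."
  [_ _]
  "")

(defn text-heading
  "Processor: render heading `e` as upper-case text on its own line."
  [e d]
  (str (s/upper-case (text-default e d)) "\n"))

(defn plain-text-dispatcher
  "A dispatcher written as a function rather than a map: given a tag
  keyword, return the processor to use for elements with that tag."
  [tag]
  (cond
    (#{:script :style} tag) text-omit
    (re-matches #"h[1-6]" (name tag)) text-heading
    :else text-default))

(defn process-sketch
  "Hypothetical stand-in for the library's `process`: pass strings
  through unchanged, and hand each element to the processor which
  dispatcher `d` returns for its tag."
  [content d]
  (map (fn [node]
         (if (string? node)
           node
           ((d (:tag node)) node d)))
       content))

;; An assumed element tree, in the shape described above.
(def sample
  {:tag :div
   :attrs {}
   :content [{:tag :h1 :attrs {} :content ["Title"]}
             {:tag :script :attrs {} :content ["alert(1)"]}
             {:tag :p
              :attrs {}
              :content ["Hello, "
                        {:tag :em :attrs {} :content ["world"]}]}]})

(apply str (process-sketch [sample] plain-text-dispatcher))
;; returns "TITLE\nHello, world"
</code></pre>
<p>Because the dispatcher is an ordinary function, one regular expression can cover all six heading levels, and a catch-all clause handles tags that a map would have to list one by one.</p>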
<h2><a href="#license" name="license"></a>License</h2>
<p>Copyright © 2019 Simon Brooke <a href="mailto:simon@journeyman.cc">simon@journeyman.cc</a></p>
<p>Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.</p></div></div></div></body></html>