Is there a parser for html to hiccup structures?

I'm looking for a function that reverses Clojure's Hiccup, so that

   <html></html>

turns into

[:html]

etc.


Following up on the answer by @kotarak, this now works for me:

(use 'net.cgrand.enlive-html)
(import 'java.io.StringReader)

(defn enlive->hiccup
   [el]
   (if-not (string? el)
     (->> (map enlive->hiccup (:content el))
       (concat [(:tag el) (:attrs el)])
       (keep identity)
       vec)
     el))

(defn html->enlive 
  [html]
  (first (html-resource (StringReader. html))))

(defn html->hiccup [html]
  (-> html
      html->enlive
      enlive->hiccup))

=> (html->hiccup "<html><body id='foo'>hello</body></html>")
[:html [:body {:id "foo"} "hello"]]
Viveca answered 19/6, 2012 at 5:24
Viveca: For example, if I was working with a designer who gave me a bunch of HTML files, I would have to 'translate' them by hand. Most web tooling doesn't output Hiccup structures, and it's a hassle to do anything with the HTML output if I'm working with Hiccup. This way I can put it in the 'translator' and get the code I need.
Moulton: @Viveca Heretic question: why don't you use Enlive, then?
Viveca: @Moulton It's a preference and workflow thing. Essentially, I found that my brain's not fast enough to switch back and forth between HTML and Clojure when I'm tweaking stuff. I keep all my views and templates readily accessible in one big file to cut/paste/insert, instead of splitting them into HTML and code. It's also nice to work with in ClojureScript via crate, the Hiccup equivalent.

You could use html-resource from Enlive to get a structure like this:

{:tag :html :attrs {} :content []}

Then traverse this and turn it into a hiccup structure.

(defn html->hiccup
   [html]
   (if-not (string? html)
     (->> (map html->hiccup (:content html))
       (concat [(:tag html) (:attrs html)])
       (keep identity)
       vec)
     html))

Here is a usage example:

user=>  (html->hiccup {:tag     :p
                       :content ["Hello" {:tag     :a
                                          :attrs   {:href "/foo"}
                                          :content ["World"]}
                                 "!"]})
[:p "Hello" [:a {:href "/foo"} "World"] "!"]
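A variant of the same traversal, offered as a minimal sketch (the name enlive-node->hiccup is hypothetical, not from Enlive or this answer): instead of keep identity, it only emits the attribute map when it is non-empty, so {:tag :html :attrs {} :content []} becomes [:html] rather than [:html {}]. It also skips any node map that has no :tag, on the assumption that such maps are non-element nodes like comments (assumed here to look like {:type :comment :data ...}); adjust that if your parser produces something else. It uses only clojure.core, so it can be tried without Enlive on the classpath.

```clojure
(ns example.enlive-hiccup)

(defn enlive-node->hiccup
  "Converts an Enlive-style node map {:tag :attrs :content} into a Hiccup vector.
  Hypothetical helper: strings pass through unchanged, nil or empty :attrs are
  omitted, and maps without a :tag (assumed to be comments or similar) become nil."
  [node]
  (cond
    (string? node) node
    (and (map? node) (:tag node))
    (let [{:keys [tag attrs content]} node]
      (into (if (seq attrs) [tag attrs] [tag])
            ;; keep drops the nil results returned for non-element nodes
            (keep enlive-node->hiccup)
            content))
    :else nil))

;; (enlive-node->hiccup {:tag :html :attrs {}
;;                       :content [{:tag :body :attrs {:id "foo"} :content ["hello"]}]})
;; => [:html [:body {:id "foo"} "hello"]]
```

Using into with a transducer builds the vector in one pass and keeps the tag/attrs prefix logic in one place.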
Moulton answered 19/6, 2012 at 5:42
Viveca: Thanks! I tried looking at Enlive before but was confused by the fact that it takes a file as input. Is there any way to pass a string to Enlive instead of a resource?
Moulton: I'd expect you can define a simple helper function: (defn str-resource [s] (html-resource (java.io.StringReader. s))). Not tested.
Counterbalance: There are better answers now that libraries exist; see below.

There is a page on the Hiccup GitHub wiki:

https://github.com/weavejester/hiccup/wiki/Converting-html-to-hiccup

which links to three solutions:

https://github.com/davidsantiago/hickory

https://github.com/nathell/clj-tagsoup

https://github.com/hozumi/hiccup-bridge

(Oddly, I found this question and that wiki page in the same search just now... and I was the most recent editor of that Wiki page, 2 years ago.)

Stebbins answered 24/9, 2014 at 1:0

There is now Hickory, which does this: https://github.com/davidsantiago/hickory

Molybdous answered 7/3, 2014 at 10:46

Here is a snippet I wrote which, unlike Hickory, runs truly cross-platform without relying on the browser:

(ns hiccdown.html
  (:require [clojure.edn :as edn]
            [instaparse.core :as insta :refer [defparser]]))

(defparser html-parser "
  nodes = node*
  <node> = text | open-close-tags | self-closing-tag
  open-close-tags = opening-tag nodes closing-tag
  opening-tag = <'<'> <spaces>? tag-name attributes? <spaces>? <'>'>
  closing-tag = <'</'> tag-name <'>'>
  self-closing-tag = <'<'> <spaces>? tag-name attributes? <spaces>? <'/>'>
  tag-name = #'[^ \t\r\n</>]+'
  attributes = (<spaces> attribute)+
  attribute = attribute-name (<'='> attribute-value)?
  <attribute-name> = #'[^ \t\r\n=/>]+'
  <attribute-value> = #'[^ \t\r\n>\"]+' | #'\"[^\"]*\"'
  <text> = #'[^<]+'
  spaces = #'[ \t\r\n]+'
")

(defn html->hiccup [html-str]
  (->> (html-parser html-str)
       (insta/transform {:nodes            (fn [& nodes] nodes)
                         :open-close-tags  (fn [opening-tag nodes _closing-tag]
                                             (into opening-tag nodes))
                         :opening-tag      (fn ([tag-name] [tag-name])
                                               ([tag-name attributes] [tag-name attributes]))
                         :self-closing-tag (fn ([tag-name] [tag-name])
                                               ([tag-name attributes] [tag-name attributes]))
                         :tag-name         keyword
                         :attributes       (fn [& attributes]
                                             (into {} attributes))
                         :attribute        (fn ([attribute-name]
                                                [(keyword attribute-name) true])
                                               ([attribute-name attribute-value]
                                                [(keyword attribute-name) (edn/read-string attribute-value)]))})))
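The :attribute transform passes each value through edn/read-string. A quoted value such as "foo" comes back as a string, but an unquoted one comes back as whatever EDN makes of it: 42 becomes a number, foo becomes a symbol, and a value like /foo appears not to be valid EDN at all, so I believe it would throw. If you would rather always get strings, a small helper along these lines could be called in place of (edn/read-string attribute-value). It is a hypothetical sketch, not part of the snippet or of the library it turned into, and like the original it leaves HTML entities undecoded.

```clojure
(ns example.attr-value
  (:require [clojure.string :as str]))

(defn attribute-value->string
  "Hypothetical replacement for (edn/read-string attribute-value) in the
  :attribute transform: strips one pair of surrounding double quotes and
  otherwise returns the raw attribute text unchanged, so values stay strings."
  [s]
  (if (and (>= (count s) 2)
           (str/starts-with? s "\"")
           (str/ends-with? s "\""))
    (subs s 1 (dec (count s)))
    s))

;; (attribute-value->string "\"foo bar\"") ;=> "foo bar"
;; (attribute-value->string "42")          ;=> "42"
```

The trade-off is that numeric-looking values stay strings, which is what HTML attribute values are in the first place.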

Update: this snippet has since become a Clojure library.

Doyen answered 14/3, 2022 at 4:50
