Implementing a data model to prevent common errors

There seem to be multiple ways to implement data models in Clojure:

  • ordinary built-in datatypes (maps/lists/sets/vectors)
  • built-in datatypes + meta-data -- for example: (type ^{:type ::mytype} {:fieldname 1})
  • built-in datatypes + special accessor functions (for instance, getting a non-existent key from a map throws an exception, instead of silently returning nil)
  • deftype
  • defstruct
  • defrecord
  • defprotocol
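For reference, the "special accessor functions" option can be sketched in a few lines. This is only an illustration: `get-strict` is a hypothetical helper name, not something provided by Clojure, and it assumes `ex-info` is available (Clojure 1.4 or later).

```clojure
;; Hypothetical helper: behaves like `get`, but throws on a missing key
;; instead of silently returning nil.
(defn get-strict
  [m k]
  (if (contains? m k)
    (get m k)
    (throw (ex-info (str "Missing key: " k) {:key k :map m}))))

(get-strict {:fieldname 1} :fieldname) ;; => 1
;; (get-strict {:fieldname 1} :feildname) throws clojure.lang.ExceptionInfo
```

Using `contains?` rather than a nil check means a key that is present with a nil value is still treated as present.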

We've reached the point where maps/lists are no longer working well for us -- we run into lots of errors that pre-conditions/post-conditions could easily catch, but take a very long time to hunt down otherwise (and it's hard to write effective pre/post-conditions for functions that accept nested maps/lists/vectors) -- but we're not sure which of the above to choose from.

We have three major goals:

  • write idiomatic Clojure code
  • avoid spending large amounts of time hunting down stupid type errors
  • have confidence in our ability to change/refactor code without silently breaking anything

How can we harness the power of Clojure to help us?

Olympiaolympiad answered 26/10, 2011 at 16:10 Comment(0)

Clojure culture strongly favors the raw data types, and justifiably so. But explicit types can be useful. When your plain data structures get sufficiently complex and specific, you essentially have an implicit datatype without the specification.

Rely on constructors. This sounds a bit dirty, in an OOP kind of way, but a constructor is nothing more than a function that creates your data type safely and conveniently. A drawback of plain data structures is that they encourage creating the data on the fly: instead of calling (myconstructor ...), I attempt to compose my data directly. That leaves much potential for error, and causes problems if I need to change the underlying data type.
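A minimal sketch of such a constructor, assuming a hypothetical "user" shape with a name and an age (the function name and fields are made up for illustration). The `:pre` vector is Clojure's built-in precondition syntax:

```clojure
;; Hypothetical constructor: the one place where a "user" map gets built.
(defn make-user
  [username age]
  {:pre [(string? username) (integer? age) (not (neg? age))]}
  {:name username :age age})

(make-user "Ada" 36) ;; => {:name "Ada", :age 36}
;; (make-user "Ada" "36") throws java.lang.AssertionError
```

If the underlying representation later changes (say, to a record), only `make-user` needs to be touched, provided callers never build the map by hand.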

Records are the sweet spot. With all the fuss about raw data types, it's easy to forget that records do a lot of things that maps can do. They can be accessed the same way. You can call seq on them. You can destructure them the same way. The vast majority of functions that expect a map will accept a record as well.
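A short sketch of the map-like behaviour described above, using a hypothetical `Point` record (the `->Point` factory function assumes Clojure 1.3 or later):

```clojure
;; Hypothetical record used only for illustration.
(defrecord Point [x y])

(def p (->Point 1 2))

(:x p)                          ;; keyword access => 1
(get p :y)                      ;; => 2
(let [{:keys [x y]} p] (+ x y)) ;; map destructuring => 3
(keys p)                        ;; => (:x :y)
(map? p)                        ;; => true
(assoc p :z 3)                  ;; still a Point, now with an extra key
```

One difference worth knowing: unlike a map, a record is not itself callable as a function, so `(p :x)` fails where `(:x p)` works.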

Metadata will not save you. My main objection to relying on metadata is that it isn't reflected in equality.

user> (= (with-meta [1 2 3] {:type :A})  (with-meta [1 2 3] {:type :B}))
true

Whether that's acceptable or not is up to you, but I'd worry about this introducing new subtle bugs.
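For contrast, here is a sketch of the same comparison with records, where the type does take part in equality. `TaggedA` and `TaggedB` are hypothetical names chosen for this example:

```clojure
;; Two hypothetical record types with identical fields.
(defrecord TaggedA [items])
(defrecord TaggedB [items])

(= (->TaggedA [1 2 3]) (->TaggedA [1 2 3])) ;; => true
(= (->TaggedA [1 2 3]) (->TaggedB [1 2 3])) ;; => false
(= (->TaggedA [1 2 3]) {:items [1 2 3]})    ;; => false
```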


The other datatype options:

  • deftype is only for low-level work in creating new basic or special-purpose data structures. Unlike defrecord, it doesn't bring all of the Clojure goodness along with it. For most work, it isn't necessary or advisable.
  • defstruct should be deprecated. When Rich Hickey introduced types and protocols, he essentially said that defrecord should be preferred from then on.

Protocols are very useful, even though they feel like a bit of a departure from the (functions + data) paradigm. If you find yourself creating records, you should consider defining protocols as well.
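As a rough illustration of how records and protocols fit together, here is a sketch with a hypothetical `Shape` protocol and two made-up record types:

```clojure
;; Hypothetical protocol: one operation that every shape must support.
(defprotocol Shape
  (area [this] "Returns the area of the shape."))

(defrecord Rect [width height]
  Shape
  (area [_] (* width height)))

(defrecord Circle [radius]
  Shape
  (area [_] (* Math/PI radius radius)))

(area (->Rect 2 3))  ;; => 6
(area (->Circle 1))  ;; => 3.141592653589793
```

Calling `area` on a value that does not implement `Shape` throws an IllegalArgumentException right at the call site, which is the kind of early failure the question is after.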

EDIT: I discovered another advantage to plain datatypes that hadn't been apparent to me earlier: if you're doing web programming, the plain datatypes can be converted to and from JSON efficiently and easily. (Libraries for doing this include clojure.data.json, clj-json, and my favourite, cheshire). With records and datatypes, the task is considerably more annoying.

Arboretum answered 27/10, 2011 at 18:58 Comment(6)
Okay, so I need to figure out defrecord and defprotocol, can ignore defstruct, and don't have to worry too much about deftype. Does it matter to a Clojure program that defrecord creates Java code? I don't want to worry about having a Java class, but if Clojure wants to use one privately, that's fine. Great answer, very helpful. -- Olympiaolympiad
Well, a plain map is a Java class as well, as you can see at github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/… -- Arboretum
So it's not really a huge deal that defrecord creates a Java class. From our point of view using the record, it won't make much of a difference: except for construction, it will feel entirely like Clojure data. -- Arboretum
And yeah, definitely read up on defrecord, defprotocol, and extend-protocol. The logic behind them is quite well thought out, IMO. -- Arboretum
My concern is I don't know exactly what this means: "Dynamically generates compiled bytecode for class with the given name, in a package with the same name as the current namespace, the given fields, and, optionally, methods for protocols and/or interfaces." (from the defrecord docs) -- Olympiaolympiad
It means that you can call it from Java if you need to. Records are nice in that they can be first-class citizens in both Clojure and Java. (Compiled bytecode isn't any different from everything else you write in Clojure; it all gets compiled to bytecode. The rest is just relevant to knowing what to call if you are calling from Java.) -- Arboretum

It's really convenient to be able to compose functions that work on maps and lists, and it would be something of a shame to lose that by switching to classes and protocols. After all, it is better to have one hundred functions on one type. Switching to protocols or records would be a little heavy-handed. It would prevent you from (debug (map :rows (get-state))) while debugging, for instance.

Metadata is a great way to add "just enough type" to make your data safer in the places that need it without losing the benefits in the rest of your codebase. I would recommend going with option 2:

  • 'built-in datatypes + meta-data ((type ^{:type ::mytype} {:fieldname 1}))'
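A minimal sketch of that option, assuming a hypothetical checker function `fieldname-of`. It relies on the core `type` function returning the `:type` entry of a value's metadata when one is present:

```clojure
;; A plain map tagged with a :type in its metadata.
(def row (with-meta {:fieldname 1} {:type ::mytype}))

(type row)             ;; => the namespace-qualified keyword ::mytype
(= row {:fieldname 1}) ;; => true, the tag does not affect equality
(map :fieldname [row]) ;; => (1), ordinary map functions still work

;; Hypothetical accessor that checks the tag only where safety matters.
(defn fieldname-of
  [m]
  {:pre [(= ::mytype (type m))]}
  (:fieldname m))

(fieldname-of row) ;; => 1
;; (fieldname-of {:fieldname 1}) throws java.lang.AssertionError
```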
Horodko answered 26/10, 2011 at 18:53 Comment(2)
Isn't that an Alan Perlis-ism about one hundred functions on one datatype? I reformatted that example in the title -- accidentally put it in non-LISP parentheses! -- Olympiaolympiad
Yep, that's the quote I was going for. Credit where credit is due! -- Horodko
