Handling incremental Data Modeling Changes in Functional Programming
Asked Answered
C

5

18

Most of the problems I have to solve in my job as a developer have to do with data modeling. For example in a OOP Web Application world I often have to change the data properties that are in a object to meet new requirements.

If I'm lucky I don't even need to programmatically add new "behavior" code (functions,methods). Instead I can declarative add validation and even UI options by annotating the property (Java).

In Functional Programming it seems that adding new data properties requires lots of code changes because of pattern matching and data constructors (Haskell, ML).

How do I minimize this problem?

This seems to be a recognized problem as Xavier Leroy states nicely on page 24 of "Objects and Classes vs. Modules" - To summarize for those that don't have a PostScript viewer it basically says FP languages are better than OOP languages for adding new behavior over data objects but OOP languages are better for adding new data objects/properties.

Are there any design pattern used in FP languages to help mitigate this problem?

I have read Phillip Wadler's recommendation of using Monads to help this modularity problem but I'm not sure I understand how?

Chide answered 11/5, 2010 at 2:3 Comment(2)
Some discussion of the Expression Problem here: lambda-the-ultimate.org/node/1641 seems to help but no answer yet.Chide
Perhaps one solution would be better Refactoring Tools for FP languages. At least for Haskell and ML I have the compilerChide
C
24

As Darius Bacon noted, this is essentially the expression problem, a long-standing issue with no universally accepted solution. The lack of a best-of-both-worlds approach doesn't stop us from sometimes wanting to go one way or the other, though. Now, you asked for a "design pattern for functional languages", so let's take a shot at it. The example that follows is written in Haskell, but isn't necessarily idiomatic for Haskell (or any other language).

First, a quick review of the "expression problem". Consider the following algebraic data type:

data Expr a = Lit a | Sum (Expr a) (Expr a)

exprEval (Lit x) = x
exprEval (Sum x y) = exprEval x + exprEval y

exprShow (Lit x) = show x
exprShow (Sum x y) = unwords ["(", exprShow x, " + ", exprShow y, ")"]

This represents simple mathematical expressions, containing only literal values and addition. With the functions we have here, we can take an expression and evaluate it, or show it as a String. Now, say we want to add a new function--say, map a function over all the literal values:

exprMap f (Lit x) = Lit (f x)
exprMap f (Sum x y) = Sum (exprMap f x) (exprMap f y)

Easy! We can keep writing functions all day without breaking a sweat! Algebraic data types are awesome!

In fact, they're so awesome, we want to make our expression type more, errh, expressive. Let's extend it to support multiplication, we'll just... uhh... oh dear, that's going to be awkward, isn't it? We have to modify every function we just wrote. Despair!

In fact, maybe extending the expressions themselves is more interesting than adding functions that use them. So, let's say we're willing to make the trade-off in the other direction. How might we do that?

Well, no sense doing things halfway. Let's up-end everything and invert the whole program. What does that mean? Well, this is functional programming, and what's more functional than higher-order functions? What we'll do is replace the data type representing expression values with one representing actions on the expression. Instead of choosing a constructor we'll need a record of all possible actions, something like this:

data Actions a = Actions {
    actEval :: a,
    actMap  :: (a -> a) -> Actions a }

So how do we create an expression without a data type? Well, our functions are data now, so I guess our data needs to be functions. We'll make "constructors" using regular functions, returning a record of actions:

mkLit x = Actions x (\f -> mkLit (f x))

mkSum x y = Actions 
    (actEval x + actEval y) 
    (\f -> mkSum (actMap x f) (actMap y f))

Can we add multiplication more easily now? Sure can!

mkProd x y = Actions 
    (actEval x * actEval y) 
    (\f -> mkProd (actMap x f) (actMap y f))

Oh, but wait--we forgot to add an actShow action earlier, let's add that in, we'll just... errh, well.

At any rate, what does it look like to use the two different styles?

expr1plus1 = Sum (Lit 1) (Lit 1)
action1plus1 = mkSum (mkLit 1) (mkLit 1)
action1times1 = mkProd (mkLit 1) (mkLit 1)

Pretty much the same, when you're not extending them.

As an interesting side note, consider that in the "actions" style, the actual values in the expression are completely hidden--the actEval field only promises to give us something of the correct type, how it provides it is its own business. Thanks to lazy evaluation, the contents of the field may even be an elaborate computation, performed only on demand. An Actions a value is completely opaque to external inspection, presenting only the defined actions to the outside world.

This programming style--replacing simple data with a bundle of "actions" while hiding the actual implementation details in a black box, using constructor-like functions to build new bits of data, being able to interchange very different "values" with the same set of "actions", and so on--is interesting. There's probably a name for it, but I can't quite seem to recall...

Cheloid answered 11/5, 2010 at 5:1 Comment(5)
It seems much more obvious now... in FP if you a problem just use more functions as indirection :) The problem I think I have is that use OOP languages so much for my day job I forget how think "functional". Thanks a ton for the excellent answer.Chide
@Adam Gent: The amusing irony of it all is that, regarding an "object" as a bundle of methods--like with Actions a in my answer--day-to-day programming in OOP makes far more extensive use of closures and higher-order functions than is common in FP, particularly ML-family languages that emphasize pattern matching and algebraic data types.Cheloid
@Adam Gent: I should also note that my second example was constructed to highlight the symmetry and show what the typically-OOP end of the expression problem looks like in Haskell. Actually using that style in Haskell is somewhat painful, in part because of Haskell's terrible record system. For practical purposes, though, the general point holds: don't assume you should replace objects with algebraic data types, but instead consider partially-applied higher-order functions that encapsulate the object's purpose.Cheloid
In the function mkProd, does the formula "(actEval x * actEval y)" on the second line get passed in as the parameter "\f" on the third line? I'm diverting from the main question, but trying to learn Haskell from the discussion here? Should I be opening a new question?Uproarious
@Tim Perry: Nothing gets passed into the third line at that point; those are the two arguments to the data constructor Actions. The second argument is the actMap field, which is a pseudo-method that you'd call with something like actMap action1times1 (+2), which would return an Actions value representing the expression 3 * 3. Feel free to open a new question if you're still unclear, though do be warned that such code is probably not good Haskell style...Cheloid
L
7

I've heard this complaint more than a few times, and it always confuses me. The questioner wrote:

In Functional Programming it seems that adding new data properties requires lots of code changes because of pattern matching and data constructors (Haskell, ML).

But this is by and large a feature, and not a bug! When you change the possibilities in a variant, for example, the code that accesses that variant via pattern-matching is forced to consider the fact that new possibilities have arisen. This is useful, because indeed you do need to consider whether that code needs to change to react to the semantic changes in the types it manipulates.

I would argue with the claim that "lots of code changes" are required. With well-written code, the type system usually does an impressively good job of bringing to the fore the code that needs to be considered, and not much more.

Perhaps the problem here is that it's hard to answer the question without a more concrete example. Consider providing a piece of code in Haskell or ML that you're not sure how to evolve cleanly. I imagine you'll get more precise and useful answers that way.

Lovering answered 11/5, 2010 at 2:56 Comment(2)
I agree with you to some extent that this is some times a feature. This works great for small team projects where you are not building a framework/api (the only annoyance here would be lack of refactoring tools for Haskell). However if you need to make an API/Framework change to add an additional field to a record that is not needed that often and even has a default value it will not be backward compatible with previous releases. It would require everyone that used the API to recompile and released for a simple optional data change.Chide
I still think you'll get a better answer if you ask this in the context of a concrete problem. I think by abstracting out appropriately, you should be able to get the low-impact code evolution you want. But it's hard to discuss such things in the abstract.Lovering
T
5

This tradeoff is known in the programming-language-theory literature as the expression problem:

The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

Solutions have been put forward, but I haven't studied them. (Much discussion at Lambda The Ultimate.)

Teniacide answered 11/5, 2010 at 2:11 Comment(2)
Yes I have seen Scala solution to this. I feel retarted that I didn't "Problem Match" this to the Expression Problem.Chide
I feel like a bit of a fraud myself for not looking into this much, yet!Teniacide
I
3

In Haskell at least I would make an abstract data type. That is create a type that doesn't export constructors. Users of the type loose the ability to pattern match on the type and you have to provide functions for working with the type. In return you get a type that is easier to modify without changing code written by users of the type.

Immunotherapy answered 14/5, 2010 at 14:36 Comment(1)
Oops, I downvoted absolutely accidentally, but somehow my vote is locked. Could you please edit the question so that I can undo the downvote?Wildebeest
A
0

If the new data imply no new behaviour, as in an application where we are asked to add a "birthdate" field to a "person" resource and then all we have to do is to add it to a list of fields that are part of the person resource, then it's easy to solve in both the functional and OOP world. Just don't treat the "birthdate" as part of your code; it's just part of your data.

Let me explain: if the birthdate is something that implies a different application behaviour, e.g. that we do something differently if the person is underage, then in OOP we would add a birthdate field to the person class, and in FP we would add similarly a birthdate field to a person data structure.

If there is no behaviour attached to "birthdate" then there should be no field named "birthdate" in the code. A data structure such as a dictionary (a map) would hold the various fields. Adding a new one would require no program changes, no matter if you it's OOP or FP. Validations would be added similarly, by attaching a validation regexp or using a similar validation little language to express in data what the validation behaviour should be.

Actinopod answered 2/2, 2014 at 7:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.