Trouble with copying dictionaries and using deepcopy on an SQLAlchemy ORM object

Asked 3/6, 2010 at 23:13 Answered 7/6, 2010 at 5:49

Solved python sqlalchemy deep-copy simulated-annealing

I'm doing a Simulated Annealing algorithm to optimise a given allocation of students and projects.

This is language-agnostic pseudocode from Wikipedia:

s ← s0; e ← E(s)                                // Initial state, energy.
sbest ← s; ebest ← e                            // Initial "best" solution
k ← 0                                           // Energy evaluation count.
while k < kmax and e > emax                     // While time left & not good enough:
  snew ← neighbour(s)                           // Pick some neighbour.
  enew ← E(snew)                                // Compute its energy.
  if enew < ebest then                          // Is this a new best?
    sbest ← snew; ebest ← enew                  // Save 'new neighbour' to 'best found'.
  if P(e, enew, temp(k/kmax)) > random() then   // Should we move to it?
    s ← snew; e ← enew                          // Yes, change state.
  k ← k + 1                                     // One more evaluation done
return sbest                                    // Return the best solution found.

The following is an adaptation of the technique. My supervisor said the idea is fine in theory.

First I pick up some allocation (i.e. an entire dictionary of students and their allocated projects, including the ranks for the projects) from entire set of randomised allocations, copy it and pass it to my function. Let's call this allocation aOld (it is a dictionary). aOld has a weight related to it called wOld. The weighting is described below.

The function does the following:

Let this allocation, aOld be the best_node
From all the students, pick a random number of students and stick in a list
Strip (DEALLOCATE) them of their projects ++ reflect the changes for projects (allocated parameter is now False) and lecturers (free up slots if one or more of their projects are no longer allocated)
Randomise that list
Try assigning (REALLOCATE) everyone in that list projects again
Calculate the weight (add up ranks, rank 1 = 1, rank 2 = 2... and no project rank = 101)
For this new allocation aNew, if the weight wNew is smaller than the allocation weight wOld I picked up at the beginning, then this is the best_node (as defined by the Simulated Annealing algorithm above). Apply the algorithm to aNew and continue.
If wOld < wNew, then apply the algorithm to aOld again and continue.

The allocations/data-points are expressed as "nodes" such that a node = (weight, allocation_dict, projects_dict, lecturers_dict)

Right now, I can only perform this algorithm once, but I'll need to try for a number N (denoted by kmax in the Wikipedia snippet) and make sure I always have with me, the previous node and the best_node.

So that I don't modify my original dictionaries (which I might want to reset to), I've done a shallow copy of the dictionaries. From what I've read in the docs, it seems that it only copies the references and since my dictionaries contain objects, changing the copied dictionary ends up changing the objects anyway. So I tried to use copy.deepcopy().These dictionaries refer to objects that have been mapped with SQLA.

Questions:

I've been given some solutions to the problems faced but due to my über green-ness with using Python, they all sound rather cryptic to me.

Deepcopy isn't playing nicely with SQLA. I've been told thatdeepcopy on ORM objects probably has issues that prevent it from working as you'd expect. Apparently I'd be better off "building copy constructors, i.e. def copy(self): return FooBar(....)." Can someone please explain what that means?
I checked and found out that deepcopy has issues because SQLAlchemy places extra information on your objects, i.e. an _sa_instance_state attribute, that I wouldn't want in the copy but is necessary for the object to have. I've been told: "There are ways to manually blow away the old _sa_instance_state and put a new one on the object, but the most straightforward is to make a new object with __init__() and set up the attributes that are significant, instead of doing a full deep copy." What exactly does that mean? Do I create a new, unmapped class similar to the old, mapped one?
An alternate solution is that I'd have to "implement __deepcopy__() on your objects and ensure that a new _sa_instance_state is set up, there are functions in sqlalchemy.orm.attributes which can help with that." Once again this is beyond me so could someone kindly explain what it means?
A more general question: given the above information are there any suggestions on how I can maintain the information/state for the best_node (which must always persist through my while loop) and the previous_node, if my actual objects (referenced by the dictionaries, therefore the nodes) are changing due to the deallocation/reallocation taking place? That is, without using copy?

Compellation answered 3/6, 2010 at 23:13 Comment(0)

I have another possible solution: use transactions. This probably still isn't the best solution but implementing it should be faster.

Firstly create your session like this:

# transactional session
Session = sessionmaker(transactional=True)
sess = Session()

That way it will be transactional. The way transactions work is that sess.commit() will make your changes permanent while sess.rollback() will revert them.

In the case of simulated annealing you want to commit when you find a new best solution. At any later point, you can invoke rollback() to revert the status back to that position.

Ligature answered 7/6, 2010 at 5:49 Comment(3)

This looks interesting. What does sess.rollback() revert to? The original or the previous state? Additionally, this means I wouldn't actually have to change any of my Classes would I? Just the point where I commit them changes right? – Compellation 7/6, 2010 at 20:10

sess.rollback() will revert back to the state as of the last call to sess.commit(). Your dictionaries aren't SQLAlchemy objects so they will not be reverted. You'll have to handle those yourself. Other then that, you should not require any additional changes. – Ligature 8/6, 2010 at 5:57

It doesn't significantly change anything but the rollback() ability is quite useful since I'm hopping from "state" to "state". – Compellation 9/6, 2010 at 0:2

You don't want to copy sqlalchemy objects like that. You could implement your own methods which make the copies easily enough, but that is probably not want you want. You don't want copies of students and projects in your database do you? So don't copy that data.

So you have a dictionary which holds your allocations. During the process you should never modify the SQLAlchemy objects. All information that can be modified should be stored in those dictionaries. If you need to modify the objects to take that into account, copy the data back at the end.

Ligature answered 6/6, 2010 at 7:24 Comment(7)

@Winston: I wish I could implement my own methods to make the copying easier but I'm almost out of time and still quite a beginner at Python programming (and programming in general) so that seems a bit daunting at this time :). Yup, I have a dictionary that holds the allocations -- I only ever add rows to it and then never touch it save for querying information. It's the student, project and lecturers dictionaries that I'm changing for the Simulated Annealing. – Compellation 6/6, 2010 at 7:33

[continued] So far, SQLAlchemy hasn't actually complained although what I'm doing is -- according to responses to my questions -- wrong (changing a mapped object after it's been added to the session). My main issue is that I need to use the concept for deepcopy but can't. The reason I mapped Student, et al is because I had to keep my Allocation mapping very simple. Thus the student id's in Allocation are foreign keys in to the Student-mapped table (so I can query info about the students when needed). It's a bit crazy :) – Compellation 6/6, 2010 at 7:38

@Winston: Now that you mention it, I just remembered that I have a function called resetSPL() that basically resets the attributes of students, projects and lecturers... I should not be doing that should I? – Compellation 6/6, 2010 at 8:8

No, essentially when doing an algorithm like simulated annealing you really shouldn't be using sqlalchemy. SQLAlchemy is for persistence which is not something you want. Your use of it, and its intended use are too different: you are simply going to get headaches. – Ligature 7/6, 2010 at 5:44

The SQLA works nicely for my Monte-Carlo simulation since I keep adding all the stuff to the DB and later use it for calculations -- I guess that's where I need persistence the most. Additionally, I'd like to store all my simulations to the database for later reference... but then I need to need to generate a key that's unique to each simulation run... sigh – Compellation 7/6, 2010 at 20:9

" I keep adding all the stuff to the DB and later use it for calculations" Are you adding stuff to the DB while doing the annealing or beforehand? – Ligature 8/6, 2010 at 6:1

@Winston: For the Monte-Carlo I add instances of an object that are different from either students, supervisors or lecturers. Think of it as the bare minimum needed to express my solution within a database (so an incrementing key, a session_id, relevant student, project id's (purely numerical) and rank). For the annealing: all object manipulations are now done on unmapped objects and passed over to a mapped object so that I can preserve my "best" output :) – Compellation 8/6, 2010 at 23:59

Recommended topics

Hot tags