Migrating data when changing an NDB field's property type

Suppose I initially create an ndb.Model and later want to change a field's ndb property type (e.g. IntegerProperty to StringProperty), casting the data already stored in that field so that I don't lose it. One method would be to create a new field name and migrate the data over with a script, but are there more convenient ways of accomplishing this?

For example, suppose I had the following model:

class Car(ndb.Model):
    name = ndb.StringProperty()
    production_year = ndb.IntegerProperty()

And I stored an instance of the entity:

c = Car()
c.name = "Porsche"
c.production_year = 2013
c.put()

Now I want to change production_year to an ndb.StringProperty() without "losing" the value I set (it would still exist, but would not be retrievable). If I just change production_year to an instance of ndb.StringProperty(), the field does not report a value, which makes sense since the type doesn't match.

So if I changed the model to:

class Car(ndb.Model):
    name = ndb.StringProperty()
    production_year = ndb.StringProperty()

Attempting to retrieve the field with dot notation would result in a value of None. Anyone run into this situation, and could you explain what you did to solve it? Thanks.

Caston answered 7/11, 2013 at 17:30 Comment(1)
also see my answer here: #29527665 (Synesthesia)

How you approach this will depend on how many entities you have. If you have a relatively small number of entities, say in the tens of thousands, I would just use the remote_api to retrieve the raw underlying data from the datastore, manipulate it directly, and write it back, without going through the models. For instance, the code below fetches raw entities whose properties can be accessed like a dictionary. It is pretty much lifted from the lower-level App Engine SDK code.

from google.appengine.api import datastore
from google.appengine.api import datastore_errors

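def get_entities(keys):
    # Fetch raw entities (dict-like), bypassing the ndb model classes.
    rpc = datastore.GetRpcFromKwargs({})
    # Expects old db-style keys; `multiple` is True when a list was passed.
    keys, multiple = datastore.NormalizeAndTypeCheckKeys(keys)
    entities = None
    try:
        entities = datastore.Get(keys, rpc=rpc)
    except datastore_errors.EntityNotFoundError:
        # Only a single-key lookup raises here; the result stays None.
        assert not multiple

    return entities

def put_entities(entities):
    # Write the raw entities back and return their keys.
    rpc = datastore.GetRpcFromKwargs({})
    keys = datastore.Put(entities, rpc=rpc)
    return keys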

You would use this as follows (I am using fetch here to keep the example code simple):

x = Car.query(keys_only=True).fetch(100)
results = get_entities([i.to_old_key() for i in x])

for i in results:
    i['production_year'] = unicode(i['production_year'])

put_entities(results)

This is old code I have, and datastore.NormalizeAndTypeCheckKeys takes the old db-style key. I haven't looked to see if there is an equivalent function for ndb-style keys, but this does work. (Just tested it ;-)
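
One thing to watch with the loop above: calling unicode() on a value of None produces the text u'None', and an entity that was already converted gets rewritten on a second run. A minimal sketch of a guarded cast follows. It uses plain dicts as stand-ins for the raw entities (on the assumption that they support dict-style .get), and the names convert_year and text_type are made up for illustration, not part of the SDK.

```python
import numbers

# Hypothetical helper names; nothing here comes from the App Engine SDK.
text_type = type(u"")  # unicode on Python 2, str on Python 3


def convert_year(entity, field="production_year"):
    """Cast an integer field to text in place; return True if it changed."""
    value = entity.get(field)
    # Skip missing values and values that are already text so the
    # migration can be re-run safely. bool is excluded explicitly
    # because it counts as an integral type.
    if isinstance(value, bool) or not isinstance(value, numbers.Integral):
        return False
    entity[field] = text_type(value)
    return True


# Plain dicts standing in for raw datastore entities.
cars = [
    {"name": "Porsche", "production_year": 2013},
    {"name": "Audi", "production_year": u"2011"},  # already migrated
    {"name": "Unknown"},                           # field never set
]
changed = [c for c in cars if convert_year(c)]
```

Only the entities collected in changed would then need to be passed to put_entities, which also avoids writing back rows that did not change.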

This approach allows you to migrate data without deploying any new code.
If you have millions of entities then you should look at other approaches for processing, i.e. using this code together with mapreduce.
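
For sizes in between, the same get/convert/put loop can be run in batches rather than in one call. The following is only a sketch of that idea, not a replacement for mapreduce: migrate and in_batches are hypothetical names, and the in-memory store with fake_get and fake_put stands in for the datastore so the example runs on its own. In real use, get_entities and put_entities from above would presumably be passed as get_fn and put_fn.

```python
def in_batches(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate(keys, get_fn, put_fn, convert_fn, batch_size=100):
    """Fetch, convert and write back one batch at a time.

    convert_fn mutates an entity and returns True if it changed it.
    Returns the number of entities written.
    """
    written = 0
    for batch in in_batches(keys, batch_size):
        # Assumes get_fn returns None for keys that no longer exist.
        entities = [e for e in get_fn(batch) if e is not None]
        dirty = [e for e in entities if convert_fn(e)]
        if dirty:
            put_fn(dirty)
            written += len(dirty)
    return written


# In-memory stand-ins for the datastore calls.
store = {k: {"production_year": 2000 + k} for k in range(5)}
puts = []

def fake_get(keys):
    return [store.get(k) for k in keys]

def fake_put(entities):
    puts.append(len(entities))

def to_text(entity):
    if isinstance(entity["production_year"], int):
        entity["production_year"] = str(entity["production_year"])
        return True
    return False

total = migrate(range(5), fake_get, fake_put, to_text, batch_size=2)
```

Keeping the datastore calls as parameters means the batching logic can be tried locally before pointing it at live data through the remote_api.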

Knew answered 7/11, 2013 at 23:40 Comment(1)
Thanks. Although I haven't tried it yet, your solution looks fairly straightforward. I guess that the keys retrieved by the RPC are then stored on the model's _properties field. So if I suspect I'll be changing the Property type of a field, I should probably just keep my fields as StringProperty types and write a wrapper to retrieve/store values, and not have to worry about migrating the data. (Caston)

Just adding to Tim's answer, if you want to change your property to Text, you can:

from google.appengine.api import datastore_types

(...)

for i in results:
    i['production_year'] = datastore_types.Text(i['production_year'])
Khiva answered 13/1, 2015 at 11:20 Comment(0)
