Need to store LOTS of data on Android device, thinking of going OODB
I'm currently working on a project that's based on Android. Without getting into many details, the software will run on a custom-built device. The hardware will never change. That's a definite plus :)

With that said, this project requires us to store loads and loads of data on the device: upwards of 3 million rows in some tables. SQLite handles scanning this many rows just fine for us; the problem comes when we start doing complex joins to bring back all the related data we need. We've thought about denormalizing the database but are afraid that will push the database outside the realm of usable.

We are looking into using an object-oriented database, something like db4o or NeoDatis. Our hope is that by storing objects we can get rid of our relations on a row level and store them on the object (just like OOP). The problem is we have not been able to find any performance-related benchmarks (at least not recent ones) of these OODBs running on Android.

Does anyone have any experience with OODBs on Android and/or with storing and accessing this large amount of data? If so any advice you could provide would be greatly appreciated.

-- Edit

Here's an example of the problem we're facing. It's not related to our app (my NDA says I can't post anything specific) but this example represents the problem well.

Imagine we're building an application to monitor every vehicle that's driving on the New Jersey Turnpike at any given time. For any given car we need to track the car Make and Model, how many people are in the car and what's the demographic of the people in the car. So basically you end up with data that looks something like -

car

id | color | make_id | in_toll_lane | model_id

make

id | name

model

id | name | make_id

car_person

id | age | sex | is_driver | car_id

toll_lanes

id | cars_in_line | ideal_cars_in_line | ideal_occupants

This data is going to be changing frequently. It's also going to get rather huge, as there are no doubt A LOT of people driving down the NJ Pike at any given time.

With this data we need to be able to take a snapshot, on demand, of anyone who's driving on the pike. We also need to be able to take a snapshot of all the males who are driving, or all the females on the turnpike. We also need to be able to search by age, sex, make, model, etc.

Now imagine we need to figure out what toll lane each car should go into based on the number of people in the car, the ideal number of occupants, the number of cars already in line, and the ideal number of cars that should be in line.

This is a very simple example, though pretty representative of our problem.

-- End Edit

Thanks in advance!

Algonkian answered 1/12, 2010 at 1:41 Comment(4)
Sounds like a very interesting project! This does not directly address your question, but you might want to take a good look at the db4o licensing before you choose to use it: db4o.com/about/productinformation/whitepapers/#gpl - Salt
Thanks for the comment! It is definitely an interesting project :) I've seen their licensing model and we are fully prepared to pay for a solution if we need to. I left that out of my original post because I was hoping to avoid BerkeleyDB suggestions. (Personal preference there, really) =) - Algonkian
Well, if BDB is a possibility, then SQLite definitely is. You can use it as a simple key/object store, and some users swear that it greatly outperforms BDB in that space, especially for big datasets. (Disclaimer: I haven't tried it personally.) - Hole
That's a great point Javier and it's something we're in the middle of testing. Though we're not going true key/value, we are adding a few other metadata columns to make getting at what we need a little easier. We do foresee a couple of problems though. In the SQLite approach, how do we search on properties within the objects? Our solution (that we're testing, so not really a solution) is to add those fields as metadata columns. There is also the cross-linking problem. We can find which categories an item has (it's on the object), but how do we find which items are part of a category? - Algonkian
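
One possible shape for the key/object approach discussed in these comments, as a rough sketch only. It is written in Python with the standard sqlite3 module for brevity; the SQL itself would be the same on Android. The table names (item, item_category), the color metadata column, and the helper functions are all hypothetical. The idea is that searchable properties are copied into indexed columns, and a small link table answers the "which items are in a category?" question.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id    INTEGER PRIMARY KEY,
        color TEXT,            -- metadata column copied out of the object for searching
        body  TEXT NOT NULL    -- the serialized object itself
    );
    CREATE INDEX item_color ON item(color);

    -- Reverse lookup: one row per (category, item) pair.
    CREATE TABLE item_category (
        category TEXT NOT NULL,
        item_id  INTEGER NOT NULL,
        PRIMARY KEY (category, item_id)
    );
""")


def save_item(item_id, obj):
    """Store the object, then refresh its metadata column and category links."""
    with conn:  # one transaction, so the links never disagree with the object
        conn.execute(
            "INSERT OR REPLACE INTO item (id, color, body) VALUES (?, ?, ?)",
            (item_id, obj["color"], json.dumps(obj)),
        )
        conn.execute("DELETE FROM item_category WHERE item_id = ?", (item_id,))
        conn.executemany(
            "INSERT OR IGNORE INTO item_category (category, item_id) VALUES (?, ?)",
            [(category, item_id) for category in obj["categories"]],
        )


def items_in_category(category):
    """Answer 'which items are part of this category?' via the link table."""
    rows = conn.execute(
        "SELECT i.body FROM item_category ic "
        "JOIN item i ON i.id = ic.item_id "
        "WHERE ic.category = ? ORDER BY i.id",
        (category,),
    )
    return [json.loads(body) for (body,) in rows]
```

The cost is that every write has to keep the link table in step with the object, which is why both statements sit in one transaction.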
K
3

Here are some observations, though I suspect they will not help you directly.

I think the main question is: are you going to discover your complex relations via application runtime logic, as events generate or change data, or are you going to just dump data into a store and then discover unanticipated relations via query?

If your business logic will populate the model, then you can easily create model-based views of your different slices of the data model, e.g. collections which know all cars having male/female drivers. In this case your relations are basically semi-static and rarely change (while the data values on the other end of those relations are probably changing a lot). If so, why store the data in a database technology which forces you to constantly recalculate the relations (JOIN)? It is just a waste of CPU, and it is why you will see poor performance as the model gets complex. So, once you answer these questions, it will be very clear whether an ODB or an RDB is the best choice.
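
To make the "model-based views" idea concrete, here is a minimal sketch in Python, not tied to db4o or any particular database. The class and method names (TurnpikeModel, car_entered, car_left) are made up for illustration, loosely following the turnpike example in the question. The views are updated when data changes, so reading them needs no join.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    sex: str
    is_driver: bool


@dataclass
class Car:
    id: int
    make: str
    model: str
    occupants: list


class TurnpikeModel:
    """Keeps pre-built views that are maintained as events arrive."""

    def __init__(self):
        self.cars = {}
        self.cars_by_driver_sex = defaultdict(set)  # sex -> ids of cars with such a driver
        self.cars_by_make = defaultdict(set)        # make -> car ids

    def car_entered(self, car):
        self.cars[car.id] = car
        self.cars_by_make[car.make].add(car.id)
        for person in car.occupants:
            if person.is_driver:
                self.cars_by_driver_sex[person.sex].add(car.id)

    def car_left(self, car_id):
        car = self.cars.pop(car_id)
        self.cars_by_make[car.make].discard(car_id)
        for person in car.occupants:
            if person.is_driver:
                self.cars_by_driver_sex[person.sex].discard(car_id)

    def cars_with_driver(self, sex):
        # No join at read time: the relation was recorded when the car entered.
        return [self.cars[i] for i in sorted(self.cars_by_driver_sex[sex])]
```

The trade-off is that every write path has to keep the views consistent, which only pays off if the relations really do change less often than they are read.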

Now the question becomes: what will run on Android and handle huge data? This is where I think I cannot help. I work at Versant, which has two ODBs (db4o and Versant). db4o will run on Android, but is it really the right choice for huge data? No. Not unless you have very isolated data which can live in separate databases and be accessed only in isolation, and it does not sound to me like that is your situation. Our other database, Versant, is meant to handle huge data in near real time, but only the client is 100% Java; the server is written in C, so it will not run on Android.

I think you will need to do some research to see who has an ODB that can handle huge data on Android.

Best, -Robert

Kanazawa answered 1/12, 2010 at 16:57 Comment(4)
Thank you very much for the very insightful post. We initially tried to discover the joins in real time; however, we are now looking into compiling the results of these complex joins in the background after the data changes. We don't need the changes to be ready immediately. A few (< 10) second delay between change and update is not a deal breaker. If we do go this route we can store data in different db storages because we will only ever need to access one at a time in real time. Is db4o going to perform well in this scenario? And how about searching for a specific prop value on an object? - Algonkian
Hi Jason, yes, I think in this case db4o could work. I suggest you take up Carl's suggestion to give him a ping. Try the db4o forum to catch him: developer.db4o.com/Forums.aspx. When it comes to querying on a specific property value, you will get excellent performance with indexes in place, and you get some great usability options like native queries, which give you compile-time checked, refactorable query code. Best of luck on the project. - Kanazawa
Oh ... also want to point out: I must have missed the 3M entry comment, it's what I get for scanning. I took "huge" to mean 100's of millions and/or more data than easily fits into a single file. Something like 3M instances will be no problem for db4o. - Kanazawa
Understood and I apologize - I was using "Huge" in the context of what we're actually building. For where our device is going to live, 3M records of data is a lot =) - Algonkian
O
3

You don't say much about your data access needs or the data loadout really.

If you've got 3M main rows, and then a bunch of smaller leaf tables, then you may just do well by caching all of the leaf tables in RAM, and "joining" to them by hand. Many systems have very small leaf tables (particularly compared to the main data), so loading them up in RAM and then simply looking them up when you load the row can be a big win.

Obviously, you don't do this with major parent->child relations, but if you can eliminate the leaf joins, then a read becomes a single join between the parent and child rather than a half dozen joins across parent, child, and leaf tables.

Even if this doesn't work for all of the leaf tables, if it works for a large majority, it may well be enough to get you over the hump.
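
A small sketch of what "joining by hand" could look like, assuming the make and model tables from the question are small and change rarely. It is written in Python with the standard sqlite3 module to keep it short; the sample rows and the load_cars helper are invented for illustration, and on Android the same pattern would use the platform's SQLite API and a map.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE make  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT, make_id INTEGER);
    CREATE TABLE car   (id INTEGER PRIMARY KEY, color TEXT, make_id INTEGER, model_id INTEGER);
    INSERT INTO make  VALUES (1, 'Ford'), (2, 'Honda');
    INSERT INTO model VALUES (10, 'Focus', 1), (20, 'Civic', 2);
    INSERT INTO car   VALUES (100, 'red', 1, 10), (101, 'blue', 2, 20);
""")

# Load the small leaf tables into RAM once, keyed by primary key.
make_names = dict(conn.execute("SELECT id, name FROM make"))
model_names = dict(conn.execute("SELECT id, name FROM model"))


def load_cars(color=None):
    """Read from the big table only; resolve leaf values from the in-memory maps."""
    sql = "SELECT id, color, make_id, model_id FROM car"
    params = ()
    if color is not None:
        sql += " WHERE color = ?"
        params = (color,)
    sql += " ORDER BY id"
    return [
        {"id": car_id, "color": car_color,
         "make": make_names[make_id], "model": model_names[model_id]}
        for car_id, car_color, make_id, model_id in conn.execute(sql, params)
    ]
```

The maps have to be reloaded or patched whenever a leaf table changes, so this fits best where those tables are close to static.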

Oxide answered 1/12, 2010 at 2:7 Comment(1)
Thanks for the response! You raise a good point, I didn't really go into our data needs. We have A LOT of relationships, many of which are constantly changing and can't really be cached (we would be invalidating the cache very often). There are also leaf tables, though some (2 or 3) of them will not be small (also upwards of 1 million rows). These leaf tables contain things like Categories, and other child information. - Algonkian
G
3

Speaking for db4o: We run all our regression tests on Android because we think it will become a very important platform for db4o.

db4o works very well for the order of magnitude of 3 million objects.

We are doing benchmark testing against other databases on http://www.polepos.org/ and we will soon release a new version of the benchmark where we run a complex setup, also against SQLite. Porting the benchmark to Android is also a consideration.

If joins are killing your performance and you have very heterogeneous data, db4o could work better than a relational database.

Your app sounds interesting. If you need help evaluating db4o, just give me a shout.

Glitter answered 1/12, 2010 at 19:0 Comment(2)
Thanks for the response, Carl! I'm looking into evaluating db4o now. My problem right now is that I have a lot of questions about OODBs as I've never used them before. I appreciate the offer for help! I tried to send you a message here on StackOverflow but apparently I can't? =) - Algonkian
We are monitoring the db4o tag on StackOverflow, so you can ask questions either here or in our forum: developer.db4o.com/Forums.aspx. My email: carl at db4o dot com - Glitter
H
2

Jason: to reach any db4o member, use this pattern: firstname @ db4o.com. Best!

Hushaby answered 2/12, 2010 at 0:33 Comment(0)
