Best approach for handling large datasets in offline-first mobile apps (PouchDB)

So I'm using Ionic v2 with PouchDB (on top of SQLite) for mobile development. The data comes from a REST API and looks something like this:

{
  "record-id": "2332255",
  "record-name": "record-ABC-XTY",
  "record-items": [
    {
      "item-id": "456454",
      "item-name": "item-XADD",
      "category": "Cat1",
      "subcategory": "Subcat1",
      "location": "LocationXYZ",
      "owner": "Person1",
      "data-rows": [
        {
          "row-name": "sampleRowName1",
          "row-value": "ABC-XASS"
        },
        {
          "row-name": "sampleRowName2",
          "row-value": "ABC-XASS"
        }
      ]
    },
    {
      "item-id": "654645",
      "item-name": "item-BNSSA",
      "category": "Cat2",
      "subcategory": "Subcat2",
      "location": "LocationABC",
      "owner": "Person2",
      "data-rows": [
        {
          "row-name": "sampleRowName1",
          "row-value": "ABC-XASS"
        },
        {
          "row-name": "sampleRowName2",
          "row-value": "ABC-XASS"
        }
      ]
    }
  ]
}

Now, as you can see, record-items could contain 100,000 items or more (estimated JSON size: 32 MB). Right now I'm lost on which approach I should take. Optimized data handling is crucial and I don't know which PouchDB approach is better. Here are some of my thoughts:

  1. Save the whole JSON payload as one PouchDB document. But I'm worried that it will take up a lot of memory when retrieved and will make the application slow.
  2. Chunk the record-items into one PouchDB document each and retrieve them individually. I'm not sure if this is better in terms of overall performance, but the PouchDB database will probably be larger (?).

Also, there will be sorting, fetching all data (only the _ids and a few fields, just to show a list of all results) and searching.
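
For illustration, here is a rough sketch of what option 2 might look like (the _id scheme and field names are just my own assumptions, not settled code):

import PouchDB from 'pouchdb';

const db = new PouchDB('records');

// Option 2 sketch: one PouchDB document per record-item, with an _id that
// sorts by record so a single range query can list all items of one record.
interface ItemDoc {
  _id: string;          // e.g. "item::2332255::456454" (assumed scheme)
  recordId: string;
  itemName: string;
  category: string;
  subcategory: string;
  location: string;
  owner: string;
  dataRows: { rowName: string; rowValue: string }[];
}

// Bulk-insert the items of one record after fetching the payload from the REST API.
async function saveRecordItems(recordId: string, items: any[]): Promise<void> {
  const docs: ItemDoc[] = items.map(it => ({
    _id: `item::${recordId}::${it['item-id']}`,
    recordId,
    itemName: it['item-name'],
    category: it['category'],
    subcategory: it['subcategory'],
    location: it['location'],
    owner: it['owner'],
    dataRows: (it['data-rows'] || []).map((r: any) => ({
      rowName: r['row-name'],
      rowValue: r['row-value'],
    })),
  }));
  await db.bulkDocs(docs);
}

// List only the _ids of one record's items; the documents themselves stay on disk.
async function listItemIds(recordId: string): Promise<string[]> {
  const res = await db.allDocs({
    startkey: `item::${recordId}::`,
    endkey: `item::${recordId}::\ufff0`,
  });
  return res.rows.map(r => r.id);
}

The point of the sortable _id is that one allDocs range query can produce the list view without loading the full documents into memory.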

Linettelineup answered 7/10, 2016 at 4:29 Comment(7)
Nested documents are good, but depending on your needs you might want to separate some nested documents into other documents. With CouchDB, you need to plan your database according to your queries. Also, heavy network transactions increase the risk of issues/conflicts. To be able to give you more advice, could you please elaborate on the type of application, type of queries, etc.? The more we know, the more we can help ;)Productive
I can't tell you the exact dataset, but it is pretty much what is in the example. The application will be heavily offline-first, since the app will be used over a remote connection to collect data. Basically, as you can see in the "row-name" and "row-value" fields, those datasets will be shown in an input form for data collection, and the data will then be uploaded back to the server. There will also be searching, filtering and sorting. Take note that the data could contain 100,000 (+/-) items, which could result in 30+ MB of JSON data.Linettelineup
Can you split the nested "items" into Items collections?Productive
@AlexisCôté I'm not sure what you meant by "splitting", but just to be clear, every item will be unique. The ID will be strictly unique. I can split it into individual documents, that's for sure.Linettelineup
I would create documents of type "items" (or whatever you name them) and add the id of the parent document if you still want to have a "relationship" with it.Productive
I'm planning to do that too. The UI will be a form for each "item", and then there will be a "next" and "prev" to traverse through records. The order will be determined by a sort.Linettelineup
Let's all increase the bounty; there are a few of us who want to know the answer to this.Illegalize

We have a similar app that works in offline mode and stores its data locally using SQLite, though the data we deal with may not be that huge. For us, the data is downloaded as an XML file from a web service; the XML has attributes such as row, column, value, name, etc. The app deserializes the data and converts it into objects, which are then inserted into SQLite (using "InsertAll"/"UpdateAll", the insert or update of items is quite fast). These XMLs are loaded into the UI and the user can update the "value" tags from the UI.
Search is optimized by giving the user filters so that the query runs on a smaller data set.

For your case I can think of 3 tables that you can use:

  1. Records (fields: RecordID, RecordName)
  2. Items (fields: ItemID (PK), RecordID (FK), ItemName, etc.)
  3. Rows (fields: ItemID (FK), RowName, RowValue)

After getting the data from REST you can deserialize it and insert it into the respective tables concurrently. Try giving users filters when it comes to search so that the actual data set is smaller.
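
As a rough sketch (just my reading of the split above, with field names assumed from the JSON in the question; the actual SQLite "InsertAll"/"UpdateAll" layer is left out), the deserialization step could flatten the payload like this:

// Flatten the REST payload into the three proposed collections.
interface RecordRow { recordId: string; recordName: string; }
interface ItemRow   { itemId: string; recordId: string; itemName: string;
                      category: string; subcategory: string;
                      location: string; owner: string; }
interface DataRow   { itemId: string; rowName: string; rowValue: string; }

function flatten(payload: any): { records: RecordRow[]; items: ItemRow[]; rows: DataRow[] } {
  const records: RecordRow[] = [{
    recordId: payload['record-id'],
    recordName: payload['record-name'],
  }];
  const items: ItemRow[] = [];
  const rows: DataRow[] = [];

  for (const it of payload['record-items'] || []) {
    items.push({
      itemId: it['item-id'],
      recordId: payload['record-id'],
      itemName: it['item-name'],
      category: it['category'],
      subcategory: it['subcategory'],
      location: it['location'],
      owner: it['owner'],
    });
    for (const r of it['data-rows'] || []) {
      rows.push({ itemId: it['item-id'], rowName: r['row-name'], rowValue: r['row-value'] });
    }
  }
  return { records, items, rows };
}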

Hope it helps!

Beefwood answered 9/2, 2017 at 5:32 Comment(1)
PouchDB is a document-oriented NoSQL database. Breaking everything up into tables is fighting the nature of the db, and it really doesn't seem warranted in this case.Prae

Your basic decision is whether to embed the data or reference it. Here are some general rules for deciding:

Embed when:

  • Data typically queried together (example: user profile)
  • Child depends on parent
  • One-to-one relationship
  • One-to-few relationship
  • Changes occur at a similar rate

Reference when:

  • Unbounded one-to-many relationship exists
  • Many-to-many relationship
  • Same data repeated in many places
  • Data changes at different rates

You're correct that if you store everything as one record you may have problems with the size. The extra storage caused by splitting it up should be inconsequential.

You'll be using views to create indexes, which then feed into your queries. How you do that will probably dominate the efficiency.
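
For example, here is a minimal sketch of such a view (assuming one document per item that carries a recordId reference, as discussed in the comments; the design-document and view names are placeholders):

import PouchDB from 'pouchdb';

// Tell TypeScript about the emit() global available inside map functions;
// the function below is only stringified here, never called directly.
declare function emit(key: any, value?: any): void;

const db = new PouchDB('records');

// A design document whose view indexes items by [recordId, itemName] and
// emits only the few fields needed for a list screen.
const ddoc = {
  _id: '_design/items',
  views: {
    by_record: {
      map: function (doc: any) {
        if (doc.recordId) {
          emit([doc.recordId, doc.itemName], { category: doc.category });
        }
      }.toString(),
    },
  },
};

async function listItems(recordId: string) {
  try {
    await db.put(ddoc);
  } catch (err) {
    // A 409 conflict just means the design document already exists.
  }
  // The view returns sorted keys and small values; full docs stay on disk.
  return db.query('items/by_record', {
    startkey: [recordId],
    endkey: [recordId, {}],
  });
}

The pouchdb-find plugin (createIndex/find) is an alternative way to build and query the same kind of index with a Mango-style selector.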

Prae answered 9/2, 2017 at 6:25 Comment(0)
