Insertion of thousands of contact entries using applyBatch is slow
I'm developing an application where I need to insert lots of Contact entries. At the current time that is approx. 600 contacts with a total of 6000 phone numbers; the biggest contact has 1800 phone numbers.

Status as of today is that I have created a custom Account to hold the Contacts, so the user can select to see the contact in the Contacts view.

But the insertion of the contacts is painfully slow. I insert the contacts using ContentResolver.applyBatch. I've tried different sizes of the ContentProviderOperation list (100, 200, 400), but the total running time is approx. the same. Inserting all the contacts and numbers takes about 30 minutes!

Most issues I've found regarding slow insertion in SQLite bring up transactions. But since I use the ContentResolver.applyBatch method I don't control this, and I would assume that the ContentResolver takes care of transaction management for me.

So, to my question: Am I doing something wrong, or is there anything I can do to speed this up?

Anders

Edit: @jcwenger: Oh, I see. Good explanation!

So then I will have to first insert into the raw_contacts table, and then into the data table with the names and numbers. What I'll lose is the back reference to the raw_id that I use in applyBatch.

So I'll have to get all the IDs of the newly inserted raw_contacts rows to use as foreign keys in the data table?
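For the record, one way to map client keys back to the newly generated raw contact IDs afterwards could look like the sketch below. This is only an illustration: the use of RawContacts.SYNC1 as a client-side key and the account-type filter are assumptions, not something from the posts above.

```java
import java.util.HashMap;
import java.util.Map;

import android.content.ContentResolver;
import android.database.Cursor;
import android.provider.ContactsContract;

// Hypothetical sketch: after bulk-inserting raw contacts that each carried a
// unique client key in RawContacts.SYNC1, query the generated _IDs back so
// they can be used as Data.RAW_CONTACT_ID foreign keys.
class RawContactIdLookup {
    static Map<String, Long> fetchIds(ContentResolver resolver, String accountType) {
        Map<String, Long> ids = new HashMap<String, Long>();
        Cursor c = resolver.query(
                ContactsContract.RawContacts.CONTENT_URI,
                new String[] { ContactsContract.RawContacts._ID,
                               ContactsContract.RawContacts.SYNC1 },
                ContactsContract.RawContacts.ACCOUNT_TYPE + " = ?",
                new String[] { accountType },
                null);
        if (c == null) return ids;
        try {
            while (c.moveToNext()) {
                // client key -> generated raw contact _ID
                ids.put(c.getString(1), c.getLong(0));
            }
        } finally {
            c.close();
        }
        return ids;
    }
}
```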

Windowpane answered 8/4, 2011 at 14:12 Comment(2)
@Anders, note you can comment on jcwenger's post directly with the add comment link under his post; it might not seem like it matters now, but questions with a dozen answers become unwieldy if every response were in the question. (Also posts become community wiki after many edits (ten for self-edits, five users for others), where it is impossible to get more reputation -- so updating posts to include new information is excellent, but sometimes just a comment will do.)Bullough
You have some kind of mistake in your code. For example, if your back reference is always 0, that means you are inserting one item at a time and locking the database for each one, which slows things down. I'll post an example of inserting the same amount of items that took 30 seconds.Pearl

Use ContentResolver.bulkInsert (Uri url, ContentValues[] values) instead of ApplyBatch()

applyBatch (1) uses transactions and (2) locks the ContentProvider once for the whole batch instead of locking/unlocking once per operation. Because of this, it is slightly faster than doing the inserts one at a time (non-batched).

However, since each operation in the batch can have a different URI and so on, there's a huge amount of overhead. "Oh, a new operation! I wonder what table it goes in... Here, I'll insert a single row... Oh, a new operation! I wonder what table it goes in..." ad infinitum. Since most of the work of turning URIs into tables involves lots of string comparisons, it's obviously very slow.

By contrast, bulkInsert applies a whole pile of values to the same table. It goes, "Bulk insert... find the table, okay, insert! insert! insert! insert! insert!" Much faster.

It will, of course, require your ContentProvider to implement bulkInsert efficiently. Most do, unless you wrote it yourself, in which case it will take a bit of coding.
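As a rough sketch of the call site, inserting many phone rows for one raw contact in a single bulkInsert could look like this. The helper name and the resolver/rawContactId/numbers parameters are illustrative assumptions, not code from the answer.

```java
import java.util.List;

import android.content.ContentResolver;
import android.content.ContentValues;
import android.provider.ContactsContract;

// Hypothetical helper: insert many phone numbers for one raw contact with a
// single bulkInsert() call instead of one batched operation per row.
class PhoneBulkInserter {
    static int insertPhones(ContentResolver resolver, long rawContactId,
                            List<String> numbers) {
        ContentValues[] rows = new ContentValues[numbers.size()];
        for (int i = 0; i < numbers.size(); i++) {
            ContentValues v = new ContentValues();
            v.put(ContactsContract.Data.RAW_CONTACT_ID, rawContactId);
            v.put(ContactsContract.Data.MIMETYPE,
                    ContactsContract.CommonDataKinds.Phone.CONTENT_ITEM_TYPE);
            v.put(ContactsContract.CommonDataKinds.Phone.NUMBER, numbers.get(i));
            rows[i] = v;
        }
        // One provider lookup, one transaction, many rows.
        return resolver.bulkInsert(ContactsContract.Data.CONTENT_URI, rows);
    }
}
```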

Azevedo answered 8/4, 2011 at 15:42 Comment(7)
(Replying to in-question edit, agree w/ @sarnold's comments above) Yes, that is the downside. You don't get the individual row IDs back, you only get a rollup of "number inserted" -- keep in mind, of course, that depending on your table constraints it may not be an all-or-nothing answer. So, if you need to cross-reference a foreign key, yes, you'll need to go through and query afterward. Thankfully, a query is blazingly fast compared to insertions... Bulk insert and a subsequent query should still be much faster overall.Azevedo
@jcwenger and @sarnold. I'm sorry, but I didn't get the add comment option until now. I'm new here and made the mistake of creating the question as an unregistered user. Now back to the original question: I implemented the solution, and at first it didn't seem to make a difference. On the emulator, that is. Then I tried my device (HTC Desire), and I'm down to 3 minutes. A remarkable difference, but I want more! ;) I've seen some applications inserting the same amount of entries into "custom" SQLite databases in under a minute. Any hope of doing this with the Contacts database?Windowpane
Is bulkInsert() better than a ContentProviderOperation here?Inoue
@Igor Ganapolsky A big yes is what @jcwenger's explanation saysVolturno
Can you point out where exactly transactions are started for applyBatch() ? The default applyBatch() method just calls apply on the passed ContentProviderOperations which in turn just call the insert/update/delete operations in the content provider.Arctogaea
You are wrong. This operation is intended for processing a large amount of items. The reason it's slow for you is that you are not using it correctly. See the example below (the operation takes 30 seconds).Pearl
I have multiple operations like insert, update and delete. How can I resolve the same issue? Currently I am using applyBatch(), but it takes approx. 45-60 seconds. Can you please suggest another solution for this?Forster

bulkInsert: For those interested, here is the code that I was able to experiment with. Pay attention to how we can avoid some allocations for ints/longs/floats :) This could save more time.

private int doBulkInsertOptimised(Uri uri, ContentValues values[]) {
    long startTime = System.currentTimeMillis();
    long endTime = 0;
    //TimingInfo timingInfo = new TimingInfo(startTime);

    SQLiteDatabase db = mOpenHelper.getWritableDatabase();

    DatabaseUtils.InsertHelper inserter =
        new DatabaseUtils.InsertHelper(db, Tables.GUYS); 

    // Get the numeric indexes for each of the columns that we're updating
    final int guiStrColumn = inserter.getColumnIndex(Guys.STRINGCOLUMNTYPE);
    final int guyDoubleColumn = inserter.getColumnIndex(Guys.DOUBLECOLUMNTYPE);
//...
    final int guyIntColumn = inserter.getColumnIndex(Guys.INTEGERCOLUMUNTYPE);

    db.beginTransaction();
    int numInserted = 0;
    try {
        int len = values.length;
        for (int i = 0; i < len; i++) {
            inserter.prepareForInsert();

            String guyID = (String)(values[i].get(Guys.GUY_ID)); 
            inserter.bind(guiStrColumn, guyID);


            // convert to double ourselves to save an allocation.
            double d = ((Number)(values[i].get(Guys.DOUBLECOLUMNTYPE))).doubleValue();
            inserter.bind(guyDoubleColumn, d);


            // getting the raw Object and converting it int ourselves saves
            // an allocation (the alternative is ContentValues.getAsInt, which
            // returns a Integer object)

            int status = ((Number) values[i].get(Guys.INTEGERCOLUMUNTYPE)).intValue();
            inserter.bind(guyIntColumn, status);

            inserter.execute();
        }
        numInserted = len;
        db.setTransactionSuccessful();
    } finally {
        db.endTransaction();
        inserter.close();

        endTime = System.currentTimeMillis();

        if (LOGV) {
            long timeTaken = (endTime - startTime);
            Log.v(TAG, "Time taken to insert " + values.length + " records was " + timeTaken +
                    " milliseconds or " + (timeTaken / 1000) + " seconds");
        }
    }
    getContext().getContentResolver().notifyChange(uri, null);
    return numInserted;
}
Galenic answered 27/7, 2011 at 10:12 Comment(1)
I used transactions in an overridden bulkInsert method and it sped up my 600 inserts from 31 seconds to under 1 second. I definitely recommend this approach.Inconsiderable

An example of how to override bulkInsert(), in order to speed up multiple inserts, can be found here
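Since the link may rot, a minimal sketch of such an override is shown below; mOpenHelper and TABLE_NAME are placeholders for your provider's own fields, and this is a generic pattern rather than the linked example itself.

```java
import android.content.ContentValues;
import android.database.sqlite.SQLiteDatabase;
import android.net.Uri;

// Minimal sketch of overriding bulkInsert() in a custom ContentProvider so the
// whole batch runs inside one transaction. mOpenHelper and TABLE_NAME are
// placeholders for your provider's own fields.
@Override
public int bulkInsert(Uri uri, ContentValues[] values) {
    SQLiteDatabase db = mOpenHelper.getWritableDatabase();
    db.beginTransaction();
    try {
        for (ContentValues row : values) {
            db.insert(TABLE_NAME, null, row);
        }
        db.setTransactionSuccessful(); // commit all rows at once
    } finally {
        db.endTransaction();
    }
    getContext().getContentResolver().notifyChange(uri, null);
    return values.length;
}
```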

Jewish answered 22/12, 2011 at 10:51 Comment(0)

@jcwenger At first, after reading your post, I thought that was the reason bulkInsert is quicker than applyBatch, but after reading the code of the Contacts Provider, I don't think so. 1. You said applyBatch uses transactions; yes, but bulkInsert also uses transactions. Here is the code of it:

public int bulkInsert(Uri uri, ContentValues[] values) {
    int numValues = values.length;
    mDb = mOpenHelper.getWritableDatabase();
    mDb.beginTransactionWithListener(this);
    try {
        for (int i = 0; i < numValues; i++) {
            Uri result = insertInTransaction(uri, values[i]);
            if (result != null) {
                mNotifyChange = true;
            }
            mDb.yieldIfContendedSafely();
        }
        mDb.setTransactionSuccessful();
    } finally {
        mDb.endTransaction();
    }
    onEndTransaction();
    return numValues;
}

That is to say, bulkInsert also uses transactions, so I don't think that's the reason. 2. You said bulkInsert applies a whole pile of values to the same table. I'm sorry, but I can't find the related code in the Froyo source. How did you find that? Could you tell me?

The reason, I think, is this:

bulkInsert uses mDb.yieldIfContendedSafely() while applyBatch uses mDb.yieldIfContendedSafely(SLEEP_AFTER_YIELD_DELAY) /* SLEEP_AFTER_YIELD_DELAY = 4000 */

After reading the code of SQLiteDatabase.java, I found that if you pass a time to yieldIfContendedSafely it will sleep, but if you don't, it will not sleep. You can refer to the code below, which is a piece of SQLiteDatabase.java:

private boolean yieldIfContendedHelper(boolean checkFullyYielded, long sleepAfterYieldDelay) {
    if (mLock.getQueueLength() == 0) {
        // Reset the lock acquire time since we know that the thread was willing to yield
        // the lock at this time.
        mLockAcquiredWallTime = SystemClock.elapsedRealtime();
        mLockAcquiredThreadTime = Debug.threadCpuTimeNanos();
        return false;
    }
    setTransactionSuccessful();
    SQLiteTransactionListener transactionListener = mTransactionListener;
    endTransaction();
    if (checkFullyYielded) {
        if (this.isDbLockedByCurrentThread()) {
            throw new IllegalStateException(
                    "Db locked more than once. yielfIfContended cannot yield");
        }
    }
    if (sleepAfterYieldDelay > 0) {
        // Sleep for up to sleepAfterYieldDelay milliseconds, waking up periodically to
        // check if anyone is using the database.  If the database is not contended,
        // retake the lock and return.
        long remainingDelay = sleepAfterYieldDelay;
        while (remainingDelay > 0) {
            try {
                Thread.sleep(remainingDelay < SLEEP_AFTER_YIELD_QUANTUM ?
                        remainingDelay : SLEEP_AFTER_YIELD_QUANTUM);
            } catch (InterruptedException e) {
                Thread.interrupted();
            }
            remainingDelay -= SLEEP_AFTER_YIELD_QUANTUM;
            if (mLock.getQueueLength() == 0) {
                break;
            }
        }
    }
    beginTransactionWithListener(transactionListener);
    return true;
}

I think that's the reason bulkInsert is quicker than applyBatch.

If you have any questions, please contact me.

Presage answered 20/4, 2012 at 9:56 Comment(1)
Which Android version did you use? I have looked up bulkInsert in the Android 2.3.7 ContactsProvider, but haven't found it using a transaction.Mime

The basic solution for you is to use "yield points" in the batch operation.

The flip side of using batched operations is that a large batch may lock up the database for a long time preventing other applications from accessing data and potentially causing ANRs ("Application Not Responding" dialogs.)

To avoid such lockups of the database, make sure to insert "yield points" in the batch. A yield point indicates to the content provider that before executing the next operation it can commit the changes that have already been made, yield to other requests, open another transaction and continue processing operations.

A yield point will not automatically commit the transaction, but only if there is another request waiting on the database. Normally a sync adapter should insert a yield point at the beginning of each raw contact operation sequence in the batch. See withYieldAllowed(boolean).
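For illustration, marking a yield point on an operation could look like the snippet below; the ops list and the accountName/accountType values are placeholder assumptions.

```java
// Sketch: allow the provider to commit and yield before this operation, as the
// ContactsContract docs suggest at the start of each raw contact's sequence.
// ops, accountName and accountType are placeholders.
ops.add(ContentProviderOperation.newInsert(ContactsContract.RawContacts.CONTENT_URI)
        .withYieldAllowed(true)
        .withValue(ContactsContract.RawContacts.ACCOUNT_NAME, accountName)
        .withValue(ContactsContract.RawContacts.ACCOUNT_TYPE, accountType)
        .build());
```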

I hope it may be useful for you.

Adscription answered 4/2, 2014 at 22:22 Comment(0)

Here is an example of inserting the same amount of data within 30 seconds.

public void testBatchInsertion() throws RemoteException, OperationApplicationException {
    final SimpleDateFormat FORMATTER = new SimpleDateFormat("mm:ss.SSS");
    long startTime = System.currentTimeMillis();
    Log.d("BatchInsertionTest", "Starting batch insertion on: " + new Date(startTime));

    final int MAX_OPERATIONS_FOR_INSERTION = 200;
    ArrayList<ContentProviderOperation> ops = new ArrayList<>();
    for(int i = 0; i < 600; i++){
        generateSampleProviderOperation(ops);
        if(ops.size() >= MAX_OPERATIONS_FOR_INSERTION){
            getContext().getContentResolver().applyBatch(ContactsContract.AUTHORITY,ops);
            ops.clear();
        }
    }
    if(ops.size() > 0)
        getContext().getContentResolver().applyBatch(ContactsContract.AUTHORITY,ops);
    Log.d("BatchInsertionTest", "End of batch insertion, elapsed: " + FORMATTER.format(new Date(System.currentTimeMillis() - startTime)));

}
private void generateSampleProviderOperation(ArrayList<ContentProviderOperation> ops){
    int backReference = ops.size();
    ops.add(ContentProviderOperation.newInsert(ContactsContract.RawContacts.CONTENT_URI)
            .withValue(ContactsContract.RawContacts.ACCOUNT_NAME, null)
            .withValue(ContactsContract.RawContacts.ACCOUNT_TYPE, null)
            .withValue(ContactsContract.RawContacts.AGGREGATION_MODE, ContactsContract.RawContacts.AGGREGATION_MODE_DISABLED)
            .build()
    );
    ops.add(ContentProviderOperation.newInsert(ContactsContract.Data.CONTENT_URI)
                    .withValueBackReference(ContactsContract.Data.RAW_CONTACT_ID, backReference)
                    .withValue(ContactsContract.Data.MIMETYPE, ContactsContract.CommonDataKinds.StructuredName.CONTENT_ITEM_TYPE)
                    .withValue(ContactsContract.CommonDataKinds.StructuredName.GIVEN_NAME, "GIVEN_NAME " + (backReference + 1))
                    .withValue(ContactsContract.CommonDataKinds.StructuredName.FAMILY_NAME, "FAMILY_NAME")
                    .build()
    );
    for(int i = 0; i < 10; i++)
        ops.add(ContentProviderOperation.newInsert(ContactsContract.Data.CONTENT_URI)
                        .withValueBackReference(ContactsContract.Data.RAW_CONTACT_ID, backReference)
                        .withValue(ContactsContract.Data.MIMETYPE, ContactsContract.CommonDataKinds.Phone.CONTENT_ITEM_TYPE)
                        .withValue(ContactsContract.CommonDataKinds.Phone.TYPE, ContactsContract.CommonDataKinds.Phone.TYPE_MAIN)
                        .withValue(ContactsContract.CommonDataKinds.Phone.NUMBER, Integer.toString((backReference + 1) * 10 + i))
                        .build()
        );
}

The log:

02-17 12:48:45.496 2073-2090/com.vayosoft.mlab D/BatchInsertionTest﹕ Starting batch insertion on: Wed Feb 17 12:48:45 GMT+02:00 2016
02-17 12:49:16.446 2073-2090/com.vayosoft.mlab D/BatchInsertionTest﹕ End of batch insertion, elapsed: 00:30.951

Pearl answered 17/2, 2016 at 10:58 Comment(0)

Just for the information of the readers of this thread:

I was facing a performance issue even when using applyBatch(). In my case there were database triggers on one of the tables. I deleted the triggers of the table and, boom, my app now inserts rows blazingly fast.

Sacramentalist answered 5/5, 2017 at 8:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.