Memory Optimized Tables - Slower INSERT than SSD

I have observed that inserting data into a Memory-Optimized Table is much slower than an equivalent parallelized insert to a disk-based table on a 5-SSD stripe set.

    --DDL for Memory-Optimized Table
    CREATE TABLE [MYSCHEMA].[WIDE_MEMORY_TABLE]
        (
        [TX_ID] BIGINT NOT NULL
        , [COLUMN_01] [NVARCHAR](10) NOT NULL
        , [COLUMN_02] [NVARCHAR](10) NOT NULL
        --etc., about 100 columns
        --at least one index is required for Memory-Optimized Tables
        , INDEX IX_WIDE_MEMORY_TABLE_ENTITY_ID HASH (TX_ID) WITH (BUCKET_COUNT=10000000)
        )
        WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY)

    --DDL for Disk-Based Table
    CREATE TABLE [MYSCHEMA].[WIDE_DISK_TABLE]
        (
        [TX_ID] BIGINT NOT NULL
        , [COLUMN_01] [NVARCHAR](10) NOT NULL
        , [COLUMN_02] [NVARCHAR](10) NOT NULL
        --etc., about 100 columns
        --No indexes
        ) ON [PRIMARY]

For this particular test, I am batching 10,000,000 rows into this table in sets of 25,000. The statement looks something like this for the Memory-Optimized Table:

    --Insert to Memory-Optimized Table 
    INSERT INTO
        WIDE_MEMORY_TABLE
        (
        TX_ID
        , COLUMN_01
        , COLUMN_02
        --etc., about 100 columns
        )
    SELECT
        S.TX_ID
        , S.COLUMN_01
        , S.COLUMN_02
        --etc., about 100 columns
    FROM
        [MYSCHEMA].[SOURCE_TABLE] AS S WITH(TABLOCK)
    WHERE
        S.TX_ID >= 1
        AND S.TX_ID < 25001
    OPTION (MAXDOP 4)

This process continues until all 10,000,000 rows are loaded; each iteration just retrieves the next 25,000 rows. The SELECT performs a seek on a covering index on [MYSCHEMA].[SOURCE_TABLE]. The query plan shows a serialized insert to [MYSCHEMA].[WIDE_MEMORY_TABLE]. Each set of 25,000 rows takes around 1,400ms.
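
For reference, the driver loop is roughly the following sketch; the variable names, the `+=` stepping, and the loop shape are illustrative placeholders, not the exact harness used for the timings:

    --Illustrative sketch of the batching driver (variable names are placeholders)
    DECLARE @BatchSize BIGINT = 25000;
    DECLARE @MaxTxId   BIGINT = 10000000;
    DECLARE @StartId   BIGINT = 1;

    WHILE @StartId <= @MaxTxId
    BEGIN
        INSERT INTO WIDE_MEMORY_TABLE
            (TX_ID, COLUMN_01, COLUMN_02 /*etc., about 100 columns*/)
        SELECT
            S.TX_ID
            , S.COLUMN_01
            , S.COLUMN_02
            --etc., about 100 columns
        FROM [MYSCHEMA].[SOURCE_TABLE] AS S WITH (TABLOCK)
        WHERE S.TX_ID >= @StartId
            AND S.TX_ID < @StartId + @BatchSize
        OPTION (MAXDOP 4);

        SET @StartId += @BatchSize;
    END;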

If I do the same thing to a disk-based table hosted on a 5-SSD stripe (5,000 IOPS per disk, 200MB/sec throughput), the inserts progress much faster, averaging around 700ms. In the disk-based case, the query performs a parallel insert to [MYSCHEMA].[WIDE_DISK_TABLE]. Note the TABLOCK hint on [MYSCHEMA].[WIDE_DISK_TABLE].

    --Insert to Disk-Based Table 
    INSERT INTO
        WIDE_DISK_TABLE WITH(TABLOCK)
        (
        TX_ID
        , COLUMN_01
        , COLUMN_02
        --etc., about 100 columns
        )
    SELECT
        S.TX_ID
        , S.COLUMN_01
        , S.COLUMN_02
        --etc., about 100 columns
    FROM
        [MYSCHEMA].[SOURCE_TABLE] AS S WITH(TABLOCK)
    WHERE
        S.TX_ID >= 1
        AND S.TX_ID < 25001
    OPTION (MAXDOP 4)

Granted, the disk-based table does not have an index, and the TABLOCK hint enables parallel insert, but I expect way more from an INSERT to RAM.

Any ideas?

Thanks!

Here is a comparison of 100 batches run in three modes: disk-based with deferred index creation, disk-based with index, and Memory-Optimized with index (at least one index is required on Memory-Optimized tables).

[Chart: 100 batches, 25K rows each]
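
For reference, the "deferred index" variant loads the heap first and builds the index only at the end. A minimal sketch follows; the index name is a placeholder I chose, and the single-statement load stands in for the real 25,000-row batches:

    --Sketch of the deferred-index variant (index name is a placeholder;
    --the real test still loads in 25,000-row batches)
    INSERT INTO [MYSCHEMA].[WIDE_DISK_TABLE] WITH (TABLOCK)
        (TX_ID, COLUMN_01, COLUMN_02 /*etc., about 100 columns*/)
    SELECT
        S.TX_ID
        , S.COLUMN_01
        , S.COLUMN_02
        --etc., about 100 columns
    FROM [MYSCHEMA].[SOURCE_TABLE] AS S WITH (TABLOCK)
    OPTION (MAXDOP 4);

    --Build the index only after all rows are loaded
    CREATE NONCLUSTERED INDEX IX_WIDE_DISK_TABLE_TX_ID
        ON [MYSCHEMA].[WIDE_DISK_TABLE] (TX_ID);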

Hoffmann asked 3/2/2018 at 22:26 — Comments (13):
Why not test with the index on the disk-based table? It's easy to assume that's small, but often it isn't... double the work sometimes is double the work. — Gunk
Is your in-memory table using row-overflow storage? I wonder if that might cause a performance hit. — Ockham
This paragraph looks relevant: aboutsqlserver.com/2016/09/27/… For the specific experiment they ran: "The execution times of INSERT statements in my environment are 153 and 7,722 milliseconds respectively. With off-row storage, In-Memory OLTP needs to add data to twenty other internal tables, which is 40 times slower comparing to in-row storage." — Ockham
@MartinSmith Good thinking, but we are not using off-row storage. It's just a bunch of NVARCHAR (mostly small ones) and a handful of DECIMAL. Here is an image of the output from sys.dm_db_xtp_memory_consumers for this table: imgur.com/a/Z99zy (a query for this DMV is sketched after the comments). — Hoffmann
@MikeM I updated the post to show a new test of 100 batches, each of 25,000 rows. I already tried this dozens of different ways, which is what led to this post. As you can see, inserting to an SSD-based table and constructing an index after the load is the fastest method. Disk-based with index present spends 258% more time, and Memory-Optimized with index present (mandatory) spends 102% additional time. — Hoffmann
@PittsburghDBA -- great testing! Yeah, that was my concern... going from one index - "just the table" - to two is a big jump in relative workload :). It sounds like you found a very good plan by separating out the desired index build! — Gunk
@MikeM The SELECT performance of the Memory-Optimized tables is insane, but the INSERT performance is less than stellar. I was hoping to be able to use Memory-Optimized tables in certain situations as replacements for smaller stage tables, or as substitutes for table variables and/or temp tables. Seeing this performance on INSERT is not impressive at all. I'm still hoping there is something else going on, because RAM chips ought to be way faster than SSD. — Hoffmann
I have heard that to get the best performance out of them, they need to go hand in hand with native stored procs, which doesn't help with inserts, unfortunately. — Toughen
What's the recovery model? To make all things equal, please add the TABLOCK hint on the table you are inserting into for the memory-optimized case. You may be getting minimal logging on the SSD table by using that hint. Also, please let us know which version of SQL Server. — Endmost
@MarkWojciechowicz Memory-Optimized tables do not support TABLOCK. Also, the database is in Full recovery mode, and this cannot be changed (operational constraint). — Hoffmann
With DURABILITY=SCHEMA_ONLY I wouldn't expect any logging anyway (though it's a while since I read up on In-Memory OLTP and I may be wrong on that). — Ockham
@MartinSmith Yes, that is the behavior, and that makes it even stranger. — Hoffmann
@MartinSmith This comes down to the power of parallelism. Check out my self-posted answer and the related article by Niko Neugebauer. — Hoffmann
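
Regarding the sys.dm_db_xtp_memory_consumers check mentioned in the comments, a query along these lines shows which memory consumers (in-row data, index, and any off-row allocators) back a memory-optimized table; the column selection here is just a sketch:

    --Sketch: list the memory consumers backing the memory-optimized table
    SELECT
        OBJECT_NAME(mc.object_id)   AS table_name
        , mc.index_id
        , mc.memory_consumer_desc
        , mc.allocated_bytes
        , mc.used_bytes
    FROM sys.dm_db_xtp_memory_consumers AS mc
    WHERE mc.object_id = OBJECT_ID(N'[MYSCHEMA].[WIDE_MEMORY_TABLE]');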

UPDATE

After much testing and research, I believe this comes down to parallelism. At this time, SQL Server 2016, up to and including SP1 CU7, does not support parallel insert to Memory-Optimized Tables. This makes all INSERT statements to Memory-Optimized tables single-threaded.

Here is an insightful article from Niko Neugebauer regarding this issue: Niko Neugebauer - Parallelism in Hekaton (In-Memory OLTP)
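
One way to see this for yourself is to capture the actual plan for a single batch and check whether a Parallelism operator (e.g. Gather Streams) sits above the insert; a serial plan may also carry a NonParallelPlanReason attribute on the QueryPlan element. A generic diagnostic sketch, not part of the original test harness:

    --Capture the actual plan for one batch and inspect it for parallelism
    SET STATISTICS XML ON;

    INSERT INTO WIDE_MEMORY_TABLE
        (TX_ID, COLUMN_01, COLUMN_02 /*etc., about 100 columns*/)
    SELECT
        S.TX_ID, S.COLUMN_01, S.COLUMN_02 --etc., about 100 columns
    FROM [MYSCHEMA].[SOURCE_TABLE] AS S WITH (TABLOCK)
    WHERE S.TX_ID >= 1 AND S.TX_ID < 25001
    OPTION (MAXDOP 4);

    SET STATISTICS XML OFF;
    --In the returned showplan XML, look for Parallelism operators above the
    --insert, or for a NonParallelPlanReason attribute on the QueryPlan element.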

This makes it much less useful for ETL/ELT ingestion. However, it is pretty amazing for OLTP DML (especially through natively compiled stored procedures), and stellar for aggregating data in BI queries. For ingestion, it is nearly impossible to beat SSD-based heaps without indexes, so long as you take the right steps to ensure that your INSERT will run in parallel.
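
For completeness, the natively compiled route mentioned above looks roughly like the sketch below. The procedure name is a placeholder and the column list is abbreviated; the real table's roughly 100 NOT NULL columns would all need parameters:

    --Minimal sketch of a natively compiled insert procedure (placeholder name;
    --the real procedure would need a parameter for every NOT NULL column)
    CREATE PROCEDURE [MYSCHEMA].[USP_INSERT_WIDE_MEMORY_ROW]
        @TX_ID BIGINT
        , @COLUMN_01 NVARCHAR(10)
        , @COLUMN_02 NVARCHAR(10)
    WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
    AS
    BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        INSERT INTO [MYSCHEMA].[WIDE_MEMORY_TABLE] (TX_ID, COLUMN_01, COLUMN_02)
        VALUES (@TX_ID, @COLUMN_01, @COLUMN_02);
    END;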

Even if the database is in Full Recovery mode, the parallel INSERT to a disk-based heap outperforms the INSERT to a Memory-Optimized table. This remains true even if comparable indexes are added to the disk-based table after the INSERT.

Hoffmann answered 10/2/2018 at 15:36
