Memory Optimized Tables - Slower INSERT than SSD

I have observed that inserting data into a Memory-Optimized Table is much slower than an equivalent parallelized insert to a disk-based table on a 5-SSD stripe set.

    --DDL for Memory-Optimized Table
    CREATE TABLE [MYSCHEMA].[WIDE_MEMORY_TABLE]
        (
        [TX_ID] BIGINT NOT NULL
        , [COLUMN_01] [NVARCHAR](10) NOT NULL
        , [COLUMN_02] [NVARCHAR](10) NOT NULL
        --etc., about 100 columns
        --at least one index is required for Memory-Optimized Tables
        , INDEX IX_WIDE_MEMORY_TABLE_ENTITY_ID HASH (TX_ID) WITH (BUCKET_COUNT=10000000)
        )
        WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY)

    --DDL for Disk-Based Table
    CREATE TABLE [MYSCHEMA].[WIDE_DISK_TABLE]
        (
        [TX_ID] BIGINT NOT NULL
        , [COLUMN_01] [NVARCHAR](10) NOT NULL
        , [COLUMN_02] [NVARCHAR](10) NOT NULL
        --etc., about 100 columns
        --No indexes
        ) ON [PRIMARY]

For this particular test, I am batching 10,000,000 rows into this table in sets of 25,000. The statement looks something like this for the Memory-Optimized Table:

    --Insert to Memory-Optimized Table 
    INSERT INTO
        WIDE_MEMORY_TABLE
        (
        TX_ID
        , COLUMN_01
        , COLUMN_02
        --etc., about 100 columns
        )
    SELECT
        S.TX_ID
        , S.COLUMN_01
        , S.COLUMN_02
        --etc., about 100 columns
    FROM
        [MYSCHEMA].[SOURCE_TABLE] AS S WITH(TABLOCK)
    WHERE
        S.TX_ID >= 1
        AND S.TX_ID < 25001
    OPTION (MAXDOP 4)

This process continues until all 10,000,000 rows are loaded; each iteration just retrieves the next 25,000 rows. The SELECT performs a seek on a covering index on [MYSCHEMA].[SOURCE_TABLE]. The query plan shows a serialized insert to [MYSCHEMA].[WIDE_MEMORY_TABLE]. Each set of 25,000 rows takes around 1,400ms.
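
For reference, the driver loop is roughly the following sketch; the variable names, the `+=` stepping, and the loop shape are illustrative placeholders, not the exact harness used for the timings:

    --Illustrative sketch of the batching driver (variable names are placeholders)
    DECLARE @BatchSize BIGINT = 25000;
    DECLARE @MaxTxId   BIGINT = 10000000;
    DECLARE @StartId   BIGINT = 1;

    WHILE @StartId <= @MaxTxId
    BEGIN
        INSERT INTO WIDE_MEMORY_TABLE
            (TX_ID, COLUMN_01, COLUMN_02 /*etc., about 100 columns*/)
        SELECT
            S.TX_ID
            , S.COLUMN_01
            , S.COLUMN_02
            --etc., about 100 columns
        FROM [MYSCHEMA].[SOURCE_TABLE] AS S WITH (TABLOCK)
        WHERE S.TX_ID >= @StartId
            AND S.TX_ID < @StartId + @BatchSize
        OPTION (MAXDOP 4);

        SET @StartId += @BatchSize;
    END;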

If I do the same thing to a disk-based table hosted on a 5-SSD stripe (5,000 IOPS per disk, 200MB/sec throughput), the inserts progress much faster, averaging around 700ms. In the disk-based case, the query performs a parallel insert to [MYSCHEMA].[WIDE_DISK_TABLE]. Note the TABLOCK hint on [MYSCHEMA].[WIDE_DISK_TABLE].

    --Insert to Disk-Based Table 
    INSERT INTO
        WIDE_DISK_TABLE WITH(TABLOCK)
        (
        TX_ID
        , COLUMN_01
        , COLUMN_02
        --etc., about 100 columns
        )
    SELECT
        S.TX_ID
        , S.COLUMN_01
        , S.COLUMN_02
        --etc., about 100 columns
    FROM
        [MYSCHEMA].[SOURCE_TABLE] AS S WITH(TABLOCK)
    WHERE
        S.TX_ID >= 1
        AND S.TX_ID < 25001
    OPTION (MAXDOP 4)

Granted, the disk-based table does not have an index, and the TABLOCK hint enables parallel insert, but I expect way more from an INSERT to RAM.

Any ideas?

Thanks!

Here is a comparison of 100 batches run in three modes: disk-based with deferred index creation, disk-based with index, and Memory-Optimized with index (at least one index is required on Memory-Optimized tables).

[Chart: 100 batches, 25K rows each]
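
For reference, the "deferred index" variant loads the heap first and builds the index only at the end. A minimal sketch follows; the index name is a placeholder I chose, and the single-statement load stands in for the real 25,000-row batches:

    --Sketch of the deferred-index variant (index name is a placeholder;
    --the real test still loads in 25,000-row batches)
    INSERT INTO [MYSCHEMA].[WIDE_DISK_TABLE] WITH (TABLOCK)
        (TX_ID, COLUMN_01, COLUMN_02 /*etc., about 100 columns*/)
    SELECT
        S.TX_ID
        , S.COLUMN_01
        , S.COLUMN_02
        --etc., about 100 columns
    FROM [MYSCHEMA].[SOURCE_TABLE] AS S WITH (TABLOCK)
    OPTION (MAXDOP 4);

    --Build the index only after all rows are loaded
    CREATE NONCLUSTERED INDEX IX_WIDE_DISK_TABLE_TX_ID
        ON [MYSCHEMA].[WIDE_DISK_TABLE] (TX_ID);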

Hoffmann asked 3/2/2018 at 22:26 — Comments (13):
Why not test with the index on the disk-based table? It's easy to assume that's small, but often it isn't... double the work sometimes is double the work. — Gunk
Is your in-memory table using row-overflow storage? I wonder if that might cause a performance hit. — Ockham
This paragraph looks relevant: aboutsqlserver.com/2016/09/27/… For the specific experiment they ran: "The execution times of INSERT statements in my environment are 153 and 7,722 milliseconds respectively. With off-row storage, In-Memory OLTP needs to add data to twenty other internal tables, which is 40 times slower comparing to in-row storage." — Ockham
@MartinSmith Good thinking, but we are not using off-row storage. It's just a bunch of NVARCHAR (mostly small ones) and a handful of DECIMAL. Here is an image of the output from sys.dm_db_xtp_memory_consumers for this table: imgur.com/a/Z99zy (a query for this DMV is sketched after the comments). — Hoffmann
@MikeM I updated the post to show a new test of 100 batches, each of 25,000 rows. I already tried this dozens of different ways, which is what led to this post. As you can see, inserting to an SSD-based table and constructing an index after the load is the fastest method. Disk-based with index present spends 258% more time, and Memory-Optimized with index present (mandatory) spends 102% additional time. — Hoffmann
@PittsburghDBA -- great testing! Yeah, that was my concern... going from one index - "just the table" - to two is a big jump in relative workload :). It sounds like you found a very good plan by separating out the desired index build! — Gunk
@MikeM The SELECT performance of the Memory-Optimized tables is insane, but the INSERT performance is less than stellar. I was hoping to be able to use Memory-Optimized tables in certain situations as replacements for smaller stage tables, or as substitutes for table variables and/or temp tables. Seeing this performance on INSERT is not impressive at all. I'm still hoping there is something else going on, because RAM chips ought to be way faster than SSD. — Hoffmann
I have heard that to get the best performance out of them, they need to go hand in hand with native stored procs, which doesn't help with inserts, unfortunately. — Toughen
What's the recovery model? To make all things equal, please add the TABLOCK hint on the table you are inserting into for the memory-optimized case. You may be getting minimal logging on the SSD table by using that hint. Also, please let us know which version of SQL Server. — Endmost
@MarkWojciechowicz Memory-Optimized tables do not support TABLOCK. Also, the database is in Full recovery mode, and this cannot be changed (operational constraint). — Hoffmann
With DURABILITY=SCHEMA_ONLY I wouldn't expect any logging anyway (though it's a while since I read up on In-Memory OLTP and I may be wrong on that). — Ockham
@MartinSmith Yes, that is the behavior, and that makes it even stranger. — Hoffmann
@MartinSmith This comes down to the power of parallelism. Check out my self-posted answer and the related article by Niko Neugebauer. — Hoffmann
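
Regarding the sys.dm_db_xtp_memory_consumers check mentioned in the comments, a query along these lines shows which memory consumers (in-row data, index, and any off-row allocators) back a memory-optimized table; the column selection here is just a sketch:

    --Sketch: list the memory consumers backing the memory-optimized table
    SELECT
        OBJECT_NAME(mc.object_id)   AS table_name
        , mc.index_id
        , mc.memory_consumer_desc
        , mc.allocated_bytes
        , mc.used_bytes
    FROM sys.dm_db_xtp_memory_consumers AS mc
    WHERE mc.object_id = OBJECT_ID(N'[MYSCHEMA].[WIDE_MEMORY_TABLE]');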

UPDATE

After much testing and research, I believe this comes down to parallelism. At this time, SQL Server 2016, up to and including SP1 CU7, does not support parallel insert to Memory-Optimized Tables. This makes all INSERT statements to Memory-Optimized tables single-threaded.

Here is an insightful article from Niko Neugebauer regarding this issue: Niko Neugebauer - Parallelism in Hekaton (In-Memory OLTP)
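
One way to see this for yourself is to capture the actual plan for a single batch and check whether a Parallelism operator (e.g. Gather Streams) sits above the insert; a serial plan may also carry a NonParallelPlanReason attribute on the QueryPlan element. A generic diagnostic sketch, not part of the original test harness:

    --Capture the actual plan for one batch and inspect it for parallelism
    SET STATISTICS XML ON;

    INSERT INTO WIDE_MEMORY_TABLE
        (TX_ID, COLUMN_01, COLUMN_02 /*etc., about 100 columns*/)
    SELECT
        S.TX_ID, S.COLUMN_01, S.COLUMN_02 --etc., about 100 columns
    FROM [MYSCHEMA].[SOURCE_TABLE] AS S WITH (TABLOCK)
    WHERE S.TX_ID >= 1 AND S.TX_ID < 25001
    OPTION (MAXDOP 4);

    SET STATISTICS XML OFF;
    --In the returned showplan XML, look for Parallelism operators above the
    --insert, or for a NonParallelPlanReason attribute on the QueryPlan element.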

This makes it much less useful for ETL/ELT ingestion. However, it is pretty amazing for OLTP DML (especially through natively compiled stored procedures), and stellar for aggregating data in BI queries. For ingestion, it is nearly impossible to beat SSD-based heaps without indexes, so long as you take the right steps to ensure that your INSERT will run in parallel.
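
For completeness, the natively compiled route mentioned above looks roughly like the sketch below. The procedure name is a placeholder and the column list is abbreviated; the real table's roughly 100 NOT NULL columns would all need parameters:

    --Minimal sketch of a natively compiled insert procedure (placeholder name;
    --the real procedure would need a parameter for every NOT NULL column)
    CREATE PROCEDURE [MYSCHEMA].[USP_INSERT_WIDE_MEMORY_ROW]
        @TX_ID BIGINT
        , @COLUMN_01 NVARCHAR(10)
        , @COLUMN_02 NVARCHAR(10)
    WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
    AS
    BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        INSERT INTO [MYSCHEMA].[WIDE_MEMORY_TABLE] (TX_ID, COLUMN_01, COLUMN_02)
        VALUES (@TX_ID, @COLUMN_01, @COLUMN_02);
    END;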

Even if the database is in Full Recovery mode, the parallel INSERT to a disk-based heap outperforms the INSERT to a Memory-Optimized table. This remains true even if comparable indexes are added to the disk-based table after the INSERT.

Hoffmann answered 10/2/2018 at 15:36
