LMDB random writes really slow for large data (~1MB/s)
Asked Answered
G

1

7

I am testing LMDB performance with the benchmark given in http://www.lmdb.tech/bench/ondisk/. And I noticed that LMDB random writes are really slow when the data goes beyond memory.

I am using a machine with 4GB DRAM with Intel PCIE SSD. The key size is 10 bytes and value size is 1KB. The benchmark code is given in http://www.lmdb.tech/bench/ondisk/, and the command line I used is "./db_bench_mdb --benchmarks=fillrandbatch --threads=1 --stats_interval=1024 --num=10000000 --value_size=1000 --use_existing_db=0 ".

For the first 1GB of data written, the average write rate is 140MB/s. The rate then drops significantly to 40MB/s for the first 2GB. At the end of the test, in which 10M values are written, the average rate is just 3MB/s, and the instant rate is 1MB/s. I know LMDB is not optimized for writes, but I didn't expect it to be this slow, given that I have a really high-end Intel SSD.

I also notice that the way LMDB access the SSD is really strange. At the beginning of the test, it writes the SSD at around 400MB/s, but performs no read, which is expected. But as we write more and more data, LMDB starts to read the SSD. As time goes on, the read throughput rises while the write throughput drops significantly. At the end of test, LMDB is constantly reading at around 190MB/s, while occationally issuing 100MB writes at around 10-20 second intervals.

1. Is it normal for LMDB to have such low write throughput (1MB/s at the end of test) for data stored on SSD?

2. Why is LMDB reading more data than it is writing (about 20MB data read per 1MB written) at the end of the test?

To my understanding, although we have more data than the DRAM can hold, the branch nodes of the B-tree should still be in the DRAM. So for every write, the only pages that we need to fetch from SSD is the leaf nodes. And when we write the leaf node, we might also need to write its parents. So there should be more writes than reads. But it turns out LMDB is reading much more than writing. I think it might be the reason why it is so slow at the end. But I really cannot understand why.

For your reference, here is part of the log given by the benchmark:

2018/03/12-10:36:30 ... thread 0: (1024,1024) ops and (54584.2,54584.2) ops/second in (0.018760,0.018760) seconds
2018/03/12-10:36:30 ... thread 0: (1024,2048) ops and (111231.8,73231.8) ops/second in (0.009206,0.027966) seconds
2018/03/12-10:36:30 ... thread 0: (1024,3072) ops and (125382.6,85019.2) ops/second in (0.008167,0.036133) seconds
2018/03/12-10:36:30 ... thread 0: (1024,4096) ops and (206202.2,99661.8) ops/second in (0.004966,0.041099) seconds
2018/03/12-10:36:30 ... thread 0: (1024,5120) ops and (259634.9,113669.2) ops/second in (0.003944,0.045043) seconds
2018/03/12-10:36:30 ... thread 0: (1024,6144) ops and (306495.1,126984.1) ops/second in (0.003341,0.048384) seconds
2018/03/12-10:36:30 ... thread 0: (1024,7168) ops and (339185.2,139447.1) ops/second in (0.003019,0.051403) seconds
2018/03/12-10:36:30 ... thread 0: (1024,8192) ops and (384240.2,151512.9) ops/second in (0.002665,0.054068) seconds
2018/03/12-10:36:30 ... thread 0: (1024,9216) ops and (385252.1,162465.2) ops/second in (0.002658,0.056726) seconds
2018/03/12-10:36:30 ... thread 0: (1024,10240) ops and (371553.0,172152.9) ops/second in (0.002756,0.059482) seconds
...
2018/03/12-10:36:37 ... thread 0: (1024,993280) ops and (70127.4,142518.0) ops/second in (0.014602,6.969505) seconds
2018/03/12-10:36:37 ... thread 0: (1024,994304) ops and (199415.8,142559.9) ops/second in (0.005135,6.974640) seconds
2018/03/12-10:36:37 ... thread 0: (1024,995328) ops and (75953.1,142431.4) ops/second in (0.013482,6.988122) seconds
2018/03/12-10:36:37 ... thread 0: (1024,996352) ops and (200823.7,142474.0) ops/second in (0.005099,6.993221) seconds
2018/03/12-10:36:37 ... thread 0: (1024,997376) ops and (71975.8,142330.8) ops/second in (0.014227,7.007448) seconds
2018/03/12-10:36:37 ... thread 0: (1024,998400) ops and (62117.1,142142.6) ops/second in (0.016485,7.023933) seconds
2018/03/12-10:36:37 ... thread 0: (1024,999424) ops and (36366.2,141720.2) ops/second in (0.028158,7.052091) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1000448) ops and (61914.3,141533.5) ops/second in (0.016539,7.068630) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1001472) ops and (60985.1,141342.6) ops/second in (0.016791,7.085421) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1002496) ops and (60466.5,141149.8) ops/second in (0.016935,7.102356) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1003520) ops and (60189.3,140956.3) ops/second in (0.017013,7.119369) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1004544) ops and (61731.4,140772.1) ops/second in (0.016588,7.135957) seconds
...
2018/03/12-10:40:15 ... thread 0: (1024,3236864) ops and (5620.5,14373.0) ops/second in (0.182189,225.203790) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3237888) ops and (6098.5,14366.9) ops/second in (0.167911,225.371701) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3238912) ops and (5469.5,14359.5) ops/second in (0.187221,225.558922) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3239936) ops and (5593.9,14352.4) ops/second in (0.183056,225.741978) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3240960) ops and (5806.9,14345.7) ops/second in (0.176342,225.918320) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3241984) ops and (5332.9,14338.1) ops/second in (0.192016,226.110336) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3243008) ops and (5532.3,14330.9) ops/second in (0.185096,226.295432) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3244032) ops and (6108.8,14324.8) ops/second in (0.167626,226.463058) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3245056) ops and (6074.7,14318.6) ops/second in (0.168567,226.631625) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3246080) ops and (5615.2,14311.6) ops/second in (0.182362,226.813987) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3247104) ops and (5529.3,14304.5) ops/second in (0.185194,226.999181) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3248128) ops and (5846.2,14298.0) ops/second in (0.175156,227.174337) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3249152) ops and (5741.5,14291.2) ops/second in (0.178351,227.352688) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3250176) ops and (5640.2,14284.3) ops/second in (0.181555,227.534243) seconds
...
2018/03/12-11:30:39 ... thread 0: (1024,9988096) ops and (1917.2,3074.3) ops/second in (0.534112,3248.860552) seconds
2018/03/12-11:30:39 ... thread 0: (1024,9989120) ops and (1858.9,3074.1) ops/second in (0.550851,3249.411403) seconds
2018/03/12-11:30:40 ... thread 0: (1024,9990144) ops and (1922.8,3073.9) ops/second in (0.532557,3249.943960) seconds
2018/03/12-11:30:40 ... thread 0: (1024,9991168) ops and (1857.2,3073.7) ops/second in (0.551382,3250.495342) seconds
2018/03/12-11:30:41 ... thread 0: (1024,9992192) ops and (1851.3,3073.5) ops/second in (0.553130,3251.048472) seconds
2018/03/12-11:30:41 ... thread 0: (1024,9993216) ops and (1941.0,3073.3) ops/second in (0.527568,3251.576040) seconds
2018/03/12-11:30:42 ... thread 0: (1024,9994240) ops and (1923.1,3073.2) ops/second in (0.532461,3252.108501) seconds
2018/03/12-11:30:42 ... thread 0: (1024,9995264) ops and (1987.6,3073.0) ops/second in (0.515200,3252.623701) seconds
2018/03/12-11:30:43 ... thread 0: (1024,9996288) ops and (1931.2,3072.8) ops/second in (0.530233,3253.153934) seconds
2018/03/12-11:30:43 ... thread 0: (1024,9997312) ops and (1918.9,3072.6) ops/second in (0.533633,3253.687567) seconds
2018/03/12-11:30:44 ... thread 0: (1024,9998336) ops and (1999.0,3072.4) ops/second in (0.512246,3254.199813) seconds
2018/03/12-11:30:44 ... thread 0: (1024,9999360) ops and (1853.3,3072.2) ops/second in (0.552533,3254.752346) seconds
fillrandbatch :     325.508 micros/op 3072 ops/sec;    3.0 MB/s

And here is the read/write rate dumpped from iostat at 1s interval:

Device       tps      MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb              73.00         0.12        25.52          0         25
sdb             531.00         0.00       495.21          0        495
sdb           15089.00         0.00       488.77          0        488
sdb           27431.00         0.01       463.55          0        463
sdb           13093.00         0.00       478.77          0        478
sdb           53676.00         0.00       413.79          0        413
sdb           16781.00         0.00       483.60          0        483
sdb           22267.00         0.00       323.32          0        323
sdb           23945.00         0.00       164.55          0        164
sdb           22867.00         0.00       152.25          0        152
sdb           22038.00         0.00       146.39          0        146
sdb           23825.00         0.00       263.61          0        263
...
sdb           20866.00        85.81        76.90         85         76
sdb            7684.00       101.75       115.19        101        115
sdb            3707.00       154.48         0.00        154          0
sdb            4349.00       181.41         0.00        181          0
sdb            4373.00       184.70         0.00        184          0
sdb            4329.00       185.04         0.00        185          0
sdb            4338.00       182.30         0.01        182          0
sdb            4364.00       184.27         0.00        184          0
sdb            5310.00       177.32         4.99        177          4
sdb           32130.00        99.07       119.70         99        119
sdb           27010.00       103.26        99.25        103         99
sdb           11109.00        67.18        99.96         67         99
sdb            3931.00       172.51         0.00        172          0
sdb            4112.00       171.28         0.00        171          0
sdb            4202.00       183.03         0.00        183          0
sdb            4119.00       183.79         0.00        183          0
sdb            4232.00       182.77         0.02        182          0
sdb            4224.00       185.90         0.00        185          0
sdb            4304.00       186.17         0.00        186          0
sdb            4279.00       188.83         0.00        188          0
sdb            4087.00       184.38         0.00        184          0
sdb            7758.00       163.86        16.70        163         16
sdb           21309.00        68.95        80.11         68         80
sdb           21166.00        81.66        78.42         81         78
sdb           19328.00        71.56        71.55         71         71
sdb           20836.00        89.08        76.52         89         76
sdb            3211.00       112.01        82.21        112         82
sdb            3939.00       173.40         0.00        173          0
sdb            3992.00       178.03         0.00        178          0
sdb            4251.00       181.49         0.00        181          0
sdb            4148.00       185.63         0.00        185          0
sdb            4094.00       184.12         0.01        184          0
sdb            4241.00       187.38         0.00        187          0
sdb            4044.00       186.60         0.00        186          0
sdb            4049.00       185.47         0.00        185          0
sdb            4247.00       189.17         0.00        189          0
...
sdb           17457.00       105.45        64.05        105         64
sdb           16736.00        82.12        62.35         82         62
sdb           12074.00       108.76        66.21        108         66
sdb            2232.00       194.44         0.00        194          0
sdb            2171.00       187.27         0.02        187          0
sdb            2322.00       197.91         0.00        197          0
sdb            2311.00       194.65         0.00        194          0
sdb            2240.00       187.93         0.00        187          0
sdb            2189.00       191.38         0.00        191          0
sdb            2266.00       192.33         0.01        192          0
sdb            2312.00       198.95         0.00        198          0
sdb            2310.00       199.84         0.00        199          0
sdb            2350.00       198.83         0.00        198          0
sdb            2275.00       198.31         0.00        198          0
sdb            3952.00       185.05         6.79        185          6
sdb           15842.00        59.89        59.67         59         59
sdb           16676.00        88.24        61.79         88         61
sdb           14768.00        75.94        55.00         75         54
sdb            5677.00       141.71        35.03        141         35
sdb            2135.00       184.78         0.04        184          0
sdb            2301.00       197.18         0.00        197          0
sdb            2334.00       198.81         0.00        198          0
sdb            2304.00       198.83         0.00        198          0
sdb            2348.00       198.67         0.00        198          0
sdb            2352.00       198.42         0.01        198          0
sdb            2373.00       199.32         0.00        199          0
sdb            2363.00       197.55         0.00        197          0
sdb            2289.00       198.71         0.00        198          0
sdb            2246.00       189.31         0.00        189          0
sdb            2357.00       198.64         0.01        198          0
sdb            2338.00       197.96         0.00        197          0
sdb            6292.00       177.60        16.56        177         16
sdb           19374.00        93.72        72.16         93         72
sdb           16873.00       101.38        62.01        101         62
sdb           16960.00        98.99        76.84         98         76
sdb            2299.00       189.32         6.16        189          6
sdb            2285.00       195.82         0.00        195          0
sdb            2346.00       198.25         0.00        198          0
sdb            2325.00       198.91         0.00        198          0
sdb            2353.00       197.72         0.02        197          0
sdb            2320.00       198.82         0.00        198          0
sdb            2327.00       200.05         0.00        200          0
sdb            2340.00       198.35         0.00        198          0
sdb            2322.00       199.29         0.00        199          0
sdb            2316.00       197.43         0.01        197          0
sdb             690.00        51.17         0.00         51          0
Graveclothes answered 15/3, 2018 at 2:3 Comment(0)
G
8

With the help of @Howard Chu, I have identified the problem.

The excessive reads were caused by OS read-ahead. By default, when we read a page, the OS will prefetch the pages following that page into memory. However, this is usually undesirable in a DB, since we are usually doing random read. So we need to disable read-ahead to get optimal performance for random read.

For the benchmark, there is a command line option --readahead=0, which in turn sets MDB_NORDAHEAD option for LMDB.

After disabling read-ahead, the instant write rate grew from 1MB/s to 8MB/s at the end of the test. And the read/write amount observed in iostat is almost identical.

Graveclothes answered 19/3, 2018 at 2:18 Comment(2)
Great find! The read aheads can be so pernicious that lmdb becomes almost unusable. I wonder if MDB_NORDAHEAD shouldn't be a default setting for a memory-mapped database that's supposed to work well with high db size/memory ratios.Detradetract
I am having a similar issue, but with read-aheads already disabled.. Performance is terrible, it went from 3000 operations per second to 200 as the DB grewPhysical

© 2022 - 2024 — McMap. All rights reserved.