This query returns in 10 seconds most of the times, but occasionally it need 40 seconds or more.
There are two executer nodes in the swarm, and there is no remarkable difference between profiles of the two nodes, following is one of them:
HDFS_SCAN_NODE (id=0):(Total: 39s818ms, non-child: 39s818ms, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.07
- AverageScannerThreadConcurrency: 1.47
- BytesRead: 563.73 MB (591111366)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 0
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 0
- CachedFileHandlesHitCount: 0 (0)
- CachedFileHandlesMissCount: 560 (560)
- CollectionItemsRead: 0 (0)
- DecompressionTime: 1s501ms
- MaterializeTupleTime(*): 11s685ms
- MaxCompressedTextFileLength: 0
- NumColumns: 9 (9)
- NumDictFilteredRowGroups: 0 (0)
- NumDisksAccessed: 1 (1)
- NumRowGroups: 56 (56)
- NumScannerThreadMemUnavailable: 0 (0)
- NumScannerThreadReservationsDenied: 0 (0)
- NumScannerThreadsStarted: 4 (4)
- NumScannersWithNoReads: 0 (0)
- NumStatsFilteredRowGroups: 0 (0)
- PeakMemoryUsage: 142.10 MB (149004861)
- PeakScannerThreadConcurrency: 2 (2)
- PerReadThreadRawHdfsThroughput: 151.39 MB/sec
- RemoteScanRanges: 1.68K (1680)
- RowBatchBytesEnqueued: 2.32 GB (2491334455)
- RowBatchQueueGetWaitTime: 39s786ms
- RowBatchQueuePeakMemoryUsage: 1.87 MB (1959936)
- RowBatchQueuePutWaitTime: 0.000ns
- RowBatchesEnqueued: 6.38K (6377)
- RowsRead: 73.99M (73994828)
- RowsReturned: 6.40M (6401849)
- RowsReturnedRate: 161.27 K/sec
- ScanRangesComplete: 56 (56)
- ScannerThreadsInvoluntaryContextSwitches: 99 (99)
- ScannerThreadsTotalWallClockTime: 1m10s
- ScannerThreadsSysTime: 630.808ms
- ScannerThreadsUserTime: 12s824ms
- ScannerThreadsVoluntaryContextSwitches: 1.25K (1248)
- TotalRawHdfsOpenFileTime(*): 9s396ms
- TotalRawHdfsReadTime(*): 3s789ms
- TotalReadThroughput: 11.70 MB/sec
Buffer pool:
- AllocTime: 1.240ms
- CumulativeAllocationBytes: 706.32 MB (740630528)
- CumulativeAllocations: 578 (578)
- PeakReservation: 140.00 MB (146800640)
- PeakUnpinnedBytes: 0
- PeakUsedReservation: 33.83 MB (35471360)
- ReadIoBytes: 0
- ReadIoOps: 0 (0)
- ReadIoWaitTime: 0.000ns
- WriteIoBytes: 0
- WriteIoOps: 0 (0)
- WriteIoWaitTime: 0.000ns
We can notice that RowBatchQueueGetWaitTime
is very high, almost 40 seconds, but I cannot figure out why, admitting that TotalRawHdfsOpenFileTime
takes 9 seconds and TotalRawHdfsReadTime
takes almost 4 seconds, I still cannot explain where are other 27 seconds spend on.
Can you suggest the possible issue and how can I solve it?