Why is virtio-scsi much slower than virtio-blk in my experiment (over a Ceph RBD image)?

Hi, I recently ran an experiment with virtio-scsi over RBD through the QEMU target (for its DISCARD/TRIM support), and compared the throughput and IOPS with those of a virtio-blk over RBD setup on the same machine, using fio in the guest. It turned out that sequential read/write throughput was about 7 times lower (42.3 MB/s vs 309 MB/s) and random read/write IOPS about 10 times lower (546 vs 5705).

What I did was set up a virtual machine using OpenStack Juno, which gave me the virtio-blk over RBD setup. Then I modified the relevant part of the libvirt domain XML, from this:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <auth username='cinder'>
    <secret type='ceph' uuid='482b83f9-be95-448e-87cc-9fa602196590'/>
  </auth>
  <source protocol='rbd' name='vms/c504ea8b-18e6-491e-9470-41c60aa50b81_disk'>
    <host name='192.168.20.105' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>

to this:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
  <auth username='cinder'>
    <secret type='ceph' uuid='482b83f9-be95-448e-87cc-9fa602196590'/>
  </auth>
  <source protocol='rbd' name='vms/c504ea8b-18e6-491e-9470-41c60aa50b81_disk'>
    <host name='192.168.20.105' port='6789'/>
  </source>
  <target dev='vda' bus='scsi'/>
  <controller type='scsi' model='virtio-scsi' index='0'/>
</disk>
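
For clarity, the <controller> element actually sits under <devices> as a sibling of the disk rather than inside it; spelled out, the relevant part of the domain XML looks roughly like this (the drive <address> is roughly what libvirt auto-assigns, and the sdX target name follows the usual convention for bus='scsi'):

<controller type='scsi' model='virtio-scsi' index='0'/>
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
  <auth username='cinder'>
    <secret type='ceph' uuid='482b83f9-be95-448e-87cc-9fa602196590'/>
  </auth>
  <source protocol='rbd' name='vms/c504ea8b-18e6-491e-9470-41c60aa50b81_disk'>
    <host name='192.168.20.105' port='6789'/>
  </source>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>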

The software versions are:

qemu 2.5.1

libvirt 1.2.2

kernel 3.18.0-031800-generic #201412071935 SMP Mon Dec 8 00:36:34 UTC 2014 x86_64 (an Ubuntu 14.04 kernel)

And the hypervisor is KVM.

I don't think the performance difference between virtio-scsi and virtio-blk should be that large, so please point out what I did wrong and how to achieve reasonable performance.

One constraint is that I want a solution that works with OpenStack (ideally with Juno) without much patching or custom coding. For example, I have heard of virtio-scsi + vhost-scsi + scsi-mq, but that does not seem to be available in OpenStack right now.

Plucky asked 19/8, 2016 at 5:05

The simple answer is that VirtIO-SCSI is slightly more complex than VirtIO-Block. Borrowing the description from here:

VirtIO Block has the following layers:

guest: app -> Block Layer -> virtio-blk
host: QEMU -> Block Layer -> Block Device Driver -> Hardware

Whereas VirtIO SCSI looks like this:

guest: app -> Block Layer -> SCSI Layer -> scsi_mod
host: QEMU -> Block Layer -> SCSI Layer -> Block Device Driver -> Hardware

In essence, VirtIO SCSI has to go through another translation layer compared to VirtIO Block.

For most cases using local devices, it will as a result be slower. There are a couple of specific cases where the reverse is sometimes true, though (both are sketched in libvirt XML right after this list), namely:

  • Direct passthrough of host SCSI LUNs to the VirtIO SCSI adapter. This is marginally faster because it bypasses the block layer on the host side.
  • QEMU native access to iSCSI devices. This is sometimes faster because it avoids the host block and SCSI layers entirely, and doesn't have to translate from VirtIO Block commands to SCSI commands.
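
Rough libvirt XML sketches of those two cases (the device path, IQN and portal address below are placeholders, not taken from the question):

<!-- Direct passthrough of a host SCSI LUN to the guest's VirtIO SCSI adapter -->
<disk type='block' device='lun'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/disk/by-id/scsi-EXAMPLE'/>
  <target dev='sda' bus='scsi'/>
</disk>

<!-- QEMU native iSCSI access, bypassing the host block and SCSI layers entirely -->
<disk type='network' device='lun'>
  <driver name='qemu' type='raw'/>
  <source protocol='iscsi' name='iqn.2016-01.com.example:target/1'>
    <host name='192.0.2.10' port='3260'/>
  </source>
  <target dev='sdb' bus='scsi'/>
</disk>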

For the record though, there are three non-performance-related benefits to using VirtIO SCSI over VirtIO Block:

  1. It supports far more devices. VirtIO Block exposes one PCI device per block device, which limits things to around 21-24 devices, whereas VirtIO SCSI uses only one PCI device, and can handle an absolutely astronomical number of LUNs on that device (see the sketch after this list).
  2. VirtIO SCSI supports the SCSI UNMAP command (TRIM in ATA terms, DISCARD in Linux kernel terms). This is important if you're on thinly provisioned storage.
  3. VirtIO SCSI exposes devices as regular SCSI nodes, whereas VirtIO Block uses a special device major. This isn't usually very important, but can be helpful when converting from a physical system.
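
As a rough illustration of that first point (the source paths here are placeholders, not a real configuration), several disks hang off the single virtio-scsi controller simply by varying the unit in the drive address:

<controller type='scsi' model='virtio-scsi' index='0'/>
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/mapper/example-vol1'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/mapper/example-vol2'/>
  <target dev='sdb' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
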
Crucifixion answered 3/8, 2017 at 14:50 (2 comments):
In the link, the worst-case performance looks like about 3/4 of virtio-blk, and the results also suggest it might be improvable by increasing the number of iothreads in qemu. The question shows more like 1/10th of the performance. So your answer seems to leave open the question of why there could be such a ten-fold difference in some cases, but practically no difference in others. – Lalo
And the OP was using a very early version of virtio-scsi. Current performance is much better. – Purree

You enabled discard='unmap' in your modified libvirt XML:

<driver name='qemu' type='raw' cache='writeback' discard='unmap' />

This scrubs the blocks on the fly.

Dolliedolloff answered 23/10, 2016 at 0:13 (2 comments):
And that answers the question because...? – Extranuclear
No, it doesn't. It certainly doesn't do any unmap until you invoke fstrim in the guest or, if you mount with the discard option, rm. – Tomasz
