virtio-comment

 View Only
  • 1.  virtio queue numbering and optional queues

    Posted 08-21-2023 22:19
    Hello virtio folks,

    I noticed a mismatch between the way the specification defines
    device-specific virtqueue indexes and the way device and driver
    implementers have interpreted the specification. As a practical example,
    consider the traditional memory balloon device [1]. The first two queues
    (indexes 0 and 1) are available as part of the baseline device, but the
    rest of the queues are tied to feature bits.

    Section 5.5.2, "Virtqueues", gives a list that appears to be a mapping from
    queue index to queue name/function, defining queue index 3 as free_page_vq
    and index 4 as reporting_vq, and declaring that "free_page_vq only exists
    if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set" and "reporting_vq only exists if
    VIRTIO_BALLOON_F_PAGE_REPORTING is set." This wording is a bit vague, but I
    assume "is set" means "is negotiated" (not just "advertised by the
    device"). Also presumably "exists" means something like "may only be used
    by the driver if the feature bit is negotiated" and "should be ignored by
    the device if the feature bit is not negotiated", although it would be nice
    to have a proper definition in the spec somewhere.

    Section 5.5.3, "Feature bits", gives definitions of the feature bits, with
    similar descriptions of the relationship between the feature bits and
    virtqueue availability, although the wording is slightly different
    ("present" rather than "exists"). No dependency between feature bits is
    defined, so it seems like it should be valid for a device or driver to
    support or accept one of the higher-numbered features while not supporting
    a lower-numbered one.


    Notably, there is no mention of queue index assignments changing based on
    negotiated features in either of these sections. Hence a reader can only
    assume that the queue index assignments are fixed (i.e. stats_vq will
    always be vq index 4 if F_STATS_VQ is negotiated, regardless of any other
    feature bits).

    Now consider a scenario where VIRTIO_BALLOON_F_STATS_VQ and
    VIRTIO_BALLOON_F_PAGE_REPORTING are negotiated but
    VIRTIO_BALLOON_F_FREE_PAGE_HINT is not (perhaps the device supports all of
    the defined features but the driver only wants to use reporting_vq, not
    free_page_vq). In this case, what queue index should be used by the driver
    when enabling reporting_vq? My reading of the specification is that the
    reporting_vq is always queue index 4, independent of whether
    VIRTIO_BALLOON_F_STATS_VQ or VIRTIO_BALLOON_F_FREE_PAGE_HINT are
    negotiated, but this contradicts existing device and driver
    implementations, which will use queue index 3 (the next one after stats_vq
    = 2) as reporting_vq in this case.

    The qemu virtio-ballon device [2] assigns the next-highest unused queue
    index when calling virtio_add_queue(), and in the scenario presented above,
    free_page_vq will not be added since F_STATS_VQ is not negotiated, so
    reporting_vq will be assigned queue index 3, rather than 4. (Additionally,
    qemu always adds the stats_vq regardless of negotiated features, but that's
    irrelevant in this case since we are assuming the STATS_VQ feature is
    negotiated.)

    The Linux virtio driver code originally seemed to use the correct (by my
    reading) indexes, but it was changed to match the layout used by qemu in a
    2019 commit ("virtio_pci: use queue idx instead of array idx to set up the
    vq") [3] - in other words, it will now also expect queue index 3 to be
    reporting_vq in the scenario laid out above.

    I'm not sure how to resolve the mismatch between the specification and
    actual implementation behavior. The simplest change would probably be to
    rewrite the specification to drop the explicit queue indexes in section
    5.5.2 and add some wording about how queues are numbered based on
    negotiated feature bits (this would need to be applied to other device
    types that have specified queue indexes as well). However, this would also
    technically be an incompatible change of the specification. On the other
    hand, changing the device and driver implementations to match the
    specification would be even more challenging, since it would be an
    incompatible change in actual practice, not just a change of the spec to
    match consensus implementation behavior.


    Perhaps drivers could add a quirk to detect old versions of the qemu device
    and use the old behavior, while enabling the correct behavior only for
    other device vendors and newer qemu device revisions, and the qemu device
    could add an opt-in feature to enable the correct behavior that users would
    need to enable only when they know they have a sufficiently new driver with
    the fix.


    Or maybe there could be a new feature bit that would opt into following the
    spec-defined queue indexes (VIRTIO_F_VERSION_2?) and some new wording to
    require devices to use the old behavior when that bit is not negotiated,
    but that also feels less than ideal to me.

    Any thoughts on how to proceed with this situation? Is my reading of the
    specification just wrong?

    Thanks,

    -- Daniel

    [1]:
    https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-3160002

    [2]:
    https://github.com/qemu/qemu/blob/f33c74576425fac2cbb0725229895fe096df4261/hw/virtio/virtio-balloon.c#L879-L897

    [3]:
    https://github.com/torvalds/linux/commit/ddbeac07a39a81d82331a312d0578fab94fccbf1



  • 2.  Re: [virtio-comment] virtio queue numbering and optional queues

    Posted 08-22-2023 13:40
    On Mon, Aug 21, 2023 at 03:18:50PM -0700, Daniel Verkamp wrote:
    > Hello virtio folks,

    Hi Daniel,
    I have CCed those involved in the free page hint and page reporting
    features.

    Stefan

    >
    > I noticed a mismatch between the way the specification defines
    > device-specific virtqueue indexes and the way device and driver
    > implementers have interpreted the specification. As a practical example,
    > consider the traditional memory balloon device [1]. The first two queues
    > (indexes 0 and 1) are available as part of the baseline device, but the
    > rest of the queues are tied to feature bits.
    >
    > Section 5.5.2, "Virtqueues", gives a list that appears to be a mapping from
    > queue index to queue name/function, defining queue index 3 as free_page_vq
    > and index 4 as reporting_vq, and declaring that "free_page_vq only exists
    > if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set" and "reporting_vq only exists if
    > VIRTIO_BALLOON_F_PAGE_REPORTING is set." This wording is a bit vague, but I
    > assume "is set" means "is negotiated" (not just "advertised by the
    > device"). Also presumably "exists" means something like "may only be used
    > by the driver if the feature bit is negotiated" and "should be ignored by
    > the device if the feature bit is not negotiated", although it would be nice
    > to have a proper definition in the spec somewhere.
    >
    > Section 5.5.3, "Feature bits", gives definitions of the feature bits, with
    > similar descriptions of the relationship between the feature bits and
    > virtqueue availability, although the wording is slightly different
    > ("present" rather than "exists"). No dependency between feature bits is
    > defined, so it seems like it should be valid for a device or driver to
    > support or accept one of the higher-numbered features while not supporting
    > a lower-numbered one.
    >
    >
    > Notably, there is no mention of queue index assignments changing based on
    > negotiated features in either of these sections. Hence a reader can only
    > assume that the queue index assignments are fixed (i.e. stats_vq will
    > always be vq index 4 if F_STATS_VQ is negotiated, regardless of any other
    > feature bits).
    >
    > Now consider a scenario where VIRTIO_BALLOON_F_STATS_VQ and
    > VIRTIO_BALLOON_F_PAGE_REPORTING are negotiated but
    > VIRTIO_BALLOON_F_FREE_PAGE_HINT is not (perhaps the device supports all of
    > the defined features but the driver only wants to use reporting_vq, not
    > free_page_vq). In this case, what queue index should be used by the driver
    > when enabling reporting_vq? My reading of the specification is that the
    > reporting_vq is always queue index 4, independent of whether
    > VIRTIO_BALLOON_F_STATS_VQ or VIRTIO_BALLOON_F_FREE_PAGE_HINT are
    > negotiated, but this contradicts existing device and driver
    > implementations, which will use queue index 3 (the next one after stats_vq
    > = 2) as reporting_vq in this case.
    >
    > The qemu virtio-ballon device [2] assigns the next-highest unused queue
    > index when calling virtio_add_queue(), and in the scenario presented above,
    > free_page_vq will not be added since F_STATS_VQ is not negotiated, so
    > reporting_vq will be assigned queue index 3, rather than 4. (Additionally,
    > qemu always adds the stats_vq regardless of negotiated features, but that's
    > irrelevant in this case since we are assuming the STATS_VQ feature is
    > negotiated.)
    >
    > The Linux virtio driver code originally seemed to use the correct (by my
    > reading) indexes, but it was changed to match the layout used by qemu in a
    > 2019 commit ("virtio_pci: use queue idx instead of array idx to set up the
    > vq") [3] - in other words, it will now also expect queue index 3 to be
    > reporting_vq in the scenario laid out above.
    >
    > I'm not sure how to resolve the mismatch between the specification and
    > actual implementation behavior. The simplest change would probably be to
    > rewrite the specification to drop the explicit queue indexes in section
    > 5.5.2 and add some wording about how queues are numbered based on
    > negotiated feature bits (this would need to be applied to other device
    > types that have specified queue indexes as well). However, this would also
    > technically be an incompatible change of the specification. On the other
    > hand, changing the device and driver implementations to match the
    > specification would be even more challenging, since it would be an
    > incompatible change in actual practice, not just a change of the spec to
    > match consensus implementation behavior.
    >
    >
    > Perhaps drivers could add a quirk to detect old versions of the qemu device
    > and use the old behavior, while enabling the correct behavior only for
    > other device vendors and newer qemu device revisions, and the qemu device
    > could add an opt-in feature to enable the correct behavior that users would
    > need to enable only when they know they have a sufficiently new driver with
    > the fix.
    >
    >
    > Or maybe there could be a new feature bit that would opt into following the
    > spec-defined queue indexes (VIRTIO_F_VERSION_2?) and some new wording to
    > require devices to use the old behavior when that bit is not negotiated,
    > but that also feels less than ideal to me.
    >
    > Any thoughts on how to proceed with this situation? Is my reading of the
    > specification just wrong?
    >
    > Thanks,
    >
    > -- Daniel
    >
    > [1]:
    > https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-3160002
    >
    > [2]:
    > https://github.com/qemu/qemu/blob/f33c74576425fac2cbb0725229895fe096df4261/hw/virtio/virtio-balloon.c#L879-L897
    >
    > [3]:
    > https://github.com/torvalds/linux/commit/ddbeac07a39a81d82331a312d0578fab94fccbf1



  • 3.  Re: [virtio-comment] virtio queue numbering and optional queues

    Posted 08-22-2023 14:29
    On 22.08.23 15:40, Stefan Hajnoczi wrote:
    > On Mon, Aug 21, 2023 at 03:18:50PM -0700, Daniel Verkamp wrote:
    >> Hello virtio folks,
    >
    > Hi Daniel,
    > I have CCed those involved in the free page hint and page reporting
    > features.
    >
    > Stefan
    >
    >>
    >> I noticed a mismatch between the way the specification defines
    >> device-specific virtqueue indexes and the way device and driver
    >> implementers have interpreted the specification. As a practical example,
    >> consider the traditional memory balloon device [1]. The first two queues
    >> (indexes 0 and 1) are available as part of the baseline device, but the
    >> rest of the queues are tied to feature bits.
    >>
    >> Section 5.5.2, "Virtqueues", gives a list that appears to be a mapping from
    >> queue index to queue name/function, defining queue index 3 as free_page_vq
    >> and index 4 as reporting_vq, and declaring that "free_page_vq only exists
    >> if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set" and "reporting_vq only exists if
    >> VIRTIO_BALLOON_F_PAGE_REPORTING is set." This wording is a bit vague, but I
    >> assume "is set" means "is negotiated" (not just "advertised by the
    >> device").

    Staring at QEMU: the queues are added when the virtio-balloon device is
    *created*. Queues are added based on feature configuration, if they are
    part of the device feature set.

    That should translate to "is advertised", not "is negotiated".

    The queue ordering is as follows:

    * inflate queue, baseline device
    * deflate queue, baseline device
    * driver memory statistics, VIRTIO_BALLOON_F_STATS_VQ
    * free page hinting, VIRTIO_BALLOON_F_FREE_PAGE_HINT
    * free page reporting, VIRTIO_BALLOON_F_REPORTING

    QEMU always supports the first 3, so they use number 0-2. The other two
    can be configured for the device.

    So the queue indices vary based on actual feature presence.

    >> Also presumably "exists" means something like "may only be used
    >> by the driver if the feature bit is negotiated" and "should be ignored by
    >> the device if the feature bit is not negotiated", although it would be nice
    >> to have a proper definition in the spec somewhere.
    >>
    >> Section 5.5.3, "Feature bits", gives definitions of the feature bits, with
    >> similar descriptions of the relationship between the feature bits and
    >> virtqueue availability, although the wording is slightly different
    >> ("present" rather than "exists"). No dependency between feature bits is
    >> defined, so it seems like it should be valid for a device or driver to
    >> support or accept one of the higher-numbered features while not supporting
    >> a lower-numbered one.

    Yes, that's my understanding.

    >>
    >>
    >> Notably, there is no mention of queue index assignments changing based on
    >> negotiated features in either of these sections. Hence a reader can only
    >> assume that the queue index assignments are fixed (i.e. stats_vq will
    >> always be vq index 4 if F_STATS_VQ is negotiated, regardless of any other
    >> feature bits).

    And that does not seem to be the case. At least QEMU assigns them sequentially,
    based on actually configured features for the device.

    If I read the kernel code correctly (drivers/virtio/virtio_balloon.c:init_vqs)
    it also behaves that way: if the device has a certain feature
    "virtio_has_feature", it gets the next index. Otherwise, the next index goes
    to another feature:

    /*
    * Inflateq and deflateq are used unconditionally. The names[]
    * will be NULL if the related feature is not enabled, which will
    * cause no allocation for the corresponding virtqueue in find_vqs.
    */
    callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
    names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
    callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
    names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
    callbacks[VIRTIO_BALLOON_VQ_STATS] = NULL;
    names[VIRTIO_BALLOON_VQ_STATS] = NULL;
    callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
    names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
    names[VIRTIO_BALLOON_VQ_REPORTING] = NULL;


    if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
    names[VIRTIO_BALLOON_VQ_STATS] = "stats";
    callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
    }

    if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
    names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
    callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
    }

    if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) {
    names[VIRTIO_BALLOON_VQ_REPORTING] = "reporting_vq";
    callbacks[VIRTIO_BALLOON_VQ_REPORTING] = balloon_ack;
    }

    err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
    callbacks, names, NULL);
    if (err)
    return err;

    >>
    >> Now consider a scenario where VIRTIO_BALLOON_F_STATS_VQ and
    >> VIRTIO_BALLOON_F_PAGE_REPORTING are negotiated but
    >> VIRTIO_BALLOON_F_FREE_PAGE_HINT is not (perhaps the device supports all of
    >> the defined features but the driver only wants to use reporting_vq, not
    >> free_page_vq). In this case, what queue index should be used by the driver
    >> when enabling reporting_vq? My reading of the specification is that the
    >> reporting_vq is always queue index 4, independent of whether
    >> VIRTIO_BALLOON_F_STATS_VQ or VIRTIO_BALLOON_F_FREE_PAGE_HINT are
    >> negotiated, but this contradicts existing device and driver
    >> implementations, which will use queue index 3 (the next one after stats_vq
    >> = 2) as reporting_vq in this case.

    Then the specification really needs updating :)

    >>
    >> The qemu virtio-ballon device [2] assigns the next-highest unused queue
    >> index when calling virtio_add_queue(), and in the scenario presented above,
    >> free_page_vq will not be added since F_STATS_VQ is not negotiated, so
    >> reporting_vq will be assigned queue index 3, rather than 4. (Additionally,
    >> qemu always adds the stats_vq regardless of negotiated features, but that's
    >> irrelevant in this case since we are assuming the STATS_VQ feature is
    >> negotiated.)
    >>
    >> The Linux virtio driver code originally seemed to use the correct (by my
    >> reading) indexes, but it was changed to match the layout used by qemu in a
    >> 2019 commit ("virtio_pci: use queue idx instead of array idx to set up the
    >> vq") [3] - in other words, it will now also expect queue index 3 to be
    >> reporting_vq in the scenario laid out above.

    Note that at the time of this commit, there was no support for "free page reporting".

    callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
    names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
    callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
    names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
    names[VIRTIO_BALLOON_VQ_STATS] = NULL;
    names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;

    if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
    names[VIRTIO_BALLOON_VQ_STATS] = "stats";
    callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
    }

    if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
    names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
    callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
    }

    And as QEMU always sets VIRTIO_BALLOON_F_STATS_VQ, that one always gets id=2.

    Consequently, VIRTIO_BALLOON_F_FREE_PAGE_HINT, if around, gets id=3.

    As we didn't support VIRTIO_BALLOON_F_REPORTING, it doesn't matter which id it gets.

    But as you note, once we have different implementations and more feature variability,
    it's a mess.

    A device that implements VIRTIO_BALLOON_F_FREE_PAGE_HINT but not VIRTIO_BALLOON_F_STATS_VQ
    might not work correctly with either old or new QEMU.

    Maybe it needs to be documented that any device that implements either
    VIRTIO_BALLOON_F_FREE_PAGE_HINT or VIRTIO_BALLOON_F_REPORTING *must* also implement
    VIRTIO_BALLOON_F_STATS_VQ, so old+new Linux drivers would continue working.

    >>
    >> I'm not sure how to resolve the mismatch between the specification and
    >> actual implementation behavior. The simplest change would probably be to
    >> rewrite the specification to drop the explicit queue indexes in section
    >> 5.5.2 and add some wording about how queues are numbered based on
    >> negotiated feature bits (this would need to be applied to other device

    Yes.

    >> types that have specified queue indexes as well). However, this would also
    >> technically be an incompatible change of the specification. On the other
    >> hand, changing the device and driver implementations to match the
    >> specification would be even more challenging, since it would be an
    >> incompatible change in actual practice, not just a change of the spec to
    >> match consensus implementation behavior.

    Changing drivers/devices is pretty much impossible.

    So we should document the queue assignment better, and maybe the implication
    of requiring VIRTIO_BALLOON_F_STATS_VQ when any new features are implemented.

    Does that make sense?

    >>
    >>
    >> Perhaps drivers could add a quirk to detect old versions of the qemu device
    >> and use the old behavior, while enabling the correct behavior only for
    >> other device vendors and newer qemu device revisions, and the qemu device
    >> could add an opt-in feature to enable the correct behavior that users would
    >> need to enable only when they know they have a sufficiently new driver with
    >> the fix.
    >>
    >>
    >> Or maybe there could be a new feature bit that would opt into following the
    >> spec-defined queue indexes (VIRTIO_F_VERSION_2?) and some new wording to
    >> require devices to use the old behavior when that bit is not negotiated,
    >> but that also feels less than ideal to me.
    >>
    >> Any thoughts on how to proceed with this situation? Is my reading of the
    >> specification just wrong?

    I think you raised an important point. We should try documenting reality in
    the specification.

    --
    Cheers,

    David / dhildenb