OASIS Virtual I/O Device (VIRTIO) TC

Expand all | Collapse all

[PATCH] message framing: make ANY_LAYOUT implicit

  • 1.  [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-12-2013 16:43
    This resolves issue VIRTIO-10

    This also creates a new section for legacy feature
    bits which will be handy for VIRTIO-13 if that is
    accepted.

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    ---
    virtio-v1.0-wd01-part1-specification.txt | 38 ++++++++++++++++++++++----------
    1 file changed, 26 insertions(+), 12 deletions(-)

    diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    index 2679b5b..bb75e35 100644
    --- a/virtio-v1.0-wd01-part1-specification.txt
    +++ b/virtio-v1.0-wd01-part1-specification.txt
    @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian.

    2.1.4.2. Message Framing
    -----------------------
    -The original intent of the specification was that message framing (the
    -particular layout of descriptors) be independent of the contents of
    +Generally, the intent of the specification is for message framing (the
    +particular layout of descriptors) to be independent of the contents of
    the buffers. For example, a network transmit buffer consists of a 12
    byte header followed by the network packet. This could be most simply
    placed in the descriptor table as a 12 byte output descriptor followed
    @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and
    packet are adjacent, or even three or more descriptors (possibly with
    loss of efficiency in that case).

    -Regrettably, initial driver implementations used simple layouts, and
    -devices came to rely on it, despite this specification wording[10]. It
    -is thus recommended that drivers be conservative in their assumptions,
    -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
    +In addition, some
    implementations may have large-but-reasonable restrictions on total
    descriptor size (such as based on IOV_MAX in the host OS). This has
    not been a problem in practice: little sympathy will be given to
    drivers which create unreasonably-sized descriptors such as by
    dividing a network packet into 1500 single-byte descriptors!

    +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing
    +-----------------------
    +Regrettably, initial driver implementations used simple layouts, and
    +devices came to rely on it, despite this specification wording[10]. It
    +is thus recommended that when using legacy interfaces,
    +drivers should be conservative in their assumptions,
    +unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
    +
    2.1.4.3. The Virtqueue Descriptor Table
    --------------------------------------

    @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined:
    using a timer if the device interrupts it when all the packets
    are transmitted.

    - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
    -
    VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
    that the driver can use descriptors with the VRING_DESC_F_INDIRECT
    flag set, as described in "2.1.4.3.1. Indirect Descriptors".
    @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined:
    compliant with this specification, and acknowledged by all device
    drivers.

    -In addition, bit 30 is used by qemu's implementation to check for experimental
    -early versions of virtio which did not perform correct feature negotiation,
    -and should not be used.
    +2.5.1 Legacy Interface: A Note on Reserved Feature Bits
    +-------------------------
    +
    +When used through the legacy interface, transitional
    +devices should advertise and recognize the following
    +feature bits:
    +
    +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    +descriptor layouts, as described in Section
    +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing".
    +
    +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices
    +to check for experimental early versions of virtio
    +drivers which did not perform correct feature negotiation,
    +and should not be supported or activated by drivers.

    2.6. virtio_ring.h
    =================
    --
    MST



  • 2.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-16-2013 07:43
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > This resolves issue VIRTIO-10
    >
    > This also creates a new section for legacy feature
    > bits which will be handy for VIRTIO-13 if that is
    > accepted.
    >
    > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

    This is very similar to the patch I came up with: I'm too slow!

    Pasted below, in particular the 2.4.2.5.1 header needs to move up a few
    lines, too.

    Cheers,
    Rusty.

    > ---
    > virtio-v1.0-wd01-part1-specification.txt | 38 ++++++++++++++++++++++----------
    > 1 file changed, 26 insertions(+), 12 deletions(-)
    >
    > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    > index 2679b5b..bb75e35 100644
    > --- a/virtio-v1.0-wd01-part1-specification.txt
    > +++ b/virtio-v1.0-wd01-part1-specification.txt
    > @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian.
    >
    > 2.1.4.2. Message Framing
    > -----------------------
    > -The original intent of the specification was that message framing (the
    > -particular layout of descriptors) be independent of the contents of
    > +Generally, the intent of the specification is for message framing (the
    > +particular layout of descriptors) to be independent of the contents of
    > the buffers. For example, a network transmit buffer consists of a 12
    > byte header followed by the network packet. This could be most simply
    > placed in the descriptor table as a 12 byte output descriptor followed
    > @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and
    > packet are adjacent, or even three or more descriptors (possibly with
    > loss of efficiency in that case).
    >
    > -Regrettably, initial driver implementations used simple layouts, and
    > -devices came to rely on it, despite this specification wording[10]. It
    > -is thus recommended that drivers be conservative in their assumptions,
    > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
    > +In addition, some
    > implementations may have large-but-reasonable restrictions on total
    > descriptor size (such as based on IOV_MAX in the host OS). This has
    > not been a problem in practice: little sympathy will be given to
    > drivers which create unreasonably-sized descriptors such as by
    > dividing a network packet into 1500 single-byte descriptors!
    >
    > +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing
    > +-----------------------
    > +Regrettably, initial driver implementations used simple layouts, and
    > +devices came to rely on it, despite this specification wording[10]. It
    > +is thus recommended that when using legacy interfaces,
    > +drivers should be conservative in their assumptions,
    > +unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
    > +
    > 2.1.4.3. The Virtqueue Descriptor Table
    > --------------------------------------
    >
    > @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined:
    > using a timer if the device interrupts it when all the packets
    > are transmitted.
    >
    > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    > - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
    > -
    > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
    > that the driver can use descriptors with the VRING_DESC_F_INDIRECT
    > flag set, as described in "2.1.4.3.1. Indirect Descriptors".
    > @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined:
    > compliant with this specification, and acknowledged by all device
    > drivers.
    >
    > -In addition, bit 30 is used by qemu's implementation to check for experimental
    > -early versions of virtio which did not perform correct feature negotiation,
    > -and should not be used.
    > +2.5.1 Legacy Interface: A Note on Reserved Feature Bits
    > +-------------------------
    > +
    > +When used through the legacy interface, transitional
    > +devices should advertise and recognize the following
    > +feature bits:
    > +
    > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    > +descriptor layouts, as described in Section
    > +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing".
    > +
    > +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices
    > +to check for experimental early versions of virtio
    > +drivers which did not perform correct feature negotiation,
    > +and should not be supported or activated by drivers.
    >
    > 2.6. virtio_ring.h
    > =================
    > --
    > MST

    commit 75f47c3fcc52192b7b6a7b6a77692cdb971dcb7e
    Author: Rusty Russell <rusty@au1.ibm.com>
    Date: Mon Sep 9 17:40:32 2013 +0930

    Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT)

    As per issue virtio-10.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>

    diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    index 1e424d0..ceac80a 100644
    --- a/virtio-v1.0-wd01-part1-specification.txt
    +++ b/virtio-v1.0-wd01-part1-specification.txt
    @@ -268,25 +268,33 @@ is not a significant issue.

    2.1.4.2. Message Framing
    -----------------------
    -The original intent of the specification was that message framing (the
    -particular layout of descriptors) be independent of the contents of
    -the buffers. For example, a network transmit buffer consists of a 12
    -byte header followed by the network packet. This could be most simply
    -placed in the descriptor table as a 12 byte output descriptor followed
    -by a 1514 byte output descriptor, but it could also consist of a
    -single 1526 byte output descriptor in the case where the header and
    -packet are adjacent, or even three or more descriptors (possibly with
    -loss of efficiency in that case).
    +The message framing (the particular layout of descriptors) is
    +independent of the contents of the buffers. For example, a network
    +transmit buffer consists of a 12 byte header followed by the network
    +packet. This could be most simply placed in the descriptor table as a
    +12 byte output descriptor followed by a 1514 byte output descriptor,
    +but it could also consist of a single 1526 byte output descriptor in
    +the case where the header and packet are adjacent, or even three or
    +more descriptors (possibly with loss of efficiency in that case).
    +
    +Note that, some implementations may have large-but-reasonable
    +restrictions on total descriptor size (such as based on IOV_MAX in the
    +host OS). This has not been a problem in practice: little sympathy
    +will be given to drivers which create unreasonably-sized descriptors
    +such as by dividing a network packet into 1500 single-byte
    +descriptors!
    +
    +2.1.4.2.1. Legacy Interface: Message Framing
    +-----------------------

    Regrettably, initial driver implementations used simple layouts, and
    -devices came to rely on it, despite this specification wording[10]. It
    -is thus recommended that drivers be conservative in their assumptions,
    -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
    -implementations may have large-but-reasonable restrictions on total
    -descriptor size (such as based on IOV_MAX in the host OS). This has
    -not been a problem in practice: little sympathy will be given to
    -drivers which create unreasonably-sized descriptors such as by
    -dividing a network packet into 1500 single-byte descriptors!
    +devices came to rely on it, despite this specification wording. In
    +addition, the specification for virtio_blk SCSI commands required
    +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy
    +Interface: Device Operation")
    +
    +It is thus recommended that drivers be conservative in their
    +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted.

    2.1.4.3. The Virtqueue Descriptor Table
    --------------------------------------
    @@ -1193,7 +1201,7 @@ features.
    VIRTIO_NET_F_MAC (5) Device has given MAC address.

    VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with
    - any GSO type.[12]
    + any GSO type.[11]

    VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.

    @@ -1267,16 +1275,16 @@ features.
    packets by negotating the VIRTIO_NET_F_CSUM feature. This “
    checksum offload” is a common feature on modern network cards.

    -7. If that feature is negotiated[13], a driver can use TCP or UDP
    +7. If that feature is negotiated[12], a driver can use TCP or UDP
    segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
    TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
    (UDP fragmentation) features. It should not send TCP packets
    requiring segmentation offload which have the Explicit Congestion
    Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
    - negotiated.[14]
    + negotiated.[13]

    8. The converse features are also available: a driver can save
    - the virtual device some work by negotiating these features.[15]
    + the virtual device some work by negotiating these features.[14]
    The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
    checksummed packets can be received, and if it can do that then
    the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
    @@ -1328,7 +1336,7 @@ the different features the driver negotiated.
    and

    • csum_offset indicates how many bytes after the csum_start the
    - new (16 bit ones' complement) checksum should be placed.[16]
    + new (16 bit ones' complement) checksum should be placed.[15]

    2. If the driver negotiated
    VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
    @@ -1341,21 +1349,21 @@ the different features the driver negotiated.

    • hdr_len is a hint to the device as to how much of the header
    needs to be kept to copy into each packet, usually set to the
    - length of the headers, including the transport header.[17]
    + length of the headers, including the transport header.[16]

    • gso_size is the maximum size of each packet beyond that
    header (ie. MSS).

    • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
    the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as
    - well, indicating that the TCP packet has the ECN bit set.[18]
    + well, indicating that the TCP packet has the ECN bit set.[17]

    3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
    the num_buffers field is set to zero.

    4. The header and packet are added as one output buffer to the
    transmitq, and the device is notified of the new entry (see "2.4.1.4.
    - Notifying The Device").[19]
    + Notifying The Device").[18]

    2.4.1.5.1.1. Packet Transmission Interrupt
    -----------------------------------------
    @@ -1380,7 +1388,7 @@ VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to
    accept packets of up to 65550 bytes long (the maximum size of a
    TCP or UDP packet, plus the 14 byte ethernet header), otherwise
    1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
    -buffer in the receive queue needs to be at least this length [20]
    +buffer in the receive queue needs to be at least this length [19]

    If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at
    least the size of the struct virtio_net_hdr.
    @@ -1480,7 +1488,7 @@ off. The command-specific-data is one byte containing 0 (off) or
    #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0

    The device can filter incoming packets by any number of destination
    -MAC addresses.[21] This table is set using the class
    +MAC addresses.[20] This table is set using the class
    VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The
    command-specific-data is two variable length tables of 6-byte MAC
    addresses. The first table contains unicast addresses, and the second
    @@ -1652,7 +1660,7 @@ the device (not necessarily in order). Each request is of form:

    The type of the request is either a read (VIRTIO_BLK_T_IN), a write
    (VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or
    -VIRTIO_BLK_T_FLUSH_OUT[22]). If the device has VIRTIO_BLK_F_BARRIER
    +VIRTIO_BLK_T_FLUSH_OUT[21]). If the device has VIRTIO_BLK_F_BARRIER
    feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
    request acts as a barrier and that all preceeding requests must be
    complete before this one, and all following requests must not be
    @@ -1683,12 +1691,12 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
    #define VIRTIO_BLK_S_IOERR 1
    #define VIRTIO_BLK_S_UNSUPP 2

    +2.4.2.5.1 Legacy Interface: Device Operation
    +------------------------
    Historically, devices assumed that the fields type, ioprio and sector
    reside in a single, separate read-only buffer and the status field is
    a separate read-only buffer of size 1 byte, by itself.

    -2.4.2.5.1 Legacy Interface: Device Operation
    -------------------------
    If the device has VIRTIO_BLK_F_SCSI feature, it can also support
    scsi packet command requests, each of these requests is of form:

    @@ -1826,7 +1834,7 @@ data and outgoing characters are placed in the transmit queue.
    ------------------------

    1. For output, a buffer containing the characters is placed in
    - the port's transmitq.[23]
    + the port's transmitq.[22]

    2. When a buffer is used in the receiveq (signalled by an
    interrupt), the contents is the input to the port associated
    @@ -1970,7 +1978,7 @@ configuration change interrupt.
    2. To supply memory to the balloon (aka. inflate):

    (a) The driver constructs an array of addresses of unused memory
    - pages. These addresses are divided by 4096[24] and the descriptor
    + pages. These addresses are divided by 4096[23] and the descriptor
    describing the resulting 32-bit array is added to the inflateq.

    3. To remove memory from the balloon (aka. deflate):
    @@ -1985,11 +1993,11 @@ configuration change interrupt.

    (c) Otherwise, the guest may begin to re-use pages previously
    given to the balloon before the device has acknowledged their
    - withdrawl. [25]
    + withdrawl. [24]

    4. In either case, once the device has completed the inflation or
    deflation, the “actual” field of the configuration should be
    - updated to reflect the new number of pages in the balloon.[26]
    + updated to reflect the new number of pages in the balloon.[25]

    2.4.5.6.1. Memory Statistics
    ---------------------------
    @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined:
    2.5. Reserved Feature Bits
    =========================

    -Currently there are four device-independent feature bits defined:
    -
    - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
    +Currently there are three device-independent feature bits defined:

    VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
    that the driver can use descriptors with the VRING_DESC_F_INDIRECT
    @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
    using a timer if the device interrupts it when all the packets
    are transmitted.

    +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
    + accepts arbitrary descriptor layouts, as described in Section
    + "2.1.4.2.1. Legacy Interface: Message Framing".
    +
    2.6. virtio_ring.h
    =================

    @@ -2851,7 +2860,7 @@ altogether.

    Any change to configuration space, or new virtqueues, or
    behavioural changes, should be indicated by negotiation of a new
    -feature bit. This establishes clarity[11] and avoids future expansion problems.
    +feature bit. This establishes clarity[10] and avoids future expansion problems.

    Clusters of functionality which are always implemented together
    can use a single bit, but if one feature makes sense without the
    @@ -2895,34 +2904,28 @@ of this expected condition is necessary.

    [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization

    -[10] It was previously asserted that framing should be independent of message
    -contents, yet invariably drivers layed out messages in reliable ways and
    -devices assumed it.
    -In addition, the specifications for virtio_blk and virtio_scsi require
    -intuiting field lengths from frame boundaries.
    -
    -[11] Even if it does mean documenting design or implementation
    +[10] Even if it does mean documenting design or implementation
    mistakes!


    -[12] It was supposed to indicate segmentation offload support, but
    +[11] It was supposed to indicate segmentation offload support, but
    upon further investigation it became clear that multiple bits
    were required.

    -[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
    +[12] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
    dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload
    features must offer the checksum feature, and a driver which
    accepts the offload features must accept the checksum feature.
    Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
    depending on VIRTIO_NET_F_GUEST_CSUM.

    -[14] This is a common restriction in real, older network cards.
    +[13] This is a common restriction in real, older network cards.

    -[15] For example, a network packet transported between two guests on
    +[14] For example, a network packet transported between two guests on
    the same system may not require checksumming at all, nor segmentation,
    if both guests are amenable.

    -[16] For example, consider a partially checksummed TCP (IPv4) packet.
    +[15] For example, consider a partially checksummed TCP (IPv4) packet.
    It will have a 14 byte ethernet header and 20 byte IP header
    followed by the TCP header (with the TCP checksum field 16 bytes
    into that header). csum_start will be 14+20 = 34 (the TCP
    @@ -2932,26 +2935,26 @@ of the TCP pseudo header, so that replacing it by the ones'
    complement checksum of the TCP header and body will give the
    correct result.

    -[17] Due to various bugs in implementations, this field is not useful
    +[16] Due to various bugs in implementations, this field is not useful
    as a guarantee of the transport header size.

    -[18] This case is not handled by some older hardware, so is called out
    +[17] This case is not handled by some older hardware, so is called out
    specifically in the protocol.

    -[19] Note that the header will be two bytes longer for the
    +[18] Note that the header will be two bytes longer for the
    VIRTIO_NET_F_MRG_RXBUF case.

    -[20] Obviously each one can be split across multiple descriptor
    +[19] Obviously each one can be split across multiple descriptor
    elements.

    -[21] Since there are no guarentees, it can use a hash filter or
    +[20] Since there are no guarentees, it can use a hash filter or
    silently switch to allmulti or promiscuous mode if it is given too
    many addresses.

    -[22] The FLUSH and FLUSH_OUT types are equivalent, the device does not
    +[21] The FLUSH and FLUSH_OUT types are equivalent, the device does not
    distinguish between them

    -[23] Because this is high importance and low bandwidth, the current
    +[22] Because this is high importance and low bandwidth, the current
    Linux implementation polls for the buffer to be used, rather than
    waiting for an interrupt, simplifying the implementation
    significantly. However, for generic serial ports with the
    @@ -2959,9 +2962,9 @@ O_NONBLOCK flag set, the polling limitation is relaxed and the
    consumed buffers are freed upon the next write or poll call or
    when a port is closed or hot-unplugged.

    -[24] This is historical, and independent of the guest page size
    +[23] This is historical, and independent of the guest page size

    -[25] In this case, deflation advice is merely a courtesy
    +[24] In this case, deflation advice is merely a courtesy

    -[26] As updates to configuration space are not atomic, this field
    +[25] As updates to configuration space are not atomic, this field
    isn't particularly reliable, but can be used to diagnose buggy guests.




  • 3.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-16-2013 07:50
    "Michael S. Tsirkin" <mst@redhat.com> writes: > This resolves issue VIRTIO-10 > > This also creates a new section for legacy feature > bits which will be handy for VIRTIO-13 if that is > accepted. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> This is very similar to the patch I came up with: I'm too slow! Pasted below, in particular the 2.4.2.5.1 header needs to move up a few lines, too. Cheers, Rusty. > --- > virtio-v1.0-wd01-part1-specification.txt 38 ++++++++++++++++++++++---------- > 1 file changed, 26 insertions(+), 12 deletions(-) > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt > index 2679b5b..bb75e35 100644 > --- a/virtio-v1.0-wd01-part1-specification.txt > +++ b/virtio-v1.0-wd01-part1-specification.txt > @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian. > > 2.1.4.2. Message Framing > ----------------------- > -The original intent of the specification was that message framing (the > -particular layout of descriptors) be independent of the contents of > +Generally, the intent of the specification is for message framing (the > +particular layout of descriptors) to be independent of the contents of > the buffers. For example, a network transmit buffer consists of a 12 > byte header followed by the network packet. This could be most simply > placed in the descriptor table as a 12 byte output descriptor followed > @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and > packet are adjacent, or even three or more descriptors (possibly with > loss of efficiency in that case). > > -Regrettably, initial driver implementations used simple layouts, and > -devices came to rely on it, despite this specification wording[10]. It > -is thus recommended that drivers be conservative in their assumptions, > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some > +In addition, some > implementations may have large-but-reasonable restrictions on total > descriptor size (such as based on IOV_MAX in the host OS). This has > not been a problem in practice: little sympathy will be given to > drivers which create unreasonably-sized descriptors such as by > dividing a network packet into 1500 single-byte descriptors! > > +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing > +----------------------- > +Regrettably, initial driver implementations used simple layouts, and > +devices came to rely on it, despite this specification wording[10]. It > +is thus recommended that when using legacy interfaces, > +drivers should be conservative in their assumptions, > +unless the VIRTIO_F_ANY_LAYOUT feature is accepted. > + > 2.1.4.3. The Virtqueue Descriptor Table > -------------------------------------- > > @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined: > using a timer if the device interrupts it when all the packets > are transmitted. > > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary > - descriptor layouts, as described in Section "2.1.4.2. Message Framing". > - > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates > that the driver can use descriptors with the VRING_DESC_F_INDIRECT > flag set, as described in "2.1.4.3.1. Indirect Descriptors". > @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined: > compliant with this specification, and acknowledged by all device > drivers. > > -In addition, bit 30 is used by qemu's implementation to check for experimental > -early versions of virtio which did not perform correct feature negotiation, > -and should not be used. > +2.5.1 Legacy Interface: A Note on Reserved Feature Bits > +------------------------- > + > +When used through the legacy interface, transitional > +devices should advertise and recognize the following > +feature bits: > + > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary > +descriptor layouts, as described in Section > +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing". > + > +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices > +to check for experimental early versions of virtio > +drivers which did not perform correct feature negotiation, > +and should not be supported or activated by drivers. > > 2.6. virtio_ring.h > ================= > -- > MST commit 75f47c3fcc52192b7b6a7b6a77692cdb971dcb7e Author: Rusty Russell <rusty@au1.ibm.com> Date: Mon Sep 9 17:40:32 2013 +0930 Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT) As per issue virtio-10. Signed-off-by: Rusty Russell <rusty@au1.ibm.com> diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt index 1e424d0..ceac80a 100644 --- a/virtio-v1.0-wd01-part1-specification.txt +++ b/virtio-v1.0-wd01-part1-specification.txt @@ -268,25 +268,33 @@ is not a significant issue. 2.1.4.2. Message Framing ----------------------- -The original intent of the specification was that message framing (the -particular layout of descriptors) be independent of the contents of -the buffers. For example, a network transmit buffer consists of a 12 -byte header followed by the network packet. This could be most simply -placed in the descriptor table as a 12 byte output descriptor followed -by a 1514 byte output descriptor, but it could also consist of a -single 1526 byte output descriptor in the case where the header and -packet are adjacent, or even three or more descriptors (possibly with -loss of efficiency in that case). +The message framing (the particular layout of descriptors) is +independent of the contents of the buffers. For example, a network +transmit buffer consists of a 12 byte header followed by the network +packet. This could be most simply placed in the descriptor table as a +12 byte output descriptor followed by a 1514 byte output descriptor, +but it could also consist of a single 1526 byte output descriptor in +the case where the header and packet are adjacent, or even three or +more descriptors (possibly with loss of efficiency in that case). + +Note that, some implementations may have large-but-reasonable +restrictions on total descriptor size (such as based on IOV_MAX in the +host OS). This has not been a problem in practice: little sympathy +will be given to drivers which create unreasonably-sized descriptors +such as by dividing a network packet into 1500 single-byte +descriptors! + +2.1.4.2.1. Legacy Interface: Message Framing +----------------------- Regrettably, initial driver implementations used simple layouts, and -devices came to rely on it, despite this specification wording[10]. It -is thus recommended that drivers be conservative in their assumptions, -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some -implementations may have large-but-reasonable restrictions on total -descriptor size (such as based on IOV_MAX in the host OS). This has -not been a problem in practice: little sympathy will be given to -drivers which create unreasonably-sized descriptors such as by -dividing a network packet into 1500 single-byte descriptors! +devices came to rely on it, despite this specification wording. In +addition, the specification for virtio_blk SCSI commands required +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy +Interface: Device Operation") + +It is thus recommended that drivers be conservative in their +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted. 2.1.4.3. The Virtqueue Descriptor Table -------------------------------------- @@ -1193,7 +1201,7 @@ features. VIRTIO_NET_F_MAC (5) Device has given MAC address. VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with - any GSO type.[12] + any GSO type.[11] VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4. @@ -1267,16 +1275,16 @@ features. packets by negotating the VIRTIO_NET_F_CSUM feature. This “ checksum offload” is a common feature on modern network cards. -7. If that feature is negotiated[13], a driver can use TCP or UDP +7. If that feature is negotiated[12], a driver can use TCP or UDP segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4 TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO (UDP fragmentation) features. It should not send TCP packets requiring segmentation offload which have the Explicit Congestion Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is - negotiated.[14] + negotiated.[13] 8. The converse features are also available: a driver can save - the virtual device some work by negotiating these features.[15] + the virtual device some work by negotiating these features.[14] The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially checksummed packets can be received, and if it can do that then the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, @@ -1328,7 +1336,7 @@ the different features the driver negotiated. and • csum_offset indicates how many bytes after the csum_start the - new (16 bit ones' complement) checksum should be placed.[16] + new (16 bit ones' complement) checksum should be placed.[15] 2. If the driver negotiated VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires @@ -1341,21 +1349,21 @@ the different features the driver negotiated. • hdr_len is a hint to the device as to how much of the header needs to be kept to copy into each packet, usually set to the - length of the headers, including the transport header.[17] + length of the headers, including the transport header.[16] • gso_size is the maximum size of each packet beyond that header (ie. MSS). • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as - well, indicating that the TCP packet has the ECN bit set.[18] + well, indicating that the TCP packet has the ECN bit set.[17] 3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature, the num_buffers field is set to zero. 4. The header and packet are added as one output buffer to the transmitq, and the device is notified of the new entry (see "2.4.1.4. - Notifying The Device").[19] + Notifying The Device").[18] 2.4.1.5.1.1. Packet Transmission Interrupt ----------------------------------------- @@ -1380,7 +1388,7 @@ VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to accept packets of up to 65550 bytes long (the maximum size of a TCP or UDP packet, plus the 14 byte ethernet header), otherwise 1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every -buffer in the receive queue needs to be at least this length [20] +buffer in the receive queue needs to be at least this length [19] If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at least the size of the struct virtio_net_hdr. @@ -1480,7 +1488,7 @@ off. The command-specific-data is one byte containing 0 (off) or #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0 The device can filter incoming packets by any number of destination -MAC addresses.[21] This table is set using the class +MAC addresses.[20] This table is set using the class VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The command-specific-data is two variable length tables of 6-byte MAC addresses. The first table contains unicast addresses, and the second @@ -1652,7 +1660,7 @@ the device (not necessarily in order). Each request is of form: The type of the request is either a read (VIRTIO_BLK_T_IN), a write (VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or -VIRTIO_BLK_T_FLUSH_OUT[22]). If the device has VIRTIO_BLK_F_BARRIER +VIRTIO_BLK_T_FLUSH_OUT[21]). If the device has VIRTIO_BLK_F_BARRIER feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this request acts as a barrier and that all preceeding requests must be complete before this one, and all following requests must not be @@ -1683,12 +1691,12 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host: #define VIRTIO_BLK_S_IOERR 1 #define VIRTIO_BLK_S_UNSUPP 2 +2.4.2.5.1 Legacy Interface: Device Operation +------------------------ Historically, devices assumed that the fields type, ioprio and sector reside in a single, separate read-only buffer and the status field is a separate read-only buffer of size 1 byte, by itself. -2.4.2.5.1 Legacy Interface: Device Operation ------------------------- If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet command requests, each of these requests is of form: @@ -1826,7 +1834,7 @@ data and outgoing characters are placed in the transmit queue. ------------------------ 1. For output, a buffer containing the characters is placed in - the port's transmitq.[23] + the port's transmitq.[22] 2. When a buffer is used in the receiveq (signalled by an interrupt), the contents is the input to the port associated @@ -1970,7 +1978,7 @@ configuration change interrupt. 2. To supply memory to the balloon (aka. inflate): (a) The driver constructs an array of addresses of unused memory - pages. These addresses are divided by 4096[24] and the descriptor + pages. These addresses are divided by 4096[23] and the descriptor describing the resulting 32-bit array is added to the inflateq. 3. To remove memory from the balloon (aka. deflate): @@ -1985,11 +1993,11 @@ configuration change interrupt. (c) Otherwise, the guest may begin to re-use pages previously given to the balloon before the device has acknowledged their - withdrawl. [25] + withdrawl. [24] 4. In either case, once the device has completed the inflation or deflation, the “actual” field of the configuration should be - updated to reflect the new number of pages in the balloon.[26] + updated to reflect the new number of pages in the balloon.[25] 2.4.5.6.1. Memory Statistics --------------------------- @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined: 2.5. Reserved Feature Bits ========================= -Currently there are four device-independent feature bits defined: - - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary - descriptor layouts, as described in Section "2.1.4.2. Message Framing". +Currently there are three device-independent feature bits defined: VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates that the driver can use descriptors with the VRING_DESC_F_INDIRECT @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature using a timer if the device interrupts it when all the packets are transmitted. +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device + accepts arbitrary descriptor layouts, as described in Section + "2.1.4.2.1. Legacy Interface: Message Framing". + 2.6. virtio_ring.h ================= @@ -2851,7 +2860,7 @@ altogether. Any change to configuration space, or new virtqueues, or behavioural changes, should be indicated by negotiation of a new -feature bit. This establishes clarity[11] and avoids future expansion problems. +feature bit. This establishes clarity[10] and avoids future expansion problems. Clusters of functionality which are always implemented together can use a single bit, but if one feature makes sense without the @@ -2895,34 +2904,28 @@ of this expected condition is necessary. [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization -[10] It was previously asserted that framing should be independent of message -contents, yet invariably drivers layed out messages in reliable ways and -devices assumed it. -In addition, the specifications for virtio_blk and virtio_scsi require -intuiting field lengths from frame boundaries. - -[11] Even if it does mean documenting design or implementation +[10] Even if it does mean documenting design or implementation mistakes! -[12] It was supposed to indicate segmentation offload support, but +[11] It was supposed to indicate segmentation offload support, but upon further investigation it became clear that multiple bits were required. -[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are +[12] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload features must offer the checksum feature, and a driver which accepts the offload features must accept the checksum feature. Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features depending on VIRTIO_NET_F_GUEST_CSUM. -[14] This is a common restriction in real, older network cards. +[13] This is a common restriction in real, older network cards. -[15] For example, a network packet transported between two guests on +[14] For example, a network packet transported between two guests on the same system may not require checksumming at all, nor segmentation, if both guests are amenable. -[16] For example, consider a partially checksummed TCP (IPv4) packet. +[15] For example, consider a partially checksummed TCP (IPv4) packet. It will have a 14 byte ethernet header and 20 byte IP header followed by the TCP header (with the TCP checksum field 16 bytes into that header). csum_start will be 14+20 = 34 (the TCP @@ -2932,26 +2935,26 @@ of the TCP pseudo header, so that replacing it by the ones' complement checksum of the TCP header and body will give the correct result. -[17] Due to various bugs in implementations, this field is not useful +[16] Due to various bugs in implementations, this field is not useful as a guarantee of the transport header size. -[18] This case is not handled by some older hardware, so is called out +[17] This case is not handled by some older hardware, so is called out specifically in the protocol. -[19] Note that the header will be two bytes longer for the +[18] Note that the header will be two bytes longer for the VIRTIO_NET_F_MRG_RXBUF case. -[20] Obviously each one can be split across multiple descriptor +[19] Obviously each one can be split across multiple descriptor elements. -[21] Since there are no guarentees, it can use a hash filter or +[20] Since there are no guarentees, it can use a hash filter or silently switch to allmulti or promiscuous mode if it is given too many addresses. -[22] The FLUSH and FLUSH_OUT types are equivalent, the device does not +[21] The FLUSH and FLUSH_OUT types are equivalent, the device does not distinguish between them -[23] Because this is high importance and low bandwidth, the current +[22] Because this is high importance and low bandwidth, the current Linux implementation polls for the buffer to be used, rather than waiting for an interrupt, simplifying the implementation significantly. However, for generic serial ports with the @@ -2959,9 +2962,9 @@ O_NONBLOCK flag set, the polling limitation is relaxed and the consumed buffers are freed upon the next write or poll call or when a port is closed or hot-unplugged. -[24] This is historical, and independent of the guest page size +[23] This is historical, and independent of the guest page size -[25] In this case, deflation advice is merely a courtesy +[24] In this case, deflation advice is merely a courtesy -[26] As updates to configuration space are not atomic, this field +[25] As updates to configuration space are not atomic, this field isn't particularly reliable, but can be used to diagnose buggy guests.


  • 4.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-16-2013 08:13
    On Mon, Sep 16, 2013 at 05:12:34PM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > This resolves issue VIRTIO-10 > > > > This also creates a new section for legacy feature > > bits which will be handy for VIRTIO-13 if that is > > accepted. > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > This is very similar to the patch I came up with: I'm too slow! > > Pasted below, in particular the 2.4.2.5.1 header needs to move up a few > lines, too. > > Cheers, > Rusty. Cool. A couple of comments below. > > --- > > virtio-v1.0-wd01-part1-specification.txt 38 ++++++++++++++++++++++---------- > > 1 file changed, 26 insertions(+), 12 deletions(-) > > > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt > > index 2679b5b..bb75e35 100644 > > --- a/virtio-v1.0-wd01-part1-specification.txt > > +++ b/virtio-v1.0-wd01-part1-specification.txt > > @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian. > > > > 2.1.4.2. Message Framing > > ----------------------- > > -The original intent of the specification was that message framing (the > > -particular layout of descriptors) be independent of the contents of > > +Generally, the intent of the specification is for message framing (the > > +particular layout of descriptors) to be independent of the contents of > > the buffers. For example, a network transmit buffer consists of a 12 > > byte header followed by the network packet. This could be most simply > > placed in the descriptor table as a 12 byte output descriptor followed > > @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and > > packet are adjacent, or even three or more descriptors (possibly with > > loss of efficiency in that case). > > > > -Regrettably, initial driver implementations used simple layouts, and > > -devices came to rely on it, despite this specification wording[10]. It > > -is thus recommended that drivers be conservative in their assumptions, > > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some > > +In addition, some > > implementations may have large-but-reasonable restrictions on total > > descriptor size (such as based on IOV_MAX in the host OS). This has > > not been a problem in practice: little sympathy will be given to > > drivers which create unreasonably-sized descriptors such as by > > dividing a network packet into 1500 single-byte descriptors! > > > > +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing > > +----------------------- > > +Regrettably, initial driver implementations used simple layouts, and > > +devices came to rely on it, despite this specification wording[10]. It > > +is thus recommended that when using legacy interfaces, > > +drivers should be conservative in their assumptions, > > +unless the VIRTIO_F_ANY_LAYOUT feature is accepted. > > + > > 2.1.4.3. The Virtqueue Descriptor Table > > -------------------------------------- > > > > @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined: > > using a timer if the device interrupts it when all the packets > > are transmitted. > > > > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary > > - descriptor layouts, as described in Section "2.1.4.2. Message Framing". > > - > > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates > > that the driver can use descriptors with the VRING_DESC_F_INDIRECT > > flag set, as described in "2.1.4.3.1. Indirect Descriptors". > > @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined: > > compliant with this specification, and acknowledged by all device > > drivers. > > > > -In addition, bit 30 is used by qemu's implementation to check for experimental > > -early versions of virtio which did not perform correct feature negotiation, > > -and should not be used. > > +2.5.1 Legacy Interface: A Note on Reserved Feature Bits > > +------------------------- > > + > > +When used through the legacy interface, transitional > > +devices should advertise and recognize the following > > +feature bits: > > + > > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary > > +descriptor layouts, as described in Section > > +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing". > > + > > +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices > > +to check for experimental early versions of virtio > > +drivers which did not perform correct feature negotiation, > > +and should not be supported or activated by drivers. > > > > 2.6. virtio_ring.h > > ================= > > -- > > MST > > commit 75f47c3fcc52192b7b6a7b6a77692cdb971dcb7e > Author: Rusty Russell <rusty@au1.ibm.com> > Date: Mon Sep 9 17:40:32 2013 +0930 > > Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT) > > As per issue virtio-10. > > Signed-off-by: Rusty Russell <rusty@au1.ibm.com> > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt > index 1e424d0..ceac80a 100644 > --- a/virtio-v1.0-wd01-part1-specification.txt > +++ b/virtio-v1.0-wd01-part1-specification.txt > @@ -268,25 +268,33 @@ is not a significant issue. > > 2.1.4.2. Message Framing > ----------------------- > -The original intent of the specification was that message framing (the > -particular layout of descriptors) be independent of the contents of > -the buffers. For example, a network transmit buffer consists of a 12 > -byte header followed by the network packet. This could be most simply > -placed in the descriptor table as a 12 byte output descriptor followed > -by a 1514 byte output descriptor, but it could also consist of a > -single 1526 byte output descriptor in the case where the header and > -packet are adjacent, or even three or more descriptors (possibly with > -loss of efficiency in that case). > +The message framing (the particular layout of descriptors) is > +independent of the contents of the buffers. For example, a network > +transmit buffer consists of a 12 byte header followed by the network > +packet. This could be most simply placed in the descriptor table as a > +12 byte output descriptor followed by a 1514 byte output descriptor, > +but it could also consist of a single 1526 byte output descriptor in > +the case where the header and packet are adjacent, or even three or > +more descriptors (possibly with loss of efficiency in that case). > + > +Note that, some implementations may have large-but-reasonable > +restrictions on total descriptor size (such as based on IOV_MAX in the > +host OS). This has not been a problem in practice: little sympathy > +will be given to drivers which create unreasonably-sized descriptors > +such as by dividing a network packet into 1500 single-byte > +descriptors! > + > +2.1.4.2.1. Legacy Interface: Message Framing > +----------------------- > > Regrettably, initial driver implementations used simple layouts, and > -devices came to rely on it, despite this specification wording[10]. It > -is thus recommended that drivers be conservative in their assumptions, > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some > -implementations may have large-but-reasonable restrictions on total > -descriptor size (such as based on IOV_MAX in the host OS). This has > -not been a problem in practice: little sympathy will be given to > -drivers which create unreasonably-sized descriptors such as by > -dividing a network packet into 1500 single-byte descriptors! > +devices came to rely on it, despite this specification wording. In > +addition, the specification for virtio_blk SCSI commands required > +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy > +Interface: Device Operation") > + > +It is thus recommended that Maybe add: " when using legacy interfaces, transitional " to stress that it does not need to be checked in non legacy setting. > drivers be conservative in their > +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted. > > 2.1.4.3. The Virtqueue Descriptor Table > -------------------------------------- > @@ -1193,7 +1201,7 @@ features. > VIRTIO_NET_F_MAC (5) Device has given MAC address. > > VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with > - any GSO type.[12] > + any GSO type.[11] > > VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4. > This generated a huge diff because of renumbering. What I generally do is take out a chunk and commit, then make renumbering a separate commit, it does not need review :) If I add a chunk, and numbering does not fit naturally, I add it with a random huge number like 123 - this makes it possible to add cross-references directly. Again commit then re-number separately. This also makes rebasing easier for people: they can see the conflict is with a re-numbering patch. > @@ -1267,16 +1275,16 @@ features. > packets by negotating the VIRTIO_NET_F_CSUM feature. This “ > checksum offload” is a common feature on modern network cards. > > -7. If that feature is negotiated[13], a driver can use TCP or UDP > +7. If that feature is negotiated[12], a driver can use TCP or UDP > segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4 > TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO > (UDP fragmentation) features. It should not send TCP packets > requiring segmentation offload which have the Explicit Congestion > Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is > - negotiated.[14] > + negotiated.[13] > > 8. The converse features are also available: a driver can save > - the virtual device some work by negotiating these features.[15] > + the virtual device some work by negotiating these features.[14] > The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially > checksummed packets can be received, and if it can do that then > the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, > @@ -1328,7 +1336,7 @@ the different features the driver negotiated. > and > > • csum_offset indicates how many bytes after the csum_start the > - new (16 bit ones' complement) checksum should be placed.[16] > + new (16 bit ones' complement) checksum should be placed.[15] > > 2. If the driver negotiated > VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires > @@ -1341,21 +1349,21 @@ the different features the driver negotiated. > > • hdr_len is a hint to the device as to how much of the header > needs to be kept to copy into each packet, usually set to the > - length of the headers, including the transport header.[17] > + length of the headers, including the transport header.[16] > > • gso_size is the maximum size of each packet beyond that > header (ie. MSS). > > • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, > the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as > - well, indicating that the TCP packet has the ECN bit set.[18] > + well, indicating that the TCP packet has the ECN bit set.[17] > > 3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature, > the num_buffers field is set to zero. > > 4. The header and packet are added as one output buffer to the > transmitq, and the device is notified of the new entry (see "2.4.1.4. > - Notifying The Device").[19] > + Notifying The Device").[18] > > 2.4.1.5.1.1. Packet Transmission Interrupt > ----------------------------------------- > @@ -1380,7 +1388,7 @@ VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to > accept packets of up to 65550 bytes long (the maximum size of a > TCP or UDP packet, plus the 14 byte ethernet header), otherwise > 1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every > -buffer in the receive queue needs to be at least this length [20] > +buffer in the receive queue needs to be at least this length [19] > > If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at > least the size of the struct virtio_net_hdr. > @@ -1480,7 +1488,7 @@ off. The command-specific-data is one byte containing 0 (off) or > #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0 > > The device can filter incoming packets by any number of destination > -MAC addresses.[21] This table is set using the class > +MAC addresses.[20] This table is set using the class > VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The > command-specific-data is two variable length tables of 6-byte MAC > addresses. The first table contains unicast addresses, and the second > @@ -1652,7 +1660,7 @@ the device (not necessarily in order). Each request is of form: > > The type of the request is either a read (VIRTIO_BLK_T_IN), a write > (VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or > -VIRTIO_BLK_T_FLUSH_OUT[22]). If the device has VIRTIO_BLK_F_BARRIER > +VIRTIO_BLK_T_FLUSH_OUT[21]). If the device has VIRTIO_BLK_F_BARRIER > feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this > request acts as a barrier and that all preceeding requests must be > complete before this one, and all following requests must not be > @@ -1683,12 +1691,12 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host: > #define VIRTIO_BLK_S_IOERR 1 > #define VIRTIO_BLK_S_UNSUPP 2 > > +2.4.2.5.1 Legacy Interface: Device Operation > +------------------------ > Historically, devices assumed that the fields type, ioprio and sector > reside in a single, separate read-only buffer and the status field is > a separate read-only buffer of size 1 byte, by itself. > > -2.4.2.5.1 Legacy Interface: Device Operation > ------------------------- > If the device has VIRTIO_BLK_F_SCSI feature, it can also support > scsi packet command requests, each of these requests is of form: > > @@ -1826,7 +1834,7 @@ data and outgoing characters are placed in the transmit queue. > ------------------------ > > 1. For output, a buffer containing the characters is placed in > - the port's transmitq.[23] > + the port's transmitq.[22] > > 2. When a buffer is used in the receiveq (signalled by an > interrupt), the contents is the input to the port associated > @@ -1970,7 +1978,7 @@ configuration change interrupt. > 2. To supply memory to the balloon (aka. inflate): > > (a) The driver constructs an array of addresses of unused memory > - pages. These addresses are divided by 4096[24] and the descriptor > + pages. These addresses are divided by 4096[23] and the descriptor > describing the resulting 32-bit array is added to the inflateq. > > 3. To remove memory from the balloon (aka. deflate): > @@ -1985,11 +1993,11 @@ configuration change interrupt. > > (c) Otherwise, the guest may begin to re-use pages previously > given to the balloon before the device has acknowledged their > - withdrawl. [25] > + withdrawl. [24] > > 4. In either case, once the device has completed the inflation or > deflation, the “actual” field of the configuration should be > - updated to reflect the new number of pages in the balloon.[26] > + updated to reflect the new number of pages in the balloon.[25] > > 2.4.5.6.1. Memory Statistics > --------------------------- > @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined: > 2.5. Reserved Feature Bits > ========================= > > -Currently there are four device-independent feature bits defined: > - > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary > - descriptor layouts, as described in Section "2.1.4.2. Message Framing". > +Currently there are three device-independent feature bits defined: > > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates > that the driver can use descriptors with the VRING_DESC_F_INDIRECT > @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature > using a timer if the device interrupts it when all the packets > are transmitted. > > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device > + accepts arbitrary descriptor layouts, as described in Section > + "2.1.4.2.1. Legacy Interface: Message Framing". > + Is this on top of a patch making NOTIFY_ON_EMPTY legacy? We should also move BAD_FEATURE here. > 2.6. virtio_ring.h > ================= > > @@ -2851,7 +2860,7 @@ altogether. > > Any change to configuration space, or new virtqueues, or > behavioural changes, should be indicated by negotiation of a new > -feature bit. This establishes clarity[11] and avoids future expansion problems. > +feature bit. This establishes clarity[10] and avoids future expansion problems. > > Clusters of functionality which are always implemented together > can use a single bit, but if one feature makes sense without the > @@ -2895,34 +2904,28 @@ of this expected condition is necessary. > > [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization > > -[10] It was previously asserted that framing should be independent of message > -contents, yet invariably drivers layed out messages in reliable ways and > -devices assumed it. > -In addition, the specifications for virtio_blk and virtio_scsi require > -intuiting field lengths from frame boundaries. > - > -[11] Even if it does mean documenting design or implementation > +[10] Even if it does mean documenting design or implementation > mistakes! > > > -[12] It was supposed to indicate segmentation offload support, but > +[11] It was supposed to indicate segmentation offload support, but > upon further investigation it became clear that multiple bits > were required. > > -[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are > +[12] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are > dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload > features must offer the checksum feature, and a driver which > accepts the offload features must accept the checksum feature. > Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features > depending on VIRTIO_NET_F_GUEST_CSUM. > > -[14] This is a common restriction in real, older network cards. > +[13] This is a common restriction in real, older network cards. > > -[15] For example, a network packet transported between two guests on > +[14] For example, a network packet transported between two guests on > the same system may not require checksumming at all, nor segmentation, > if both guests are amenable. > > -[16] For example, consider a partially checksummed TCP (IPv4) packet. > +[15] For example, consider a partially checksummed TCP (IPv4) packet. > It will have a 14 byte ethernet header and 20 byte IP header > followed by the TCP header (with the TCP checksum field 16 bytes > into that header). csum_start will be 14+20 = 34 (the TCP > @@ -2932,26 +2935,26 @@ of the TCP pseudo header, so that replacing it by the ones' > complement checksum of the TCP header and body will give the > correct result. > > -[17] Due to various bugs in implementations, this field is not useful > +[16] Due to various bugs in implementations, this field is not useful > as a guarantee of the transport header size. > > -[18] This case is not handled by some older hardware, so is called out > +[17] This case is not handled by some older hardware, so is called out > specifically in the protocol. > > -[19] Note that the header will be two bytes longer for the > +[18] Note that the header will be two bytes longer for the > VIRTIO_NET_F_MRG_RXBUF case. > > -[20] Obviously each one can be split across multiple descriptor > +[19] Obviously each one can be split across multiple descriptor > elements. > > -[21] Since there are no guarentees, it can use a hash filter or > +[20] Since there are no guarentees, it can use a hash filter or > silently switch to allmulti or promiscuous mode if it is given too > many addresses. > > -[22] The FLUSH and FLUSH_OUT types are equivalent, the device does not > +[21] The FLUSH and FLUSH_OUT types are equivalent, the device does not > distinguish between them > > -[23] Because this is high importance and low bandwidth, the current > +[22] Because this is high importance and low bandwidth, the current > Linux implementation polls for the buffer to be used, rather than > waiting for an interrupt, simplifying the implementation > significantly. However, for generic serial ports with the > @@ -2959,9 +2962,9 @@ O_NONBLOCK flag set, the polling limitation is relaxed and the > consumed buffers are freed upon the next write or poll call or > when a port is closed or hot-unplugged. > > -[24] This is historical, and independent of the guest page size > +[23] This is historical, and independent of the guest page size > > -[25] In this case, deflation advice is merely a courtesy > +[24] In this case, deflation advice is merely a courtesy > > -[26] As updates to configuration space are not atomic, this field > +[25] As updates to configuration space are not atomic, this field > isn't particularly reliable, but can be used to diagnose buggy guests.


  • 5.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-16-2013 08:15
    On Mon, Sep 16, 2013 at 05:12:34PM +0930, Rusty Russell wrote:
    > "Michael S. Tsirkin" <mst@redhat.com> writes:
    > > This resolves issue VIRTIO-10
    > >
    > > This also creates a new section for legacy feature
    > > bits which will be handy for VIRTIO-13 if that is
    > > accepted.
    > >
    > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    >
    > This is very similar to the patch I came up with: I'm too slow!
    >
    > Pasted below, in particular the 2.4.2.5.1 header needs to move up a few
    > lines, too.
    >
    > Cheers,
    > Rusty.

    Cool. A couple of comments below.

    > > ---
    > > virtio-v1.0-wd01-part1-specification.txt | 38 ++++++++++++++++++++++----------
    > > 1 file changed, 26 insertions(+), 12 deletions(-)
    > >
    > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    > > index 2679b5b..bb75e35 100644
    > > --- a/virtio-v1.0-wd01-part1-specification.txt
    > > +++ b/virtio-v1.0-wd01-part1-specification.txt
    > > @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian.
    > >
    > > 2.1.4.2. Message Framing
    > > -----------------------
    > > -The original intent of the specification was that message framing (the
    > > -particular layout of descriptors) be independent of the contents of
    > > +Generally, the intent of the specification is for message framing (the
    > > +particular layout of descriptors) to be independent of the contents of
    > > the buffers. For example, a network transmit buffer consists of a 12
    > > byte header followed by the network packet. This could be most simply
    > > placed in the descriptor table as a 12 byte output descriptor followed
    > > @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and
    > > packet are adjacent, or even three or more descriptors (possibly with
    > > loss of efficiency in that case).
    > >
    > > -Regrettably, initial driver implementations used simple layouts, and
    > > -devices came to rely on it, despite this specification wording[10]. It
    > > -is thus recommended that drivers be conservative in their assumptions,
    > > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
    > > +In addition, some
    > > implementations may have large-but-reasonable restrictions on total
    > > descriptor size (such as based on IOV_MAX in the host OS). This has
    > > not been a problem in practice: little sympathy will be given to
    > > drivers which create unreasonably-sized descriptors such as by
    > > dividing a network packet into 1500 single-byte descriptors!
    > >
    > > +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing
    > > +-----------------------
    > > +Regrettably, initial driver implementations used simple layouts, and
    > > +devices came to rely on it, despite this specification wording[10]. It
    > > +is thus recommended that when using legacy interfaces,
    > > +drivers should be conservative in their assumptions,
    > > +unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
    > > +
    > > 2.1.4.3. The Virtqueue Descriptor Table
    > > --------------------------------------
    > >
    > > @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined:
    > > using a timer if the device interrupts it when all the packets
    > > are transmitted.
    > >
    > > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    > > - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
    > > -
    > > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
    > > that the driver can use descriptors with the VRING_DESC_F_INDIRECT
    > > flag set, as described in "2.1.4.3.1. Indirect Descriptors".
    > > @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined:
    > > compliant with this specification, and acknowledged by all device
    > > drivers.
    > >
    > > -In addition, bit 30 is used by qemu's implementation to check for experimental
    > > -early versions of virtio which did not perform correct feature negotiation,
    > > -and should not be used.
    > > +2.5.1 Legacy Interface: A Note on Reserved Feature Bits
    > > +-------------------------
    > > +
    > > +When used through the legacy interface, transitional
    > > +devices should advertise and recognize the following
    > > +feature bits:
    > > +
    > > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    > > +descriptor layouts, as described in Section
    > > +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing".
    > > +
    > > +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices
    > > +to check for experimental early versions of virtio
    > > +drivers which did not perform correct feature negotiation,
    > > +and should not be supported or activated by drivers.
    > >
    > > 2.6. virtio_ring.h
    > > =================
    > > --
    > > MST
    >
    > commit 75f47c3fcc52192b7b6a7b6a77692cdb971dcb7e
    > Author: Rusty Russell <rusty@au1.ibm.com>
    > Date: Mon Sep 9 17:40:32 2013 +0930
    >
    > Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT)
    >
    > As per issue virtio-10.
    >
    > Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    >
    > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    > index 1e424d0..ceac80a 100644
    > --- a/virtio-v1.0-wd01-part1-specification.txt
    > +++ b/virtio-v1.0-wd01-part1-specification.txt
    > @@ -268,25 +268,33 @@ is not a significant issue.
    >
    > 2.1.4.2. Message Framing
    > -----------------------
    > -The original intent of the specification was that message framing (the
    > -particular layout of descriptors) be independent of the contents of
    > -the buffers. For example, a network transmit buffer consists of a 12
    > -byte header followed by the network packet. This could be most simply
    > -placed in the descriptor table as a 12 byte output descriptor followed
    > -by a 1514 byte output descriptor, but it could also consist of a
    > -single 1526 byte output descriptor in the case where the header and
    > -packet are adjacent, or even three or more descriptors (possibly with
    > -loss of efficiency in that case).
    > +The message framing (the particular layout of descriptors) is
    > +independent of the contents of the buffers. For example, a network
    > +transmit buffer consists of a 12 byte header followed by the network
    > +packet. This could be most simply placed in the descriptor table as a
    > +12 byte output descriptor followed by a 1514 byte output descriptor,
    > +but it could also consist of a single 1526 byte output descriptor in
    > +the case where the header and packet are adjacent, or even three or
    > +more descriptors (possibly with loss of efficiency in that case).
    > +
    > +Note that, some implementations may have large-but-reasonable
    > +restrictions on total descriptor size (such as based on IOV_MAX in the
    > +host OS). This has not been a problem in practice: little sympathy
    > +will be given to drivers which create unreasonably-sized descriptors
    > +such as by dividing a network packet into 1500 single-byte
    > +descriptors!
    > +
    > +2.1.4.2.1. Legacy Interface: Message Framing
    > +-----------------------
    >
    > Regrettably, initial driver implementations used simple layouts, and
    > -devices came to rely on it, despite this specification wording[10]. It
    > -is thus recommended that drivers be conservative in their assumptions,
    > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
    > -implementations may have large-but-reasonable restrictions on total
    > -descriptor size (such as based on IOV_MAX in the host OS). This has
    > -not been a problem in practice: little sympathy will be given to
    > -drivers which create unreasonably-sized descriptors such as by
    > -dividing a network packet into 1500 single-byte descriptors!
    > +devices came to rely on it, despite this specification wording. In
    > +addition, the specification for virtio_blk SCSI commands required
    > +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy
    > +Interface: Device Operation")
    > +
    > +It is thus recommended that

    Maybe add:
    " when using legacy interfaces, transitional "

    to stress that it does not need to be checked in non legacy
    setting.

    > drivers be conservative in their
    > +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted.

    >
    > 2.1.4.3. The Virtqueue Descriptor Table
    > --------------------------------------
    > @@ -1193,7 +1201,7 @@ features.
    > VIRTIO_NET_F_MAC (5) Device has given MAC address.
    >
    > VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with
    > - any GSO type.[12]
    > + any GSO type.[11]
    >
    > VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.
    >

    This generated a huge diff because of renumbering.
    What I generally do is take out a chunk and commit,
    then make renumbering a separate
    commit, it does not need review :)

    If I add a chunk, and numbering does not fit naturally,
    I add it with a random huge number like 123 - this makes
    it possible to add cross-references directly.
    Again commit then re-number separately.

    This also makes rebasing easier for people:
    they can see the conflict is with a re-numbering patch.

    > @@ -1267,16 +1275,16 @@ features.
    > packets by negotating the VIRTIO_NET_F_CSUM feature. This “
    > checksum offload” is a common feature on modern network cards.
    >
    > -7. If that feature is negotiated[13], a driver can use TCP or UDP
    > +7. If that feature is negotiated[12], a driver can use TCP or UDP
    > segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
    > TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
    > (UDP fragmentation) features. It should not send TCP packets
    > requiring segmentation offload which have the Explicit Congestion
    > Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
    > - negotiated.[14]
    > + negotiated.[13]
    >
    > 8. The converse features are also available: a driver can save
    > - the virtual device some work by negotiating these features.[15]
    > + the virtual device some work by negotiating these features.[14]
    > The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
    > checksummed packets can be received, and if it can do that then
    > the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
    > @@ -1328,7 +1336,7 @@ the different features the driver negotiated.
    > and
    >
    > • csum_offset indicates how many bytes after the csum_start the
    > - new (16 bit ones' complement) checksum should be placed.[16]
    > + new (16 bit ones' complement) checksum should be placed.[15]
    >
    > 2. If the driver negotiated
    > VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
    > @@ -1341,21 +1349,21 @@ the different features the driver negotiated.
    >
    > • hdr_len is a hint to the device as to how much of the header
    > needs to be kept to copy into each packet, usually set to the
    > - length of the headers, including the transport header.[17]
    > + length of the headers, including the transport header.[16]
    >
    > • gso_size is the maximum size of each packet beyond that
    > header (ie. MSS).
    >
    > • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
    > the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as
    > - well, indicating that the TCP packet has the ECN bit set.[18]
    > + well, indicating that the TCP packet has the ECN bit set.[17]
    >
    > 3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
    > the num_buffers field is set to zero.
    >
    > 4. The header and packet are added as one output buffer to the
    > transmitq, and the device is notified of the new entry (see "2.4.1.4.
    > - Notifying The Device").[19]
    > + Notifying The Device").[18]
    >
    > 2.4.1.5.1.1. Packet Transmission Interrupt
    > -----------------------------------------
    > @@ -1380,7 +1388,7 @@ VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to
    > accept packets of up to 65550 bytes long (the maximum size of a
    > TCP or UDP packet, plus the 14 byte ethernet header), otherwise
    > 1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
    > -buffer in the receive queue needs to be at least this length [20]
    > +buffer in the receive queue needs to be at least this length [19]
    >
    > If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at
    > least the size of the struct virtio_net_hdr.
    > @@ -1480,7 +1488,7 @@ off. The command-specific-data is one byte containing 0 (off) or
    > #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
    >
    > The device can filter incoming packets by any number of destination
    > -MAC addresses.[21] This table is set using the class
    > +MAC addresses.[20] This table is set using the class
    > VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The
    > command-specific-data is two variable length tables of 6-byte MAC
    > addresses. The first table contains unicast addresses, and the second
    > @@ -1652,7 +1660,7 @@ the device (not necessarily in order). Each request is of form:
    >
    > The type of the request is either a read (VIRTIO_BLK_T_IN), a write
    > (VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or
    > -VIRTIO_BLK_T_FLUSH_OUT[22]). If the device has VIRTIO_BLK_F_BARRIER
    > +VIRTIO_BLK_T_FLUSH_OUT[21]). If the device has VIRTIO_BLK_F_BARRIER
    > feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
    > request acts as a barrier and that all preceeding requests must be
    > complete before this one, and all following requests must not be
    > @@ -1683,12 +1691,12 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
    > #define VIRTIO_BLK_S_IOERR 1
    > #define VIRTIO_BLK_S_UNSUPP 2
    >
    > +2.4.2.5.1 Legacy Interface: Device Operation
    > +------------------------
    > Historically, devices assumed that the fields type, ioprio and sector
    > reside in a single, separate read-only buffer and the status field is
    > a separate read-only buffer of size 1 byte, by itself.
    >
    > -2.4.2.5.1 Legacy Interface: Device Operation
    > -------------------------
    > If the device has VIRTIO_BLK_F_SCSI feature, it can also support
    > scsi packet command requests, each of these requests is of form:
    >
    > @@ -1826,7 +1834,7 @@ data and outgoing characters are placed in the transmit queue.
    > ------------------------
    >
    > 1. For output, a buffer containing the characters is placed in
    > - the port's transmitq.[23]
    > + the port's transmitq.[22]
    >
    > 2. When a buffer is used in the receiveq (signalled by an
    > interrupt), the contents is the input to the port associated
    > @@ -1970,7 +1978,7 @@ configuration change interrupt.
    > 2. To supply memory to the balloon (aka. inflate):
    >
    > (a) The driver constructs an array of addresses of unused memory
    > - pages. These addresses are divided by 4096[24] and the descriptor
    > + pages. These addresses are divided by 4096[23] and the descriptor
    > describing the resulting 32-bit array is added to the inflateq.
    >
    > 3. To remove memory from the balloon (aka. deflate):
    > @@ -1985,11 +1993,11 @@ configuration change interrupt.
    >
    > (c) Otherwise, the guest may begin to re-use pages previously
    > given to the balloon before the device has acknowledged their
    > - withdrawl. [25]
    > + withdrawl. [24]
    >
    > 4. In either case, once the device has completed the inflation or
    > deflation, the “actual” field of the configuration should be
    > - updated to reflect the new number of pages in the balloon.[26]
    > + updated to reflect the new number of pages in the balloon.[25]
    >
    > 2.4.5.6.1. Memory Statistics
    > ---------------------------
    > @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined:
    > 2.5. Reserved Feature Bits
    > =========================
    >
    > -Currently there are four device-independent feature bits defined:
    > -
    > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    > - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
    > +Currently there are three device-independent feature bits defined:
    >
    > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
    > that the driver can use descriptors with the VRING_DESC_F_INDIRECT
    > @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
    > using a timer if the device interrupts it when all the packets
    > are transmitted.
    >
    > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
    > + accepts arbitrary descriptor layouts, as described in Section
    > + "2.1.4.2.1. Legacy Interface: Message Framing".
    > +

    Is this on top of a patch making NOTIFY_ON_EMPTY legacy?

    We should also move BAD_FEATURE here.

    > 2.6. virtio_ring.h
    > =================
    >
    > @@ -2851,7 +2860,7 @@ altogether.
    >
    > Any change to configuration space, or new virtqueues, or
    > behavioural changes, should be indicated by negotiation of a new
    > -feature bit. This establishes clarity[11] and avoids future expansion problems.
    > +feature bit. This establishes clarity[10] and avoids future expansion problems.
    >
    > Clusters of functionality which are always implemented together
    > can use a single bit, but if one feature makes sense without the
    > @@ -2895,34 +2904,28 @@ of this expected condition is necessary.
    >
    > [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization
    >
    > -[10] It was previously asserted that framing should be independent of message
    > -contents, yet invariably drivers layed out messages in reliable ways and
    > -devices assumed it.
    > -In addition, the specifications for virtio_blk and virtio_scsi require
    > -intuiting field lengths from frame boundaries.
    > -
    > -[11] Even if it does mean documenting design or implementation
    > +[10] Even if it does mean documenting design or implementation
    > mistakes!
    >
    >
    > -[12] It was supposed to indicate segmentation offload support, but
    > +[11] It was supposed to indicate segmentation offload support, but
    > upon further investigation it became clear that multiple bits
    > were required.
    >
    > -[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
    > +[12] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
    > dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload
    > features must offer the checksum feature, and a driver which
    > accepts the offload features must accept the checksum feature.
    > Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
    > depending on VIRTIO_NET_F_GUEST_CSUM.
    >
    > -[14] This is a common restriction in real, older network cards.
    > +[13] This is a common restriction in real, older network cards.
    >
    > -[15] For example, a network packet transported between two guests on
    > +[14] For example, a network packet transported between two guests on
    > the same system may not require checksumming at all, nor segmentation,
    > if both guests are amenable.
    >
    > -[16] For example, consider a partially checksummed TCP (IPv4) packet.
    > +[15] For example, consider a partially checksummed TCP (IPv4) packet.
    > It will have a 14 byte ethernet header and 20 byte IP header
    > followed by the TCP header (with the TCP checksum field 16 bytes
    > into that header). csum_start will be 14+20 = 34 (the TCP
    > @@ -2932,26 +2935,26 @@ of the TCP pseudo header, so that replacing it by the ones'
    > complement checksum of the TCP header and body will give the
    > correct result.
    >
    > -[17] Due to various bugs in implementations, this field is not useful
    > +[16] Due to various bugs in implementations, this field is not useful
    > as a guarantee of the transport header size.
    >
    > -[18] This case is not handled by some older hardware, so is called out
    > +[17] This case is not handled by some older hardware, so is called out
    > specifically in the protocol.
    >
    > -[19] Note that the header will be two bytes longer for the
    > +[18] Note that the header will be two bytes longer for the
    > VIRTIO_NET_F_MRG_RXBUF case.
    >
    > -[20] Obviously each one can be split across multiple descriptor
    > +[19] Obviously each one can be split across multiple descriptor
    > elements.
    >
    > -[21] Since there are no guarentees, it can use a hash filter or
    > +[20] Since there are no guarentees, it can use a hash filter or
    > silently switch to allmulti or promiscuous mode if it is given too
    > many addresses.
    >
    > -[22] The FLUSH and FLUSH_OUT types are equivalent, the device does not
    > +[21] The FLUSH and FLUSH_OUT types are equivalent, the device does not
    > distinguish between them
    >
    > -[23] Because this is high importance and low bandwidth, the current
    > +[22] Because this is high importance and low bandwidth, the current
    > Linux implementation polls for the buffer to be used, rather than
    > waiting for an interrupt, simplifying the implementation
    > significantly. However, for generic serial ports with the
    > @@ -2959,9 +2962,9 @@ O_NONBLOCK flag set, the polling limitation is relaxed and the
    > consumed buffers are freed upon the next write or poll call or
    > when a port is closed or hot-unplugged.
    >
    > -[24] This is historical, and independent of the guest page size
    > +[23] This is historical, and independent of the guest page size
    >
    > -[25] In this case, deflation advice is merely a courtesy
    > +[24] In this case, deflation advice is merely a courtesy
    >
    > -[26] As updates to configuration space are not atomic, this field
    > +[25] As updates to configuration space are not atomic, this field
    > isn't particularly reliable, but can be used to diagnose buggy guests.



  • 6.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-17-2013 04:07
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > On Mon, Sep 16, 2013 at 05:12:34PM +0930, Rusty Russell wrote:
    >> "Michael S. Tsirkin" <mst@redhat.com> writes:
    >> > This resolves issue VIRTIO-10
    >> >
    >> > This also creates a new section for legacy feature
    >> > bits which will be handy for VIRTIO-13 if that is
    >> > accepted.
    >> >
    >> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    >>
    >> This is very similar to the patch I came up with: I'm too slow!
    >>
    >> Pasted below, in particular the 2.4.2.5.1 header needs to move up a few
    >> lines, too.
    >>
    >> Cheers,
    >> Rusty.
    >
    > Cool. A couple of comments below.

    >> +2.1.4.2.1. Legacy Interface: Message Framing
    >> +-----------------------
    >>
    >> Regrettably, initial driver implementations used simple layouts, and
    >> -devices came to rely on it, despite this specification wording[10]. It
    >> -is thus recommended that drivers be conservative in their assumptions,
    >> -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
    >> -implementations may have large-but-reasonable restrictions on total
    >> -descriptor size (such as based on IOV_MAX in the host OS). This has
    >> -not been a problem in practice: little sympathy will be given to
    >> -drivers which create unreasonably-sized descriptors such as by
    >> -dividing a network packet into 1500 single-byte descriptors!
    >> +devices came to rely on it, despite this specification wording. In
    >> +addition, the specification for virtio_blk SCSI commands required
    >> +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy
    >> +Interface: Device Operation")
    >> +
    >> +It is thus recommended that
    >
    > Maybe add:
    > " when using legacy interfaces, transitional "
    >
    > to stress that it does not need to be checked in non legacy
    > setting.

    Yep, good idea.

    >
    >> drivers be conservative in their
    >> +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
    >
    >>
    >> 2.1.4.3. The Virtqueue Descriptor Table
    >> --------------------------------------
    >> @@ -1193,7 +1201,7 @@ features.
    >> VIRTIO_NET_F_MAC (5) Device has given MAC address.
    >>
    >> VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with
    >> - any GSO type.[12]
    >> + any GSO type.[11]
    >>
    >> VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.
    >>
    >
    > This generated a huge diff because of renumbering.
    > What I generally do is take out a chunk and commit,
    > then make renumbering a separate
    > commit, it does not need review :)
    >
    > If I add a chunk, and numbering does not fit naturally,
    > I add it with a random huge number like 123 - this makes
    > it possible to add cross-references directly.
    > Again commit then re-number separately.

    Right, I'll start doing the same.

    >> 2.4.5.6.1. Memory Statistics
    >> ---------------------------
    >> @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined:
    >> 2.5. Reserved Feature Bits
    >> =========================
    >>
    >> -Currently there are four device-independent feature bits defined:
    >> -
    >> - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    >> - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
    >> +Currently there are three device-independent feature bits defined:
    >>
    >> VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
    >> that the driver can use descriptors with the VRING_DESC_F_INDIRECT
    >> @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
    >> using a timer if the device interrupts it when all the packets
    >> are transmitted.
    >>
    >> +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
    >> + accepts arbitrary descriptor layouts, as described in Section
    >> + "2.1.4.2.1. Legacy Interface: Message Framing".
    >> +
    >
    > Is this on top of a patch making NOTIFY_ON_EMPTY legacy?
    >
    > We should also move BAD_FEATURE here.

    Yes, I have a working tree. I'm switching to per-topic branches, since
    merge order isn't determinate.

    Here is the result:

    commit b245c20be903005dd79dc5a23abb54d56ead73ca
    Author: Rusty Russell <rusty@au1.ibm.com>
    Date: Mon Sep 9 17:40:32 2013 +0930

    Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT)

    As per issue virtio-10.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>

    diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    index 743467a..02bb789 100644
    --- a/virtio-v1.0-wd01-part1-specification.txt
    +++ b/virtio-v1.0-wd01-part1-specification.txt
    @@ -268,25 +268,34 @@ is not a significant issue.

    2.1.4.2. Message Framing
    -----------------------
    -The original intent of the specification was that message framing (the
    -particular layout of descriptors) be independent of the contents of
    -the buffers. For example, a network transmit buffer consists of a 12
    -byte header followed by the network packet. This could be most simply
    -placed in the descriptor table as a 12 byte output descriptor followed
    -by a 1514 byte output descriptor, but it could also consist of a
    -single 1526 byte output descriptor in the case where the header and
    -packet are adjacent, or even three or more descriptors (possibly with
    -loss of efficiency in that case).
    +The message framing (the particular layout of descriptors) is
    +independent of the contents of the buffers. For example, a network
    +transmit buffer consists of a 12 byte header followed by the network
    +packet. This could be most simply placed in the descriptor table as a
    +12 byte output descriptor followed by a 1514 byte output descriptor,
    +but it could also consist of a single 1526 byte output descriptor in
    +the case where the header and packet are adjacent, or even three or
    +more descriptors (possibly with loss of efficiency in that case).
    +
    +Note that, some implementations may have large-but-reasonable
    +restrictions on total descriptor size (such as based on IOV_MAX in the
    +host OS). This has not been a problem in practice: little sympathy
    +will be given to drivers which create unreasonably-sized descriptors
    +such as by dividing a network packet into 1500 single-byte
    +descriptors!
    +
    +2.1.4.2.1. Legacy Interface: Message Framing
    +-----------------------

    Regrettably, initial driver implementations used simple layouts, and
    -devices came to rely on it, despite this specification wording[10]. It
    -is thus recommended that drivers be conservative in their assumptions,
    -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
    -implementations may have large-but-reasonable restrictions on total
    -descriptor size (such as based on IOV_MAX in the host OS). This has
    -not been a problem in practice: little sympathy will be given to
    -drivers which create unreasonably-sized descriptors such as by
    -dividing a network packet into 1500 single-byte descriptors!
    +devices came to rely on it, despite this specification wording. In
    +addition, the specification for virtio_blk SCSI commands required
    +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy
    +Interface: Device Operation")
    +
    +It is thus recommended that when using legacy interfaces, transitional
    +drivers be conservative in their assumptions, unless the
    +VIRTIO_F_ANY_LAYOUT feature is accepted.

    2.1.4.3. The Virtqueue Descriptor Table
    --------------------------------------
    @@ -1729,6 +1738,9 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
    #define VIRTIO_BLK_S_IOERR 1
    #define VIRTIO_BLK_S_UNSUPP 2

    +2.4.2.5.1 Legacy Interface: Device Operation
    +------------------------
    +
    Historically, devices assumed that the fields type, ioprio and
    sector reside in a single, separate read-only buffer; the fields
    errors, data_len, sense_len and residual reside in a single,
    @@ -2583,9 +2595,6 @@ Currently there are five device-independent feature bits defined:
    using a timer if the device interrupts it when all the packets
    are transmitted.

    - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
    - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
    -
    VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
    that the driver can use descriptors with the VRING_DESC_F_INDIRECT
    flag set, as described in "2.1.4.3.1. Indirect Descriptors".
    @@ -2609,6 +2618,16 @@ In addition, bit 30 is used by qemu's implementation to check for experimental
    early versions of virtio which did not perform correct feature negotiation,
    and should not be used.

    +2.5.1 Legacy Interface: Reserved Feature Bits
    +--------------------------------------------
    +
    +Legacy or transitional devices may offer the following:
    +
    +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
    + accepts arbitrary descriptor layouts, as described in Section
    + "2.1.4.2.1. Legacy Interface: Message Framing".
    +
    +
    2.6. virtio_ring.h
    =================

    @@ -2883,12 +2902,6 @@ of this expected condition is necessary.

    [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization

    -[10] It was previously asserted that framing should be independent of message
    -contents, yet invariably drivers layed out messages in reliable ways and
    -devices assumed it.
    -In addition, the specifications for virtio_blk and virtio_scsi require
    -intuiting field lengths from frame boundaries.
    -
    [11] Even if it does mean documenting design or implementation
    mistakes!





  • 7.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-17-2013 05:07
    "Michael S. Tsirkin" <mst@redhat.com> writes: > On Mon, Sep 16, 2013 at 05:12:34PM +0930, Rusty Russell wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >> > This resolves issue VIRTIO-10 >> > >> > This also creates a new section for legacy feature >> > bits which will be handy for VIRTIO-13 if that is >> > accepted. >> > >> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> >> >> This is very similar to the patch I came up with: I'm too slow! >> >> Pasted below, in particular the 2.4.2.5.1 header needs to move up a few >> lines, too. >> >> Cheers, >> Rusty. > > Cool. A couple of comments below. >> +2.1.4.2.1. Legacy Interface: Message Framing >> +----------------------- >> >> Regrettably, initial driver implementations used simple layouts, and >> -devices came to rely on it, despite this specification wording[10]. It >> -is thus recommended that drivers be conservative in their assumptions, >> -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some >> -implementations may have large-but-reasonable restrictions on total >> -descriptor size (such as based on IOV_MAX in the host OS). This has >> -not been a problem in practice: little sympathy will be given to >> -drivers which create unreasonably-sized descriptors such as by >> -dividing a network packet into 1500 single-byte descriptors! >> +devices came to rely on it, despite this specification wording. In >> +addition, the specification for virtio_blk SCSI commands required >> +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy >> +Interface: Device Operation") >> + >> +It is thus recommended that > > Maybe add: > " when using legacy interfaces, transitional " > > to stress that it does not need to be checked in non legacy > setting. Yep, good idea. > >> drivers be conservative in their >> +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted. > >> >> 2.1.4.3. The Virtqueue Descriptor Table >> -------------------------------------- >> @@ -1193,7 +1201,7 @@ features. >> VIRTIO_NET_F_MAC (5) Device has given MAC address. >> >> VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with >> - any GSO type.[12] >> + any GSO type.[11] >> >> VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4. >> > > This generated a huge diff because of renumbering. > What I generally do is take out a chunk and commit, > then make renumbering a separate > commit, it does not need review :) > > If I add a chunk, and numbering does not fit naturally, > I add it with a random huge number like 123 - this makes > it possible to add cross-references directly. > Again commit then re-number separately. Right, I'll start doing the same. >> 2.4.5.6.1. Memory Statistics >> --------------------------- >> @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined: >> 2.5. Reserved Feature Bits >> ========================= >> >> -Currently there are four device-independent feature bits defined: >> - >> - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary >> - descriptor layouts, as described in Section "2.1.4.2. Message Framing". >> +Currently there are three device-independent feature bits defined: >> >> VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates >> that the driver can use descriptors with the VRING_DESC_F_INDIRECT >> @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature >> using a timer if the device interrupts it when all the packets >> are transmitted. >> >> +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device >> + accepts arbitrary descriptor layouts, as described in Section >> + "2.1.4.2.1. Legacy Interface: Message Framing". >> + > > Is this on top of a patch making NOTIFY_ON_EMPTY legacy? > > We should also move BAD_FEATURE here. Yes, I have a working tree. I'm switching to per-topic branches, since merge order isn't determinate. Here is the result: commit b245c20be903005dd79dc5a23abb54d56ead73ca Author: Rusty Russell <rusty@au1.ibm.com> Date: Mon Sep 9 17:40:32 2013 +0930 Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT) As per issue virtio-10. Signed-off-by: Rusty Russell <rusty@au1.ibm.com> diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt index 743467a..02bb789 100644 --- a/virtio-v1.0-wd01-part1-specification.txt +++ b/virtio-v1.0-wd01-part1-specification.txt @@ -268,25 +268,34 @@ is not a significant issue. 2.1.4.2. Message Framing ----------------------- -The original intent of the specification was that message framing (the -particular layout of descriptors) be independent of the contents of -the buffers. For example, a network transmit buffer consists of a 12 -byte header followed by the network packet. This could be most simply -placed in the descriptor table as a 12 byte output descriptor followed -by a 1514 byte output descriptor, but it could also consist of a -single 1526 byte output descriptor in the case where the header and -packet are adjacent, or even three or more descriptors (possibly with -loss of efficiency in that case). +The message framing (the particular layout of descriptors) is +independent of the contents of the buffers. For example, a network +transmit buffer consists of a 12 byte header followed by the network +packet. This could be most simply placed in the descriptor table as a +12 byte output descriptor followed by a 1514 byte output descriptor, +but it could also consist of a single 1526 byte output descriptor in +the case where the header and packet are adjacent, or even three or +more descriptors (possibly with loss of efficiency in that case). + +Note that, some implementations may have large-but-reasonable +restrictions on total descriptor size (such as based on IOV_MAX in the +host OS). This has not been a problem in practice: little sympathy +will be given to drivers which create unreasonably-sized descriptors +such as by dividing a network packet into 1500 single-byte +descriptors! + +2.1.4.2.1. Legacy Interface: Message Framing +----------------------- Regrettably, initial driver implementations used simple layouts, and -devices came to rely on it, despite this specification wording[10]. It -is thus recommended that drivers be conservative in their assumptions, -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some -implementations may have large-but-reasonable restrictions on total -descriptor size (such as based on IOV_MAX in the host OS). This has -not been a problem in practice: little sympathy will be given to -drivers which create unreasonably-sized descriptors such as by -dividing a network packet into 1500 single-byte descriptors! +devices came to rely on it, despite this specification wording. In +addition, the specification for virtio_blk SCSI commands required +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy +Interface: Device Operation") + +It is thus recommended that when using legacy interfaces, transitional +drivers be conservative in their assumptions, unless the +VIRTIO_F_ANY_LAYOUT feature is accepted. 2.1.4.3. The Virtqueue Descriptor Table -------------------------------------- @@ -1729,6 +1738,9 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host: #define VIRTIO_BLK_S_IOERR 1 #define VIRTIO_BLK_S_UNSUPP 2 +2.4.2.5.1 Legacy Interface: Device Operation +------------------------ + Historically, devices assumed that the fields type, ioprio and sector reside in a single, separate read-only buffer; the fields errors, data_len, sense_len and residual reside in a single, @@ -2583,9 +2595,6 @@ Currently there are five device-independent feature bits defined: using a timer if the device interrupts it when all the packets are transmitted. - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary - descriptor layouts, as described in Section "2.1.4.2. Message Framing". - VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates that the driver can use descriptors with the VRING_DESC_F_INDIRECT flag set, as described in "2.1.4.3.1. Indirect Descriptors". @@ -2609,6 +2618,16 @@ In addition, bit 30 is used by qemu's implementation to check for experimental early versions of virtio which did not perform correct feature negotiation, and should not be used. +2.5.1 Legacy Interface: Reserved Feature Bits +-------------------------------------------- + +Legacy or transitional devices may offer the following: + +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device + accepts arbitrary descriptor layouts, as described in Section + "2.1.4.2.1. Legacy Interface: Message Framing". + + 2.6. virtio_ring.h ================= @@ -2883,12 +2902,6 @@ of this expected condition is necessary. [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization -[10] It was previously asserted that framing should be independent of message -contents, yet invariably drivers layed out messages in reliable ways and -devices assumed it. -In addition, the specifications for virtio_blk and virtio_scsi require -intuiting field lengths from frame boundaries. - [11] Even if it does mean documenting design or implementation mistakes!


  • 8.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-17-2013 06:12
    On Tue, Sep 17, 2013 at 01:36:54PM +0930, Rusty Russell wrote: > Yes, I have a working tree. I'm switching to per-topic branches, since > merge order isn't determinate. Any tips on using them from git-svn? BTW is it ok to push my tree to e.g. kernel.org? Should be fine as long as master is on oasis-open.org, right? -- MST


  • 9.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-17-2013 06:14
    On Tue, Sep 17, 2013 at 01:36:54PM +0930, Rusty Russell wrote:
    > Yes, I have a working tree. I'm switching to per-topic branches, since
    > merge order isn't determinate.

    Any tips on using them from git-svn?

    BTW is it ok to push my tree to e.g. kernel.org?
    Should be fine as long as master is on oasis-open.org, right?

    --
    MST



  • 10.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-18-2013 01:48
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > On Tue, Sep 17, 2013 at 01:36:54PM +0930, Rusty Russell wrote:
    >> Yes, I have a working tree. I'm switching to per-topic branches, since
    >> merge order isn't determinate.
    >
    > Any tips on using them from git-svn?
    >
    > BTW is it ok to push my tree to e.g. kernel.org?
    > Should be fine as long as master is on oasis-open.org, right?

    I haven't figured out a good way to do svn branches like git branches.
    So for the moment, I've got them in a personal tree: if anyone is
    interested they can see a copy on github:

    https://github.com/rustyrussell/virtio-wip

    Note, this is just a personal staging area for my convenience! All
    patches will go to the mailing list, get noted in the Proposal field of
    the appropriate bug report, and get merged into SVN (or not) once voted
    on.

    Cheers,
    Rusty.




  • 11.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-18-2013 04:11
    "Michael S. Tsirkin" <mst@redhat.com> writes: > On Tue, Sep 17, 2013 at 01:36:54PM +0930, Rusty Russell wrote: >> Yes, I have a working tree. I'm switching to per-topic branches, since >> merge order isn't determinate. > > Any tips on using them from git-svn? > > BTW is it ok to push my tree to e.g. kernel.org? > Should be fine as long as master is on oasis-open.org, right? I haven't figured out a good way to do svn branches like git branches. So for the moment, I've got them in a personal tree: if anyone is interested they can see a copy on github: https://github.com/rustyrussell/virtio-wip Note, this is just a personal staging area for my convenience! All patches will go to the mailing list, get noted in the Proposal field of the appropriate bug report, and get merged into SVN (or not) once voted on. Cheers, Rusty.


  • 12.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-18-2013 09:24
    On Wed, Sep 18, 2013 at 11:18:27AM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > On Tue, Sep 17, 2013 at 01:36:54PM +0930, Rusty Russell wrote: > >> Yes, I have a working tree. I'm switching to per-topic branches, since > >> merge order isn't determinate. > > > > Any tips on using them from git-svn? > > > > BTW is it ok to push my tree to e.g. kernel.org? > > Should be fine as long as master is on oasis-open.org, right? > > I haven't figured out a good way to do svn branches like git branches. > So for the moment, I've got them in a personal tree: if anyone is > interested they can see a copy on github: > > https://github.com/rustyrussell/virtio-wip > > Note, this is just a personal staging area for my convenience! All > patches will go to the mailing list, get noted in the Proposal field of > the appropriate bug report, and get merged into SVN (or not) once voted > on. > > Cheers, > Rusty. OK I did same. Here it is: git://git.kernel.org/pub/scm/virt/kvm/mst/virtio-text.git Note: same disclaimers as for Rusty's tree apply.


  • 13.  Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit

    Posted 09-18-2013 09:26
    On Wed, Sep 18, 2013 at 11:18:27AM +0930, Rusty Russell wrote:
    > "Michael S. Tsirkin" <mst@redhat.com> writes:
    > > On Tue, Sep 17, 2013 at 01:36:54PM +0930, Rusty Russell wrote:
    > >> Yes, I have a working tree. I'm switching to per-topic branches, since
    > >> merge order isn't determinate.
    > >
    > > Any tips on using them from git-svn?
    > >
    > > BTW is it ok to push my tree to e.g. kernel.org?
    > > Should be fine as long as master is on oasis-open.org, right?
    >
    > I haven't figured out a good way to do svn branches like git branches.
    > So for the moment, I've got them in a personal tree: if anyone is
    > interested they can see a copy on github:
    >
    > https://github.com/rustyrussell/virtio-wip
    >
    > Note, this is just a personal staging area for my convenience! All
    > patches will go to the mailing list, get noted in the Proposal field of
    > the appropriate bug report, and get merged into SVN (or not) once voted
    > on.
    >
    > Cheers,
    > Rusty.

    OK I did same. Here it is:
    git://git.kernel.org/pub/scm/virt/kvm/mst/virtio-text.git

    Note: same disclaimers as for Rusty's tree apply.