On Mon, Sep 16, 2013 at 05:12:34PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <
mst@redhat.com> writes:
> > This resolves issue VIRTIO-10
> >
> > This also creates a new section for legacy feature
> > bits which will be handy for VIRTIO-13 if that is
> > accepted.
> >
> > Signed-off-by: Michael S. Tsirkin <
mst@redhat.com>
>
> This is very similar to the patch I came up with: I'm too slow!
>
> Pasted below, in particular the 2.4.2.5.1 header needs to move up a few
> lines, too.
>
> Cheers,
> Rusty.
Cool. A couple of comments below.
> > ---
> > virtio-v1.0-wd01-part1-specification.txt | 38 ++++++++++++++++++++++----------
> > 1 file changed, 26 insertions(+), 12 deletions(-)
> >
> > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
> > index 2679b5b..bb75e35 100644
> > --- a/virtio-v1.0-wd01-part1-specification.txt
> > +++ b/virtio-v1.0-wd01-part1-specification.txt
> > @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian.
> >
> > 2.1.4.2. Message Framing
> > -----------------------
> > -The original intent of the specification was that message framing (the
> > -particular layout of descriptors) be independent of the contents of
> > +Generally, the intent of the specification is for message framing (the
> > +particular layout of descriptors) to be independent of the contents of
> > the buffers. For example, a network transmit buffer consists of a 12
> > byte header followed by the network packet. This could be most simply
> > placed in the descriptor table as a 12 byte output descriptor followed
> > @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and
> > packet are adjacent, or even three or more descriptors (possibly with
> > loss of efficiency in that case).
> >
> > -Regrettably, initial driver implementations used simple layouts, and
> > -devices came to rely on it, despite this specification wording[10]. It
> > -is thus recommended that drivers be conservative in their assumptions,
> > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
> > +In addition, some
> > implementations may have large-but-reasonable restrictions on total
> > descriptor size (such as based on IOV_MAX in the host OS). This has
> > not been a problem in practice: little sympathy will be given to
> > drivers which create unreasonably-sized descriptors such as by
> > dividing a network packet into 1500 single-byte descriptors!
> >
> > +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing
> > +-----------------------
> > +Regrettably, initial driver implementations used simple layouts, and
> > +devices came to rely on it, despite this specification wording[10]. It
> > +is thus recommended that when using legacy interfaces,
> > +drivers should be conservative in their assumptions,
> > +unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
> > +
> > 2.1.4.3. The Virtqueue Descriptor Table
> > --------------------------------------
> >
> > @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined:
> > using a timer if the device interrupts it when all the packets
> > are transmitted.
> >
> > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
> > - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
> > -
> > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
> > that the driver can use descriptors with the VRING_DESC_F_INDIRECT
> > flag set, as described in "2.1.4.3.1. Indirect Descriptors".
> > @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined:
> > compliant with this specification, and acknowledged by all device
> > drivers.
> >
> > -In addition, bit 30 is used by qemu's implementation to check for experimental
> > -early versions of virtio which did not perform correct feature negotiation,
> > -and should not be used.
> > +2.5.1 Legacy Interface: A Note on Reserved Feature Bits
> > +-------------------------
> > +
> > +When used through the legacy interface, transitional
> > +devices should advertise and recognize the following
> > +feature bits:
> > +
> > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
> > +descriptor layouts, as described in Section
> > +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing".
> > +
> > +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices
> > +to check for experimental early versions of virtio
> > +drivers which did not perform correct feature negotiation,
> > +and should not be supported or activated by drivers.
> >
> > 2.6. virtio_ring.h
> > =================
> > --
> > MST
>
> commit 75f47c3fcc52192b7b6a7b6a77692cdb971dcb7e
> Author: Rusty Russell <
rusty@au1.ibm.com>
> Date: Mon Sep 9 17:40:32 2013 +0930
>
> Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT)
>
> As per issue virtio-10.
>
> Signed-off-by: Rusty Russell <
rusty@au1.ibm.com>
>
> diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
> index 1e424d0..ceac80a 100644
> --- a/virtio-v1.0-wd01-part1-specification.txt
> +++ b/virtio-v1.0-wd01-part1-specification.txt
> @@ -268,25 +268,33 @@ is not a significant issue.
>
> 2.1.4.2. Message Framing
> -----------------------
> -The original intent of the specification was that message framing (the
> -particular layout of descriptors) be independent of the contents of
> -the buffers. For example, a network transmit buffer consists of a 12
> -byte header followed by the network packet. This could be most simply
> -placed in the descriptor table as a 12 byte output descriptor followed
> -by a 1514 byte output descriptor, but it could also consist of a
> -single 1526 byte output descriptor in the case where the header and
> -packet are adjacent, or even three or more descriptors (possibly with
> -loss of efficiency in that case).
> +The message framing (the particular layout of descriptors) is
> +independent of the contents of the buffers. For example, a network
> +transmit buffer consists of a 12 byte header followed by the network
> +packet. This could be most simply placed in the descriptor table as a
> +12 byte output descriptor followed by a 1514 byte output descriptor,
> +but it could also consist of a single 1526 byte output descriptor in
> +the case where the header and packet are adjacent, or even three or
> +more descriptors (possibly with loss of efficiency in that case).
> +
> +Note that, some implementations may have large-but-reasonable
> +restrictions on total descriptor size (such as based on IOV_MAX in the
> +host OS). This has not been a problem in practice: little sympathy
> +will be given to drivers which create unreasonably-sized descriptors
> +such as by dividing a network packet into 1500 single-byte
> +descriptors!
> +
> +2.1.4.2.1. Legacy Interface: Message Framing
> +-----------------------
>
> Regrettably, initial driver implementations used simple layouts, and
> -devices came to rely on it, despite this specification wording[10]. It
> -is thus recommended that drivers be conservative in their assumptions,
> -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
> -implementations may have large-but-reasonable restrictions on total
> -descriptor size (such as based on IOV_MAX in the host OS). This has
> -not been a problem in practice: little sympathy will be given to
> -drivers which create unreasonably-sized descriptors such as by
> -dividing a network packet into 1500 single-byte descriptors!
> +devices came to rely on it, despite this specification wording. In
> +addition, the specification for virtio_blk SCSI commands required
> +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy
> +Interface: Device Operation")
> +
> +It is thus recommended that
Maybe add:
" when using legacy interfaces, transitional "
to stress that it does not need to be checked in non legacy
setting.
> drivers be conservative in their
> +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
>
> 2.1.4.3. The Virtqueue Descriptor Table
> --------------------------------------
> @@ -1193,7 +1201,7 @@ features.
> VIRTIO_NET_F_MAC (5) Device has given MAC address.
>
> VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with
> - any GSO type.[12]
> + any GSO type.[11]
>
> VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.
>
This generated a huge diff because of renumbering.
What I generally do is take out a chunk and commit,
then make renumbering a separate
commit, it does not need review :)
If I add a chunk, and numbering does not fit naturally,
I add it with a random huge number like 123 - this makes
it possible to add cross-references directly.
Again commit then re-number separately.
This also makes rebasing easier for people:
they can see the conflict is with a re-numbering patch.
> @@ -1267,16 +1275,16 @@ features.
> packets by negotating the VIRTIO_NET_F_CSUM feature. This “
> checksum offload” is a common feature on modern network cards.
>
> -7. If that feature is negotiated[13], a driver can use TCP or UDP
> +7. If that feature is negotiated[12], a driver can use TCP or UDP
> segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
> TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
> (UDP fragmentation) features. It should not send TCP packets
> requiring segmentation offload which have the Explicit Congestion
> Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
> - negotiated.[14]
> + negotiated.[13]
>
> 8. The converse features are also available: a driver can save
> - the virtual device some work by negotiating these features.[15]
> + the virtual device some work by negotiating these features.[14]
> The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
> checksummed packets can be received, and if it can do that then
> the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
> @@ -1328,7 +1336,7 @@ the different features the driver negotiated.
> and
>
> • csum_offset indicates how many bytes after the csum_start the
> - new (16 bit ones' complement) checksum should be placed.[16]
> + new (16 bit ones' complement) checksum should be placed.[15]
>
> 2. If the driver negotiated
> VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
> @@ -1341,21 +1349,21 @@ the different features the driver negotiated.
>
> • hdr_len is a hint to the device as to how much of the header
> needs to be kept to copy into each packet, usually set to the
> - length of the headers, including the transport header.[17]
> + length of the headers, including the transport header.[16]
>
> • gso_size is the maximum size of each packet beyond that
> header (ie. MSS).
>
> • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
> the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as
> - well, indicating that the TCP packet has the ECN bit set.[18]
> + well, indicating that the TCP packet has the ECN bit set.[17]
>
> 3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
> the num_buffers field is set to zero.
>
> 4. The header and packet are added as one output buffer to the
> transmitq, and the device is notified of the new entry (see "2.4.1.4.
> - Notifying The Device").[19]
> + Notifying The Device").[18]
>
> 2.4.1.5.1.1. Packet Transmission Interrupt
> -----------------------------------------
> @@ -1380,7 +1388,7 @@ VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to
> accept packets of up to 65550 bytes long (the maximum size of a
> TCP or UDP packet, plus the 14 byte ethernet header), otherwise
> 1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
> -buffer in the receive queue needs to be at least this length [20]
> +buffer in the receive queue needs to be at least this length [19]
>
> If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at
> least the size of the struct virtio_net_hdr.
> @@ -1480,7 +1488,7 @@ off. The command-specific-data is one byte containing 0 (off) or
> #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
>
> The device can filter incoming packets by any number of destination
> -MAC addresses.[21] This table is set using the class
> +MAC addresses.[20] This table is set using the class
> VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The
> command-specific-data is two variable length tables of 6-byte MAC
> addresses. The first table contains unicast addresses, and the second
> @@ -1652,7 +1660,7 @@ the device (not necessarily in order). Each request is of form:
>
> The type of the request is either a read (VIRTIO_BLK_T_IN), a write
> (VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or
> -VIRTIO_BLK_T_FLUSH_OUT[22]). If the device has VIRTIO_BLK_F_BARRIER
> +VIRTIO_BLK_T_FLUSH_OUT[21]). If the device has VIRTIO_BLK_F_BARRIER
> feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
> request acts as a barrier and that all preceeding requests must be
> complete before this one, and all following requests must not be
> @@ -1683,12 +1691,12 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
> #define VIRTIO_BLK_S_IOERR 1
> #define VIRTIO_BLK_S_UNSUPP 2
>
> +2.4.2.5.1 Legacy Interface: Device Operation
> +------------------------
> Historically, devices assumed that the fields type, ioprio and sector
> reside in a single, separate read-only buffer and the status field is
> a separate read-only buffer of size 1 byte, by itself.
>
> -2.4.2.5.1 Legacy Interface: Device Operation
> -------------------------
> If the device has VIRTIO_BLK_F_SCSI feature, it can also support
> scsi packet command requests, each of these requests is of form:
>
> @@ -1826,7 +1834,7 @@ data and outgoing characters are placed in the transmit queue.
> ------------------------
>
> 1. For output, a buffer containing the characters is placed in
> - the port's transmitq.[23]
> + the port's transmitq.[22]
>
> 2. When a buffer is used in the receiveq (signalled by an
> interrupt), the contents is the input to the port associated
> @@ -1970,7 +1978,7 @@ configuration change interrupt.
> 2. To supply memory to the balloon (aka. inflate):
>
> (a) The driver constructs an array of addresses of unused memory
> - pages. These addresses are divided by 4096[24] and the descriptor
> + pages. These addresses are divided by 4096[23] and the descriptor
> describing the resulting 32-bit array is added to the inflateq.
>
> 3. To remove memory from the balloon (aka. deflate):
> @@ -1985,11 +1993,11 @@ configuration change interrupt.
>
> (c) Otherwise, the guest may begin to re-use pages previously
> given to the balloon before the device has acknowledged their
> - withdrawl. [25]
> + withdrawl. [24]
>
> 4. In either case, once the device has completed the inflation or
> deflation, the “actual” field of the configuration should be
> - updated to reflect the new number of pages in the balloon.[26]
> + updated to reflect the new number of pages in the balloon.[25]
>
> 2.4.5.6.1. Memory Statistics
> ---------------------------
> @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined:
> 2.5. Reserved Feature Bits
> =========================
>
> -Currently there are four device-independent feature bits defined:
> -
> - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
> - descriptor layouts, as described in Section "2.1.4.2. Message Framing".
> +Currently there are three device-independent feature bits defined:
>
> VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
> that the driver can use descriptors with the VRING_DESC_F_INDIRECT
> @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
> using a timer if the device interrupts it when all the packets
> are transmitted.
>
> +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
> + accepts arbitrary descriptor layouts, as described in Section
> + "2.1.4.2.1. Legacy Interface: Message Framing".
> +
Is this on top of a patch making NOTIFY_ON_EMPTY legacy?
We should also move BAD_FEATURE here.
> 2.6. virtio_ring.h
> =================
>
> @@ -2851,7 +2860,7 @@ altogether.
>
> Any change to configuration space, or new virtqueues, or
> behavioural changes, should be indicated by negotiation of a new
> -feature bit. This establishes clarity[11] and avoids future expansion problems.
> +feature bit. This establishes clarity[10] and avoids future expansion problems.
>
> Clusters of functionality which are always implemented together
> can use a single bit, but if one feature makes sense without the
> @@ -2895,34 +2904,28 @@ of this expected condition is necessary.
>
> [9]
https://lists.linux-foundation.org/mailman/listinfo/virtualization>
> -[10] It was previously asserted that framing should be independent of message
> -contents, yet invariably drivers layed out messages in reliable ways and
> -devices assumed it.
> -In addition, the specifications for virtio_blk and virtio_scsi require
> -intuiting field lengths from frame boundaries.
> -
> -[11] Even if it does mean documenting design or implementation
> +[10] Even if it does mean documenting design or implementation
> mistakes!
>
>
> -[12] It was supposed to indicate segmentation offload support, but
> +[11] It was supposed to indicate segmentation offload support, but
> upon further investigation it became clear that multiple bits
> were required.
>
> -[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
> +[12] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
> dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload
> features must offer the checksum feature, and a driver which
> accepts the offload features must accept the checksum feature.
> Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
> depending on VIRTIO_NET_F_GUEST_CSUM.
>
> -[14] This is a common restriction in real, older network cards.
> +[13] This is a common restriction in real, older network cards.
>
> -[15] For example, a network packet transported between two guests on
> +[14] For example, a network packet transported between two guests on
> the same system may not require checksumming at all, nor segmentation,
> if both guests are amenable.
>
> -[16] For example, consider a partially checksummed TCP (IPv4) packet.
> +[15] For example, consider a partially checksummed TCP (IPv4) packet.
> It will have a 14 byte ethernet header and 20 byte IP header
> followed by the TCP header (with the TCP checksum field 16 bytes
> into that header). csum_start will be 14+20 = 34 (the TCP
> @@ -2932,26 +2935,26 @@ of the TCP pseudo header, so that replacing it by the ones'
> complement checksum of the TCP header and body will give the
> correct result.
>
> -[17] Due to various bugs in implementations, this field is not useful
> +[16] Due to various bugs in implementations, this field is not useful
> as a guarantee of the transport header size.
>
> -[18] This case is not handled by some older hardware, so is called out
> +[17] This case is not handled by some older hardware, so is called out
> specifically in the protocol.
>
> -[19] Note that the header will be two bytes longer for the
> +[18] Note that the header will be two bytes longer for the
> VIRTIO_NET_F_MRG_RXBUF case.
>
> -[20] Obviously each one can be split across multiple descriptor
> +[19] Obviously each one can be split across multiple descriptor
> elements.
>
> -[21] Since there are no guarentees, it can use a hash filter or
> +[20] Since there are no guarentees, it can use a hash filter or
> silently switch to allmulti or promiscuous mode if it is given too
> many addresses.
>
> -[22] The FLUSH and FLUSH_OUT types are equivalent, the device does not
> +[21] The FLUSH and FLUSH_OUT types are equivalent, the device does not
> distinguish between them
>
> -[23] Because this is high importance and low bandwidth, the current
> +[22] Because this is high importance and low bandwidth, the current
> Linux implementation polls for the buffer to be used, rather than
> waiting for an interrupt, simplifying the implementation
> significantly. However, for generic serial ports with the
> @@ -2959,9 +2962,9 @@ O_NONBLOCK flag set, the polling limitation is relaxed and the
> consumed buffers are freed upon the next write or poll call or
> when a port is closed or hot-unplugged.
>
> -[24] This is historical, and independent of the guest page size
> +[23] This is historical, and independent of the guest page size
>
> -[25] In this case, deflation advice is merely a courtesy
> +[24] In this case, deflation advice is merely a courtesy
>
> -[26] As updates to configuration space are not atomic, this field
> +[25] As updates to configuration space are not atomic, this field
> isn't particularly reliable, but can be used to diagnose buggy guests.