OASIS Virtual I/O Device (VIRTIO) TC


[PATCH 00/18] Feedback: conformance clauses and normative statements.

  • 1.  [PATCH 00/18] Feedback: conformance clauses and normative statements.

    Posted 02-19-2014 06:45
    Hi all,

    We are supposed to have real conformance clauses, which are supposed to
    refer to the normative requirements of the text. So I started trying to
    identify them, and I ended up with quite a large patchset which separates
    the explanatory from normative sections. (I haven't done SCSI yet, but I
    wanted to get this out today).

    I will work more on these. In particular, it labels all the normative
    requirements but I want to put each in their own sub(subsub) section for
    clarity and easy reference.

    I think the final result is clearer, but I appreciate comments.

    https://github.com/rustyrussell/virtio-wip feedback

    Thanks,
    Rusty.

    Rusty Russell (18):
      Feedback: Bug TAB-553
      Feedback: move new device design section to Appendix.
      Feedback: use proper list in introduction.
      Feedback: add old draft to normative references.
      Feedback: hoist the one legacy-related requirement out of legacy section.
      Feedback: move legacy/transitional definitions into terminology.
      Feedback: 2.1 Device Status field: Separate description from normative.
      Feedback: add normative marker.
      Feedback: split Basic Facilities feature bits and config space into normative.
      Feedback: Normative split in Basic Facilities of a Virtio Device / Virtqueues
      Feedback: Normative split for Basic Facilities of a Virtio Device / Virtqueues / Message Framing
      Feedback: Separate the rest of chapter 2 into normative vs explanatory.
      PCI: Separate explanatory and normative text.
      MMIO: Separate normative and descriptive text.
      CCW: Separate normative and descriptive sections.
      Feedback: net: separate normative and instructional text.
      block: separate normative and descriptive text.
      Feedback: console & entropy: separate normative and descriptive texts.

     commands.tex     |    4 +
     content.tex      | 1515 ++++++++++++++++++++++++++++++++----------------------
     headerfile.tex   |    8 +
     introduction.tex |   47 +-
     main.tex         |    4 +
     newdevice.tex    |   66 +++
     6 files changed, 1020 insertions(+), 624 deletions(-)
     create mode 100644 headerfile.tex
     create mode 100644 newdevice.tex

    --
    1.8.3.2


  • 2.  [PATCH 14/18] MMIO: Separate normative and descriptive text.

    Posted 02-19-2014 06:45
    The section on initialization is now non-normative.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
    content.tex | 134 +++++++++++++++++++++++++++++++++---------------------------
    1 file changed, 74 insertions(+), 60 deletions(-)

    diff --git a/content.tex b/content.tex
    index b8e5cc8..b3dfccf 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -1694,14 +1694,9 @@ virtio_block@1e000 {
    MMIO virtio devices provides a set of memory mapped control
    registers followed by a device-specific configuration space,
    described in the table~\ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
    -Driver MUST NOT access memory locations not explicitly described in the
    -table (or, in case of the configuration space, described in the device specification),
    -MUST NOT write to the read-only registers (direction R) and
    -MUST NOT read from the write-only registers (direction W).

    All register values are organized as Little Endian.

    -
    \newcommand{\mmioreg}[5]{% Name Function Offset Direction Description
    {\field{#1}} \newline #3 \newline #4 & {\bf#2} \newline #5 \\
    }
    @@ -1726,23 +1721,20 @@ All register values are organized as Little Endian.
    \endfoot
    \endlastfoot
    \mmioreg{MagicValue}{Magic value}{0x000}{R}{%
    - Device MUST return value 0x74726976
    + 0x74726976
    (a Little Endian equivalent of the "virt" string).
    - Driver MUST ignore device returning other values,
    - although it MAY report an error.
    }
    \hline
    \mmioreg{Version}{Device version number}{0x004}{R}{%
    - Devices compliant with this specification MUST return value 0x2.
    - Legacy device (see \ref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}) MAY return value 0x1.
    - Driver MUST ignore device returning other values,
    - although it MAY report an error.
    + 0x2.
    + \begin{note}
    + Legacy devices (see \ref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}) used 0x1.
    + \end{note}
    }
    \hline
    \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{%
    See \ref{sec:Device Types}~\nameref{sec:Device Types} for possible values.
    - Value zero (0x0) is invalid and driver MUST ignore such device
    - but MUST NOT report any error. This behaviour can be used to
    + Value zero (0x0) is used to
    define a system memory map with placeholder devices at static,
    well known addresses, assigning functions to them depending
    on user's needs.
    @@ -1762,9 +1754,7 @@ All register values are organized as Little Endian.
    \hline
    \mmioreg{DeviceFeaturesSel}{Device (host) features word selection.}{0x014}{W}{%
    Writing to this register selects a set of 32 device feature bits
    - accessible by reading from \field{DeviceFeatures}. The driver
    - MUST write a value to \field{DeviceFeaturesSel} before
    - reading from \field{DeviceFeatures}.
    + accessible by reading from \field{DeviceFeatures}.
    }
    \hline
    \mmioreg{DriverFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{%
    @@ -1779,8 +1769,6 @@ All register values are organized as Little Endian.
    \mmioreg{DriverFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{%
    Writing to this register selects a set of 32 activated feature
    bits accessible by writing to \field{DriverFeatures}.
    - The driver MUST write a value to the \field{DriverFeaturesSel}
    - register before writing to the \field{DriverFeatures} register.
    }
    \hline
    \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
    @@ -1795,9 +1783,7 @@ All register values are organized as Little Endian.
    Reading from the register returns the maximum size (number of
    elements) of the queue the device is ready to process or
    zero (0x0) if the queue is not available. This applies to the
    - queue selected by writing to \field{QueueSel}. The driver MUST NOT
    - access this register when the queue is in use (so when \field{QueueReady}
    - is not zero).
    + queue selected by writing to \field{QueueSel}.
    }
    \hline
    \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
    @@ -1805,8 +1791,7 @@ All register values are organized as Little Endian.
    of the Descriptor Table and both Available and Used rings.
    Writing to this register notifies the device what size of the
    queue the driver will use. This applies to the queue selected by
    - writing to \field{QueueSel}. The driver MUST NOT access this register when
    - the queue is in use (so when \field{QueueReady} is not zero).
    + writing to \field{QueueSel}.
    }
    \hline
    \mmioreg{QueueReady}{Virtual queue ready bit}{0x044}{RW}{%
    @@ -1814,9 +1799,6 @@ All register values are organized as Little Endian.
    virtual queue is ready to be used. Reading from this register
    returns the last value written to it. Both read and write
    accesses apply to the queue selected by writing to \field{QueueSel}.
    - When the driver wants to stop using the queue it MUST write
    - zero (0x0) to this register and MUST read the value back to
    - ensure synchronization.
    }
    \hline
    \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{%
    @@ -1827,11 +1809,6 @@ All register values are organized as Little Endian.
    \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{%
    Reading from this register returns a bit mask of events that
    caused the device interrupt to be asserted.
    - From a moment when any of these events takes place, the
    - device MUST be returning a value with the related
    - bits set, ie. equal one (1), and all other bits cleared,
    - ie. equal zero (0), until the driver acknowledges the interrupt
    - by writing a corresponding bit mask to the InterruptACK register.
    The following events are possible:
    \begin{description}
    \item[Used Ring Update] - bit 0 - the interrupt was asserted
    @@ -1840,17 +1817,11 @@ All register values are organized as Little Endian.
    \item [Configuration Change] - bit 1 - the interrupt was
    asserted because the configuration of the device has changed.
    \end{description}
    - Other bits of the value are reserved for future use and the
    - driver MUST ignore them.
    }
    \hline
    \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{%
    Writing to this register notifies the device that the interrupt
    - has been handled.
    - When the driver finishes handling an interrupt, it MUST write
    - a value to this register with bits corresponding to the handled
    - events (as defined for \field{InterruptStatus}) set, ie.
    - equal one (1), and all other bits cleared, ie. equal zero (0).
    + has been handled, as per values for {InterruptStatus}.
    }
    \hline
    \mmioreg{Status}{Device status}{0x070}{RW}{%
    @@ -1858,9 +1829,7 @@ All register values are organized as Little Endian.
    flags.
    Writing non-zero values to this register sets the status flags,
    indicating the driver progress. Writing zero (0x0) to this
    - register triggers a device reset, including clearing all
    - bits in \field{InterruptStatus} and ready bits in the
    - \field{QueueReady} register for all queues in the device.
    + register triggers a device reset.
    See also p. \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}.
    }
    \hline
    @@ -1868,34 +1837,25 @@ All register values are organized as Little Endian.
    Writing to these two registers (lower 32 bits of the address
    to \field{QueueDescLow}, higher 32 bits to \field{QueueDescHigh}) notifies
    the device about location of the Descriptor Table of the queue
    - selected by writing to \field{QueueSel} register. The driver MUST NOT
    - access this register when the queue is in use (so when \field{QueueReady}
    - is not zero).
    + selected by writing to \field{QueueSel} register.
    }
    \hline
    \mmiodreg{QueueAvailLow}{QueueAvailHigh}{Virtual queue's Available Ring 64 bit long physical address}{0x090}{0x094}{W}{%
    Writing to these two registers (lower 32 bits of the address
    to \field{QueueAvailLow}, higher 32 bits to \field{QueueAvailHigh}) notifies
    the device about location of the Available Ring of the queue
    - selected by writing to \field{QueueSel}. The driver MUST NOT
    - access this register when the queue is in use (so when \field{QueueReady}
    - is not zero).
    + selected by writing to \field{QueueSel}.
    }
    \hline
    \mmiodreg{QueueUsedLow}{QueueUsedHigh}{Virtual queue's Used Ring 64 bit long physical address}{0x0a0}{0x0a4}{W}{%
    Writing to these two registers (lower 32 bits of the address
    to \field{QueueUsedLow}, higher 32 bits to \field{QueueUsedHigh}) notifies
    the device about location of the Used Ring of the queue
    - selected by writing to \field{QueueSel}. The driver MUST NOT
    - access this register when the queue is in use (so when \field{QueueReady}
    - is not zero).
    + selected by writing to \field{QueueSel}.
    }
    \hline
    \mmioreg{ConfigGeneration}{Configuration atomicity value}{0x0fc}{R}{
    - Changes every time the configuration noticeably changes. This
    - means the device may only change the value after a configuration
    - read operation, but it MUST change if there is any risk of a
    - device seeing an inconsistent configuration state.
    + Changes every time the configuration noticeably changes (see \ref {sec:Basic Facilities of a Virtio Device / Device Configuration Space}.
    }
    \hline
    \mmioreg{Config}{Configuration space}{0x100+}{RW}{
    @@ -1906,10 +1866,65 @@ All register values are organized as Little Endian.
    \hline
    \end{longtable}

    +\devicenormative{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
    +
    +The device MUST return 0x74726976 in \field{MagicValue}.
    +
    +The device MUST return value 0x2 in \field{Version}.
    +
    +The device MUST present each event by setting the corresponding bit in \field{InterruptStatus} from the
    +moment it takes place, until the driver acknowledges the interrupt
    +by writing a corresponding bit mask to the InterruptACK register. Bits which
    +do not represent events which took place MUST be zero.
    +
    +Upon reset, the device MUST clear all bits in \field{InterruptStatus} and ready bits in the
    +\field{QueueReady} register for all queues in the device.
    +
    +The device MUST change \field{ConfigGeneration} if there is any risk of a
    +device seeing an inconsistent configuration state, but it MAY only change the value
    +after a configuration read operation.
    +
    +\drivernormative{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
    +The driver MUST NOT access memory locations not explicitly described in the
    +table (or, in case of the configuration space, described in the device specification),
    +MUST NOT write to the read-only registers (direction R) and
    +MUST NOT read from the write-only registers (direction W).
    +
    +The driver MUST ignore a device with \field{MagicValue} which is not 0x74726976,
    +although it MAY report an error.
    +
    +The driver MUST ignore a device with \field{Version} which is not 0x2,
    +although it MAY report an error.
    +
    +The driver MUST ignore a device with \field{DeviceID} 0x0,
    +but MUST NOT report any error.
    +
    +Before reading from \field{DeviceFeatures}, the driver MUST write a value to \field{DeviceFeaturesSel}.
    +
    +Before writing to the \field{DriverFeatures} register, the driver MUST write a value to the \field{DriverFeaturesSel} register.
    +
    +The driver MUST write a value to \field{QueueNum} which is less than
    +or equal to the value presented by the device.
    +
    +When \field{QueueReady} is not zero, the driver MUST NOT access
    +\field{QueueNum}, \field{QueueDescLow}, \field{QueueDescHigh},
    +\field{QueueAvailLow}, \field{QueueAvailHigh}, \field{QueueUsedLow}, \field{QueueUsedHigh}
    +
    +To stop using the queue the driver MUST write zero (0x0) to this
    +\field{QueueReady} and MUST read the value back to ensure
    +synchronization.
    +
    +The driver MUST ignore undefined bits in \field{InterruptStatus}.
    +
    +The MUST write the events it handled into \field{InterruptACK} when
    +it finishes handling an interrupt.
    +
    \subsection{MMIO-specific Initialization And Device Operation}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation}

    \subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}

    +\drivernormative{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
    +
    The driver MUST start the device initialization by reading and
    checking values from \field{MagicValue} and \field{Version}.
    If both values are valid, it MUST read \field{DeviceID}
    @@ -1921,7 +1936,7 @@ Further initialization MUST follow the procedure described in

    \subsubsection{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Virtqueue Configuration}

    -The driver MUST initialize the virtual queue in the following way:
    +The driver will typically initialize the virtual queue in the following way:

    \begin{enumerate}
    \item Select the queue writing its index (first queue is 0) to
    @@ -1937,8 +1952,6 @@ The driver MUST initialize the virtual queue in the following way:
    \item Allocate and zero the queue pages, making sure the memory
    is physically contiguous. It is recommended to align the
    Used Ring to an optimal boundary (usually the page size).
    - Size of the allocated queue MUST be smaller than or equal to
    - the maximum size returned by the device.

    \item Notify the device about the queue size by writing the size to
    \field{QueueNum}.
    @@ -1954,7 +1967,7 @@ The driver MUST initialize the virtual queue in the following way:

    \subsubsection{Notifying The Device}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifying The Device}

    -The driver MUST notify the device about new buffers being available in
    +The driver notifies the device about new buffers being available in
    a queue by writing the index of the updated queue to \field{QueueNotify}.

    \subsubsection{Notifications From The Device}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
    @@ -1962,10 +1975,11 @@ a queue by writing the index of the updated queue to \field{QueueNotify}.
    The memory mapped virtio device is using a single, dedicated
    interrupt signal, which is asserted when at least one of the
    bits described in the description of \field{InterruptStatus}
    -is set. This way the device may notify the
    +is set. This is how the device notifies the
    driver about a new used buffer being available in the queue
    or about a change in the device configuration.

    +\drivernormative{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
    After receiving an interrupt, the driver MUST read
    \field{InterruptStatus} to check what caused the interrupt
    (see the register description). After the interrupt is handled,
    --
    1.8.3.2
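
    Editorial illustration, not part of the patch: the driver-side identification rules that this patch gathers under \drivernormative (check MagicValue, then Version, then DeviceID) could be sketched in C roughly as follows. The offsets and magic constants come from the register table in the patch; the function names, the enum, and the mmio_read32() accessor are hypothetical, and the sketch assumes a little-endian host so that no byte swapping is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets in bytes, as listed in the MMIO register table. */
#define VIRTIO_MMIO_MAGIC_VALUE 0x000
#define VIRTIO_MMIO_VERSION     0x004
#define VIRTIO_MMIO_DEVICE_ID   0x008

/* Hypothetical result codes for this sketch only. */
enum probe_result {
    PROBE_OK,           /* device may be initialized further */
    PROBE_BAD_MAGIC,    /* ignore the device; reporting an error is allowed */
    PROBE_BAD_VERSION,  /* ignore the device; reporting an error is allowed */
    PROBE_PLACEHOLDER   /* DeviceID 0x0: ignore the device, report nothing */
};

/* Hypothetical accessor: read one 32-bit register at a byte offset.
 * Assumes a little-endian host, so the value is used as read. */
static uint32_t mmio_read32(const volatile uint32_t *base, size_t offset)
{
    assert(base != NULL && offset % 4 == 0);
    return base[offset / 4];
}

/* Apply the three identification checks in the order the text gives them. */
static enum probe_result virtio_mmio_probe(const volatile uint32_t *base)
{
    if (mmio_read32(base, VIRTIO_MMIO_MAGIC_VALUE) != 0x74726976u)
        return PROBE_BAD_MAGIC;
    if (mmio_read32(base, VIRTIO_MMIO_VERSION) != 0x2u)
        return PROBE_BAD_VERSION;
    if (mmio_read32(base, VIRTIO_MMIO_DEVICE_ID) == 0x0u)
        return PROBE_PLACEHOLDER;
    return PROBE_OK;
}
```

    Only the three read-only identification registers are touched here, which keeps the sketch inside the rule that a driver does not access locations the table does not describe.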


  • 3.  Re: [virtio] [PATCH 14/18] MMIO: Separate normative and descriptive text.

    Posted 02-20-2014 10:09
    On Wed, 2014-02-19 at 06:39 +0000, Rusty Russell wrote:
    > The section on initialization is now non-normative.
    >
    > Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    > ---
    > content.tex | 134 +++++++++++++++++++++++++++++++++---------------------------
    > 1 file changed, 74 insertions(+), 60 deletions(-)
    >
    > diff --git a/content.tex b/content.tex
    > index b8e5cc8..b3dfccf 100644
    > --- a/content.tex
    > +++ b/content.tex
    > @@ -1694,14 +1694,9 @@ virtio_block@1e000 {
    > MMIO virtio devices provides a set of memory mapped control
    > registers followed by a device-specific configuration space,
    > described in the table~\ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
    > -Driver MUST NOT access memory locations not explicitly described in the
    > -table (or, in case of the configuration space, described in the device specification),
    > -MUST NOT write to the read-only registers (direction R) and
    > -MUST NOT read from the write-only registers (direction W).
    >
    > All register values are organized as Little Endian.
    >
    > -
    > \newcommand{\mmioreg}[5]{% Name Function Offset Direction Description
    > {\field{#1}} \newline #3 \newline #4 & {\bf#2} \newline #5 \\
    > }
    > @@ -1726,23 +1721,20 @@ All register values are organized as Little Endian.
    > \endfoot
    > \endlastfoot
    > \mmioreg{MagicValue}{Magic value}{0x000}{R}{%
    > - Device MUST return value 0x74726976
    > + 0x74726976
    > (a Little Endian equivalent of the "virt" string).
    > - Driver MUST ignore device returning other values,
    > - although it MAY report an error.
    > }
    > \hline
    > \mmioreg{Version}{Device version number}{0x004}{R}{%
    > - Devices compliant with this specification MUST return value 0x2.
    > - Legacy device (see \ref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}) MAY return value 0x1.
    > - Driver MUST ignore device returning other values,
    > - although it MAY report an error.
    > + 0x2.
    > + \begin{note}
    > + Legacy devices (see \ref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}) used 0x1.
    > + \end{note}
    > }
    > \hline
    > \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{%
    > See \ref{sec:Device Types}~\nameref{sec:Device Types} for possible values.
    > - Value zero (0x0) is invalid and driver MUST ignore such device
    > - but MUST NOT report any error. This behaviour can be used to
    > + Value zero (0x0) is used to
    > define a system memory map with placeholder devices at static,
    > well known addresses, assigning functions to them depending
    > on user's needs.
    > @@ -1762,9 +1754,7 @@ All register values are organized as Little Endian.
    > \hline
    > \mmioreg{DeviceFeaturesSel}{Device (host) features word selection.}{0x014}{W}{%
    > Writing to this register selects a set of 32 device feature bits
    > - accessible by reading from \field{DeviceFeatures}. The driver
    > - MUST write a value to \field{DeviceFeaturesSel} before
    > - reading from \field{DeviceFeatures}.
    > + accessible by reading from \field{DeviceFeatures}.
    > }
    > \hline
    > \mmioreg{DriverFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{%
    > @@ -1779,8 +1769,6 @@ All register values are organized as Little Endian.
    > \mmioreg{DriverFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{%
    > Writing to this register selects a set of 32 activated feature
    > bits accessible by writing to \field{DriverFeatures}.
    > - The driver MUST write a value to the \field{DriverFeaturesSel}
    > - register before writing to the \field{DriverFeatures} register.
    > }
    > \hline
    > \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
    > @@ -1795,9 +1783,7 @@ All register values are organized as Little Endian.
    > Reading from the register returns the maximum size (number of
    > elements) of the queue the device is ready to process or
    > zero (0x0) if the queue is not available. This applies to the
    > - queue selected by writing to \field{QueueSel}. The driver MUST NOT
    > - access this register when the queue is in use (so when \field{QueueReady}
    > - is not zero).
    > + queue selected by writing to \field{QueueSel}.
    > }
    > \hline
    > \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
    > @@ -1805,8 +1791,7 @@ All register values are organized as Little Endian.
    > of the Descriptor Table and both Available and Used rings.
    > Writing to this register notifies the device what size of the
    > queue the driver will use. This applies to the queue selected by
    > - writing to \field{QueueSel}. The driver MUST NOT access this register when
    > - the queue is in use (so when \field{QueueReady} is not zero).
    > + writing to \field{QueueSel}.
    > }
    > \hline
    > \mmioreg{QueueReady}{Virtual queue ready bit}{0x044}{RW}{%
    > @@ -1814,9 +1799,6 @@ All register values are organized as Little Endian.
    > virtual queue is ready to be used. Reading from this register
    > returns the last value written to it. Both read and write
    > accesses apply to the queue selected by writing to \field{QueueSel}.
    > - When the driver wants to stop using the queue it MUST write
    > - zero (0x0) to this register and MUST read the value back to
    > - ensure synchronization.
    > }
    > \hline
    > \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{%
    > @@ -1827,11 +1809,6 @@ All register values are organized as Little Endian.
    > \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{%
    > Reading from this register returns a bit mask of events that
    > caused the device interrupt to be asserted.
    > - From a moment when any of these events takes place, the
    > - device MUST be returning a value with the related
    > - bits set, ie. equal one (1), and all other bits cleared,
    > - ie. equal zero (0), until the driver acknowledges the interrupt
    > - by writing a corresponding bit mask to the InterruptACK register.
    > The following events are possible:
    > \begin{description}
    > \item[Used Ring Update] - bit 0 - the interrupt was asserted
    > @@ -1840,17 +1817,11 @@ All register values are organized as Little Endian.
    > \item [Configuration Change] - bit 1 - the interrupt was
    > asserted because the configuration of the device has changed.
    > \end{description}
    > - Other bits of the value are reserved for future use and the
    > - driver MUST ignore them.
    > }
    > \hline
    > \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{%
    > Writing to this register notifies the device that the interrupt
    > - has been handled.
    > - When the driver finishes handling an interrupt, it MUST write
    > - a value to this register with bits corresponding to the handled
    > - events (as defined for \field{InterruptStatus}) set, ie.
    > - equal one (1), and all other bits cleared, ie. equal zero (0).
    > + has been handled, as per values for {InterruptStatus}.
    > }
    > \hline
    > \mmioreg{Status}{Device status}{0x070}{RW}{%
    > @@ -1858,9 +1829,7 @@ All register values are organized as Little Endian.
    > flags.
    > Writing non-zero values to this register sets the status flags,
    > indicating the driver progress. Writing zero (0x0) to this
    > - register triggers a device reset, including clearing all
    > - bits in \field{InterruptStatus} and ready bits in the
    > - \field{QueueReady} register for all queues in the device.
    > + register triggers a device reset.
    > See also p. \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}.
    > }
    > \hline
    > @@ -1868,34 +1837,25 @@ All register values are organized as Little Endian.
    > Writing to these two registers (lower 32 bits of the address
    > to \field{QueueDescLow}, higher 32 bits to \field{QueueDescHigh}) notifies
    > the device about location of the Descriptor Table of the queue
    > - selected by writing to \field{QueueSel} register. The driver MUST NOT
    > - access this register when the queue is in use (so when \field{QueueReady}
    > - is not zero).
    > + selected by writing to \field{QueueSel} register.
    > }
    > \hline
    > \mmiodreg{QueueAvailLow}{QueueAvailHigh}{Virtual queue's Available Ring 64 bit long physical address}{0x090}{0x094}{W}{%
    > Writing to these two registers (lower 32 bits of the address
    > to \field{QueueAvailLow}, higher 32 bits to \field{QueueAvailHigh}) notifies
    > the device about location of the Available Ring of the queue
    > - selected by writing to \field{QueueSel}. The driver MUST NOT
    > - access this register when the queue is in use (so when \field{QueueReady}
    > - is not zero).
    > + selected by writing to \field{QueueSel}.
    > }
    > \hline
    > \mmiodreg{QueueUsedLow}{QueueUsedHigh}{Virtual queue's Used Ring 64 bit long physical address}{0x0a0}{0x0a4}{W}{%
    > Writing to these two registers (lower 32 bits of the address
    > to \field{QueueUsedLow}, higher 32 bits to \field{QueueUsedHigh}) notifies
    > the device about location of the Used Ring of the queue
    > - selected by writing to \field{QueueSel}. The driver MUST NOT
    > - access this register when the queue is in use (so when \field{QueueReady}
    > - is not zero).
    > + selected by writing to \field{QueueSel}.
    > }
    > \hline
    > \mmioreg{ConfigGeneration}{Configuration atomicity value}{0x0fc}{R}{
    > - Changes every time the configuration noticeably changes. This
    > - means the device may only change the value after a configuration
    > - read operation, but it MUST change if there is any risk of a
    > - device seeing an inconsistent configuration state.
    > + Changes every time the configuration noticeably changes (see \ref {sec:Basic Facilities of a Virtio Device / Device Configuration Space}.
    > }
    > \hline
    > \mmioreg{Config}{Configuration space}{0x100+}{RW}{
    > @@ -1906,10 +1866,65 @@ All register values are organized as Little Endian.
    > \hline
    > \end{longtable}
    >
    > +\devicenormative{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
    > +
    > +The device MUST return 0x74726976 in \field{MagicValue}.
    > +
    > +The device MUST return value 0x2 in \field{Version}.
    > +
    > +The device MUST present each event by setting the corresponding bit in \field{InterruptStatus} from the
    > +moment it takes place, until the driver acknowledges the interrupt
    > +by writing a corresponding bit mask to the InterruptACK register. Bits which
    > +do not represent events which took place MUST be zero.
    > +
    > +Upon reset, the device MUST clear all bits in \field{InterruptStatus} and ready bits in the
    > +\field{QueueReady} register for all queues in the device.
    > +
    > +The device MUST change \field{ConfigGeneration} if there is any risk of a
    > +device seeing an inconsistent configuration state, but it MAY only change the value
    > +after a configuration read operation.
    > +
    > +\drivernormative{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
    > +The driver MUST NOT access memory locations not explicitly described in the
    > +table (or, in case of the configuration space, described in the device specification),
    > +MUST NOT write to the read-only registers (direction R) and
    > +MUST NOT read from the write-only registers (direction W).
    > +
    > +The driver MUST ignore a device with \field{MagicValue} which is not 0x74726976,
    > +although it MAY report an error.
    > +
    > +The driver MUST ignore a device with \field{Version} which is not 0x2,
    > +although it MAY report an error.
    > +
    > +The driver MUST ignore a device with \field{DeviceID} 0x0,
    > +but MUST NOT report any error.
    > +
    > +Before reading from \field{DeviceFeatures}, the driver MUST write a value to \field{DeviceFeaturesSel}.
    > +
    > +Before writing to the \field{DriverFeatures} register, the driver MUST write a value to the \field{DriverFeaturesSel} register.
    > +
    > +The driver MUST write a value to \field{QueueNum} which is less than
    > +or equal to the value presented by the device.
    > +
    > +When \field{QueueReady} is not zero, the driver MUST NOT access
    > +\field{QueueNum}, \field{QueueDescLow}, \field{QueueDescHigh},
    > +\field{QueueAvailLow}, \field{QueueAvailHigh}, \field{QueueUsedLow}, \field{QueueUsedHigh}
    > +
    > +To stop using the queue the driver MUST write zero (0x0) to this
    > +\field{QueueReady} and MUST read the value back to ensure
    > +synchronization.
    > +
    > +The driver MUST ignore undefined bits in \field{InterruptStatus}.
    > +
    > +The MUST write the events it handled into \field{InterruptACK} when
    > +it finishes handling an interrupt.
    > +
    > \subsection{MMIO-specific Initialization And Device Operation}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation}
    >
    > \subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
    >
    > +\drivernormative{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
    > +
    > The driver MUST start the device initialization by reading and
    > checking values from \field{MagicValue} and \field{Version}.
    > If both values are valid, it MUST read \field{DeviceID}
    > @@ -1921,7 +1936,7 @@ Further initialization MUST follow the procedure described in
    >
    > \subsubsection{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Virtqueue Configuration}
    >
    > -The driver MUST initialize the virtual queue in the following way:
    > +The driver will typically initialize the virtual queue in the following way:
    >
    > \begin{enumerate}
    > \item Select the queue writing its index (first queue is 0) to
    > @@ -1937,8 +1952,6 @@ The driver MUST initialize the virtual queue in the following way:
    > \item Allocate and zero the queue pages, making sure the memory
    > is physically contiguous. It is recommended to align the
    > Used Ring to an optimal boundary (usually the page size).
    > - Size of the allocated queue MUST be smaller than or equal to
    > - the maximum size returned by the device.
    >
    > \item Notify the device about the queue size by writing the size to
    > \field{QueueNum}.
    > @@ -1954,7 +1967,7 @@ The driver MUST initialize the virtual queue in the following way:
    >
    > \subsubsection{Notifying The Device}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifying The Device}
    >
    > -The driver MUST notify the device about new buffers being available in
    > +The driver notifies the device about new buffers being available in
    > a queue by writing the index of the updated queue to \field{QueueNotify}.
    >
    > \subsubsection{Notifications From The Device}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
    > @@ -1962,10 +1975,11 @@ a queue by writing the index of the updated queue to \field{QueueNotify}.
    > The memory mapped virtio device is using a single, dedicated
    > interrupt signal, which is asserted when at least one of the
    > bits described in the description of \field{InterruptStatus}
    > -is set. This way the device may notify the
    > +is set. This is how the device notifies the
    > driver about a new used buffer being available in the queue
    > or about a change in the device configuration.
    >
    > +\drivernormative{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
    > After receiving an interrupt, the driver MUST read
    > \field{InterruptStatus} to check what caused the interrupt
    > (see the register description). After the interrupt is handled,
    > --
    > 1.8.3.2

    Acked-by: Pawel Moll <pawel.moll@arm.com>

    Cheers!

    Pawel
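
A minimal C sketch of three of the driver-side requirements quoted above: ignoring a device whose MagicValue, Version or DeviceID is unacceptable, stopping a queue by writing zero to QueueReady and reading it back, and acknowledging only the handled, defined bits of InterruptStatus. The register offsets, the 0x74726976 and 0x2 values, and the two interrupt bits come from the register table in the patch. Everything else is an assumption made for illustration: `struct fake_mmio`, `mmio_read`, `mmio_write` and the function names are hypothetical stand-ins for real memory-mapped accesses, and the single QueueReady slot does not model per-queue state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets as listed in the MMIO register table of the patch. */
enum {
    REG_MAGIC_VALUE      = 0x000,
    REG_VERSION          = 0x004,
    REG_DEVICE_ID        = 0x008,
    REG_QUEUE_SEL        = 0x030,
    REG_QUEUE_READY      = 0x044,
    REG_INTERRUPT_STATUS = 0x060,
    REG_INTERRUPT_ACK    = 0x064,
};

#define INT_USED_RING_UPDATE (1u << 0)  /* bit 0 in InterruptStatus */
#define INT_CONFIG_CHANGE    (1u << 1)  /* bit 1 in InterruptStatus */

/* Hypothetical in-memory stand-in for the device's register window. */
struct fake_mmio {
    uint32_t regs[0x100 / 4];
};

static uint32_t mmio_read(const struct fake_mmio *dev, uint32_t off)
{
    return dev->regs[off / 4];
}

static void mmio_write(struct fake_mmio *dev, uint32_t off, uint32_t val)
{
    if (off == REG_INTERRUPT_ACK)
        dev->regs[REG_INTERRUPT_STATUS / 4] &= ~val; /* model: ack clears events */
    else
        dev->regs[off / 4] = val;
}

enum probe_result { PROBE_OK, PROBE_IGNORE, PROBE_IGNORE_SILENTLY };

/* Check MagicValue and Version first, then DeviceID. */
static enum probe_result probe(const struct fake_mmio *dev)
{
    assert(dev != NULL);
    if (mmio_read(dev, REG_MAGIC_VALUE) != 0x74726976u)
        return PROBE_IGNORE;            /* the driver MAY report an error */
    if (mmio_read(dev, REG_VERSION) != 0x2u)
        return PROBE_IGNORE;            /* the driver MAY report an error */
    if (mmio_read(dev, REG_DEVICE_ID) == 0x0u)
        return PROBE_IGNORE_SILENTLY;   /* placeholder: MUST NOT report an error */
    return PROBE_OK;
}

/* Stop a queue: write zero to QueueReady, then read it back. */
static void stop_queue(struct fake_mmio *dev, uint32_t index)
{
    mmio_write(dev, REG_QUEUE_SEL, index);
    mmio_write(dev, REG_QUEUE_READY, 0x0);
    (void)mmio_read(dev, REG_QUEUE_READY); /* read back for synchronization */
}

/* Read InterruptStatus, ignore undefined bits, ack what was handled. */
static uint32_t handle_interrupt(struct fake_mmio *dev)
{
    uint32_t status = mmio_read(dev, REG_INTERRUPT_STATUS);
    uint32_t handled = status & (INT_USED_RING_UPDATE | INT_CONFIG_CHANGE);
    /* A real driver would process the used ring or re-read config here. */
    mmio_write(dev, REG_INTERRUPT_ACK, handled);
    return handled;
}
```

The three probe outcomes are kept separate because the patch allows an error report for a bad MagicValue or Version but forbids one for DeviceID 0x0.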




  • 5.  [PATCH 03/18] Feedback: use proper list in introduction.

    Posted 02-19-2014 06:45
    Also avoid extra spacing before footnote markers.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
    introduction.tex | 13 +++++++------
    1 file changed, 7 insertions(+), 6 deletions(-)

    diff --git a/introduction.tex b/introduction.tex
    index ba57419..146042d 100644
    --- a/introduction.tex
    +++ b/introduction.tex
    @@ -2,22 +2,22 @@
     \input{abstract.tex}
    - Straightforward: Virtio devices use normal bus mechanisms of
    +\begin{description}
    +\item[Straightforward:] Virtio devices use normal bus mechanisms of
     interrupts and DMA which should be familiar to any device driver author. There is no exotic page-flipping or COW mechanism: it's just
    - a normal device.
    -\footnote{This lack of page-sharing implies that the implementation of the
    + a normal device.\footnote{This lack of page-sharing implies that the implementation of the
     device (e.g. the hypervisor or host) needs full access to the guest memory. Communication with untrusted parties (i.e. inter-guest communication) requires copying. }
    - Efficient: Virtio devices consist of rings of descriptors
    +\item[Efficient:] Virtio devices consist of rings of descriptors
     for both input and output, which are neatly laid out to avoid cache effects from both driver and device writing to the same cache lines.
    - Standard: Virtio makes no assumptions about the environment in which
    +\item[Standard:] Virtio makes no assumptions about the environment in which
     it operates, beyond supporting the bus to which device is attached. In this specification, virtio devices are implemented over MMIO, Channel I/O and PCI bus transports
    @@ -27,11 +27,12 @@ between different transports.
     }, earlier drafts have been implemented on other buses not included here.
    - Extensible: Virtio devices contain feature bits which are
    +\item[Extensible:] Virtio devices contain feature bits which are
     acknowledged by the guest operating system during device setup. This allows forwards and backwards compatibility: the device offers all the features it knows about, and the driver acknowledges those it understands and wishes to use.
    +\end{description}
     \section{Terminology}\label{Terminology}
    --
    1.8.3.2


  • 6.  [PATCH 09/18] Feedback: split Basic Facilities feature bits and config space into normative.

    Posted 02-19-2014 06:45
    Split text into descriptive and normative.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
    content.tex | 74 +++++++++++++++++++++++++++++++++++++------------------------
    1 file changed, 45 insertions(+), 29 deletions(-)

    diff --git a/content.tex b/content.tex
    index dd9b8a7..a2b83cd 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -58,18 +58,8 @@ the device.

     This allows for forwards and backwards compatibility: if the device is
     enhanced with a new feature bit, older drivers will not write that
    -feature bit back to the device and it SHOULD go into backwards
    -compatibility mode. Similarly, if a driver is enhanced with a feature
    -that the device doesn't support, it see the new feature is not offered
    -and SHOULD go into backwards compatibility mode (or, for poor
    -implementations it MAY set the FAILED Device Status bit).
    -
    -The driver MUST NOT accept a feature which the device did not offer,
    -and MUST NOT accept a feature which requires another feature which was
    -not accepted.
    -
    -The device MUST NOT offer a feature which requires another feature
    -which was not offered.
    +feature bit back to the device. Similarly, if a driver is enhanced with a feature
    +that the device doesn't support, it see the new feature is not offered.

     Feature bits are allocated as follows:

    @@ -82,14 +72,29 @@ Feature bits are allocated as follows:
     \item[33 and above] Feature bits reserved for future extensions.
     \end{description}

    +\begin{note}
     For example, feature bit 0 for a network device (i.e. Subsystem
     Device ID 1) indicates that the device supports checksumming of
     packets.
    +\end{note}

     In particular, new fields in the device configuration space are
    -indicated by offering a feature bit, so the driver MUST check that the
    -feature is offered before accessing that part of the configuration
    -space.
    +indicated by offering a new feature bit.
    +
    +\drivernormative{Basic Facilities of a Virtio Device / Feature Bits}
    +The driver MUST NOT accept a feature which the device did not offer,
    +and MUST NOT accept a feature which requires another feature which was
    +not accepted.
    +
    +The driver SHOULD go into backwards compatibility mode
    +if the device does not offer a feature it understands, otherwise MUST
    +set the FAILED \field{device status} bit and cease initialization.
    +
    +\devicenormative{Basic Facilities of a Virtio Device / Feature Bits}
    +The device MUST NOT offer a feature which requires another feature
    +which was not offered. The device SHOULD accept any valid subset
    +of features the driver accepts, otherwise it MUST fail to set the
    +FEATURES_OK \field{device status} bit when the driver writes it.

     \subsection{Legacy Interface: A Note on transitions from earlier drafts}\label{sec:Basic Facilities of a Virtio Device / Feature Bits / Legacy Interface: A Note on transitions from earlier drafts}

    @@ -112,16 +117,25 @@ In this case device is used through the legacy interface.
     \section{Device Configuration Space}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space}

     Device configuration space is generally used for rarely-changing or
    -initialization-time parameters. Drivers MUST NOT assume reads from
    -fields greater than 32 bits wide are atomic, nor reads from
    -multiple fields.
    +initialization-time parameters. Where configuration fields are
    +optional, their existence is indicated by feature bits: Future
    +versions of this specification will likely extend the device
    +configuration space by adding extra fields at the tail.

    -Each transport provides a generation count for the device configuration
    -space, which must change whenever there is a possibility that two
    +\begin{note}
    +The device configuration space uses the little-endian format
    +for multi-byte fields.
    +\end{note}
    +
    +Each transport also provides a generation count for the device configuration
    +space, which will change whenever there is a possibility that two
     accesses to the device configuration space can see different versions of that
     space.

    -Thus drivers SHOULD read device configuration space fields like so:
    +\drivernormative{Basic Facilities of a Virtio Device / Device Configuration Space}
    +Drivers MUST NOT assume reads from
    +fields greater than 32 bits wide are atomic, nor reads from
    +multiple fields: drivers SHOULD read device configuration space fields like so:

     \begin{lstlisting}
     u32 before, after;
    @@ -132,23 +146,25 @@ do {
     } while (after != before);
     \end{lstlisting}

    -Note that device configuration space uses the little-endian format
    -for multi-byte fields.
    -
    -Note that future versions of this specification will likely
    -extend the device configuration space for devices by adding extra fields
    -at the tail end of some structures in device configuration space.
    +For optional configuration space fields, the driver MUST check that the
    +corresponding feature is offered before accessing that part of the configuration
    +space.
    +\begin{note}
    +See section \ref{sec:General Initialization And Device Operation / Device Initialization} for details on feature negotiation.
    +\end{note}

    -To allow forward compatibility with such extensions, drivers MUST
    +Drivers MUST
     NOT limit structure size and device configuration space size. Instead,
    -drivers SHOULD only check that device configuration space is *large enough* to
    +drivers SHOULD only check that device configuration space is {\em large enough} to
     contain the fields required for device operation.

    +\begin{note}
     For example, if the specification states that device configuration
     space 'includes a single 8-bit field' drivers should understand this to mean that
     the device configuration space might also include an arbitrary amount of
     tail padding, and accept any device configuration space size equal to or
     greater than the specified 8-bit size.
    +\end{note}

     \subsection{Legacy Interface: A Note on Device Configuration Space endian-ness}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: A Note on Configuration Space endian-ness}

    --
    1.8.3.2
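
The generation-count rule in this patch can be illustrated with a small C sketch. The diff only shows the first and last lines of the spec's listing, so the loop body here is an assumption: a 64-bit field read as two 32-bit halves, retried while the generation count differs before and after. `struct fake_device`, `get_config_generation`, `read_lo`, `read_hi` and the `capacity` field are hypothetical names, and the device is simulated so that it changes its configuration between the two half reads.

```c
#include <stdint.h>

/* Hypothetical simulated device with one 64-bit configuration field. */
struct fake_device {
    uint32_t generation;      /* stands in for the transport's generation count */
    uint32_t capacity_lo;
    uint32_t capacity_hi;
    int racing_updates;       /* how many times the config changes mid-read */
};

static uint32_t get_config_generation(const struct fake_device *dev)
{
    return dev->generation;
}

/* Reading the low half may race with a configuration update. */
static uint32_t read_lo(struct fake_device *dev)
{
    uint32_t lo = dev->capacity_lo;
    if (dev->racing_updates > 0) {
        dev->racing_updates--;
        dev->capacity_lo = 0x00000000u;   /* device rewrites both halves ... */
        dev->capacity_hi += 1;
        dev->generation++;                /* ... and bumps the generation */
    }
    return lo;
}

static uint32_t read_hi(const struct fake_device *dev)
{
    return dev->capacity_hi;
}

/* Retry until both halves were read under the same generation. */
static uint64_t read_capacity(struct fake_device *dev)
{
    uint32_t before, after, lo, hi;

    do {
        before = get_config_generation(dev);
        lo = read_lo(dev);
        hi = read_hi(dev);
        after = get_config_generation(dev);
    } while (after != before);

    return ((uint64_t)hi << 32) | lo;
}
```

Without the retry, the first pass in this simulation would combine the old low half with the new high half, which is the torn read the MUST NOT clause warns about.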


  • 7.  Re: [virtio] [PATCH 09/18] Feedback: split Basic Facilities feature bits and config space into normative.

    Posted 02-26-2014 17:14
    On Wed, Feb 19, 2014 at 05:09:46PM +1030, Rusty Russell wrote:
    > Split text into descriptive and normative.
    >
    > Signed-off-by: Rusty Russell <rusty@au1.ibm.com>

    Thought of a slightly better wording meanwhile.
    I'll commit in a couple of days if there are
    no comments, seems minor to me.

    > ---
    > content.tex | 74 +++++++++++++++++++++++++++++++++++++------------------------
    > 1 file changed, 45 insertions(+), 29 deletions(-)
    >
    > diff --git a/content.tex b/content.tex
    > index dd9b8a7..a2b83cd 100644
    > --- a/content.tex
    > +++ b/content.tex
    > @@ -58,18 +58,8 @@ the device.
    >
    > This allows for forwards and backwards compatibility: if the device is
    > enhanced with a new feature bit, older drivers will not write that
    > -feature bit back to the device and it SHOULD go into backwards
    > -compatibility mode. Similarly, if a driver is enhanced with a feature
    > -that the device doesn't support, it see the new feature is not offered
    > -and SHOULD go into backwards compatibility mode (or, for poor
    > -implementations it MAY set the FAILED Device Status bit).
    > -
    > -The driver MUST NOT accept a feature which the device did not offer,
    > -and MUST NOT accept a feature which requires another feature which was
    > -not accepted.
    > -
    > -The device MUST NOT offer a feature which requires another feature
    > -which was not offered.
    > +feature bit back to the device. Similarly, if a driver is enhanced with a feature
    > +that the device doesn't support, it see the new feature is not offered.
    >
    > Feature bits are allocated as follows:
    >
    > @@ -82,14 +72,29 @@ Feature bits are allocated as follows:
    > \item[33 and above] Feature bits reserved for future extensions.
    > \end{description}
    >
    > +\begin{note}
    > For example, feature bit 0 for a network device (i.e. Subsystem
    > Device ID 1) indicates that the device supports checksumming of
    > packets.
    > +\end{note}
    >
    > In particular, new fields in the device configuration space are
    > -indicated by offering a feature bit, so the driver MUST check that the
    > -feature is offered before accessing that part of the configuration
    > -space.
    > +indicated by offering a new feature bit.
    > +
    > +\drivernormative{Basic Facilities of a Virtio Device / Feature Bits}
    > +The driver MUST NOT accept a feature which the device did not offer,
    > +and MUST NOT accept a feature which requires another feature which was
    > +not accepted.
    > +
    > +The driver SHOULD go into backwards compatibility mode
    > +if the device does not offer a feature it understands, otherwise MUST
    > +set the FAILED \field{device status} bit and cease initialization.
    > +

    Actually I think that what this means is this:

    Backward compatibility:
    With the exception of VIRTIO_F_VERSION_1, the driver SHOULD NOT
    cause initialization to fail unless it is impossible for the driver
    to support the device. The driver SHOULD support devices offering
    any subset of features documented in this specification.

    Forward compatibility:
    The driver MAY accept a subset of features that it recognizes,
    out of the set offered by the device. The driver MUST ignore any
    feature bits offered by the device and not described in this
    specification.

    > +\devicenormative{Basic Facilities of a Virtio Device / Feature Bits}
    > +The device MUST NOT offer a feature which requires another feature
    > +which was not offered. The device SHOULD accept any valid subset
    > +of features the driver accepts,
    > otherwise it MUST fail to set the
    > +FEATURES_OK \field{device status} bit when the driver writes it.

    a bit vague here
    I think this actually means:

    The device SHOULD support the driver accepting any valid subset
    of features offered by the device.

    Upon detecting driver write of \field{device status} with
    FEATURES_OK bit set, the device MUST set the FEATURES_OK bit in
    \field{device status} if it supports the subset of features
    accepted by driver.

    Upon detecting driver write of \field{device status} with
    FEATURES_OK bit set, the device MUST clear the FEATURES_OK bit in
    \field{device status} if it does not support the subset of
    features accepted by driver.

    >
    > \subsection{Legacy Interface: A Note on transitions from earlier drafts}\label{sec:Basic Facilities of a Virtio Device / Feature Bits / Legacy Interface: A Note on transitions from earlier drafts}
    >
    > @@ -112,16 +117,25 @@ In this case device is used through the legacy interface.
    > \section{Device Configuration Space}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
    >
    > Device configuration space is generally used for rarely-changing or
    > -initialization-time parameters. Drivers MUST NOT assume reads from
    > -fields greater than 32 bits wide are atomic, nor reads from
    > -multiple fields.
    > +initialization-time parameters. Where configuration fields are
    > +optional, their existence is indicated by feature bits: Future
    > +versions of this specification will likely extend the device
    > +configuration space by adding extra fields at the tail.
    >
    > -Each transport provides a generation count for the device configuration
    > -space, which must change whenever there is a possibility that two
    > +\begin{note}
    > +The device configuration space uses the little-endian format
    > +for multi-byte fields.
    > +\end{note}
    > +
    > +Each transport also provides a generation count for the device configuration
    > +space, which will change whenever there is a possibility that two
    > accesses to the device configuration space can see different versions of that
    > space.
    >
    > -Thus drivers SHOULD read device configuration space fields like so:
    > +\drivernormative{Basic Facilities of a Virtio Device / Device Configuration Space}
    > +Drivers MUST NOT assume reads from
    > +fields greater than 32 bits wide are atomic, nor reads from
    > +multiple fields: drivers SHOULD read device configuration space fields like so:
    >
    > \begin{lstlisting}
    > u32 before, after;
    > @@ -132,23 +146,25 @@ do {
    > } while (after != before);
    > \end{lstlisting}
    >
    > -Note that device configuration space uses the little-endian format
    > -for multi-byte fields.
    > -
    > -Note that future versions of this specification will likely
    > -extend the device configuration space for devices by adding extra fields
    > -at the tail end of some structures in device configuration space.
    > +For optional configuration space fields, the driver MUST check that the
    > +corresponding feature is offered before accessing that part of the configuration
    > +space.
    > +\begin{note}
    > +See section \ref{sec:General Initialization And Device Operation / Device Initialization} for details on feature negotiation.
    > +\end{note}
    >
    > -To allow forward compatibility with such extensions, drivers MUST
    > +Drivers MUST
    > NOT limit structure size and device configuration space size. Instead,
    > -drivers SHOULD only check that device configuration space is *large enough* to
    > +drivers SHOULD only check that device configuration space is {\em large enough} to
    > contain the fields required for device operation.

    Actually I would even make it MAY

    >
    > +\begin{note}
    > For example, if the specification states that device configuration
    > space 'includes a single 8-bit field' drivers should understand this to mean that
    > the device configuration space might also include an arbitrary amount of
    > tail padding, and accept any device configuration space size equal to or
    > greater than the specified 8-bit size.
    > +\end{note}
    >
    > \subsection{Legacy Interface: A Note on Device Configuration Space endian-ness}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: A Note on Configuration Space endian-ness}
    >
    > --
    > 1.8.3.2
    >
    >
    > ---------------------------------------------------------------------
    > To unsubscribe from this mail list, you must leave the OASIS TC that
    > generates this mail. Follow this link to all your TCs in OASIS at:
    > https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php


  • 8.  Re: [virtio] [PATCH 09/18] Feedback: split Basic Facilities feature bits and config space into normative.

    Posted 02-26-2014 17:20
    On Wed, Feb 19, 2014 at 05:09:46PM +1030, Rusty Russell wrote:
    > Split text into descriptive and normative.
    >
    > Signed-off-by: Rusty Russell <rusty@au1.ibm.com>

    Thought of a slightly better wording meanwhile.
    I'll commit in a couple of days if there are
    no comments, seems minor to me.

    > ---
    > content.tex | 74 +++++++++++++++++++++++++++++++++++++------------------------
    > 1 file changed, 45 insertions(+), 29 deletions(-)
    >
    > diff --git a/content.tex b/content.tex
    > index dd9b8a7..a2b83cd 100644
    > --- a/content.tex
    > +++ b/content.tex
    > @@ -58,18 +58,8 @@ the device.
    >
    > This allows for forwards and backwards compatibility: if the device is
    > enhanced with a new feature bit, older drivers will not write that
    > -feature bit back to the device and it SHOULD go into backwards
    > -compatibility mode. Similarly, if a driver is enhanced with a feature
    > -that the device doesn't support, it see the new feature is not offered
    > -and SHOULD go into backwards compatibility mode (or, for poor
    > -implementations it MAY set the FAILED Device Status bit).
    > -
    > -The driver MUST NOT accept a feature which the device did not offer,
    > -and MUST NOT accept a feature which requires another feature which was
    > -not accepted.
    > -
    > -The device MUST NOT offer a feature which requires another feature
    > -which was not offered.
    > +feature bit back to the device. Similarly, if a driver is enhanced with a feature
    > +that the device doesn't support, it see the new feature is not offered.
    >
    > Feature bits are allocated as follows:
    >
    > @@ -82,14 +72,29 @@ Feature bits are allocated as follows:
    > \item[33 and above] Feature bits reserved for future extensions.
    > \end{description}
    >
    > +\begin{note}
    > For example, feature bit 0 for a network device (i.e. Subsystem
    > Device ID 1) indicates that the device supports checksumming of
    > packets.
    > +\end{note}
    >
    > In particular, new fields in the device configuration space are
    > -indicated by offering a feature bit, so the driver MUST check that the
    > -feature is offered before accessing that part of the configuration
    > -space.
    > +indicated by offering a new feature bit.
    > +
    > +\drivernormative{Basic Facilities of a Virtio Device / Feature Bits}
    > +The driver MUST NOT accept a feature which the device did not offer,
    > +and MUST NOT accept a feature which requires another feature which was
    > +not accepted.
    > +
    > +The driver SHOULD go into backwards compatibility mode
    > +if the device does not offer a feature it understands, otherwise MUST
    > +set the FAILED \field{device status} bit and cease initialization.
    > +

    Actually I think that what this means is this:

    Backward compatibility:

    With the exception of VIRTIO_F_VERSION_1,
    the driver SHOULD NOT cause initialization to fail
    unless it is impossible for the driver to support
    the device.
    The driver SHOULD support devices offering
    any subset of features
    documented in this specification.

    Forward compatibility:

    The driver MAY accept a subset of features that it recognizes,
    out of the set offered by the device.
    The driver MUST ignore any feature bits offered by the
    device and not described in this specification.



    > +\devicenormative{Basic Facilities of a Virtio Device / Feature Bits}
    > +The device MUST NOT offer a feature which requires another feature
    > +which was not offered. The device SHOULD accept any valid subset
    > +of features the driver accepts, otherwise it MUST fail to set the
    > +FEATURES_OK \field{device status} bit when the driver writes it.

    a bit vague here

    I think this actually means:

    The device SHOULD support the driver accepting any valid
    subset of features offered by the device.

    Upon detecting driver write of
    \field{device status} with FEATURES_OK
    bit set,
    the device MUST set the FEATURES_OK bit
    in \field{device status} if it supports
    the subset of features accepted by driver.

    Upon detecting driver write of
    \field{device status} with FEATURES_OK
    bit set,
    the device MUST clear the FEATURES_OK bit
    in \field{device status} if it does not support
    the subset of features accepted by driver.
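The negotiation rules proposed above can be sketched in C. This is only an illustration, not text from the patch: the function names, the `feature_dep` table and the `STATUS_FEATURES_OK` value of 0x08 are assumptions made for the example. The driver accepts only offered bits it recognizes and drops any feature whose prerequisite was not accepted; the device keeps FEATURES_OK set only if it can support the accepted subset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of the FEATURES_OK bit in the device status field. */
#define STATUS_FEATURES_OK 0x08u

/* Hypothetical dependency entry: bit `feature` requires bit `needs`. */
struct feature_dep {
    unsigned feature;
    unsigned needs;
};

static bool has_bit(uint64_t set, unsigned bit)
{
    return (set >> bit) & 1u;
}

/* True if every feature in `set` also has the feature it requires. */
static bool deps_satisfied(uint64_t set, const struct feature_dep *deps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (has_bit(set, deps[i].feature) && !has_bit(set, deps[i].needs))
            return false;
    return true;
}

/* Driver side: ignore unknown bits, never accept a bit that was not
 * offered, and drop features whose prerequisite was not accepted. */
uint64_t driver_select_features(uint64_t offered, uint64_t understood,
                                const struct feature_dep *deps, size_t n)
{
    uint64_t accepted = offered & understood;
    bool changed;

    do {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (has_bit(accepted, deps[i].feature) &&
                !has_bit(accepted, deps[i].needs)) {
                accepted &= ~(UINT64_C(1) << deps[i].feature);
                changed = true;
            }
        }
    } while (changed);
    return accepted;
}

/* Device side: called when the driver writes the status field. Returns the
 * status value the device presents afterwards. */
uint8_t device_on_status_write(uint8_t written, uint64_t offered,
                               uint64_t accepted,
                               const struct feature_dep *deps, size_t n)
{
    if (!(written & STATUS_FEATURES_OK))
        return written;

    bool ok = (accepted & ~offered) == 0 && deps_satisfied(accepted, deps, n);
    return ok ? written : (uint8_t)(written & ~STATUS_FEATURES_OK);
}
```

This model device rejects only subsets that break the rules. A real device may refuse further combinations, which is why the wording above is SHOULD rather than MUST.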



    >
    > \subsection{Legacy Interface: A Note on transitions from earlier drafts}\label{sec:Basic Facilities of a Virtio Device / Feature Bits / Legacy Interface: A Note on transitions from earlier drafts}
    >
    > @@ -112,16 +117,25 @@ In this case device is used through the legacy interface.
    > \section{Device Configuration Space}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
    >
    > Device configuration space is generally used for rarely-changing or
    > -initialization-time parameters. Drivers MUST NOT assume reads from
    > -fields greater than 32 bits wide are atomic, nor reads from
    > -multiple fields.
    > +initialization-time parameters. Where configuration fields are
    > +optional, their existence is indicated by feature bits: Future
    > +versions of this specification will likely extend the device
    > +configuration space by adding extra fields at the tail.
    >
    > -Each transport provides a generation count for the device configuration
    > -space, which must change whenever there is a possibility that two
    > +\begin{note}
    > +The device configuration space uses the little-endian format
    > +for multi-byte fields.
    > +\end{note}
    > +
    > +Each transport also provides a generation count for the device configuration
    > +space, which will change whenever there is a possibility that two
    > accesses to the device configuration space can see different versions of that
    > space.
    >
    > -Thus drivers SHOULD read device configuration space fields like so:
    > +\drivernormative{Basic Facilities of a Virtio Device / Device Configuration Space}
    > +Drivers MUST NOT assume reads from
    > +fields greater than 32 bits wide are atomic, nor reads from
    > +multiple fields: drivers SHOULD read device configuration space fields like so:
    >
    > \begin{lstlisting}
    > u32 before, after;
    > @@ -132,23 +146,25 @@ do {
    > } while (after != before);
    > \end{lstlisting}
    >
    > -Note that device configuration space uses the little-endian format
    > -for multi-byte fields.
    > -
    > -Note that future versions of this specification will likely
    > -extend the device configuration space for devices by adding extra fields
    > -at the tail end of some structures in device configuration space.
    > +For optional configuration space fields, the driver MUST check that the
    > +corresponding feature is offered before accessing that part of the configuration
    > +space.
    > +\begin{note}
    > +See section \ref{sec:General Initialization And Device Operation / Device Initialization} for details on feature negotiation.
    > +\end{note}
    >
    > -To allow forward compatibility with such extensions, drivers MUST
    > +Drivers MUST
    > NOT limit structure size and device configuration space size. Instead,
    > -drivers SHOULD only check that device configuration space is *large enough* to
    > +drivers SHOULD only check that device configuration space is {\em large enough} to
    > contain the fields required for device operation.

    Actually I would even make it MAY

    >
    > +\begin{note}
    > For example, if the specification states that device configuration
    > space 'includes a single 8-bit field' drivers should understand this to mean that
    > the device configuration space might also include an arbitrary amount of
    > tail padding, and accept any device configuration space size equal to or
    > greater than the specified 8-bit size.
    > +\end{note}
    >
    > \subsection{Legacy Interface: A Note on Device Configuration Space endian-ness}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: A Note on Configuration Space endian-ness}
    >
    > --
    > 1.8.3.2
    >
    >
    > ---------------------------------------------------------------------
    > To unsubscribe from this mail list, you must leave the OASIS TC that
    > generates this mail. Follow this link to all your TCs in OASIS at:
    > https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
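For readers following the configuration space discussion, here is a minimal C sketch of the generation-count retry loop applied to a 64-bit field read as two 32-bit halves. The `cfg_ops` callbacks and the fake device are hypothetical, invented for this example; a real transport supplies its own accessors. `read32` is assumed to return the value already converted from little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transport accessors (not defined by the patch). */
struct cfg_ops {
    uint32_t (*generation)(void *dev);
    uint32_t (*read32)(void *dev, unsigned offset);
};

/* Read a 64-bit little-endian field as two 32-bit reads, retrying until
 * the generation count is unchanged across both reads. */
uint64_t cfg_read64(const struct cfg_ops *ops, void *dev, unsigned offset)
{
    uint32_t before, after, lo, hi;

    do {
        before = ops->generation(dev);
        lo = ops->read32(dev, offset);      /* low word at the lower offset */
        hi = ops->read32(dev, offset + 4);
        after = ops->generation(dev);
    } while (after != before);

    return ((uint64_t)hi << 32) | lo;
}

/* Fake device for demonstration: it changes the field once, right after
 * the low word has been read, and bumps the generation count. */
struct fake_dev {
    uint32_t gen;
    uint32_t words[2];
    int updates_pending;
};

static uint32_t fake_generation(void *d)
{
    return ((struct fake_dev *)d)->gen;
}

static uint32_t fake_read32(void *d, unsigned offset)
{
    struct fake_dev *f = d;
    uint32_t v = f->words[offset / 4];

    if (f->updates_pending > 0 && offset == 0) {
        f->updates_pending--;
        f->words[0] = 0x11111111u;
        f->words[1] = 0x22222222u;
        f->gen++;
    }
    return v;
}

const struct cfg_ops fake_ops = { fake_generation, fake_read32 };
```

Without the loop, the first pass on the fake device would combine the old low word with the new high word. The retry discards that torn value and returns the consistent one.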



  • 9.  Re: [virtio] [PATCH 09/18] Feedback: split Basic Facilities feature bits and config space into normative.

    Posted 02-27-2014 00:38
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > On Wed, Feb 19, 2014 at 05:09:46PM +1030, Rusty Russell wrote:
    >> Split text into descriptive and normative.
    >>
    >> Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    >
    > Thought of a slightly better wording meanwhile.
    > I'll commit in a couple of days if there are
    > no comments, seems minor to me.

    Agreed with all the comments.

    Thanks,
    Rusty.




  • 10.  Re: [virtio] [PATCH 09/18] Feedback: split Basic Facilities feature bits and config space into normative.

    Posted 02-27-2014 00:40
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > On Wed, Feb 19, 2014 at 05:09:46PM +1030, Rusty Russell wrote:
    >> Split text into descriptive and normative.
    >>
    >> Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    >
    > Thought of a slightly better wording meanwhile.
    > I'll commit in a couple of days if there are
    > no comments, seems minor to me.

    Agreed with all the comments.

    Thanks,
    Rusty.


  • 11.  [PATCH 02/18] Feedback: move new device design section to Appendix.

    Posted 02-19-2014 06:45
    It's non-normative.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
     content.tex   | 67 -----------------------------------------------------------
     main.tex      |  2 ++
     newdevice.tex | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     3 files changed, 68 insertions(+), 67 deletions(-)
     create mode 100644 newdevice.tex

    diff --git a/content.tex b/content.tex
    index b9b234d..511fda1 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -4356,70 +4356,3 @@ Legacy or transitional devices may offer the following:
       for experimental early versions of virtio which did not perform
       correct feature negotiation, and should not be used.
     \end{description}
    -
    -\chapter{Creating New Device Types}\label{sec:Creating New Device Types}
    -
    -Various considerations are necessary when creating a new device
    -type.
    -
    -\section{How Many Virtqueues?}\label{sec:Creating New Device Types / How Many Virtqueues?}
    -
    -It is possible that a very simple device will operate entirely
    -through its device configuration space, but most will need at least one
    -virtqueue in which it will place requests. A device with both
    -input and output (eg. console and network devices described here)
    -need two queues: one which the driver fills with buffers to
    -receive input, and one which the driver places buffers to
    -transmit output.
    -
    -\section{What Device Configuration Space Layout?}\label{sec:Creating New Device Types / What Device Configuration Space Layout?}
    -
    -Device configuration space should only be used for initialization-time
    -parameters. It is a limited resource with no synchronization between
    -field written by the driver, so for most uses it is better to use a virtqueue to update
    -configuration information (the network device does this for filtering,
    -otherwise the table in the config space could potentially be very
    -large).
    -
    -Devices must not assume that configuration fields over 32 bits wide
    -are atomically writable by the driver.
    -
    -\section{What Device Number?}\label{sec:Creating New Device Types / What Device Number?}
    -
    -Device numbers can be reserved by the OASIS committee: email
    -virtio-dev@lists.oasis-open.org to secure a unique one.
    -
    -Meanwhile for experimental drivers, use 65535 and work backwards.
    -
    -\section{How many MSI-X vectors? (for PCI)}\label{sec:Creating New Device Types / How many MSI-X vectors? (for PCI)}
    -
    -Using the optional MSI-X capability devices can speed up
    -interrupt processing by removing the need to read ISR Status
    -register by guest driver (which might be an expensive operation),
    -reducing interrupt sharing between devices and queues within the
    -device, and handling interrupts from multiple CPUs. However, some
    -systems impose a limit (which might be as low as 256) on the
    -total number of MSI-X vectors that can be allocated to all
    -devices. Devices and/or drivers should take this into
    -account, limiting the number of vectors used unless the device is
    -expected to cause a high volume of interrupts. Devices can
    -control the number of vectors used by limiting the MSI-X Table
    -Size or not presenting MSI-X capability in PCI configuration
    -space. Drivers can control this by mapping events to as small
    -number of vectors as possible, or disabling MSI-X capability
    -altogether.
    -
    -\section{Device Improvements}\label{sec:Creating New Device Types / Device Improvements}
    -
    -Any change to device configuration space, or new virtqueues, or
    -behavioural changes, should be indicated by negotiation of a new
    -feature bit. This establishes clarity\footnote{Even if it does mean documenting design or implementation
    -mistakes!
    -} and avoids future expansion problems.
    -
    -Clusters of functionality which are always implemented together
    -can use a single bit, but if one feature makes sense without the
    -others they should not be gratuitously grouped together to
    -conserve feature bits.
    -
    -
    diff --git a/main.tex b/main.tex
    index 314b0de..b1913d6 100644
    --- a/main.tex
    +++ b/main.tex
    @@ -40,6 +40,8 @@ \input{headerfile.tex}
    +\input{newdevice.tex}
    +
     % acknowledgements
     \input{acknowledgements.tex}
    diff --git a/newdevice.tex b/newdevice.tex
    new file mode 100644
    index 0000000..5e07b79
    --- /dev/null
    +++ b/newdevice.tex
    @@ -0,0 +1,66 @@
    +\chapter{Creating New Device Types}\label{sec:Creating New Device Types}
    +
    +Various considerations are necessary when creating a new device
    +type.
    +
    +\section{How Many Virtqueues?}\label{sec:Creating New Device Types / How Many Virtqueues?}
    +
    +It is possible that a very simple device will operate entirely
    +through its device configuration space, but most will need at least one
    +virtqueue in which it will place requests. A device with both
    +input and output (eg. console and network devices described here)
    +need two queues: one which the driver fills with buffers to
    +receive input, and one which the driver places buffers to
    +transmit output.
    +
    +\section{What Device Configuration Space Layout?}\label{sec:Creating New Device Types / What Device Configuration Space Layout?}
    +
    +Device configuration space should only be used for initialization-time
    +parameters. It is a limited resource with no synchronization between
    +field written by the driver, so for most uses it is better to use a virtqueue to update
    +configuration information (the network device does this for filtering,
    +otherwise the table in the config space could potentially be very
    +large).
    +
    +Devices must not assume that configuration fields over 32 bits wide
    +are atomically writable by the driver.
    +
    +\section{What Device Number?}\label{sec:Creating New Device Types / What Device Number?}
    +
    +Device numbers can be reserved by the OASIS committee: email
    +virtio-dev@lists.oasis-open.org to secure a unique one.
    +
    +Meanwhile for experimental drivers, use 65535 and work backwards.
    +
    +\section{How many MSI-X vectors? (for PCI)}\label{sec:Creating New Device Types / How many MSI-X vectors? (for PCI)}
    +
    +Using the optional MSI-X capability devices can speed up
    +interrupt processing by removing the need to read ISR Status
    +register by guest driver (which might be an expensive operation),
    +reducing interrupt sharing between devices and queues within the
    +device, and handling interrupts from multiple CPUs. However, some
    +systems impose a limit (which might be as low as 256) on the
    +total number of MSI-X vectors that can be allocated to all
    +devices. Devices and/or drivers should take this into
    +account, limiting the number of vectors used unless the device is
    +expected to cause a high volume of interrupts. Devices can
    +control the number of vectors used by limiting the MSI-X Table
    +Size or not presenting MSI-X capability in PCI configuration
    +space. Drivers can control this by mapping events to as small
    +number of vectors as possible, or disabling MSI-X capability
    +altogether.
    +
    +\section{Device Improvements}\label{sec:Creating New Device Types / Device Improvements}
    +
    +Any change to device configuration space, or new virtqueues, or
    +behavioural changes, should be indicated by negotiation of a new
    +feature bit. This establishes clarity\footnote{Even if it does mean documenting design or implementation
    +mistakes!
    +} and avoids future expansion problems.
    +
    +Clusters of functionality which are always implemented together
    +can use a single bit, but if one feature makes sense without the
    +others they should not be gratuitously grouped together to
    +conserve feature bits.
    +
    +
    --
    1.8.3.2


  • 12.  [PATCH 05/18] Feedback: hoist the one legacy-related requirement out of legacy section.

    Posted 02-19-2014 06:45
    This requirement applies to any system which *did* have legacy drivers.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
     content.tex | 26 ++++++++++++++------------
     1 file changed, 14 insertions(+), 12 deletions(-)

    diff --git a/content.tex b/content.tex
    index 511fda1..e6ebd1d 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -1295,6 +1295,20 @@ As a prerequisite to device initialization, the driver scans the
     PCI capability list, detecting virtio configuration layout using Virtio
     Structure PCI capabilities.

    +\paragraph{Non-transitional Device With Legacy Driver}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Non-transitional Device With Legacy Driver}
    +
    +Non-transitional devices, on a platform where a legacy driver for
    +a legacy device with the same ID might have previously existed,
    +MUST take the following steps to fail gracefully when a legacy
    +driver attempts to drive them:
    +
    +\begin{enumerate}
    +\item Present an I/O BAR in BAR0, and
    +\item Respond to a single-byte zero write to offset 18
    +  (corresponding to Device Status register in the legacy layout)
    +  of BAR0 by presenting zeroes on every BAR and ignoring writes.
    +\end{enumerate}
    +
     \subparagraph{Legacy Interface: A Note on Device Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection / Legacy Interface: A Note on Device Layout Detection}

     Legacy drivers skipped the Device Layout Detection step, assuming legacy
    @@ -1317,18 +1331,6 @@ Capabilities on the capability list.
     If these are not present, driver should assume a legacy device, and
     fail gracefully.

    -Non-transitional devices, on a platform where a legacy driver for
    -a legacy device with the same ID might have previously existed,
    -MUST take the following steps to fail gracefully when a legacy
    -driver attempts to drive them:
    -
    -\begin{enumerate}
    -\item Present an I/O BAR in BAR0, and
    -\item Respond to a single-byte zero write to offset 18
    -  (corresponding to Device Status register in the legacy layout)
    -  of BAR0 by presenting zeroes on every BAR and ignoring writes.
    -\end{enumerate}
    -
     \paragraph{MSI-X Vector Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}

     When MSI-X capability is present and enabled in the device
    --
    1.8.3.2
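The fail-gracefully requirement that this patch hoists can be modelled on the device side roughly as follows. This is one possible reading, written as a toy model: the `nt_device` structure, the 64-byte BAR size and the function names are all assumptions for the example. The model takes "presenting zeroes on every BAR" to mean that every later read returns zero. How long that state lasts (for instance until a bus reset) is not stated in the patch and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LEGACY_STATUS_OFFSET 18u /* Device Status in the legacy layout (from the patch) */
#define NUM_BARS 6u              /* a PCI type 0 header has six BARs */
#define BAR_SIZE 64u             /* hypothetical size, for the model only */

struct nt_device {
    bool disabled;               /* set once a legacy driver has been detected */
    uint8_t bar[NUM_BARS][BAR_SIZE];
};

/* Handle a write of `size` bytes (1, 2 or 4) to a BAR. */
void nt_bar_write(struct nt_device *dev, unsigned bar, unsigned offset,
                  unsigned size, uint32_t value)
{
    assert(bar < NUM_BARS && size <= 4 && offset + size <= BAR_SIZE);

    if (dev->disabled)
        return; /* ignore all writes from now on */

    /* A single-byte zero write to offset 18 of BAR0 is what a legacy
     * driver does when it resets the device through the status register. */
    if (bar == 0 && offset == LEGACY_STATUS_OFFSET && size == 1 &&
        (value & 0xffu) == 0) {
        dev->disabled = true;
        return;
    }

    for (unsigned i = 0; i < size; i++)
        dev->bar[bar][offset + i] = (uint8_t)(value >> (8 * i));
}

/* Read one byte from a BAR; a disabled device presents zeroes everywhere. */
uint8_t nt_bar_read8(const struct nt_device *dev, unsigned bar, unsigned offset)
{
    assert(bar < NUM_BARS && offset < BAR_SIZE);
    return dev->disabled ? 0 : dev->bar[bar][offset];
}
```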


  • 13.  [PATCH 06/18] Feedback: move legacy/transitional definitions into terminology.

    Posted 02-19-2014 06:45
    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
     content.tex      | 50 ++++++++------------------------------------------
     introduction.tex | 27 +++++++++++++++++++++++++++
     2 files changed, 35 insertions(+), 42 deletions(-)

    diff --git a/content.tex b/content.tex
    index e6ebd1d..13e7749 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -88,45 +88,15 @@ space.

     \subsection{Legacy Interface: A Note on transitions from earlier drafts}\label{sec:Basic Facilities of a Virtio Device / Feature Bits / Legacy Interface: A Note on transitions from earlier drafts}

    -Earlier drafts of this specification (up to 0.9.X) defined a similar, but
    -different interface between the hypervisor and the guest.
    -Since these are widely deployed, this specification
    -accommodates optional features to simplify transition
    -from these earlier draft interfaces. Specifically:
    +Careful consideration has been given on the transition from the older
    +\hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]} specification to
    +this one. Advice pertaining to transitional devices and drivers
    +is contained in sections named 'Legacy Interface' like this one.

    -\begin{description}
    -\item[Legacy Interface]
    -  is an interface specified by an earlier draft of this specification
    -  (up to 0.9.X)
    -\item[Legacy Device]
    -  is a device implemented before this specification was released,
    -  and implementing a legacy interface on the host side
    -\item[Legacy Driver]
    -  is a driver implemented before this specification was released,
    -  and implementing a legacy interface on the guest side
    -\end{description}
    -
    -Legacy devices and legacy drivers are not compliant with this
    -specification.
    -
    -To simplify transition from these earlier draft interfaces,
    -it is possible to implement:
    -
    -\begin{description}
    -\item[Transitional Device]
    -  a device supporting both drivers conforming to this
    -  specification, and allowing legacy drivers.
    -
    -\item[Transitional Driver]
    -  a driver supporting both devices conforming to this
    -  specification, and legacy devices.
    -\end{description}
    -
    -Transitional devices and transitional drivers can be compliant with
    -this specification (ie. when not operating in legacy mode).
    -
    -Devices or drivers with no legacy compatibility are referred to as
    -non-transitional devices and drivers, respectively.
    +\begin{note}
    +  No legacy interfaces are required; ie. don't implement them unless you
    +  have a need for backwards compatibility!
    +\end{note}

     Transitional Drivers can detect Legacy Devices by detecting that
     the feature bit VIRTIO_F_VERSION_1 is not offered.
    @@ -134,10 +104,6 @@ Transitional devices can detect Legacy drivers by detecting that
     VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
     In this case device is used through the legacy interface.

    -To make them easier to locate, specification sections documenting
    -these transitional features are explicitly marked with 'Legacy
    -Interface' in the section title.
    -
     \section{Device Configuration Space}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space}

     Device configuration space is generally used for rarely-changing or
    diff --git a/introduction.tex b/introduction.tex
    index 4ea4259..65392d9 100644
    --- a/introduction.tex
    +++ b/introduction.tex
    @@ -38,6 +38,33 @@ between different transports.

     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in \hyperref[intro:rfc2119]{[RFC2119]}.

    +An older specification (see \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}) defined a
    +similar, but different interface between the hypervisor and the guest.
    +To simplify transition and note differences, the following terms are used:
    +
    +\begin{description}
    +\item[Legacy Interface]
    +  An interface specified by \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}.
    +\item[Legacy Device]
    +  A device which implements \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}, but not this specification.
    +\item[Legacy Driver]
    +  A driver which implements \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}, but not this specification.
    +\item[Transitional Device]
    +  A device which implements both this specification and the older
    +  \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
    +  specification, thus allowing legacy drivers.
    +\item[Transitional Driver]
    +  A device which implements both this specification and the older
    +  \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
    +  specification, thus allowing legacy devices.
    +\item[Non-Transitional Device]
    +  A device which does not implement the
    +  \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]} specification.
    +\item[Non-Transitional Driver]
    +  A driver which does not implement the
    +  \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]} specification.
    +\end{description}
    +
     \section{Normative References}

     \begin{longtable}{l p{5in}}
    --
    1.8.3.2
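The detection rule kept in content.tex by this patch (a transitional driver checks whether VIRTIO_F_VERSION_1 is offered, a transitional device checks whether it was acknowledged) comes down to a one-bit test. A small C sketch follows; the enum and function names are hypothetical, and the bit position 32 is taken from the VIRTIO specification rather than from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of VIRTIO_F_VERSION_1 (not stated in this patch). */
#define VIRTIO_F_VERSION_1 32

/* Hypothetical result type for the sketch. */
enum virtio_iface { VIRTIO_IFACE_LEGACY, VIRTIO_IFACE_MODERN };

static bool has_feature(uint64_t bits, unsigned bit)
{
    return (bits >> bit) & 1u;
}

/* Transitional driver: a device that does not offer VIRTIO_F_VERSION_1
 * is a legacy device and is driven through the legacy interface. */
enum virtio_iface transitional_driver_iface(uint64_t device_offered)
{
    return has_feature(device_offered, VIRTIO_F_VERSION_1)
               ? VIRTIO_IFACE_MODERN
               : VIRTIO_IFACE_LEGACY;
}

/* Transitional device: a driver that did not acknowledge
 * VIRTIO_F_VERSION_1 is a legacy driver. */
enum virtio_iface transitional_device_iface(uint64_t driver_acked)
{
    return has_feature(driver_acked, VIRTIO_F_VERSION_1)
               ? VIRTIO_IFACE_MODERN
               : VIRTIO_IFACE_LEGACY;
}
```

A non-transitional driver or device would not make this choice at all. It has no legacy path to fall back to, so it fails instead.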


  • 14.  [PATCH 16/18] Feedback: net: separate normative and instructional text.

    Posted 02-19-2014 06:45
    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
     content.tex      | 263 +++++++++++++++++++++++++++++++++++++------------------
     introduction.tex |   4 +
     2 files changed, 182 insertions(+), 85 deletions(-)

    diff --git a/content.tex b/content.tex
    index e2cf1b7..4b00e6b 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -2716,9 +2716,10 @@ features.
     \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}

     \begin{description}
    -\item[VIRTIO_NET_F_CSUM (0)] Device handles packets with partial checksum
    +\item[VIRTIO_NET_F_CSUM (0)] Device handles packets with partial checksum. This
    +  “checksum offload” is a common feature on modern network cards.

    -\item[VIRTIO_NET_F_GUEST_CSUM (1)] Driver handles packets with partial checksum
    +\item[VIRTIO_NET_F_GUEST_CSUM (1)] Driver handles packets with partial checksum.

     \item[VIRTIO_NET_F_CTRL_GUEST_OFFLOADS (2)] Control channel offloads
       reconfiguration support.
    @@ -2762,6 +2763,29 @@ features.
       channel.
     \end{description}

    +\subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
    +
    +Some networking feature bits require other networking feature bits
    +(see \ref{drivernormative:Basic Facilities of a Virtio Device / Feature Bits}):
    +
    +\begin{description}
    +\item[VIRTIO_NET_F_GUEST_TSO4] Requires VIRTIO_NET_F_GUEST_CSUM.
    +\item[VIRTIO_NET_F_GUEST_TSO6] Requires VIRTIO_NET_F_GUEST_CSUM.
    +\item[VIRTIO_NET_F_GUEST_ECN] Requires VIRTIO_NET_F_GUEST_TSO4 or VIRTIO_NET_F_GUEST_TSO6.
    +\item[VIRTIO_NET_F_GUEST_UFO] Requires VIRTIO_NET_F_GUEST_CSUM.
    +
    +\item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM.
    +\item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM.
    +\item[VIRTIO_NET_F_HOST_ECN] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
    +\item[VIRTIO_NET_F_HOST_UFO] Requires VIRTIO_NET_F_CSUM.
    +
    +\item[VIRTIO_NET_F_CTRL_RX] Requires VIRTIO_NET_F_CTRL_VQ.
    +\item[VIRTIO_NET_F_CTRL_VLAN] Requires VIRTIO_NET_F_CTRL_VQ.
    +\item[VIRTIO_NET_F_GUEST_ANNOUNCE] Requires VIRTIO_NET_F_CTRL_VQ.
    +\item[VIRTIO_NET_F_MQ] Requires VIRTIO_NET_F_CTRL_VQ.
    +\item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ.
    +\end{description}
    +
     \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
     \begin{description}
     \item[VIRTIO_NET_F_GSO (6)] Device handles packets with any GSO type.
    @@ -2789,7 +2813,7 @@ VIRTIO_NET_F_MQ is set. This field specifies the maximum number
     of each of transmit and receive virtqueues (receiveq0..receiveqN
     and transmitq0..transmitqN respectively; N=\field{max_virtqueue_pairs} - 1) that
     can be configured once VIRTIO_NET_F_MQ
    -is negotiated. Legal values for this field are 1 to 0x8000.
    +is negotiated.

     \begin{lstlisting}
     /* Note: LEGACY version was not little endian! */
    @@ -2800,6 +2824,23 @@ struct virtio_net_config {
     };
     \end{lstlisting}

    +\devicenormative{Device Types / Network Device / Device configuration layout}
    +
    +The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
    +if it offers VIRTIO_NET_F_MQ.
    +
    +\drivernormative{Device Types / Network Device / Device configuration layout}
    +
    +A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
    +If the driver negotiates the VIRTIO_NET_F_MAC feature, the driver MUST set
    +the physical address of the NIC to \field{mac}. Otherwise, it SHOULD
    +use a locally-administered MAC address (see \hyperref[intro:IEEE 802],
    +"9.2 48-bit universal LAN MAC addresses").
    +
    +If the driver does not negotiate the VIRTIO_NET_F_STATUS feature, it SHOULD
    +assume the link is active, otherwise it SHOULD read the link status from
    +the bottom bit of \field{status}.
    +
     \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
     For legacy devices, \field{status} and \field{max_virtqueue_pairs} in struct virtio_net_config are the
     native endian of the guest rather than (necessarily) little-endian.
    @@ -2807,56 +2848,40 @@ native endian of the guest rather than (necessarily) little-endian.

     \subsection{Device Initialization}\label{sec:Device Types / Network Device / Device Initialization}

    +A driver would perform a typical initialization routine like so:
    +
     \begin{enumerate}
    -\item The initialization routine should identify the receive and
    +\item Identify and initialize the receive and
       transmission virtqueues, up to N+1 of each kind. If VIRTIO_NET_F_MQ
       feature bit is negotiated, N=\field{max_virtqueue_pairs}-1, otherwise
       identify N=0.

    -\item If the VIRTIO_NET_F_MAC feature bit is set, the configuration
    -  space \field{mac} entry indicates the “physical” address of the
    -  network card, otherwise a private MAC address should be
    -  assigned. All drivers are expected to negotiate this feature if
    -  it is set.
    -
     \item If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated,
       identify the control virtqueue.

    +\item Fill the receive queues with buffers: see \ref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers}.
    +
    +\item Even with VIRTIO_NET_F_MQ, only receiveq0, transmitq0 and
    +  controlq are used by default. The driver would send the
    +  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command specifying the
    +  number of the transmit and receive queues to use.
    +
    +\item If the VIRTIO_NET_F_MAC feature bit is set, the configuration
    +  space \field{mac} entry indicates the “physical” address of the
    +  network card, otherwise the driver would typically generate a random
    +  local MAC address.
    +
     \item If the VIRTIO_NET_F_STATUS feature bit is negotiated, the link
    -  status can be read from the bottom bit of \field{status}.
    -  Otherwise, the link should be assumed active.
    -
    -\item Only receiveq0, transmitq0 and controlq are used by default.
    -  To use more queues driver must negotiate the VIRTIO_NET_F_MQ
    -  feature; initialize up to \field{max_virtqueue_pairs} of each of
    -  transmit and receive queues;
    -  execute VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command specifying the
    -  number of the transmit and receive queues that is going to be
    -  used and wait until the device consumes the controlq buffer and
    -  acks this command.
    -  The receive virtqueue should be filled with receive buffers
    -  before multiqueue is activated
    -  (see \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}~\nameref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}).
    -  This is described in detail below in \nameref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers}.
    -
    -\item A driver can indicate that it will generate checksumless
    -  packets by negotating the VIRTIO_NET_F_CSUM feature. This
    -  “checksum offload” is a common feature on modern network cards.
    +  status comes from the bottom bit of \field{status}.
    +  Otherwise, the driver assumes it's active.
    +
    +\item A performant driver would indicate that it will generate checksumless
    +  packets by negotating the VIRTIO_NET_F_CSUM feature.

    -\item If that feature is negotiated\footnote{ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
    -dependent on VIRTIO_NET_F_CSUM; a device which offers the offload
    -features must offer the checksum feature, and a driver which
    -accepts the offload features must accept the checksum feature.
    -Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
    -depending on VIRTIO_NET_F_GUEST_CSUM.
    -}, a driver can use TCP or UDP
    +\item If that feature is negotiated, a driver can use TCP or UDP
       segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
       TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
    -  (UDP fragmentation) features. It should not send TCP packets
    -  requiring segmentation offload which have the Explicit Congestion
    -  Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
    -  negotiated.\footnote{This is a common restriction in real, older network cards.
    -}
    +  (UDP fragmentation) features.

     \item The converse features are also available: a driver can save
       the virtual device some work by negotiating these features.\footnote{For example, a network packet transported between two guests on
    @@ -2871,6 +2896,9 @@ if both guests are amenable. See
    ef{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers}~
    ameref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers} and
    ef{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers}~
    ameref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers} below. end{enumerate} +A truly minimal driver would only accept VIRTIO_NET_F_MAC and ignore +everything else. + subsection{Device Operation}label{sec:Device Types / Network Device / Device Operation} Packets are transmitted by placing them in the @@ -2911,10 +2939,11 @@ Transmitting a single packet is simple, but varies depending on the different features the driver negotiated. egin{enumerate} -item If the driver negotiated VIRTIO_NET_F_CSUM, and the packet has - not been fully checksummed, then the virtio_net_hdr's fields - are set as follows. Otherwise, the packet must be fully - checksummed, and flags is zero. +item The driver MAY send a completely checksummed packet. In this case, + field{flags} will be zero, and field{gso_type} will be VIRTIO_NET_HDR_GSO_NONE. + +item If the driver negotiated VIRTIO_NET_F_CSUM, it MAY skip + checksumming the packet: egin{itemize} item field{flags} has the VIRTIO_NET_HDR_F_NEEDS_CSUM set, @@ -2923,17 +2952,20 @@ the different features the driver negotiated. item field{csum_offset} indicates how many bytes after the csum_start the new (16 bit ones' complement) checksum should be placed. + + item The TCP checksum field in the packet is set to the sum + of the TCP pseudo header, so that replacing it by the ones' + complement checksum of the TCP header and body will give the + correct result. end{itemize} +egin{note} For example, consider a partially checksummed TCP (IPv4) packet. It will have a 14 byte ethernet header and 20 byte IP header followed by the TCP header (with the TCP checksum field 16 bytes into that header). field{csum_start} will be 14+20 = 34 (the TCP -checksum includes the header), and field{csum_offset} will be 16. The -value in the TCP checksum field should be initialized to the sum -of the TCP pseudo header, so that replacing it by the ones' -complement checksum of the TCP header and body will give the -correct result. 
+checksum includes the header), and field{csum_offset} will be 16. +end{note} item If the driver negotiated VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires @@ -2962,15 +2994,32 @@ specifically in the protocol. end{itemize} item If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature, - field{num_buffers} is set to zero. + field{num_buffers} is set to zero. This field is unused on transmitted packets. -item The header and packet are added as one output buffer to the +item The header and packet are added as one output descriptor to the transmitq, and the device is notified of the new entry (see
    ef{sec:Device Types / Network Device / Device Initialization}~
    ameref{sec:Device Types / Network Device / Device Initialization}).footnote{Note that the header will be two bytes longer for the VIRTIO_NET_F_MRG_RXBUF case. } end{enumerate} +drivernormative{Device Types / Network Device / Device Operation / Packet Transmission} + +If a driver has not negotiated VIRTIO_NET_F_CSUM, field{flags} MUST be zero and +the packet must be fully checksummed. + +If a driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature, it MUST include +field{num_buffers} in the header, and it MUST set the value to zero. If a driver +did not negotiate VIRTIO_NET_F_MRG_RXBUF, it MUST NOT include field{num_buffers} in the header. +egin{note} + ie. With VIRTIO_NET_F_MRG_RXBUF, both receive and transmit headers + are 12 bytes. Without it, they're 10 bytes. +end{note} + +A driver SHOULD NOT send TCP packets requiring segmentation offload which have the Explicit Congestion Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is +negotiatedfootnote{This is a common restriction in real, older network cards.}, in +which case it MUST set the VIRTIO_NET_HDR_GSO_ECN bit in field{gso_type}. + paragraph{Packet Transmission Interrupt}label{sec:Device Types / Network Device / Device Operation / Packet Transmission / Packet Transmission Interrupt} Often a driver will suppress transmission interrupts using the @@ -2990,19 +3039,33 @@ fully populated as possible: if it runs out, network performance will suffer. If the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or -VIRTIO_NET_F_GUEST_UFO features are used, the Driver will need to -accept packets of up to 65550 bytes long (the maximum size of a +VIRTIO_NET_F_GUEST_UFO features are used, the maximum incoming packet +will be to 65550 bytes long (the maximum size of a TCP or UDP packet, plus the 14 byte ethernet header), otherwise -1514. bytes. 
So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every -buffer in the receive queue needs to be at least this length.footnote{Obviously each one can be split across multiple descriptor -elements. -} +1514 bytes. The 12-byte struct virtio_net_hdr is prepended to this, +making for 65562 or 1526 bytes. + +drivernormative{Device Types / Network Device / Device Operation / Setting Up Receive Buffers} -If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at -least the size of the struct virtio_net_hdr. +egin{itemize} +item If VIRTIO_NET_F_MRG_RXBUF is not negotiated: + egin{itemize} + item If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or + VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD populate + the receive queue(s) with buffers of at least 65562 bytes. + item Otherwise, the driver SHOULD populate the receive queue(s) + with buffers of at least 1526 bytes. + end{itemize} +item If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer MUST be at + least the size of the struct virtio_net_hdr. +end{itemize} + +egin{note} +Obviously each buffer can be split across multiple descriptor elements. +end{note} If VIRTIO_NET_F_MQ is negotiated, each of receiveq0...receiveqN -that will be used should be populated with receive buffers. +that will be used SHOULD be populated with receive buffers. paragraph{Packet Receive Interrupt}label{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers / Packet Receive Interrupt} @@ -3029,7 +3092,7 @@ Processing packet involves: virtio_net_hdr. item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the - VIRTIO_NET_HDR_F_NEEDS_CSUM bit in field{flags} may be + VIRTIO_NET_HDR_F_NEEDS_CSUM bit in field{flags} MAY be set: if so, the checksum on the packet is incomplete and field{csum_start} and field{csum_offset} indicate how to calculate it (see Packet Transmission point 1). @@ -3113,11 +3176,6 @@ command-specific-data is two variable length tables of 6-byte MAC addresses. 
The first table contains unicast addresses, and the second contains multicast addresses. -When VIRTIO_NET_F_MAC_ADDR is not negotiated, field{mac} in the -config space is writeable and is used to set the default MAC -address which rx filtering accepts. -When VIRTIO_NET_F_MAC_ADDR is negotiated, field{mac} in the -config space becomes read-only for the driver. The VIRTIO_NET_CTRL_MAC_ADDR_SET command is used to set the default MAC address which rx filtering accepts. @@ -3129,6 +3187,11 @@ accepts. The command-specific-data for VIRTIO_NET_CTRL_MAC_ADDR_SET is the 6-byte MAC address. +drivernormative{Device Types / Network Device / Device Operation / Control Virtqueue / Setting MAC Address Filtering} + +A driver MUST NOT write to the field{mac} if VIRTIO_NET_F_MAC_ADDR is +negotiated. + The VIRTIO_NET_CTRL_MAC_ADDR_SET command is atomic whereas field{mac} in config space is not, therefore drivers @@ -3180,17 +3243,16 @@ the guest in this way). #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0 end{lstlisting} -The Driver needs to check VIRTIO_NET_S_ANNOUNCE bit in status -field when it notices the changes of device configuration. The +The driver checks VIRTIO_NET_S_ANNOUNCE bit in the device configuration field{status} field +when it notices the changes of device configuration. The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that -driver has received the notification and device would clear the -VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received -this command. +driver has received the notification and device clears the +VIRTIO_NET_S_ANNOUNCE bit in field{status}. Processing this notification involves: egin{enumerate} -item Sending the gratuitous packets or marking there are pending +item Sending the gratuitous packets (eg. ARP) or marking there are pending gratuitous packets to be sent and letting deferred routine to send them. @@ -3198,6 +3260,20 @@ Processing this notification involves: vq. 
end{enumerate} +drivernormative{Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending} + +If the driver negotiates VIRTIO_NET_F_GUEST_ANNOUNCE, it SHOULD notify +network peers of its new location after it sees the VIRTIO_NET_S_ANNOUNCE bit +in field{status}. The driver MUST send a command on the command queue +with class VIRTIO_NET_CTRL_ANNOUNCE and command VIRTIO_NET_CTRL_ANNOUNCE_ACK. + +devicenormative{Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending} + +If VIRTIO_NET_F_GUEST_ANNOUNCE is negotiated, the device MUST clear the +VIRTIO_NET_S_ANNOUNCE bit in field{status} upon receipt of a command buffer +with class VIRTIO_NET_CTRL_ANNOUNCE and command VIRTIO_NET_CTRL_ANNOUNCE_ACK +before marking the buffer as used. + paragraph{Automatic receive steering in multiqueue mode}label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} If the driver negotiates the VIRTIO_NET_F_MQ feature bit (depends @@ -3220,11 +3296,10 @@ struct virtio_net_ctrl_mq { Multiqueue is disabled by default. The driver enables multiqueue by executing the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command, specifying -the number of the transmit and receive queues to be used; subsequently, +the number of the transmit and receive queues to be used up to +field{max_virtqueue_pairs}; subsequently, transmitq0..transmitqn and receiveq0..receiveqn where -n=virtqueue_pairs-1 MAY be used. All these virtqueues MUST have -been pre-configured in advance. The range of legal values for the -field{virtqueue_pairs} field is between 1 and field{max_virtqueue_pairs}. +n=virtqueue_pairs-1 MAY be used. When multiqueue is enabled, the device MUST use automatic receive steering based on packet flow. 
Programming of the receive steering @@ -3235,12 +3310,29 @@ no packets have been transmitted yet, the device MAY steer a packet to a random queue out of the specified receiveq0..receiveqn. Multiqueue is disabled by setting field{virtqueue_pairs} to 1 (this is -the default). After the command has been consumed by the device, the -device MUST NOT steer new packets to virtqueues -receveq1..receiveqN (i.e. other than receiveq0) and MUST NOT read from -transmitq1..transmitqN (i.e. other than transmitq0); accordingly, -the driver MUST NOT transmit new packets on virtqueues other than -transmitq0. +the default) and waiting for the device to use the command buffer. + +drivernormative{Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} + +The driver MUST configure the virtqueues before enabling them with the +VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command. + +The driver MUST NOT request a field{virtqueue_pairs} of 0 or +greater than field{max_virtqueue_pairs} in the device configuration space. + +The driver MUST queue packets only on any transmitq0 before the +VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command. + +The driver MUST NOT queue packets on transmit queues greater than +field{virtqueue_pairs} once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command in the available ring. + +devicenormative{Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} + +The device MUST queue packets only on any receiveq0 before the +VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command. + +The device MUST NOT queue packets on receive queues greater than +field{virtqueue_pairs} once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command in the used ring. 
subparagraph{Legacy Interface: Automatic receive steering in multiqueue mode}label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Legacy Interface: Automatic receive steering in multiqueue mode} For legacy devices, field{virtqueue_pairs} is in the @@ -3276,9 +3368,10 @@ There is a corresponding device feature for each offload. Upon feature negotiation corresponding offload gets enabled to preserve backward compartibility. -Corresponding feature must be negotiated at startup in order to allow dynamic -change of specific offload state. +drivernormative{Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} +A driver MUST NOT enable a offload for which the appropriate feature +has not been negotiated. subparagraph{Legacy Interface: Setting Offloads State}label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State / Legacy Interface: Setting Offloads State} For legacy devices, field{offloads} is the diff --git a/introduction.tex b/introduction.tex index 65392d9..d098718 100644 --- a/introduction.tex +++ b/introduction.tex @@ -82,6 +82,10 @@ To simplify transition and note differences, the following terms are used: phantomsectionlabel{intro:Virtio PCI Draft} extbf{[Virtio PCI Draft]} & Virtio PCI Draft Specification
    ewlineurl{ http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf}\ + phantomsectionlabel{intro:IEEE 802} extbf{[IEEE 802]} & + IEEE Standard for Local and Metropolitan Area Networks: Overview and Architecture, +
    ewlineurl{ http://standards.ieee.org/about/get/802/802.html} , + IEEE\ end{longtable} section{Structure Specifications} -- 1.8.3.2

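The receive-buffer sizing rule in the patch above can be sketched in C roughly as follows. This is only an illustration: the helper and macro names are hypothetical, and the byte counts (12, 65550, 1514) are simply the ones quoted in the patch text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte counts quoted in the patch text; the macro names are made up here. */
#define NET_HDR_LEN      12u    /* struct virtio_net_hdr, as counted in the patch */
#define MAX_PKT_OFFLOAD  65550u /* largest TCP/UDP packet plus 14-byte ethernet header */
#define MAX_PKT_PLAIN    1514u  /* ordinary ethernet frame */

/*
 * Hypothetical helper: the smallest receive buffer a driver would post.
 * With VIRTIO_NET_F_MRG_RXBUF a buffer only has to hold the header;
 * without it, each buffer has to hold header plus the largest packet.
 */
static size_t min_rx_buffer_len(bool mrg_rxbuf, bool guest_tso_or_ufo)
{
    if (mrg_rxbuf)
        return NET_HDR_LEN;
    return NET_HDR_LEN + (guest_tso_or_ufo ? MAX_PKT_OFFLOAD : MAX_PKT_PLAIN);
}
```

A driver that negotiated none of the GUEST_TSO4/TSO6/UFO features and no MRG_RXBUF would call `min_rx_buffer_len(false, false)` and post buffers of at least that size.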

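The partially checksummed TCP (IPv4) example from the net patch (14-byte ethernet header, 20-byte IP header, checksum field 16 bytes into the TCP header) works out as in the sketch below. The struct and function names are hypothetical and only mirror the two header fields the patch discusses; IP options would change `ip_hdr_len`.

```c
#include <stdint.h>

/* Illustrative mirror of the two virtio_net_hdr fields discussed in the patch. */
struct csum_fields {
    uint16_t csum_start;  /* offset at which checksumming begins */
    uint16_t csum_offset; /* bytes after csum_start where the result is stored */
};

/* Hypothetical helper for a TCP-over-IPv4 packet. */
static struct csum_fields tcp_ipv4_csum_fields(uint16_t eth_hdr_len,
                                               uint16_t ip_hdr_len)
{
    struct csum_fields f;

    /* The TCP checksum covers the TCP header, so start right after IP. */
    f.csum_start = (uint16_t)(eth_hdr_len + ip_hdr_len);
    /* The TCP checksum field sits 16 bytes into the TCP header. */
    f.csum_offset = 16;
    return f;
}
```

With the header lengths from the patch's example, `tcp_ipv4_csum_fields(14, 20)` gives a `csum_start` of 34 and a `csum_offset` of 16.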
  • 15.  [PATCH 15/18] CCW: Separate normative and descriptive sections.

    Posted 02-19-2014 06:45
    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
    content.tex | 54 +++++++++++++++++++++++++++++++++++++++---------------
    1 file changed, 39 insertions(+), 15 deletions(-)

    diff --git a/content.tex b/content.tex
    index b3dfccf..e2cf1b7 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -2188,9 +2188,6 @@ Bytes & Description & Contents \\
     \hline
     \end{tabular}
     
    -A driver for virtio-ccw devices MUST check for a control unit
    -type of 0x3832 and MUST ignore the device type and model.
    -
     In addition to the basic channel commands, virtio-ccw defines a
     set of channel commands related to configuration and operation of
     virtio:
    @@ -2210,6 +2207,8 @@ virtio:
     #define CCW_CMD_SET_VIRTIO_REV 0x83
     \end{lstlisting}
     
    +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
    +
     The virtio-ccw device acts like a normal channel device, as specified
     in \hyperref[intro:S390 PoP]{[S390 PoP]} and \hyperref[intro:S390 Common I/O]{[S390 Common I/O]}. In particular:
     
    @@ -2225,10 +2224,16 @@ in \hyperref[intro:S390 PoP]{[S390 PoP]} and \hyperref[intro:S390 Common I/O]{[S
       device MUST present a check condition if the transmitted data does
       not contain enough data to process the command. If the driver submitted
       a buffer that was too long, the device SHOULD accept the command.
    -  The driver SHOULD attempt to provide the correct length even if it
    -  suppresses length checks.
     \end{itemize}
     
    +\drivernormative{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
    +
    +A driver for virtio-ccw devices MUST check for a control unit
    +type of 0x3832 and MUST ignore the device type and model.
    +
    +A driver SHOULD attempt to provide the correct length even if it
    +suppresses length checks.
    +
     \subsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization}
     
     virtio-ccw uses several channel commands to set up a device.
    @@ -2267,30 +2272,36 @@ The following values are supported:
     Note that a change in the virtio standard does not necessarily
     correspond to a change in the virtio-ccw revision.
     
    +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
    +
     A device MUST post a unit check with command reject for any \field{revision}
     it does not support. For any invalid combination of \field{revision}, \field{length}
     and \field{data}, it MUST post a unit check with command reject as well. A
     non-transitional device MUST reject revision id 0.
     
    -A driver SHOULD start with trying to set the highest revision it
    -supports and continue with lower revisions if it gets a command reject.
    -
    -A driver MUST NOT issue any other virtio-ccw specific channel commands
    -prior to setting the revision.
    -
     A device MUST answer with command reject to any virtio-ccw specific
     channel command that is not contained in the revision selected by the
     driver.
     
    -After a revision has been successfully selected by the driver, it
    -MUST NOT attempt to select a different revision. A device MUST answer
    -to any such attempt with a command reject.
    +A device MUST answer with command reject to any attempt to select a different revision
    +after a revision has been successfully selected by the driver.
     
     A device MUST treat the revision as unset from the time the associated
     subchannel has been enabled until a revision has been successfully set
     by the driver. This implies that revisions are not persistent across
     disabling and enabling of the associated subchannel.
     
    +\drivernormative{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
    +
    +A driver SHOULD start with trying to set the highest revision it
    +supports and continue with lower revisions if it gets a command reject.
    +
    +A driver MUST NOT issue any other virtio-ccw specific channel commands
    +prior to setting the revision.
    +
    +After a revision has been successfully selected by the driver, it
    +MUST NOT attempt to select a different revision.
    +
     \paragraph{Legacy Interfaces: A Note on Setting the Virtio Revision}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision / Legacy Interfaces: A Note on Setting the Virtio Revision}
     
     A legacy device will not support the CCW_CMD_SET_VIRTIO_REV and answer
    @@ -2339,6 +2350,9 @@ struct vq_info_block {
     \field{desc}, \field{avail} and \field{used} contain the guest addresses for the descriptor table,
     available ring and used ring for queue \field{index}, respectively. The actual
     virtqueue size (number of allocated buffers) is transmitted in \field{num}.
    +
    +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue}
    +
     \field{res0} is reserved and MUST be ignored by the device.
     
     \paragraph{Legacy Interface: A Note on Configuring a Virtqueue}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue / Legacy Interface: A Note on Configuring a Virtqueue}
    @@ -2506,6 +2520,7 @@ No padding is added at the end of the structure, it is exactly 25 bytes
     in length.
     
    +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Two-Stage Queue Indicators}
     If the driver has already set up classic queue indicators via the
     CCW_CMD_SET_IND command, the device MUST post a unit check with
     command reject to any subsequent CCW_CMD_SET_IND_ADAPTER command.
    @@ -2553,8 +2568,14 @@ For notifying the driver of virtqueue buffers, the device sets the
     bit in the guest-provided indicator area at the corresponding offset.
     The guest-provided summary indicator is set to 0x01. An adapter I/O
     interrupt for the corresponding interruption subclass is generated.
    +
    +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
    +
     The device SHOULD only generate an adapter I/O interrupt if the
    -summary indicator had not been set prior to notification. The driver
    +summary indicator had not been set prior to notification.
    +
    +\drivernormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
    +The driver
     MUST clear the summary indicator after receiving an adapter I/O
     interrupt before it processes the queue indicators.
     
    @@ -2585,12 +2606,15 @@ GPR & Input Value & Output Value \\
     \hline
     \end{tabular}
     
    +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
     The device MUST ignore bits 0-31 (counting from the left) of GPR2.
     This aligns passing the subchannel ID with the way it is passed
     for the existing I/O instructions.
     
     Host cookie is an optional per-virtqueue 64 bit value that MAY be
     used by the hypervisor to speed up the notification execution.
    +
    +\drivernormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
     For each notification, the output value is returned in GPR2 and
     SHOULD be passed in GPR4 for the next notification:
     
    --
    1.8.3.2

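The driver-side revision rule in the CCW patch (start with the highest supported revision, fall back on command reject, never reselect after success) could look roughly like this in C. Everything here is an assumption made for illustration: the callback type, the function names, and the idea that revisions are consecutive integers down to 0. `fake_device_set_rev` merely stands in for issuing CCW_CMD_SET_VIRTIO_REV to a real device.

```c
#include <stddef.h>

/*
 * Hypothetical callback: try to select revision `rev` on the device.
 * Returns 0 on success, nonzero if the device answers with command reject.
 */
typedef int (*set_rev_fn)(int rev, void *ctx);

/* Try revisions from `highest` downwards; return the accepted one, or -1. */
static int negotiate_revision(int highest, set_rev_fn set_rev, void *ctx)
{
    for (int rev = highest; rev >= 0; rev--) {
        if (set_rev(rev, ctx) == 0)
            return rev; /* stop here: no further selection attempts */
    }
    return -1; /* the device rejected every revision we support */
}

/* Stand-in device for the example: accepts any revision up to *ctx. */
static int fake_device_set_rev(int rev, void *ctx)
{
    int device_max = *(int *)ctx;
    return rev <= device_max ? 0 : -1;
}
```

Returning as soon as one attempt succeeds keeps the sketch in line with the requirement that a driver must not attempt to select a different revision afterwards.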

  • 16.  Re: [virtio-comment] [PATCH 15/18] CCW: Separate normative and descriptive sections.

    Posted 02-20-2014 14:47
    On Wed, 19 Feb 2014 17:09:52 +1030
    Rusty Russell <rusty@au1.ibm.com> wrote:

    > Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    > ---
    > content.tex | 54 +++++++++++++++++++++++++++++++++++++++---------------
    > 1 file changed, 39 insertions(+), 15 deletions(-)
    >
    > diff --git a/content.tex b/content.tex
    > index b3dfccf..e2cf1b7 100644
    > --- a/content.tex
    > +++ b/content.tex

    > @@ -2225,10 +2224,16 @@ in \hyperref[intro:S390 PoP]{[S390 PoP]} and \hyperref[intro:S390 Common I/O]{[S
    > device MUST present a check condition if the transmitted data does
    > not contain enough data to process the command. If the driver submitted
    > a buffer that was too long, the device SHOULD accept the command.
    > - The driver SHOULD attempt to provide the correct length even if it
    > - suppresses length checks.
    > \end{itemize}
    >
    > +\drivernormative{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
    > +
    > +A driver for virtio-ccw devices MUST check for a control unit
    > +type of 0x3832 and MUST ignore the device type and model.
    > +
    > +A driver SHOULD attempt to provide the correct length even if it
    > +suppresses length checks.

    This sentence needs to be expanded, or it is hard to understand due to
    missing context after the move:

    "A driver SHOULD attempt to provide the correct length in a channel
    command even if it suppresses length checks for that command."

    > +
    > \subsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization}
    >
    > virtio-ccw uses several channel commands to set up a device.

    > @@ -2553,8 +2568,14 @@ For notifying the driver of virtqueue buffers, the device sets the
    > bit in the guest-provided indicator area at the corresponding offset.
    > The guest-provided summary indicator is set to 0x01. An adapter I/O
    > interrupt for the corresponding interruption subclass is generated.
    > +
    > +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
    > +
    > The device SHOULD only generate an adapter I/O interrupt if the
    > -summary indicator had not been set prior to notification. The driver
    > +summary indicator had not been set prior to notification.
    > +
    > +\drivernormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
    > +The driver
    > MUST clear the summary indicator after receiving an adapter I/O
    > interrupt before it processes the queue indicators.

    And I just noticed that there's a slight technical problem with that
    statement. I'll fix that separately.

    >
    > @@ -2585,12 +2606,15 @@ GPR & Input Value & Output Value \\
    > \hline
    > \end{tabular}
    >
    > +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
    > The device MUST ignore bits 0-31 (counting from the left) of GPR2.
    > This aligns passing the subchannel ID with the way it is passed
    > for the existing I/O instructions.
    >
    > Host cookie is an optional per-virtqueue 64 bit value that MAY be
    > used by the hypervisor to speed up the notification execution.
    > +
    > +\drivernormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
    > For each notification, the output value is returned in GPR2 and

    Hm, don't we need to split that? It is the device that returns the
    value in gpr2 and the driver that should put it into gpr4 for the next
    notification.

    > SHOULD be passed in GPR4 for the next notification:
    >




  • 18.  Re: [virtio-comment] [PATCH 15/18] CCW: Separate normative and descriptive sections.

    Posted 02-21-2014 04:42
    Cornelia Huck <cornelia.huck@de.ibm.com> writes:
    > On Wed, 19 Feb 2014 17:09:52 +1030
    > Rusty Russell <rusty@au1.ibm.com> wrote:
    >> +A driver SHOULD attempt to provide the correct length even if it
    >> +suppresses length checks.
    >
    > This sentence needs to be expanded, or it is hard to understand due to
    > missing context after the move:
    >
    > "A driver SHOULD attempt to provide the correct length in a channel
    > command even if it suppresses length checks for that command."

    Thanks, fixed. In some ways, having a single list of requirements is
    clear, in others it means we double up a bit.

    >> +\devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
    >> The device MUST ignore bits 0-31 (counting from the left) of GPR2.
    >> This aligns passing the subchannel ID with the way it is passed
    >> for the existing I/O instructions.
    >>
    >> Host cookie is an optional per-virtqueue 64 bit value that MAY be
    >> used by the hypervisor to speed up the notification execution.
    >> +
    >> +\drivernormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
    >> For each notification, the output value is returned in GPR2 and
    >
    > Hm, don't we need to split that? It is the device that returns the
    > value in gpr2 and the driver that should put it into gpr4 for the next
    > notification.

    Ah, I was a bit unclear on this.

    We already say:

    Host cookie is an optional per-virtqueue 64 bit value that MAY be
    used by the hypervisor to speed up the notification execution.

    So the requirement is a SHOULD or a MUST?

    I tidied the whole thing up a bit more. Here's what I've got for now
    in my feedback git tree:

    \devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
    The device MUST ignore bits 0-31 (counting from the left) of GPR2.
    This aligns passing the subchannel ID with the way it is passed
    for the existing I/O instructions.

    The driver MAY return a 64-bit host cookie in GPR2 to speed up the
    notification execution.

    \drivernormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}

    For each notification, the driver SHOULD use GPR4 to pass the host cookie received in GPR2 from the previous notification.

    \begin{note}
    For example:
    \begin{lstlisting}
    info->cookie = do_notify(schid,
    virtqueue_get_queue_index(vq),
    info->cookie);
    \end{lstlisting}
    \end{note}
    ==
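As a rough illustration of the cookie handshake in the draft text above, here is a self-contained C sketch. It is only a model under stated assumptions: the device side is mocked, and `struct vq_info`, `mock_device_notify`, `driver_notify` and the cookie encoding are hypothetical names and choices made for this example, not taken from the specification or from any existing driver. It shows the device ignoring bits 0-31 (the high half, counting from the left) of GPR2, and the driver handing back in GPR4 whatever cookie it received from the previous notification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-virtqueue driver state (illustrative names only). */
struct vq_info {
    uint32_t schid;        /* subchannel ID, passed in GPR2 */
    uint16_t queue_index;  /* virtqueue index, passed in GPR3 */
    uint64_t cookie;       /* host cookie from the previous notification, 0 if none */
};

/* Counts how often the mock device could not use the cookie it was given. */
static unsigned int mock_slow_lookups;

/*
 * Mock of the device side of a notification. A real driver would issue the
 * notification hypercall instead; the cookie encoding below is made up.
 */
static uint64_t mock_device_notify(uint64_t gpr2, uint64_t gpr3, uint64_t gpr4)
{
    /* Bits 0-31 counting from the left are the high half: ignore them. */
    uint32_t schid = (uint32_t)(gpr2 & 0xffffffffULL);
    uint64_t cookie = (((uint64_t)schid << 16) | (gpr3 & 0xffffULL)) + 1;

    if (gpr4 != cookie)
        mock_slow_lookups++;   /* missing or stale cookie: slow path */

    return cookie;             /* handed back to the driver in GPR2 */
}

/* Driver side: pass the previous cookie in GPR4, remember the new one. */
static void driver_notify(struct vq_info *info)
{
    assert(info);
    info->cookie = mock_device_notify(info->schid, info->queue_index,
                                      info->cookie);
}
```

In this model, calling `driver_notify()` twice on the same queue takes the slow path only on the first call, because the second call supplies the cookie returned by the first.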

    Cheers,
    Rusty.




  • 20.  Re: [virtio-comment] [PATCH 15/18] CCW: Separate normative and descriptive sections.

    Posted 02-24-2014 10:20
    On Fri, 21 Feb 2014 15:11:44 +1030
    Rusty Russell <rusty@au1.ibm.com> wrote:


    > I tidied the whole thing up a bit more. Here's what I've got for now
    > in my feedback git tree:
    >
    > \devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
    > The device MUST ignore bits 0-31 (counting from the left) of GPR2.
    > This aligns passing the subchannel ID with the way it is passed
    > for the existing I/O instructions.
    >
    > The driver MAY return a 64-bit host cookie in GPR2 to speed up the

    It's the device doing the returning.

    > notification execution.
    >
    > \drivernormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
    >
    > For each notification, the driver SHOULD use GPR4 to pass the host cookie received in GPR2 from the previous notification.
    >
    > \begin{note}
    > For example:
    > \begin{lstlisting}
    > info->cookie = do_notify(schid,
    > virtqueue_get_queue_index(vq),
    > info->cookie);
    > \end{lstlisting}
    > \end{note}

    Otherwise, looks good.




  • 22.  Re: [virtio-comment] [PATCH 15/18] CCW: Separate normative and descriptive sections.

    Posted 02-25-2014 03:52
    Cornelia Huck <cornelia.huck@de.ibm.com> writes:
    > On Fri, 21 Feb 2014 15:11:44 +1030
    > Rusty Russell <rusty@au1.ibm.com> wrote:
    >
    >
    >> I tidied the whole thing up a bit more. Here's what I've got for now
    >> in my feedback git tree:
    >>
    >> \devicenormative{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
    >> The device MUST ignore bits 0-31 (counting from the left) of GPR2.
    >> This aligns passing the subchannel ID with the way it is passed
    >> for the existing I/O instructions.
    >>
    >> The driver MAY return a 64-bit host cookie in GPR2 to speed up the
    >
    > It's the device doing the returning.

    Oops, fixed.

    Thanks,
    Rusty.




  • 24.  [PATCH 08/18] Feedback: add normative marker.

    Posted 02-19-2014 06:45
    From http://docs.oasis-open.org/templates/TCHandbook/ConformanceGuidelines.html:

      Normative statements MUST be referenceable so that a statement may be
      referenced from another part of a specification, but more importantly
      so they can be referenced from Conformance Clauses.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
     commands.tex | 4 ++++
     content.tex  | 2 ++
     2 files changed, 6 insertions(+)

    diff --git a/commands.tex b/commands.tex
    index 671757b..c4b19de 100644
    --- a/commands.tex
    +++ b/commands.tex
    @@ -8,3 +8,7 @@

     % How we format a field name
     \newcommand{\field}[1]{\emph{#1}}
    +
    +% Mark a normative paragraph (driver or device)
    +\newcommand{\drivernormative}[1]{\phantomsection\label{drivernormative:#1}}
    +\newcommand{\devicenormative}[1]{\phantomsection\label{devicenormative:#1}}
    diff --git a/content.tex b/content.tex
    index 88c6d6a..dd9b8a7 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -40,11 +40,13 @@ following bits are defined:
       even a fatal error during device operation.
     \end{description}

    +\drivernormative{Basic Facilities of a Virtio Device / Device Status Field}
     The driver MUST update \field{device status} in the order above to
     indicate the driver's progress. The driver MUST NOT clear a
     \field{device status} bit. If the driver sets the FAILED bit,
     it MUST reset the device before attempting to re-initialize.

    +\devicenormative{Basic Facilities of a Virtio Device / Device Status Field}
     The device MUST initialize \field{device status} to 0 upon reset.

     \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
    --
    1.8.3.2


  • 25.  [PATCH 13/18] PCI: Separate explanatory and normative text.

    Posted 02-19-2014 06:45
    Rather than treat selectors 0 and 1 as special, the wording for
    features is made more general (though still the same effect).

    I split the interrupt handler into a separate subsection: it was
    misleading because it didn't handle configuration interrupts until the
    next section. It's also non-normative.

    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
    ---
     content.tex | 430 +++++++++++++++++++++++++++++++++++++++---------------------
     1 file changed, 278 insertions(+), 152 deletions(-)

    diff --git a/content.tex b/content.tex
    index d6fdeae..b8e5cc8 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -844,15 +844,18 @@ Virtio devices are commonly implemented as PCI devices.

     A Virtio device can be implemented as any kind of PCI device:
     a Conventional PCI device or a PCI Express
    -device. A Virtio device using Virtio Over PCI Bus MUST expose to
    -guest an interface that meets the specification requirements of
    -the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
    -and \hyperref[intro:PCIe]{[PCIe]}
    -respectively. To assure designs meet the latest level
    +device. To assure designs meet the latest level
     requirements, designers of Virtio Over PCI devices must refer to
     the PCI-SIG home page at \url{http://www.pcisig.com} for any
     approved changes.

    +\devicenormative{Virtio Transport Options / Virtio Over PCI Bus}
    +A Virtio device using Virtio Over PCI Bus MUST expose to
    +guest an interface that meets the specification requirements of
    +the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
    +and \hyperref[intro:PCIe]{[PCIe]}
    +respectively.
    +
     \subsection{PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}

     Any PCI device with Vendor ID 0x1AF4, and Device ID 0x1000 through
    @@ -860,22 +863,25 @@ Any PCI device with Vendor ID 0x1AF4, and Device ID 0x1000 through
     }.

     The Subsystem Device ID indicates which virtio device is
    -supported by the device. The Subsystem Vendor ID SHOULD reflect
    +supported by the device, as indicated in section \ref{sec:Device Types}.
    +
    +\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
    +The Subsystem Vendor ID SHOULD reflect
     the PCI Vendor ID of the environment (it's currently only used for
     informational purposes by the driver).

    +Non-transitional devices MUST have a Revision ID of 1 or higher.
    +
    +\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
     All drivers MUST match devices with any Revision ID, this
     is to allow devices to be versioned without breaking drivers.

    +Drivers MUST match any Revision ID value.
    +
     \subsubsection{Legacy Interfaces: A Note on PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery / Legacy Interfaces: A Note on PCI Device Discovery}
     Transitional devices MUST have a Revision ID of 0 to match
     legacy drivers.

    -Non-transitional devices MUST have a Revision ID of 1 or higher.
    -
    -Both transitional and non-transitional drivers MUST match
    -any Revision ID value.
    -
     \subsection{PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}

     The device is configured via I/O and/or memory regions (though see
    @@ -884,11 +890,15 @@ for access via the PCI configuration space), as specified by
     Virtio Structure PCI Capabilities.

     Fields of different sizes are present in the device
    -configuration regions; the driver
    +configuration regions.
    +All 32-bit and 16-bit fields are little-endian.
    +
    +\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
    +
    +The driver
     MUST access each field using the “natural” access
     method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses
     for 16-bit fields and 8-bit accesses for 8-bit fields.
    -All 32-bit and 16-bit fields are little-endian.

     \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}

    @@ -922,13 +932,7 @@ struct virtio_pci_cap {
     \end{lstlisting}

     This structure can be followed by extra data, depending on
    -\field{cfg_type}, as documented below. In this case device MUST include
    -this extra data (from the beginning of the \field{cap_vndr} field
    -through end of the extra data fields if any)
    -in the capability length as specified by \field{cap_len}.
    -The device MAY append extra data
    -or padding to any structure beyond that; the driver MUST accept a \field{cap_len} value
    -which is larger than specified here.
    +\field{cfg_type}, as documented below.

     The fields are interpreted as follows:

    @@ -960,22 +964,22 @@ The fields are interpreted as follows:
     #define VIRTIO_PCI_CAP_PCI_CFG 5
     \end{lstlisting}

    -        Any other value - reserved for future use. Drivers MUST
    -        ignore any vendor-specific capability structure which has
    -        a reserved \field{cfg_type} value.
    +        Any other value is reserved for future use.
    +
    +        Each structure is detailed individually below.

             The device MAY offer more than one structure of any type - this makes it
             possible for the device to expose multiple interfaces to drivers. The order of
             the capabilities in the capability list specifies the order of preference
    -        suggested by the device; drivers SHOULD use the first interface that they can
    -        support. For example, on some hypervisors, notifications using IO accesses are
    +        suggested by the device.
    +        \begin{note}
    +        For example, on some hypervisors, notifications using IO accesses are
             faster than memory accesses. In this case, the device would expose two
             capabilities with \field{cfg_type} set to VIRTIO_PCI_CAP_NOTIFY_CFG:
             the first one addressing an I/O BAR, the second one addressing a memory BAR.
    -        In this example, the driver SHOULD use the I/O BAR if I/O resources are available, and fall back on
    +        In this example, the driver would use the I/O BAR if I/O resources are available, and fall back on
             memory BAR when I/O resources are unavailable.
    -
    -        Each structure is detailed individually below.
    +        \end{note}

     \item[\field{bar}]
             values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
    @@ -984,9 +988,7 @@ The fields are interpreted as follows:
             The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
             or I/O Space.

    -        Any other value is reserved for future use. Drivers MUST
    -        ignore any vendor-specific capability structure which has
    -        a reserved \field{bar} value.
    +        Any other value is reserved for future use.

     \item[\field{offset}]
             indicates where the structure begins relative to the base address associated
    @@ -999,11 +1001,7 @@ The fields are interpreted as follows:
             \field{length} MAY include padding, or fields unused by the driver, or
             future extensions.

    -        Drivers SHOULD only map part of configuration structure
    -        large enough for device operation. Drivers MUST handle
    -        an unexpectedly large \field{length}, but MAY check that \field{length}
    -        is large enough for device operation.
    -
    +        \begin{note}
             For example, a future device might present a large structure size of several
             MBytes.
             As current devices never utilize structures larger than 4KBytes in size,
    @@ -1011,14 +1009,44 @@ The fields are interpreted as follows:
             4KBytes (thus ignoring parts of structure after the first
             4KBytes) to allow forward compatibility with such devices without loss of
             functionality and without wasting resources.
    +        \end{note}
     \end{description}

    +\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
    +
    +The driver MUST ignore any vendor-specific capability structure which has
    +a reserved \field{cfg_type} value.
    +
    +The driver SHOULD use the first instance of each virtio structure type they can
    +support.
    +
    +The driver MUST accept a \field{cap_len} value which is larger than specified here.
    +
    +The driver MUST ignore any vendor-specific capability structure which has
    +a reserved \field{bar} value.
    +
    +        The drivers SHOULD only map part of configuration structure
    +        large enough for device operation. The drivers MUST handle
    +        an unexpectedly large \field{length}, but MAY check that \field{length}
    +        is large enough for device operation.
    +
    +The driver MUST NOT write into any field of the capability structure,
    +with the exception of those with \field{cap_type} VIRTIO_PCI_CAP_PCI_CFG as
    +detailed in \ref{drivernormative:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}.
    +
    +\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
    +
    +The device MUST include any extra data (from the beginning of the \field{cap_vndr} field
    +through end of the extra data fields if any) in \field{cap_len}.
    +The device MAY append extra data
    +or padding to any structure beyond that.
    +
    +If the device presents multiple structures of the same type, it SHOULD order
    +them from optimal (first) to least-optimal (last).
    +
     \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}

     The common configuration structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_COMMON_CFG capability; its layout is below.
    -\field{offset} must be 4-byte aligned.
    -
    -The device MUST present at least one common configuration capability.

     \begin{lstlisting}
     struct virtio_pci_common_cfg {
    @@ -1047,8 +1075,7 @@ struct virtio_pci_common_cfg {
     \begin{description}
     \item[\field{device_feature_select}]
             The driver uses this to select which feature bits \field{device_feature} shows.
    -        Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63.
    -        The device MUST present 0 on \field{device_feature} for any other value.
    +        Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.

     \item[\field{device_feature}]
             The device uses this to report which feature bits it is
    @@ -1057,14 +1084,7 @@ struct virtio_pci_common_cfg {

     \item[\field{driver_feature_select}]
             The driver uses this to select which feature bits \field{driver_feature} shows.
    -        Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63.
    -        When set to any other value:
    -        \begin{itemize}
    -        \item the device MUST return 0 on reads from \field{driver_feature}
    -        \item the device MUST ignore writing of 0 into \field{driver_feature}
    -        \item the driver MUST NOT write any non 0 value into \field{driver_feature} (a corollary of
    -              the rule that the driver can only write a subset of device features).
    -        \end{itemize}
    +        Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.

     \item[\field{driver_feature}]
             The driver writes this to accept feature bits offered by the device.
    @@ -1082,10 +1102,7 @@ struct virtio_pci_common_cfg {

     \item[\field{config_generation}]
             Configuration atomicity value. The device changes this every time the
    -        configuration noticeably changes. This means the device may
    -        only change the value after a configuration read operation,
    -        but MUST change it if there is any risk of a driver seeing an
    -        inconsistent configuration state.
    +        configuration noticeably changes.

     \item[\field{queue_select}]
             Queue Select. The driver selects which virtqueue the following
    @@ -1094,7 +1111,7 @@ struct virtio_pci_common_cfg {
     \item[\field{queue_size}]
             Queue Size. On reset, specifies the maximum queue size supported by
             the hypervisor. This can be modified by driver to reduce memory requirements.
    -        The device MUST set this to 0 if this virtqueue is unavailable.
    +        A 0 means the queue is unavailable.

     \item[\field{queue_msix_vector}]
             The driver uses this to specify the queue vector for MSI-X.
    @@ -1103,14 +1120,12 @@ struct virtio_pci_common_cfg {
             The driver uses this to selectively prevent the device from executing requests from this virtqueue.
             1 - enabled; 0 - disabled.

    -        The driver MUST configure the other virtqueue fields before enabling
    -        the virtqueue.
    -
     \item[\field{queue_notify_off}]
             The driver reads this to calculate the offset from start of Notification structure at
             which this virtqueue is located.
    -        Note: this is \em{not} an offset in bytes.
    +        \begin{note} this is \em{not} an offset in bytes.
             See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} below.
    +        \end{note}

     \item[\field{queue_desc}]
             The driver writes the physical address of Descriptor Table here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
    @@ -1122,12 +1137,58 @@ struct virtio_pci_common_cfg {
             The driver writes the physical address of Used Ring here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
     \end{description}

    -\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
    +\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
    +\field{offset} MUST be 4-byte aligned.

    -The device MUST present at least one notification capability.
    +The device MUST present at least one common configuration capability.
    +
    +The device MUST present the feature bits it is offering in \field{device_feature}, starting at bit \field{device_feature_select} $*$ 32 for any \field{device_feature_select} written by the driver.
    +\begin{note}
    +  This means that it will present 0 for any \field{device_feature_select} other than 0 or 1, since no feature defined here exceeds 63.
    +\end{note}
    +
    +The device MUST present any valid feature bits the driver has written in \field{driver_feature}, starting at bit \field{driver_feature_select} $*$ 32 for any \field{driver_feature_select} written by the driver. Valid feature bits are those which are subset of the corresponding \field{device_feature} bits. The device MAY present invalid bits written by the driver.
    +
    +\begin{note}
    +  This means that a device can ignore writes for feature bits it never
    +  offers, and simply present 0 on reads. Or it can just mirror what the driver wrote
    +  (but it will still have to check them when the driver sets FEATURES_OK).
    +\end{note}
    +
    +\begin{note}
    +  A driver shouldn't write invalid bits anyway, as per \ref{drivernormative:General Initialization And Device Operation / Device Initialization}, but this attempts to handle it.
    +\end{note}
    +
    +The device MUST present a changed \field{config_generation} after the
    +driver has read a device-specific configuration value which has
    +changed since any part of the device-specific configuration was last
    +read.
    +\begin{note}
    +As \field{config_generation} is an 8-bit value, simply incrementing it
    +on every configuration change may violate this requirement due to wrap.
    +Better would be to set an internal flag when it has changed,
    +and if that flag is set when the driver reads from the device-specific
    +configuration, increment \field{config_generation} and clear the flag.
    +\end{note}
    +
    +The device MUST reset when 0 is written to \field{device_status}.
    +
    +The device MUST present a 0 in \field{queue_size} if the virtqueue
    +corresponding to the current \field{queue_select} is unavailable.
    +
    +\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
    +
    +The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation} or \field{queue_notify_off}.
    +
    +The driver MUST NOT write a value which is not a power of 2 to \field{queue_size}.
    +
    +The driver MUST configure the other virtqueue fields before enabling the virtqueue
    +with \field{queue_enable}.
    +
    +\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}

     The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
    -capability. The \field{offset} must be 2-byte aligned. This capability is immediately followed by an additional
    +capability. This capability is immediately followed by an additional
     field, like so:

     \begin{lstlisting}
    @@ -1137,9 +1198,6 @@ struct virtio_pci_notify_cap {
     };
     \end{lstlisting}

    -The device MUST either present \field{notify_off_multiplier} as an even power of 2,
    -or present \field{notify_off_multiplier} as 0.
    -
     \field{notify_off_multiplier} is combined with the \field{queue_notify_off} to
     derive the Queue Notify address within a BAR for a specific queue:

    @@ -1151,12 +1209,23 @@ The \field{cap.offset} and \field{notify_off_multiplier} are taken from the
     notification capability structure above, and the \field{queue_notify_off}
     is taken from the common configuration structure.

    +\begin{note}
     For example, if \field{notifier_off_multiplier} is 0, the device uses
     the same Queue Notify address for all queues.
    +\end{note}
    +
    +\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
    +The device MUST present at least one notification capability.
    +
    +The \field{cap.offset} must be 2-byte aligned.
    +
    +The device MUST either present \field{notify_off_multiplier} as an even power of 2,
    +or present \field{notify_off_multiplier} as 0.

     The value \field{cap.length} presented by the device MUST be at least 2
     and MUST be large enough to support queue notification offsets
     for all supported queues in all possible configurations.
    +
     For all queues, the value \field{cap.length} presented by the device MUST satisfy:
     \begin{lstlisting}
     cap.length >= queue_notify_off * notify_off_multiplier + 2
    @@ -1164,16 +1233,14 @@ cap.length >= queue_notify_off * notify_off_multiplier + 2

     \subsubsection{ISR status capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}

    -The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability. This
    -refers to at least a single byte, which contains the 8-bit ISR status field.
    +The VIRTIO_PCI_CAP_ISR_CFG capability
    +refers to at least a single byte, which contains the 8-bit ISR status field
    +to be used for INT#x interrupt handling.

     The \field{offset} for the \field{ISR status} has no specific alignment requirements.

    -\subsection{ISR status field}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status field}
    -
    -\field{ISR status} is used for INT#x interrupt handling.
    -Driver MUST NOT access \field{ISR status} when MSI-X capability
    -is enabled.
    +The ISR bits allow the device to distinguish between device-specific configuration
    +change interrupts and normal virtqueue interrupts:

     \begin{tabular}{ l l l l }
     \hline
    @@ -1183,29 +1250,43 @@ Purpose & Device Configuration Interrupt & Queue Interrupt & Reserved \\
     \hline
     \end{tabular}

    -If MSI-X capability is disabled, device MUST set Interrupt Status
    +To avoid an extra access, simply reading this register resets it to 0 and
    +causes the device to de-assert the interrupt.
    +
    +In this way, driver read of ISR status causes the device to de-assert
    +an interrupt.
    +
    +See sections \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device} and \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} for how this is used.
    +
    +\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
    +
    +The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability.
    +
    +If MSI-X capability is disabled, the device MUST set the Interrupt Status
     bit in the PCI Status register in the PCI Configuration Header of
     the device to the logical OR of all bits in \field{ISR status} of
    -the device. Device then asserts/deasserts INT#x interrupts unless masked
    +the device. The device then asserts/deasserts INT#x interrupts unless masked
     according to standard PCI rules \hyperref[intro:PCI]{[PCI]}.

    -Device MUST reset \field{ISR status} to 0 on read.
    +The device MUST reset \field{ISR status} to 0 on driver read.

    -In this way, driver read of \field{ISR status} causes the device to de-assert
    -an interrupt.
    +\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}

    -See sections \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device} and \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} for how this is used.
    +The driver MUST NOT access the ISR field when MSI-X capability
    +is enabled.

     \subsubsection{Device specific structure}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device specific structure}

     The device MAY present at least one VIRTIO_PCI_CAP_DEVICE_CFG capability (some
     devices may not have any device specific structure).

    +\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device specific structure}
    +
     The \field{offset} for the device specific structure must be 4-byte aligned.

     \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}

    -The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG. This
    +The VIRTIO_PCI_CAP_PCI_CFG capability
     creates an alternative (and likely suboptimal) access method to the
     common configuration, notification, ISR and device-specific regions.

    @@ -1218,8 +1299,8 @@ struct virtio_pci_cfg_cap {
     };
     \end{lstlisting}

    -The fields \field{cap.bar}, \field{cap.legth}, \field{cap.offset} and
    -\field{pci_cfg_data} are read-write (RW).
    +The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset} and
    +\field{pci_cfg_data} are read-write (RW) for the driver.

     To access a device region, the driver writes into the capability
     structure (ie. within the PCI configuration space) as follows:
    @@ -1231,13 +1312,16 @@ structure (ie. within the PCI configuration space) as follows:
       \field{cap.length}.

     \item The driver sets the offset within the BAR by writing to
    -  \field{cap.offset}. The driver MUST NOT write an offset which is not
    -  a multiple of \field{cap.length} (ie. all accesses must be aligned).
    +  \field{cap.offset}.
end{itemize} At that point, field{pci_cfg_data} will provide a window of size field{cap.length} into the given field{cap.bar} at offset field{cap.offset}. +devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability} + +The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability. + Upon detecting driver write access to field{pci_cfg_data}, the device MUST execute a write access at offset field{cap.offset} at BAR selected by field{cap.bar} using the first field{cap.length} @@ -1249,6 +1333,11 @@ execute a read access of length cap.length at offset field{cap.offset} at BAR selected by field{cap.bar} and store the first field{cap.length} bytes in field{pci_cfg_data}. +drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability} + +The driver MUST NOT write a field{cap.offset} which is not +a multiple of field{cap.length} (ie. all accesses must be aligned). + subsubsection{Legacy Interfaces: A Note on PCI Device Layout}label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interfaces: A Note on PCI Device Layout} Transitional devices should present part of configuration @@ -1327,18 +1416,17 @@ Space}~
    ameref{sec:Basic Facilities of a Virtio Device / Device Configuration S subsubsection{Device Initialization}label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization} This documents PCI-specific steps executed during Device Initialization. -As the first step, driver must detect device configuration layout -to locate configuration fields in memory, I/O or PCI configuration space of the -device. paragraph{Virtio Device Configuration Layout Detection}label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection} As a prerequisite to device initialization, the driver scans the PCI capability list, detecting virtio configuration layout using Virtio -Structure PCI capabilities. +Structure PCI capabilities as detailed in
    ef{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities} paragraph{Non-transitional Device With Legacy Driver}label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Non-transitional Device With Legacy Driver} +drivernormative{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Non-transitional Device With Legacy Driver} + Non-transitional devices, on a platform where a legacy driver for a legacy device with the same ID might have previously existed, MUST take the following steps to fail gracefully when a legacy @@ -1379,27 +1467,11 @@ When MSI-X capability is present and enabled in the device (through standard PCI configuration space) field{config_msix_vector} and field{queue_msix_vector} are used to map configuration change and queue interrupts to MSI-X vectors. In this case, the ISR Status is unused. -A device that has an MSI-X capability SHOULD support at least 2 -and at most 0x800 MSI-X vectors. -Device MUST report the number of vectors supported in -field{Table Size} in the MSI-X Capability as specified in -hyperref[intro:PCI]{[PCI]}. -Driver MUST support device with any MSI-X Table Size 0 to 0x7FF. -Driver MAY fall back on using INT#x interrupts for a device -which only supports one MSI-X vector (MSI-X Table Size = 0). - -Driver MAY intepret the Table Size as a hint from the device -for the suggested number of MSI-X vectors to use. -Therefore, devices SHOULD restrict the reported MSI-X Table Size field -to a value that might benefit system performance. -For example, a device which does not expect to send -interrupts at a high rate might only specify 2 MSI-X vectors. 
- Writing a valid MSI-X Table entry number, 0 to 0x7FF, to field{config_msix_vector}/field{queue_msix_vector} maps interrupts triggered by the configuration change/selected queue events respectively to the corresponding MSI-X vector. To disable interrupts for a -specific event type, unmap this event by writing a special NO_VECTOR +specific event type, the driver unmaps this event by writing a special NO_VECTOR value: egin{lstlisting} @@ -1407,31 +1479,57 @@ value: #define VIRTIO_MSI_NO_VECTOR 0xffff end{lstlisting} -Driver MUST NOT attempt to map an event to a vector -outside the MSI-X Table supported by the device, -as reported by field{Table Size} in the MSI-X Capability. +Note that mapping an event to vector might require device to +allocate internal device resources, and thus could fail. + +devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration} + +A device that has an MSI-X capability SHOULD support at least 2 +and at most 0x800 MSI-X vectors. +Device MUST report the number of vectors supported in +field{Table Size} in the MSI-X Capability as specified in +hyperref[intro:PCI]{[PCI]}. +The device SHOULD restrict the reported MSI-X Table Size field +to a value that might benefit system performance. +egin{note} +For example, a device which does not expect to send +interrupts at a high rate might only specify 2 MSI-X vectors. +end{note} Device MUST support mapping any event type to any valid vector 0 to MSI-X field{Table Size}. Device MUST support unmapping any event type. -Reading these registers returns vector mapped to a given event, -or NO_VECTOR if unmapped. All queue and configuration change -events are unmapped by default. +The device MUST return vector mapped to a given event, +(NO_VECTOR if unmapped) on read of field{config_msix_vector}/field{queue_msix_vector}. 
+The device MUST have all queue and configuration change +events are unmapped upon reset. -Note that mapping an event to vector might require device to -allocate internal device resources, and MAY fail. Devices MUST report such +Devices SHOULD NOT cause mapping an event to vector to fail +unless it is impossible for the device to satisfy the mapping +request. Devices MUST report mapping failures by returning the NO_VECTOR value when the relevant -Vector field is read. After mapping an event to vector, the +field{config_msix_vector}/field{queue_msix_vector} field is read. + +drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration} + +Driver MUST support device with any MSI-X Table Size 0 to 0x7FF. +Driver MAY fall back on using INT#x interrupts for a device +which only supports one MSI-X vector (MSI-X Table Size = 0). + +Driver MAY intepret the Table Size as a hint from the device +for the suggested number of MSI-X vectors to use. + +Driver MUST NOT attempt to map an event to a vector +outside the MSI-X Table supported by the device, +as reported by field{Table Size} in the MSI-X Capability. + +After mapping an event to vector, the driver MUST verify success by reading the Vector field value: on success, the previously written value is returned, and on failure, NO_VECTOR is returned. If a mapping failure is detected, the driver MAY retry mapping with fewer vectors, disable MSI-X or report device failure. -Devices SHOULD NOT cause mapping an event to vector to fail -unless it is impossible for the device to satisfy the mapping -request. 
- paragraph{Virtqueue Configuration}label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtqueue Configuration} As a device can have zero or more virtqueues for bulk data @@ -1439,13 +1537,12 @@ transport (for example, the simplest network device has two), the driver needs to configure them as part of the device-specific configuration. -The driver does this as follows, for each virtqueue a device has: +The driver typically does this as follows, for each virtqueue a device has: egin{enumerate} item Write the virtqueue index (first queue is 0) to field{queue_select}. -item Read the virtqueue size from field{queue_size}, which MUST - be a power of 2. This controls how big the virtqueue is +item Read the virtqueue size from field{queue_size}. This controls how big the virtqueue is (see
    \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues}). If this field is 0, the virtqueue does not exist. item Optionally, select a smaller virtqueue size and write it to field{queue_size}. @@ -1476,7 +1573,7 @@ of this virtqueue to the Queue Notify address. See
    ef{sec:Virtio Transport Op subsubsection{Virtqueue Interrupts From The Device}label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device} -If an interrupt is necessary for a virtqueue, the device SHOULD: +If an interrupt is necessary for a virtqueue, the device would typically act as follows: egin{itemize} item If MSI-X capability is disabled: @@ -1488,31 +1585,18 @@ If an interrupt is necessary for a virtqueue, the device SHOULD: item If MSI-X capability is enabled: egin{enumerate} - item Request the appropriate MSI-X interrupt message for the + item If field{queue_msix_vector} is not NO_VECTOR, + request the appropriate MSI-X interrupt message for the device, field{queue_msix_vector} sets the MSI-X Table entry number. - - item If the vector value is NO_VECTOR, no interrupt - message is requested for this event, so the device MUST NOT - deliver an interrupt. end{enumerate} end{itemize} -The driver interrupt handler SHOULD: +devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device} -egin{itemize} - item If MSI-X capability is disabled: read the ISR Status field, - which will reset it to zero. If the lower bit is zero, the - interrupt was not for this device. Otherwise, the driver - SHOULD look through the used rings of all virtqueues for the - device, to see if any progress has been made by the device - which requires servicing. - - item If MSI-X capability is enabled: look through the used rings of - all virtqueues mapped to the specific MSI-X vector for the - device, to see if any progress has been made by the device - which requires servicing. -end{itemize} +If MSI-X capability is enabled and field{queue_msix_vector} is +NO_VECTOR for a virtqueue, the device MUST NOT deliver an interrupt +for that virtqueue. 
subsubsection{Notification of Device Configuration Changes}label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} @@ -1520,19 +1604,61 @@ Some virtio PCI devices can change the device configuration state, as reflected in the device-specific region of the device. In this case: egin{itemize} - item If MSI-X capability is disabled: an interrupt is delivered and - the second lowest bit is set in the ISR Status field to - indicate that the driver should re-examine the configuration - space. Note that a single interrupt can indicate both that one - or more virtqueue has been used and that the configuration - space has changed: even if the config bit is set, virtqueues - MUST be scanned. - - item If MSI-X capability is enabled: an interrupt message is - requested. field{config_msix_vector} sets the MSI-X Table - entry number to use. If field{config_msix_vector} is - NO_VECTOR, no interrupt message is requested for this event and - the device MUST NOT deliver an interrupt. + item If MSI-X capability is disabled: + egin{enumerate} + item Set the second lower bit of the ISR Status field for the device. + + item Send the appropriate PCI interrupt for the device. + end{enumerate} + + item If MSI-X capability is enabled: + egin{enumerate} + item If field{config_msix_vector} is not NO_VECTOR, + request the appropriate MSI-X interrupt message for the + device, field{config_msix_vector} sets the MSI-X Table entry + number. + end{enumerate} +end{itemize} + +A single interrupt MAY indicate both that one or more virtqueue has +been used and that the configuration space has changed. 
+ +devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} + +If MSI-X capability is enabled and field{config_msix_vector} is +NO_VECTOR, the device MUST NOT deliver an interrupt +for device configuration space changes. + +drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} + +A driver MUST handle the case where the same interrupt is used to indicate +both device configuration space change and one or more virtqueues being used. + +subsubsection{Driver Handling Interrupts}label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Driver Handling Interrupts} +The driver interrupt handler would typically: + +egin{itemize} + item If MSI-X capability is disabled: + egin{itemize} + item Read the ISR Status field, which will reset it to zero. + item If the lower bit is set: + look through the used rings of all virtqueues for the + device, to see if any progress has been made by the device + which requires servicing. + item If the second lower bit is set: + re-examine the configuration space to see what changed. + end{itemize} + item If MSI-X capability is enabled: + egin{itemize} + item + Look through the used rings of + all virtqueues mapped to the specific MSI-X vector for the + device, to see if any progress has been made by the device + which requires servicing. + item + If the MSI-X vector is equal to field{config_msix_vector}, + re-examine the configuration space to see what changed. + end{itemize} end{itemize} section{Virtio Over MMIO}label{sec:Virtio Transport Options / Virtio Over MMIO} -- 1.8.3.2
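
The new "Driver Handling Interrupts" subsection in this patch can be illustrated for the case where MSI-X is disabled. What follows is only a sketch under stated assumptions: the macro names, struct fake_isr, struct intr_work and handle_intx() are hypothetical and do not come from the patch, and the ISR Status register is modelled as a plain byte in memory rather than a real PCI region. The bit positions follow the patch text (lowest bit for virtqueue use, second lowest bit for a configuration change).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; bit positions follow the patch text. */
#define ISR_QUEUE_INTR  0x1u  /* lowest bit: a virtqueue was used */
#define ISR_CONFIG_INTR 0x2u  /* second lowest bit: config changed */

/* Hypothetical stand-in for the device's ISR Status register. */
struct fake_isr {
    uint8_t value;
};

/* Reading the ISR Status field resets it to zero. */
static uint8_t isr_read_and_clear(struct fake_isr *isr)
{
    uint8_t v = isr->value;
    isr->value = 0;
    return v;
}

/* What the handler decided to do after one interrupt. */
struct intr_work {
    int scan_queues;    /* look through the used rings */
    int reread_config;  /* re-examine the configuration space */
};

/* MSI-X disabled path: a single ISR read tells the driver which kinds
 * of event are pending. Both bits may be set by the same interrupt,
 * so they are tested independently. */
static struct intr_work handle_intx(struct fake_isr *isr)
{
    uint8_t status = isr_read_and_clear(isr);
    struct intr_work w;

    w.scan_queues = (status & ISR_QUEUE_INTR) != 0;
    w.reread_config = (status & ISR_CONFIG_INTR) != 0;
    return w;
}
```

If neither bit is set after the read, the interrupt was presumably not for this device, which matters when the interrupt line is shared.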


  • 26.  [PATCH 11/18] Feedback: Normative split for Basic Facilities of a Virtio Device / Virtqueues / Message Framing

    Posted 02-19-2014 06:45
    Signed-off-by: Rusty Russell <rusty@au1.ibm.com> --- content.tex 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/content.tex b/content.tex index 1decb0a..966b293 100644 --- a/content.tex +++ b/content.tex @@ -290,8 +290,7 @@ endian of the guest, not little-endian as specified by this standard. It is assumed that the host is already aware of the guest endian. subsection{Message Framing}label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing} -The device MUST NOT make assumptions about the particular arrangement -of descriptors: the message framing is +The framing of messages with descriptors is independent of the contents of the buffers. For example, a network transmit buffer consists of a 12 byte header followed by the network packet. This could be most simply placed in the descriptor table as a @@ -300,13 +299,22 @@ but it could also consist of a single 1526 byte output descriptor in the case where the header and packet are adjacent, or even three or more descriptors (possibly with loss of efficiency in that case). -Note that, some implementations may have large-but-reasonable +Note that, some device implementations have large-but-reasonable restrictions on total descriptor size (such as based on IOV_MAX in the host OS). This has not been a problem in practice: little sympathy will be given to drivers which create unreasonably-sized descriptors such as by dividing a network packet into 1500 single-byte descriptors! +devicenormative{Basic Facilities of a Virtio Device / Message Framing} +The device MUST NOT make assumptions about the particular arrangement +of descriptors. The device MAY have a reasonable limit of descriptors +it will allow in a chain. + +drivernormative{Basic Facilities of a Virtio Device / Message Framing} +The driver SHOULD NOT use an excessive number of descriptors to +describe a buffer. 
+ subsubsection{Legacy Interface: Message Framing}label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing / Legacy Interface: Message Framing} Regrettably, initial driver implementations used simple layouts, and -- 1.8.3.2
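
The device-side requirement in this patch (no assumptions about how descriptors are arranged) can be sketched as a gather loop that treats a descriptor chain as one byte stream. This is a minimal illustration, not the spec's data layout: struct sketch_desc and gather_message() are hypothetical, and a plain pointer stands in for the guest-physical address that a real descriptor carries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical descriptor. A real vring_desc holds a
 * physical address, flags and a next index instead of pointers. */
struct sketch_desc {
    const uint8_t *addr;
    uint32_t len;
    const struct sketch_desc *next;  /* NULL ends the chain */
};

/* Copy at most `cap` bytes of the chained message into `out`,
 * ignoring where the descriptor boundaries fall.
 * Returns the total length of the message in the chain. */
static size_t gather_message(const struct sketch_desc *d,
                             uint8_t *out, size_t cap)
{
    size_t total = 0;

    for (; d != NULL; d = d->next) {
        size_t room = total < cap ? cap - total : 0;
        size_t n = d->len < room ? d->len : room;

        if (n > 0)
            memcpy(out + total, d->addr, n);
        total += d->len;
    }
    return total;
}
```

With this approach a 12 byte header followed by a 1514 byte packet yields the same 1526 bytes whether the driver used two descriptors or one, which is the property the normative statement asks the device to preserve.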


  • 27.  [PATCH 07/18] Feedback: 2.1 Device Status field: Separate description from normative.

    Posted 02-19-2014 06:45
    Start with explanation, progress to normative requirements. Signed-off-by: Rusty Russell <rusty@au1.ibm.com> --- content.tex 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/content.tex b/content.tex index 13e7749..88c6d6a 100644 --- a/content.tex +++ b/content.tex @@ -14,13 +14,10 @@ device consists of the following parts: section{field{Device Status} Field}label{sec:Basic Facilities of a Virtio Device / Device Status Field} -The driver MUST update the field{device status} field in the order below to -indicate its progress. This provides a simple low-level diagnostic: -it's most useful to imagine them hooked up to traffic lights on the -console indicating the status of each device. The driver MUST NOT -clear a field{device status} bit. - -field{device status} is 0 upon reset, otherwise at least one bit should be set: +The field{device status} field provides a simple low-level +diagnostic: it's most useful to imagine them hooked up to traffic +lights on the console indicating the status of each device. The +following bits are defined: egin{description} item[ACKNOWLEDGE (1)] Indicates that the guest OS has found the @@ -40,10 +37,16 @@ clear a field{device status} bit. item[FAILED (128)] Indicates that something went wrong in the guest, and it has given up on the device. This could be an internal error, or the driver didn't like the device for some reason, or - even a fatal error during device operation. The driver MUST - reset the device before attempting to re-initialize. + even a fatal error during device operation. end{description} +The driver MUST update field{device status} in the order above to +indicate the driver's progress. The driver MUST NOT clear a +field{device status} bit. If the driver sets the FAILED bit, +it MUST reset the device before attempting to re-initialize. + +The device MUST initialize field{device status} to 0 upon reset. 
+ section{Feature Bits}label{sec:Basic Facilities of a Virtio Device / Feature Bits} Each virtio device offers all the features it understands. During -- 1.8.3.2
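
The rule that the driver MUST NOT clear a device status bit can be expressed as a small check on each proposed update. The sketch below is illustrative: the STATUS_ macro names and status_update_ok() are hypothetical. Only ACKNOWLEDGE (1) and FAILED (128) are visible in this hunk; the values for DRIVER, DRIVER_OK and FEATURES_OK are assumed from the virtio specification. A device reset, which returns the field to 0, is a separate operation and is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* ACKNOWLEDGE and FAILED appear in the patch; the remaining values
 * are assumed from the virtio specification. Names are illustrative. */
#define STATUS_ACKNOWLEDGE 1u
#define STATUS_DRIVER      2u
#define STATUS_DRIVER_OK   4u
#define STATUS_FEATURES_OK 8u
#define STATUS_FAILED      128u

/* Returns 1 if every bit set in `old_status` is still set in
 * `new_status`, i.e. the update only adds bits; returns 0 if the
 * update would clear a bit. */
static int status_update_ok(uint8_t old_status, uint8_t new_status)
{
    return (old_status & (uint8_t)~new_status) == 0;
}
```

A driver could run such a check in a debug build before each write to the status field; setting FAILED on top of the existing bits passes, while dropping ACKNOWLEDGE does not.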


  • 28.  [PATCH 12/18] Feedback: Separate the rest of chapter 2 into normative vs explanatory.

    Posted 02-19-2014 06:45
    The big change here is in introducing new subsections for interrupt and notification suppression, and moving all requirements into them. The example processing loop is also moved into a note, to show clearly that it's not normative. Signed-off-by: Rusty Russell <rusty@au1.ibm.com> --- content.tex 284 +++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 164 insertions(+), 120 deletions(-) diff --git a/content.tex b/content.tex index 966b293..d6fdeae 100644 --- a/content.tex +++ b/content.tex @@ -49,6 +49,8 @@ it MUST reset the device before attempting to re-initialize. devicenormative{Basic Facilities of a Virtio Device / Device Status Field} The device MUST initialize field{device status} to 0 upon reset. +The device MUST NOT consume buffers or notify the driver before DRIVER_OK. + section{Feature Bits}label{sec:Basic Facilities of a Virtio Device / Feature Bits} Each virtio device offers all the features it understands. During @@ -166,6 +168,12 @@ tail padding, and accept any device configuration space size equal to or greater than the specified 8-bit size. end{note} +devicenormative{Basic Facilities of a Virtio Device / Device Configuration Space} +The device MUST allow reading of any device-specific configuration +field before FEATURES_OK is set by the driver. This includes fields which are +conditional on feature bits, as long as those feature bits are offered +by the device. + subsection{Legacy Interface: A Note on Device Configuration Space endian-ness}label{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: A Note on Configuration Space endian-ness} Note that for legacy interfaces, device configuration space is generally the @@ -334,18 +342,12 @@ the device. field{addr} is a physical address, and the buffers can be chained via field{next}. 
Each descriptor describes a buffer which is read-only for the device (``device-readable'') or write-only for the device (``device-writable''), but a chain of descriptors can contain both device-readable and device-writable buffers. -A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT -read a device-writable buffer (it might do so for debugging or diagnostic -purposes). The actual contents of the memory offered to the device depends on the device type. Most common is to begin the data with a header (containing little-endian fields) for the device to read, and postfix it with a status tailer for the device to write. -Drivers MUST NOT add a descriptor chain over than $2^{32}$ bytes long in total; -this implies that loops in the descriptor chain are forbidden! - egin{lstlisting} /* Note: LEGACY version was not little endian! */ struct vring_desc { @@ -370,6 +372,15 @@ struct vring_desc { The number of descriptors in the table is defined by the queue size for this virtqueue. +devicenormative{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table} +A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT +read a device-writable buffer (it MAY do so for debugging or diagnostic +purposes). + +drivernormative{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table} +Drivers MUST NOT add a descriptor chain over than $2^{32}$ bytes long in total; +this implies that loops in the descriptor chain are forbidden! + subsubsection{Indirect Descriptors}label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors} Some devices benefit by concurrently dispatching a large number @@ -381,9 +392,6 @@ containing this indirect descriptor table; field{addr} and field{len} refer to the indirect table address and length in bytes, respectively. 
-The driver MUST NOT set the VRING_DESC_F_INDIRECT flag unless the -VIRTIO_RING_F_INDIRECT_DESC feature was negotiated. - The indirect table layout structure looks like this (field{len} is the length of the descriptor that refers to this table, which is a variable, so this code won't compile): @@ -399,11 +407,17 @@ The first indirect descriptor is located at start of the indirect descriptor table (index 0), additional indirect descriptors are chained by field{next}. An indirect descriptor without a valid field{next} (with field{flags}&VRING_DESC_F_NEXT off) signals the end of the descriptor. -An -indirect descriptor can not refer to another indirect descriptor -table (field{flags}&VRING_DESC_F_INDIRECT MUST be off). A single indirect descriptor -table can include both device-readable and device-writable descriptors; -the device MUST ignore the write-only flag (field{flags}&VRING_DESC_F_WRITE) in the descriptor that refers to it. +A single indirect descriptor +table can include both device-readable and device-writable descriptors. + +drivernormative{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors} +The driver MUST NOT set the VRING_DESC_F_INDIRECT flag unless the +VIRTIO_RING_F_INDIRECT_DESC feature was negotiated. The driver MUST NOT +set the VRING_DESC_F_INDIRECT flag within an indirect descriptor (ie. only +one table per descriptor). + +devicenormative{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors} +The device MUST ignore the write-only flag (field{flags}&VRING_DESC_F_WRITE) in the descriptor that refers to an indirect table. subsection{The Virtqueue Available Ring}label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring} @@ -425,22 +439,64 @@ written by the driver and read by the device. field{idx} field indicates where the driver would put the next descriptor entry in the ring (modulo the queue size). 
This starts at 0, and increases. +subsection{Virtqueue Interrupt Suppression}label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression} + If the VIRTIO_RING_F_EVENT_IDX feature bit is not negotiated, -field{flags} field offers a crude interrupt control mechanism. The driver -MUST set this to 0 or 1: 1 indicates that the device SHOULD NOT send -an interrupt when it consumes a descriptor chain from the available -ring. The device MUST ignore the field{used_event} value in this case. - -Otherwise, if the VIRTIO_RING_F_EVENT_IDX feature bit is negotiated, -the driver MUST set field{flags} to 0, and use field{used_event} -in the used ring instead. The driver can ask the device to delay interrupts -until an entry with an index specified by field{used_event} is -written in the used ring (equivalently, until field{idx} in the +the field{flags} field in the available ring offers a crude mechanism for the driver to inform +the device that it doesn't want interrupts when buffers are used. Otherwise +field{used_event} is a more performant alterative where the driver +specifies how much progress the device should make before interrupting. + +Neither of these interrupt suppression methods are reliable, as they +are not explicitly synchronized with the device, but they serve as +useful optimizations. + +drivernormative{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression} +If the VIRTIO_RING_F_EVENT_IDX feature bit is not negotiated: +egin{itemize} +item The driver MUST set field{flags} to 0 or 1. +item The driver MAY set field{flags} to 1 to advise +the device that interrupts are not required. +end{itemize} + +Otherwise, if the VIRTIO_RING_F_EVENT_IDX feature bit is negotiated: +egin{itemize} +item The driver MUST set field{flags} to 0. 
+item The driver MAY use field{used_event} to advise the device that interrupts are not required until the device writes entry with an index specified by field{used_event} into the used ring (equivalently, until field{idx} in the used ring will reach the value field{used_event} + 1). +end{itemize} + +The driver MUST handle spurious interrupts from the device. -The driver MUST handle spurious interrupts: either form of interrupt -suppression is merely an optimization; it may not suppress interrupts -entirely. +devicenormative{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression} + +If the VIRTIO_RING_F_EVENT_IDX feature bit is not negotiated: +egin{itemize} +item The device MUST ignore the field{used_event} value. +item After the device writes a descriptor index into the used ring: + egin{itemize} + item If field{flags} is 1, the device SHOULD NOT send an interrupt. + item If field{flags} is 0, the device MUST send an interrupt. + end{itemize} +end{itemize} + +Otherwise, if the VIRTIO_RING_F_EVENT_IDX feature bit is negotiated: +egin{itemize} +item The device MUST ignore the lower bit of field{flags}. +item After the device writes a descriptor index into the used ring: + egin{itemize} + item If the field{idx} field in the used ring (which determined + where that descriptor index was placed) was equal to + field{used_event}, the device MUST send an interrupt. + item Otherwise the device SHOULD NOT send an interrupt. + end{itemize} +end{itemize} + +egin{note} +For example, if field{used_event} is 0, then a device using + VIRTIO_RING_F_EVENT_IDX would interrupt after the first buffer is + used (and again after the 65536th buffer, etc). 
+end{note} subsection{The Virtqueue Used Ring}label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} @@ -475,24 +531,56 @@ for drivers using untrusted buffers: if you do not know exactly how much has been written by the device, you usually have to zero the buffer to ensure no data leakage occurs. -If the VIRTIO_RING_F_EVENT_IDX feature bit is not negotiated, -field{flags} offers a crude interrupt control mechanism. The driver -MUST initialize this to 0, the device MUST set this to 0 or 1: 1 -indicates that the driver SHOULD NOT send an notification when it adds -a descriptor chain to the available ring. The driver MUST ignore the -field{used_event} value in this case. - -Otherwise, if the VIRTIO_RING_F_EVENT_IDX feature bit is negotiated, -the device MUST leave field{flags} at 0, and use -field{avail_event} in the used ring instead. The device can ask the -driver to delay notifications until an entry with an index specified -by field{avail_event} is written in the available ring (equivalently, -until field{idx} in the used ring will reach the value field{avail_event} + -1). - -The device MUST handle spurious notification: either form of -notification suppression is merely an optimization; it may not -suppress them entirely. +subsection{Virtqueue Notification Suppression}label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression} + +The device can suppress notifications in a manner analogous to the way +drivers can suppress interrupts as detailed in section
    ef{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}. +The device manipulates field{flags} or field{avail_event} in the used ring the +same way the driver manipulates field{flags} or field{used_event} in the available ring. + +drivernormative{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression} + +The driver MUST initialize field{flags} in the used ring to 0 when +allocating the used ring. + +If the VIRTIO_RING_F_EVENT_IDX feature bit is not negotiated: +egin{itemize} +item The driver MUST ignore the field{avail_event} value. +item After the driver writes a descriptor index into the available ring: + egin{itemize} + item If field{flags} is 1, the driver SHOULD NOT send a notification. + item If field{flags} is 0, the driver MUST send a notification. + end{itemize} +end{itemize} + +Otherwise, if the VIRTIO_RING_F_EVENT_IDX feature bit is negotiated: +egin{itemize} +item The driver MUST ignore the lower bit of field{flags}. +item After the driver writes a descriptor index into the available ring: + egin{itemize} + item If the field{idx} field in the available ring (which determined + where that descriptor index was placed) was equal to + field{avail_event}, the driver MUST send a notification. + item Otherwise the driver SHOULD NOT send a notification. + end{itemize} +end{itemize} + +devicenormative{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression} +If the VIRTIO_RING_F_EVENT_IDX feature bit is not negotiated: +egin{itemize} +item The device MUST set field{flags} to 0 or 1. +item The device MAY set field{flags} to 1 to advise +the driver that notifications are not required. +end{itemize} + +Otherwise, if the VIRTIO_RING_F_EVENT_IDX feature bit is negotiated: +egin{itemize} +item The device MUST set field{flags} to 0. 
+item The device MAY use field{avail_event} to advise the driver that notifications are not required until the driver writes entry with an index specified by field{avail_event} into the available ring (equivalently, until field{idx} in the +available ring will reach the value field{avail_event} + 1). +end{itemize} + +The device MUST handle spurious notifications from the driver. subsection{Helpers for Operating Virtqueues}label{sec:Basic Facilities of a Virtio Device / Virtqueues / Helpers for Operating Virtqueues} @@ -512,6 +600,7 @@ how to communicate with the specific device. section{Device Initialization}label{sec:General Initialization And Device Operation / Device Initialization} +drivernormative{General Initialization And Device Operation / Device Initialization} The driver MUST follow this sequence to initialize a device: egin{enumerate} @@ -524,9 +613,6 @@ The driver MUST follow this sequence to initialize a device: item Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. - The device MUST allow reading of any device-specific configuration field - before FEATURES_OK is set. This includes fields which are conditional - on feature bits, as long as those feature bits are offered by the device. itemlabel{itm:General Initialization And Device Operation / Device Initialization / Set FEATURES-OK} Set the FEATURES_OK status bit. The driver MUST not accept new feature bits after this step. @@ -548,13 +634,7 @@ set the FAILED status bit to indicate that it has given up on the device (it can reset the device later to restart if desired). The driver MUST NOT continue initialization in that case. -The device MUST NOT consume buffers or notify the driver before DRIVER_OK, and the driver -MUST NOT notify the device before it sets DRIVER_OK. 
- -Devices SHOULD support all valid combinations of features, but we know -that implementations may well make assuptions that they will only be -used by fully-optimized drivers. The resetting of the FEATURES_OK flag -provides a semi-graceful failure mode for this case. +The driver MUST NOT notify the device before setting DRIVER_OK. \subsection{Legacy Interface: Device Initialization}\label{sec:General Initialization And Device Operation / Device Initialization / Legacy Interface: Device Initialization} Legacy devices do not support the FEATURES_OK status bit, and thus did @@ -592,20 +672,20 @@ The driver offers buffers to one of the device's virtqueues as follows: \item\label{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Index} The driver places the index of the head of the descriptor chain into the next ring entry of the available ring. -\item Steps \ref{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Buffers} and \ref{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Index} may be performed repeatedly if batching +\item Steps \ref{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Buffers} and \ref{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Index} MAY be performed repeatedly if batching is possible. -\item The driver MUST perform suitable a memory barrier to ensure the device sees +\item The driver performs a suitable memory barrier to ensure the device sees the updated descriptor table and available ring before the next step. \item The available \field{idx} is increased by the number of descriptor chain heads added to the available ring. -\item The driver MUST perform a suitable memory barrier to ensure that it updates +\item The driver performs a suitable memory barrier to ensure that it updates the \field{idx} field before checking for notification suppression. -\item If notifications are not suppressed, the driver MUST notify the device +\item If notifications are not suppressed, the driver notifies the device of the new available buffers. \end{enumerate} @@ -618,7 +698,7 @@ In addition, the maximum queue size is 32768 (it must be a power of 2 which fits in 16 bits), so the 16-bit \field{idx} value can always distinguish between a full and empty buffer. -Here is a description of each stage in more detail. +What follows are the requirements of each stage in more detail. \subsubsection{Placing Buffers Into The Descriptor Table}\label{sec:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Placing Buffers Into The Descriptor Table} @@ -669,12 +749,6 @@ avail->ring[(avail->idx + added++) % qsz] = head; \subsubsection{Updating \field{idx}}\label{sec:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Updating idx} -Once available \field{idx} is updated by driver, the device MAY -access the descriptor chains the driver created and the -memory they refer to. This is why the driver SHOULD generally -use a memory barrier before the \field{idx} update, to ensure the -device sees the most up-to-date copy.
- \field{idx} always increments, and wraps naturally at 65536: @@ -682,72 +756,39 @@ device sees the most up-to-date copy. avail->idx += added; \end{lstlisting} +Once available \field{idx} is updated by the driver, this exposes the +descriptor and its contents. The device MAY +access the descriptor chains the driver created and the +memory they refer to immediately. + +\drivernormative{General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Updating idx} +The driver MUST perform a suitable memory barrier before the \field{idx} update, to ensure the +device sees the most up-to-date copy. + \subsubsection{Notifying The Device}\label{sec:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Notifying The Device} The actual method of device notification is bus-specific, but generally it can be expensive. So the device MAY suppress such notifications if it -doesn't need them. The driver has to be careful to expose the new \field{idx} -value before checking if notifications are suppressed: the driver MAY notify -gratuitously, but MUST NOT to omit a required notification. So again, -the driver SHOULD use a memory barrier here before reading \field{flags} or -\field{avail_event}. - -If the VIRTIO_F_RING_EVENT_IDX feature is not negotiated, and if the -VRING_USED_F_NOTIFY flag is not set, the driver SHOULD notify the -device. +doesn't need them, as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}. -If the VIRTIO_F_RING_EVENT_IDX feature is negotiated, the driver reads -\field{avail_event} in the available ring structure. If the -available \field{idx} crossed \field{avail_event} value since the -last notification, the driver SHOULD notify the device. \field{avail_event} wraps naturally at 65536 as well, -giving the following algorithm for calculating whether a device needs -notification: +The driver has to be careful to expose the new \field{idx} +value before checking if notifications are suppressed. -\begin{lstlisting} -(u16)(new_idx - avail_event - 1) < (u16)(new_idx - old_idx) -\end{lstlisting} +\drivernormative{General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Updating idx} +The driver MUST perform a suitable memory barrier before reading \field{flags} or +\field{avail_event}, to avoid missing a required notification. \subsection{Receiving Used Buffers From The Device}\label{sec:General Initialization And Device Operation / Device Operation / Receiving Used Buffers From The Device} Once the device has used buffers referred to by a descriptor (read from or written to them, or parts of both, depending on the nature of the virtqueue and the -device), it SHOULD send an interrupt, following an algorithm very -similar to the algorithm used for the driver to send the device a -buffer: - -\begin{enumerate} -\item Write the head descriptor number to the next free entry - (pointed to by the used ring \field{idx}) in the used - ring. - -\item Update the used ring \field{idx}. +device), it interrupts the driver as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}. -\item Deliver an interrupt if necessary: - - \begin{enumerate} - \item If the VIRTIO_F_RING_EVENT_IDX feature is not negotiated: - check if the VRING_AVAIL_F_NO_INTERRUPT flag is not set in - \field{flags} in the available structure. - - \item If the VIRTIO_F_RING_EVENT_IDX feature is negotiated: check - whether the used \field{idx} crossed the \field{used_event} value - since the last update. \field{used_event} wraps naturally - at 65536 as well: -\begin{lstlisting} -(u16)(new_idx - used_event - 1) < (u16)(new_idx - old_idx) -\end{lstlisting} - \end{enumerate} -\end{enumerate} - -For each ring, the driver MAY then disable interrupts by writing -VRING_AVAIL_F_NO_INTERRUPT to \field{flags} in available structure, if required. -Once it has processed the ring entries, it SHOULD re-enable -interrupts by clearing VRING_AVAIL_F_NO_INTERRUPT in \field{flags} or updating -\field{event_idx} in the available structure. The driver SHOULD then -execute a memory barrier, and then recheck the ring empty -condition. This is necessary to handle the case where after the -last check and before enabling interrupts, an interrupt has been -suppressed by the device: +\begin{note} +For optimal performance, a driver MAY disable interrupts while processing +the used ring, but beware the problem of missing interrupts between +emptying the ring and reenabling interrupts. This is usually handled by +re-checking for more used buffers after interrupts are re-enabled: \begin{lstlisting} vring_disable_interrupts(vq); @@ -768,6 +809,7 @@ for (;;) { vq->last_seen_used++; } \end{lstlisting} +\end{note} \subsection{Notification of Device Configuration Changes}\label{sec:General Initialization And Device Operation / Device Operation / Notification of Device Configuration Changes} @@ -780,6 +822,8 @@ Once the driver has set the DRIVER_OK status bit, all the configured virtqueue of the device are considered live.
None of the virtqueues of a device are live once the device has been reset. +\drivernormative{sec:General Initialization And Device Operation / Device Cleanup} + A driver MUST NOT alter descriptor table entries which have been exposed in the available ring (and not marked consumed by the device in the used ring) of a live virtqueue. -- 1.8.3.2
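
    Editorial illustration (not part of the patch): the driver-side rules in the Virtqueue Notification Suppression hunk above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions: driver_should_notify() is a hypothetical helper name, the wrap-safe comparison is the one from the lstlisting this patch removes from "Notifying The Device", and the memory barrier the driver must issue before reading flags or avail_event is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe test taken from the removed lstlisting: true when event_idx
 * lies in [old_idx, new_idx) modulo 65536, i.e. when one of the entries
 * just added was placed at index event_idx. */
static bool vring_need_event(uint16_t event_idx, uint16_t new_idx,
                             uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Hypothetical helper (name and signature are assumptions for illustration).
 * used_flags and avail_event are the values the device wrote in the used
 * ring; old_idx and new_idx are the available idx before and after the
 * driver added its batch of descriptor chain heads. The caller is assumed
 * to have issued the required memory barrier before reading them. */
static bool driver_should_notify(bool event_idx_negotiated,
                                 uint16_t used_flags, uint16_t avail_event,
                                 uint16_t old_idx, uint16_t new_idx)
{
    if (!event_idx_negotiated)
        return used_flags == 0;   /* flags 0: MUST notify; 1: SHOULD NOT */

    /* With EVENT_IDX negotiated, flags is ignored and only avail_event counts. */
    return vring_need_event(avail_event, new_idx, old_idx);
}
```

    Because the comparison is done in 16-bit arithmetic, it keeps working when idx wraps at 65536, which is why the spec text can say the index "wraps naturally".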


  • 29.  [PATCH 17/18] block: separate normative and descriptive text.

    Posted 02-19-2014 06:45
    Signed-off-by: Rusty Russell <rusty@au1.ibm.com> --- content.tex 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/content.tex b/content.tex index 4b00e6b..c0a0412 100644 --- a/content.tex +++ b/content.tex @@ -3471,9 +3471,7 @@ native endian of the guest rather than (necessarily) little-endian. \subsection{Device Initialization}\label{sec:Device Types / Block Device / Device Initialization} \begin{enumerate} -\item The device size should be read from \field{capacity}. - No requests should be submitted which goes - beyond this limit. +\item The device size can be read from \field{capacity}. \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, \field{blk_size} can be read to determine the optimal sector size @@ -3546,8 +3544,24 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device: #define VIRTIO_BLK_S_UNSUPP 2 \end{lstlisting} -Any writes completed before the submission of the flush command should -be committed to non-volatile storage by the device. +\drivernormative{Device Types / Block Device / Device Operation} + +A driver MUST NOT submit a request which would cause a read or write +beyond \field{capacity}. + +A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered. + +A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request. +A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request. + +\devicenormative{Device Types / Block Device / Device Operation} + +A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR +for a write request if the VIRTIO_BLK_F_RO feature is offered, and MUST NOT +write any data. + +Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure +that any writes which were completed are committed to non-volatile storage. \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} For legacy devices, the fields in struct virtio_blk_req are the -- 1.8.3.2
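
    Editorial illustration (not part of the patch): two of the new block-device requirements lend themselves to a short C sketch. Both helper names are hypothetical, and the sketch assumes that capacity, sector and the request length are all expressed in the same unit (sectors). The status values are the usual virtio-blk ones; only VIRTIO_BLK_S_UNSUPP (2) appears in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_S_OK    0
#define VIRTIO_BLK_S_IOERR 1

/* Driver side (hypothetical helper): a request is acceptable only if it
 * does not read or write beyond capacity. Written so that
 * sector + nsectors cannot overflow 64 bits. */
static bool blk_request_in_range(uint64_t capacity, uint64_t sector,
                                 uint64_t nsectors)
{
    return nsectors <= capacity && sector <= capacity - nsectors;
}

/* Device side (hypothetical helper, simplified): when VIRTIO_BLK_F_RO is
 * offered, a write request is failed with VIRTIO_BLK_S_IOERR and no data
 * is written. Otherwise this sketch just reports success; a real device
 * would perform the write and report its actual outcome. */
static uint8_t blk_write_status(bool ro_offered)
{
    return ro_offered ? VIRTIO_BLK_S_IOERR : VIRTIO_BLK_S_OK;
}
```

    The range check subtracts rather than adds so that a hostile or buggy sector value near the top of the 64-bit range cannot wrap around and pass the test.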