OASIS Virtual I/O Device (VIRTIO) TC

Expand all | Collapse all

[PATCH v3] pci: new configuration layout

  • 1.  [PATCH v3] pci: new configuration layout

    Posted 09-18-2013 09:04
    - split data path, common config and device specific config
    - support for new VQ layout

    VIRTIO-21

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

    ---

    changes from v2:
    pci: add support for queue size negotiation
    Make it possible for guest to use a smaller queue size than the
    maximum with virtio over PCI (MMIO already has this ability).

    clarify that device status 0 resets the device

    changes from v1:
    minimal patchset,
    stripped all controversial changes away:
    endian-ness, framing, revision id, config based access.

    made some minor clarifications
    ---
    virtio-v1.0-wd01-part1-specification.txt | 331 ++++++++++++++++++++++++++++---
    1 file changed, 308 insertions(+), 23 deletions(-)

    diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    index b4298bb..e6e7eee 100644
    --- a/virtio-v1.0-wd01-part1-specification.txt
    +++ b/virtio-v1.0-wd01-part1-specification.txt
    @@ -684,9 +684,145 @@ for informational purposes by the guest).
    2.3.1.2. PCI Device Layout
    -------------------------

    -To configure the device, we use the first I/O region of the PCI
    -device. This contains a virtio header followed by a
    -device-specific region.
    +To configure the device,
    +use I/O and/or memory regions and/or PCI configuration space of the PCI device.
    +These contain the virtio header registers, the notification register, the
    +ISR status register and device specific registers, as specified by Virtio
    ++ Structure PCI Capabilities
    +
    +There may be different widths of accesses to the I/O region; the
    +“natural” access method for each field must be
    +used (i.e. 32-bit accesses for 32-bit fields, etc).
    +
    +PCI Device Configuration Layout includes the common configuration,
    +ISR, notification and device specific configuration
    +structures.
    +
    +Unless explicitly specified otherwise, all multi-byte fields are little-endian.
    +
    +2.3.1.2.1. Common configuration structure layout
    +-------------------------
    +Common configuration structure layout is documented below:
    +
    +struct virtio_pci_common_cfg {
    + /* About the whole device. */
    + __le32 device_feature_select; /* read-write */
    + __le32 device_feature; /* read-only */
    + __le32 guest_feature_select; /* read-write */
    + __le32 guest_feature; /* read-write */
    + __le16 msix_config; /* read-write */
    + __le16 num_queues; /* read-only */
    + __u8 device_status; /* read-write */
    + __u8 unused1;
    +
    + /* About a specific virtqueue. */
    + __le16 queue_select; /* read-write */
    + __le16 queue_size; /* read-write, power of 2, or 0. */
    + __le16 queue_msix_vector; /* read-write */
    + __le16 queue_enable; /* read-write */
    + __le16 queue_notify_off; /* read-only */
    + __le64 queue_desc; /* read-write */
    + __le64 queue_avail; /* read-write */
    + __le64 queue_used; /* read-write */
    +};
    +
    +device_feature_select
    +
    + Selects which Feature Bits does device_feature field refer to.
    + Value 0x0 selects Feature Bits 0 to 31
    + Value 0x1 selects Feature Bits 32 to 63
    + All other values cause reads from device_feature to return 0.
    +
    +device_feature
    +
    + Used by Device to report Feature Bits to Driver.
    + Device Feature Bits selected by device_feature_select.
    +
    +guest_feature_select
    +
    + Selects which Feature Bits does guest_feature field refer to.
    + Value 0x0 selects Feature Bits 0 to 31
    + Value 0x1 selects Feature Bits 32 to 63
    + All other values cause writes to guest_feature to be ignored,
    + and reads to return 0.
    +
    +guest_feature
    +
    + Used by Driver to acknowledge Feature Bits to Device.
    + Guest Feature Bits selected by guest_feature_select.
    +
    +msix_config
    +
    + Configuration Vector for MSI-X.
    +
    +num_queues
    +
    + Specifies the maximum number of virtqueues supported by device.
    +
    +device_status
    +
    + Device Status field. Writing 0 into this field resets the
    + device.
    +
    +queue_select
    +
    + Queue Select. Selects which virtqueue do other fields refer to.
    +
    +queue_size
    +
    + Queue Size. On reset, specifies the maximum queue size supported by
    + the hypervisor. This can be modified by driver to reduce memory requirements.
    + Set to 0 if this virtqueue is unused.
    +
    +queue_msix_vector
    +
    + Queue Vector for MSI-X.
    +
    +queue_enable
    +
    + Used to selectively prevent host from executing requests from this virtqueue.
    + 1 - enabled; 0 - disabled
    +
    +queue_notify_off
    +
    + Used to calculate the offset from start of Notification structure at
    + which this virtqueue is located.
    + Note: this is *not* an offset in bytes. See notify_off_multiplier below.
    +
    +queue_desc
    +
    + Physical address of Descriptor Table.
    +
    +queue_avail
    +
    + Physical address of Available Ring.
    +
    +queue_used
    +
    + Physical address of Used Ring.
    +
    +2.3.1.2.2. ISR status structure layout
    +-------------------------
    +ISR status structure includes a single 8-bite ISR status field
    +
    +2.3.1.2.3. Notification structure layout
    +-------------------------
    +Notification structure is always a multiple of 2 bytes in size.
    +It includes 2-byte Queue Notify fields for each virtqueue of
    +the device. Note that multiple virtqueues can use the same
    +Queue Notify field, if necessary.
    +
    +2.3.1.2.4. Device specific structure
    +-------------------------
    +
    +Device specific structure is optional.
    +
    +2.3.1.2.5. Legacy Interfaces: A Note on PCI Device Layout
    +-------------------------
    +
    +Transitional devices should present part of configuration
    +registers in a legacy configuration structure in BAR0 in the first I/O
    +region of the PCI device, as documented below.

    There may be different widths of accesses to the I/O region; the
    “natural” access method for each field in the virtio header must be
    @@ -699,10 +835,7 @@ Note that this is possible because while the virtio header is PCI
    the native endian of the guest (where such distinction is
    applicable).

    -2.3.1.2.1. PCI Device Virtio Header
    -----------------------------------
    -
    -The virtio header looks as follows:
    +When used through the legacy interface, the virtio header looks as follows:

    +------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
    | Bits || 32 | 32 | 32 | 16 | 16 | 16 | 8 | 8 |
    @@ -741,25 +874,167 @@ device-specific headers:
    | || |
    +------------++--------------------+

    +Note that only Feature Bits 0 to 31 are accessible through the
    +Legacy Interface. When used through the Legacy Interface,
    +Transitional Devices must assume that Feature Bits 32 to 63
    +are not acknowledged by Driver.
    +
    2.3.1.3. PCI-specific Initialization And Device Operation
    --------------------------------------------------------

    -The page size for a virtqueue on a PCI virtio device is defined as
    -4096 bytes.
    -
    2.3.1.3.1. Device Initialization
    -------------------------------

    +This documents PCI-specific steps executed during Device Initialization.
    +As the first step, driver must detect device configuration layout
    +to locate configuration fields in memory,I/O or configuration space of the
    +device.
    +
    +100.100.1.3.1.1. Virtio Device Configuration Layout Detection
    +-------------------------------
    +
    +As a prerequisite to device initialization, driver executes a
    +PCI capability list scan, detecting virtio configuration layout using Virtio
    +Structure PCI capabilities.
    +
    +Virtio Device Configuration Layout includes virtio configuration header, Notification
    +and ISR Status and device configuration structures.
    +Each structure can be mapped by a Base Address register (BAR) belonging to
    +the function, located beginning at 10h in Configuration Space,
    +or accessed though PCI configuration space.
    +
    +Actual location of each structure is specified using vendor-specific PCI capability located
    +on capability list in PCI configuration space of the device.
    +This virtio structure capability uses little-endian format; all bits are
    +read-only:
    +
    +struct virtio_pci_cap {
    + __u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
    + __u8 cap_next; /* Generic PCI field: next ptr. */
    + __u8 cap_len; /* Generic PCI field: capability length */
    + __u8 cfg_type; /* Identifies the structure. */
    + __u8 bar; /* Where to find it. */
    + __u8 padding[3];/* Pad to full dword. */
    + __le32 offset; /* Offset within bar. */
    + __le32 length; /* Length of the structure, in bytes. */
    +};
    +
    +This structure can optionally followed by extra data, depending on
    +other fields, as documented below.
    +
    +The fields are interpreted as follows:
    +
    +cap_vndr
    + 0x09; Identifies a vendor-specific capability.
    +
    +cap_next
    + Link to next capability in the capability list in the configuration space.
    +
    +cap_len
    + Length of the capability structure, including the whole of
    + struct virtio_pci_cap, and extra data if any.
    + This length might include padding, or fields unused by the driver.
    +
    +cfg_type
    + identifies the structure, according to the following table.
    +
    + /* Common configuration */
    + #define VIRTIO_PCI_CAP_COMMON_CFG 1
    + /* Notifications */
    + #define VIRTIO_PCI_CAP_NOTIFY_CFG 2
    + /* ISR Status */
    + #define VIRTIO_PCI_CAP_ISR_CFG 3
    + /* Device specific configuration */
    + #define VIRTIO_PCI_CAP_DEVICE_CFG 4
    +
    + Any other value - reserved for future use. Drivers must
    + ignore any vendor-specific capability structure which has
    + a reserved cfg_type value.
    +
    + More than one capability can identify the same structure - this makes it
    + possible for the device to expose multiple interfaces to drivers. The order of
    + the capabilities in the capability list specifies the order of preference
    + suggested by the device; drivers should use the first interface that they can
    + support. For example, on some hypervisors, notifications using IO accesses are
    + faster than memory accesses. In this case, hypervisor can expose two
    + capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG:
    + the first one addressing an I/O BAR, the second one addressing a memory BAR.
    + Driver will use the I/O BAR if I/O resources are available, and fall back on
    + memory BAR when I/O resources are unavailable.
    +
    +bar
    + values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
    + the function located beginning at 10h in Configuration Space
    + and used to map the structure into Memory or I/O Space.
    + The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
    + or I/O Space.
    +
    + Any other value - reserved for future use. Drivers must
    + ignore any vendor-specific capability structure which has
    + a reserved bar value.
    +
    +offset
    + indicates where the structure begins relative to the base address associated
    + with the BAR.
    +
    +length
    + indicates the length of the structure.
    + This size might include padding, or fields unused by the driver.
    + Drivers are also recommended to only map part of configuration structure
    + large enough for device operation.
    + For example, a future device might present a large structure size of several
    + MBytes.
    + As current devices never utilize structures larger than 4KBytes in size,
    + driver can limit the mapped structure size to e.g.
    + 4KBytes to allow forward compatibility with such devices without loss of
    + functionality and without wasting resources.
    +
    +
    +If cfg_type is VIRTIO_PCI_CAP_NOTIFY_CFG this structure is immediately followed
    +by additional fields:
    +
    +struct virtio_pci_notify_cap {
    + struct virtio_pci_cap cap;
    + __le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
    +};
    +
    +notify_off_multiplier
    +
    + Virtqueue offset multiplier, in bytes. Must be even and either a power of two, or 0.
    + Value 0x1 is reserved.
    + For a given virtqueue, the address to use for notifications is calculated as follows:
    +
    + queue_notify_off * notify_off_multiplier + offset
    +
    + If notify_off_multiplier is 0, all virtqueues use the same address in
    + the Notifications structure!
    +
    +
    +100.100.1.3.1.1. Legacy Interface: A Note on Device Layout Detection
    +-------------------------------
    +
    +Legacy drivers skipped Device Layout Detection step, assuming legacy
    +configuration space in BAR0 in I/O space unconditionally.
    +
    +Legacy devices did not have the Virtio PCI Capability in their
    +capability list.
    +
    +Therefore:
    +
    +Transitional devices should expose the Legacy Interface in I/O
    +space in BAR0.
    +
    +Transitional drivers should look for the Virtio PCI
    +Capabilities on the capability list.
    +If there are not present, driver should assume a legacy device.
    +
    2.3.1.3.1.1. Queue Vector Configuration
    --------------------------------------

    When MSI-X capability is present and enabled in the device
    -(through standard PCI configuration space) 4 bytes at byte offset
    -20 are used to map configuration change and queue interrupts to
    -MSI-X vectors. In this case, the ISR Status field is unused, and
    -device specific configuration starts at byte offset 24 in virtio
    -header structure. When MSI-X capability is not enabled, device
    -specific configuration starts at byte offset 20 in virtio header.
    +(through standard PCI configuration space) Configuration/Queue
    +MSI-X Vector registers are used to map configuration change and queue
    +interrupts to MSI-X vectors. In this case, the ISR Status is unused.

    Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of
    Configuration/Queue Vector registers, maps interrupts triggered
    @@ -801,23 +1076,30 @@ This is done as follows, for each virtqueue a device has:
    always a power of 2. This controls how big the virtqueue is
    (see "2.1.4. Virtqueues"). If this field is 0, the virtqueue does not exist.

    -3. Allocate and zero virtqueue in contiguous physical memory, on
    - a 4096 byte alignment. Write the physical address, divided by
    - 4096 to the Queue Address field.[6]
    +3. Optionally, select a smaller virtqueue size and write it in the Queue Size
    + field.
    +
    +4. Allocate and zero Descriptor Table, Available and Used rings for the
    + virtqueue in contiguous physical memory.

    -4. Optionally, if MSI-X capability is present and enabled on the
    +5. Optionally, if MSI-X capability is present and enabled on the
    device, select a vector to use to request interrupts triggered
    by virtqueue events. Write the MSI-X Table entry number
    corresponding to this vector in Queue Vector field. Read the
    Queue Vector field: on success, previously written value is
    returned; on failure, NO_VECTOR value is returned.

    +100.100.1.3.1.4.1. Legacy Interface: A Note on Virtqueue Configuration
    +-----------------------------------
    +When using the legacy interface, the page size for a virtqueue on a PCI virtio
    +device is defined as 4096 bytes. Driver writes the physical address, divided
    +by 4096 to the Queue Address field [6].
    +
    2.3.1.3.2. Notifying The Device
    ------------------------------

    Device notification occurs by writing the 16-bit virtqueue index
    -of this virtqueue to the Queue Notify field of the virtio header
    -in the first I/O region of the PCI device.
    +of this virtqueue to the Queue Notify field.

    2.3.1.3.3. Virtqueue Interrupts From The Device
    ----------------------------------------------
    @@ -2867,7 +3149,10 @@ the non-PCI implementations (currently lguest and S/390).
    This is only allowed if the driver does not use any features
    which would alter this early use of the device.

    -[5] ie. once you enable MSI-X on the device, the other fields move.
    +[5] When MSI-X capability is enabled, device specific configuration starts at
    +byte offset 24 in virtio header structure. When MSI-X capability is not
    +enabled, device specific configuration starts at byte offset 20 in virtio
    +header. ie. once you enable MSI-X on the device, the other fields move.
    If you turn it off again, they move back!

    [6] The 4096 is based on the x86 page size, but it's also large
    --
    MST



  • 2.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-19-2013 02:23
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > - split data path, common config and device specific config
    > - support for new VQ layout
    >
    > VIRTIO-21
    >
    > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    >
    > ---
    >
    > changes from v2:
    > pci: add support for queue size negotiation
    > Make it possible for guest to use a smaller queue size than the
    > maximum with virtio over PCI (MMIO already has this ability).
    >
    > clarify that device status 0 resets the device
    >
    > changes from v1:
    > minimal patchset,
    > stripped all controversial changes away:
    > endian-ness, framing, revision id, config based access.
    >
    > made some minor clarifications
    > ---
    > virtio-v1.0-wd01-part1-specification.txt | 331 ++++++++++++++++++++++++++++---
    > 1 file changed, 308 insertions(+), 23 deletions(-)
    >
    > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    > index b4298bb..e6e7eee 100644
    > --- a/virtio-v1.0-wd01-part1-specification.txt
    > +++ b/virtio-v1.0-wd01-part1-specification.txt
    > @@ -684,9 +684,145 @@ for informational purposes by the guest).
    > 2.3.1.2. PCI Device Layout
    > -------------------------
    >
    > -To configure the device, we use the first I/O region of the PCI
    > -device. This contains a virtio header followed by a
    > -device-specific region.
    > +To configure the device,
    > +use I/O and/or memory regions and/or PCI configuration space of the PCI device.
    > +These contain the virtio header registers, the notification register, the
    > +ISR status register and device specific registers, as specified by Virtio
    > ++ Structure PCI Capabilities
    > +
    > +There may be different widths of accesses to the I/O region; the
    > +“natural” access method for each field must be
    > +used (i.e. 32-bit accesses for 32-bit fields, etc).
    > +
    > +PCI Device Configuration Layout includes the common configuration,
    > +ISR, notification and device specific configuration
    > +structures.
    > +
    > +Unless explicitly specified otherwise, all multi-byte fields are little-endian.
    > +
    > +2.3.1.2.1. Common configuration structure layout
    > +-------------------------
    > +Common configuration structure layout is documented below:
    > +
    > +struct virtio_pci_common_cfg {
    > + /* About the whole device. */
    > + __le32 device_feature_select; /* read-write */
    > + __le32 device_feature; /* read-only */
    > + __le32 guest_feature_select; /* read-write */
    > + __le32 guest_feature; /* read-write */
    > + __le16 msix_config; /* read-write */
    > + __le16 num_queues; /* read-only */
    > + __u8 device_status; /* read-write */
    > + __u8 unused1;
    > +
    > + /* About a specific virtqueue. */
    > + __le16 queue_select; /* read-write */
    > + __le16 queue_size; /* read-write, power of 2, or 0. */
    > + __le16 queue_msix_vector; /* read-write */
    > + __le16 queue_enable; /* read-write */
    > + __le16 queue_notify_off; /* read-only */
    > + __le64 queue_desc; /* read-write */
    > + __le64 queue_avail; /* read-write */
    > + __le64 queue_used; /* read-write */
    > +};
    > +
    > +device_feature_select
    > +
    > + Selects which Feature Bits does device_feature field refer to.
    > + Value 0x0 selects Feature Bits 0 to 31
    > + Value 0x1 selects Feature Bits 32 to 63
    > + All other values cause reads from device_feature to return 0.
    > +
    > +device_feature
    > +
    > + Used by Device to report Feature Bits to Driver.
    > + Device Feature Bits selected by device_feature_select.
    > +
    > +guest_feature_select
    > +
    > + Selects which Feature Bits does guest_feature field refer to.
    > + Value 0x0 selects Feature Bits 0 to 31
    > + Value 0x1 selects Feature Bits 32 to 63
    > + All other values cause writes to guest_feature to be ignored,
    > + and reads to return 0.

    This isn't quite right. Zero writes to guest_feature should be ignored,
    non-zero writes (ie. acking a feature which wasn't offered) is undefined
    anywhere it occurs.

    We should also define what "locks in" feature negotiation: that was
    simple for single 32-bit field. Should we define this in a
    transport-independent way, or leave it to the transports?

    We could overload the DRIVER status bit (which Linux currently calls too
    early, though moving it would be harmless), or add a new one.

    > +guest_feature
    > +
    > + Used by Driver to acknowledge Feature Bits to Device.
    > + Guest Feature Bits selected by guest_feature_select.
    > +
    > +msix_config
    > +
    > + Configuration Vector for MSI-X.
    > +
    > +num_queues
    > +
    > + Specifies the maximum number of virtqueues supported by device.
    > +
    > +device_status
    > +
    > + Device Status field. Writing 0 into this field resets the
    > + device.
    > +
    > +queue_select
    > +
    > + Queue Select. Selects which virtqueue do other fields refer to.
    > +
    > +queue_size
    > +
    > + Queue Size. On reset, specifies the maximum queue size supported by
    > + the hypervisor. This can be modified by driver to reduce memory requirements.
    > + Set to 0 if this virtqueue is unused.
    > +
    > +queue_msix_vector
    > +
    > + Queue Vector for MSI-X.
    > +
    > +queue_enable
    > +
    > + Used to selectively prevent host from executing requests from this virtqueue.
    > + 1 - enabled; 0 - disabled
    > +
    > +queue_notify_off
    > +
    > + Used to calculate the offset from start of Notification structure at
    > + which this virtqueue is located.
    > + Note: this is *not* an offset in bytes. See notify_off_multiplier below.
    > +
    > +queue_desc
    > +
    > + Physical address of Descriptor Table.
    > +
    > +queue_avail
    > +
    > + Physical address of Available Ring.
    > +
    > +queue_used
    > +
    > + Physical address of Used Ring.
    > +
    > +2.3.1.2.2. ISR status structure layout
    > +-------------------------
    > +ISR status structure includes a single 8-bite ISR status field
    > +
    > +2.3.1.2.3. Notification structure layout
    > +-------------------------
    > +Notification structure is always a multiple of 2 bytes in size.
    > +It includes 2-byte Queue Notify fields for each virtqueue of
    > +the device. Note that multiple virtqueues can use the same
    > +Queue Notify field, if necessary.
    > +
    > +2.3.1.2.4. Device specific structure
    > +-------------------------
    > +
    > +Device specific structure is optional.
    > +
    > +2.3.1.2.5. Legacy Interfaces: A Note on PCI Device Layout
    > +-------------------------
    > +
    > +Transitional devices should present part of configuration
    > +registers in a legacy configuration structure in BAR0 in the first I/O
    > +region of the PCI device, as documented below.
    >
    > There may be different widths of accesses to the I/O region; the
    > “natural” access method for each field in the virtio header must be
    > @@ -699,10 +835,7 @@ Note that this is possible because while the virtio header is PCI
    > the native endian of the guest (where such distinction is
    > applicable).
    >
    > -2.3.1.2.1. PCI Device Virtio Header
    > -----------------------------------
    > -
    > -The virtio header looks as follows:
    > +When used through the legacy interface, the virtio header looks as follows:
    >
    > +------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
    > | Bits || 32 | 32 | 32 | 16 | 16 | 16 | 8 | 8 |
    > @@ -741,25 +874,167 @@ device-specific headers:
    > | || |
    > +------------++--------------------+
    >
    > +Note that only Feature Bits 0 to 31 are accessible through the
    > +Legacy Interface. When used through the Legacy Interface,
    > +Transitional Devices must assume that Feature Bits 32 to 63
    > +are not acknowledged by Driver.
    > +
    > 2.3.1.3. PCI-specific Initialization And Device Operation
    > --------------------------------------------------------
    >
    > -The page size for a virtqueue on a PCI virtio device is defined as
    > -4096 bytes.
    > -
    > 2.3.1.3.1. Device Initialization
    > -------------------------------
    >
    > +This documents PCI-specific steps executed during Device Initialization.
    > +As the first step, driver must detect device configuration layout
    > +to locate configuration fields in memory,I/O or configuration space of the
    > +device.
    > +
    > +100.100.1.3.1.1. Virtio Device Configuration Layout Detection
    > +-------------------------------
    > +
    > +As a prerequisite to device initialization, driver executes a
    > +PCI capability list scan, detecting virtio configuration layout using Virtio
    > +Structure PCI capabilities.
    > +
    > +Virtio Device Configuration Layout includes virtio configuration header, Notification
    > +and ISR Status and device configuration structures.
    > +Each structure can be mapped by a Base Address register (BAR) belonging to
    > +the function, located beginning at 10h in Configuration Space,
    > +or accessed though PCI configuration space.
    > +
    > +Actual location of each structure is specified using vendor-specific PCI capability located
    > +on capability list in PCI configuration space of the device.
    > +This virtio structure capability uses little-endian format; all bits are
    > +read-only:
    > +
    > +struct virtio_pci_cap {
    > + __u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
    > + __u8 cap_next; /* Generic PCI field: next ptr. */
    > + __u8 cap_len; /* Generic PCI field: capability length */
    > + __u8 cfg_type; /* Identifies the structure. */
    > + __u8 bar; /* Where to find it. */
    > + __u8 padding[3];/* Pad to full dword. */
    > + __le32 offset; /* Offset within bar. */
    > + __le32 length; /* Length of the structure, in bytes. */
    > +};
    > +
    > +This structure can optionally followed by extra data, depending on
    > +other fields, as documented below.
    > +
    > +The fields are interpreted as follows:
    > +
    > +cap_vndr
    > + 0x09; Identifies a vendor-specific capability.
    > +
    > +cap_next
    > + Link to next capability in the capability list in the configuration space.
    > +
    > +cap_len
    > + Length of the capability structure, including the whole of
    > + struct virtio_pci_cap, and extra data if any.
    > + This length might include padding, or fields unused by the driver.
    > +
    > +cfg_type
    > + identifies the structure, according to the following table.
    > +
    > + /* Common configuration */
    > + #define VIRTIO_PCI_CAP_COMMON_CFG 1
    > + /* Notifications */
    > + #define VIRTIO_PCI_CAP_NOTIFY_CFG 2
    > + /* ISR Status */
    > + #define VIRTIO_PCI_CAP_ISR_CFG 3
    > + /* Device specific configuration */
    > + #define VIRTIO_PCI_CAP_DEVICE_CFG 4
    > +
    > + Any other value - reserved for future use. Drivers must
    > + ignore any vendor-specific capability structure which has
    > + a reserved cfg_type value.
    > +
    > + More than one capability can identify the same structure - this makes it
    > + possible for the device to expose multiple interfaces to drivers. The order of
    > + the capabilities in the capability list specifies the order of preference
    > + suggested by the device; drivers should use the first interface that they can
    > + support. For example, on some hypervisors, notifications using IO accesses are
    > + faster than memory accesses. In this case, hypervisor can expose two
    > + capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG:
    > + the first one addressing an I/O BAR, the second one addressing a memory BAR.
    > + Driver will use the I/O BAR if I/O resources are available, and fall back on
    > + memory BAR when I/O resources are unavailable.
    > +
    > +bar
    > + values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
    > + the function located beginning at 10h in Configuration Space
    > + and used to map the structure into Memory or I/O Space.
    > + The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
    > + or I/O Space.
    > +
    > + Any other value - reserved for future use. Drivers must
    > + ignore any vendor-specific capability structure which has
    > + a reserved bar value.
    > +
    > +offset
    > + indicates where the structure begins relative to the base address associated
    > + with the BAR.
    > +
    > +length
    > + indicates the length of the structure.
    > + This size might include padding, or fields unused by the driver.
    > + Drivers are also recommended to only map part of configuration structure
    > + large enough for device operation.
    > + For example, a future device might present a large structure size of several
    > + MBytes.
    > + As current devices never utilize structures larger than 4KBytes in size,
    > + driver can limit the mapped structure size to e.g.
    > + 4KBytes to allow forward compatibility with such devices without loss of
    > + functionality and without wasting resources.
    > +
    > +
    > +If cfg_type is VIRTIO_PCI_CAP_NOTIFY_CFG this structure is immediately followed
    > +by additional fields:
    > +
    > +struct virtio_pci_notify_cap {
    > + struct virtio_pci_cap cap;
    > + __le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
    > +};
    > +
    > +notify_off_multiplier
    > +
    > + Virtqueue offset multiplier, in bytes. Must be even and either a power of two, or 0.
    > + Value 0x1 is reserved.
    > + For a given virtqueue, the address to use for notifications is calculated as follows:
    > +
    > + queue_notify_off * notify_off_multiplier + offset
    > +
    > + If notify_off_multiplier is 0, all virtqueues use the same address in
    > + the Notifications structure!
    > +
    > +
    > +100.100.1.3.1.1. Legacy Interface: A Note on Device Layout Detection
    > +-------------------------------
    > +
    > +Legacy drivers skipped Device Layout Detection step, assuming legacy
    > +configuration space in BAR0 in I/O space unconditionally.
    > +
    > +Legacy devices did not have the Virtio PCI Capability in their
    > +capability list.
    > +
    > +Therefore:
    > +
    > +Transitional devices should expose the Legacy Interface in I/O
    > +space in BAR0.
    > +
    > +Transitional drivers should look for the Virtio PCI
    > +Capabilities on the capability list.
    > +If there are not present, driver should assume a legacy device.
    > +
    > 2.3.1.3.1.1. Queue Vector Configuration
    > --------------------------------------
    >
    > When MSI-X capability is present and enabled in the device
    > -(through standard PCI configuration space) 4 bytes at byte offset
    > -20 are used to map configuration change and queue interrupts to
    > -MSI-X vectors. In this case, the ISR Status field is unused, and
    > -device specific configuration starts at byte offset 24 in virtio
    > -header structure. When MSI-X capability is not enabled, device
    > -specific configuration starts at byte offset 20 in virtio header.
    > +(through standard PCI configuration space) Configuration/Queue
    > +MSI-X Vector registers are used to map configuration change and queue
    > +interrupts to MSI-X vectors. In this case, the ISR Status is unused.
    >
    > Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of
    > Configuration/Queue Vector registers, maps interrupts triggered
    > @@ -801,23 +1076,30 @@ This is done as follows, for each virtqueue a device has:
    > always a power of 2. This controls how big the virtqueue is
    > (see "2.1.4. Virtqueues"). If this field is 0, the virtqueue does not exist.
    >
    > -3. Allocate and zero virtqueue in contiguous physical memory, on
    > - a 4096 byte alignment. Write the physical address, divided by
    > - 4096 to the Queue Address field.[6]
    > +3. Optionally, select a smaller virtqueue size and write it in the Queue Size
    > + field.
    > +
    > +4. Allocate and zero Descriptor Table, Available and Used rings for the
    > + virtqueue in contiguous physical memory.
    >
    > -4. Optionally, if MSI-X capability is present and enabled on the
    > +5. Optionally, if MSI-X capability is present and enabled on the
    > device, select a vector to use to request interrupts triggered
    > by virtqueue events. Write the MSI-X Table entry number
    > corresponding to this vector in Queue Vector field. Read the
    > Queue Vector field: on success, previously written value is
    > returned; on failure, NO_VECTOR value is returned.
    >
    > +100.100.1.3.1.4.1. Legacy Interface: A Note on Virtqueue Configuration
    > +-----------------------------------
    > +When using the legacy interface, the page size for a virtqueue on a PCI virtio
    > +device is defined as 4096 bytes. Driver writes the physical address, divided
    > +by 4096 to the Queue Address field [6].
    > +
    > 2.3.1.3.2. Notifying The Device
    > ------------------------------
    >
    > Device notification occurs by writing the 16-bit virtqueue index
    > -of this virtqueue to the Queue Notify field of the virtio header
    > -in the first I/O region of the PCI device.
    > +of this virtqueue to the Queue Notify field.
    >
    > 2.3.1.3.3. Virtqueue Interrupts From The Device
    > ----------------------------------------------
    > @@ -2867,7 +3149,10 @@ the non-PCI implementations (currently lguest and S/390).
    > This is only allowed if the driver does not use any features
    > which would alter this early use of the device.
    >
    > -[5] ie. once you enable MSI-X on the device, the other fields move.
    > +[5] When MSI-X capability is enabled, device specific configuration starts at
    > +byte offset 24 in virtio header structure. When MSI-X capability is not
    > +enabled, device specific configuration starts at byte offset 20 in virtio
    > +header. ie. once you enable MSI-X on the device, the other fields move.
    > If you turn it off again, they move back!

    Minor nit: my general approach has been to move the contents of legacy
    footnotes into the appropriate section itself.

    This all looks pretty sane.

    Cheers,
    Rusty.




  • 3.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-19-2013 03:36
    "Michael S. Tsirkin" <mst@redhat.com> writes: > - split data path, common config and device specific config > - support for new VQ layout > > VIRTIO-21 > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > --- > > changes from v2: > pci: add support for queue size negotiation > Make it possible for guest to use a smaller queue size than the > maximum with virtio over PCI (MMIO already has this ability). > > clarify that device status 0 resets the device > > changes from v1: > minimal patchset, > stripped all controversial changes away: > endian-ness, framing, revision id, config based access. > > made some minor clarifications > --- > virtio-v1.0-wd01-part1-specification.txt 331 ++++++++++++++++++++++++++++--- > 1 file changed, 308 insertions(+), 23 deletions(-) > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt > index b4298bb..e6e7eee 100644 > --- a/virtio-v1.0-wd01-part1-specification.txt > +++ b/virtio-v1.0-wd01-part1-specification.txt > @@ -684,9 +684,145 @@ for informational purposes by the guest). > 2.3.1.2. PCI Device Layout > ------------------------- > > -To configure the device, we use the first I/O region of the PCI > -device. This contains a virtio header followed by a > -device-specific region. > +To configure the device, > +use I/O and/or memory regions and/or PCI configuration space of the PCI device. > +These contain the virtio header registers, the notification register, the > +ISR status register and device specific registers, as specified by Virtio > ++ Structure PCI Capabilities > + > +There may be different widths of accesses to the I/O region; the > +“natural” access method for each field must be > +used (i.e. 32-bit accesses for 32-bit fields, etc). > + > +PCI Device Configuration Layout includes the common configuration, > +ISR, notification and device specific configuration > +structures. > + > +Unless explicitly specified otherwise, all multi-byte fields are little-endian. > + > +2.3.1.2.1. Common configuration structure layout > +------------------------- > +Common configuration structure layout is documented below: > + > +struct virtio_pci_common_cfg { > + /* About the whole device. */ > + __le32 device_feature_select; /* read-write */ > + __le32 device_feature; /* read-only */ > + __le32 guest_feature_select; /* read-write */ > + __le32 guest_feature; /* read-write */ > + __le16 msix_config; /* read-write */ > + __le16 num_queues; /* read-only */ > + __u8 device_status; /* read-write */ > + __u8 unused1; > + > + /* About a specific virtqueue. */ > + __le16 queue_select; /* read-write */ > + __le16 queue_size; /* read-write, power of 2, or 0. */ > + __le16 queue_msix_vector; /* read-write */ > + __le16 queue_enable; /* read-write */ > + __le16 queue_notify_off; /* read-only */ > + __le64 queue_desc; /* read-write */ > + __le64 queue_avail; /* read-write */ > + __le64 queue_used; /* read-write */ > +}; > + > +device_feature_select > + > + Selects which Feature Bits does device_feature field refer to. > + Value 0x0 selects Feature Bits 0 to 31 > + Value 0x1 selects Feature Bits 32 to 63 > + All other values cause reads from device_feature to return 0. > + > +device_feature > + > + Used by Device to report Feature Bits to Driver. > + Device Feature Bits selected by device_feature_select. > + > +guest_feature_select > + > + Selects which Feature Bits does guest_feature field refer to. > + Value 0x0 selects Feature Bits 0 to 31 > + Value 0x1 selects Feature Bits 32 to 63 > + All other values cause writes to guest_feature to be ignored, > + and reads to return 0. This isn't quite right. Zero writes to guest_feature should be ignored, non-zero writes (ie. acking a feature which wasn't offered) is undefined anywhere it occurs. We should also define what "locks in" feature negotiation: that was simple for single 32-bit field. Should we define this in a transport-independent way, or leave it to the transports? We could overload the DRIVER status bit (which Linux currently calls too early, though moving it would be harmless), or add a new one. > +guest_feature > + > + Used by Driver to acknowledge Feature Bits to Device. > + Guest Feature Bits selected by guest_feature_select. > + > +msix_config > + > + Configuration Vector for MSI-X. > + > +num_queues > + > + Specifies the maximum number of virtqueues supported by device. > + > +device_status > + > + Device Status field. Writing 0 into this field resets the > + device. > + > +queue_select > + > + Queue Select. Selects which virtqueue do other fields refer to. > + > +queue_size > + > + Queue Size. On reset, specifies the maximum queue size supported by > + the hypervisor. This can be modified by driver to reduce memory requirements. > + Set to 0 if this virtqueue is unused. > + > +queue_msix_vector > + > + Queue Vector for MSI-X. > + > +queue_enable > + > + Used to selectively prevent host from executing requests from this virtqueue. > + 1 - enabled; 0 - disabled > + > +queue_notify_off > + > + Used to calculate the offset from start of Notification structure at > + which this virtqueue is located. > + Note: this is *not* an offset in bytes. See notify_off_multiplier below. > + > +queue_desc > + > + Physical address of Descriptor Table. > + > +queue_avail > + > + Physical address of Available Ring. > + > +queue_used > + > + Physical address of Used Ring. > + > +2.3.1.2.2. ISR status structure layout > +------------------------- > +ISR status structure includes a single 8-bite ISR status field > + > +2.3.1.2.3. Notification structure layout > +------------------------- > +Notification structure is always a multiple of 2 bytes in size. > +It includes 2-byte Queue Notify fields for each virtqueue of > +the device. Note that multiple virtqueues can use the same > +Queue Notify field, if necessary. > + > +2.3.1.2.4. Device specific structure > +------------------------- > + > +Device specific structure is optional. > + > +2.3.1.2.5. Legacy Interfaces: A Note on PCI Device Layout > +------------------------- > + > +Transitional devices should present part of configuration > +registers in a legacy configuration structure in BAR0 in the first I/O > +region of the PCI device, as documented below. > > There may be different widths of accesses to the I/O region; the > “natural” access method for each field in the virtio header must be > @@ -699,10 +835,7 @@ Note that this is possible because while the virtio header is PCI > the native endian of the guest (where such distinction is > applicable). > > -2.3.1.2.1. PCI Device Virtio Header > ----------------------------------- > - > -The virtio header looks as follows: > +When used through the legacy interface, the virtio header looks as follows: > > +------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+ > Bits 32 32 32 16 16 16 8 8 > @@ -741,25 +874,167 @@ device-specific headers: > > +------------++--------------------+ > > +Note that only Feature Bits 0 to 31 are accessible through the > +Legacy Interface. When used through the Legacy Interface, > +Transitional Devices must assume that Feature Bits 32 to 63 > +are not acknowledged by Driver. > + > 2.3.1.3. PCI-specific Initialization And Device Operation > -------------------------------------------------------- > > -The page size for a virtqueue on a PCI virtio device is defined as > -4096 bytes. > - > 2.3.1.3.1. Device Initialization > ------------------------------- > > +This documents PCI-specific steps executed during Device Initialization. > +As the first step, driver must detect device configuration layout > +to locate configuration fields in memory,I/O or configuration space of the > +device. > + > +100.100.1.3.1.1. Virtio Device Configuration Layout Detection > +------------------------------- > + > +As a prerequisite to device initialization, driver executes a > +PCI capability list scan, detecting virtio configuration layout using Virtio > +Structure PCI capabilities. > + > +Virtio Device Configuration Layout includes virtio configuration header, Notification > +and ISR Status and device configuration structures. > +Each structure can be mapped by a Base Address register (BAR) belonging to > +the function, located beginning at 10h in Configuration Space, > +or accessed though PCI configuration space. > + > +Actual location of each structure is specified using vendor-specific PCI capability located > +on capability list in PCI configuration space of the device. > +This virtio structure capability uses little-endian format; all bits are > +read-only: > + > +struct virtio_pci_cap { > + __u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */ > + __u8 cap_next; /* Generic PCI field: next ptr. */ > + __u8 cap_len; /* Generic PCI field: capability length */ > + __u8 cfg_type; /* Identifies the structure. */ > + __u8 bar; /* Where to find it. */ > + __u8 padding[3];/* Pad to full dword. */ > + __le32 offset; /* Offset within bar. */ > + __le32 length; /* Length of the structure, in bytes. */ > +}; > + > +This structure can optionally followed by extra data, depending on > +other fields, as documented below. > + > +The fields are interpreted as follows: > + > +cap_vndr > + 0x09; Identifies a vendor-specific capability. > + > +cap_next > + Link to next capability in the capability list in the configuration space. > + > +cap_len > + Length of the capability structure, including the whole of > + struct virtio_pci_cap, and extra data if any. > + This length might include padding, or fields unused by the driver. > + > +cfg_type > + identifies the structure, according to the following table. > + > + /* Common configuration */ > + #define VIRTIO_PCI_CAP_COMMON_CFG 1 > + /* Notifications */ > + #define VIRTIO_PCI_CAP_NOTIFY_CFG 2 > + /* ISR Status */ > + #define VIRTIO_PCI_CAP_ISR_CFG 3 > + /* Device specific configuration */ > + #define VIRTIO_PCI_CAP_DEVICE_CFG 4 > + > + Any other value - reserved for future use. Drivers must > + ignore any vendor-specific capability structure which has > + a reserved cfg_type value. > + > + More than one capability can identify the same structure - this makes it > + possible for the device to expose multiple interfaces to drivers. The order of > + the capabilities in the capability list specifies the order of preference > + suggested by the device; drivers should use the first interface that they can > + support. For example, on some hypervisors, notifications using IO accesses are > + faster than memory accesses. In this case, hypervisor can expose two > + capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG: > + the first one addressing an I/O BAR, the second one addressing a memory BAR. > + Driver will use the I/O BAR if I/O resources are available, and fall back on > + memory BAR when I/O resources are unavailable. > + > +bar > + values 0x0 to 0x5 specify a Base Address register (BAR) belonging to > + the function located beginning at 10h in Configuration Space > + and used to map the structure into Memory or I/O Space. > + The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space > + or I/O Space. > + > + Any other value - reserved for future use. Drivers must > + ignore any vendor-specific capability structure which has > + a reserved bar value. > + > +offset > + indicates where the structure begins relative to the base address associated > + with the BAR. > + > +length > + indicates the length of the structure. > + This size might include padding, or fields unused by the driver. > + Drivers are also recommended to only map part of configuration structure > + large enough for device operation. > + For example, a future device might present a large structure size of several > + MBytes. > + As current devices never utilize structures larger than 4KBytes in size, > + driver can limit the mapped structure size to e.g. > + 4KBytes to allow forward compatibility with such devices without loss of > + functionality and without wasting resources. > + > + > +If cfg_type is VIRTIO_PCI_CAP_NOTIFY_CFG this structure is immediately followed > +by additional fields: > + > +struct virtio_pci_notify_cap { > + struct virtio_pci_cap cap; > + __le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */ > +}; > + > +notify_off_multiplier > + > + Virtqueue offset multiplier, in bytes. Must be even and either a power of two, or 0. > + Value 0x1 is reserved. > + For a given virtqueue, the address to use for notifications is calculated as follows: > + > + queue_notify_off * notify_off_multiplier + offset > + > + If notify_off_multiplier is 0, all virtqueues use the same address in > + the Notifications structure! > + > + > +100.100.1.3.1.1. Legacy Interface: A Note on Device Layout Detection > +------------------------------- > + > +Legacy drivers skipped Device Layout Detection step, assuming legacy > +configuration space in BAR0 in I/O space unconditionally. > + > +Legacy devices did not have the Virtio PCI Capability in their > +capability list. > + > +Therefore: > + > +Transitional devices should expose the Legacy Interface in I/O > +space in BAR0. > + > +Transitional drivers should look for the Virtio PCI > +Capabilities on the capability list. > +If there are not present, driver should assume a legacy device. > + > 2.3.1.3.1.1. Queue Vector Configuration > -------------------------------------- > > When MSI-X capability is present and enabled in the device > -(through standard PCI configuration space) 4 bytes at byte offset > -20 are used to map configuration change and queue interrupts to > -MSI-X vectors. In this case, the ISR Status field is unused, and > -device specific configuration starts at byte offset 24 in virtio > -header structure. When MSI-X capability is not enabled, device > -specific configuration starts at byte offset 20 in virtio header. > +(through standard PCI configuration space) Configuration/Queue > +MSI-X Vector registers are used to map configuration change and queue > +interrupts to MSI-X vectors. In this case, the ISR Status is unused. > > Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of > Configuration/Queue Vector registers, maps interrupts triggered > @@ -801,23 +1076,30 @@ This is done as follows, for each virtqueue a device has: > always a power of 2. This controls how big the virtqueue is > (see "2.1.4. Virtqueues"). If this field is 0, the virtqueue does not exist. > > -3. Allocate and zero virtqueue in contiguous physical memory, on > - a 4096 byte alignment. Write the physical address, divided by > - 4096 to the Queue Address field.[6] > +3. Optionally, select a smaller virtqueue size and write it in the Queue Size > + field. > + > +4. Allocate and zero Descriptor Table, Available and Used rings for the > + virtqueue in contiguous physical memory. > > -4. Optionally, if MSI-X capability is present and enabled on the > +5. Optionally, if MSI-X capability is present and enabled on the > device, select a vector to use to request interrupts triggered > by virtqueue events. Write the MSI-X Table entry number > corresponding to this vector in Queue Vector field. Read the > Queue Vector field: on success, previously written value is > returned; on failure, NO_VECTOR value is returned. > > +100.100.1.3.1.4.1. Legacy Interface: A Note on Virtqueue Configuration > +----------------------------------- > +When using the legacy interface, the page size for a virtqueue on a PCI virtio > +device is defined as 4096 bytes. Driver writes the physical address, divided > +by 4096 to the Queue Address field [6]. > + > 2.3.1.3.2. Notifying The Device > ------------------------------ > > Device notification occurs by writing the 16-bit virtqueue index > -of this virtqueue to the Queue Notify field of the virtio header > -in the first I/O region of the PCI device. > +of this virtqueue to the Queue Notify field. > > 2.3.1.3.3. Virtqueue Interrupts From The Device > ---------------------------------------------- > @@ -2867,7 +3149,10 @@ the non-PCI implementations (currently lguest and S/390). > This is only allowed if the driver does not use any features > which would alter this early use of the device. > > -[5] ie. once you enable MSI-X on the device, the other fields move. > +[5] When MSI-X capability is enabled, device specific configuration starts at > +byte offset 24 in virtio header structure. When MSI-X capability is not > +enabled, device specific configuration starts at byte offset 20 in virtio > +header. ie. once you enable MSI-X on the device, the other fields move. > If you turn it off again, they move back! Minor nit: my general approach has been to move the contents of legacy footnotes into the appropriate section itself. This all looks pretty sane. Cheers, Rusty.


  • 4.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-21-2013 20:43
    On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > - split data path, common config and device specific config > > - support for new VQ layout > > > > VIRTIO-21 > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > --- > > > > changes from v2: > > pci: add support for queue size negotiation > > Make it possible for guest to use a smaller queue size than the > > maximum with virtio over PCI (MMIO already has this ability). > > > > clarify that device status 0 resets the device > > > > changes from v1: > > minimal patchset, > > stripped all controversial changes away: > > endian-ness, framing, revision id, config based access. > > > > made some minor clarifications > > --- > > virtio-v1.0-wd01-part1-specification.txt 331 ++++++++++++++++++++++++++++--- > > 1 file changed, 308 insertions(+), 23 deletions(-) > > > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt > > index b4298bb..e6e7eee 100644 > > --- a/virtio-v1.0-wd01-part1-specification.txt > > +++ b/virtio-v1.0-wd01-part1-specification.txt > > @@ -684,9 +684,145 @@ for informational purposes by the guest). > > 2.3.1.2. PCI Device Layout > > ------------------------- > > > > -To configure the device, we use the first I/O region of the PCI > > -device. This contains a virtio header followed by a > > -device-specific region. > > +To configure the device, > > +use I/O and/or memory regions and/or PCI configuration space of the PCI device. > > +These contain the virtio header registers, the notification register, the > > +ISR status register and device specific registers, as specified by Virtio > > ++ Structure PCI Capabilities > > + > > +There may be different widths of accesses to the I/O region; the > > +“natural” access method for each field must be > > +used (i.e. 32-bit accesses for 32-bit fields, etc). > > + > > +PCI Device Configuration Layout includes the common configuration, > > +ISR, notification and device specific configuration > > +structures. > > + > > +Unless explicitly specified otherwise, all multi-byte fields are little-endian. > > + > > +2.3.1.2.1. Common configuration structure layout > > +------------------------- > > +Common configuration structure layout is documented below: > > + > > +struct virtio_pci_common_cfg { > > + /* About the whole device. */ > > + __le32 device_feature_select; /* read-write */ > > + __le32 device_feature; /* read-only */ > > + __le32 guest_feature_select; /* read-write */ > > + __le32 guest_feature; /* read-write */ > > + __le16 msix_config; /* read-write */ > > + __le16 num_queues; /* read-only */ > > + __u8 device_status; /* read-write */ > > + __u8 unused1; > > + > > + /* About a specific virtqueue. */ > > + __le16 queue_select; /* read-write */ > > + __le16 queue_size; /* read-write, power of 2, or 0. */ > > + __le16 queue_msix_vector; /* read-write */ > > + __le16 queue_enable; /* read-write */ > > + __le16 queue_notify_off; /* read-only */ > > + __le64 queue_desc; /* read-write */ > > + __le64 queue_avail; /* read-write */ > > + __le64 queue_used; /* read-write */ > > +}; > > + > > +device_feature_select > > + > > + Selects which Feature Bits does device_feature field refer to. > > + Value 0x0 selects Feature Bits 0 to 31 > > + Value 0x1 selects Feature Bits 32 to 63 > > + All other values cause reads from device_feature to return 0. > > + > > +device_feature > > + > > + Used by Device to report Feature Bits to Driver. > > + Device Feature Bits selected by device_feature_select. > > + > > +guest_feature_select > > + > > + Selects which Feature Bits does guest_feature field refer to. > > + Value 0x0 selects Feature Bits 0 to 31 > > + Value 0x1 selects Feature Bits 32 to 63 > > + All other values cause writes to guest_feature to be ignored, > > + and reads to return 0. > > This isn't quite right. Zero writes to guest_feature should be ignored, > non-zero writes (ie. acking a feature which wasn't offered) is undefined > anywhere it occurs. Yes I meant Zero writes. That's a general rule: drivers must not ack features that devices did not offer. I'll make this clear here. > We should also define what "locks in" feature negotiation: that was > simple for single 32-bit field. Should we define this in a > transport-independent way, or leave it to the transports? > > We could overload the DRIVER status bit (which Linux currently calls too > early, though moving it would be harmless), or add a new one. I'm surprised. I always read 2.2.1. Device Initialization as an explicit requirement that DRIVER_OK locks the features, and that's in a transport-independent section and works for existing guests. And if that's not explicit enough, would the proposed resolution for VIRTIO-30 make it explicit enough? > > +guest_feature > > + > > + Used by Driver to acknowledge Feature Bits to Device. > > + Guest Feature Bits selected by guest_feature_select. > > + > > +msix_config > > + > > + Configuration Vector for MSI-X. > > + > > +num_queues > > + > > + Specifies the maximum number of virtqueues supported by device. > > + > > +device_status > > + > > + Device Status field. Writing 0 into this field resets the > > + device. > > + > > +queue_select > > + > > + Queue Select. Selects which virtqueue do other fields refer to. > > + > > +queue_size > > + > > + Queue Size. On reset, specifies the maximum queue size supported by > > + the hypervisor. This can be modified by driver to reduce memory requirements. > > + Set to 0 if this virtqueue is unused. > > + > > +queue_msix_vector > > + > > + Queue Vector for MSI-X. > > + > > +queue_enable > > + > > + Used to selectively prevent host from executing requests from this virtqueue. > > + 1 - enabled; 0 - disabled > > + > > +queue_notify_off > > + > > + Used to calculate the offset from start of Notification structure at > > + which this virtqueue is located. > > + Note: this is *not* an offset in bytes. See notify_off_multiplier below. > > + > > +queue_desc > > + > > + Physical address of Descriptor Table. > > + > > +queue_avail > > + > > + Physical address of Available Ring. > > + > > +queue_used > > + > > + Physical address of Used Ring. > > + > > +2.3.1.2.2. ISR status structure layout > > +------------------------- > > +ISR status structure includes a single 8-bite ISR status field > > + > > +2.3.1.2.3. Notification structure layout > > +------------------------- > > +Notification structure is always a multiple of 2 bytes in size. > > +It includes 2-byte Queue Notify fields for each virtqueue of > > +the device. Note that multiple virtqueues can use the same > > +Queue Notify field, if necessary. > > + > > +2.3.1.2.4. Device specific structure > > +------------------------- > > + > > +Device specific structure is optional. > > + > > +2.3.1.2.5. Legacy Interfaces: A Note on PCI Device Layout > > +------------------------- > > + > > +Transitional devices should present part of configuration > > +registers in a legacy configuration structure in BAR0 in the first I/O > > +region of the PCI device, as documented below. > > > > There may be different widths of accesses to the I/O region; the > > “natural” access method for each field in the virtio header must be > > @@ -699,10 +835,7 @@ Note that this is possible because while the virtio header is PCI > > the native endian of the guest (where such distinction is > > applicable). > > > > -2.3.1.2.1. PCI Device Virtio Header > > ----------------------------------- > > - > > -The virtio header looks as follows: > > +When used through the legacy interface, the virtio header looks as follows: > > > > +------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+ > > Bits 32 32 32 16 16 16 8 8 > > @@ -741,25 +874,167 @@ device-specific headers: > > > > +------------++--------------------+ > > > > +Note that only Feature Bits 0 to 31 are accessible through the > > +Legacy Interface. When used through the Legacy Interface, > > +Transitional Devices must assume that Feature Bits 32 to 63 > > +are not acknowledged by Driver. > > + > > 2.3.1.3. PCI-specific Initialization And Device Operation > > -------------------------------------------------------- > > > > -The page size for a virtqueue on a PCI virtio device is defined as > > -4096 bytes. > > - > > 2.3.1.3.1. Device Initialization > > ------------------------------- > > > > +This documents PCI-specific steps executed during Device Initialization. > > +As the first step, driver must detect device configuration layout > > +to locate configuration fields in memory,I/O or configuration space of the > > +device. > > + > > +100.100.1.3.1.1. Virtio Device Configuration Layout Detection > > +------------------------------- > > + > > +As a prerequisite to device initialization, driver executes a > > +PCI capability list scan, detecting virtio configuration layout using Virtio > > +Structure PCI capabilities. > > + > > +Virtio Device Configuration Layout includes virtio configuration header, Notification > > +and ISR Status and device configuration structures. > > +Each structure can be mapped by a Base Address register (BAR) belonging to > > +the function, located beginning at 10h in Configuration Space, > > +or accessed though PCI configuration space. > > + > > +Actual location of each structure is specified using vendor-specific PCI capability located > > +on capability list in PCI configuration space of the device. > > +This virtio structure capability uses little-endian format; all bits are > > +read-only: > > + > > +struct virtio_pci_cap { > > + __u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */ > > + __u8 cap_next; /* Generic PCI field: next ptr. */ > > + __u8 cap_len; /* Generic PCI field: capability length */ > > + __u8 cfg_type; /* Identifies the structure. */ > > + __u8 bar; /* Where to find it. */ > > + __u8 padding[3];/* Pad to full dword. */ > > + __le32 offset; /* Offset within bar. */ > > + __le32 length; /* Length of the structure, in bytes. */ > > +}; > > + > > +This structure can optionally followed by extra data, depending on > > +other fields, as documented below. > > + > > +The fields are interpreted as follows: > > + > > +cap_vndr > > + 0x09; Identifies a vendor-specific capability. > > + > > +cap_next > > + Link to next capability in the capability list in the configuration space. > > + > > +cap_len > > + Length of the capability structure, including the whole of > > + struct virtio_pci_cap, and extra data if any. > > + This length might include padding, or fields unused by the driver. > > + > > +cfg_type > > + identifies the structure, according to the following table. > > + > > + /* Common configuration */ > > + #define VIRTIO_PCI_CAP_COMMON_CFG 1 > > + /* Notifications */ > > + #define VIRTIO_PCI_CAP_NOTIFY_CFG 2 > > + /* ISR Status */ > > + #define VIRTIO_PCI_CAP_ISR_CFG 3 > > + /* Device specific configuration */ > > + #define VIRTIO_PCI_CAP_DEVICE_CFG 4 > > + > > + Any other value - reserved for future use. Drivers must > > + ignore any vendor-specific capability structure which has > > + a reserved cfg_type value. > > + > > + More than one capability can identify the same structure - this makes it > > + possible for the device to expose multiple interfaces to drivers. The order of > > + the capabilities in the capability list specifies the order of preference > > + suggested by the device; drivers should use the first interface that they can > > + support. For example, on some hypervisors, notifications using IO accesses are > > + faster than memory accesses. In this case, hypervisor can expose two > > + capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG: > > + the first one addressing an I/O BAR, the second one addressing a memory BAR. > > + Driver will use the I/O BAR if I/O resources are available, and fall back on > > + memory BAR when I/O resources are unavailable. > > + > > +bar > > + values 0x0 to 0x5 specify a Base Address register (BAR) belonging to > > + the function located beginning at 10h in Configuration Space > > + and used to map the structure into Memory or I/O Space. > > + The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space > > + or I/O Space. > > + > > + Any other value - reserved for future use. Drivers must > > + ignore any vendor-specific capability structure which has > > + a reserved bar value. > > + > > +offset > > + indicates where the structure begins relative to the base address associated > > + with the BAR. > > + > > +length > > + indicates the length of the structure. > > + This size might include padding, or fields unused by the driver. > > + Drivers are also recommended to only map part of configuration structure > > + large enough for device operation. > > + For example, a future device might present a large structure size of several > > + MBytes. > > + As current devices never utilize structures larger than 4KBytes in size, > > + driver can limit the mapped structure size to e.g. > > + 4KBytes to allow forward compatibility with such devices without loss of > > + functionality and without wasting resources. > > + > > + > > +If cfg_type is VIRTIO_PCI_CAP_NOTIFY_CFG this structure is immediately followed > > +by additional fields: > > + > > +struct virtio_pci_notify_cap { > > + struct virtio_pci_cap cap; > > + __le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */ > > +}; > > + > > +notify_off_multiplier > > + > > + Virtqueue offset multiplier, in bytes. Must be even and either a power of two, or 0. > > + Value 0x1 is reserved. > > + For a given virtqueue, the address to use for notifications is calculated as follows: > > + > > + queue_notify_off * notify_off_multiplier + offset > > + > > + If notify_off_multiplier is 0, all virtqueues use the same address in > > + the Notifications structure! > > + > > + > > +100.100.1.3.1.1. Legacy Interface: A Note on Device Layout Detection > > +------------------------------- > > + > > +Legacy drivers skipped Device Layout Detection step, assuming legacy > > +configuration space in BAR0 in I/O space unconditionally. > > + > > +Legacy devices did not have the Virtio PCI Capability in their > > +capability list. > > + > > +Therefore: > > + > > +Transitional devices should expose the Legacy Interface in I/O > > +space in BAR0. > > + > > +Transitional drivers should look for the Virtio PCI > > +Capabilities on the capability list. > > +If there are not present, driver should assume a legacy device. > > + > > 2.3.1.3.1.1. Queue Vector Configuration > > -------------------------------------- > > > > When MSI-X capability is present and enabled in the device > > -(through standard PCI configuration space) 4 bytes at byte offset > > -20 are used to map configuration change and queue interrupts to > > -MSI-X vectors. In this case, the ISR Status field is unused, and > > -device specific configuration starts at byte offset 24 in virtio > > -header structure. When MSI-X capability is not enabled, device > > -specific configuration starts at byte offset 20 in virtio header. > > +(through standard PCI configuration space) Configuration/Queue > > +MSI-X Vector registers are used to map configuration change and queue > > +interrupts to MSI-X vectors. In this case, the ISR Status is unused. > > > > Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of > > Configuration/Queue Vector registers, maps interrupts triggered > > @@ -801,23 +1076,30 @@ This is done as follows, for each virtqueue a device has: > > always a power of 2. This controls how big the virtqueue is > > (see "2.1.4. Virtqueues"). If this field is 0, the virtqueue does not exist. > > > > -3. Allocate and zero virtqueue in contiguous physical memory, on > > - a 4096 byte alignment. Write the physical address, divided by > > - 4096 to the Queue Address field.[6] > > +3. Optionally, select a smaller virtqueue size and write it in the Queue Size > > + field. > > + > > +4. Allocate and zero Descriptor Table, Available and Used rings for the > > + virtqueue in contiguous physical memory. > > > > -4. Optionally, if MSI-X capability is present and enabled on the > > +5. Optionally, if MSI-X capability is present and enabled on the > > device, select a vector to use to request interrupts triggered > > by virtqueue events. Write the MSI-X Table entry number > > corresponding to this vector in Queue Vector field. Read the > > Queue Vector field: on success, previously written value is > > returned; on failure, NO_VECTOR value is returned. > > > > +100.100.1.3.1.4.1. Legacy Interface: A Note on Virtqueue Configuration > > +----------------------------------- > > +When using the legacy interface, the page size for a virtqueue on a PCI virtio > > +device is defined as 4096 bytes. Driver writes the physical address, divided > > +by 4096 to the Queue Address field [6]. > > + > > 2.3.1.3.2. Notifying The Device > > ------------------------------ > > > > Device notification occurs by writing the 16-bit virtqueue index > > -of this virtqueue to the Queue Notify field of the virtio header > > -in the first I/O region of the PCI device. > > +of this virtqueue to the Queue Notify field. > > > > 2.3.1.3.3. Virtqueue Interrupts From The Device > > ---------------------------------------------- > > @@ -2867,7 +3149,10 @@ the non-PCI implementations (currently lguest and S/390). > > This is only allowed if the driver does not use any features > > which would alter this early use of the device. > > > > -[5] ie. once you enable MSI-X on the device, the other fields move. > > +[5] When MSI-X capability is enabled, device specific configuration starts at > > +byte offset 24 in virtio header structure. When MSI-X capability is not > > +enabled, device specific configuration starts at byte offset 20 in virtio > > +header. ie. once you enable MSI-X on the device, the other fields move. > > If you turn it off again, they move back! > > Minor nit: my general approach has been to move the contents of legacy > footnotes into the appropriate section itself. > > This all looks pretty sane. > > Cheers, > Rusty.


  • 5.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-21-2013 20:46
    On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote:
    > "Michael S. Tsirkin" <mst@redhat.com> writes:
    > > - split data path, common config and device specific config
    > > - support for new VQ layout
    > >
    > > VIRTIO-21
    > >
    > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    > >
    > > ---
    > >
    > > changes from v2:
    > > pci: add support for queue size negotiation
    > > Make it possible for guest to use a smaller queue size than the
    > > maximum with virtio over PCI (MMIO already has this ability).
    > >
    > > clarify that device status 0 resets the device
    > >
    > > changes from v1:
    > > minimal patchset,
    > > stripped all controversial changes away:
    > > endian-ness, framing, revision id, config based access.
    > >
    > > made some minor clarifications
    > > ---
    > > virtio-v1.0-wd01-part1-specification.txt | 331 ++++++++++++++++++++++++++++---
    > > 1 file changed, 308 insertions(+), 23 deletions(-)
    > >
    > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
    > > index b4298bb..e6e7eee 100644
    > > --- a/virtio-v1.0-wd01-part1-specification.txt
    > > +++ b/virtio-v1.0-wd01-part1-specification.txt
    > > @@ -684,9 +684,145 @@ for informational purposes by the guest).
    > > 2.3.1.2. PCI Device Layout
    > > -------------------------
    > >
    > > -To configure the device, we use the first I/O region of the PCI
    > > -device. This contains a virtio header followed by a
    > > -device-specific region.
    > > +To configure the device,
    > > +use I/O and/or memory regions and/or PCI configuration space of the PCI device.
    > > +These contain the virtio header registers, the notification register, the
    > > +ISR status register and device specific registers, as specified by Virtio
    > > ++ Structure PCI Capabilities
    > > +
    > > +There may be different widths of accesses to the I/O region; the
    > > +“natural” access method for each field must be
    > > +used (i.e. 32-bit accesses for 32-bit fields, etc).
    > > +
    > > +PCI Device Configuration Layout includes the common configuration,
    > > +ISR, notification and device specific configuration
    > > +structures.
    > > +
    > > +Unless explicitly specified otherwise, all multi-byte fields are little-endian.
    > > +
    > > +2.3.1.2.1. Common configuration structure layout
    > > +-------------------------
    > > +Common configuration structure layout is documented below:
    > > +
    > > +struct virtio_pci_common_cfg {
    > > + /* About the whole device. */
    > > + __le32 device_feature_select; /* read-write */
    > > + __le32 device_feature; /* read-only */
    > > + __le32 guest_feature_select; /* read-write */
    > > + __le32 guest_feature; /* read-write */
    > > + __le16 msix_config; /* read-write */
    > > + __le16 num_queues; /* read-only */
    > > + __u8 device_status; /* read-write */
    > > + __u8 unused1;
    > > +
    > > + /* About a specific virtqueue. */
    > > + __le16 queue_select; /* read-write */
    > > + __le16 queue_size; /* read-write, power of 2, or 0. */
    > > + __le16 queue_msix_vector; /* read-write */
    > > + __le16 queue_enable; /* read-write */
    > > + __le16 queue_notify_off; /* read-only */
    > > + __le64 queue_desc; /* read-write */
    > > + __le64 queue_avail; /* read-write */
    > > + __le64 queue_used; /* read-write */
    > > +};
    > > +
    > > +device_feature_select
    > > +
    > > + Selects which Feature Bits does device_feature field refer to.
    > > + Value 0x0 selects Feature Bits 0 to 31
    > > + Value 0x1 selects Feature Bits 32 to 63
    > > + All other values cause reads from device_feature to return 0.
    > > +
    > > +device_feature
    > > +
    > > + Used by Device to report Feature Bits to Driver.
    > > + Device Feature Bits selected by device_feature_select.
    > > +
    > > +guest_feature_select
    > > +
    > > + Selects which Feature Bits does guest_feature field refer to.
    > > + Value 0x0 selects Feature Bits 0 to 31
    > > + Value 0x1 selects Feature Bits 32 to 63
    > > + All other values cause writes to guest_feature to be ignored,
    > > + and reads to return 0.
    >
    > This isn't quite right. Zero writes to guest_feature should be ignored,
    > non-zero writes (ie. acking a feature which wasn't offered) is undefined
    > anywhere it occurs.

    Yes I meant Zero writes.
    That's a general rule: drivers must not ack
    features that devices did not offer.
    I'll make this clear here.

    > We should also define what "locks in" feature negotiation: that was
    > simple for single 32-bit field. Should we define this in a
    > transport-independent way, or leave it to the transports?
    >
    > We could overload the DRIVER status bit (which Linux currently calls too
    > early, though moving it would be harmless), or add a new one.

    I'm surprised.
    I always read 2.2.1. Device Initialization as
    an explicit requirement that DRIVER_OK locks the features,
    and that's in a transport-independent section and
    works for existing guests.

    And if that's not explicit enough, would the proposed
    resolution for VIRTIO-30 make it explicit enough?

    > > +guest_feature
    > > +
    > > + Used by Driver to acknowledge Feature Bits to Device.
    > > + Guest Feature Bits selected by guest_feature_select.
    > > +
    > > +msix_config
    > > +
    > > + Configuration Vector for MSI-X.
    > > +
    > > +num_queues
    > > +
    > > + Specifies the maximum number of virtqueues supported by device.
    > > +
    > > +device_status
    > > +
    > > + Device Status field. Writing 0 into this field resets the
    > > + device.
    > > +
    > > +queue_select
    > > +
    > > + Queue Select. Selects which virtqueue do other fields refer to.
    > > +
    > > +queue_size
    > > +
    > > + Queue Size. On reset, specifies the maximum queue size supported by
    > > + the hypervisor. This can be modified by driver to reduce memory requirements.
    > > + Set to 0 if this virtqueue is unused.
    > > +
    > > +queue_msix_vector
    > > +
    > > + Queue Vector for MSI-X.
    > > +
    > > +queue_enable
    > > +
    > > + Used to selectively prevent host from executing requests from this virtqueue.
    > > + 1 - enabled; 0 - disabled
    > > +
    > > +queue_notify_off
    > > +
    > > + Used to calculate the offset from start of Notification structure at
    > > + which this virtqueue is located.
    > > + Note: this is *not* an offset in bytes. See notify_off_multiplier below.
    > > +
    > > +queue_desc
    > > +
    > > + Physical address of Descriptor Table.
    > > +
    > > +queue_avail
    > > +
    > > + Physical address of Available Ring.
    > > +
    > > +queue_used
    > > +
    > > + Physical address of Used Ring.
    > > +
    > > +2.3.1.2.2. ISR status structure layout
    > > +-------------------------
    > > +ISR status structure includes a single 8-bite ISR status field
    > > +
    > > +2.3.1.2.3. Notification structure layout
    > > +-------------------------
    > > +Notification structure is always a multiple of 2 bytes in size.
    > > +It includes 2-byte Queue Notify fields for each virtqueue of
    > > +the device. Note that multiple virtqueues can use the same
    > > +Queue Notify field, if necessary.
    > > +
    > > +2.3.1.2.4. Device specific structure
    > > +-------------------------
    > > +
    > > +Device specific structure is optional.
    > > +
    > > +2.3.1.2.5. Legacy Interfaces: A Note on PCI Device Layout
    > > +-------------------------
    > > +
    > > +Transitional devices should present part of configuration
    > > +registers in a legacy configuration structure in BAR0 in the first I/O
    > > +region of the PCI device, as documented below.
    > >
    > > There may be different widths of accesses to the I/O region; the
    > > “natural” access method for each field in the virtio header must be
    > > @@ -699,10 +835,7 @@ Note that this is possible because while the virtio header is PCI
    > > the native endian of the guest (where such distinction is
    > > applicable).
    > >
    > > -2.3.1.2.1. PCI Device Virtio Header
    > > -----------------------------------
    > > -
    > > -The virtio header looks as follows:
    > > +When used through the legacy interface, the virtio header looks as follows:
    > >
    > > +------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
    > > | Bits || 32 | 32 | 32 | 16 | 16 | 16 | 8 | 8 |
    > > @@ -741,25 +874,167 @@ device-specific headers:
    > > | || |
    > > +------------++--------------------+
    > >
    > > +Note that only Feature Bits 0 to 31 are accessible through the
    > > +Legacy Interface. When used through the Legacy Interface,
    > > +Transitional Devices must assume that Feature Bits 32 to 63
    > > +are not acknowledged by Driver.
    > > +
    > > 2.3.1.3. PCI-specific Initialization And Device Operation
    > > --------------------------------------------------------
    > >
    > > -The page size for a virtqueue on a PCI virtio device is defined as
    > > -4096 bytes.
    > > -
    > > 2.3.1.3.1. Device Initialization
    > > -------------------------------
    > >
    > > +This documents PCI-specific steps executed during Device Initialization.
    > > +As the first step, driver must detect device configuration layout
    > > +to locate configuration fields in memory,I/O or configuration space of the
    > > +device.
    > > +
    > > +100.100.1.3.1.1. Virtio Device Configuration Layout Detection
    > > +-------------------------------
    > > +
    > > +As a prerequisite to device initialization, driver executes a
    > > +PCI capability list scan, detecting virtio configuration layout using Virtio
    > > +Structure PCI capabilities.
    > > +
    > > +Virtio Device Configuration Layout includes virtio configuration header, Notification
    > > +and ISR Status and device configuration structures.
    > > +Each structure can be mapped by a Base Address register (BAR) belonging to
    > > +the function, located beginning at 10h in Configuration Space,
    > > +or accessed though PCI configuration space.
    > > +
    > > +Actual location of each structure is specified using vendor-specific PCI capability located
    > > +on capability list in PCI configuration space of the device.
    > > +This virtio structure capability uses little-endian format; all bits are
    > > +read-only:
    > > +
    > > +struct virtio_pci_cap {
    > > + __u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
    > > + __u8 cap_next; /* Generic PCI field: next ptr. */
    > > + __u8 cap_len; /* Generic PCI field: capability length */
    > > + __u8 cfg_type; /* Identifies the structure. */
    > > + __u8 bar; /* Where to find it. */
    > > + __u8 padding[3];/* Pad to full dword. */
    > > + __le32 offset; /* Offset within bar. */
    > > + __le32 length; /* Length of the structure, in bytes. */
    > > +};
    > > +
    > > +This structure can optionally followed by extra data, depending on
    > > +other fields, as documented below.
    > > +
    > > +The fields are interpreted as follows:
    > > +
    > > +cap_vndr
    > > + 0x09; Identifies a vendor-specific capability.
    > > +
    > > +cap_next
    > > + Link to next capability in the capability list in the configuration space.
    > > +
    > > +cap_len
    > > + Length of the capability structure, including the whole of
    > > + struct virtio_pci_cap, and extra data if any.
    > > + This length might include padding, or fields unused by the driver.
    > > +
    > > +cfg_type
    > > + identifies the structure, according to the following table.
    > > +
    > > + /* Common configuration */
    > > + #define VIRTIO_PCI_CAP_COMMON_CFG 1
    > > + /* Notifications */
    > > + #define VIRTIO_PCI_CAP_NOTIFY_CFG 2
    > > + /* ISR Status */
    > > + #define VIRTIO_PCI_CAP_ISR_CFG 3
    > > + /* Device specific configuration */
    > > + #define VIRTIO_PCI_CAP_DEVICE_CFG 4
    > > +
    > > + Any other value - reserved for future use. Drivers must
    > > + ignore any vendor-specific capability structure which has
    > > + a reserved cfg_type value.
    > > +
    > > + More than one capability can identify the same structure - this makes it
    > > + possible for the device to expose multiple interfaces to drivers. The order of
    > > + the capabilities in the capability list specifies the order of preference
    > > + suggested by the device; drivers should use the first interface that they can
    > > + support. For example, on some hypervisors, notifications using IO accesses are
    > > + faster than memory accesses. In this case, hypervisor can expose two
    > > + capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG:
    > > + the first one addressing an I/O BAR, the second one addressing a memory BAR.
    > > + Driver will use the I/O BAR if I/O resources are available, and fall back on
    > > + memory BAR when I/O resources are unavailable.
    > > +
    > > +bar
    > > + values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
    > > + the function located beginning at 10h in Configuration Space
    > > + and used to map the structure into Memory or I/O Space.
    > > + The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
    > > + or I/O Space.
    > > +
    > > + Any other value - reserved for future use. Drivers must
    > > + ignore any vendor-specific capability structure which has
    > > + a reserved bar value.
    > > +
    > > +offset
    > > + indicates where the structure begins relative to the base address associated
    > > + with the BAR.
    > > +
    > > +length
    > > + indicates the length of the structure.
    > > + This size might include padding, or fields unused by the driver.
    > > + Drivers are also recommended to only map part of configuration structure
    > > + large enough for device operation.
    > > + For example, a future device might present a large structure size of several
    > > + MBytes.
    > > + As current devices never utilize structures larger than 4KBytes in size,
    > > + driver can limit the mapped structure size to e.g.
    > > + 4KBytes to allow forward compatibility with such devices without loss of
    > > + functionality and without wasting resources.
    > > +
    > > +
    > > +If cfg_type is VIRTIO_PCI_CAP_NOTIFY_CFG this structure is immediately followed
    > > +by additional fields:
    > > +
    > > +struct virtio_pci_notify_cap {
    > > + struct virtio_pci_cap cap;
    > > + __le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
    > > +};
    > > +
    > > +notify_off_multiplier
    > > +
    > > + Virtqueue offset multiplier, in bytes. Must be even and either a power of two, or 0.
    > > + Value 0x1 is reserved.
    > > + For a given virtqueue, the address to use for notifications is calculated as follows:
    > > +
    > > + queue_notify_off * notify_off_multiplier + offset
    > > +
    > > + If notify_off_multiplier is 0, all virtqueues use the same address in
    > > + the Notifications structure!
    > > +
    > > +
    > > +100.100.1.3.1.1. Legacy Interface: A Note on Device Layout Detection
    > > +-------------------------------
    > > +
    > > +Legacy drivers skipped Device Layout Detection step, assuming legacy
    > > +configuration space in BAR0 in I/O space unconditionally.
    > > +
    > > +Legacy devices did not have the Virtio PCI Capability in their
    > > +capability list.
    > > +
    > > +Therefore:
    > > +
    > > +Transitional devices should expose the Legacy Interface in I/O
    > > +space in BAR0.
    > > +
    > > +Transitional drivers should look for the Virtio PCI
    > > +Capabilities on the capability list.
    > > +If there are not present, driver should assume a legacy device.
    > > +
    > > 2.3.1.3.1.1. Queue Vector Configuration
    > > --------------------------------------
    > >
    > > When MSI-X capability is present and enabled in the device
    > > -(through standard PCI configuration space) 4 bytes at byte offset
    > > -20 are used to map configuration change and queue interrupts to
    > > -MSI-X vectors. In this case, the ISR Status field is unused, and
    > > -device specific configuration starts at byte offset 24 in virtio
    > > -header structure. When MSI-X capability is not enabled, device
    > > -specific configuration starts at byte offset 20 in virtio header.
    > > +(through standard PCI configuration space) Configuration/Queue
    > > +MSI-X Vector registers are used to map configuration change and queue
    > > +interrupts to MSI-X vectors. In this case, the ISR Status is unused.
    > >
    > > Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of
    > > Configuration/Queue Vector registers, maps interrupts triggered
    > > @@ -801,23 +1076,30 @@ This is done as follows, for each virtqueue a device has:
    > > always a power of 2. This controls how big the virtqueue is
    > > (see "2.1.4. Virtqueues"). If this field is 0, the virtqueue does not exist.
    > >
    > > -3. Allocate and zero virtqueue in contiguous physical memory, on
    > > - a 4096 byte alignment. Write the physical address, divided by
    > > - 4096 to the Queue Address field.[6]
    > > +3. Optionally, select a smaller virtqueue size and write it in the Queue Size
    > > + field.
    > > +
    > > +4. Allocate and zero Descriptor Table, Available and Used rings for the
    > > + virtqueue in contiguous physical memory.
    > >
    > > -4. Optionally, if MSI-X capability is present and enabled on the
    > > +5. Optionally, if MSI-X capability is present and enabled on the
    > > device, select a vector to use to request interrupts triggered
    > > by virtqueue events. Write the MSI-X Table entry number
    > > corresponding to this vector in Queue Vector field. Read the
    > > Queue Vector field: on success, previously written value is
    > > returned; on failure, NO_VECTOR value is returned.
    > >
    > > +100.100.1.3.1.4.1. Legacy Interface: A Note on Virtqueue Configuration
    > > +-----------------------------------
    > > +When using the legacy interface, the page size for a virtqueue on a PCI virtio
    > > +device is defined as 4096 bytes. Driver writes the physical address, divided
    > > +by 4096 to the Queue Address field [6].
    > > +
    > > 2.3.1.3.2. Notifying The Device
    > > ------------------------------
    > >
    > > Device notification occurs by writing the 16-bit virtqueue index
    > > -of this virtqueue to the Queue Notify field of the virtio header
    > > -in the first I/O region of the PCI device.
    > > +of this virtqueue to the Queue Notify field.
    > >
    > > 2.3.1.3.3. Virtqueue Interrupts From The Device
    > > ----------------------------------------------
    > > @@ -2867,7 +3149,10 @@ the non-PCI implementations (currently lguest and S/390).
    > > This is only allowed if the driver does not use any features
    > > which would alter this early use of the device.
    > >
    > > -[5] ie. once you enable MSI-X on the device, the other fields move.
    > > +[5] When MSI-X capability is enabled, device specific configuration starts at
    > > +byte offset 24 in virtio header structure. When MSI-X capability is not
    > > +enabled, device specific configuration starts at byte offset 20 in virtio
    > > +header. ie. once you enable MSI-X on the device, the other fields move.
    > > If you turn it off again, they move back!
    >
    > Minor nit: my general approach has been to move the contents of legacy
    > footnotes into the appropriate section itself.
    >
    > This all looks pretty sane.
    >
    > Cheers,
    > Rusty.



  • 6.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-23-2013 01:27
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote:
    >> We should also define what "locks in" feature negotiation: that was
    >> simple for single 32-bit field. Should we define this in a
    >> transport-independent way, or leave it to the transports?
    >>
    >> We could overload the DRIVER status bit (which Linux currently calls too
    >> early, though moving it would be harmless), or add a new one.
    >
    > I'm surprised.
    > I always read 2.2.1. Device Initialization as
    > an explicit requirement that DRIVER_OK locks the features,
    > and that's in a transport-independent section and
    > works for existing guests.
    >
    > And if that's not explicit enough, would the proposed
    > resolution for VIRTIO-30 make it explicit enough?

    Linux violates this, as it makes the device live *then* sets DRIVER_OK.

    This means we can use the device (and *do*, for virtio_blk partition
    scanning) before finalizing features.

    We could become compliant by setting DRIVER_OK before calling the
    driver, but that's be misleading: we'd not expect DRIVER_OK if the
    driver init failed.

    So I think we will need a new bit, FEATURES_OK?

    Cheers,
    Rusty.




  • 7.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-23-2013 01:31
    "Michael S. Tsirkin" <mst@redhat.com> writes: > On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote: >> We should also define what "locks in" feature negotiation: that was >> simple for single 32-bit field. Should we define this in a >> transport-independent way, or leave it to the transports? >> >> We could overload the DRIVER status bit (which Linux currently calls too >> early, though moving it would be harmless), or add a new one. > > I'm surprised. > I always read 2.2.1. Device Initialization as > an explicit requirement that DRIVER_OK locks the features, > and that's in a transport-independent section and > works for existing guests. > > And if that's not explicit enough, would the proposed > resolution for VIRTIO-30 make it explicit enough? Linux violates this, as it makes the device live *then* sets DRIVER_OK. This means we can use the device (and *do*, for virtio_blk partition scanning) before finalizing features. We could become compliant by setting DRIVER_OK before calling the driver, but that's be misleading: we'd not expect DRIVER_OK if the driver init failed. So I think we will need a new bit, FEATURES_OK? Cheers, Rusty.


  • 8.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-23-2013 06:24
    On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote: > >> We should also define what "locks in" feature negotiation: that was > >> simple for single 32-bit field. Should we define this in a > >> transport-independent way, or leave it to the transports? > >> > >> We could overload the DRIVER status bit (which Linux currently calls too > >> early, though moving it would be harmless), or add a new one. > > > > I'm surprised. > > I always read 2.2.1. Device Initialization as > > an explicit requirement that DRIVER_OK locks the features, > > and that's in a transport-independent section and > > works for existing guests. > > > > And if that's not explicit enough, would the proposed > > resolution for VIRTIO-30 make it explicit enough? > > Linux violates this, as it makes the device live *then* sets DRIVER_OK. Interesting. Actually, e.g. net drivers do this, too: they pre-fill the RX VQs. > This means we can use the device (and *do*, for virtio_blk partition > scanning What does this refer to? virtblk_getgeo? That only seems to poke at config space, not VQs. >) before finalizing features. > > We could become compliant by setting DRIVER_OK before calling the > driver, but that's be misleading: we'd not expect DRIVER_OK if the > driver init failed. > So I think we will need a new bit, FEATURES_OK? > > Cheers, > Rusty. But what's the meaning of DRIVER_OK then? Maybe it means "device can use VQs"? And if device does not use VQs, does it matter that driver can add buffers there? If not maybe we can let driver play with features all it wants. Also all this is really part of discussion for VIRTIO-30 I think? It's not really related to the layout - the issue is there in old layout as well. Correct? -- MST


  • 9.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-23-2013 06:27
    On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote:
    > "Michael S. Tsirkin" <mst@redhat.com> writes:
    > > On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote:
    > >> We should also define what "locks in" feature negotiation: that was
    > >> simple for single 32-bit field. Should we define this in a
    > >> transport-independent way, or leave it to the transports?
    > >>
    > >> We could overload the DRIVER status bit (which Linux currently calls too
    > >> early, though moving it would be harmless), or add a new one.
    > >
    > > I'm surprised.
    > > I always read 2.2.1. Device Initialization as
    > > an explicit requirement that DRIVER_OK locks the features,
    > > and that's in a transport-independent section and
    > > works for existing guests.
    > >
    > > And if that's not explicit enough, would the proposed
    > > resolution for VIRTIO-30 make it explicit enough?
    >
    > Linux violates this, as it makes the device live *then* sets DRIVER_OK.

    Interesting.
    Actually, e.g. net drivers do this, too: they pre-fill the RX VQs.

    > This means we can use the device (and *do*, for virtio_blk partition
    > scanning

    What does this refer to? virtblk_getgeo?
    That only seems to poke at config space, not VQs.

    >) before finalizing features.
    >
    > We could become compliant by setting DRIVER_OK before calling the
    > driver, but that's be misleading: we'd not expect DRIVER_OK if the
    > driver init failed.
    > So I think we will need a new bit, FEATURES_OK?
    >
    > Cheers,
    > Rusty.

    But what's the meaning of DRIVER_OK then?
    Maybe it means "device can use VQs"?

    And if device does not use VQs, does it matter that
    driver can add buffers there?
    If not maybe we can let driver play with features all
    it wants.

    Also all this is really part of discussion for VIRTIO-30 I think?
    It's not really related to the layout - the issue is there
    in old layout as well.
    Correct?

    --
    MST



  • 10.  Re: [virtio] Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-23-2013 10:33
    Il 23/09/2013 08:26, Michael S. Tsirkin ha scritto: > On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >>> On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote: >>>> We should also define what "locks in" feature negotiation: that was >>>> simple for single 32-bit field. Should we define this in a >>>> transport-independent way, or leave it to the transports? >>>> >>>> We could overload the DRIVER status bit (which Linux currently calls too >>>> early, though moving it would be harmless), or add a new one. >>> >>> I'm surprised. >>> I always read 2.2.1. Device Initialization as >>> an explicit requirement that DRIVER_OK locks the features, >>> and that's in a transport-independent section and >>> works for existing guests. >>> >>> And if that's not explicit enough, would the proposed >>> resolution for VIRTIO-30 make it explicit enough? >> >> Linux violates this, as it makes the device live *then* sets DRIVER_OK. > > Interesting. > Actually, e.g. net drivers do this, too: they pre-fill the RX VQs. > >> This means we can use the device (and *do*, for virtio_blk partition >> scanning > > What does this refer to? virtblk_getgeo? It's done by add_disk (in block/genhd.c). Paolo > That only seems to poke at config space, not VQs. > >> ) before finalizing features. >> >> We could become compliant by setting DRIVER_OK before calling the >> driver, but that's be misleading: we'd not expect DRIVER_OK if the >> driver init failed. >> So I think we will need a new bit, FEATURES_OK? >> >> Cheers, >> Rusty. > > But what's the meaning of DRIVER_OK then? > Maybe it means "device can use VQs"? > > And if device does not use VQs, does it matter that > driver can add buffers there? > If not maybe we can let driver play with features all > it wants. > > Also all this is really part of discussion for VIRTIO-30 I think? > It's not really related to the layout - the issue is there > in old layout as well. > Correct? >


  • 11.  Re: [virtio] Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-23-2013 11:06
    On Mon, Sep 23, 2013 at 12:32:48PM +0200, Paolo Bonzini wrote: > Il 23/09/2013 08:26, Michael S. Tsirkin ha scritto: > > On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote: > >> "Michael S. Tsirkin" <mst@redhat.com> writes: > >>> On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote: > >>>> We should also define what "locks in" feature negotiation: that was > >>>> simple for single 32-bit field. Should we define this in a > >>>> transport-independent way, or leave it to the transports? > >>>> > >>>> We could overload the DRIVER status bit (which Linux currently calls too > >>>> early, though moving it would be harmless), or add a new one. > >>> > >>> I'm surprised. > >>> I always read 2.2.1. Device Initialization as > >>> an explicit requirement that DRIVER_OK locks the features, > >>> and that's in a transport-independent section and > >>> works for existing guests. > >>> > >>> And if that's not explicit enough, would the proposed > >>> resolution for VIRTIO-30 make it explicit enough? > >> > >> Linux violates this, as it makes the device live *then* sets DRIVER_OK. > > > > Interesting. > > Actually, e.g. net drivers do this, too: they pre-fill the RX VQs. > > > >> This means we can use the device (and *do*, for virtio_blk partition > >> scanning > > > > What does this refer to? virtblk_getgeo? > > It's done by add_disk (in block/genhd.c). > > Paolo Yes, that's true. Same with most drivers: they expose device to linux as the last step. And there's no clean way to solve this, really. I think devices should buffer kicks received before driver_ok and execute them later. > > That only seems to poke at config space, not VQs. > > > >> ) before finalizing features. > >> > >> We could become compliant by setting DRIVER_OK before calling the > >> driver, but that's be misleading: we'd not expect DRIVER_OK if the > >> driver init failed. > >> So I think we will need a new bit, FEATURES_OK? I would call this more generally, DEVICE_OK. This also means DRIVER_OK must not fail as I proposed for VIRTIO-30. Let's switch to that thread, I'll respond there with a summary of this discussion, and a new proposal. > >> > >> Cheers, > >> Rusty. > > > > But what's the meaning of DRIVER_OK then? > > Maybe it means "device can use VQs"? > > > > And if device does not use VQs, does it matter that > > driver can add buffers there? > > If not maybe we can let driver play with features all > > it wants. > > > > Also all this is really part of discussion for VIRTIO-30 I think? > > It's not really related to the layout - the issue is there > > in old layout as well. > > Correct? > >


  • 12.  Re: [virtio] Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-23-2013 11:08
    On Mon, Sep 23, 2013 at 12:32:48PM +0200, Paolo Bonzini wrote:
    > Il 23/09/2013 08:26, Michael S. Tsirkin ha scritto:
    > > On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote:
    > >> "Michael S. Tsirkin" <mst@redhat.com> writes:
    > >>> On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote:
    > >>>> We should also define what "locks in" feature negotiation: that was
    > >>>> simple for single 32-bit field. Should we define this in a
    > >>>> transport-independent way, or leave it to the transports?
    > >>>>
    > >>>> We could overload the DRIVER status bit (which Linux currently calls too
    > >>>> early, though moving it would be harmless), or add a new one.
    > >>>
    > >>> I'm surprised.
    > >>> I always read 2.2.1. Device Initialization as
    > >>> an explicit requirement that DRIVER_OK locks the features,
    > >>> and that's in a transport-independent section and
    > >>> works for existing guests.
    > >>>
    > >>> And if that's not explicit enough, would the proposed
    > >>> resolution for VIRTIO-30 make it explicit enough?
    > >>
    > >> Linux violates this, as it makes the device live *then* sets DRIVER_OK.
    > >
    > > Interesting.
    > > Actually, e.g. net drivers do this, too: they pre-fill the RX VQs.
    > >
    > >> This means we can use the device (and *do*, for virtio_blk partition
    > >> scanning
    > >
    > > What does this refer to? virtblk_getgeo?
    >
    > It's done by add_disk (in block/genhd.c).
    >
    > Paolo

    Yes, that's true. Same with most drivers:
    they expose device to linux as the last step.

    And there's no clean way to solve this, really.
    I think devices should buffer kicks received
    before driver_ok and execute them later.

    > > That only seems to poke at config space, not VQs.
    > >
    > >> ) before finalizing features.
    > >>
    > >> We could become compliant by setting DRIVER_OK before calling the
    > >> driver, but that's be misleading: we'd not expect DRIVER_OK if the
    > >> driver init failed.
    > >> So I think we will need a new bit, FEATURES_OK?

    I would call this more generally, DEVICE_OK.

    This also means DRIVER_OK must not fail as I
    proposed for VIRTIO-30.
    Let's switch to that thread, I'll respond there with
    a summary of this discussion, and a new proposal.

    > >>
    > >> Cheers,
    > >> Rusty.
    > >
    > > But what's the meaning of DRIVER_OK then?
    > > Maybe it means "device can use VQs"?
    > >
    > > And if device does not use VQs, does it matter that
    > > driver can add buffers there?
    > > If not maybe we can let driver play with features all
    > > it wants.
    > >
    > > Also all this is really part of discussion for VIRTIO-30 I think?
    > > It's not really related to the layout - the issue is there
    > > in old layout as well.
    > > Correct?
    > >



  • 13.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-24-2013 06:07
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote:
    >> "Michael S. Tsirkin" <mst@redhat.com> writes:
    >> > On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote:
    >> >> We should also define what "locks in" feature negotiation: that was
    >> >> simple for single 32-bit field. Should we define this in a
    >> >> transport-independent way, or leave it to the transports?
    >> >>
    >> >> We could overload the DRIVER status bit (which Linux currently calls too
    >> >> early, though moving it would be harmless), or add a new one.
    >> >
    >> > I'm surprised.
    >> > I always read 2.2.1. Device Initialization as
    >> > an explicit requirement that DRIVER_OK locks the features,
    >> > and that's in a transport-independent section and
    >> > works for existing guests.
    >> >
    >> > And if that's not explicit enough, would the proposed
    >> > resolution for VIRTIO-30 make it explicit enough?
    >>
    >> Linux violates this, as it makes the device live *then* sets DRIVER_OK.
    >
    > Interesting.
    > Actually, e.g. net drivers do this, too: they pre-fill the RX VQs.
    >
    >> This means we can use the device (and *do*, for virtio_blk partition
    >> scanning
    >
    > What does this refer to? virtblk_getgeo?
    > That only seems to poke at config space, not VQs.
    >
    >>) before finalizing features.

    IIRC add_disk() does partition scanning.

    >> We could become compliant by setting DRIVER_OK before calling the
    >> driver, but that's be misleading: we'd not expect DRIVER_OK if the
    >> driver init failed.
    >> So I think we will need a new bit, FEATURES_OK?
    >>
    >> Cheers,
    >> Rusty.
    >
    > But what's the meaning of DRIVER_OK then?

    It's informational, telling us how far the device got before it failed.

    DRIVER_OK basically indicates that as far as guest is concerned, device
    is live.

    > Maybe it means "device can use VQs"?
    >
    > And if device does not use VQs, does it matter that
    > driver can add buffers there?
    > If not maybe we can let driver play with features all
    > it wants.

    That doesn't really work with the Linux model. And if a future feature
    changed ring layout, we'd *really* need to know whether the guest
    understood that before it touched the vq.

    > Also all this is really part of discussion for VIRTIO-30 I think?
    > It's not really related to the layout - the issue is there
    > in old layout as well.
    > Correct?

    Yes, except that it was easier to work around when a single write
    updated all the features. Still I've opened a new issue...

    Thanks,
    Rusty.




  • 14.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-24-2013 06:31
    "Michael S. Tsirkin" <mst@redhat.com> writes: > On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >> > On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote: >> >> We should also define what "locks in" feature negotiation: that was >> >> simple for single 32-bit field. Should we define this in a >> >> transport-independent way, or leave it to the transports? >> >> >> >> We could overload the DRIVER status bit (which Linux currently calls too >> >> early, though moving it would be harmless), or add a new one. >> > >> > I'm surprised. >> > I always read 2.2.1. Device Initialization as >> > an explicit requirement that DRIVER_OK locks the features, >> > and that's in a transport-independent section and >> > works for existing guests. >> > >> > And if that's not explicit enough, would the proposed >> > resolution for VIRTIO-30 make it explicit enough? >> >> Linux violates this, as it makes the device live *then* sets DRIVER_OK. > > Interesting. > Actually, e.g. net drivers do this, too: they pre-fill the RX VQs. > >> This means we can use the device (and *do*, for virtio_blk partition >> scanning > > What does this refer to? virtblk_getgeo? > That only seems to poke at config space, not VQs. > >>) before finalizing features. IIRC add_disk() does partition scanning. >> We could become compliant by setting DRIVER_OK before calling the >> driver, but that's be misleading: we'd not expect DRIVER_OK if the >> driver init failed. >> So I think we will need a new bit, FEATURES_OK? >> >> Cheers, >> Rusty. > > But what's the meaning of DRIVER_OK then? It's informational, telling us how far the device got before it failed. DRIVER_OK basically indicates that as far as guest is concerned, device is live. > Maybe it means "device can use VQs"? > > And if device does not use VQs, does it matter that > driver can add buffers there? > If not maybe we can let driver play with features all > it wants. That doesn't really work with the Linux model. And if a future feature changed ring layout, we'd *really* need to know whether the guest understood that before it touched the vq. > Also all this is really part of discussion for VIRTIO-30 I think? > It's not really related to the layout - the issue is there > in old layout as well. > Correct? Yes, except that it was easier to work around when a single write updated all the features. Still I've opened a new issue... Thanks, Rusty.


  • 15.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-24-2013 06:46
    On Tue, Sep 24, 2013 at 03:36:42PM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote: > >> "Michael S. Tsirkin" <mst@redhat.com> writes: > >> > On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote: > >> >> We should also define what "locks in" feature negotiation: that was > >> >> simple for single 32-bit field. Should we define this in a > >> >> transport-independent way, or leave it to the transports? > >> >> > >> >> We could overload the DRIVER status bit (which Linux currently calls too > >> >> early, though moving it would be harmless), or add a new one. > >> > > >> > I'm surprised. > >> > I always read 2.2.1. Device Initialization as > >> > an explicit requirement that DRIVER_OK locks the features, > >> > and that's in a transport-independent section and > >> > works for existing guests. > >> > > >> > And if that's not explicit enough, would the proposed > >> > resolution for VIRTIO-30 make it explicit enough? > >> > >> Linux violates this, as it makes the device live *then* sets DRIVER_OK. > > > > Interesting. > > Actually, e.g. net drivers do this, too: they pre-fill the RX VQs. > > > >> This means we can use the device (and *do*, for virtio_blk partition > >> scanning > > > > What does this refer to? virtblk_getgeo? > > That only seems to poke at config space, not VQs. > > > >>) before finalizing features. > > IIRC add_disk() does partition scanning. > > >> We could become compliant by setting DRIVER_OK before calling the > >> driver, but that's be misleading: we'd not expect DRIVER_OK if the > >> driver init failed. > >> So I think we will need a new bit, FEATURES_OK? > >> > >> Cheers, > >> Rusty. > > > > But what's the meaning of DRIVER_OK then? > > It's informational, telling us how far the device got before it failed. > > DRIVER_OK basically indicates that as far as guest is concerned, device > is live. > > > Maybe it means "device can use VQs"? > > > > And if device does not use VQs, does it matter that > > driver can add buffers there? > > If not maybe we can let driver play with features all > > it wants. > > That doesn't really work with the Linux model. And if a future feature > changed ring layout, we'd *really* need to know whether the guest > understood that before it touched the vq. OK so I added DEVICE_OK. Could you look at the latest draft? > > Also all this is really part of discussion for VIRTIO-30 I think? > > It's not really related to the layout - the issue is there > > in old layout as well. > > Correct? > > Yes, except that it was easier to work around when a single write > updated all the features. Still I've opened a new issue... > > Thanks, > Rusty. Hmm you don't think VIRTIO-30 is a good place to track it all?


  • 16.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-24-2013 06:48
    On Tue, Sep 24, 2013 at 03:36:42PM +0930, Rusty Russell wrote:
    > "Michael S. Tsirkin" <mst@redhat.com> writes:
    > > On Mon, Sep 23, 2013 at 10:57:24AM +0930, Rusty Russell wrote:
    > >> "Michael S. Tsirkin" <mst@redhat.com> writes:
    > >> > On Thu, Sep 19, 2013 at 11:53:24AM +0930, Rusty Russell wrote:
    > >> >> We should also define what "locks in" feature negotiation: that was
    > >> >> simple for single 32-bit field. Should we define this in a
    > >> >> transport-independent way, or leave it to the transports?
    > >> >>
    > >> >> We could overload the DRIVER status bit (which Linux currently calls too
    > >> >> early, though moving it would be harmless), or add a new one.
    > >> >
    > >> > I'm surprised.
    > >> > I always read 2.2.1. Device Initialization as
    > >> > an explicit requirement that DRIVER_OK locks the features,
    > >> > and that's in a transport-independent section and
    > >> > works for existing guests.
    > >> >
    > >> > And if that's not explicit enough, would the proposed
    > >> > resolution for VIRTIO-30 make it explicit enough?
    > >>
    > >> Linux violates this, as it makes the device live *then* sets DRIVER_OK.
    > >
    > > Interesting.
    > > Actually, e.g. net drivers do this, too: they pre-fill the RX VQs.
    > >
    > >> This means we can use the device (and *do*, for virtio_blk partition
    > >> scanning
    > >
    > > What does this refer to? virtblk_getgeo?
    > > That only seems to poke at config space, not VQs.
    > >
    > >>) before finalizing features.
    >
    > IIRC add_disk() does partition scanning.
    >
    > >> We could become compliant by setting DRIVER_OK before calling the
    > >> driver, but that's be misleading: we'd not expect DRIVER_OK if the
    > >> driver init failed.
    > >> So I think we will need a new bit, FEATURES_OK?
    > >>
    > >> Cheers,
    > >> Rusty.
    > >
    > > But what's the meaning of DRIVER_OK then?
    >
    > It's informational, telling us how far the device got before it failed.
    >
    > DRIVER_OK basically indicates that as far as guest is concerned, device
    > is live.
    >
    > > Maybe it means "device can use VQs"?
    > >
    > > And if device does not use VQs, does it matter that
    > > driver can add buffers there?
    > > If not maybe we can let driver play with features all
    > > it wants.
    >
    > That doesn't really work with the Linux model. And if a future feature
    > changed ring layout, we'd *really* need to know whether the guest
    > understood that before it touched the vq.

    OK so I added DEVICE_OK. Could you look at the latest draft?

    > > Also all this is really part of discussion for VIRTIO-30 I think?
    > > It's not really related to the layout - the issue is there
    > > in old layout as well.
    > > Correct?
    >
    > Yes, except that it was easier to work around when a single write
    > updated all the features. Still I've opened a new issue...
    >
    > Thanks,
    > Rusty.

    Hmm you don't think VIRTIO-30 is a good place to track it all?



  • 17.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-25-2013 02:22
    "Michael S. Tsirkin" <mst@redhat.com> writes:
    > On Tue, Sep 24, 2013 at 03:36:42PM +0930, Rusty Russell wrote:
    >> "Michael S. Tsirkin" <mst@redhat.com> writes:
    >> Yes, except that it was easier to work around when a single write
    >> updated all the features. Still I've opened a new issue...
    >>
    >> Thanks,
    >> Rusty.
    >
    > Hmm you don't think VIRTIO-30 is a good place to track it all?

    Sorry, I misread. Yes, VIRTIO-30 is fine. I closed mine, and will
    comment on the VIRTIO-30 thread...

    Cheers,
    Rusty.





  • 18.  Re: [virtio-dev] [PATCH v3] pci: new configuration layout

    Posted 09-25-2013 04:17
    "Michael S. Tsirkin" <mst@redhat.com> writes: > On Tue, Sep 24, 2013 at 03:36:42PM +0930, Rusty Russell wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >> Yes, except that it was easier to work around when a single write >> updated all the features. Still I've opened a new issue... >> >> Thanks, >> Rusty. > > Hmm you don't think VIRTIO-30 is a good place to track it all? Sorry, I misread. Yes, VIRTIO-30 is fine. I closed mine, and will comment on the VIRTIO-30 thread... Cheers, Rusty.