On Fri, Oct 04, 2013 at 01:06:43PM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <
mst@redhat.com> writes: > > On Tue, Sep 24, 2013 at 08:18:20PM +0930, Rusty Russell wrote: > >> I don't think qemu supports changing a disk underneath a device anyway? > > > > QEMU supports resizing I think. > > > >> And in practice, I'm not sure it would ever change fast enough to > >> trigger this race... > > > > Well it's easy to make guest very slow :) > > > >> So, should we try to fix it? > >> Rusty. > > > > Not really sure but I thought I'd put it on the table. > > OK, I got ambitious! How's this look? > > Cheers, > Rusty. > > commit 9aebe9c69be1461dd0110dfc61459a4ac7bda6f9 > Author: Rusty Russell <
rusty@au1.ibm.com> > Date: Fri Oct 4 13:01:35 2013 +0930 > > Configuration (read) atomicity. > > Aka issue VIRTIO-35. > > This is solved per transport: > 1) PCI: use the 8 bit reserved field. > Assume that if you really change that fast, you'll do it lazily on > config space read. I guess it's possible ... but isn't it easier to just make it a 32 bit field than add this lazy update logic to devices? > 2) MMIO: use offset 0x18. > Qemu returns 0 when they read here anyway, so it's almost 100% > backwards compatible (legacy drivers should still multi-read > for any transport). > 3) CCW: no transport changes. > They always read/write the entire thing. This just shows that > Cornelia is smarter than I am. > > Signed-off-by: Rusty Russell <
rusty@au1.ibm.com> > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt > index b5a5835..2945db4 100644 > --- a/virtio-v1.0-wd01-part1-specification.txt > +++ b/virtio-v1.0-wd01-part1-specification.txt > @@ -201,11 +201,36 @@ Interface' in the section title. > ------------------------- > > Configuration space is generally used for rarely-changing or > -initialization-time parameters. > +initialization-time parameters. Drivers must not assume reads from > +fields greater than 32 bits wide are atomic, nor or reads from > +multiple fields. > + > +Each transport provides a generation count for the configuration > +space, which must change whenever there is a possibility that two MUST? > +accesses to the configuration space can see different versions of that > +space. > + > +Thus drivers should read configuration space fields like so: SHOULD? > + > + u32 before, after; > + do { > + before = get_config_generation(device); > + // read config entry/entries. > + after = get_config_generation(device); > + } while (after != before); BTW this implies reads must never have side effects (PCI ISR is the only exception, isn't it?). > > Note that this space is generally the guest's native endian, > rather than PCI's little-endian. > > +2.1.3.1. Legacy Interface: Configuration Space > +------------------------- > + > +Legacy devices did not have a configuration generation field, thus are > +susceptible to race conditions if configuration is updated. This > +effects the block capacity and network mac fields; best practice is to > +read these fields multiple times until two reads generate a consistent > +result. > + > 2.1.4. Virtqueues > ---------------- > > @@ -768,7 +793,7 @@ struct virtio_pci_common_cfg { > __le16 msix_config; /* read-write */ > __le16 num_queues; /* read-only */ > __u8 device_status; /* read-write */ > - __u8 unused1; > + __u8 config_generation; /* read-only */ > > /* About a specific virtqueue. */ > __le16 queue_select; /* read-write */ > @@ -820,6 +845,14 @@ device_status > Device Status field. Writing 0 into this field resets the > device. > > +config_generation > + > + Configuration atomicity value. Changes every time the > + configuration noticeably changes. This means the device may > + only change the value after a configuration read operation, > + but it must change if there is any risk of a device seeing an > + inconsistent configuration state. > + > queue_select > > Queue Select. Selects which virtqueue do other fields refer to. > @@ -941,6 +974,9 @@ Legacy Interface. When used through the Legacy Interface, > Transitional Devices must assume that Feature Bits 32 to 63 > are not acknowledged by Driver. > > +As legacy devices had no configuration generation field, see "2.1.3.1. > +Legacy Interface: Configuration Space" for workarounds. > + > 2.3.1.3. PCI-specific Initialization And Device Operation > -------------------------------------------------------- > > @@ -1281,6 +1317,13 @@ configuration space. The following list presents their layout: > must write a value to the HostFeaturesSel register before > reading from the HostFeatures register. > > +• 0x018 R ConfigGeneration > + Configuration atomicity value. > + Changes every time the configuration noticeably changes. This means the > + device may only change the value after a configuration read operation, > + but it must change if there is any risk of a device seeing an inconsistent > + configuration state. > + > • 0x020 W GuestFeatures > Flags representing device features understood and activated by > the driver. > @@ -1396,6 +1439,12 @@ Writing to registers described as “R” and reading from > registers described as “W” is not permitted and can cause > undefined behavior. > > +2.3.2.2.1. Legacy Interface: MMIO Device Layout > +-------------------------- > +The ConfigGeneration field does not exist in legacy devices; fortunately > +it would return 0 if accessed. Nonetheless, the workarounds in > +"2.1.3.1. Legacy Interface: Configuration Space" should still be used. > + > 100.3.2.2.1. Virtqueue Layout > ------------------------------ > > @@ -1631,7 +1680,9 @@ For changing configuration information, the guest may use > CCW_CMD_WRITE_CONF, specifying the guest memory for the host to > read from. > > -In both cases, the complete configuration space is transmitted. > +In both cases, the complete configuration space is transmitted. This > +allows the guest to compare the new configuration space with the old > +version, and keep a generation count internally whenever it changes. > > 2.3.3.2.5. Setting Up Indicators > -------------------------------- > @@ -3415,12 +3466,15 @@ transmit output. > --------------------------------------- > > Configuration space should only be used for initialization-time > -parameters. It is a limited resource with no synchronization, so for > -most uses it is better to use a virtqueue to update configuration > -information (the network device does this for filtering, > +parameters. It is a limited resource with no synchronization between > +writable fields, so for most uses it is better to use a virtqueue to update > +configuration information (the network device does this for filtering, > otherwise the table in the config space could potentially be very > large). > > +Devices must not assume that configuration fields over 32 bits wide > +are atomically writable. > + > 2.7.3. What Device Number? > -------------------------- >