OASIS Virtual I/O Device (VIRTIO) TC

 View Only
Expand all | Collapse all

[PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

  • 1.  [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-10-2015 13:34
    VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance (up to 2x, though more realistically +30-40%) in latency-bound workloads. However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great idea, because it simplifies simple drivers (e.g. firmware) that are okay with a writethrough cache but still need data to persist after power loss. What really should have been removed is just the possibility that devices not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in the new world of conformance statements. Restore these, with the following changes: * clarify and use conformance statements in order to define writeback and writethrough caching according to what is commonly done by high-end storage. * clarify (with conformance statements) the influence of the VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. * strengthen the requirement for persisting writes to MUST after a VIRTIO_BLK_T_FLUSH request (and in other cases too involving the new features). The suggested behavior upon feature negotiation is okay for the Linux implementation of virtio1, even after the implementation is modified to support the two new features. VIRTIO-144 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- v4->v5: - s/permanent device backend storage/persistent device backend storage/ - consider case where device backend storage is not persistent v3->v4: - do not mention FEATURES_OK wrt initialization of field{writeback} - s/non-volatile storage/permanent device backend storage/ - s/propose/offer/ - use MUST/SHOULD/MAY in the legacy interface sections - clarify undefined behavior - clarify workaround for lack of writethrough emulation in older legacy devices Remaining issues: - do we need to explicitly allow non-volatile storage (tmpfs, cache=unsafe, etc.)? - is it RECOMMENDED ("SHOULD") or OPTIONAL ("MAY") that devices keep data safe when they do not offer VIRTIO_BLK_F_FLUSH? Note that the driver cannot expect anything even if we say "SHOULD". conformance.tex 2 + content.tex 136 +++++++++++++++++++++++++++++++++++++++++++++++--------- 2 files changed, 117 insertions(+), 21 deletions(-) diff --git a/conformance.tex b/conformance.tex index 7b7df32..f59e360 100644 --- a/conformance.tex +++ b/conformance.tex @@ -105,6 +105,7 @@ A network driver MUST conform to the following normative statements: A block driver MUST conform to the following normative statements: egin{itemize} +item
    ef{drivernormative:Device Types / Block Device / Device Initialization} item
    ef{drivernormative:Device Types / Block Device / Device Operation} end{itemize} @@ -224,6 +225,7 @@ A network device MUST conform to the following normative statements: A block device MUST conform to the following normative statements: egin{itemize} +item
    ef{devicenormative:Device Types / Block Device / Device Initialization} item
    ef{devicenormative:Device Types / Block Device / Device Operation} end{itemize} diff --git a/content.tex b/content.tex index 3672588..93386d5 100644 --- a/content.tex +++ b/content.tex @@ -4041,8 +4041,13 @@ device except where noted. item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in field{blk_size}. +item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. + item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O alignment. + +item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback + and writethrough modes. end{description} subsubsection{Legacy Interface: Feature bits}label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} @@ -4051,16 +4056,12 @@ device except where noted. item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. - -item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. - -item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback - and writethrough modes. end{description} -VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers -MUST only negotiate this feature if they are capable of sending -VIRTIO_BLK_T_FLUSH commands. +egin{note} + In the legacy interface, VIRTIO_BLK_F_FLUSH was also + called VIRTIO_BLK_F_WCE. +end{note} subsection{Device configuration layout}label{sec:Device Types / Block Device / Device configuration layout} @@ -4089,7 +4090,7 @@ struct virtio_blk_config { // optimal (suggested maximum) I/O size in blocks le32 opt_io_size; } topology; - u8 reserved; + u8 writeback; }; end{lstlisting} @@ -4119,21 +4120,56 @@ according to the native endian of the guest rather than field{topology} struct can be read to determine the physical block size and optimal I/O lengths for the driver to use. This also does not affect the units in the protocol, only performance. + +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache + mode can be read or set through the field{writeback} field. 0 corresponds + to a writethrough cache, 1 to a writeback cachefootnote{Consistent with +
    ef{devicenormative:Device Types / Block Device / Device Operation}, + a writethrough cache can be defined broadly as a cache that commits + writes to persistent device backend storage before reporting their + completion. For example, a battery-backed writeback cache actually + counts as writethrough according to this definition.}. The cache mode + after reset is undefined. end{enumerate} +drivernormative{subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} + +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of +sending VIRTIO_BLK_T_FLUSH commands. + +If the VIRTIO_BLK_F_CONFIG_WCE feature is not negotiated, but +VIRTIO_BLK_F_FLUSH was offered by the device, the driver MAY deduce +the presence of a writethrough cache. Otherwise, the driver SHOULD +assume presence of a writeback cache. + +devicenormative{subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} + +Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it +if they offer VIRTIO_BLK_F_CONFIG_WCE. + +If VIRTIO_BLK_F_CONFIG_WCE is negotiated but the VIRTIO_BLK_F_FLUSH feature +is not, the device MUST initialize field{writeback} to 0. + subsubsection{Legacy Interface: Device Initialization}label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} -The field{reserved} field used to be called field{writeback}. If the -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be -read from field{writeback}; the -driver can also write to the field in order to toggle the cache -between writethrough (0) and writeback (1) mode. If the feature is -not available, the driver can instead look at the result of -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. +Because legacy devices do not have FEATURES_OK, transitional devices +MUST implement slightly different behavior around feature negotiation +when used through the legacy interface. In particular: + +egin{itemize} +item they MUST NOT modify field{writeback} as a result + of a driver's write to the status register. + Some legacy drivers wrote to field{writeback} between + setting the DRIVER status bit and setting the DRIVER_OK status bit. + Transitional devices MUST NOT overwrite field{writeback} when + the DRIVER_OK status bit is set. + +item they MUST NOT modify field{writeback} as a result of + a driver's write to the guest features registers, for example if the + driver sets the VIRTIO_BLK_F_CONFIG_WCE guest feature bit but does not + set the VIRTIO_BLK_F_FLUSH bit. +end{itemize} -Some older legacy devices did not operate in writethrough mode even -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. subsection{Device Operation}label{sec:Device Types / Block Device / Device Operation} @@ -4183,14 +4219,66 @@ A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered. A driver MUST set field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request. A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request. +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY +switch to writethrough or writeback mode by writing respectively 0 and +1 to the field{writeback} field. After writing a 0 to field{writeback}, +the driver MUST NOT assume that any volatile writes have been committed +to persistent device backend storage. + devicenormative{subsubsection}{Device Operation}{Device Types / Block Device / Device Operation} A device MUST set the field{status} byte to VIRTIO_BLK_S_IOERR for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT write any data. -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure -that any writes which were completed are committed to non-volatile storage. +A write is considered volatile when it is submitted; the contents of +sectors covered by a volatile write are undefined in persistent device +backend storage until the write becomes stable. A write becomes stable +once it is completed and one or more of the following conditions is true: + +egin{enumerate} +itemlabel{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor + VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was + offered by the device; + +itemlabel{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the + field{writeback} field in configuration space was 0 extbf{all the time between + the submission of the write and its completion}; + +itemlabel{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent extbf{after the write is + completed} and is completed itself. +end{enumerate} + +If the device is backed by persistent storage, the device MUST ensure that +stable writes are committed to it, before reporting completion of the write +(cases~
    ef{item:flush1} and~
    ef{item:flush2}) or the flush +(case~
    ef{item:flush3}). Failure to do so can cause data loss +in case of a crash. + +If the driver changes field{writeback} between the submission of the write +and its completion, the write could be either volatile or stable when +its completion is reported; in other words, the exact behavior is undefined. + +% According to the device requirements for device initialization: +% Offer(CONFIG_WCE) => Offer(FLUSH). +% +% After reversing the implication: +% not Offer(FLUSH) => not Offer(CONFIG_WCE). + +If VIRTIO_BLK_F_FLUSH was not offered by the + devicefootnote{Note that in this case, according to +
    ef{devicenormative:Device Types / Block Device / Device Initialization}, + the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.}, the +device MAY also commit writes to persistent device backend storage before +reporting their completion. Unlike case~
    ef{item:flush1}, however, this +is not an absolute requirement of the specification. + +egin{note} + An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit + completed writes will not be resilient to data loss in case of crashes. + Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement + for implementations that do not wish to be safe against such data losses. +end{note} subsubsection{Legacy Interface: Device Operation}label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} When using the legacy interface, transitional devices and drivers @@ -4233,6 +4321,12 @@ serve as data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request does that. end{note} +Some older legacy devices did not commit completed writes to persistent +device backend storage when VIRTIO_BLK_F_FLUSH was offered but not +negotiated. In order to work around this, the driver MAY set the +field{writeback} to 0 (if available) or it MAY send an explicit flush +request after every completed write. + If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet command requests, each of these requests is of form: -- 2.4.3


  • 2.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-10-2015 13:49
    On Mon, 10 Aug 2015 15:33:49 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > +egin{itemize} > +item they MUST NOT modify field{writeback} as a result > + of a driver's write to the status register. > + Some legacy drivers wrote to field{writeback} between > + setting the DRIVER status bit and setting the DRIVER_OK status bit. > + Transitional devices MUST NOT overwrite field{writeback} when > + the DRIVER_OK status bit is set. > + > +item they MUST NOT modify field{writeback} as a result of > + a driver's write to the guest features registers, for example if the > + driver sets the VIRTIO_BLK_F_CONFIG_WCE guest feature bit but does not > + set the VIRTIO_BLK_F_FLUSH bit. > +end{itemize} Minor nit (I'm not sure if we do this elsewhere in the document already): "write to the <foo> register" sounds PCI(/MMIO?) specific. This is probably better called "driver setting the device status" and "driver writing the guest features" or something like that. But we can also correct this later.


  • 3.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 06:00
    On Mon, Aug 10, 2015 at 03:33:49PM +0200, Paolo Bonzini wrote: > VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance > (up to 2x, though more realistically +30-40%) in latency-bound workloads. > However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. > > In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great > idea, because it simplifies simple drivers (e.g. firmware) that are okay > with a writethrough cache but still need data to persist after power loss. > What really should have been removed is just the possibility that devices > not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in > the new world of conformance statements. > > Restore these, with the following changes: > > * clarify and use conformance statements in order to define writeback > and writethrough caching according to what is commonly done by high-end > storage. > > * clarify (with conformance statements) the influence of the > VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of > VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. > > * strengthen the requirement for persisting writes to MUST after > a VIRTIO_BLK_T_FLUSH request (and in other cases too involving the > new features). > > The suggested behavior upon feature negotiation is okay for the Linux > implementation of virtio1, even after the implementation is modified to > support the two new features. > > VIRTIO-144 > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > v4->v5: > - s/permanent device backend storage/persistent device backend storage/ > - consider case where device backend storage is not persistent > > v3->v4: > - do not mention FEATURES_OK wrt initialization of field{writeback} > - s/non-volatile storage/permanent device backend storage/ > - s/propose/offer/ > - use MUST/SHOULD/MAY in the legacy interface sections > - clarify undefined behavior > - clarify workaround for lack of writethrough emulation in > older legacy devices > > Remaining issues: > - do we need to explicitly allow non-volatile storage > (tmpfs, cache=unsafe, etc.)? > - is it RECOMMENDED ("SHOULD") or OPTIONAL ("MAY") that devices > keep data safe when they do not offer VIRTIO_BLK_F_FLUSH? Note that > the driver cannot expect anything even if we say "SHOULD". > > conformance.tex 2 + > content.tex 136 +++++++++++++++++++++++++++++++++++++++++++++++--------- > 2 files changed, 117 insertions(+), 21 deletions(-) > > diff --git a/conformance.tex b/conformance.tex > index 7b7df32..f59e360 100644 > --- a/conformance.tex > +++ b/conformance.tex > @@ -105,6 +105,7 @@ A network driver MUST conform to the following normative statements: > A block driver MUST conform to the following normative statements: > > egin{itemize} > +item
    ef{drivernormative:Device Types / Block Device / Device Initialization} > item
    ef{drivernormative:Device Types / Block Device / Device Operation} > end{itemize} > > @@ -224,6 +225,7 @@ A network device MUST conform to the following normative statements: > A block device MUST conform to the following normative statements: > > egin{itemize} > +item
    ef{devicenormative:Device Types / Block Device / Device Initialization} > item
    ef{devicenormative:Device Types / Block Device / Device Operation} > end{itemize} > > diff --git a/content.tex b/content.tex > index 3672588..93386d5 100644 > --- a/content.tex > +++ b/content.tex > @@ -4041,8 +4041,13 @@ device except where noted. > > item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in field{blk_size}. > > +item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. > + > item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O > alignment. > + > +item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback > + and writethrough modes. > end{description} > > subsubsection{Legacy Interface: Feature bits}label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} > @@ -4051,16 +4056,12 @@ device except where noted. > item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. > > item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. > - > -item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. > - > -item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback > - and writethrough modes. > end{description} > > -VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers > -MUST only negotiate this feature if they are capable of sending > -VIRTIO_BLK_T_FLUSH commands. > +egin{note} > + In the legacy interface, VIRTIO_BLK_F_FLUSH was also > + called VIRTIO_BLK_F_WCE. > +end{note} > > subsection{Device configuration layout}label{sec:Device Types / Block Device / Device configuration layout} > > @@ -4089,7 +4090,7 @@ struct virtio_blk_config { > // optimal (suggested maximum) I/O size in blocks > le32 opt_io_size; > } topology; > - u8 reserved; > + u8 writeback; > }; > end{lstlisting} > > @@ -4119,21 +4120,56 @@ according to the native endian of the guest rather than > field{topology} struct can be read to determine the physical block size and optimal > I/O lengths for the driver to use. This also does not affect the units > in the protocol, only performance. > + > +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache > + mode can be read or set through the field{writeback} field. 0 corresponds > + to a writethrough cache, 1 to a writeback cachefootnote{Consistent with > +
    ef{devicenormative:Device Types / Block Device / Device Operation}, > + a writethrough cache can be defined broadly as a cache that commits > + writes to persistent device backend storage before reporting their > + completion. For example, a battery-backed writeback cache actually > + counts as writethrough according to this definition.}. The cache mode > + after reset is undefined. Is it really undefined, or can it be either writeback or writethrough, and can be determined by reading the field{writeback} field reset value? > end{enumerate} > > +drivernormative{subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} > + > +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of > +sending VIRTIO_BLK_T_FLUSH commands. > + > +If the VIRTIO_BLK_F_CONFIG_WCE feature is not negotiated, but > +VIRTIO_BLK_F_FLUSH was offered by the device, the driver MAY deduce > +the presence of a writethrough cache. Otherwise, the driver SHOULD > +assume presence of a writeback cache. > + > +devicenormative{subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} > + > +Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it > +if they offer VIRTIO_BLK_F_CONFIG_WCE. > + > +If VIRTIO_BLK_F_CONFIG_WCE is negotiated but the VIRTIO_BLK_F_FLUSH feature > +is not, the device MUST initialize field{writeback} to 0. > + > subsubsection{Legacy Interface: Device Initialization}label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} > > -The field{reserved} field used to be called field{writeback}. If the > -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be > -read from field{writeback}; the > -driver can also write to the field in order to toggle the cache > -between writethrough (0) and writeback (1) mode. If the feature is > -not available, the driver can instead look at the result of > -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode > -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. > +Because legacy devices do not have FEATURES_OK, transitional devices > +MUST implement slightly different behavior around feature negotiation > +when used through the legacy interface. In particular: > + > +egin{itemize} > +item they MUST NOT modify field{writeback} as a result > + of a driver's write to the status register. > + Some legacy drivers wrote to field{writeback} between > + setting the DRIVER status bit and setting the DRIVER_OK status bit. > + Transitional devices MUST NOT overwrite field{writeback} when > + the DRIVER_OK status bit is set. > + > +item they MUST NOT modify field{writeback} as a result of > + a driver's write to the guest features registers, for example if the > + driver sets the VIRTIO_BLK_F_CONFIG_WCE guest feature bit but does not > + set the VIRTIO_BLK_F_FLUSH bit. > +end{itemize} > > -Some older legacy devices did not operate in writethrough mode even > -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. > > subsection{Device Operation}label{sec:Device Types / Block Device / Device Operation} > > @@ -4183,14 +4219,66 @@ A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered. > A driver MUST set field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request. > A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request. > > +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY > +switch to writethrough or writeback mode by writing respectively 0 and > +1 to the field{writeback} field. After writing a 0 to field{writeback}, > +the driver MUST NOT assume that any volatile writes have been committed > +to persistent device backend storage. > + > devicenormative{subsubsection}{Device Operation}{Device Types / Block Device / Device Operation} > > A device MUST set the field{status} byte to VIRTIO_BLK_S_IOERR > for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT > write any data. > > -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure > -that any writes which were completed are committed to non-volatile storage. > +A write is considered volatile when it is submitted; the contents of > +sectors covered by a volatile write are undefined in persistent device > +backend storage until the write becomes stable. A write becomes stable > +once it is completed and one or more of the following conditions is true: > + > +egin{enumerate} > +itemlabel{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor > + VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was > + offered by the device; > + > +itemlabel{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the > + field{writeback} field in configuration space was 0 extbf{all the time between > + the submission of the write and its completion}; > + > +itemlabel{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent extbf{after the write is > + completed} and is completed itself. > +end{enumerate} > + > +If the device is backed by persistent storage, the device MUST ensure that > +stable writes are committed to it, before reporting completion of the write > +(cases~
    ef{item:flush1} and~
    ef{item:flush2}) or the flush > +(case~
    ef{item:flush3}). Failure to do so can cause data loss > +in case of a crash. > + > +If the driver changes field{writeback} between the submission of the write > +and its completion, the write could be either volatile or stable when > +its completion is reported; in other words, the exact behavior is undefined. > + > +% According to the device requirements for device initialization: > +% Offer(CONFIG_WCE) => Offer(FLUSH). > +% > +% After reversing the implication: > +% not Offer(FLUSH) => not Offer(CONFIG_WCE). > + > +If VIRTIO_BLK_F_FLUSH was not offered by the > + devicefootnote{Note that in this case, according to > +
    ef{devicenormative:Device Types / Block Device / Device Initialization}, > + the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.}, the > +device MAY also commit writes to persistent device backend storage before > +reporting their completion. Unlike case~
    ef{item:flush1}, however, this > +is not an absolute requirement of the specification. > + > +egin{note} > + An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit > + completed writes will not be resilient to data loss in case of crashes. > + Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement > + for implementations that do not wish to be safe against such data losses. > +end{note} > > subsubsection{Legacy Interface: Device Operation}label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} > When using the legacy interface, transitional devices and drivers > @@ -4233,6 +4321,12 @@ serve as data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request > does that. > end{note} > > +Some older legacy devices did not commit completed writes to persistent > +device backend storage when VIRTIO_BLK_F_FLUSH was offered but not > +negotiated. In order to work around this, the driver MAY set the > +field{writeback} to 0 (if available) or it MAY send an explicit flush > +request after every completed write. > + > If the device has VIRTIO_BLK_F_SCSI feature, it can also support > scsi packet command requests, each of these requests is of form: > > -- > 2.4.3 > > > --------------------------------------------------------------------- > To unsubscribe from this mail list, you must leave the OASIS TC that > generates this mail. Follow this link to all your TCs in OASIS at: > https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php


  • 4.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 13:01
    On 11/08/2015 07:59, Michael S. Tsirkin wrote: > > +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache > > + mode can be read or set through the field{writeback} field. 0 corresponds > > + to a writethrough cache, 1 to a writeback cachefootnote{Consistent with > > +
    ef{devicenormative:Device Types / Block Device / Device Operation}, > > + a writethrough cache can be defined broadly as a cache that commits > > + writes to persistent device backend storage before reporting their > > + completion. For example, a battery-backed writeback cache actually > > + counts as writethrough according to this definition.}. The cache mode > > + after reset is undefined. > > Is it really undefined, or can it be either > writeback or writethrough, and can be determined by reading the > field{writeback} field reset value? For transitional devices, it can be either writeback or writethrough and can be determined by reading field{writeback}. For non-transitional devices, it can be determined after you've written FEATURES_OK. Until then, you shouldn't read the field. Paolo


  • 5.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 13:24
    On Tue, Aug 11, 2015 at 03:01:22PM +0200, Paolo Bonzini wrote: > > > On 11/08/2015 07:59, Michael S. Tsirkin wrote: > > > +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache > > > + mode can be read or set through the field{writeback} field. 0 corresponds > > > + to a writethrough cache, 1 to a writeback cachefootnote{Consistent with > > > +
    ef{devicenormative:Device Types / Block Device / Device Operation}, > > > + a writethrough cache can be defined broadly as a cache that commits > > > + writes to persistent device backend storage before reporting their > > > + completion. For example, a battery-backed writeback cache actually > > > + counts as writethrough according to this definition.}. The cache mode > > > + after reset is undefined. > > > > Is it really undefined, or can it be either > > writeback or writethrough, and can be determined by reading the > > field{writeback} field reset value? > > For transitional devices, it can be either writeback or writethrough and > can be determined by reading field{writeback}. I think you when using the legacy interface. > For non-transitional devices, it can be determined after you've written > FEATURES_OK. Until then, you shouldn't read the field. > > Paolo I think you mean when using the modern interface (default under virtio 1 spec). OK to summarize, how about: In free text: The cache mode after reset can be either writeback or writethrough. The actual mode can be determined by reading field{writeback}. In conformance paragraph: The driver MUST NOT read field{writeback} before setting the FEATURES_OK field{status} bit. And in legacy section: Legacy drivers and drivers did not support the FEATURES_OK field{status} bit, and used the device before setting the DRIVER_OK bit. Therefore: When using the legacy interface, the driver MAY read field{writeback} before setting the FEATURES_OK field{status} field. When using the legacy interface, the transitional devices MAY assume that driver features won't be changed after reading field{writeback}. -- MST


  • 6.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 13:42
    On Tue, 11 Aug 2015 16:23:39 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > OK to summarize, how about: > > In free text: > > The cache mode after reset can be either writeback or writethrough. > The actual mode can be determined by reading field{writeback}. > > In conformance paragraph: > > The driver MUST NOT read field{writeback} before setting > the FEATURES_OK field{status} bit. > > > And in legacy section: > > Legacy drivers and drivers did not support the FEATURES_OK field{status} bit, > and used the device before setting the DRIVER_OK bit. > Therefore: > > When using the legacy interface, the driver MAY read field{writeback} > before setting the FEATURES_OK field{status} field. But if it is using the legacy interface, it won't ever set FEATURES_OK, no? The wording is a bit confusing. "As the legacy interface did not support FEATURES_OK to signal end of feature negotation, the driver MAY read field{writeback} without the FEATURES_OK field{status} bit having been set." ? Do we need to mandate that the device MUST assume the driver is operating in legacy mode in that case? > When using the legacy interface, the transitional devices MAY assume > that driver features won't be changed after reading field{writeback}. "For a driver using the legacy interface, the transitional device MAY assume that the driver won't change features after reading field{writeback}." ?


  • 7.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 13:47
    On Tue, Aug 11, 2015 at 03:41:20PM +0200, Cornelia Huck wrote: > On Tue, 11 Aug 2015 16:23:39 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > OK to summarize, how about: > > > > In free text: > > > > The cache mode after reset can be either writeback or writethrough. > > The actual mode can be determined by reading field{writeback}. > > > > In conformance paragraph: > > > > The driver MUST NOT read field{writeback} before setting > > the FEATURES_OK field{status} bit. > > > > > > And in legacy section: > > > > Legacy drivers and drivers did not support the FEATURES_OK field{status} bit, > > and used the device before setting the DRIVER_OK bit. > > Therefore: > > > > When using the legacy interface, the driver MAY read field{writeback} > > before setting the FEATURES_OK field{status} field. > > But if it is using the legacy interface, it won't ever set FEATURES_OK, > no? The wording is a bit confusing. > > "As the legacy interface did not support FEATURES_OK to signal end of > feature negotation, the driver MAY read field{writeback} without the > FEATURES_OK field{status} bit having been set." > > ? Right. > Do we need to mandate that the device MUST assume the driver is operating > in legacy mode in that case? Not sure I understand - in what case? > > When using the legacy interface, the transitional devices MAY assume > > that driver features won't be changed after reading field{writeback}. > > "For a driver using the legacy interface, the transitional device MAY > assume that the driver won't change features after reading > field{writeback}." > > ? Except we say "When using the legacy interface" everywhere, let's keep it consistent pls. When using the legacy interface, the transitional device MAY assume that the driver won't change features after reading field{writeback}." point is both driver and device are using the legacy interface. -- MST


  • 8.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 14:04
    On Tue, 11 Aug 2015 16:47:05 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Aug 11, 2015 at 03:41:20PM +0200, Cornelia Huck wrote: > > On Tue, 11 Aug 2015 16:23:39 +0300 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > OK to summarize, how about: > > > > > > In free text: > > > > > > The cache mode after reset can be either writeback or writethrough. > > > The actual mode can be determined by reading field{writeback}. > > > > > > In conformance paragraph: > > > > > > The driver MUST NOT read field{writeback} before setting > > > the FEATURES_OK field{status} bit. > > > > > > > > > And in legacy section: > > > > > > Legacy drivers and drivers did not support the FEATURES_OK field{status} bit, > > > and used the device before setting the DRIVER_OK bit. > > > Therefore: > > > > > > When using the legacy interface, the driver MAY read field{writeback} > > > before setting the FEATURES_OK field{status} field. > > > > But if it is using the legacy interface, it won't ever set FEATURES_OK, > > no? The wording is a bit confusing. > > > > "As the legacy interface did not support FEATURES_OK to signal end of > > feature negotation, the driver MAY read field{writeback} without the > > FEATURES_OK field{status} bit having been set." > > > > ? > > Right. > > > Do we need to mandate that the device MUST assume the driver is operating > > in legacy mode in that case? > > Not sure I understand - in what case? If it reads writeback without FEATURES_OK having been set. > > > > When using the legacy interface, the transitional devices MAY assume > > > that driver features won't be changed after reading field{writeback}. > > > > "For a driver using the legacy interface, the transitional device MAY > > assume that the driver won't change features after reading > > field{writeback}." > > > > ? > > Except we say "When using the legacy interface" everywhere, > let's keep it consistent pls. > > When using the legacy interface, the transitional device MAY > assume that the driver won't change features after reading > field{writeback}." > > point is both driver and device are using the legacy interface. Fine with me.


  • 9.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 16:53
    On 11/08/2015 16:02, Cornelia Huck wrote: > > > Do we need to mandate that the device MUST assume the driver is operating > > > in legacy mode in that case? > > > > Not sure I understand - in what case? > > If it reads writeback without FEATURES_OK having been set. No, I don't think it's a problem. For a transitional device: - the driver MAY read or write writeback before setting the DRIVER or DRIVER_OK status bit. So the device simply has to let the driver do that. It's okay if it also lets a (buggy) modern driver do that. - the device MUST NOT modify field{writeback} as a result of a driver's write to the status register. Even a non-transitional device only has to modify field{writeback} when it sees FEATURES_OK, so at this point your device knows that the driver is not operating through the legacy interface. So if it sees DRIVER or DRIVER_OK the device can simply do nothing---if it hurts don't do it. - the device MUST NOT modify field{writeback} as a result of a driver's write to the guest features registers. Same as the previous buller, if it hurts don't do it. v6 will come tomorrow. Paolo


  • 10.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 13:53
    On 11/08/2015 15:23, Michael S. Tsirkin wrote: > OK to summarize, how about: > > In free text: > > The cache mode after reset can be either writeback or writethrough. > The actual mode can be determined by reading field{writeback}. ... after feature negotiation. Otherwise okay. > In conformance paragraph: > > The driver MUST NOT read field{writeback} before setting > the FEATURES_OK field{status} bit. Okay. > > And in legacy section: > > Legacy drivers and drivers did not support the FEATURES_OK field{status} bit, > and used the device before setting the DRIVER_OK bit. > Therefore: > > When using the legacy interface, the driver MAY read field{writeback} > before setting the FEATURES_OK field{status} field. > When using the legacy interface, the transitional devices MAY assume > that driver features won't be changed after reading field{writeback}. There is already a legacy section, so we can just add this there: item the driver MAY read or write field{writeback} before setting the DRIVER or DRIVER_OK field{status} bit Paolo


  • 11.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 14:11
    On Tue, Aug 11, 2015 at 03:53:03PM +0200, Paolo Bonzini wrote: > > > On 11/08/2015 15:23, Michael S. Tsirkin wrote: > > OK to summarize, how about: > > > > In free text: > > > > The cache mode after reset can be either writeback or writethrough. > > The actual mode can be determined by reading field{writeback}. > > ... after feature negotiation. Otherwise okay. > > > In conformance paragraph: > > > > The driver MUST NOT read field{writeback} before setting > > the FEATURES_OK field{status} bit. > > Okay. > > > > > And in legacy section: > > > > Legacy drivers and drivers did not support the FEATURES_OK field{status} bit, > > and used the device before setting the DRIVER_OK bit. > > Therefore: > > > > When using the legacy interface, the driver MAY read field{writeback} > > before setting the FEATURES_OK field{status} field. > > When using the legacy interface, the transitional devices MAY assume > > that driver features won't be changed after reading field{writeback}. > > There is already a legacy section, so we can just add this there: > > item the driver MAY read or write field{writeback} before setting the > DRIVER or DRIVER_OK field{status} bit > > Paolo Yes but the device needs to know the features to set it, right? That's why I proposed "the transitional devices MAY assume that driver features won't be changed".


  • 12.  Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 08-11-2015 17:01
    On 11/08/2015 16:10, Michael S. Tsirkin wrote: > > > When using the legacy interface, the transitional devices MAY assume > > > that driver features won't be changed after reading field{writeback}. > > > > There is already a legacy section, so we can just add this there: > > > > item the driver MAY read or write field{writeback} before setting the > > DRIVER or DRIVER_OK field{status} bit > > Yes but the device needs to know the features to set it, right? > That's why I proposed "the transitional devices MAY assume > that driver features won't be changed". No, I don't think that's necessary. The simplest way for transitional devices to obey the spec is to "modify" writeback only: - at reset time. In this case the spec says that modification can happen - when the driver sets DRIVER_OK if the driver hasn't set the CONFIG_WCE driver feature. In this case the field doesn't really exist at all. I'll put this in v6. Paolo