OASIS Virtual I/O Device (VIRTIO) TC

  • 1.  [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 07-02-2015 07:57
    VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance (up to 2x, though more realistically +30-40%) in latency-bound workloads. However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great idea, because it simplifies simple drivers (e.g. firmware) that are okay with a writethrough cache but still need data to persist after power loss. What really should have been removed is just the possibility that devices not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in the new world of conformance statements. Restore these, with the following changes: * clarify and use conformance statements in order to define writeback and writethrough caching according to what is commonly done by high-end storage. * clarify (with conformance statements) the influence of the VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. * strengthen the requirement for persisting writes after a VIRTIO_BLK_T_FLUSH request. This is now a MUST. This fixes VIRTIO-144. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- content.tex 76 ++++++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 52 insertions(+), 24 deletions(-) diff --git a/content.tex b/content.tex index 8de905f..4e03844 100644 --- a/content.tex +++ b/content.tex @@ -3723,26 +3723,31 @@ device except where noted. item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in field{blk_size}. +item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. + item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O alignment. + +item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback + and writethrough modes. end{description} +Devices SHOULD always propose VIRTIO_BLK_F_FLUSH, and MUST propose it +if they propose VIRTIO_BLK_F_CONFIG_WCE. Drivers SHOULD only negotiate +these features if they are capable of sending VIRTIO_BLK_T_FLUSH commands. + subsubsection{Legacy Interface: Feature bits}label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} egin{description} item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. - -item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. - -item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback - and writethrough modes. end{description} -VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers -MUST only negotiate this feature if they are capable of sending -VIRTIO_BLK_T_FLUSH commands. +egin{note} Naming of features + In the legacy interface, VIRTIO_BLK_F_FLUSH was also + called VIRTIO_BLK_F_WCE. +end{note} subsection{Device configuration layout}label{sec:Device Types / Block Device / Device configuration layout} @@ -3771,7 +3776,7 @@ struct virtio_blk_config { // optimal (suggested maximum) I/O size in blocks le32 opt_io_size; } topology; - u8 reserved; + u8 writeback; }; end{lstlisting} @@ -3801,21 +3806,27 @@ according to the native endian of the guest rather than field{topology} struct can be read to determine the physical block size and optimal I/O lengths for the driver to use. This also does not affect the units in the protocol, only performance. -end{enumerate} - -subsubsection{Legacy Interface: Device Initialization}label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} -The field{reserved} field used to be called field{writeback}. If the -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be -read from field{writeback}; the -driver can also write to the field in order to toggle the cache -between writethrough (0) and writeback (1) mode. If the feature is -not available, the driver can instead look at the result of -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache + mode can be read from field{writeback}. 0 corresponds to a writethrough + cache, 1 to a writeback cachefootnote{Consistent with +
    ef{sec:Device Types / Block Device / Device Operation}, + a writethrough cache can be defined broadly as a cache that commits + writes to non-volatile storage before reporting their completion. + For example, a battery-backed writeback cache actually counts as + writethrough according to this definition.}. + +item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver + can deduct the cache mode by looking at the result of negotiating + VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if + VIRTIO_BLK_F_FLUSH is negotiated. +end{enumerate} -Some older legacy devices did not operate in writethrough mode even -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while +VIRTIO_BLK_F_CONFIG_WCE is, field{writeback} MUST read 0 immediately +after feature negotiation. Alternatively, a device MAY simply fail +feature negotiation if the driver proposes VIRTIO_BLK_F_CONFIG_WCE but +does not propose VIRTIO_BLK_F_FLUSH. subsection{Device Operation}label{sec:Device Types / Block Device / Device Operation} @@ -3871,8 +3882,25 @@ A device MUST set the field{status} byte to VIRTIO_BLK_S_IOERR for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT write any data. -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure -that any writes which were completed are committed to non-volatile storage. +If the field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the +device SHOULD ensure that all writes are committed to non-volatile +storage before reporting completion. It MUST do so if it proposed +field{VIRTIO_BLK_F_FLUSH}. Failure to do so can cause data loss. + +If the field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes +which were completed are committed to non-volatile storage. + +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the +field{writeback} field in configuration space is 0, the device MUST +ensure that all writes are committed to non-volatile storage before +reporting completion. + +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY +switch to writethrough or writeback mode by writing respectively 0 and +1 to the field{writeback} field. After writing a 0 to field{writeback}, +the driver MUST NOT assume that writes submitted while the field was 1 have +been committed to non-volatile storage. subsubsection{Legacy Interface: Device Operation}label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} When using the legacy interface, transitional devices and drivers -- 2.4.3


  • 2.  Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 07-02-2015 09:27
    On Thu, Jul 02, 2015 at 09:57:07AM +0200, Paolo Bonzini wrote: > VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance > (up to 2x, though more realistically +30-40%) in latency-bound workloads. > However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. > > In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great > idea, because it simplifies simple drivers (e.g. firmware) that are okay > with a writethrough cache but still need data to persist after power loss. > What really should have been removed is just the possibility that devices > not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in > the new world of conformance statements. > > Restore these, with the following changes: > > * clarify and use conformance statements in order to define writeback > and writethrough caching according to what is commonly done by high-end > storage. > > * clarify (with conformance statements) the influence of the > VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of > VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. > > * strengthen the requirement for persisting writes after a > VIRTIO_BLK_T_FLUSH request. This is now a MUST. > > This fixes VIRTIO-144. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > content.tex 76 ++++++++++++++++++++++++++++++++++++++++++------------------- > 1 file changed, 52 insertions(+), 24 deletions(-) > > diff --git a/content.tex b/content.tex > index 8de905f..4e03844 100644 > --- a/content.tex > +++ b/content.tex > @@ -3723,26 +3723,31 @@ device except where noted. > > item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in field{blk_size}. > > +item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. > + > item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O > alignment. > + > +item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback > + and writethrough modes. > end{description} > > +Devices SHOULD always propose VIRTIO_BLK_F_FLUSH, and MUST propose it > +if they propose VIRTIO_BLK_F_CONFIG_WCE. Drivers SHOULD only negotiate > +these features if they are capable of sending VIRTIO_BLK_T_FLUSH commands. > + Sorry this is a non-normative section, so you can't say MUST and SHOULD here. You'll need to add a normative one, and link from conformance sections. It's ok to have a bit of duplication between normative and non-normative ones if this helps readability. > subsubsection{Legacy Interface: Feature bits}label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} > > egin{description} > item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. > > item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. > - > -item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. > - Did you mean to remove this one? > -item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback > - and writethrough modes. > end{description} > > -VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers > -MUST only negotiate this feature if they are capable of sending > -VIRTIO_BLK_T_FLUSH commands. > +egin{note} Naming of features > + In the legacy interface, VIRTIO_BLK_F_FLUSH was also > + called VIRTIO_BLK_F_WCE. > +end{note} > This one is ok. > subsection{Device configuration layout}label{sec:Device Types / Block Device / Device configuration layout} > > @@ -3771,7 +3776,7 @@ struct virtio_blk_config { > // optimal (suggested maximum) I/O size in blocks > le32 opt_io_size; > } topology; > - u8 reserved; > + u8 writeback; > }; > end{lstlisting} > > @@ -3801,21 +3806,27 @@ according to the native endian of the guest rather than > field{topology} struct can be read to determine the physical block size and optimal > I/O lengths for the driver to use. This also does not affect the units > in the protocol, only performance. > -end{enumerate} > - > -subsubsection{Legacy Interface: Device Initialization}label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} > > -The field{reserved} field used to be called field{writeback}. If the > -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be > -read from field{writeback}; the > -driver can also write to the field in order to toggle the cache > -between writethrough (0) and writeback (1) mode. If the feature is > -not available, the driver can instead look at the result of > -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode > -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. > +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache > + mode can be read from field{writeback}. 0 corresponds to a writethrough > + cache, 1 to a writeback cachefootnote{Consistent with > +
    ef{sec:Device Types / Block Device / Device Operation}, > + a writethrough cache can be defined broadly as a cache that commits > + writes to non-volatile storage before reporting their completion. > + For example, a battery-backed writeback cache actually counts as > + writethrough according to this definition.}. This seems a bit too much text for a footnote, I'd still prefer it as a simple text note. > + > +item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver > + can deduct the cache mode by looking at the result of negotiating > + VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if > + VIRTIO_BLK_F_FLUSH is negotiated. > +end{enumerate} > > -Some older legacy devices did not operate in writethrough mode even > -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. > +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while > +VIRTIO_BLK_F_CONFIG_WCE is, field{writeback} MUST read 0 immediately > +after feature negotiation. This is confusing. Previously you simply said writeback is only valid if VIRTIO_BLK_F_CONFIG_WCE is negotiated. So let's put the mention of VIRTIO_BLK_F_CONFIG_WCE in brackets or remove. Also, what's the reset value if it is negotiated? Also, should we prohibit driver from writing the field if VIRTIO_BLK_F_FLUSH is off? > Alternatively, a device MAY simply fail > +feature negotiation if the driver proposes VIRTIO_BLK_F_CONFIG_WCE but > +does not propose VIRTIO_BLK_F_FLUSH. > Alternatively to what? We can't have a mandatory behaviour but also make it optional. And driver does not propose features, it negotiates them. Maybe driver must/should negotiate some features or do flushes? Or maybe device must fail negotiation if it meets some criteria? > subsection{Device Operation}label{sec:Device Types / Block Device / Device Operation} > > @@ -3871,8 +3882,25 @@ A device MUST set the field{status} byte to VIRTIO_BLK_S_IOERR > for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT > write any data. > > -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure > -that any writes which were completed are committed to non-volatile storage. > +If the field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the > +device SHOULD ensure that all writes are committed to non-volatile > +storage before reporting completion. It MUST do so if it proposed > +field{VIRTIO_BLK_F_FLUSH}. Failure to do so can cause data loss. The 1st one is a SHOULD because not everyone cares about data loss. But why is the second one a MUST? Same logic seems to apply. Same below. > + > +If the field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt > +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes > +which were completed are committed to non-volatile storage. Aren't there any driver requirements? E.g. with one mode driver may assume data is safe when complete, in another is should issue flush requests to commit it to storage. > + > +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the > +field{writeback} field in configuration space is 0, the device MUST > +ensure that all writes are committed to non-volatile storage before > +reporting completion. > + > +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY > +switch to writethrough or writeback mode by writing respectively 0 and > +1 to the field{writeback} field. After writing a 0 to field{writeback}, > +the driver MUST NOT assume that writes submitted while the field was 1 since the last VIRTIO_BLK_F_FLUSH completed, correct? > have > +been committed to non-volatile storage. > > subsubsection{Legacy Interface: Device Operation}label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} > When using the legacy interface, transitional devices and drivers > -- > 2.4.3 > > > --------------------------------------------------------------------- > To unsubscribe from this mail list, you must leave the OASIS TC that > generates this mail. Follow this link to all your TCs in OASIS at: > https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php


  • 3.  Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 07-02-2015 09:49
    On 02/07/2015 11:27, Michael S. Tsirkin wrote: > On Thu, Jul 02, 2015 at 09:57:07AM +0200, Paolo Bonzini wrote: >> VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance >> (up to 2x, though more realistically +30-40%) in latency-bound workloads. >> However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. >> >> In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great >> idea, because it simplifies simple drivers (e.g. firmware) that are okay >> with a writethrough cache but still need data to persist after power loss. >> What really should have been removed is just the possibility that devices >> not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in >> the new world of conformance statements. >> >> Restore these, with the following changes: >> >> * clarify and use conformance statements in order to define writeback >> and writethrough caching according to what is commonly done by high-end >> storage. >> >> * clarify (with conformance statements) the influence of the >> VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of >> VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. >> >> * strengthen the requirement for persisting writes after a >> VIRTIO_BLK_T_FLUSH request. This is now a MUST. >> >> This fixes VIRTIO-144. >> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> --- >> content.tex 76 ++++++++++++++++++++++++++++++++++++++++++------------------- >> 1 file changed, 52 insertions(+), 24 deletions(-) >> >> diff --git a/content.tex b/content.tex >> index 8de905f..4e03844 100644 >> --- a/content.tex >> +++ b/content.tex >> @@ -3723,26 +3723,31 @@ device except where noted. >> >> item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in field{blk_size}. >> >> +item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. >> + >> item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O >> alignment. >> + >> +item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback >> + and writethrough modes. >> end{description} >> >> +Devices SHOULD always propose VIRTIO_BLK_F_FLUSH, and MUST propose it >> +if they propose VIRTIO_BLK_F_CONFIG_WCE. Drivers SHOULD only negotiate >> +these features if they are capable of sending VIRTIO_BLK_T_FLUSH commands. >> + > > Sorry this is a non-normative section, so you can't say MUST and SHOULD > here. You'll need to add a normative one, and link from conformance > sections. > It's ok to have a bit of duplication between normative > and non-normative ones if this helps readability. Ok, will move this to a normative subsubsection. >> subsubsection{Legacy Interface: Feature bits}label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} >> >> egin{description} >> item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. >> >> item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. >> - >> -item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. >> - > > Did you mean to remove this one? Yes, it's not legacy anymore. >> subsection{Device configuration layout}label{sec:Device Types / Block Device / Device configuration layout} >> >> @@ -3771,7 +3776,7 @@ struct virtio_blk_config { >> // optimal (suggested maximum) I/O size in blocks >> le32 opt_io_size; >> } topology; >> - u8 reserved; >> + u8 writeback; >> }; >> end{lstlisting} >> >> @@ -3801,21 +3806,27 @@ according to the native endian of the guest rather than >> field{topology} struct can be read to determine the physical block size and optimal >> I/O lengths for the driver to use. This also does not affect the units >> in the protocol, only performance. >> -end{enumerate} >> - >> -subsubsection{Legacy Interface: Device Initialization}label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} >> >> -The field{reserved} field used to be called field{writeback}. If the >> -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be >> -read from field{writeback}; the >> -driver can also write to the field in order to toggle the cache >> -between writethrough (0) and writeback (1) mode. If the feature is >> -not available, the driver can instead look at the result of >> -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode >> -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. >> +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache >> + mode can be read from field{writeback}. 0 corresponds to a writethrough >> + cache, 1 to a writeback cachefootnote{Consistent with >> +
    ef{sec:Device Types / Block Device / Device Operation}, >> + a writethrough cache can be defined broadly as a cache that commits >> + writes to non-volatile storage before reporting their completion. >> + For example, a battery-backed writeback cache actually counts as >> + writethrough according to this definition.}. > > This seems a bit too much text for a footnote, I'd still prefer it > as a simple text note. I can't do this inside an {enumerate}. It is long-ish, but it's just documenting what existing hardware does. >> + >> +item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver >> + can deduct the cache mode by looking at the result of negotiating >> + VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if >> + VIRTIO_BLK_F_FLUSH is negotiated. >> +end{enumerate} >> >> -Some older legacy devices did not operate in writethrough mode even >> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. >> +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while >> +VIRTIO_BLK_F_CONFIG_WCE is, field{writeback} MUST read 0 immediately >> +after feature negotiation. > > This is confusing. > Previously you simply said writeback is only valid if > VIRTIO_BLK_F_CONFIG_WCE is negotiated. This paragraph says "If the VIRTIO_BLK_F_FLUSH feature is not negotiated *while VIRTIO_BLK_F_CONFIG_WCE is*". > Also, what's the reset value if it is negotiated? It is not specified. The device can make it persistent or reset it to a user-specified value. > Also, should we prohibit driver from writing the field > if VIRTIO_BLK_F_FLUSH is off? It doesn't really matter I think. >> Alternatively, a device MAY simply fail >> +feature negotiation if the driver proposes VIRTIO_BLK_F_CONFIG_WCE but >> +does not propose VIRTIO_BLK_F_FLUSH. > > Alternatively to what? I'll change the previous sentence to "field{writeback} MUST read 0 immediately after successful feature negotiation" (adding "successful"). Then, failing feature negotiation can be an alternative because you don't reach successful negotiation. > We can't have a mandatory behaviour but also > make it optional. And driver does not propose features, it negotiates > them. Will fix. > Or maybe device must fail negotiation if it meets some criteria? Yes, but just "MAY". >> subsection{Device Operation}label{sec:Device Types / Block Device / Device Operation} >> >> @@ -3871,8 +3882,25 @@ A device MUST set the field{status} byte to VIRTIO_BLK_S_IOERR >> for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT >> write any data. >> >> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure >> -that any writes which were completed are committed to non-volatile storage. >> +If the field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the >> +device SHOULD ensure that all writes are committed to non-volatile >> +storage before reporting completion. It MUST do so if it proposed >> +field{VIRTIO_BLK_F_FLUSH}. Failure to do so can cause data loss. > > The 1st one is a SHOULD because not everyone cares about data loss. > But why is the second one a MUST? Same logic seems to apply. Because then the driver can _know_ that it is safe from data losses, by looking at proposed features. > Same below. > >> + >> +If the field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt >> +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes >> +which were completed are committed to non-volatile storage. > > Aren't there any driver requirements? > E.g. with one mode driver may assume data is safe > when complete, in another is should issue flush requests > to commit it to storage. I don't follow this. Can you rephrase? >> + >> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the >> +field{writeback} field in configuration space is 0, the device MUST >> +ensure that all writes are committed to non-volatile storage before >> +reporting completion. >> + >> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY >> +switch to writethrough or writeback mode by writing respectively 0 and >> +1 to the field{writeback} field. After writing a 0 to field{writeback}, >> +the driver MUST NOT assume that writes submitted while the field was 1 > > since the last VIRTIO_BLK_F_FLUSH completed, correct? More precisely "submitted while the field was 1 and completed after the last VIRTIO_BLK_F_FLUSH was submitted". Does it have to be specified explicitly? Paolo >> have +been committed to non-volatile storage. > > > >> >> subsubsection{Legacy Interface: Device Operation}label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} >> When using the legacy interface, transitional devices and drivers >> -- >> 2.4.3 >> >> >> --------------------------------------------------------------------- >> To unsubscribe from this mail list, you must leave the OASIS TC that >> generates this mail. Follow this link to all your TCs in OASIS at: >> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php


  • 4.  Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 07-02-2015 10:51
    On Thu, Jul 02, 2015 at 11:48:15AM +0200, Paolo Bonzini wrote: > > > On 02/07/2015 11:27, Michael S. Tsirkin wrote: > > On Thu, Jul 02, 2015 at 09:57:07AM +0200, Paolo Bonzini wrote: > >> VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance > >> (up to 2x, though more realistically +30-40%) in latency-bound workloads. > >> However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. > >> > >> In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great > >> idea, because it simplifies simple drivers (e.g. firmware) that are okay > >> with a writethrough cache but still need data to persist after power loss. > >> What really should have been removed is just the possibility that devices > >> not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in > >> the new world of conformance statements. > >> > >> Restore these, with the following changes: > >> > >> * clarify and use conformance statements in order to define writeback > >> and writethrough caching according to what is commonly done by high-end > >> storage. > >> > >> * clarify (with conformance statements) the influence of the > >> VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of > >> VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. > >> > >> * strengthen the requirement for persisting writes after a > >> VIRTIO_BLK_T_FLUSH request. This is now a MUST. > >> > >> This fixes VIRTIO-144. > >> > >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > >> --- > >> content.tex 76 ++++++++++++++++++++++++++++++++++++++++++------------------- > >> 1 file changed, 52 insertions(+), 24 deletions(-) > >> > >> diff --git a/content.tex b/content.tex > >> index 8de905f..4e03844 100644 > >> --- a/content.tex > >> +++ b/content.tex > >> @@ -3723,26 +3723,31 @@ device except where noted. > >> > >> item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in field{blk_size}. > >> > >> +item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. > >> + > >> item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O > >> alignment. > >> + > >> +item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback > >> + and writethrough modes. > >> end{description} > >> > >> +Devices SHOULD always propose VIRTIO_BLK_F_FLUSH, and MUST propose it > >> +if they propose VIRTIO_BLK_F_CONFIG_WCE. Drivers SHOULD only negotiate > >> +these features if they are capable of sending VIRTIO_BLK_T_FLUSH commands. > >> + > > > > Sorry this is a non-normative section, so you can't say MUST and SHOULD > > here. You'll need to add a normative one, and link from conformance > > sections. > > It's ok to have a bit of duplication between normative > > and non-normative ones if this helps readability. > > Ok, will move this to a normative subsubsection. > > >> subsubsection{Legacy Interface: Feature bits}label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} > >> > >> egin{description} > >> item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. > >> > >> item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. > >> - > >> -item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. > >> - > > > > Did you mean to remove this one? > > Yes, it's not legacy anymore. I don't see it readded anywhere though. If I missed it, feel free to ignore. > >> subsection{Device configuration layout}label{sec:Device Types / Block Device / Device configuration layout} > >> > >> @@ -3771,7 +3776,7 @@ struct virtio_blk_config { > >> // optimal (suggested maximum) I/O size in blocks > >> le32 opt_io_size; > >> } topology; > >> - u8 reserved; > >> + u8 writeback; > >> }; > >> end{lstlisting} > >> > >> @@ -3801,21 +3806,27 @@ according to the native endian of the guest rather than > >> field{topology} struct can be read to determine the physical block size and optimal > >> I/O lengths for the driver to use. This also does not affect the units > >> in the protocol, only performance. > >> -end{enumerate} > >> - > >> -subsubsection{Legacy Interface: Device Initialization}label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} > >> > >> -The field{reserved} field used to be called field{writeback}. If the > >> -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be > >> -read from field{writeback}; the > >> -driver can also write to the field in order to toggle the cache > >> -between writethrough (0) and writeback (1) mode. If the feature is > >> -not available, the driver can instead look at the result of > >> -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode > >> -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. > >> +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache > >> + mode can be read from field{writeback}. 0 corresponds to a writethrough > >> + cache, 1 to a writeback cachefootnote{Consistent with > >> +
    ef{sec:Device Types / Block Device / Device Operation}, > >> + a writethrough cache can be defined broadly as a cache that commits > >> + writes to non-volatile storage before reporting their completion. > >> + For example, a battery-backed writeback cache actually counts as > >> + writethrough according to this definition.}. > > > > This seems a bit too much text for a footnote, I'd still prefer it > > as a simple text note. > > I can't do this inside an {enumerate}. It is long-ish, but it's just > documenting what existing hardware does. Why not? I just tried this: note nests fine within enumerate. > >> + > >> +item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver > >> + can deduct the cache mode by looking at the result of negotiating > >> + VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if > >> + VIRTIO_BLK_F_FLUSH is negotiated. > >> +end{enumerate} > >> > >> -Some older legacy devices did not operate in writethrough mode even > >> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. > >> +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while > >> +VIRTIO_BLK_F_CONFIG_WCE is, field{writeback} MUST read 0 immediately > >> +after feature negotiation. > > > > This is confusing. > > Previously you simply said writeback is only valid if > > VIRTIO_BLK_F_CONFIG_WCE is negotiated. > > This paragraph says "If the VIRTIO_BLK_F_FLUSH feature is not negotiated > *while VIRTIO_BLK_F_CONFIG_WCE is*". Would be clearer to revert the order then. Is this logic something that legacy devices implemented? I don't see anything like this in QEMU. > > Also, what's the reset value if it is negotiated? > > It is not specified. The device can make it persistent or reset it to a > user-specified value. Does driver have to read it on init then? > > Also, should we prohibit driver from writing the field > > if VIRTIO_BLK_F_FLUSH is off? > > It doesn't really matter I think. > > >> Alternatively, a device MAY simply fail > >> +feature negotiation if the driver proposes VIRTIO_BLK_F_CONFIG_WCE but > >> +does not propose VIRTIO_BLK_F_FLUSH. > > > > Alternatively to what? > > I'll change the previous sentence to "field{writeback} MUST read 0 > immediately after successful feature negotiation" (adding "successful"). > Then, failing feature negotiation can be an alternative because you > don't reach successful negotiation. I think this text could benefit from separating between two concepts: - device writenback mode - writeback field operation should refer to current mode, separately field might or might not make it discoverable. I see this in QEMU: /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send * cache flushes. Thus, the "auto writethrough" behavior is never * necessary for guests that support the VIRTIO_BLK_F_CONFIG_WCE feature. * Leaving it enabled would break the following sequence: * * Guest started with "-drive cache=writethrough" * Guest sets status to 0 * Guest sets DRIVER bit in status field * Guest reads host features (WCE=0, CONFIG_WCE=1) * Guest writes guest features (WCE=0, CONFIG_WCE=1) * Guest writes 1 to the WCE configuration field (writeback mode) * Guest sets DRIVER_OK bit in status field * * s->blk would erroneously be placed in writethrough mode. */ if (!virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) { aio_context_acquire(blk_get_aio_context(s->blk)); blk_set_enable_write_cache(s->blk, virtio_has_feature(vdev, VIRTIO_BLK_F_WCE)); aio_context_release(blk_get_aio_context(s->blk)); } is this what this text is trying to say? > > We can't have a mandatory behaviour but also > > make it optional. And driver does not propose features, it negotiates > > them. > > Will fix. > > > Or maybe device must fail negotiation if it meets some criteria? > > Yes, but just "MAY". Devices may fail negotiation if any dependencies are not met. Is this the case? See Feature bit requirements that we have for the net device, I think that's formulated in a clearer way. > >> subsection{Device Operation}label{sec:Device Types / Block Device / Device Operation} > >> > >> @@ -3871,8 +3882,25 @@ A device MUST set the field{status} byte to VIRTIO_BLK_S_IOERR > >> for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT > >> write any data. > >> > >> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure > >> -that any writes which were completed are committed to non-volatile storage. > >> +If the field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the > >> +device SHOULD ensure that all writes are committed to non-volatile > >> +storage before reporting completion. It MUST do so if it proposed > >> +field{VIRTIO_BLK_F_FLUSH}. Failure to do so can cause data loss. > > > > The 1st one is a SHOULD because not everyone cares about data loss. > > But why is the second one a MUST? Same logic seems to apply. > > Because then the driver can _know_ that it is safe from data losses, by > looking at proposed features. I still don't see a difference. the point of the 1st SHOULD is that maybe it's used with e.g. -snapshot so data will be discarded on qemu exit anyway. Maybe we should specify that data loss is in case of a crash. This seems to apply to all cases that might lead to data loss. IOW driver doesn't really know it's safe from losses since backend can still lose data. All driver cares about is what is required from driver to avoid losses. > > Same below. > > > >> + > >> +If the field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt > >> +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes > >> +which were completed are committed to non-volatile storage. > > > > Aren't there any driver requirements? > > E.g. with one mode driver may assume data is safe > > when complete, in another is should issue flush requests > > to commit it to storage. > > I don't follow this. Can you rephrase? If writeback is enabled, device can cache requests, and driver must issue explicit flushes, correct? If writeback is disabled, driver may assume data is committed after request completes. Above should be conformance statements for the driver. > >> + > >> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the > >> +field{writeback} field in configuration space is 0, the device MUST > >> +ensure that all writes are committed to non-volatile storage before > >> +reporting completion. > >> + > >> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY > >> +switch to writethrough or writeback mode by writing respectively 0 and > >> +1 to the field{writeback} field. After writing a 0 to field{writeback}, > >> +the driver MUST NOT assume that writes submitted while the field was 1 > > > > since the last VIRTIO_BLK_F_FLUSH completed, correct? > > More precisely "submitted while the field was 1 and completed after the > last VIRTIO_BLK_F_FLUSH was submitted". Does it have to be specified > explicitly? > > Paolo If we want it understood correctly, I guess yes. > >> have +been committed to non-volatile storage. > > > > > > > >> > >> subsubsection{Legacy Interface: Device Operation}label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} > >> When using the legacy interface, transitional devices and drivers > >> -- > >> 2.4.3 > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe from this mail list, you must leave the OASIS TC that > >> generates this mail. Follow this link to all your TCs in OASIS at: > >> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php


  • 5.  Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 07-02-2015 13:02
    On 02/07/2015 12:51, Michael S. Tsirkin wrote: >>>> +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache >>>> + mode can be read from field{writeback}. 0 corresponds to a writethrough >>>> + cache, 1 to a writeback cachefootnote{Consistent with >>>> +
    ef{sec:Device Types / Block Device / Device Operation}, >>>> + a writethrough cache can be defined broadly as a cache that commits >>>> + writes to non-volatile storage before reporting their completion. >>>> + For example, a battery-backed writeback cache actually counts as >>>> + writethrough according to this definition.}. >>> >>> This seems a bit too much text for a footnote, I'd still prefer it >>> as a simple text note. >> >> I can't do this inside an {enumerate}. It is long-ish, but it's just >> documenting what existing hardware does. > > Why not? I just tried this: note nests fine within enumerate. I get an error from LaTeX. >>>> + >>>> +item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver >>>> + can deduct the cache mode by looking at the result of negotiating >>>> + VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if >>>> + VIRTIO_BLK_F_FLUSH is negotiated. >>>> +end{enumerate} >>>> >>>> -Some older legacy devices did not operate in writethrough mode even >>>> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. >>>> +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while >>>> +VIRTIO_BLK_F_CONFIG_WCE is, field{writeback} MUST read 0 immediately >>>> +after feature negotiation. >>> >>> This is confusing. >>> Previously you simply said writeback is only valid if >>> VIRTIO_BLK_F_CONFIG_WCE is negotiated. >> >> This paragraph says "If the VIRTIO_BLK_F_FLUSH feature is not negotiated >> *while VIRTIO_BLK_F_CONFIG_WCE is*". > > Would be clearer to revert the order then. Sure, I can do that. > Is this logic something that legacy devices implemented? > I don't see anything like this in QEMU. It wasn't possible because of the lack of a FEATURES_OK state. I'll have to document legacy behavior better. >>> Also, what's the reset value if it is negotiated? >> >> It is not specified. The device can make it persistent or reset it to a >> user-specified value. > > Does driver have to read it on init then? Read it or set it. Linux does the latter. > I think this text could benefit from separating > between two concepts: > - device writenback mode > - writeback field > > operation should refer to current mode, separately > field might or might not make it discoverable. > > I see this in QEMU: > > /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send > * cache flushes. Thus, the "auto writethrough" behavior is never > * necessary for guests that support the VIRTIO_BLK_F_CONFIG_WCE feature. > * Leaving it enabled would break the following sequence: > * > * Guest started with "-drive cache=writethrough" > * Guest sets status to 0 > * Guest sets DRIVER bit in status field > * Guest reads host features (WCE=0, CONFIG_WCE=1) > * Guest writes guest features (WCE=0, CONFIG_WCE=1) > * Guest writes 1 to the WCE configuration field (writeback mode) > * Guest sets DRIVER_OK bit in status field > * > * s->blk would erroneously be placed in writethrough mode. > */ > if (!virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) { > aio_context_acquire(blk_get_aio_context(s->blk)); > blk_set_enable_write_cache(s->blk, > virtio_has_feature(vdev, VIRTIO_BLK_F_WCE)); > aio_context_release(blk_get_aio_context(s->blk)); > } > > > is this what this text is trying to say? This is part of the behavior that I want to clarify. > > > > >>> We can't have a mandatory behaviour but also >>> make it optional. And driver does not propose features, it negotiates >>> them. >> >> Will fix. >> >>> Or maybe device must fail negotiation if it meets some criteria? >> >> Yes, but just "MAY". > > Devices may fail negotiation if any dependencies are not met. > Is this the case? > See Feature bit requirements that we have for the net device, > I think that's formulated in a clearer way. Good. But I don't want to make it mandatory. I'll see what I can come up with in v3. > >>>> subsection{Device Operation}label{sec:Device Types / Block Device / Device Operation} >>>> >>>> @@ -3871,8 +3882,25 @@ A device MUST set the field{status} byte to VIRTIO_BLK_S_IOERR >>>> for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT >>>> write any data. >>>> >>>> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure >>>> -that any writes which were completed are committed to non-volatile storage. >>>> +If the field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the >>>> +device SHOULD ensure that all writes are committed to non-volatile >>>> +storage before reporting completion. It MUST do so if it proposed >>>> +field{VIRTIO_BLK_F_FLUSH}. Failure to do so can cause data loss. >>> >>> The 1st one is a SHOULD because not everyone cares about data loss. >>> But why is the second one a MUST? Same logic seems to apply. >> >> Because then the driver can _know_ that it is safe from data losses, by >> looking at proposed features. > > I still don't see a difference. the point of the 1st > SHOULD is that maybe it's used with e.g. -snapshot > so data will be discarded on qemu exit anyway. No, that's not the point of the 1st SHOULD. From the point of view of the guest, if you use "-snapshot" it is totally irrelevant whether the host commits to non-volatile storage. As you wrote, data will be discarded so there is no separation between volatile and non-volatile storage. The point of the first SHOULD is simply to document reasonable behavior for the case of simple drivers that do not care about data loss. Devices that do not support VIRTIO_BLK_F_FLUSH probably would not obey that SHOULD. But devices that do support VIRTIO_BLK_F_FLUSH need to do better, hence the MUST. Do you think the MUST prevents the trick with "-snapshot"? I think it doesn't, because the guest cannot see the difference. (And I think even "MUST"s can be overridden by a user's "I know what I am doing" option such as QEMU's cache.no-flush=on). > Maybe we should specify that data loss is in case of a crash. Yes. > This seems to apply to all cases that might lead to data loss. > > IOW driver doesn't really know it's safe from losses since > backend can still lose data. All driver cares about is > what is required from driver to avoid losses. That's the point: do not propose VIRTIO_BLK_T_FLUSH if you can lose unflushed data from a crash. >>>> +If the field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt >>>> +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes >>>> +which were completed are committed to non-volatile storage. > > If writeback is enabled, device can cache requests, > and driver must issue explicit flushes, correct? > If writeback is disabled, driver may assume data > is committed after request completes. > > Above should be conformance statements for the driver. I'll add something, but I don't think there should be conformance statement for the drivers about this. Every "MUST" for the driver can be rephrased to a "MAY assume" on the driver and vice versa. > >>>> + >>>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the >>>> +field{writeback} field in configuration space is 0, the device MUST >>>> +ensure that all writes are committed to non-volatile storage before >>>> +reporting completion. >>>> + >>>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY >>>> +switch to writethrough or writeback mode by writing respectively 0 and >>>> +1 to the field{writeback} field. After writing a 0 to field{writeback}, >>>> +the driver MUST NOT assume that writes submitted while the field was 1 >>> >>> since the last VIRTIO_BLK_F_FLUSH completed, correct? >> >> More precisely "submitted while the field was 1 and completed after the >> last VIRTIO_BLK_F_FLUSH was submitted". Does it have to be specified >> explicitly? > > If we want it understood correctly, I guess yes. Ok. I have reworked the whole description of when a write becomes stable. It is longer but I think it leaves no space for interpretation. Paolo


  • 6.  Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 07-02-2015 17:13
    On Thu, Jul 02, 2015 at 03:01:55PM +0200, Paolo Bonzini wrote: > > > On 02/07/2015 12:51, Michael S. Tsirkin wrote: > >>>> +item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache > >>>> + mode can be read from field{writeback}. 0 corresponds to a writethrough > >>>> + cache, 1 to a writeback cachefootnote{Consistent with > >>>> +
    ef{sec:Device Types / Block Device / Device Operation}, > >>>> + a writethrough cache can be defined broadly as a cache that commits > >>>> + writes to non-volatile storage before reporting their completion. > >>>> + For example, a battery-backed writeback cache actually counts as > >>>> + writethrough according to this definition.}. > >>> > >>> This seems a bit too much text for a footnote, I'd still prefer it > >>> as a simple text note. > >> > >> I can't do this inside an {enumerate}. It is long-ish, but it's just > >> documenting what existing hardware does. > > > > Why not? I just tried this: note nests fine within enumerate. > > I get an error from LaTeX. Do you mean XeTex? Also - how did you write it? > >>>> + > >>>> +item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver > >>>> + can deduct the cache mode by looking at the result of negotiating > >>>> + VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if > >>>> + VIRTIO_BLK_F_FLUSH is negotiated. > >>>> +end{enumerate} > >>>> > >>>> -Some older legacy devices did not operate in writethrough mode even > >>>> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. > >>>> +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while > >>>> +VIRTIO_BLK_F_CONFIG_WCE is, field{writeback} MUST read 0 immediately > >>>> +after feature negotiation. > >>> > >>> This is confusing. > >>> Previously you simply said writeback is only valid if > >>> VIRTIO_BLK_F_CONFIG_WCE is negotiated. > >> > >> This paragraph says "If the VIRTIO_BLK_F_FLUSH feature is not negotiated > >> *while VIRTIO_BLK_F_CONFIG_WCE is*". > > > > Would be clearer to revert the order then. > > Sure, I can do that. > > > Is this logic something that legacy devices implemented? > > I don't see anything like this in QEMU. > > It wasn't possible because of the lack of a FEATURES_OK state. I'll > have to document legacy behavior better. > > >>> Also, what's the reset value if it is negotiated? > >> > >> It is not specified. The device can make it persistent or reset it to a > >> user-specified value. > > > > Does driver have to read it on init then? > > Read it or set it. Linux does the latter. Is there value in specifying a non zero value on reset? Maybe we can make this 0 unconditionally? > > I think this text could benefit from separating > > between two concepts: > > - device writenback mode > > - writeback field > > > > operation should refer to current mode, separately > > field might or might not make it discoverable. > > > > I see this in QEMU: > > > > /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send > > * cache flushes. Thus, the "auto writethrough" behavior is never > > * necessary for guests that support the VIRTIO_BLK_F_CONFIG_WCE feature. > > * Leaving it enabled would break the following sequence: > > * > > * Guest started with "-drive cache=writethrough" > > * Guest sets status to 0 > > * Guest sets DRIVER bit in status field > > * Guest reads host features (WCE=0, CONFIG_WCE=1) > > * Guest writes guest features (WCE=0, CONFIG_WCE=1) > > * Guest writes 1 to the WCE configuration field (writeback mode) > > * Guest sets DRIVER_OK bit in status field > > * > > * s->blk would erroneously be placed in writethrough mode. > > */ > > if (!virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) { > > aio_context_acquire(blk_get_aio_context(s->blk)); > > blk_set_enable_write_cache(s->blk, > > virtio_has_feature(vdev, VIRTIO_BLK_F_WCE)); > > aio_context_release(blk_get_aio_context(s->blk)); > > } > > > > > > is this what this text is trying to say? > > This is part of the behavior that I want to clarify. Can't say I see how this maps into the text. you don't document what happens without VIRTIO_BLK_F_CONFIG_WCE at all, do you? > > > > > > > > > >>> We can't have a mandatory behaviour but also > >>> make it optional. And driver does not propose features, it negotiates > >>> them. > >> > >> Will fix. > >> > >>> Or maybe device must fail negotiation if it meets some criteria? > >> > >> Yes, but just "MAY". > > > > Devices may fail negotiation if any dependencies are not met. > > Is this the case? > > See Feature bit requirements that we have for the net device, > > I think that's formulated in a clearer way. > > Good. But I don't want to make it mandatory. I'll see what I can come > up with in v3. > > > > >>>> subsection{Device Operation}label{sec:Device Types / Block Device / Device Operation} > >>>> > >>>> @@ -3871,8 +3882,25 @@ A device MUST set the field{status} byte to VIRTIO_BLK_S_IOERR > >>>> for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT > >>>> write any data. > >>>> > >>>> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure > >>>> -that any writes which were completed are committed to non-volatile storage. > >>>> +If the field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the > >>>> +device SHOULD ensure that all writes are committed to non-volatile > >>>> +storage before reporting completion. It MUST do so if it proposed > >>>> +field{VIRTIO_BLK_F_FLUSH}. Failure to do so can cause data loss. > >>> > >>> The 1st one is a SHOULD because not everyone cares about data loss. > >>> But why is the second one a MUST? Same logic seems to apply. > >> > >> Because then the driver can _know_ that it is safe from data losses, by > >> looking at proposed features. > > > > I still don't see a difference. the point of the 1st > > SHOULD is that maybe it's used with e.g. -snapshot > > so data will be discarded on qemu exit anyway. > > No, that's not the point of the 1st SHOULD. From the point of view of > the guest, if you use "-snapshot" it is totally irrelevant whether the > host commits to non-volatile storage. As you wrote, data will be > discarded so there is no separation between volatile and non-volatile > storage. > > The point of the first SHOULD is simply to document reasonable behavior > for the case of simple drivers that do not care about data loss. Hmm why do such *drivers* exist, or should be allowed? > Devices that do not support VIRTIO_BLK_F_FLUSH probably would not obey > that SHOULD. But devices that do support VIRTIO_BLK_F_FLUSH need to do > better, hence the MUST. > > Do you think the MUST prevents the trick with "-snapshot"? I think it > doesn't, because the guest cannot see the difference. Well MUST means if you don't do it you violate spec. > (And I think even > "MUST"s can be overridden by a user's "I know what I am doing" option > such as QEMU's cache.no-flush=on). I think that's exactly the SHOULD. SHOULD means "do this always unless you really know what you are doing". Check out the RFC. > > Maybe we should specify that data loss is in case of a crash. > > Yes. > > > This seems to apply to all cases that might lead to data loss. > > > > IOW driver doesn't really know it's safe from losses since > > backend can still lose data. All driver cares about is > > what is required from driver to avoid losses. > > That's the point: do not propose VIRTIO_BLK_T_FLUSH if you can lose > unflushed data from a crash. > > >>>> +If the field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt > >>>> +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes > >>>> +which were completed are committed to non-volatile storage. > > > > If writeback is enabled, device can cache requests, > > and driver must issue explicit flushes, correct? > > If writeback is disabled, driver may assume data > > is committed after request completes. > > > > Above should be conformance statements for the driver. > > I'll add something, but I don't think there should be conformance > statement for the drivers about this. Every "MUST" for the driver can > be rephrased to a "MAY assume" on the driver and vice versa. True but this is an important topic so it bears repetition I think. > > > >>>> + > >>>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the > >>>> +field{writeback} field in configuration space is 0, the device MUST > >>>> +ensure that all writes are committed to non-volatile storage before > >>>> +reporting completion. > >>>> + > >>>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY > >>>> +switch to writethrough or writeback mode by writing respectively 0 and > >>>> +1 to the field{writeback} field. After writing a 0 to field{writeback}, > >>>> +the driver MUST NOT assume that writes submitted while the field was 1 > >>> > >>> since the last VIRTIO_BLK_F_FLUSH completed, correct? > >> > >> More precisely "submitted while the field was 1 and completed after the > >> last VIRTIO_BLK_F_FLUSH was submitted". Does it have to be specified > >> explicitly? > > > > If we want it understood correctly, I guess yes. > > Ok. I have reworked the whole description of when a write becomes > stable. It is longer but I think it leaves no space for interpretation. > > Paolo


  • 7.  Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 07-02-2015 18:40
    On 02/07/2015 19:12, Michael S. Tsirkin wrote: >>>> I can't do this inside an {enumerate}. It is long-ish, but it's just >>>> documenting what existing hardware does. >>> >>> Why not? I just tried this: note nests fine within enumerate. >> >> I get an error from LaTeX. > > Do you mean XeTex? Also - how did you write it? item foo egin{note} blah end{note} item bar But let's discuss the text for now. >>>>> Also, what's the reset value if it is negotiated? >>>> >>>> It is not specified. The device can make it persistent or reset it to a >>>> user-specified value. >>> >>> Does driver have to read it on init then? >> >> Read it or set it. Linux does the latter. > > Is there value in specifying a non zero value on reset? Yes: it makes the guest do what the host administrator thinks it should do by default, effectively forwarding the host's writethrough/writeback choice to the guest. > Maybe we can make this 0 unconditionally? That would be a behavior change that I'm wary of introducing. >>> if (!virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) { >>> aio_context_acquire(blk_get_aio_context(s->blk)); >>> blk_set_enable_write_cache(s->blk, >>> virtio_has_feature(vdev, VIRTIO_BLK_F_WCE)); >>> aio_context_release(blk_get_aio_context(s->blk)); >>> } >>> >>> >>> is this what this text is trying to say? >> >> This is part of the behavior that I want to clarify. > > Can't say I see how this maps into the text. > you don't document what happens without VIRTIO_BLK_F_CONFIG_WCE > at all, do you? It's the "neither VIRTIO_BLK_F_CONFIG_WCE nor VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was proposed by the device" bit. Note that the spec mentions it as part of device operation, not implementation, because the existence of a "blk_set_enable_write_cache" API is a QEMU detail that doesn't belong in the spec. >> The point of the first SHOULD is simply to document reasonable behavior >> for the case of simple drivers that do not care about data loss. > > Hmm why do such *drivers* exist, or should be allowed? They exist because they're old (MS-DOS anyone?). Because they're writing transient data. Because they are configured with the "there's a UPS or a battery-backed cache somewhere" option. Linux SCSI disks have such an option ("echo temporary write through > /sys/block/sda/device/scsi_disk/*/cache_type"), though virtio-blk doesn't. Which also answers why they should be allowed: because it's not our business. >> Devices that do not support VIRTIO_BLK_F_FLUSH probably would not obey >> that SHOULD. But devices that do support VIRTIO_BLK_F_FLUSH need to do >> better, hence the MUST. >> >> Do you think the MUST prevents the trick with "-snapshot"? I think it >> doesn't, because the guest cannot see the difference. > > Well MUST means if you don't do it you violate spec. Not necessarily. If the guest does not see the difference (e.g., won't see the data after a crash anyway, because of -snapshot), you can do whatever you want. It's the same "as is" rule that governs compiler optimizations---or pretty much any other optimization. >> (And I think even >> "MUST"s can be overridden by a user's "I know what I am doing" option >> such as QEMU's cache.no-flush=on). > > I think that's exactly the SHOULD. SHOULD means "do this always unless > you really know what you are doing". Check out the RFC. There are (at least) two levels in addition to MUST and SHOULD: a) it is required to do X. ~X puts you outside the spec. No ifs, no buts b) it is required to do X, but you can do ~X if no one notices. If this causes problems, it's your bug. c) it is required to do X, but the user may force you to do ~X instead. If this causes problems, it's the user's bug. d) it is recommended to do X. ~X is a fact of life. (a) is MUST. (d) is SHOULD. The question is, what is (b) and (c)? If we use SHOULD here ("X = avoid data losses after crashes"), we're saying that data loss after a crash is a fact of life. I don't think that's reasonable. So, I believe (b) is also MUST and there are even specs that acknowledge this, for example in RFC2616 (HTTP): When a received protocol element is parsed, the recipient MUST be able to parse any value of reasonable length that is applicable to the recipient's role and that matches the grammar defined by the corresponding ABNF rules. Note, however, that some received protocol elements might not be parsed. For example, an intermediary forwarding a message might parse a header-field into generic field-name and field-value components, but then forward the header field without further parsing inside the field-value. And I strongly suspect it's more than just intermediaries who do this... (c) means the user can put you outside the spec, but that's fair enough. Perhaps the user knows that you're really in the (b) situation. For example you're installing a new VM, and if there's a crash you'll just throw it away. No one will notice if data isn't flushed. >>> Above should be conformance statements for the driver. >> >> I'll add something, but I don't think there should be conformance >> statement for the drivers about this. Every "MUST" for the driver can >> be rephrased to a "MAY assume" on the driver and vice versa. > > True but this is an important topic so it bears repetition I think. v3 definitely goes into more excruciating detail. Paolo


  • 8.  Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

    Posted 07-02-2015 14:16
    On Thu, Jul 02, 2015 at 09:57:07AM +0200, Paolo Bonzini wrote: > VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance > (up to 2x, though more realistically +30-40%) in latency-bound workloads. > However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. > > In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great > idea, because it simplifies simple drivers (e.g. firmware) that are okay > with a writethrough cache but still need data to persist after power loss. > What really should have been removed is just the possibility that devices > not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in > the new world of conformance statements. > > Restore these, with the following changes: > > * clarify and use conformance statements in order to define writeback > and writethrough caching according to what is commonly done by high-end > storage. > > * clarify (with conformance statements) the influence of the > VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of > VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. > > * strengthen the requirement for persisting writes after a > VIRTIO_BLK_T_FLUSH request. This is now a MUST. > > This fixes VIRTIO-144. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > content.tex 76 ++++++++++++++++++++++++++++++++++++++++++------------------- > 1 file changed, 52 insertions(+), 24 deletions(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Attachment: pgpnwaoeGMryU.pgp Description: PGP signature