OASIS Virtual I/O Device (VIRTIO) TC

  • 1.  [RFC] virtio-blk: add discard and write zeroes features to specification

    Posted 02-26-2018 08:17
    Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES support,
    this will impact the performance when using SSD backend over file systems.

    Here is the proposal to extend existing virtio-blk protocol to support
    DISCARD/WRITE ZEROES commands.

    Basic idea here is using 16 Bytes payload to support 1 descriptor, users
    can put several segments together with 1 DISCARD command.

    struct virtio_blk_discard_write_zeroes {
    le64 slba;
    le32 nlba;
    struct {
    le32 unmap:1;
    le32 reserved:31;
    } flags;
    };

    For the purpose to support such feature, we need to introduce 2 new feature
    flags: VIRTIO_BLK_F_DISCARD/VIRTIO_BLK_F_WRITE_ZEROES, and 2 new command
    types: VIRTIO_BLK_T_DISCARD/VIRTIO_BLK_T_WRITE_ZEROES. Also we introduce
    3 new parameters in the configuration space of virtio-blk: max_discard_sectors/
    max_discard_seg/max_write_zeroes_sectors. These parameters will tell the
    OS what's the granularity when issuing such commands.

    If both DISCARD and WRITE ZEROES are supported, unmap flag bit maybe used
    for WRITE ZEROES command with DISCARD bit enabled.

    Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
    ---
    content.tex | 41 ++++++++++++++++++++++++++++++++++++++++-
    1 file changed, 40 insertions(+), 1 deletion(-)

    diff --git a/content.tex b/content.tex
    index c7ef7fd..5555b18 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -4127,6 +4127,13 @@ device except where noted.

    \item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
    and writethrough modes.
    +
    +\item[VIRTIO_BLK_F_DISCARD (13)] Device can support discard command, maximum
    + discard sectors size in \field{max_discard_sectors} and maximum discard
    + segment number in field{max_discard_seg}.
    +
    +\item[VIRTIO_BLK_F_WRITE_ZEROES (14)] Device can support write zeroes command,
    + maximum write zeroes sectors size in \field{max_write_zeroes_sectors}.
    \end{description}

    \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
    @@ -4170,6 +4177,9 @@ struct virtio_blk_config {
    le32 opt_io_size;
    } topology;
    u8 writeback;
    + le32 max_discard_sectors;
    + le32 max_discard_seg;
    + le32 max_write_zeroes_sectors;
    };
    \end{lstlisting}

    @@ -4211,6 +4221,15 @@ according to the native endian of the guest rather than
    after reset can be either writeback or writethrough. The actual
    mode can be determined by reading \field{writeback} after feature
    negotiation.
    +
    +\item If the VIRTIO_BLK_F_DISCARD feature is negotiated,
    + \field{max_discard_sectors} and \field{max_discard_seg} can be read
    + to determine the maximum discard sectors and maximum number of discard
    + segments for the block driver to use.
    +
    +\item if the VIRTIO_BLK_F_WRITE_ZEROES feature is negotiated,
    + \field{max_write_zeroes_sectors} can be read to determine the maximum
    + write zeroes sectors for the block driver to use.
    \end{enumerate}

    \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
    @@ -4270,21 +4289,41 @@ struct virtio_blk_req {
    u8 data[][512];
    u8 status;
    };
    +
    +struct virtio_blk_discard_write_zeroes {
    + le64 slba;
    + le32 nlba;
    + struct {
    + le32 unmap:1;
    + le32 reserved:31;
    + } flags;
    +};
    \end{lstlisting}

    The type of the request is either a read (VIRTIO_BLK_T_IN), a write
    -(VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH).
    +(VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes
    +(VIRTIO_BLK_T_WRITE_ZEROES) or a flush (VIRTIO_BLK_T_FLUSH).

    \begin{lstlisting}
    #define VIRTIO_BLK_T_IN 0
    #define VIRTIO_BLK_T_OUT 1
    #define VIRTIO_BLK_T_FLUSH 4
    +#define VIRTIO_BLK_T_DISCARD 16
    +#define VIRTIO_BLK_T_WRITE_ZEROES 32
    \end{lstlisting}

    The \field{sector} number indicates the offset (multiplied by 512) where
    the read or write is to occur. This field is unused and set to 0
    for scsi packet commands and for flush commands.

    +The \field{data} used for discard or write zeroes command can be
    +described by structure virtio_blk_discard_write_zeroes, while here,
    +\field{sector} is not used for the two commands. \field{slba} indicates
    +the starting sector of the segment, \field{nlba} indicates the number
    +of sectors for discard/write zeroes commands, \field{unmap} only used
    +for write zeroes command, which means write zeroes to specified sectors
    +while keep these sectors discarded.
    +
    The final \field{status} byte is written by the device: either
    VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver
    error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device:
    --
    1.9.3




  • 2.  Re: [RFC] virtio-blk: add discard and write zeroes features to specification

    Posted 02-26-2018 15:21
    On 26/02/2018 09:16, Changpeng Liu wrote:
    > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES support,
    > this will impact the performance when using SSD backend over file systems.
    >
    > Here is the proposal to extend existing virtio-blk protocol to support
    > DISCARD/WRITE ZEROES commands.
    >
    > Basic idea here is using 16 Bytes payload to support 1 descriptor, users
    > can put several segments together with 1 DISCARD command.
    >
    > struct virtio_blk_discard_write_zeroes {
    > le64 slba;
    > le32 nlba;
    > struct {
    > le32 unmap:1;
    > le32 reserved:31;
    > } flags;
    > };
    >
    > For the purpose to support such feature, we need to introduce 2 new feature
    > flags: VIRTIO_BLK_F_DISCARD/VIRTIO_BLK_F_WRITE_ZEROES, and 2 new command
    > types: VIRTIO_BLK_T_DISCARD/VIRTIO_BLK_T_WRITE_ZEROES. Also we introduce
    > 3 new parameters in the configuration space of virtio-blk: max_discard_sectors/
    > max_discard_seg/max_write_zeroes_sectors. These parameters will tell the
    > OS what's the granularity when issuing such commands.
    >
    > If both DISCARD and WRITE ZEROES are supported, unmap flag bit maybe used
    > for WRITE ZEROES command with DISCARD bit enabled.
    >
    > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
    > ---
    > content.tex | 41 ++++++++++++++++++++++++++++++++++++++++-
    > 1 file changed, 40 insertions(+), 1 deletion(-)
    >
    > diff --git a/content.tex b/content.tex
    > index c7ef7fd..5555b18 100644
    > --- a/content.tex
    > +++ b/content.tex
    > @@ -4127,6 +4127,13 @@ device except where noted.
    >
    > \item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
    > and writethrough modes.
    > +
    > +\item[VIRTIO_BLK_F_DISCARD (13)] Device can support discard command, maximum
    > + discard sectors size in \field{max_discard_sectors} and maximum discard
    > + segment number in field{max_discard_seg}.
    > +
    > +\item[VIRTIO_BLK_F_WRITE_ZEROES (14)] Device can support write zeroes command,
    > + maximum write zeroes sectors size in \field{max_write_zeroes_sectors}.
    > \end{description}
    >
    > \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
    > @@ -4170,6 +4177,9 @@ struct virtio_blk_config {
    > le32 opt_io_size;
    > } topology;
    > u8 writeback;

    Let's add three padding bytes here, for clarity. Please add a note that
    these three bytes MUST be zero in the "Device Requirements: Device
    Initialization" section.

    > + le32 max_discard_sectors;
    > + le32 max_discard_seg;

    Perhaps add here:

    le32 discard_sector_alignment;

    (and document it below)?

    > + le32 max_write_zeroes_sectors;

    Perhaps we can put a bit here saying whether write_zeroes will ever look
    at the \field{unmap} bit? Like:

    u8 write_zeroes_may_unmap;

    > };
    > \end{lstlisting}
    >
    > @@ -4211,6 +4221,15 @@ according to the native endian of the guest rather than
    > after reset can be either writeback or writethrough. The actual
    > mode can be determined by reading \field{writeback} after feature
    > negotiation.
    > +
    > +\item If the VIRTIO_BLK_F_DISCARD feature is negotiated,
    > + \field{max_discard_sectors} and \field{max_discard_seg} can be read
    > + to determine the maximum discard sectors and maximum number of discard
    > + segments for the block driver to use.
    > +
    > +\item if the VIRTIO_BLK_F_WRITE_ZEROES feature is negotiated,
    > + \field{max_write_zeroes_sectors} can be read to determine the maximum
    > + write zeroes sectors for the block driver to use.

    So only one struct virtio_blk_discard_write_zeroes is possible?

    > \end{enumerate}
    >
    > \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
    > @@ -4270,21 +4289,41 @@ struct virtio_blk_req {
    > u8 data[][512];
    > u8 status;
    > };
    > +
    > +struct virtio_blk_discard_write_zeroes {
    > + le64 slba;
    > + le32 nlba;

    le64 sector;
    le32 num_sectors;

    please (more consistent with the rest).

    > + struct {
    > + le32 unmap:1;
    > + le32 reserved:31;
    > + } flags;
    > +};
    > \end{lstlisting}
    >
    > The type of the request is either a read (VIRTIO_BLK_T_IN), a write
    > -(VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH).
    > +(VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes
    > +(VIRTIO_BLK_T_WRITE_ZEROES) or a flush (VIRTIO_BLK_T_FLUSH).
    >
    > \begin{lstlisting}
    > #define VIRTIO_BLK_T_IN 0
    > #define VIRTIO_BLK_T_OUT 1
    > #define VIRTIO_BLK_T_FLUSH 4
    > +#define VIRTIO_BLK_T_DISCARD 16
    > +#define VIRTIO_BLK_T_WRITE_ZEROES 32
    > \end{lstlisting}

    Bit 0 defines the command direction, which should be 1, while the
    command number is in bits 1-30. Since the highest-numbered command is
    currently 8, we can use 11 and 13 respectively for discard and write zeroes.

    > The \field{sector} number indicates the offset (multiplied by 512) where
    > the read or write is to occur. This field is unused and set to 0
    > for scsi packet commands and for flush commands.
    >
    > +The \field{data} used for discard or write zeroes command can be
    > +described by structure virtio_blk_discard_write_zeroes, while here,
    > +\field{sector} is not used for the two commands. \field{slba} indicates
    > +the starting sector of the segment, \field{nlba} indicates the number
    > +of sectors for discard/write zeroes commands, \field{unmap} only used
    > +for write zeroes command, which means write zeroes to specified sectors
    > +while keep these sectors discarded.

    -----
    The \field{sector} number indicates the offset (multiplied by 512) where
    the read or write is to occur. This field is unused and set to 0
    commands other than read or write.

    The \field{data} used for discard or write zeroes command is
    described by one or more virtio_blk_discard_write_zeroes structs.
    \field{sector} indicates the starting sector of the segment, while
    \field{nlba} indicates the number of sectors in each discarded range.
    \field{unmap} is only used for write zeroes command.
    -----

    Please add the following to "Device Requirements: Device Operation":

    -----
    The device MUST set the \emph{status} byte to VIRTIO_BLK_S_UNSUPP for
    discard and write zeroes commands if any unknown flag is set.
    Furthermore, the device MUST set the \emph{status} byte to
    VIRTIO_BLK_S_UNSUPP for discard commands if the \field{unmap} flag is set.

    For discard commands, the device MAY deallocate the specified range of
    sectors in the device backend storage.

    For write zeroes commands, if the \field{unmap} is set, the device MAY
    deallocate the specified range of sectors in the device backend storage,
    as if the DISCARD command had been sent. After a write zeroes command
    is completed, reads of the specified ranges of sectors MUST return
    zeroes. This is true independent of whether \field{unmap} was set or clear.
    -----

    Please add this to "Device Requirements: Driver Operation", too:

    -----
    The \field{unmap} bit MUST be zero for discard commands. The driver
    MUST NOT assume anything about the data returned by read requests after
    a range of sectors has been discarded.
    -----

    If we added the "write_zeroes_may_unmap" bit, something like this would
    go in "Device Requirements: Device Operation":

    ----
    The device SHOULD clear the \field{write_zeroes_may_unmap} field of the
    virtio configuration space if and only if a write zeroes request cannot
    result in deallocating one or more sectors. The device MAY change the
    content of the field during operation of the device; when this happens,
    the device SHOULD trigger a configuration change interrupt.
    -----

    Thanks,

    Paolo

    > The final \field{status} byte is written by the device: either
    > VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver
    > error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device:
    >




  • 3.  Re: [RFC] virtio-blk: add discard and write zeroes features to specification

    Posted 02-26-2018 15:21
    On 26/02/2018 09:16, Changpeng Liu wrote: > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES support, > this will impact the performance when using SSD backend over file systems. > > Here is the proposal to extend existing virtio-blk protocol to support > DISCARD/WRITE ZEROES commands. > > Basic idea here is using 16 Bytes payload to support 1 descriptor, users > can put several segments together with 1 DISCARD command. > > struct virtio_blk_discard_write_zeroes { > le64 slba; > le32 nlba; > struct { > le32 unmap:1; > le32 reserved:31; > } flags; > }; > > For the purpose to support such feature, we need to introduce 2 new feature > flags: VIRTIO_BLK_F_DISCARD/VIRTIO_BLK_F_WRITE_ZEROES, and 2 new command > types: VIRTIO_BLK_T_DISCARD/VIRTIO_BLK_T_WRITE_ZEROES. Also we introduce > 3 new parameters in the configuration space of virtio-blk: max_discard_sectors/ > max_discard_seg/max_write_zeroes_sectors. These parameters will tell the > OS what's the granularity when issuing such commands. > > If both DISCARD and WRITE ZEROES are supported, unmap flag bit maybe used > for WRITE ZEROES command with DISCARD bit enabled. > > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> > --- > content.tex 41 ++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 40 insertions(+), 1 deletion(-) > > diff --git a/content.tex b/content.tex > index c7ef7fd..5555b18 100644 > --- a/content.tex > +++ b/content.tex > @@ -4127,6 +4127,13 @@ device except where noted. > > item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback > and writethrough modes. > + > +item[VIRTIO_BLK_F_DISCARD (13)] Device can support discard command, maximum > + discard sectors size in field{max_discard_sectors} and maximum discard > + segment number in field{max_discard_seg}. > + > +item[VIRTIO_BLK_F_WRITE_ZEROES (14)] Device can support write zeroes command, > + maximum write zeroes sectors size in field{max_write_zeroes_sectors}. > end{description} > > subsubsection{Legacy Interface: Feature bits}label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} > @@ -4170,6 +4177,9 @@ struct virtio_blk_config { > le32 opt_io_size; > } topology; > u8 writeback; Let's add three padding bytes here, for clarity. Please add a note that these three bytes MUST be zero in the "Device Requirements: Device Initialization" section. > + le32 max_discard_sectors; > + le32 max_discard_seg; Perhaps add here: le32 discard_sector_alignment; (and document it below)? > + le32 max_write_zeroes_sectors; Perhaps we can put a bit here saying whether write_zeroes will ever look at the field{unmap} bit? Like: u8 write_zeroes_may_unmap; > }; > end{lstlisting} > > @@ -4211,6 +4221,15 @@ according to the native endian of the guest rather than > after reset can be either writeback or writethrough. The actual > mode can be determined by reading field{writeback} after feature > negotiation. > + > +item If the VIRTIO_BLK_F_DISCARD feature is negotiated, > + field{max_discard_sectors} and field{max_discard_seg} can be read > + to determine the maximum discard sectors and maximum number of discard > + segments for the block driver to use. > + > +item if the VIRTIO_BLK_F_WRITE_ZEROES feature is negotiated, > + field{max_write_zeroes_sectors} can be read to determine the maximum > + write zeroes sectors for the block driver to use. So only one struct virtio_blk_discard_write_zeroes is possible? > end{enumerate} > > drivernormative{subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} > @@ -4270,21 +4289,41 @@ struct virtio_blk_req { > u8 data[][512]; > u8 status; > }; > + > +struct virtio_blk_discard_write_zeroes { > + le64 slba; > + le32 nlba; le64 sector; le32 num_sectors; please (more consistent with the rest). > + struct { > + le32 unmap:1; > + le32 reserved:31; > + } flags; > +}; > end{lstlisting} > > The type of the request is either a read (VIRTIO_BLK_T_IN), a write > -(VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH). > +(VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes > +(VIRTIO_BLK_T_WRITE_ZEROES) or a flush (VIRTIO_BLK_T_FLUSH). > > egin{lstlisting} > #define VIRTIO_BLK_T_IN 0 > #define VIRTIO_BLK_T_OUT 1 > #define VIRTIO_BLK_T_FLUSH 4 > +#define VIRTIO_BLK_T_DISCARD 16 > +#define VIRTIO_BLK_T_WRITE_ZEROES 32 > end{lstlisting} Bit 0 defines the command direction, which should be 1, while the command number is in bits 1-30. Since the highest-numbered command is currently 8, we can use 11 and 13 respectively for discard and write zeroes. > The field{sector} number indicates the offset (multiplied by 512) where > the read or write is to occur. This field is unused and set to 0 > for scsi packet commands and for flush commands. > > +The field{data} used for discard or write zeroes command can be > +described by structure virtio_blk_discard_write_zeroes, while here, > +field{sector} is not used for the two commands. field{slba} indicates > +the starting sector of the segment, field{nlba} indicates the number > +of sectors for discard/write zeroes commands, field{unmap} only used > +for write zeroes command, which means write zeroes to specified sectors > +while keep these sectors discarded. ----- The field{sector} number indicates the offset (multiplied by 512) where the read or write is to occur. This field is unused and set to 0 commands other than read or write. The field{data} used for discard or write zeroes command is described by one or more virtio_blk_discard_write_zeroes structs. field{sector} indicates the starting sector of the segment, while field{nlba} indicates the number of sectors in each discarded range. field{unmap} is only used for write zeroes command. ----- Please add the following to "Device Requirements: Device Operation": ----- The device MUST set the emph{status} byte to VIRTIO_BLK_S_UNSUPP for discard and write zeroes commands if any unknown flag is set. Furthermore, the device MUST set the emph{status} byte to VIRTIO_BLK_S_UNSUPP for discard commands if the field{unmap} flag is set. For discard commands, the device MAY deallocate the specified range of sectors in the device backend storage. For write zeroes commands, if the field{unmap} is set, the device MAY deallocate the specified range of sectors in the device backend storage, as if the DISCARD command had been sent. After a write zeroes command is completed, reads of the specified ranges of sectors MUST return zeroes. This is true independent of whether field{unmap} was set or clear. ----- Please add this to "Device Requirements: Driver Operation", too: ----- The field{unmap} bit MUST be zero for discard commands. The driver MUST NOT assume anything about the data returned by read requests after a range of sectors has been discarded. ----- If we added the "write_zeroes_may_unmap" bit, something like this would go in "Device Requirements: Device Operation": ---- The device SHOULD clear the field{write_zeroes_may_unmap} field of the virtio configuration space if and only if a write zeroes request cannot result in deallocating one or more sectors. The device MAY change the content of the field during operation of the device; when this happens, the device SHOULD trigger a configuration change interrupt. ----- Thanks, Paolo > The final field{status} byte is written by the device: either > VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver > error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device: >


  • 4.  RE: [virtio-dev] Re: [RFC] virtio-blk: add discard and write zeroes features to specification

    Posted 02-27-2018 05:46


    >


  • 5.  Re: [virtio-dev] Re: [RFC] virtio-blk: add discard and write zeroes features to specification

    Posted 02-27-2018 08:42
    On 27/02/2018 06:45, Liu, Changpeng wrote:
    >> Perhaps we can put a bit here saying whether write_zeroes will ever look
    >> at the \field{unmap} bit? Like:
    >>
    >> u8 write_zeroes_may_unmap;
    >
    > I do take this into consideration, if write zeroes is supported and discard is not supported,
    > even the OS send write zeroes command with unmap bit enabled, the backend will also
    > ignore this bit, so the configuration parameter seems useless.

    The reason is that it lets the guest OS know if WRITE ZEROES with UNMAP
    will be fast or not. It's not super important, but it is easy enough to
    add and also easy enough to fill in for the host.

    Paolo



  • 6.  Re: [virtio-dev] Re: [RFC] virtio-blk: add discard and write zeroes features to specification

    Posted 02-27-2018 08:42
    On 27/02/2018 06:45, Liu, Changpeng wrote: >> Perhaps we can put a bit here saying whether write_zeroes will ever look >> at the field{unmap} bit? Like: >> >> u8 write_zeroes_may_unmap; > > I do take this into consideration, if write zeroes is supported and discard is not supported, > even the OS send write zeroes command with unmap bit enabled, the backend will also > ignore this bit, so the configuration parameter seems useless. The reason is that it lets the guest OS know if WRITE ZEROES with UNMAP will be fast or not. It's not super important, but it is easy enough to add and also easy enough to fill in for the host. Paolo


  • 7.  RE: [virtio-dev] Re: [RFC] virtio-blk: add discard and write zeroes features to specification

    Posted 02-28-2018 01:08


    >