virtio-comment


[PATCH v3] virtio-blk: add discard and write zeroes features to specification

  • 1.  [PATCH v3] virtio-blk: add discard and write zeroes features to specification

    Posted 03-06-2018 02:50
    The existing virtio-blk protocol doesn't support DISCARD/WRITE ZEROES,
    which hurts performance when an SSD backend is used over a file system.

    Here is a proposal to extend the existing virtio-blk protocol to support
    DISCARD/WRITE ZEROES commands.

    The basic idea is to use a 16-byte payload to describe each segment, so
    users can put several segments together in one DISCARD/WRITE ZEROES command.

    struct virtio_blk_discard_write_zeroes {
    le64 sector;
    le32 num_sectors;
    struct {
    le32 unmap:1;
    le32 reserved:31;
    } flags;
    };
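Because C bitfield layout is implementation-defined, the `flags` struct above is best read as a description of the wire format rather than a portable C declaration. A minimal sketch of serializing one 16-byte segment explicitly, assuming (our reading of the struct) that `unmap` is bit 0 of the little-endian 32-bit flags word and bits 1..31 are reserved; `encode_segment` and the `SEG_*` names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: unmap is bit 0 of the le32 flags word; bits 1..31 reserved. */
#define SEG_FLAG_UNMAP 0x1u
#define SEG_SIZE 16 /* le64 sector + le32 num_sectors + le32 flags */

/* Store v in little-endian byte order, independent of host endianness. */
static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: serialize one discard/write-zeroes segment. */
static void encode_segment(uint8_t out[SEG_SIZE], uint64_t sector,
                           uint32_t num_sectors, int unmap)
{
    assert(out != NULL);
    put_le64(out, sector);                          /* bytes 0..7   */
    put_le32(out + 8, num_sectors);                 /* bytes 8..11  */
    put_le32(out + 12, unmap ? SEG_FLAG_UNMAP : 0); /* bytes 12..15 */
}
```

For example, `encode_segment(buf, 2048, 8, 1)` fills `buf` with the bytes `00 08 00 00 00 00 00 00 08 00 00 00 01 00 00 00`.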

    To support these features, we introduce 2 new feature flags,
    VIRTIO_BLK_F_DISCARD/VIRTIO_BLK_F_WRITE_ZEROES, and 2 new command
    types, VIRTIO_BLK_T_DISCARD/VIRTIO_BLK_T_WRITE_ZEROES. We also introduce
    several new parameters in the configuration space of virtio-blk:
    max_discard_sectors/max_discard_seg/max_write_zeroes_sectors.
    These parameters tell the OS what granularity to use when
    issuing such commands.
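As an illustration of how a driver might honor these limits, the sketch below splits one large range into segments, stopping once `max_discard_seg` segments are filled. It assumes `max_discard_sectors` bounds each individual segment (the wording can also be read as a per-request total), ignores `discard_sector_alignment` for brevity, and uses hypothetical names (`struct seg`, `split_discard`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order view of one segment (hypothetical, mirrors the struct above). */
struct seg {
    uint64_t sector;
    uint32_t num_sectors;
};

/* Split [sector, sector + nr_sectors) into at most max_seg segments of at
 * most max_sectors each. Stores the segment count in *nseg and returns the
 * number of sectors covered; any remainder needs a further request. */
static uint64_t split_discard(uint64_t sector, uint64_t nr_sectors,
                              uint32_t max_sectors, uint32_t max_seg,
                              struct seg *out, size_t *nseg)
{
    assert(max_sectors > 0 && max_seg > 0);
    uint64_t done = 0;
    size_t n = 0;

    while (done < nr_sectors && n < max_seg) {
        uint64_t left = nr_sectors - done;
        uint32_t len = left < max_sectors ? (uint32_t)left : max_sectors;

        out[n].sector = sector + done;
        out[n].num_sectors = len;
        done += len;
        n++;
    }
    *nseg = n;
    return done;
}
```

With `max_sectors = 10` and `max_seg = 4`, a 25-sector range starting at sector 100 becomes three segments (100/10, 110/10, 120/5); with `max_seg = 2`, only the first 20 sectors fit and the caller would issue the rest in another request.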

    If both DISCARD and WRITE ZEROES are supported, the unmap flag bit may be
    set in a WRITE ZEROES command to let the device also discard the range.

    Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
    ---
    content.tex | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
    1 file changed, 81 insertions(+), 3 deletions(-)

    diff --git a/content.tex b/content.tex
    index c7ef7fd..c4b3190 100644
    --- a/content.tex
    +++ b/content.tex
    @@ -4127,6 +4127,14 @@ device except where noted.

    \item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
    and writethrough modes.
    +
    +\item[VIRTIO_BLK_F_DISCARD (13)] Device supports the discard command; the
    + maximum discard sectors size is in \field{max_discard_sectors} and the
    + maximum number of discard segments is in \field{max_discard_seg}.
    +
    +\item[VIRTIO_BLK_F_WRITE_ZEROES (14)] Device supports the write zeroes command;
    + the maximum write zeroes sectors size is in \field{max_write_zeroes_sectors}
    + and the maximum number of write zeroes segments is in \field{max_write_zeroes_seg}.
    \end{description}

    \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
    @@ -4148,6 +4156,12 @@ The \field{capacity} of the device (expressed in 512-byte sectors) is always
    present. The availability of the others all depend on various feature
    bits as indicated above.

    +The configuration space parameters \field{max_discard_sectors} and
    +\field{discard_sector_alignment} are expressed in 512-byte sectors if the
    +VIRTIO_BLK_F_DISCARD feature bit is negotiated. \field{max_write_zeroes_sectors}
    +is expressed in 512-byte sectors if the VIRTIO_BLK_F_WRITE_ZEROES feature
    +bit is negotiated.
    +
    \begin{lstlisting}
    struct virtio_blk_config {
    le64 capacity;
    @@ -4170,6 +4184,14 @@ struct virtio_blk_config {
    le32 opt_io_size;
    } topology;
    u8 writeback;
    + u8 unused0[3];
    + le32 max_discard_sectors;
    + le32 max_discard_seg;
    + le32 discard_sector_alignment;
    + le32 max_write_zeroes_sectors;
    + le32 max_write_zeroes_seg;
    + u8 write_zeroes_may_unmap;
    + u8 unused1[3];
    };
    \end{lstlisting}

    @@ -4211,6 +4233,17 @@ according to the native endian of the guest rather than
    after reset can be either writeback or writethrough. The actual
    mode can be determined by reading \field{writeback} after feature
    negotiation.
    +
    +\item If the VIRTIO_BLK_F_DISCARD feature is negotiated,
    + \field{max_discard_sectors} and \field{max_discard_seg} can be read
    + to determine the maximum discard sectors and maximum number of discard
    + segments for the block driver to use. \field{discard_sector_alignment}
    + can be used by the OS when splitting a request based on alignment.
    +
    +\item If the VIRTIO_BLK_F_WRITE_ZEROES feature is negotiated,
    + \field{max_write_zeroes_sectors} and \field{max_write_zeroes_seg} can
    + be read to determine the maximum write zeroes sectors and maximum
    + number of write zeroes segments for the block driver to use.
    \end{enumerate}

    \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
    @@ -4234,6 +4267,9 @@ if they offer VIRTIO_BLK_F_CONFIG_WCE.
    If VIRTIO_BLK_F_CONFIG_WCE is negotiated but VIRTIO_BLK_F_FLUSH
    is not, the device MUST initialize \field{writeback} to 0.

    +The device MUST initialize padding bytes \field{unused0} and
    +\field{unused1} to 0.
    +
    \subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}

    Because legacy devices do not have FEATURES_OK, transitional devices
    @@ -4270,20 +4306,38 @@ struct virtio_blk_req {
    u8 data[][512];
    u8 status;
    };
    +
    +struct virtio_blk_discard_write_zeroes {
    + le64 sector;
    + le32 num_sectors;
    + struct {
    + le32 unmap:1;
    + le32 reserved:31;
    + } flags;
    +};
    \end{lstlisting}

    The type of the request is either a read (VIRTIO_BLK_T_IN), a write
    -(VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH).
    +(VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes
    +(VIRTIO_BLK_T_WRITE_ZEROES) or a flush (VIRTIO_BLK_T_FLUSH).

    \begin{lstlisting}
    #define VIRTIO_BLK_T_IN 0
    #define VIRTIO_BLK_T_OUT 1
    #define VIRTIO_BLK_T_FLUSH 4
    +#define VIRTIO_BLK_T_DISCARD 11
    +#define VIRTIO_BLK_T_WRITE_ZEROES 13
    \end{lstlisting}

    The \field{sector} number indicates the offset (multiplied by 512) where
    -the read or write is to occur. This field is unused and set to 0
    -for scsi packet commands and for flush commands.
    +the read or write is to occur. This field is unused and set to 0 for
    +commands other than read or write.
    +
    +The \field{data} used for discard or write zeroes command is described
    +by one or more virtio_blk_discard_write_zeroes structs. \field{sector}
    +indicates the starting offset (multiplied by 512) of the segment, while
    +\field{num_sectors} indicates the number of sectors in the segment.
    +\field{unmap} is only used for write zeroes commands.

    The final \field{status} byte is written by the device: either
    VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver
    @@ -4311,12 +4365,36 @@ switch to writethrough or writeback mode by writing respectively 0 and
    the driver MUST NOT assume that any volatile writes have been committed
    to persistent device backend storage.

    +The \field{unmap} bit MUST be zero for discard commands. The driver
    +MUST NOT assume anything about the data returned by read requests after
    +a range of sectors has been discarded.
    +
    \devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}

    A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
    for a write request if the VIRTIO_BLK_F_RO feature is offered, and MUST NOT
    write any data.

    +The device MUST set the \field{status} byte to VIRTIO_BLK_S_UNSUPP for
    +discard and write zeroes commands if any unknown flag is set.
    +Furthermore, the device MUST set the \field{status} byte to
    +VIRTIO_BLK_S_UNSUPP for discard commands if the \field{unmap} flag is set.
    +
    +For discard commands, the device MAY deallocate the specified range of
    +sectors in the device backend storage.
    +
    +For write zeroes commands, if the \field{unmap} flag is set, the device MAY
    +deallocate the specified range of sectors in the device backend storage,
    +as if a discard command had been sent. After a write zeroes command
    +is completed, reads of the specified ranges of sectors MUST return
    +zeroes. This is true independent of whether \field{unmap} was set or clear.
    +
    +The device SHOULD clear the \field{write_zeroes_may_unmap} field of the
    +virtio configuration space if and only if a write zeroes request cannot
    +result in deallocating one or more sectors. The device MAY change the
    +content of the field during operation of the device; when this happens,
    +the device SHOULD trigger a configuration change interrupt.
    +
    A write is considered volatile when it is submitted; the contents of
    sectors covered by a volatile write are undefined in persistent device
    backend storage until the write becomes stable. A write becomes stable
    --
    1.9.3
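The device-side rules in the patch (VIRTIO_BLK_S_UNSUPP for any unknown flag, and for a discard segment with `unmap` set) could be checked per segment roughly as follows. This is a sketch, not taken from any implementation: `check_segments` is hypothetical, it assumes `unmap` is bit 0 of the little-endian `flags` word at byte offset 12 of each 16-byte segment, and it uses the existing status values VIRTIO_BLK_S_OK (0) and VIRTIO_BLK_S_UNSUPP (2) with the command numbers proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_BLK_T_DISCARD      11
#define VIRTIO_BLK_T_WRITE_ZEROES 13
#define VIRTIO_BLK_S_OK     0
#define VIRTIO_BLK_S_UNSUPP 2

/* Assumption: unmap is bit 0 of the le32 flags word at byte offset 12. */
#define SEG_FLAG_UNMAP 0x1u
#define SEG_SIZE 16

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: status the device would report before touching
 * storage, given nseg segments of SEG_SIZE bytes each. */
static uint8_t check_segments(uint32_t type, const uint8_t *data, size_t nseg)
{
    assert(data != NULL || nseg == 0);
    for (size_t i = 0; i < nseg; i++) {
        uint32_t flags = get_le32(data + i * SEG_SIZE + 12);

        if (flags & ~SEG_FLAG_UNMAP)      /* a reserved flag bit is set */
            return VIRTIO_BLK_S_UNSUPP;
        if (type == VIRTIO_BLK_T_DISCARD && (flags & SEG_FLAG_UNMAP))
            return VIRTIO_BLK_S_UNSUPP;   /* unmap is for write zeroes only */
    }
    return VIRTIO_BLK_S_OK;
}
```

Checking every segment before acting keeps a request with one bad segment from partially modifying storage; whether a real device must behave that way is not stated in the patch.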




  • 2.  Re: [virtio-dev] [PATCH v3] virtio-blk: add discard and write zeroes features to specification

    Posted 03-07-2018 11:29
    On Tue, Mar 06, 2018 at 10:50:20AM +0800, Changpeng Liu wrote:
    > +The \field{data} used for discard or write zeroes command is described
    > +by one or more virtio_blk_discard_write_zeroes structs. \field{sector}
    > +indicates the starting offset (multiplied by 512) of the segment, while

    Did you mean "divided by 512 bytes" instead of "multiplied by 512"?



  • 4.  RE: [virtio-dev] [PATCH v3] virtio-blk: add discard and write zeroes features to specification

    Posted 03-08-2018 01:01


    >


  • 5.  Re: [virtio-dev] [PATCH v3] virtio-blk: add discard and write zeroes features to specification

    Posted 03-08-2018 13:05
    On 08/03/2018 02:01, Liu, Changpeng wrote:
    >
    >
    >>


  • 7.  RE: [virtio-dev] [PATCH v3] virtio-blk: add discard and write zeroes features to specification

    Posted 03-09-2018 02:31


    >


  • 9.  Re: [virtio-comment] [PATCH v3] virtio-blk: add discard and write zeroes features to specification

    Posted 03-12-2018 16:20
    Hi Changpeng,

    On 03/06/18 03:50, Changpeng Liu wrote:
    > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES support,
    > this will impact the performance when using SSD backend over file systems.
    >
    > Here is the proposal to extend existing virtio-blk protocol to support
    > DISCARD/WRITE ZEROES commands.
    >
    > Basic idea here is using 16 Bytes payload to support 1 descriptor, users
    > can put several segments together with 1 DISCARD/WRITE ZEROES command.
    >
    > struct virtio_blk_discard_write_zeroes {
    > le64 sector;
    > le32 num_sectors;
    > struct {
    > le32 unmap:1;
    > le32 reserved:31;
    > } flags;
    > };
    >
    > For the purpose to support such feature, we need to introduce 2 new feature
    > flags: VIRTIO_BLK_F_DISCARD/VIRTIO_BLK_F_WRITE_ZEROES, and 2 new command
    > types: VIRTIO_BLK_T_DISCARD/VIRTIO_BLK_T_WRITE_ZEROES. Also we introduce
    > several new parameters in the configuration space of virtio-blk:
    > max_discard_sectors/max_discard_seg/max_write_zeroes_sectors.
    > These parameters will tell the OS what's the granularity when
    > issuing such commands.
    >
    > If both DISCARD and WRITE ZEROES are supported, unmap flag bit maybe used
    > for WRITE ZEROES command with DISCARD bit enabled.
    >
    > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
    > ---
    > content.tex | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
    > 1 file changed, 81 insertions(+), 3 deletions(-)

    virtio-scsi already supports discard and write zeroes, and virtio-scsi
    can be backed by SSDs. Can you please describe the use case in the
    commit message (in a few words) that needs discard / write zeroes but is
    ill-served by virtio-scsi?

    I believe the spec (and implementations) should be conservative about
    feature duplication unless there's a good reason for it. I don't doubt
    you have a good reason; can you please name it in the commit message?
    I'm not questioning your motives, I'd just like to see them documented
    in the commit log.

    Thank you,
    Laszlo



  • 10.  Re: [virtio-comment] [PATCH v3] virtio-blk: add discard and write zeroes features to specification

    Posted 03-12-2018 16:29
    On 12/03/2018 17:19, Laszlo Ersek wrote:
    > virtio-scsi already supports discard and write zeroes, and virtio-scsi
    > can be backed by SSDs. Can you please describe the use case in the
    > commit message (in a few words) that needs discard / write zeroes but is
    > ill-served by virtio-scsi?
    >
    > I believe the spec (and implementations) should be conservative about
    > feature duplication unless there's a good reason for it. I don't doubt
    > you have a good reason; can you please name it in the commit message?
    > I'm not questioning your motives, I'd just like to see them documented
    > in the commit log.

    Hi Laszlo, the basic reason is that virtio-scsi is slower than
    virtio-blk. :)

    Indeed DISCARD/WRITE ZEROES was a reason to use virtio-scsi instead of
    virtio-blk back when it was created, but at the time DISCARD was
    considered a thin provisioning feature; that is, the idea was that
    guests could cooperate with the host to save host disk space and to
    speed up maintenance operations such as live disk migration.

    Nowadays, DISCARD has evolved into a basic disk maintenance operation
    and IMO it fits within the scope of virtio-blk, as the backend of choice
    for high-performance I/O devices. (Though I wouldn't be surprised
    if, in a few years, virtio-blk became obsolete and we just used
    NVMe. Recent versions of the NVMe spec have taken some
    inspiration from Xen and virtio ring buffers, and perform great as
    virtual devices; this is a similar "convergence" to what's happening in
    virtio for the new packed ring format, just in the opposite direction).

    Paolo

