OASIS Virtual I/O Device (VIRTIO) TC

[PATCH requirements 0/7] virtio net new features requirements

  • 1.  [PATCH requirements 0/7] virtio net new features requirements

    Posted 07-24-2023 03:35
    Hi All,

    This document captures the virtio net device requirements for the upcoming
    release 1.4 that some of us are currently working on. This is a live document
    to be updated over time as we work towards a design that can result in a
    draft specification.

    The objectives are:
    1. To consider these requirements when introducing the new features listed in
       the document (and others), and to work towards the interface design
       followed by drafting the specification changes.
    2. To define a practical list of requirements that can be achieved in the 1.4
       timeframe incrementally and that we have the ability to implement.

    Please review mainly patch 5 as the priority. Receive flow filters is the
    first item, apart from counters, to complete in this iteration so that
    drafting of the design spec can start. The rest of the requirements are
    largely untouched other than Stefan's comment.

    TODO:
    1. Some more refinement needed for the rx low latency and header data split
       requirements.
    2. Counters requirements are not yet up to date with the discussion.

    ---
    changelog:
    v2->v3:
    - addressed comments from Stefan for tx low latency and notification
    - redrafted the requirements to use the rearm term and avoid queue enable
      confusion for notification
    - addressed all comments and refined receive flow filters requirements to
      take to design level
    v1->v2:
    - major update of receive flow filter requirements based on the last two
      design discussions in the community and offline research
    - examples added
    - link to use case and design goal added
    - control and operation side requirements split
    - more verbose
    v0->v1:
    - addressed comments from Heng Li
    - addressed a few (not all) comments from Michael
    - per patch changelog

    Parav Pandit (7):
      net-features: Add requirements document for release 1.4
      net-features: Add low latency transmit queue requirements
      net-features: Add low latency receive queue requirements
      net-features: Add notification coalescing requirements
      net-features: Add n-tuple receive flow filters requirements
      net-features: Add packet timestamp requirements
      net-features: Add header data split requirements

     net-workstream/features-1.4.md | 321 +++++++++++++++++++++++++++++++++
     1 file changed, 321 insertions(+)
     create mode 100644 net-workstream/features-1.4.md

    --
    2.26.2


  • 2.  [PATCH requirements 2/7] net-features: Add low latency transmit queue requirements

    Posted 07-24-2023 03:35
    Add requirements for the low latency transmit queue.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    chagelog:
    v1->v2:
    - added generic requirement to inline the request content along with the
      descriptor for non virtio-net devices
    - added requirement to inline the request content along with the descriptor
      for virtio flow filter queue as two features are similar
    v0->v1:
    - added design goals for which requirements are added
    ---
     net-workstream/features-1.4.md | 88 ++++++++++++++++++++++++++++++++++
     1 file changed, 88 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 4c3797b..eb95592 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -7,6 +7,7 @@ together is desired while updating the virtio net interface.

     # 2. Summary
     1. Device counters visible to the driver
    +2. Low latency tx virtqueue for PCI transport

     # 3. Requirements
     ## 3.1 Device counters
    @@ -33,3 +34,90 @@ together is desired while updating the virtio net interface.
     ### 3.1.2 Per transmit queue counters
     1. le64 tx_gso_pkts: Packets send as transmit GSO sequence
     2. le64 tx_pkts: Total packets send by the device
    +
    +## 3.2 Low PCI latency virtqueues
    +### 3.2.1 Low PCI latency tx virtqueue
    +0. Design goal
    +   a. Reduce PCI access latency in packet transmit flow
    +   b. Avoid O(N) descriptor parser to detect a packet stream to simplify device
    +      logic
    +   c. Reduce number of PCI transmit completion transactions and have unified
    +      completion flow with/without transmit timestamping
    +   d. Avoid partial cache line writes on transmit completions
    +
    +1. Packet transmit descriptor should contain data descriptors count without any
    +   indirection and without any O(N) search to find the end of a packet stream.
    +   For example, a packet transmit descriptor (called vnet_tx_hdr_desc
    +   subsequently) to contain a field num_next_desc for the packet stream
    +   indicating that a packet is located in N data descriptors.
    +
    +2. Packet transmit descriptor should contain segmentation offload-related fields
    +   without any indirection. For example, packet transmit descriptor to contain
    +   gso_type, gso_size/mss, header length, csum placement byte offset, and
    +   csum start.
    +
    +3. Packet transmit descriptor should be able to place a small size packet that
    +   does not have any L4 data after the vnet_tx_hdr_desc in the virtqueue memory.
    +   For example a TCP ack only packet can fit in a descriptor memory which
    +   otherwise consume more than 25% of metadata to describe the packet.
    +
    +4. Packet transmit descriptor should be able to place a full GSO header (L2 to
    +   L4) after header descriptor and before data descriptors. For example, the
    +   GSO header is placed after struct vnet_tx_hdr_desc in the virtqueue memory.
    +   When such a GSO header is positioned adjacent to the packet transmit
    +   descriptor, and when the GSO header is not aligned to 16B, the following
    +   data descriptor to start on the 8B aligned boundary.
    +
    +5. An example of the above requirements at high level is:
    +
    +```
    +struct vitio_packed_q_desc {
    +    /* current desc for reference */
    +    u64 address;
    +    u32 len;
    +    u16 id;
    +    u16 flags;
    +};
    +
    +/* Constant size header descriptor for tx packets */
    +struct vnet_tx_hdr_desc {
    +    u16 flags;        /* indicate how to parse next fields */
    +    u16 id;           /* desc id to come back in completion */
    +    u8 num_next_desc; /* indicates the number of the next 16B data desc for this
    +                       * buffer.
    +                       */
    +    u8 gso_type;
    +    le16 gso_hdr_len;
    +    le16 gso_size;
    +    le16 csum_start;
    +    le16 csum_offset;
    +    u8 inline_pkt_len; /* indicates the length of the inline packet after this
    +                        * desc
    +                        */
    +    u8 reserved;
    +    u8 padding[];
    +};
    +
    +/* Example of a short packet or GSO header placed in the desc section of the vq
    + */
    +struct vnet_tx_small_pkt_desc {
    +    u8 raw_pkt[128];
    +};
    +
    +/* Example of header followed by data descriptor */
    +struct vnet_tx_hdr_desc hdr_desc;
    +struct vnet_data_desc desc[2];
    +
    +```
    +
    +6. Ability to zero pad the transmit completion when the transmit completion is
    +   shorter than the CPU cache line size.
    +
    +7. Ability to place all transmit completion together with it per packet stream
    +   transmit timestamp using single PCIe transcation.
    +
    +8. A generic feature of the virtqueue, to contain such header data inline for virtio
    +   devices other than virtio-net.
    +
    +9. A flow filter virtqueue also similarly need the ability to inline the short flow
    +   command header.
    --
    2.26.2
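    As a sketch only of requirements 6 and 7 above, a zero-padded transmit
    completion carrying the per packet stream timestamp could be laid out as
    below; the structure and every field name are hypothetical illustrations,
    not part of this proposal:

    ```
    /* Hypothetical unified tx completion, zero padded to a 64B cache line.
     * Illustration only; none of these names come from the patch.
     */
    struct vnet_tx_completion {
        le16 flags;        /* e.g. bit 0 = timestamp field is valid */
        le16 id;           /* id of the completed vnet_tx_hdr_desc */
        le32 reserved;
        le64 timestamp;    /* per packet stream transmit timestamp (item 7) */
        u8 padding[48];    /* zero pad up to the cache line size (item 6) */
    };
    ```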


  • 3.  Re: [virtio-comment] [PATCH requirements 2/7] net-features: Add low latency transmit queue requirements

    Posted 08-08-2023 08:25



  • 4.  Re: [virtio-comment] [PATCH requirements 2/7] net-features: Add low latency transmit queue requirements

    Posted 08-08-2023 08:32



  • 5.  RE: [EXT] [virtio] [PATCH requirements 2/7] net-features: Add low latency transmit queue requirements

    Posted 08-10-2023 19:05
    Hi Parav

    >


  • 7.  Re: [virtio-comment] [PATCH requirements 2/7] net-features: Add low latency transmit queue requirements

    Posted 08-14-2023 11:56
    On Monday, 2023-07-24 at 06:34:16 +03, Parav Pandit wrote:
    > Add requirements for the low latency transmit queue.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > chagelog:
    > v1->v2:
    > - added generic requirement to inline the request content
    > along with the descriptor for non virtio-net devices
    > - added requirement to inline the request content along
    > with the descriptor for virtio flow filter queue as two
    > features are similar
    > v0->v1:
    > - added design goals for which requirements are added
    > ---
    > net-workstream/features-1.4.md | 88 ++++++++++++++++++++++++++++++++++
    > 1 file changed, 88 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 4c3797b..eb95592 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -7,6 +7,7 @@ together is desired while updating the virtio net interface.
    >
    > # 2. Summary
    > 1. Device counters visible to the driver
    > +2. Low latency tx virtqueue for PCI transport
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -33,3 +34,90 @@ together is desired while updating the virtio net interface.
    > ### 3.1.2 Per transmit queue counters
    > 1. le64 tx_gso_pkts: Packets send as transmit GSO sequence
    > 2. le64 tx_pkts: Total packets send by the device
    > +
    > +## 3.2 Low PCI latency virtqueues
    > +### 3.2.1 Low PCI latency tx virtqueue
    > +0. Design goal
    > + a. Reduce PCI access latency in packet transmit flow
    > + b. Avoid O(N) descriptor parser to detect a packet stream to simplify device
    > + logic
    > + c. Reduce number of PCI transmit completion transactions and have unified
    > + completion flow with/without transmit timestamping
    > + d. Avoid partial cache line writes on transmit completions
    > +
    > +1. Packet transmit descriptor should contain data descriptors count without any
    > + indirection and without any O(N) search to find the end of a packet stream.
    > + For example, a packet transmit descriptor (called vnet_tx_hdr_desc
    > + subsequently) to contain a field num_next_desc for the packet stream
    > + indicating that a packet is located in N data descriptors.
    > +
    > +2. Packet transmit descriptor should contain segmentation offload-related fields
    > + without any indirection. For example, packet transmit descriptor to contain
    > + gso_type, gso_size/mss, header length, csum placement byte offset, and
    > + csum start.
    > +
    > +3. Packet transmit descriptor should be able to place a small size packet that
    > + does not have any L4 data after the vnet_tx_hdr_desc in the virtqueue memory.
    > + For example a TCP ack only packet can fit in a descriptor memory which
    > + otherwise consume more than 25% of metadata to describe the packet.
    > +
    > +4. Packet transmit descriptor should be able to place a full GSO header (L2 to
    > + L4) after header descriptor and before data descriptors. For example, the
    > + GSO header is placed after struct vnet_tx_hdr_desc in the virtqueue memory.
    > + When such a GSO header is positioned adjacent to the packet transmit
    > + descriptor, and when the GSO header is not aligned to 16B, the following
    > + data descriptor to start on the 8B aligned boundary.
    > +
    > +5. An example of the above requirements at high level is:
    > +
    > +```
    > +struct vitio_packed_q_desc {

    "virtio_packed_q_desc"

    > + /* current desc for reference */
    > + u64 address;
    > + u32 len;
    > + u16 id;
    > + u16 flags;
    > +};
    > +
    > +/* Constant size header descriptor for tx packets */
    > +struct vnet_tx_hdr_desc {
    > + u16 flags; /* indicate how to parse next fields */
    > + u16 id; /* desc id to come back in completion */
    > + u8 num_next_desc; /* indicates the number of the next 16B data desc for this
    > + * buffer.
    > + */
    > + u8 gso_type;
    > + le16 gso_hdr_len;
    > + le16 gso_size;
    > + le16 csum_start;
    > + le16 csum_offset;
    > + u8 inline_pkt_len; /* indicates the length of the inline packet after this
    > + * desc
    > + */
    > + u8 reserved;
    > + u8 padding[];
    > +};
    > +
    > +/* Example of a short packet or GSO header placed in the desc section of the vq
    > + */
    > +struct vnet_tx_small_pkt_desc {
    > + u8 raw_pkt[128];
    > +};
    > +
    > +/* Example of header followed by data descriptor */
    > +struct vnet_tx_hdr_desc hdr_desc;
    > +struct vnet_data_desc desc[2];
    > +
    > +```
    > +
    > +6. Ability to zero pad the transmit completion when the transmit completion is
    > + shorter than the CPU cache line size.
    > +
    > +7. Ability to place all transmit completion together with it per packet stream
    > + transmit timestamp using single PCIe transcation.

    The meaning of this is unclear to me. Is it:

    The ability to place all transmit completions with a per-packet stream
    transmit timestamp using a single PCIe transaction.

    ?

    > +
    > +8. A generic feature of the virtqueue, to contain such header data inline for virtio
    > + devices other than virtio-net.

    Given that this feature is used by this patch (for TX), the following
    patch (for RX) and flow filter manipulation, perhaps pull it out as a
    distinct requirement.

    > +
    > +9. A flow filter virtqueue also similarly need the ability to inline the short flow
    > + command header.
    --
    So tap at my window, maybe I might let you in.



  • 9.  [PATCH requirements 1/7] net-features: Add requirements document for release 1.4

    Posted 07-24-2023 03:35
    Add requirements document template for the virtio net features.

    Add virtio net device counters visible to driver.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    changelog:
    v0->v1:
    - removed tx dropped counter
    - updated requirements to mention about virtqueue interface for counters
      query
    ---
     net-workstream/features-1.4.md | 35 ++++++++++++++++++++++++++++++++++
     1 file changed, 35 insertions(+)
     create mode 100644 net-workstream/features-1.4.md

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    new file mode 100644
    index 0000000..4c3797b
    --- /dev/null
    +++ b/net-workstream/features-1.4.md
    @@ -0,0 +1,35 @@
    +# 1. Introduction
    +
    +This document describes the overall requirements for virtio net device
    +improvements for upcoming release 1.4. Some of these requirements are
    +interrelated and influence the interface design, hence reviewing them
    +together is desired while updating the virtio net interface.
    +
    +# 2. Summary
    +1. Device counters visible to the driver
    +
    +# 3. Requirements
    +## 3.1 Device counters
    +1. The driver should be able to query the device and/or per vq counters for
    +   debugging purpose using a virtqueue directly from driver to device for
    +   example using a control vq.
    +2. The driver should be able to query which counters are supported using a
    +   virtqueue command, for example using an existing control vq.
    +3. If this device is migrated between two hosts, the driver should be able
    +   get the counter values in the destination host from where it was left
    +   off in the source host.
    +4. If a virtio device is group member device, a group owner should be able
    +   to query all the counter attributes using the administration command which
    +   a virtio member device will expose via a virtqueue to the driver.
    +
    +### 3.1.1 Per receive queue counters
    +1. le64 rx_oversize_pkt_errors: Packet dropped due to receive packet being
    +   oversize than the buffer size
    +2. le64 rx_no_buffer_pkt_errors: Packet dropped due to unavailability of the
    +   buffer in the receive queue
    +3. le64 rx_gro_pkts: Packets treated as receive GSO sequence by the device
    +4. le64 rx_pkts: Total packets received by the device
    +
    +### 3.1.2 Per transmit queue counters
    +1. le64 tx_gso_pkts: Packets send as transmit GSO sequence
    +2. le64 tx_pkts: Total packets send by the device
    --
    2.26.2
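    As a sketch only of requirements 1 and 2 of section 3.1 (querying supported
    counters and their values over a control virtqueue), one hypothetical command
    layout is shown below; none of these structures or names are proposed fields,
    they are purely an illustration:

    ```
    /* Hypothetical control vq command pair for counter query (illustration only) */
    struct vnet_ctrl_counter_query {
        le16 vq_index;     /* receive or transmit virtqueue to query */
        u8 reserved[6];
    };

    struct vnet_ctrl_counter_reply {
        le64 counters[];   /* values, in the order advertised as supported */
    };
    ```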


  • 10.  Re: [virtio-comment] [PATCH requirements 1/7] net-features: Add requirements document for release 1.4

    Posted 08-08-2023 08:16



  • 11.  Re: [virtio-comment] [PATCH requirements 1/7] net-features: Add requirements document for release 1.4

    Posted 08-08-2023 08:23



  • 12.  RE: [virtio-comment] [PATCH requirements 1/7] net-features: Add requirements document for release 1.4

    Posted 08-14-2023 05:18
    Hi David,

    > From: David Edmondson <david.edmondson@oracle.com>
    > Sent: Tuesday, August 8, 2023 1:46 PM

    Something is wrong with all 3 replies from you.
    There is no message body in them.
    I thought it is my mailbox, but looking at the mailing list [1], it is also missing.

    Can you please reply your comments again?

    [1] https://lists.oasis-open.org/archives/virtio-comment/202308/msg00125.html




  • 14.  Re: [virtio-comment] [PATCH requirements 1/7] net-features: Add requirements document for release 1.4

    Posted 08-14-2023 11:53

    On Monday, 2023-08-14 at 05:17:34 UTC, Parav Pandit wrote:
    > Hi David,
    >
    >> From: David Edmondson <david.edmondson@oracle.com>
    >> Sent: Tuesday, August 8, 2023 1:46 PM
    >
    > Something is wrong with all 3 replies from you.
    > There is no message body in them.
    > I thought it is my mailbox, but looking at the mailing list [1], it is also missing.
    >
    > Can you please reply your comments again?

    Apologies, I will resend. This will teach me to fiddle with the
    configuration of my mail client...

    > [1] https://lists.oasis-open.org/archives/virtio-comment/202308/msg00125.html
    --
    Seems I'm not alone at being alone.



  • 16.  Re: [virtio-comment] [PATCH requirements 1/7] net-features: Add requirements document for release 1.4

    Posted 08-14-2023 11:56
    On Monday, 2023-07-24 at 06:34:15 +03, Parav Pandit wrote:
    > Add requirements document template for the virtio net features.
    >
    > Add virtio net device counters visible to driver.

    Minor, but perhaps separate the introduction and the statistics into
    distinct changes.

    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > changelog:
    > v0->v1:
    > - removed tx dropped counter
    > - updated requirements to mention about virtqueue interface for counters
    > query
    > ---
    > net-workstream/features-1.4.md | 35 ++++++++++++++++++++++++++++++++++
    > 1 file changed, 35 insertions(+)
    > create mode 100644 net-workstream/features-1.4.md
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > new file mode 100644
    > index 0000000..4c3797b
    > --- /dev/null
    > +++ b/net-workstream/features-1.4.md
    > @@ -0,0 +1,35 @@
    > +# 1. Introduction
    > +
    > +This document describes the overall requirements for virtio net device
    > +improvements for upcoming release 1.4. Some of these requirements are
    > +interrelated and influence the interface design, hence reviewing them
    > +together is desired while updating the virtio net interface.
    > +
    > +# 2. Summary
    > +1. Device counters visible to the driver
    > +
    > +# 3. Requirements
    > +## 3.1 Device counters
    > +1. The driver should be able to query the device and/or per vq counters for
    > + debugging purpose using a virtqueue directly from driver to device for
    > + example using a control vq.
    > +2. The driver should be able to query which counters are supported using a
    > + virtqueue command, for example using an existing control vq.
    > +3. If this device is migrated between two hosts, the driver should be able
    > + get the counter values in the destination host from where it was left
    > + off in the source host.

    Isn't this really "if the driver is migrated"?

    I'm not sure of an obvious term for "the abstracted device that
    represents an actual device to which this driver is attached".

    > +4. If a virtio device is group member device, a group owner should be able
    > + to query all the counter attributes using the administration command which
    > + a virtio member device will expose via a virtqueue to the driver.

    Suggest:

    If a virtio device is a group member device, a group owner should be
    able to query all of the member device counter attributes and counters
    via the group owner device.

    > +
    > +### 3.1.1 Per receive queue counters
    > +1. le64 rx_oversize_pkt_errors: Packet dropped due to receive packet being
    > + oversize than the buffer size
    > +2. le64 rx_no_buffer_pkt_errors: Packet dropped due to unavailability of the
    > + buffer in the receive queue
    > +3. le64 rx_gro_pkts: Packets treated as receive GSO sequence by the device
    > +4. le64 rx_pkts: Total packets received by the device
    > +
    > +### 3.1.2 Per transmit queue counters
    > +1. le64 tx_gso_pkts: Packets send as transmit GSO sequence
    > +2. le64 tx_pkts: Total packets send by the device

    The patch from Xuan includes more than this - perhaps include them here
    so that we can debate the specifics?
    --
    Walking upside down in the sky, between the satellites passing by, I'm looking.



  • 18.  [PATCH requirements 3/7] net-features: Add low latency receive queue requirements

    Posted 07-24-2023 03:35
    Add requirements for the low latency receive queue.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    changelog:
    v0->v1:
    - clarified the requirements further
    - added line for the gro case
    - added design goals as the motivation for the requirements
    ---
     net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
     1 file changed, 44 insertions(+), 1 deletion(-)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index eb95592..e04727a 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.

     # 2. Summary
     1. Device counters visible to the driver
    -2. Low latency tx virtqueue for PCI transport
    +2. Low latency tx and rx virtqueues for PCI transport

     # 3. Requirements
     ## 3.1 Device counters
    @@ -121,3 +121,46 @@ struct vnet_data_desc desc[2];

     9. A flow filter virtqueue also similarly need the ability to inline the short flow
        command header.
    +
    +### 3.2.2 Low latency rx virtqueue
    +0. Design goal:
    +   a. Keep packet metadata and buffer data together which is consumed by driver
    +      layer and make it available in a single cache line of cpu
    +   b. Instead of having per packet descriptors which is complex to scale for
    +      the device, supply the page directly to the device to consume it based
    +      on packet size
    +1. The device should be able to write a packet receive completion that consists
    +   of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
    +   PCIe TLP.
    +2. The device should be able to perform DMA writes of multiple packets
    +   completions in a single DMA transaction up to the PCIe maximum write limit
    +   in a transaction.
    +3. The device should be able to zero pad packet write completion to align it to
    +   64B or CPU cache line size whenever possible.
    +4. An example of the above DMA completion structure:
    +
    +```
    +/* Constant size receive packet completion */
    +struct vnet_rx_completion {
    +    u16 flags;
    +    u16 id; /* buffer id */
    +    u8 gso_type;
    +    u8 reserved[3];
    +    le16 gso_hdr_len;
    +    le16 gso_size;
    +    le16 csum_start;
    +    le16 csum_offset;
    +    u16 reserved2;
    +    u64 timestamp; /* explained later */
    +    u8 padding[];
    +};
    +```
    +5. The driver should be able to post constant-size buffer pages on a receive
    +   queue which can be consumed by the device for an incoming packet of any size
    +   from 64B to 9K bytes.
    +6. The device should be able to know the constant buffer size at receive
    +   virtqueue level instead of per buffer level.
    +7. The device should be able to indicate when a full page buffer is consumed,
    +   which can be recycled by the driver when the packets from the completed
    +   page is fully consumed.
    +8. The device should be able to consume multiple pages for a receive GSO stream.
    --
    2.26.2
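    As a sketch only of requirements 5 and 6 of section 3.2.2 (a constant buffer
    size known at receive virtqueue level rather than per buffer), a hypothetical
    queue-level attribute could look like the following; the structure and field
    names are illustrative assumptions, not proposed specification fields:

    ```
    /* Hypothetical per receive virtqueue attribute (illustration only) */
    struct vnet_rxq_buffer_config {
        le16 vq_index;     /* receive virtqueue this applies to */
        le16 reserved;
        le32 buffer_size;  /* constant size of every posted buffer page; a GSO
                            * stream larger than this spans multiple pages
                            * (requirement 8)
                            */
    };
    ```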


  • 19.  Re: [virtio-comment] [PATCH requirements 3/7] net-features: Add low latency receive queue requirements

    Posted 08-08-2023 08:33



  • 20.  Re: [virtio-comment] [PATCH requirements 3/7] net-features: Add low latency receive queue requirements

    Posted 08-08-2023 08:35



  • 21.  Re: [virtio-comment] [PATCH requirements 3/7] net-features: Add low latency receive queue requirements

    Posted 08-14-2023 11:55

    On Monday, 2023-07-24 at 06:34:17 +03, Parav Pandit wrote:
    > Add requirements for the low latency receive queue.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > changelog:
    > v0->v1:
    > - clarified the requirements further
    > - added line for the gro case
    > - added design goals as the motivation for the requirements
    > ---
    > net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
    > 1 file changed, 44 insertions(+), 1 deletion(-)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index eb95592..e04727a 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.
    >
    > # 2. Summary
    > 1. Device counters visible to the driver
    > -2. Low latency tx virtqueue for PCI transport
    > +2. Low latency tx and rx virtqueues for PCI transport
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -121,3 +121,46 @@ struct vnet_data_desc desc[2];
    >
    > 9. A flow filter virtqueue also similarly need the ability to inline the short flow
    > command header.
    > +
    > +### 3.2.2 Low latency rx virtqueue
    > +0. Design goal:
    > + a. Keep packet metadata and buffer data together which is consumed by driver
    > + layer and make it available in a single cache line of cpu
    > + b. Instead of having per packet descriptors which is complex to scale for
    > + the device, supply the page directly to the device to consume it based
    > + on packet size

    Really "per packet descriptor buffers"?

    > +1. The device should be able to write a packet receive completion that consists
    > + of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
    > + PCIe TLP.
    > +2. The device should be able to perform DMA writes of multiple packets
    > + completions in a single DMA transaction up to the PCIe maximum write limit
    > + in a transaction.
    > +3. The device should be able to zero pad packet write completion to align it to
    > + 64B or CPU cache line size whenever possible.
    > +4. An example of the above DMA completion structure:
    > +
    > +```
    > +/* Constant size receive packet completion */
    > +struct vnet_rx_completion {
    > + u16 flags;
    > + u16 id; /* buffer id */
    > + u8 gso_type;
    > + u8 reserved[3];
    > + le16 gso_hdr_len;
    > + le16 gso_size;
    > + le16 csum_start;
    > + le16 csum_offset;
    > + u16 reserved2;
    > + u64 timestamp; /* explained later */
    > + u8 padding[];
    > +};
    > +```
    > +5. The driver should be able to post constant-size buffer pages on a receive
    > + queue which can be consumed by the device for an incoming packet of any size
    > + from 64B to 9K bytes.
    > +6. The device should be able to know the constant buffer size at receive
    > + virtqueue level instead of per buffer level.
    > +7. The device should be able to indicate when a full page buffer is consumed,
    > + which can be recycled by the driver when the packets from the completed
    > + page is fully consumed.

    s/is full consumed/are fully consumed/

    > +8. The device should be able to consume multiple pages for a receive GSO stream.
    --
    So tap at my window, maybe I might let you in.



  • 23.  RE: [virtio-comment] [PATCH requirements 3/7] net-features: Add low latency receive queue requirements

    Posted 08-15-2023 04:45


    > From: David Edmondson <david.edmondson@oracle.com>
    > Sent: Monday, August 14, 2023 5:25 PM
    >
    > On Monday, 2023-07-24 at 06:34:17 +03, Parav Pandit wrote:
    > > Add requirements for the low latency receive queue.
    > >
    > > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > > ---
    > > changelog:
    > > v0->v1:
    > > - clarified the requirements further
    > > - added line for the gro case
    > > - added design goals as the motivation for the requirements
    > > ---
    > > net-workstream/features-1.4.md | 45
    > > +++++++++++++++++++++++++++++++++-
    > > 1 file changed, 44 insertions(+), 1 deletion(-)
    > >
    > > diff --git a/net-workstream/features-1.4.md
    > > b/net-workstream/features-1.4.md index eb95592..e04727a 100644
    > > --- a/net-workstream/features-1.4.md
    > > +++ b/net-workstream/features-1.4.md
    > > @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.
    > >
    > > # 2. Summary
    > > 1. Device counters visible to the driver -2. Low latency tx virtqueue
    > > for PCI transport
    > > +2. Low latency tx and rx virtqueues for PCI transport
    > >
    > > # 3. Requirements
    > > ## 3.1 Device counters
    > > @@ -121,3 +121,46 @@ struct vnet_data_desc desc[2];
    > >
    > > 9. A flow filter virtqueue also similarly need the ability to inline the short flow
    > > command header.
    > > +
    > > +### 3.2.2 Low latency rx virtqueue
    > > +0. Design goal:
    > > + a. Keep packet metadata and buffer data together which is consumed by
    > driver
    > > + layer and make it available in a single cache line of cpu
    > > + b. Instead of having per packet descriptors which is complex to scale for
    > > + the device, supply the page directly to the device to consume it based
    > > + on packet size
    >
    > Really "per packet descriptor buffers"?
    >
    Yes, this is what is done today with the packed and split queues, with and
    without mergeable buffers.
    Every 64B to 1500B packet consumes one descriptor (one descriptor per packet).

    Today the driver takes a page, splits it into descriptors, and the device ends
    up reassembling them in a tedious process.

    And _instead_ of doing that, a better scheme is to supply the page directly
    and let the device tell how much data was placed and where.
    So there is no segmentation and reassembly by the driver; only segmentation by
    the device.
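    As a sketch of that idea only, with entirely hypothetical structure and field
    names (nothing here is from the patch or the discussion), the page-granular
    scheme could look like:

    ```
    /* Illustration only: post whole pages, let the device report placement */
    struct vnet_rx_page_desc {
        u64 page_addr;     /* driver posts a whole page, no per packet slicing */
        le16 id;           /* page buffer id, returned in completions */
        u8 reserved[6];
    };

    struct vnet_rx_placement {
        le16 page_id;      /* which posted page holds this packet */
        le16 offset;       /* byte offset of the packet within that page */
        le32 len;          /* packet length written by the device */
    };
    ```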

    > > +1. The device should be able to write a packet receive completion that
    > consists
    > > + of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
    > > + PCIe TLP.
    > > +2. The device should be able to perform DMA writes of multiple packets
    > > + completions in a single DMA transaction up to the PCIe maximum write
    > limit
    > > + in a transaction.
    > > +3. The device should be able to zero pad packet write completion to align it
    > to
    > > + 64B or CPU cache line size whenever possible.
    > > +4. An example of the above DMA completion structure:
    > > +
    > > +```
    > > +/* Constant size receive packet completion */ struct
    > > +vnet_rx_completion {
    > > + u16 flags;
    > > + u16 id; /* buffer id */
    > > + u8 gso_type;
    > > + u8 reserved[3];
    > > + le16 gso_hdr_len;
    > > + le16 gso_size;
    > > + le16 csum_start;
    > > + le16 csum_offset;
    > > + u16 reserved2;
    > > + u64 timestamp; /* explained later */
    > > + u8 padding[];
    > > +};
    > > +```
    > > +5. The driver should be able to post constant-size buffer pages on a receive
    > > + queue which can be consumed by the device for an incoming packet of any
    > size
    > > + from 64B to 9K bytes.
    > > +6. The device should be able to know the constant buffer size at receive
    > > + virtqueue level instead of per buffer level.
    > > +7. The device should be able to indicate when a full page buffer is consumed,
    > > + which can be recycled by the driver when the packets from the completed
    > > + page is fully consumed.
    >
    > s/is full consumed/are fully consumed/
    >
    Ack. Will fix.



  • 25.  [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 07-24-2023 03:35
    Add tx and rx packet timestamp requirements.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
     net-workstream/features-1.4.md | 26 ++++++++++++++++++++++++++
     1 file changed, 26 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index d228462..37820b6 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -10,6 +10,7 @@ together is desired while updating the virtio net interface.
     2. Low latency tx and rx virtqueues for PCI transport
     3. Virtqueue notification coalescing re-arming support
     4 Virtqueue receive flow filters (RFF)
    +5. Device timestamp for tx and rx packets

     # 3. Requirements
     ## 3.1 Device counters
    @@ -280,3 +281,28 @@ struct virtio_net_rff_delete {
         u8 padding[2];
         le32 flow_id;
     };
    +
    +## 3.5 Packet timestamp
    +1. Device should provide transmit timestamp and receive timestamp of the packets
    +   at per packet level when the device is enabled.
    +2. Device should provide the current free running clock in the least latency
    +   possible using an MMIO register read of 64-bit to have the least jitter.
    +3. Device should provide the current frequency and the frequency unit for the
    +   software to synchronize the reference point of software and the device using
    +   a control vq command.
    +
    +### 3.5.1 Transmit timestamp
    +1. Transmit completion must contain a packet transmission timestamp when the
    +   device is enabled for it.
    +2. The device should record the packet transmit timestamp in the completion at
    +   the farthest egress point towards the network.
    +3. The device must provide a transmit packet timestamp in a single DMA
    +   transaction along with the rest of the transmit completion fields.
    +
    +### 3.5.2 Receive timestamp
    +1. Receive completion must contain a packet reception timestamp when the device
    +   is enabled for it.
    +2. The device should record the received packet timestamp at the closet ingress
    +   point of reception from the network.
    +3. The device should provide a receive packet timestamp in a single DMA
    +   transaction along with the rest of the receive completion fields.
    --
    2.26.2
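    As a sketch only of requirement 3 of section 3.5 (reporting the device clock
    frequency and unit over a control vq so software can correlate device and host
    time), one hypothetical reply layout is shown below; the names are
    illustrative assumptions, not proposed fields:

    ```
    /* Hypothetical control vq reply describing the device free running clock
     * (illustration only)
     */
    struct vnet_clock_info {
        le64 frequency;    /* clock ticks per frequency_unit */
        u8 frequency_unit; /* e.g. 0 = Hz, 1 = kHz, 2 = MHz */
        u8 reserved[7];
    };
    ```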


  • 26.  Re: [virtio] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-09-2023 08:36
    On Mon, 24 Jul 2023 06:34:20 +0300, Parav Pandit <parav@nvidia.com> wrote:
    > Add tx and rx packet timestamp requirements.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > net-workstream/features-1.4.md | 26 ++++++++++++++++++++++++++
    > 1 file changed, 26 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index d228462..37820b6 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -10,6 +10,7 @@ together is desired while updating the virtio net interface.
    > 2. Low latency tx and rx virtqueues for PCI transport
    > 3. Virtqueue notification coalescing re-arming support
    > 4 Virtqueue receive flow filters (RFF)
    > +5. Device timestamp for tx and rx packets
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -280,3 +281,28 @@ struct virtio_net_rff_delete {
    > u8 padding[2];
    > le32 flow_id;
    > };
    > +
    > +## 3.5 Packet timestamp
    > +1. Device should provide transmit timestamp and receive timestamp of the packets
    > + at per packet level when the device is enabled.
    > +2. Device should provide the current free running clock in the least latency
    > + possible using an MMIO register read of 64-bit to have the least jitter.
    > +3. Device should provide the current frequency and the frequency unit for the
    > + software to synchronize the reference point of software and the device using
    > + a control vq command.
    > +
    > +### 3.5.1 Transmit timestamp
    > +1. Transmit completion must contain a packet transmission timestamp when the
    > + device is enabled for it.
    > +2. The device should record the packet transmit timestamp in the completion at
    > + the farthest egress point towards the network.
    > +3. The device must provide a transmit packet timestamp in a single DMA
    > + transaction along with the rest of the transmit completion fields.
    > +
    > +### 3.5.2 Receive timestamp
    > +1. Receive completion must contain a packet reception timestamp when the device
    > + is enabled for it.
    > +2. The device should record the received packet timestamp at the closet ingress
    > + point of reception from the network.
    > +3. The device should provide a receive packet timestamp in a single DMA
    > + transaction along with the rest of the receive completion fields.


    According to the last discussion, the feature will depend on the new desc
    structure.

    I would like to know: can we introduce this to the current spec with a simple change?

    struct vring_used_elem {
    /* Index of start of used descriptor chain. */
    __virtio32 id;
    /* Total length of the descriptor chain which was used (written to) */
    __virtio32 len;

    + __virtio64 timestamp;
    };


    Then, the existing devices can support this easily. If we introduce this by the
    new desc structure, we can foresee that this function will not be implemented by
    many existing machines. But this function is useful. So we want to support
    this in a simple way.
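    As a sketch only, assuming the extended element above were gated by a new
    feature bit (called VIRTIO_F_USED_TIMESTAMP here purely for illustration, it
    is not an existing flag), the layout would be:

    ```
    /* Illustration only: extended used element, gated by a hypothetical
     * feature bit such as VIRTIO_F_USED_TIMESTAMP.
     */
    struct vring_used_elem_ts {
        __virtio32 id;
        __virtio32 len;
        __virtio64 timestamp; /* valid only when the feature is negotiated */
    };
    ```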


    Thanks.

    > --
    > 2.26.2
    >



  • 28.  Re: [virtio-comment] Re: [virtio] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-10-2023 06:56
    On Wed, Aug 9, 2023 at 4:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
    >
    > On Mon, 24 Jul 2023 06:34:20 +0300, Parav Pandit <parav@nvidia.com> wrote:
    > > Add tx and rx packet timestamp requirements.
    > >
    > > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > > ---
    > > net-workstream/features-1.4.md | 26 ++++++++++++++++++++++++++
    > > 1 file changed, 26 insertions(+)
    > >
    > > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > > index d228462..37820b6 100644
    > > --- a/net-workstream/features-1.4.md
    > > +++ b/net-workstream/features-1.4.md
    > > @@ -10,6 +10,7 @@ together is desired while updating the virtio net interface.
    > > 2. Low latency tx and rx virtqueues for PCI transport
    > > 3. Virtqueue notification coalescing re-arming support
    > > 4 Virtqueue receive flow filters (RFF)
    > > +5. Device timestamp for tx and rx packets
    > >
    > > # 3. Requirements
    > > ## 3.1 Device counters
    > > @@ -280,3 +281,28 @@ struct virtio_net_rff_delete {
    > > u8 padding[2];
    > > le32 flow_id;
    > > };
    > > +
    > > +## 3.5 Packet timestamp
    > > +1. Device should provide transmit timestamp and receive timestamp of the packets
    > > + at per packet level when the device is enabled.
    > > +2. Device should provide the current free running clock in the least latency
    > > + possible using an MMIO register read of 64-bit to have the least jitter.
    > > +3. Device should provide the current frequency and the frequency unit for the
    > > + software to synchronize the reference point of software and the device using
    > > + a control vq command.
    > > +
    > > +### 3.5.1 Transmit timestamp
    > > +1. Transmit completion must contain a packet transmission timestamp when the
    > > + device is enabled for it.
    > > +2. The device should record the packet transmit timestamp in the completion at
    > > + the farthest egress point towards the network.
    > > +3. The device must provide a transmit packet timestamp in a single DMA
    > > + transaction along with the rest of the transmit completion fields.
    > > +
    > > +### 3.5.2 Receive timestamp
    > > +1. Receive completion must contain a packet reception timestamp when the device
    > > + is enabled for it.
    > > +2. The device should record the received packet timestamp at the closest ingress
    > > + point of reception from the network.
    > > +3. The device should provide a receive packet timestamp in a single DMA
    > > + transaction along with the rest of the receive completion fields.
    >
    >
    > According to the last discuss, the feature will depend on the new desc
    > structure.
    >
    > I would to know, can we introduce this to the current spec with a simple change?
    >
    > struct vring_used_elem {
    > /* Index of start of used descriptor chain. */
    > __virtio32 id;
    > /* Total length of the descriptor chain which was used (written to) */
    > __virtio32 len;
    >
    > + __virtio64 timestamp;
    > };

    I think this could be one way, and another proposal from Willem is:

    https://lists.linuxfoundation.org/pipermail/virtualization/2021-February/052422.html

    which might be tricky for TX but it's more flexible since it allows
    timestamps to be done per buffer.

    >
    >
    > Then, the existing devices can support this easily. If we introduce this by the
    > new desc structure, we can foresee that this function will not be implemented by
    > many existing machines. But this function is useful. So we want support this by
    > a simple way.

    Makes sense.

    Thanks

    >
    >
    > Thanks.
    >
    > > --
    > > 2.26.2






  • 30.  RE: [virtio] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-14-2023 13:06


    > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    > Sent: Wednesday, August 9, 2023 2:06 PM
    > struct vring_used_elem {
    > /* Index of start of used descriptor chain. */
    > __virtio32 id;
    > /* Total length of the descriptor chain which was used (written to) */
    > __virtio32 len;
    >
    > + __virtio64 timestamp;
    > };
    >
    >
    > Then, the existing devices can support this easily. If we introduce this by the
    > new desc structure, we can foresee that this function will not be implemented
    > by many existing machines. But this function is useful. So we want support this
    > by a simple way.

    This only works for the split queue.
    The packed queue needs yet another format.
    And even after that we still have to live with the other limitations listed in the other requirements.

    Therefore it's better to define the new descriptor once, so that it can optionally support the timestamp using a single descriptor format.
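    For comparison, here is a hypothetical sketch (not from any spec or patch) of what a packed-ring used element would have to look like to carry an in-band timestamp; the struct and field names are assumptions, and the point is only that the existing 16-byte packed descriptor has no spare room, so a 64-bit timestamp forces a new, larger element format.

    ```
    /* Hypothetical sketch only -- not spec text. The packed ring reuses the same
     * 16-byte descriptor for the used entry, so an in-band timestamp grows the
     * element to 24 bytes, i.e. a new format. */
    struct vring_packed_used_elem_ts {
        __virtio64 addr;      /* ignored in the used entry, kept for layout */
        __virtio32 len;       /* bytes written by the device */
        __virtio16 id;        /* buffer id */
        __virtio16 flags;     /* AVAIL/USED wrap bits, etc. */
        __virtio64 timestamp; /* added field; pushes the element past 16 bytes */
    };
    ```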





  • 32.  Re: RE: [virtio] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-15-2023 02:47
    On Mon, 14 Aug 2023 13:06:03 +0000, Parav Pandit <parav@nvidia.com> wrote:
    >
    >
    > > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    > > Sent: Wednesday, August 9, 2023 2:06 PM
    > > struct vring_used_elem {
    > > /* Index of start of used descriptor chain. */
    > > __virtio32 id;
    > > /* Total length of the descriptor chain which was used (written to) */
    > > __virtio32 len;
    > >
    > > + __virtio64 timestamp;
    > > };
    > >
    > >
    > > Then, the existing devices can support this easily. If we introduce this by the
    > > new desc structure, we can foresee that this function will not be implemented
    > > by many existing machines. But this function is useful. So we want support this
    > > by a simple way.
    >
    > This only works for split q.

    YES.


    > Packed q needs yet another format.
    > And even after that we still have to live with other limitations listed in other requirements.

    I would like some simple changes for the tx timestamp.

    >
    > Therefore its better to do the new desc definition one time which enables to optionally support timestamp, using single descriptor format.


    I agree.

    But we can have both. In addition to your plans, I would like to introduce a
    simple method that can be used with existing machines.

    Thanks.





  • 34.  RE: RE: [virtio] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-15-2023 04:01


    > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    > Sent: Tuesday, August 15, 2023 8:17 AM
    >
    > > Packed q needs yet another format.
    > > And even after that we still have to live with other limitations listed in other
    > requirements.
    >
    > I would like some simple changes for the tx timestamp.
    >
    > >
    > > Therefore its better to do the new desc definition one time which enables to
    > optionally support timestamp, using single descriptor format.
    >
    >
    > I agree.
    >
    > But we can have too. In addition to your plans, I would like to introduce a simple
    > method that can be used with existing machines.
    >
    Can you please explain what an "existing machine" is?
    This is a change in the driver-device interface.
    So one needs a new driver and a device extension anyway.





  • 36.  Re: RE: RE: [virtio] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-15-2023 06:01
    On Tue, 15 Aug 2023 04:01:25 +0000, Parav Pandit <parav@nvidia.com> wrote:
    >
    >
    > > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    > > Sent: Tuesday, August 15, 2023 8:17 AM
    > >
    > > > Packed q needs yet another format.
    > > > And even after that we still have to live with other limitations listed in other
    > > requirements.
    > >
    > > I would like some simple changes for the tx timestamp.
    > >
    > > >
    > > > Therefore its better to do the new desc definition one time which enables to
    > > optionally support timestamp, using single descriptor format.
    > >
    > >
    > > I agree.
    > >
    > > But we can have too. In addition to your plans, I would like to introduce a simple
    > > method that can be used with existing machines.
    > >
    > Can you please explain what is a "existing machine"?
    > This is the change in driver device interface.
    > So one needs new driver and device extension anyway.


    The devices that are running now.


    If the change for the timestamp is small, then it will be easy for these devices
    to update.

    But the timestamp is so useful that we want to introduce this feature to these devices.

    I see your new descriptor format; for the existing devices, updating to it will be
    high risk, and the new format may not be needed for them.

    Thanks.









  • 38.  RE: RE: RE: [virtio] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-15-2023 06:10
    > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    > Sent: Tuesday, August 15, 2023 11:31 AM

    > > Can you please explain what is a "existing machine"?
    > > This is the change in driver device interface.
    > > So one needs new driver and device extension anyway.
    >
    >
    > The devices that are running now.
    >
    >
    > If the change for timestamp is small, then that will be easy for these devices to
    > update.
    >
    > But the timestamp is so usefull, we want introduce this feature to these devices.
    >
    So let's work towards getting the new descriptors done more quickly and prioritize that.

    > I see your new description, for the existing devices, that will be high risk to
    > update. And the new description maybe not be needed for they.

    Let's mitigate the risk by working towards the new descriptors that solve multiple problems.
    Otherwise it is all double work for all the layers: spec, device, driver and more.

    I agree that I slowed down in the last 1.5 months due to personal issues, but I am better now and can speed up the new format.





  • 40.  Re: RE: RE: RE: [virtio] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-15-2023 09:45
    On Tue, 15 Aug 2023 06:09:48 +0000, Parav Pandit <parav@nvidia.com> wrote:
    > > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    > > Sent: Tuesday, August 15, 2023 11:31 AM
    >
    > > > Can you please explain what is a "existing machine"?
    > > > This is the change in driver device interface.
    > > > So one needs new driver and device extension anyway.
    > >
    > >
    > > The devices that are running now.
    > >
    > >
    > > If the change for timestamp is small, then that will be easy for these devices to
    > > update.
    > >
    > > But the timestamp is so usefull, we want introduce this feature to these devices.
    > >
    > So lets work towards getting the new descriptors more quickly and prioritize it.
    >
    > > I see your new description, for the existing devices, that will be high risk to
    > > update. And the new description maybe not be needed for they.
    >
    > Lets mitigate the risk by working towards making the new descriptors that solves multiple problems.
    > Otherwise its all double work for all the layers from spec, device, driver and more.
    >
    > I agree that I got slowed down in last 1.5 months due to personal issues, but I am better now to speed up the new format.


    I am happy to hear that you are healthy.

    Thanks.







  • 42.  Re: [virtio-comment] [PATCH requirements 6/7] net-features: Add packet timestamp requirements

    Posted 08-14-2023 12:00

    On Monday, 2023-07-24 at 06:34:20 +03, Parav Pandit wrote:
    > Add tx and rx packet timestamp requirements.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>

    Acked-by: David Edmondson <david.edmondson@oracle.com>

    > ---
    > net-workstream/features-1.4.md | 26 ++++++++++++++++++++++++++
    > 1 file changed, 26 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index d228462..37820b6 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -10,6 +10,7 @@ together is desired while updating the virtio net interface.
    > 2. Low latency tx and rx virtqueues for PCI transport
    > 3. Virtqueue notification coalescing re-arming support
    > 4 Virtqueue receive flow filters (RFF)
    > +5. Device timestamp for tx and rx packets
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -280,3 +281,28 @@ struct virtio_net_rff_delete {
    > u8 padding[2];
    > le32 flow_id;
    > };
    > +
    > +## 3.5 Packet timestamp
    > +1. Device should provide transmit timestamp and receive timestamp of the packets
    > + at per packet level when the device is enabled.
    > +2. Device should provide the current free running clock in the least latency
    > + possible using an MMIO register read of 64-bit to have the least jitter.
    > +3. Device should provide the current frequency and the frequency unit for the
    > + software to synchronize the reference point of software and the device using
    > + a control vq command.
    > +
    > +### 3.5.1 Transmit timestamp
    > +1. Transmit completion must contain a packet transmission timestamp when the
    > + device is enabled for it.
    > +2. The device should record the packet transmit timestamp in the completion at
    > + the farthest egress point towards the network.
    > +3. The device must provide a transmit packet timestamp in a single DMA
    > + transaction along with the rest of the transmit completion fields.
    > +
    > +### 3.5.2 Receive timestamp
    > +1. Receive completion must contain a packet reception timestamp when the device
    > + is enabled for it.
    > +2. The device should record the received packet timestamp at the closest ingress
    > + point of reception from the network.
    > +3. The device should provide a receive packet timestamp in a single DMA
    > + transaction along with the rest of the receive completion fields.
    --
    Do not leave the building.





  • 44.  [PATCH requirements 7/7] net-features: Add header data split requirements

    Posted 07-24-2023 03:35
    Add header data split requirements for the receive packets.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
     net-workstream/features-1.4.md | 13 +++++++++++++
     1 file changed, 13 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 37820b6..a64e356 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -11,6 +11,7 @@ together is desired while updating the virtio net interface.
     3. Virtqueue notification coalescing re-arming support
     4 Virtqueue receive flow filters (RFF)
     5. Device timestamp for tx and rx packets
    +6. Header data split for the receive virtqueue

     # 3. Requirements
     ## 3.1 Device counters
    @@ -306,3 +307,15 @@ struct virtio_net_rff_delete {
        point of reception from the network.
     3. The device should provide a receive packet timestamp in a single DMA
        transaction along with the rest of the receive completion fields.
    +
    +## 3.6 Header data split for the receive virtqueue
    +1. The device should be able to DMA the packet header and data to two different
    +   memory locations, this enables driver and networking stack to perform zero
    +   copy to application buffer(s).
    +2. The driver should be able to configure maximum header buffer size per
    +   virtqueue.
    +3. The header buffer to be in a physically contiguous memory per virtqueue
    +4. The device should be able to indicate header data split in the receive
    +   completion.
    +5. The device should be able to zero pad the header buffer when the received
    +   header is shorter than cpu cache line size.
    --
    2.26.2
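    To make requirement 4 above ("indicate header data split in the receive completion") concrete, a purely illustrative completion layout is sketched below; none of these field names exist in the patch or the spec, they are assumptions.

    ```
    /* Illustrative only -- field names are assumptions, not spec text. */
    struct vnet_rx_completion_hds {
        __virtio16 flags;      /* e.g. a bit set when the split was performed */
        __virtio16 id;         /* completed buffer id */
        __virtio16 hdr_len;    /* header bytes written to the per-vq header buffer */
        __virtio16 hdr_offset; /* cache-line aligned offset within that buffer */
        __virtio32 data_len;   /* payload bytes written to the data buffer(s) */
    };
    ```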


  • 45.  RE: [EXT] [virtio] [PATCH requirements 7/7] net-features: Add header data split requirements

    Posted 08-10-2023 19:19
    Hi Parav

    >




  • 47.  Re: [virtio-comment] [PATCH requirements 7/7] net-features: Add header data split requirements

    Posted 08-14-2023 12:01

    On Monday, 2023-07-24 at 06:34:21 +03, Parav Pandit wrote:
    > Add header data split requirements for the receive packets.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > net-workstream/features-1.4.md | 13 +++++++++++++
    > 1 file changed, 13 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 37820b6..a64e356 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -11,6 +11,7 @@ together is desired while updating the virtio net interface.
    > 3. Virtqueue notification coalescing re-arming support
    > 4 Virtqueue receive flow filters (RFF)
    > 5. Device timestamp for tx and rx packets
    > +6. Header data split for the receive virtqueue
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -306,3 +307,15 @@ struct virtio_net_rff_delete {
    > point of reception from the network.
    > 3. The device should provide a receive packet timestamp in a single DMA
    > transaction along with the rest of the receive completion fields.
    > +
    > +## 3.6 Header data split for the receive virtqueue
    > +1. The device should be able to DMA the packet header and data to two different
    > + memory locations, this enables driver and networking stack to perform zero
    > + copy to application buffer(s).
    > +2. The driver should be able to configure maximum header buffer size per
    > + virtqueue.
    > +3. The header buffer to be in a physically contiguous memory per virtqueue
    > +4. The device should be able to indicate header data split in the receive
    > + completion.
    > +5. The device should be able to zero pad the header buffer when the received
    > + header is shorter than cpu cache line size.

    What's the use case for this (item 5)?
    --
    And now I know what every step is for.





  • 49.  Re: [virtio] Re: [virtio-comment] [PATCH requirements 7/7] net-features: Add header data split requirements

    Posted 08-14-2023 12:45
    On Mon, Aug 14, 2023 at 8:01 AM David Edmondson
    <david.edmondson@oracle.com> wrote:
    >
    >
    > On Monday, 2023-07-24 at 06:34:21 +03, Parav Pandit wrote:
    > > Add header data split requirements for the receive packets.
    > >
    > > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > > ---
    > > net-workstream/features-1.4.md | 13 +++++++++++++
    > > 1 file changed, 13 insertions(+)
    > >
    > > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > > index 37820b6..a64e356 100644
    > > --- a/net-workstream/features-1.4.md
    > > +++ b/net-workstream/features-1.4.md
    > > @@ -11,6 +11,7 @@ together is desired while updating the virtio net interface.
    > > 3. Virtqueue notification coalescing re-arming support
    > > 4 Virtqueue receive flow filters (RFF)
    > > 5. Device timestamp for tx and rx packets
    > > +6. Header data split for the receive virtqueue
    > >
    > > # 3. Requirements
    > > ## 3.1 Device counters
    > > @@ -306,3 +307,15 @@ struct virtio_net_rff_delete {
    > > point of reception from the network.
    > > 3. The device should provide a receive packet timestamp in a single DMA
    > > transaction along with the rest of the receive completion fields.
    > > +
    > > +## 3.6 Header data split for the receive virtqueue
    > > +1. The device should be able to DMA the packet header and data to two different
    > > + memory locations, this enables driver and networking stack to perform zero
    > > + copy to application buffer(s).
    > > +2. The driver should be able to configure maximum header buffer size per
    > > + virtqueue.
    > > +3. The header buffer to be in a physically contiguous memory per virtqueue
    > > +4. The device should be able to indicate header data split in the receive
    > > + completion.
    > > +5. The device should be able to zero pad the header buffer when the received
    > > + header is shorter than cpu cache line size.
    >
    > What's the use case for this (item 5)?

    Without zero padding, each header write results in a read-modify-write,
    possibly over PCIe. That can significantly depress throughput.


  • 50.  Re: [virtio] Re: [virtio-comment] [PATCH requirements 7/7] net-features: Add header data split requirements

    Posted 08-14-2023 13:10

    On Monday, 2023-08-14 at 08:44:11 -04, Willem de Bruijn wrote:
    > On Mon, Aug 14, 2023 at 8:01 AM David Edmondson
    > <david.edmondson@oracle.com> wrote:
    >>
    >>
    >> On Monday, 2023-07-24 at 06:34:21 +03, Parav Pandit wrote:
    >> > Add header data split requirements for the receive packets.
    >> >
    >> > Signed-off-by: Parav Pandit <parav@nvidia.com>
    >> > ---
    >> > net-workstream/features-1.4.md | 13 +++++++++++++
    >> > 1 file changed, 13 insertions(+)
    >> >
    >> > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    >> > index 37820b6..a64e356 100644
    >> > --- a/net-workstream/features-1.4.md
    >> > +++ b/net-workstream/features-1.4.md
    >> > @@ -11,6 +11,7 @@ together is desired while updating the virtio net interface.
    >> > 3. Virtqueue notification coalescing re-arming support
    >> > 4 Virtqueue receive flow filters (RFF)
    >> > 5. Device timestamp for tx and rx packets
    >> > +6. Header data split for the receive virtqueue
    >> >
    >> > # 3. Requirements
    >> > ## 3.1 Device counters
    >> > @@ -306,3 +307,15 @@ struct virtio_net_rff_delete {
    >> > point of reception from the network.
    >> > 3. The device should provide a receive packet timestamp in a single DMA
    >> > transaction along with the rest of the receive completion fields.
    >> > +
    >> > +## 3.6 Header data split for the receive virtqueue
    >> > +1. The device should be able to DMA the packet header and data to two different
    >> > + memory locations, this enables driver and networking stack to perform zero
    >> > + copy to application buffer(s).
    >> > +2. The driver should be able to configure maximum header buffer size per
    >> > + virtqueue.
    >> > +3. The header buffer to be in a physically contiguous memory per virtqueue
    >> > +4. The device should be able to indicate header data split in the receive
    >> > + completion.
    >> > +5. The device should be able to zero pad the header buffer when the received
    >> > + header is shorter than cpu cache line size.
    >>
    >> What's the use case for this (item 5)?
    >
    > Without zero padding, each header write results in a
    > read-modify-write, possibly over PCIe. That can significantly depress
    > throughput.

    Understood. So the padding could be anything; we just want to write a
    full cache line.
    --
    Woke up in my clothes again this morning, don't know exactly where I am.





  • 52.  RE: [virtio] Re: [virtio-comment] [PATCH requirements 7/7] net-features: Add header data split requirements

    Posted 08-14-2023 13:29


    > From: David Edmondson <david.edmondson@oracle.com>
    > Sent: Monday, August 14, 2023 6:40 PM

    > >> > +5. The device should be able to zero pad the header buffer when the
    > received
    > >> > + header is shorter than cpu cache line size.
    > >>
    > >> What's the use case for this (item 5)?
    > >
    > > Without zero padding, each header write results in a
    > > read-modify-write, possibly over PCIe. That can significantly depress
    > > throughput.
    >
    > Understood. So it could be anything padding, we just want to write a full cache
    > line.
    Yes. If the data descriptor is partial, it needs to be zero filled too.
    I will double check whether I have already covered this.

    Still catching up after a break.





  • 54.  Re: [virtio] Re: [virtio-comment] [PATCH requirements 7/7] net-features: Add header data split requirements

    Posted 08-14-2023 13:56

    On Monday, 2023-08-14 at 13:28:52 UTC, Parav Pandit wrote:
    >> From: David Edmondson <david.edmondson@oracle.com>
    >> Sent: Monday, August 14, 2023 6:40 PM
    >
    >> >> > +5. The device should be able to zero pad the header buffer when the
    >> received
    >> >> > + header is shorter than cpu cache line size.
    >> >>
    >> >> What's the use case for this (item 5)?
    >> >
    >> > Without zero padding, each header write results in a
    >> > read-modify-write, possibly over PCIe. That can significantly depress
    >> > throughput.
    >>
    >> Understood. So it could be anything padding, we just want to write a full cache
    >> line.
    > Yes. if the data descriptor is partial, need to zero fill too.
    > I will double check if I covered already or not.

    Perhaps the requirement, then, should be that the device is permitted
    (and encouraged) to write full cache lines, with an aside that zero
    padding should be used to achieve this.
    --
    No proper time of day.





  • 56.  RE: [virtio] Re: [virtio-comment] [PATCH requirements 7/7] net-features: Add header data split requirements

    Posted 08-15-2023 04:42

    > From: David Edmondson <david.edmondson@oracle.com>
    > Sent: Monday, August 14, 2023 7:26 PM

    > Perhaps the requirement, then, should be that the device is permitted (and
    > encouraged) to write full cache lines, with an aside that zero padding should be
    > used to achieve this.

    Yes, we can relax this.
    I think the point, from the driver-device interface point of view, is to communicate these
    details and to use an aligned offset within the header buffer for subsequent packet entries,
    so that the driver knows where to expect the next packet's header (at a cache-aligned address).
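    A small sketch of that driver-side calculation, assuming a hypothetical 64-byte cache line and the layout described above (headers placed at cache-line aligned offsets, with the device zero padding the tail of each header):

    ```
    #include <stdint.h>

    #define CACHE_LINE_SIZE 64u   /* assumption for illustration */

    static inline uint32_t align_up(uint32_t v, uint32_t a)
    {
        return (v + a - 1) & ~(a - 1);
    }

    /* Where the driver expects the next packet's header to start, given the
     * current header's offset and length within the per-vq header buffer. */
    static uint32_t next_header_offset(uint32_t cur_offset, uint32_t hdr_len)
    {
        return cur_offset + align_up(hdr_len, CACHE_LINE_SIZE);
    }
    ```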





  • 58.  [PATCH requirements 4/7] net-features: Add notification coalescing requirements

    Posted 07-24-2023 03:35
    Add virtio net device notification coalescing improvements requirements.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    changelog:
    v1->v2:
    - addressed comments from Stefan
    - redrafted the requirements to use rearm term and avoid queue enable
      confusion
    v0->v1:
    - updated the description
    ---
     net-workstream/features-1.4.md | 11 +++++++++++
     1 file changed, 11 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index e04727a..27a7886 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -8,6 +8,7 @@ together is desired while updating the virtio net interface.
     # 2. Summary
     1. Device counters visible to the driver
     2. Low latency tx and rx virtqueues for PCI transport
    +3. Virtqueue notification coalescing re-arming support

     # 3. Requirements
     ## 3.1 Device counters
    @@ -164,3 +165,13 @@ struct vnet_rx_completion {
        which can be recycled by the driver when the packets from the completed
        page is fully consumed.
     8. The device should be able to consume multiple pages for a receive GSO stream.
    +
    +## 3.3 Virtqueue notification coalescing re-arming support
    +0. Design goal:
    +   a. Avoid constant notifications from the device even in conditions when
    +      the driver may not have acted on the previous pending notification.
    +1. When Tx and Rx virtqueue notification coalescing is enabled, and when such
    +   a notification is reported by the device, the device stops sending further
    +   notifications until the driver rearms the notifications of the virtqueue.
    +2. When the driver rearms the notification of the virtqueue, the device
    +   to notify again if notification coalescing conditions are met.
    --
    2.26.2
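    As an illustration of the re-arm flow in section 3.3 above, a minimal driver-side sketch follows; the vq_* helper names are assumptions, not existing driver or spec APIs.

    ```
    struct vq;                                /* opaque virtqueue handle */
    void vq_poll_completions(struct vq *vq);  /* assumed helpers, for illustration */
    void vq_rearm_notify(struct vq *vq);
    int  vq_has_pending(const struct vq *vq);

    /* After a coalesced notification the device stays silent until the driver
     * explicitly re-arms the virtqueue. */
    void vq_notification_handler(struct vq *vq)
    {
        for (;;) {
            vq_poll_completions(vq);  /* drain the work that triggered the event */

            vq_rearm_notify(vq);      /* allow the device to notify again */

            if (!vq_has_pending(vq))  /* nothing raced in while re-arming: done */
                break;
            /* New work arrived between draining and re-arming; process it now
             * rather than waiting for another (coalesced) notification. */
        }
    }
    ```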


  • 59.  Re: [virtio-comment] [PATCH requirements 4/7] net-features: Add notification coalescing requirements

    Posted 08-14-2023 11:57

    On Monday, 2023-07-24 at 06:34:18 +03, Parav Pandit wrote:
    > Add virtio net device notification coalescing improvements requirements.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>

    Acked-by: David Edmondson <david.edmondson@oracle.com>

    > ---
    > changelog:
    > v1->v2:
    > - addressed comments from Stefan
    > - redrafted the requirements to use rearm term and avoid queue enable
    > confusion
    > v0->v1:
    > - updated the description
    > ---
    > net-workstream/features-1.4.md | 11 +++++++++++
    > 1 file changed, 11 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index e04727a..27a7886 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -8,6 +8,7 @@ together is desired while updating the virtio net interface.
    > # 2. Summary
    > 1. Device counters visible to the driver
    > 2. Low latency tx and rx virtqueues for PCI transport
    > +3. Virtqueue notification coalescing re-arming support
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -164,3 +165,13 @@ struct vnet_rx_completion {
    > which can be recycled by the driver when the packets from the completed
    > page is fully consumed.
    > 8. The device should be able to consume multiple pages for a receive GSO stream.
    > +
    > +## 3.3 Virtqueue notification coalescing re-arming support
    > +0. Design goal:
    > + a. Avoid constant notifications from the device even in conditions when
    > + the driver may not have acted on the previous pending notification.
    > +1. When Tx and Rx virtqueue notification coalescing is enabled, and when such
    > + a notification is reported by the device, the device stops sending further
    > + notifications until the driver rearms the notifications of the virtqueue.
    > +2. When the driver rearms the notification of the virtqueue, the device
    > + to notify again if notification coalescing conditions are met.
    --
    You know your green from your red.





  • 61.  [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 07-24-2023 03:35
    Add virtio net device requirements for receive flow filters.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    changelog:
    v1->v2:
    - split setup and operations requirements
    - added design goal
    - worded requirements more precisely
    v0->v1:
    - fixed comments from Heng Li
    - renamed receive flow steering to receive flow filters
    - clarified byte offset in match criteria
    ---
     net-workstream/features-1.4.md | 105 +++++++++++++++++++++++++++++++++
     1 file changed, 105 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 27a7886..d228462 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
     1. Device counters visible to the driver
     2. Low latency tx and rx virtqueues for PCI transport
     3. Virtqueue notification coalescing re-arming support
    +4. Virtqueue receive flow filters (RFF)

     # 3. Requirements
     ## 3.1 Device counters
    @@ -175,3 +176,107 @@ struct vnet_rx_completion {
        notifications until the driver rearms the notifications of the virtqueue.
     2. When the driver rearms the notification of the virtqueue, the device to
        notify again if notification coalescing conditions are met.
    +
    +## 3.4 Virtqueue receive flow filters (RFF)
    +0. Design goal:
    +   To filter and/or to steer packets based on a specific pattern match to a
    +   specific context, to support application/networking stack driven receive
    +   processing.
    +1. Two use cases are: to support the Linux netdev set_rxnfc() interface for
    +   ETHTOOL_SRXCLSRLINS and to support the netdev feature NETIF_F_NTUPLE,
    +   aka ARFS.
    +
    +### 3.4.1 control path
    +1. The number of flow filter operations/sec can range from 100k/sec to 1M/sec
    +   or even more. Hence flow filter operations must be done over a queueing
    +   interface using one or more queues.
    +2. The device should be able to expose the supported flow filter queue count
    +   (one or more) and the starting vq index to the driver.
    +3. As each device may be operating with different performance characteristics,
    +   the start vq index and count may differ for each device. Secondly, it is
    +   inefficient for the device to provide flow filter capabilities via a config
    +   space region. Hence, the device should be able to share these attributes
    +   using a DMA interface, instead of transport registers.
    +4. Since flow filters are enabled much later in the driver life cycle, the
    +   driver will likely create these queues when flow filters are enabled.
    +5. Flow filter operations are often accelerated by the device in hardware.
    +   The ability to handle them on a queue other than the control vq is desired.
    +   This achieves near zero modifications to existing implementations to add
    +   new operations on new purpose-built queues (similar to transmit and
    +   receive queues).
    +6. The filter masks are optional; the device should be able to expose whether
    +   it supports filter masks.
    +7. The driver may want to have priority among groups of flow entries; to
    +   facilitate this, the device should support grouping flow filter entries by
    +   a notion of a group. Each group defines a priority in flow processing.
    +8. The driver and the group owner driver should be able to query the
    +   supported device limits for flow filter entries.
    +
    +### 3.4.2 flow operations path
    +1. The driver should be able to define a receive packet match criteria, an
    +   action and a destination for a packet. For example, an ipv4 packet with a
    +   multicast address to be steered to the receive vq 0. The second example is
    +   an ipv4, tcp packet matching a specified IP address and tcp port tuple to
    +   be steered to receive vq 10.
    +2. The match criteria should include well-defined exact tuple fields such as
    +   mac address, IP addresses, tcp/udp ports, etc.
    +3. The match criteria should also optionally include the field mask.
    +4. The match criteria may optionally also include a specific packet byte
    +   offset, match length, mask and matching pattern, which may not be defined
    +   in the standard RFCs.
    +5. Action includes (a) dropping or (b) forwarding the packet.
    +6. Destination is a receive virtqueue index.
    +7. The device should process packet receive filters programmed via control vq
    +   commands first in the processing chain.
    +8. The device should process RFF entries before the RSS configuration, i.e.,
    +   when there is a miss on the RFF entries, the RSS configuration applies if
    +   it exists.
    +9. To summarize, the processing chain for an rx packet is:
    +   {mac,vlan,promisc rx filters} -> {receive flow filters} -> {rss/hash config}.
    +10. If multiple entries that have overlapping attributes for a received packet
    +    are programmed, the driver defines the location/priority of each entry.
    +11. The filter entries are usually short, a few tens of bytes in size (for
    +    example an IPv6 + TCP tuple would be 36 bytes), and the ops/sec rate is
    +    high; hence supplying the fields inside the queue descriptor is preferred
    +    for up to a certain fixed size, say 56 bytes.
    +12. A flow filter entry consists of (a) match criteria, (b) action,
    +    (c) destination and (d) a unique 32 bit flow id, all supplied by the
    +    driver.
    +13. The driver should be able to query and delete a flow filter entry from
    +    the device by the flow id.
    +
    +### 3.4.3 interface example
    +
    +Flow filter capabilities to query using a DMA interface:
    +
    +```
    +struct flow_filter_capabilities {
    +    u8 flow_groups;
    +    u16 num_flow_filter_vqs;
    +    u16 start_vq_index;
    +    u32 max_flow_filters_per_group;
    +    u32 max_flow_filters;
    +    u64 supported_packet_field_mask_bmap[4];
    +};
    +```
    +
    +1. Flow filter entry add/modify:
    +
    +```
    +struct virtio_net_rff_add_modify {
    +    u8 flow_op;
    +    u8 group_id;
    +    u8 padding[2];
    +    le32 flow_id;
    +    struct match_criteria mc;
    +    struct destination dest;
    +    struct action action;
    +
    +    struct match_criteria mask; /* optional */
    +};
    +```
    +
    +2. Flow filter entry delete:
    +
    +```
    +struct virtio_net_rff_delete {
    +    u8 flow_op;
    +    u8 group_id;
    +    u8 padding[2];
    +    le32 flow_id;
    +};
    +```
    --
    2.26.2
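    As a reading aid for the interface example above: the add/modify command references match_criteria, destination and action without defining them. A minimal sketch of what these sub-structures could look like is shown below; all field names and sizes are illustrative assumptions, not part of the proposal. The optional mask would reuse the match_criteria layout, with bits set only for the fields to be compared.

    ```
    #include <stdint.h>

    typedef uint8_t  u8;
    typedef uint16_t le16;  /* little-endian handling omitted in this sketch */
    typedef uint32_t le32;

    struct match_criteria {   /* exact-match tuple fields */
        u8   dst_mac[6];
        u8   src_mac[6];
        le32 src_ip;          /* IPv4 for brevity; IPv6 would widen these fields */
        le32 dst_ip;
        le16 src_port;
        le16 dst_port;
        u8   ip_proto;        /* e.g. TCP or UDP */
        u8   reserved[3];
    };

    struct action {
        u8 type;              /* 0 = drop, 1 = forward to the destination */
        u8 reserved[3];
    };

    struct destination {
        le16 rx_vq_index;     /* receive virtqueue to steer the packet to */
        le16 reserved;
    };
    ```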


  • 62.  RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-01-2023 08:34
    Hi Michael and all,

    > From: Parav Pandit <parav@nvidia.com>
    > Sent: Monday, July 24, 2023 9:04 AM
    >
    > Add virtio net device requirements for receive flow filters.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>

    Do you have any further comments on it?

    Heng and I want to make progress now and start the design/spec draft for it on 3rd Aug.
    The flow filter command requires a packed VQ descriptor extension.
    So we need to wrap up now, because this infrastructure extension pre-work involves significant effort.

    For now, we are taking the design to the next stage for these steering requirements.
    The packed vq extension will benefit the low latency tx part as well, as reviewed by Stefan.



  • 64.  RE: [EXT] [virtio] [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-02-2023 07:18
    Hi Parav

    >


  • 66.  Re: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-02-2023 15:25
    On Mon, Jul 24, 2023 at 06:34:19AM +0300, Parav Pandit wrote:
    > Add virtio net device requirements for receive flow filters.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > changelog:
    > v1->v2:
    > - split setup and operations requirements
    > - added design goal
    > - worded requirements more precisely
    > v0->v1:
    > - fixed comments from Heng Li
    > - renamed receive flow steering to receive flow filters
    > - clarified byte offset in match criteria
    > ---
    > net-workstream/features-1.4.md | 105 +++++++++++++++++++++++++++++++++
    > 1 file changed, 105 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 27a7886..d228462 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
    > 1. Device counters visible to the driver
    > 2. Low latency tx and rx virtqueues for PCI transport
    > 3. Virtqueue notification coalescing re-arming support
    > +4 Virtqueue receive flow filters (RFF)
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -175,3 +176,107 @@ struct vnet_rx_completion {
    > notifications until the driver rearms the notifications of the virtqueue.
    > 2. When the driver rearms the notification of the virtqueue, the device
    > to notify again if notification coalescing conditions are met.
    > +
    > +## 3.4 Virtqueue receive flow filters (RFF)
    > +0. Design goal:
    > + To filter and/or to steer packet based on specific pattern match to a
    > + specific context to support application/networking stack driven receive
    > + processing.
    > +1. Two use cases are: to support Linux netdev set_rxnfc() for ETHTOOL_SRXCLSRLINS
    > + and to support netdev feature NETIF_F_NTUPLE aka ARFS.

    Hi, Parav. Sorry for not responding to this in time due to other things recently.

    Yes, RFF has two scenarios, set_rxnfc and ARFS, both of which will affect the packet steering on the device side.
    I think manually configured rules should have higher priority than ARFS automatic configuration.
    This behavior is intuitive and consistent with other drivers. Therefore, the processing chain for an rx packet is:
    {mac,vlan,promisc rx filters} -> {set_rxnfc} -> {ARFS} -> {rss/hash config}.
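    (For illustration, the ordering above can be written as the following stage list; the enum and its names are shorthand for this discussion, not spec identifiers.)

    ```
    enum rx_steering_stage {
        RX_STAGE_MAC_VLAN_PROMISC_FILTERS = 0, /* existing rx filters, applied first */
        RX_STAGE_RFF_SET_RXNFC,                /* manually configured flow rules */
        RX_STAGE_RFF_ARFS,                     /* automatically configured flow rules */
        RX_STAGE_RSS_HASH_CONFIG               /* applied only when no flow rule matches */
    };
    ```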

    There are also priorities within set_rxnfc and ARFS respectively.
    1. set_rxnfc has both exact matches and mask matches. Exact matches should have higher priority.
    Suppose there are two rules,
    rule1: {"tcpv4", "src-ip: 1.1.1.1"} -> rxq1
    rule2: {"tcpv4", "src-ip: 1.1.1.1", "dst-port: 8989"} -> rxq2
    Received rx packets whose src-ip is 1.1.1.1 should match rule2 instead of rule1.
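    To make the priority point concrete, a minimal sketch of how a driver could encode the two rules above with an explicit per-rule priority is shown below; the rff_rule structure and the priority convention are illustrative assumptions only.

    ```
    #include <stdint.h>

    /* Hypothetical rule encoding for this discussion; not part of the proposal. */
    struct rff_rule {
        uint32_t src_ip;    /* source IPv4 address to match */
        uint16_t dst_port;  /* 0 means "don't care" in this sketch */
        uint16_t rx_vq;     /* destination receive virtqueue */
        uint8_t  priority;  /* assumed convention: higher value is matched first */
    };

    /* rule2 is more specific (it adds dst-port), so the driver gives it the
     * higher priority; a packet with src-ip 1.1.1.1 and dst-port 8989 then
     * lands on rxq2, while other packets from 1.1.1.1 land on rxq1. */
    static const struct rff_rule rule1 = {
        .src_ip = 0x01010101u, .dst_port = 0, .rx_vq = 1, .priority = 1
    };
    static const struct rff_rule rule2 = {
        .src_ip = 0x01010101u, .dst_port = 8989, .rx_vq = 2, .priority = 2
    };
    ```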

    The rules of set_rxnfc come from manual configuration; the number of these rules is small, so
    we may not need grouping for them, and the ctrlq can meet the configuration rate.

    2. ARFS only has exact matches.
    For ARFS, since there is only one matching rule for a given flow, there is no need for a group?
    We may need different types of tables, such as a UDPv4 flow table and a TCPv4 flow table, to speed up the lookup for different flow types.
    Besides, the high rate and large number of configured rules mean that we need a flow vq.

    Therefore, although set_rxnfc and ARFS share a set of infrastructure, there are still some differences,
    such as configuration rate and quantity. So do we need to add two features (VIRTIO_NET_F_RXNFC and VIRTIO_NET_F_ARFS)
    for set_rxnfc and ARFS respectively, with ARFS able to choose the flow vq?
    Would that be more conducive to advancing the work on RFF (for example, accelerating set_rxnfc)?

    > +
    > +### 3.4.1 control path
    > +1. The number of flow filter operations/sec can range from 100k/sec to 1M/sec
    > + or even more. Hence flow filter operations must be done over a queueing
    > + interface using one or more queues.

    This is only for ARFS; devices that only want to support set_rxnfc
    would not offer VIRTIO_NET_F_ARFS and would not need to consider implementing a flow vq.

    > +2. The device should be able to expose one or more supported flow filter queue
    > + count and its start vq index to the driver.
    > +3. As each device may be operating for different performance characteristic,
    > + start vq index and count may be different for each device. Secondly, it is
    > + inefficient for device to provide flow filters capabilities via a config space
    > + region. Hence, the device should be able to share these attributes using
    > + dma interface, instead of transport registers.
    > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > + will likely create these queues when flow filters are enabled.

    I understand that the number of flow vqs is not reflected in
    max_virtqueue_pairs. And a new vq would be created at runtime; is this
    supported in the existing virtio spec?

    > +5. Flow filter operations are often accelerated by device in a hardware. Ability
    > + to handle them on a queue other than control vq is desired. This achieves near
    > + zero modifications to existing implementations to add new operations on new
    > + purpose built queues (similar to transmit and receive queue).
    > +6. The filter masks are optional; the device should be able to expose if it
    > + support filter masks.
    > +7. The driver may want to have priority among group of flow entries; to facilitate
    > + the device support grouping flow filter entries by a notion of a group. Each
    > + group defines priority in processing flow.
    > +8. The driver and group owner driver should be able to query supported device
    > + limits for the flow filter entries.
    > +
    > +### 3.4.2 flow operations path
    > +1. The driver should be able to define a receive packet match criteria, an
    > + action and a destination for a packet.

    When the user does not specify a destination when configuring a rule, do
    we need a default destination?

    > For example, an ipv4 packet with a
    > + multicast address to be steered to the receive vq 0. The second example is
    > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > + be steered to receive vq 10.
    > +2. The match criteria should include exact tuple fields well-defined such as mac
    > + address, IP addresses, tcp/udp ports, etc.
    > +3. The match criteria should also optionally include the field mask.
    > +4. The match criteria may optionally also include specific packet byte offset
    > + pattern, match length, mask instead of RFC defined fields.
    > + length, and matching pattern, which may not be defined in the standard RFC.

    Is there a description error here?

    > +5. Action includes (a) dropping or (b) forwarding the packet.
    > +6. Destination is a receive virtqueue index.

    Since the concept of an RSS context does not yet exist in the virtio spec,
    did we say that we also support carrying RSS context information when
    negotiating the RFF feature? For example, RSS context configuration
    commands and structures, etc.

    Or should RSS context functionality be supported as a separate feature in another thread?

    A related point to consider is that when a user inserts a rule with an
    rss context, the RSS context cannot be deleted, otherwise the device
    will cause undefined behavior.

    Thanks!

    > +7. The device should process packet receive filters programmed via control vq
    > + commands first in the processing chain.
    > +7. The device should process RFF entries before RSS configuration, i.e.,
    > + when there is a miss on the RFF entry, RSS configuration applies if it exists.
    > +8. To summarize the processing chain on a rx packet is:
    > + {mac,vlan,promisc rx filters} -> {receive flow filters} -> {rss/hash config}.
    > +9. If multiple entries are programmed which has overlapping attributes for a
    > + received packet, the driver to define the location/priority of the entry.
    > +10. The filter entries are usually short in size of few tens of bytes,
    > + for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    > + high, hence supplying fields inside the queue descriptor is preferred for
    > + up to a certain fixed size, say 56 bytes.
    > +11. A flow filter entry consists of (a) match criteria, (b) action,
    > + (c) destination and (d) a unique 32 bit flow id, all supplied by the
    > + driver.
    > +12. The driver should be able to query and delete flow filter entry by the
    > + the device by the flow id.
    > +
    > +### 3.4.3 interface example
    > +
    > +Flow filter capabilities to query using a DMA interface:
    > +
    > +```
    > +struct flow_filter_capabilities {
    > + u8 flow_groups;
    > + u16 num_flow_filter_vqs;
    > + u16 start_vq_index;
    > + u32 max_flow_filters_per_group;
    > + u32 max_flow_filters;
    > + u64 supported_packet_field_mask_bmap[4];
    > +};
    > +
    > +
    > +```
    > +
    > +1. Flow filter entry add/modify, delete:
    > +
    > +struct virtio_net_rff_add_modify {
    > + u8 flow_op;
    > + u8 group_id;
    > + u8 padding[2];
    > + le32 flow_id;
    > + struct match_criteria mc;
    > + struct destination dest;
    > + struct action action;
    > +
    > + struct match_criteria mask; /* optional */
    > +};
    > +
    > +2. Flow filter entry delete:
    > +struct virtio_net_rff_delete {
    > + u8 flow_op;
    > + u8 group_id;
    > + u8 padding[2];
    > + le32 flow_id;
    > +};
    > --
    > 2.26.2



  • 68.  RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-03-2023 10:00

    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Wednesday, August 2, 2023 8:55 PM

    > Hi, Parav. Sorry for not responding to this in time due to other things recently.
    >
    > Yes, RFF has two scenarios, set_rxnfc and ARFS, both of which will affect the
    > packet steering on the device side.
    > I think manually configured rules should have higher priority than ARFS
    > automatic configuration.
    > This behavior is intuitive and consistent with other drivers. Therefore, the
    > processing chain on a rx packet is:
    > {mac,vlan,promisc rx filters} -> {set_rxnfc} -> {ARFS} -> {rss/hash config}.
    >
    Correct.
    Within the RFF context, the priority among multiple RFF entries is governed by the concept of a group.
    So the above two users of RFF will create two groups, assign a priority to each, and achieve the desired processing order.

    > There are also priorities within set_rxnfc and ARFS respectively.
    > 1. For set_rxnfc, which has the exact match and the mask match. Exact
    > matches should have higher priority.
    > Suppose there are two rules,
    > rule1: {"tcpv4", "src-ip: 1.1.1.1"} -> rxq1
    > rule2: {"tcpv4", "src-ip: 1.1.1.1", "dst-port: 8989"} -> rxq2 .For recieved
    > rx packets whose src-ip is 1.1.1.1, should match rule2 instead of rule1.
    >
    Yes. The driver should be able to set the priority within the group as well for the above scenario.

    > The rules of set_rxnfc come from manual configuration, the number of these
    > rules is small and we may not need group grouping for this. And ctrlq can meet
    > the configuration rate,
    >
    Yes, but having a single interface for the two use cases means the device implementation does not have to build driver-interface-specific infrastructure.
    Both can be handled by a unified interface.

    > 2. For ARFS, which only has the exact match.
    > For ARFS, since there is only one matching rule for a certain flow, so there is no
    > need for group?
    Groups define the priority between the two types of rules.
    Within the ARFS domain we don't need a group.

    However, instead of starting with a hard limit of two groups, it is better to have some flexibility to support multiple groups.
    A device can define one, two, or more groups.
    So if a use case arises in the future, the interface won't limit it.
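    A minimal sketch of the group idea described above, assuming a hypothetical group descriptor; the field names and the lower-value-wins priority convention are illustrative assumptions, not proposed spec text:

    ```
    #include <stdint.h>

    /* Hypothetical group descriptor; illustrative only. */
    struct rff_group {
        uint8_t  group_id;
        uint8_t  priority;     /* assumed convention: lower value is evaluated first */
        uint32_t max_entries;  /* per-group limit exposed by the device */
    };

    /* One possible arrangement for the two users discussed above: manually
     * configured set_rxnfc rules are evaluated before ARFS-installed rules. */
    static const struct rff_group rxnfc_group = { .group_id = 0, .priority = 0, .max_entries = 256 };
    static const struct rff_group arfs_group  = { .group_id = 1, .priority = 1, .max_entries = 32768 };
    ```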

    > We may need different types of tables, such as UDPv4 flow table, TCPv4 flow
    > table to speed up the lookup for differect flow types.
    > Besides, the high rate and large number of configuration rules means that we
    > need flow vq.
    >
    Yes, but I am not sure those tables should be exposed to the driver.
    I am thinking that the device may be able to decide on the number of tables it can create.

    > Therefore, although set_rxnfc and ARFS share a set of infrastructure, there are
    > still some differences, such as configuration rate and quantity. So do we need
    > add two features (VIRTIO_NET_F_RXNFC and VIRTIO_NET_F_ARFS) for
    > set_rxnfc and ARFS respectively, and ARFS can choose flow vq?
    Not really, as one interface can fulfill both needs without attaching it to a specific OS interface.

    > In this way, is it more conducive to advancing the work of RFF (such as
    > accelerating the advancement of set_rxnfc)?
    >
    Both use cases are immediately usable, so we can advance them easily using a single interface now.

    > > +
    > > +### 3.4.1 control path
    > > +1. The number of flow filter operations/sec can range from 100k/sec to
    > 1M/sec
    > > + or even more. Hence flow filter operations must be done over a queueing
    > > + interface using one or more queues.
    >
    > This is only for ARFS, for devices that only want to support set_rxnfc, they don't
    > provide VIRTIO_NET_F_ARFS and consider implementing flow vq.
    >
    Well, once the device implements the flow vq, it will service both cases.
    A simple device implementation that only cares about RXNFC can implement the flow vq partly in software, serving a very small number of requests/sec.

    > > +2. The device should be able to expose one or more supported flow filter
    > queue
    > > + count and its start vq index to the driver.
    > > +3. As each device may be operating for different performance characteristic,
    > > + start vq index and count may be different for each device. Secondly, it is
    > > + inefficient for device to provide flow filters capabilities via a config space
    > > + region. Hence, the device should be able to share these attributes using
    > > + dma interface, instead of transport registers.
    > > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > > + will likely create these queues when flow filters are enabled.
    >
    > I understand that the number of flow vqs is not reflected in
    > max_virtqueue_pairs. And a new vq is created at runtime, is this supported in
    > the existing virtio spec?
    >
    If it is not supported, we are extending the virtio spec now.
    But yes, it is supported, because max_virtqueue_pairs will not expose the count of flow vqs
    (similar to how we did the AQ).
    And the flow vq is not a _pair_ anyway, so we cannot expose it there.
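    For illustration, a driver-side sketch of how the flow vq indices could be derived from the start_vq_index and num_flow_filter_vqs fields of the capability example; the concrete values are assumptions, and a contiguous index range outside max_virtqueue_pairs is the model being discussed rather than settled spec text:

    ```
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Example values; in practice these come from the capability query. */
        uint16_t start_vq_index = 130;
        uint16_t num_flow_filter_vqs = 2;

        for (uint16_t i = 0; i < num_flow_filter_vqs; i++)
            printf("flow filter vq %u uses virtqueue index %u\n",
                   i, (unsigned)(start_vq_index + i));
        return 0;
    }
    ```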

    > > +5. Flow filter operations are often accelerated by device in a hardware.
    > Ability
    > > + to handle them on a queue other than control vq is desired. This achieves
    > near
    > > + zero modifications to existing implementations to add new operations on
    > new
    > > + purpose built queues (similar to transmit and receive queue).
    > > +6. The filter masks are optional; the device should be able to expose if it
    > > + support filter masks.
    > > +7. The driver may want to have priority among group of flow entries; to
    > facilitate
    > > + the device support grouping flow filter entries by a notion of a group. Each
    > > + group defines priority in processing flow.
    > > +8. The driver and group owner driver should be able to query supported
    > device
    > > + limits for the flow filter entries.
    > > +
    > > +### 3.4.2 flow operations path
    > > +1. The driver should be able to define a receive packet match criteria, an
    > > + action and a destination for a packet.
    >
    > When the user does not specify a destination when configuring a rule, do we
    > need a default destination?
    >
    I think we should not give such an option to the driver.
    A human/end user may not have a destination, but the driver should be able to decide on a predictable destination.

    > > For example, an ipv4 packet with a
    > > + multicast address to be steered to the receive vq 0. The second example is
    > > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > > + be steered to receive vq 10.
    > > +2. The match criteria should include exact tuple fields well-defined such as
    > mac
    > > + address, IP addresses, tcp/udp ports, etc.
    > > +3. The match criteria should also optionally include the field mask.
    > > +4. The match criteria may optionally also include specific packet byte offset
    > > + pattern, match length, mask instead of RFC defined fields.
    > > + length, and matching pattern, which may not be defined in the standard
    > RFC.
    >
    > Is there a description error here?
    >
    I didn't follow your comment. Do you mean there is an error in the above description?

    > > +5. Action includes (a) dropping or (b) forwarding the packet.
    > > +6. Destination is a receive virtqueue index.
    >
    > Since the concept of RSS context does not yet exist in the virtio spec.
    > Did we say that we also support carrying RSS context information when
    > negotiating the RFF feature? For example, RSS context configuration commands
    > and structures, etc.
    >
    > Or support RSS context functionality as a separate feature in another thread?
    >
    Support RSS context as separate feature.

    > A related point to consider is that when a user inserts a rule with an rss context,
    > the RSS context cannot be deleted, otherwise the device will cause undefined
    > behavior.
    >
    Yes, for now we can keep rss context as separate feature.



  • 70.  Re: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-03-2023 13:07
    On Thu, Aug 03, 2023 at 09:59:54AM +0000, Parav Pandit wrote:
    >
    > > From: Heng Qi <hengqi@linux.alibaba.com>
    > > Sent: Wednesday, August 2, 2023 8:55 PM
    >
    > > Hi, Parav. Sorry for not responding to this in time due to other things recently.
    > >
    > > Yes, RFF has two scenarios, set_rxnfc and ARFS, both of which will affect the
    > > packet steering on the device side.
    > > I think manually configured rules should have higher priority than ARFS
    > > automatic configuration.
    > > This behavior is intuitive and consistent with other drivers. Therefore, the
    > > processing chain on a rx packet is:
    > > {mac,vlan,promisc rx filters} -> {set_rxnfc} -> {ARFS} -> {rss/hash config}.
    > >
    > Correct.
    > Within the RFF context, the priority among multiple RFF entries is governed by the concept of group.
    > So above two users of the RFF will create two groups and assign priority to it and achieve the desired processing order.

    OK, we intend to use the group as the concept for rule storage. Therefore, we
    should have two priorities:
    1. One is the priority of the group; this field is not seen in
    the structure virtio_net_rff_add_modify. Or does the group id imply the priority
    (for example, the smaller the value, the higher the priority)?
    2. The other is the priority of the rule; the current structure
    virtio_net_rff_add_modify is still missing this.
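    To visualize the gap being pointed out, a hypothetical variant of virtio_net_rff_add_modify carrying an explicit per-rule priority is sketched below; the added field and its placement are assumptions for discussion, not proposed spec text.

    ```
    #include <stdint.h>

    typedef uint8_t  u8;
    typedef uint16_t le16;  /* little-endian handling omitted in this sketch */
    typedef uint32_t le32;

    /* Hypothetical variant with an explicit per-rule priority; the group
     * priority could instead be set by a separate group-create command. */
    struct virtio_net_rff_add_modify_v2 {
        u8   flow_op;
        u8   group_id;
        le16 rule_priority;  /* assumed field: priority of this rule within its group */
        le32 flow_id;
        /* match criteria, destination, action and the optional mask follow as before */
    };
    ```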

    I think we should add some more text in the next version describing how
    matching rules are prioritized and how groups work. This is important
    for RFF.

    I also want to confirm that, for the interaction between the driver and
    the device, the driver only needs to tell the device the priority of the group and the
    priority of the rule, and we should not reflect how the device stores
    and queries rules (such as TCAM or some ACL acceleration solutions)?

    >
    > > There are also priorities within set_rxnfc and ARFS respectively.
    > > 1. For set_rxnfc, which has the exact match and the mask match. Exact
    > > matches should have higher priority.
    > > Suppose there are two rules,
    > > rule1: {"tcpv4", "src-ip: 1.1.1.1"} -> rxq1
    > > rule2: {"tcpv4", "src-ip: 1.1.1.1", "dst-port: 8989"} -> rxq2 .For recieved
    > > rx packets whose src-ip is 1.1.1.1, should match rule2 instead of rule1.
    > >
    > Yes. Driver should be able to set the priority within the group as well for above scenario.

    But I was wrong here; it should be:
    rx packets whose src-ip is 1.1.1.1 and dst-port is 8989 should match rule2 instead of rule1.

    >
    > > The rules of set_rxnfc come from manual configuration, the number of these
    > > rules is small and we may not need group grouping for this. And ctrlq can meet
    > > the configuration rate,
    > >
    > Yes, but having single interface for two use cases enables the device implementation to not build driver interface specific infra.
    > Both can be handled by unified interface.

    I agree:) Is ctrlq an option when the num_flow_filter_vqs
    exposed by the device is 0?

    >
    > > 2. For ARFS, which only has the exact match.
    > > For ARFS, since there is only one matching rule for a certain flow, so there is no
    > > need for group?
    > Groups are defining the priority between two types of rules.
    > Within ARFS domain we don't need group.

    Yes, ARFS doesn't need group.

    >
    > However instead of starting with only two limiting groups, it is better to have some flexibility for supporting multiple groups.
    > A device can device one/two or more groups.
    > So in future if a use case arise, interface wont be limiting to it.

    Ok. This works.

    >
    > > We may need different types of tables, such as UDPv4 flow table, TCPv4 flow
    > > table to speed up the lookup for differect flow types.
    > > Besides, the high rate and large number of configuration rules means that we
    > > need flow vq.
    > >
    > Yes, I am not sure if those tables should be exposed to the driver.
    > Thinking that a device may be able to decide on table count which it may be able to create.

    So how rules are stored, and what method is used to store them, is determined by
    the device, just like my question above. If yes, I think this is a good
    way to work, because it allows for increased flexibility in the device
    implementation.

    >
    > > Therefore, although set_rxnfc and ARFS share a set of infrastructure, there are
    > > still some differences, such as configuration rate and quantity. So do we need
    > > add two features (VIRTIO_NET_F_RXNFC and VIRTIO_NET_F_ARFS) for
    > > set_rxnfc and ARFS respectively, and ARFS can choose flow vq?
    > Not really, as one interface can fullfil both the needs without attaching it to a specific OS interface.
    >

    Ok!

    > > In this way, is it more conducive to advancing the work of RFF (such as
    > > accelerating the advancement of set_rxnfc)?
    > >
    > Both the use cases are equally immediately usable so we can advance it easily using single interface now.
    >
    > > > +
    > > > +### 3.4.1 control path
    > > > +1. The number of flow filter operations/sec can range from 100k/sec to
    > > 1M/sec
    > > > + or even more. Hence flow filter operations must be done over a queueing
    > > > + interface using one or more queues.
    > >
    > > This is only for ARFS, for devices that only want to support set_rxnfc, they don't
    > > provide VIRTIO_NET_F_ARFS and consider implementing flow vq.
    > >
    > Well once the device implements flow vq, it will service both cases.
    > A simple device implementation who only case for RXNFC, can implement flowvq in semi-software serving very small number of req/sec.
    >

    When the device does not provide a flow vq, can the driver use the
    ctrlq to send flow filter commands to the device?

    > > > +2. The device should be able to expose one or more supported flow filter
    > > queue
    > > > + count and its start vq index to the driver.
    > > > +3. As each device may be operating for different performance characteristic,
    > > > + start vq index and count may be different for each device. Secondly, it is
    > > > + inefficient for device to provide flow filters capabilities via a config space
    > > > + region. Hence, the device should be able to share these attributes using
    > > > + dma interface, instead of transport registers.
    > > > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > > > + will likely create these queues when flow filters are enabled.
    > >
    > > I understand that the number of flow vqs is not reflected in
    > > max_virtqueue_pairs. And a new vq is created at runtime, is this supported in
    > > the existing virtio spec?
    > >
    > We are extending the virtio-spec now if it is not supported.
    > But yes, it is supported because max_virtqueue_pairs will not expose the count of flow_vq.
    > (similar to how we did the AQ).
    > And flowvq anyway is not _pair_ so we cannot expose there anyway.

    Absolutely.

    >
    > > > +5. Flow filter operations are often accelerated by device in a hardware.
    > > Ability
    > > > + to handle them on a queue other than control vq is desired. This achieves
    > > near
    > > > + zero modifications to existing implementations to add new operations on
    > > new
    > > > + purpose built queues (similar to transmit and receive queue).
    > > > +6. The filter masks are optional; the device should be able to expose if it
    > > > + support filter masks.
    > > > +7. The driver may want to have priority among group of flow entries; to
    > > facilitate
    > > > + the device support grouping flow filter entries by a notion of a group. Each
    > > > + group defines priority in processing flow.
    > > > +8. The driver and group owner driver should be able to query supported
    > > device
    > > > + limits for the flow filter entries.
    > > > +
    > > > +### 3.4.2 flow operations path
    > > > +1. The driver should be able to define a receive packet match criteria, an
    > > > + action and a destination for a packet.
    > >
    > > When the user does not specify a destination when configuring a rule, do we
    > > need a default destination?
    > >
    > I think we should not give such option to driver.
    > A human/end user may not have the destination, but driver should be able to decide a predictable destination.

    Yes, that's what I meant :), and by "we" I meant "the driver."

    >
    > > > For example, an ipv4 packet with a
    > > > + multicast address to be steered to the receive vq 0. The second example is
    > > > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > > > + be steered to receive vq 10.
    > > > +2. The match criteria should include exact tuple fields well-defined such as
    > > mac
    > > > + address, IP addresses, tcp/udp ports, etc.
    > > > +3. The match criteria should also optionally include the field mask.
    > > > +4. The match criteria may optionally also include specific packet byte offset
    > > > + pattern, match length, mask instead of RFC defined fields.
    > > > + length, and matching pattern, which may not be defined in the standard
    > > RFC.
    > >
    > > Is there a description error here?
    > >
    > Didn't follow your comment. Do you mean there is an error in above description?

    I don't quite understand what "specific packet byte offset pattern" means :(
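    For what it is worth, one possible reading of that requirement is a raw match entry defined by a byte offset into the packet plus a pattern and mask, rather than by named protocol fields. A hypothetical sketch (all names and sizes assumed):

    ```
    #include <stdint.h>

    /* Hypothetical raw byte-offset match; one interpretation only. */
    struct rff_raw_match {
        uint16_t offset;      /* byte offset from the start of the packet */
        uint8_t  length;      /* number of bytes to compare, up to 16 here */
        uint8_t  reserved;
        uint8_t  pattern[16]; /* expected bytes at that offset */
        uint8_t  mask[16];    /* bitmask selecting which pattern bits must match */
    };
    ```

    For example, offset 12, length 2 and pattern 0x86 0xDD would match the IPv6 EtherType of an Ethernet frame, a match that is not expressed as one of the usual tuple fields.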

    >
    > > > +5. Action includes (a) dropping or (b) forwarding the packet.
    > > > +6. Destination is a receive virtqueue index.
    > >
    > > Since the concept of RSS context does not yet exist in the virtio spec.
    > > Did we say that we also support carrying RSS context information when
    > > negotiating the RFF feature? For example, RSS context configuration commands
    > > and structures, etc.
    > >
    > > Or support RSS context functionality as a separate feature in another thread?
    > >
    > Support RSS context as separate feature.

    OK. Humbly asking whether your work plan includes this part; do you need me
    to share some of that work, such as the RSS context?

    Thanks a lot!

    >
    > > A related point to consider is that when a user inserts a rule with an rss context,
    > > the RSS context cannot be deleted, otherwise the device will cause undefined
    > > behavior.
    > >
    > Yes, for now we can keep rss context as separate feature.



  • 72.  RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-04-2023 06:21

    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Thursday, August 3, 2023 6:37 PM
    >
    > On Thu, Aug 03, 2023 at 09:59:54AM +0000, Parav Pandit wrote:
    > >
    > > > From: Heng Qi <hengqi@linux.alibaba.com>
    > > > Sent: Wednesday, August 2, 2023 8:55 PM
    > >
    > > > Hi, Parav. Sorry for not responding to this in time due to other things
    > recently.
    > > >
    > > > Yes, RFF has two scenarios, set_rxnfc and ARFS, both of which will
    > > > affect the packet steering on the device side.
    > > > I think manually configured rules should have higher priority than
    > > > ARFS automatic configuration.
    > > > This behavior is intuitive and consistent with other drivers.
    > > > Therefore, the processing chain on a rx packet is:
    > > > {mac,vlan,promisc rx filters} -> {set_rxnfc} -> {ARFS} -> {rss/hash config}.
    > > >
    > > Correct.
    > > Within the RFF context, the priority among multiple RFF entries is governed by
    > the concept of group.
    > > So above two users of the RFF will create two groups and assign priority to it
    > and achieve the desired processing order.
    >
    > OK, we intend to use group as the concept of rule storage. Therefore, we
    > should have two priorities:
    > 1. one is the priority of the group, this field is not seen in the structure
    > virtio_net_rff_add_modify, or the group id implies the priority (for example, the
    > smaller the priority, the higher the priority?)?
    Good catch; yes, we need a priority assignment for the group.
    Hence, we also need group add/delete commands.
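
    A rough sketch of what such commands could look like (purely illustrative; the command names, fields and priority convention are placeholders for discussion, not spec content):

    ```
    /* Hypothetical sketch: a group carries its own priority. */
    struct virtio_net_rff_group_add {
            le16 group_id;        /* driver-chosen group identifier */
            le16 group_priority;  /* e.g. lower value = higher priority */
    };

    struct virtio_net_rff_group_delete {
            le16 group_id;        /* group to delete */
    };
    ```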

    > 2. the other is the priority of the rule, the current structure
    > virtio_net_rff_add_modify is still missing this.
    >
    Adding it.
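
    For example, something along these lines (only a sketch; just the new field is shown and the existing match/action fields are elided):

    ```
    struct virtio_net_rff_add_modify {
            le16 group_id;        /* group this entry belongs to */
            le16 rule_priority;   /* priority among entries of the same group */
            /* ... existing match criteria, action and destination fields ... */
    };
    ```
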
    > I think we should add some more texts in the next version describing how
    > matching rules are prioritized and how groups work. This is important for RFF.
    >
    > I also want to confirm that for the interaction between the driver and the
    > device, the driver only needs to tell the priority of the device group and the
    > priority of the rule, and we should not reflect how the device stores and
    > queries rules (such as tcam or some acl acceleration solutions) ?
    >
    Correct, we try to keep things as abstract as possible.
    > >
    > > > There are also priorities within set_rxnfc and ARFS respectively.
    > > > 1. For set_rxnfc, which has the exact match and the mask match.
    > > > Exact matches should have higher priority.
    > > > Suppose there are two rules,
    > > > rule1: {"tcpv4", "src-ip: 1.1.1.1"} -> rxq1
    > > > rule2: {"tcpv4", "src-ip: 1.1.1.1", "dst-port: 8989"} -> rxq2 .For
    > > > recieved rx packets whose src-ip is 1.1.1.1, should match rule2 instead of
    > rule1.
    > > >
    > > Yes. Driver should be able to set the priority within the group as well for above
    > scenario.
    >
    > But here I am wrong, it should be:
    > rx packets whose src-ip is 1.1.1.1 and dst-port is 8989, should match rule2
    > instead of rule1.
    >
    That is what you wrote; here both rules are within one group,
    and rule2 should have higher priority than rule1.
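
    To restate that example with the hypothetical fields sketched above (lower value = evaluated first):

    ```
    /* Illustrative only: both entries are in the same group, rule2 is matched first. */
    struct virtio_net_rff_add_modify rule2 = {
            .group_id = 1, .rule_priority = 1, /* tcpv4, src-ip 1.1.1.1, dst-port 8989 -> rxq2 */
    };
    struct virtio_net_rff_add_modify rule1 = {
            .group_id = 1, .rule_priority = 2, /* tcpv4, src-ip 1.1.1.1 -> rxq1 */
    };
    ```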

    > >
    > > > The rules of set_rxnfc come from manual configuration, the number of
    > > > these rules is small and we may not need group grouping for this.
    > > > And ctrlq can meet the configuration rate,
    > > >
    > > Yes, but having single interface for two use cases enables the device
    > implementation to not build driver interface specific infra.
    > > Both can be handled by unified interface.
    >
    > I agree:) Is ctrlq an option when the num_flow_filter_vqs exposed by the device
    > is 0?
    >
    I think ctrlq in the above scenario offers a quick start to the feature by making flowvq optional.
    It comes with the trade-off of performance and a dual code implementation.
    That is fine; the problem occurs when flowvq is also supported and created: if the driver issues commands on both cvq and flowvq, synchronizing them on the device is a nightmare.

    So if we can draft it as:
    If flowvq is created, RFF must be done only on flowvq by the driver.
    If flowvq is supported, but not created, cvq can be used.

    In that case it is flexible enough for the device to implement with a reasonable trade-off.

    > >
    > > > 2. For ARFS, which only has the exact match.
    > > > For ARFS, since there is only one matching rule for a certain flow,
    > > > so there is no need for group?
    > > Groups are defining the priority between two types of rules.
    > > Within ARFS domain we don't need group.
    >
    > Yes, ARFS doesn't need group.
    >
    > >
    > > However instead of starting with only two limiting groups, it is better to have
    > some flexibility for supporting multiple groups.
    > > A device can device one/two or more groups.
    > > So in future if a use case arise, interface wont be limiting to it.
    >
    > Ok. This works.
    >
    > >
    > > > We may need different types of tables, such as UDPv4 flow table,
    > > > TCPv4 flow table to speed up the lookup for differect flow types.
    > > > Besides, the high rate and large number of configuration rules means
    > > > that we need flow vq.
    > > >
    > > Yes, I am not sure if those tables should be exposed to the driver.
    > > Thinking that a device may be able to decide on table count which it may be
    > able to create.
    >
    > o how to store and what method to use to store rules is determined by the
    > device, just like my question above. If yes, I think this is a good way to work,
    > because it allows for increased flexibility in device implementation.
    >
    Right, we can possibly avoid the concept of a table in the spec.
    So far I see the objects below:

    1. group with priority (priority applies among groups)
    2. flow entries with priority (priority applies to entries within a group)
    > >
    > > > Therefore, although set_rxnfc and ARFS share a set of
    > > > infrastructure, there are still some differences, such as
    > > > configuration rate and quantity. So do we need add two features
    > > > (VIRTIO_NET_F_RXNFC and VIRTIO_NET_F_ARFS) for set_rxnfc and ARFS
    > respectively, and ARFS can choose flow vq?
    > > Not really, as one interface can fullfil both the needs without attaching it to a
    > specific OS interface.
    > >
    >
    > Ok!
    >
    > > > In this way, is it more conducive to advancing the work of RFF (such
    > > > as accelerating the advancement of set_rxnfc)?
    > > >
    > > Both the use cases are equally immediately usable so we can advance it easily
    > using single interface now.
    > >
    > > > > +
    > > > > +### 3.4.1 control path
    > > > > +1. The number of flow filter operations/sec can range from
    > > > > +100k/sec to
    > > > 1M/sec
    > > > > + or even more. Hence flow filter operations must be done over a
    > queueing
    > > > > + interface using one or more queues.
    > > >
    > > > This is only for ARFS, for devices that only want to support
    > > > set_rxnfc, they don't provide VIRTIO_NET_F_ARFS and consider
    > implementing flow vq.
    > > >
    > > Well once the device implements flow vq, it will service both cases.
    > > A simple device implementation who only case for RXNFC, can implement
    > flowvq in semi-software serving very small number of req/sec.
    > >
    >
    > When the device does not provide flow vq, whether the driver can use ctrlq to
    > the device.
    >
    Please see above.

    > > > > +2. The device should be able to expose one or more supported flow
    > > > > +filter
    > > > queue
    > > > > + count and its start vq index to the driver.
    > > > > +3. As each device may be operating for different performance
    > characteristic,
    > > > > + start vq index and count may be different for each device. Secondly, it is
    > > > > + inefficient for device to provide flow filters capabilities via a config
    > space
    > > > > + region. Hence, the device should be able to share these attributes using
    > > > > + dma interface, instead of transport registers.
    > > > > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > > > > + will likely create these queues when flow filters are enabled.
    > > >
    > > > I understand that the number of flow vqs is not reflected in
    > > > max_virtqueue_pairs. And a new vq is created at runtime, is this
    > > > supported in the existing virtio spec?
    > > >
    > > We are extending the virtio-spec now if it is not supported.
    > > But yes, it is supported because max_virtqueue_pairs will not expose the
    > count of flow_vq.
    > > (similar to how we did the AQ).
    > > And flowvq anyway is not _pair_ so we cannot expose there anyway.
    >
    > Absolutely.
    >
    > >
    > > > > +5. Flow filter operations are often accelerated by device in a hardware.
    > > > Ability
    > > > > + to handle them on a queue other than control vq is desired.
    > > > > + This achieves
    > > > near
    > > > > + zero modifications to existing implementations to add new
    > > > > + operations on
    > > > new
    > > > > + purpose built queues (similar to transmit and receive queue).
    > > > > +6. The filter masks are optional; the device should be able to expose if it
    > > > > + support filter masks.
    > > > > +7. The driver may want to have priority among group of flow
    > > > > +entries; to
    > > > facilitate
    > > > > + the device support grouping flow filter entries by a notion of a group.
    > Each
    > > > > + group defines priority in processing flow.
    > > > > +8. The driver and group owner driver should be able to query
    > > > > +supported
    > > > device
    > > > > + limits for the flow filter entries.
    > > > > +
    > > > > +### 3.4.2 flow operations path
    > > > > +1. The driver should be able to define a receive packet match criteria, an
    > > > > + action and a destination for a packet.
    > > >
    > > > When the user does not specify a destination when configuring a
    > > > rule, do we need a default destination?
    > > >
    > > I think we should not give such option to driver.
    > > A human/end user may not have the destination, but driver should be able to
    > decide a predictable destination.
    >
    > Yes, that's what I mean:), and I said "we" for "the driver."
    >
    Ok, got it.

    > >
    > > > > For example, an ipv4 packet with a
    > > > > + multicast address to be steered to the receive vq 0. The second
    > example is
    > > > > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > > > > + be steered to receive vq 10.
    > > > > +2. The match criteria should include exact tuple fields
    > > > > +well-defined such as
    > > > mac
    > > > > + address, IP addresses, tcp/udp ports, etc.
    > > > > +3. The match criteria should also optionally include the field mask.
    > > > > +4. The match criteria may optionally also include specific packet byte
    > offset
    > > > > + pattern, match length, mask instead of RFC defined fields.
    > > > > + length, and matching pattern, which may not be defined in the
    > > > > +standard
    > > > RFC.
    > > >
    > > > Is there a description error here?
    > > >
    > > Didn't follow your comment. Do you mean there is an error in above
    > description?
    >
    > I don't quite understand what "specific packet byte offset pattern" means :(
    >
    Time to make it verbose. :)
    For any new/undefined protocol, the user may want to say:
    in a packet at byte offset A, if you find pattern == 0x800, drop the packet;
    in a packet at byte offset B, if you find pattern == 0x8100, forward it to rq 10.

    I didn't consider multiple matching patterns for now, though that would be very useful.
    I am inclined to keep the option of an _any_match_ to take up later and, for now, do only well-defined matches.
    WDYT?
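
    To illustrate, such a raw match entry could look roughly like the sketch below (names, sizes and layout are placeholders for discussion, not proposed spec content):

    ```
    /* Hypothetical raw-match descriptor for fields not defined by an RFC. */
    struct virtio_net_rff_raw_match {
            le16 byte_offset;   /* offset in the packet where matching starts */
            le16 length;        /* number of bytes to compare */
            u8   pattern[16];   /* value to compare, first 'length' bytes used */
            u8   mask[16];      /* optional mask applied before the comparison */
    };

    /* offset A, pattern 0x800  -> action: drop
     * offset B, pattern 0x8100 -> action: forward to rq 10 */
    ```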


    > >
    > > > > +5. Action includes (a) dropping or (b) forwarding the packet.
    > > > > +6. Destination is a receive virtqueue index.
    > > >
    > > > Since the concept of RSS context does not yet exist in the virtio spec.
    > > > Did we say that we also support carrying RSS context information
    > > > when negotiating the RFF feature? For example, RSS context
    > > > configuration commands and structures, etc.
    > > >
    > > > Or support RSS context functionality as a separate feature in another
    > thread?
    > > >
    > > Support RSS context as separate feature.
    >
    > Ok, humbly asking if your work plan includes this part, do you need me to share
    > your work, such as rss context.
    >
    Let's keep rss context as future work, as it's orthogonal to RFF.
    Yes, your help with rss context will be good.
    Let's first finish RFF, as it's at a fairly advanced stage.

    > Thanks a lot!
    >
    > >
    > > > A related point to consider is that when a user inserts a rule with
    > > > an rss context, the RSS context cannot be deleted, otherwise the
    > > > device will cause undefined behavior.
    > > >
    > > Yes, for now we can keep rss context as separate feature.



  • 74.  Re: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-04-2023 07:17
    On Fri, Aug 04, 2023 at 06:20:53AM +0000, Parav Pandit wrote:
    >
    > > From: Heng Qi <hengqi@linux.alibaba.com>
    > > Sent: Thursday, August 3, 2023 6:37 PM
    > >
    > > On Thu, Aug 03, 2023 at 09:59:54AM +0000, Parav Pandit wrote:
    > > >
    > > > > From: Heng Qi <hengqi@linux.alibaba.com>
    > > > > Sent: Wednesday, August 2, 2023 8:55 PM
    > > >
    > > > > Hi, Parav. Sorry for not responding to this in time due to other things
    > > recently.
    > > > >
    > > > > Yes, RFF has two scenarios, set_rxnfc and ARFS, both of which will
    > > > > affect the packet steering on the device side.
    > > > > I think manually configured rules should have higher priority than
    > > > > ARFS automatic configuration.
    > > > > This behavior is intuitive and consistent with other drivers.
    > > > > Therefore, the processing chain on a rx packet is:
    > > > > {mac,vlan,promisc rx filters} -> {set_rxnfc} -> {ARFS} -> {rss/hash config}.
    > > > >
    > > > Correct.
    > > > Within the RFF context, the priority among multiple RFF entries is governed by
    > > the concept of group.
    > > > So above two users of the RFF will create two groups and assign priority to it
    > > and achieve the desired processing order.
    > >
    > > OK, we intend to use group as the concept of rule storage. Therefore, we
    > > should have two priorities:
    > > 1. one is the priority of the group, this field is not seen in the structure
    > > virtio_net_rff_add_modify, or the group id implies the priority (for example, the
    > > smaller the priority, the higher the priority?)?
    > Good catch, yes, we need priority assignment to the group.
    > Hence, we need group add/delete command as well.

    Yes, then the group priority can be carried in the group add command.

    >
    > > 2. the other is the priority of the rule, the current structure
    > > virtio_net_rff_add_modify is still missing this.
    > >
    > Adding it.

    Ok.

    > > I think we should add some more texts in the next version describing how
    > > matching rules are prioritized and how groups work. This is important for RFF.
    > >
    > > I also want to confirm that for the interaction between the driver and the
    > > device, the driver only needs to tell the priority of the device group and the
    > > priority of the rule, and we should not reflect how the device stores and
    > > queries rules (such as tcam or some acl acceleration solutions) ?
    > >
    > Correct, we try to keep thing as abstract as possible.

    Good! The implementation of the rule and group priorities can be left to the
    driver implementation.

    > > >
    > > > > There are also priorities within set_rxnfc and ARFS respectively.
    > > > > 1. For set_rxnfc, which has the exact match and the mask match.
    > > > > Exact matches should have higher priority.
    > > > > Suppose there are two rules,
    > > > > rule1: {"tcpv4", "src-ip: 1.1.1.1"} -> rxq1
    > > > > rule2: {"tcpv4", "src-ip: 1.1.1.1", "dst-port: 8989"} -> rxq2 .For
    > > > > recieved rx packets whose src-ip is 1.1.1.1, should match rule2 instead of
    > > rule1.
    > > > >
    > > > Yes. Driver should be able to set the priority within the group as well for above
    > > scenario.
    > >
    > > But here I am wrong, it should be:
    > > rx packets whose src-ip is 1.1.1.1 and dst-port is 8989, should match rule2
    > > instead of rule1.
    > >
    > That is what you wrote, here both the rules are within one group.
    > And rule2 should have higher priority than rule1.
    >

    Yes.

    > > >
    > > > > The rules of set_rxnfc come from manual configuration, the number of
    > > > > these rules is small and we may not need group grouping for this.
    > > > > And ctrlq can meet the configuration rate,
    > > > >
    > > > Yes, but having single interface for two use cases enables the device
    > > implementation to not build driver interface specific infra.
    > > > Both can be handled by unified interface.
    > >
    > > I agree:) Is ctrlq an option when the num_flow_filter_vqs exposed by the device
    > > is 0?
    > >
    > I think ctrlq would in above scenario offers, a quick start to the feature by making flowvq optional.
    > It comes with the tradeoff of perf, and dual code implementation.
    > Which is fine, the problem occurs is when flowvq is also supported, and flowvq is created, if driver issues command on cvq and flowvq both, synchronizing these on the device is nightmare.
    >
    > So if we can draft it as:
    > If flowvq is created, RFF must be done only on flowvq by the driver.
    > If flowvq is supported, but not created, cvq can be used.
    >
    > In that case it is flexible enough for device to implement with reasonable trade off.

    Yes, but a small update:
    If flowvq is created, RFF must be done only on flowvq by the driver.
    If flowvq is supported, but not created, cvq can be used.
    If flowvq is not supported, cvq is used.
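
    In driver terms this could reduce to something like the following (a minimal sketch with made-up driver-side names; flowvq_supported/flowvq_created are not spec fields):

    ```
    #include <stdbool.h>

    struct virtqueue;                    /* opaque queue handle */

    struct vnet_rff_state {
            struct virtqueue *cvq;       /* control virtqueue */
            struct virtqueue *flowvq;    /* flow filter virtqueue, if created */
            bool flowvq_supported;       /* device exposes flow filter vq(s) */
            bool flowvq_created;         /* driver actually created a flow vq */
    };

    /* Pick the queue used for RFF commands per the rules above. */
    static struct virtqueue *rff_cmd_queue(const struct vnet_rff_state *s)
    {
            if (s->flowvq_created)
                    return s->flowvq;    /* flowvq created: RFF only on flowvq */
            if (s->flowvq_supported)
                    return s->cvq;       /* supported but not created: cvq can be used */
            return s->cvq;               /* flowvq not supported: cvq is used */
    }
    ```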

    >
    > > >
    > > > > 2. For ARFS, which only has the exact match.
    > > > > For ARFS, since there is only one matching rule for a certain flow,
    > > > > so there is no need for group?
    > > > Groups are defining the priority between two types of rules.
    > > > Within ARFS domain we don't need group.
    > >
    > > Yes, ARFS doesn't need group.
    > >
    > > >
    > > > However instead of starting with only two limiting groups, it is better to have
    > > some flexibility for supporting multiple groups.
    > > > A device can device one/two or more groups.
    > > > So in future if a use case arise, interface wont be limiting to it.
    > >
    > > Ok. This works.
    > >
    > > >
    > > > > We may need different types of tables, such as UDPv4 flow table,
    > > > > TCPv4 flow table to speed up the lookup for differect flow types.
    > > > > Besides, the high rate and large number of configuration rules means
    > > > > that we need flow vq.
    > > > >
    > > > Yes, I am not sure if those tables should be exposed to the driver.
    > > > Thinking that a device may be able to decide on table count which it may be
    > > able to create.
    > >
    > > o how to store and what method to use to store rules is determined by the
    > > device, just like my question above. If yes, I think this is a good way to work,
    > > because it allows for increased flexibility in device implementation.
    > >
    > Right, we can possibly avoid concept of table in spec.
    > So far I see below objects:
    >
    > 1. group with priority (priority applies among the group)
    > 2. flow entries with priority (priority applies to entries within the group)

    Yes!

    > > >
    > > > > Therefore, although set_rxnfc and ARFS share a set of
    > > > > infrastructure, there are still some differences, such as
    > > > > configuration rate and quantity. So do we need add two features
    > > > > (VIRTIO_NET_F_RXNFC and VIRTIO_NET_F_ARFS) for set_rxnfc and ARFS
    > > respectively, and ARFS can choose flow vq?
    > > > Not really, as one interface can fullfil both the needs without attaching it to a
    > > specific OS interface.
    > > >
    > >
    > > Ok!
    > >
    > > > > In this way, is it more conducive to advancing the work of RFF (such
    > > > > as accelerating the advancement of set_rxnfc)?
    > > > >
    > > > Both the use cases are equally immediately usable so we can advance it easily
    > > using single interface now.
    > > >
    > > > > > +
    > > > > > +### 3.4.1 control path
    > > > > > +1. The number of flow filter operations/sec can range from
    > > > > > +100k/sec to
    > > > > 1M/sec
    > > > > > + or even more. Hence flow filter operations must be done over a
    > > queueing
    > > > > > + interface using one or more queues.
    > > > >
    > > > > This is only for ARFS, for devices that only want to support
    > > > > set_rxnfc, they don't provide VIRTIO_NET_F_ARFS and consider
    > > implementing flow vq.
    > > > >
    > > > Well once the device implements flow vq, it will service both cases.
    > > > A simple device implementation who only case for RXNFC, can implement
    > > flowvq in semi-software serving very small number of req/sec.
    > > >
    > >
    > > When the device does not provide flow vq, whether the driver can use ctrlq to
    > > the device.
    > >
    > Please see above.
    >
    > > > > > +2. The device should be able to expose one or more supported flow
    > > > > > +filter
    > > > > queue
    > > > > > + count and its start vq index to the driver.
    > > > > > +3. As each device may be operating for different performance
    > > characteristic,
    > > > > > + start vq index and count may be different for each device. Secondly, it is
    > > > > > + inefficient for device to provide flow filters capabilities via a config
    > > space
    > > > > > + region. Hence, the device should be able to share these attributes using
    > > > > > + dma interface, instead of transport registers.
    > > > > > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > > > > > + will likely create these queues when flow filters are enabled.
    > > > >
    > > > > I understand that the number of flow vqs is not reflected in
    > > > > max_virtqueue_pairs. And a new vq is created at runtime, is this
    > > > > supported in the existing virtio spec?
    > > > >
    > > > We are extending the virtio-spec now if it is not supported.
    > > > But yes, it is supported because max_virtqueue_pairs will not expose the
    > > count of flow_vq.
    > > > (similar to how we did the AQ).
    > > > And flowvq anyway is not _pair_ so we cannot expose there anyway.
    > >
    > > Absolutely.
    > >
    > > >
    > > > > > +5. Flow filter operations are often accelerated by device in a hardware.
    > > > > Ability
    > > > > > + to handle them on a queue other than control vq is desired.
    > > > > > + This achieves
    > > > > near
    > > > > > + zero modifications to existing implementations to add new
    > > > > > + operations on
    > > > > new
    > > > > > + purpose built queues (similar to transmit and receive queue).
    > > > > > +6. The filter masks are optional; the device should be able to expose if it
    > > > > > + support filter masks.
    > > > > > +7. The driver may want to have priority among group of flow
    > > > > > +entries; to
    > > > > facilitate
    > > > > > + the device support grouping flow filter entries by a notion of a group.
    > > Each
    > > > > > + group defines priority in processing flow.
    > > > > > +8. The driver and group owner driver should be able to query
    > > > > > +supported
    > > > > device
    > > > > > + limits for the flow filter entries.
    > > > > > +
    > > > > > +### 3.4.2 flow operations path
    > > > > > +1. The driver should be able to define a receive packet match criteria, an
    > > > > > + action and a destination for a packet.
    > > > >
    > > > > When the user does not specify a destination when configuring a
    > > > > rule, do we need a default destination?
    > > > >
    > > > I think we should not give such option to driver.
    > > > A human/end user may not have the destination, but driver should be able to
    > > decide a predictable destination.
    > >
    > > Yes, that's what I mean:), and I said "we" for "the driver."
    > >
    > Ok. got it.
    >
    > > >
    > > > > > For example, an ipv4 packet with a
    > > > > > + multicast address to be steered to the receive vq 0. The second
    > > example is
    > > > > > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > > > > > + be steered to receive vq 10.
    > > > > > +2. The match criteria should include exact tuple fields
    > > > > > +well-defined such as
    > > > > mac
    > > > > > + address, IP addresses, tcp/udp ports, etc.
    > > > > > +3. The match criteria should also optionally include the field mask.
    > > > > > +4. The match criteria may optionally also include specific packet byte
    > > offset
    > > > > > + pattern, match length, mask instead of RFC defined fields.
    > > > > > + length, and matching pattern, which may not be defined in the
    > > > > > +standard
    > > > > RFC.
    > > > >
    > > > > Is there a description error here?
    > > > >
    > > > Didn't follow your comment. Do you mean there is an error in above
    > > description?
    > >
    > > I don't quite understand what "specific packet byte offset pattern" means :(
    > >
    > Time to make it verbose. :)

    Oh, I got it. I think we should let people see more details in the next version. :)

    > For any new/undefined protocol, if user wants to say,
    > In a packet at byte offset A, if you find pattern == 0x800, drop the packet.
    > In a packet at byte offset B, if you find pattern == 0x8100, forward to rq 10.
    >
    > I didn't consider multiple matching patterns for now, though it is very useful.
    > I am inclined to keep the option of _any_match to take up later, for now to do only well defined match?
    > WDYT?

    I think it's ok, but we don't seem to need the length here. In any
    case, I don't think it's that important whether we have it or not ^^

    >
    >
    > > >
    > > > > > +5. Action includes (a) dropping or (b) forwarding the packet.
    > > > > > +6. Destination is a receive virtqueue index.
    > > > >
    > > > > Since the concept of RSS context does not yet exist in the virtio spec.
    > > > > Did we say that we also support carrying RSS context information
    > > > > when negotiating the RFF feature? For example, RSS context
    > > > > configuration commands and structures, etc.
    > > > >
    > > > > Or support RSS context functionality as a separate feature in another
    > > thread?
    > > > >
    > > > Support RSS context as separate feature.
    > >
    > > Ok, humbly asking if your work plan includes this part, do you need me to share
    > > your work, such as rss context.
    > >
    > Lets keep rss context in future work as its orthogonal to it.
    > Yes, your help for rss context will be good.
    > Lets first finish RFF as its bit in the advance stage.

    Yes! But I worry that a spec that references a concept which doesn't
    exist in the existing spec may be blocked, so if you have no objections,
    I will push this work forward in the near future to help RFF avoid
    possible blocking.

    Thanks.

    >
    > > Thanks a lot!
    > >
    > > >
    > > > > A related point to consider is that when a user inserts a rule with
    > > > > an rss context, the RSS context cannot be deleted, otherwise the
    > > > > device will cause undefined behavior.
    > > > >
    > > > Yes, for now we can keep rss context as separate feature.



  • 76.  RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-04-2023 07:31

    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Friday, August 4, 2023 12:47 PM

    > Yes, but a small update:
    > If flowvq is created, RFF must be done only on flowvq by the driver.
    > If flowvq is supported, but not created, cvq can be used.
    > If flowvq is not supported, cvq is used.
    >
    Looks fine; I want to think a little more on the last point to make sure we are not missing something.
    I will respond on it by Monday.
    [..]
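
    To make the rule above concrete, here is a minimal C sketch of the selection order, using hypothetical state and helper names (nothing here is spec-defined yet):

    ```
    #include <stdbool.h>

    /* Minimal sketch of the submission-queue selection rule quoted above.
     * The state fields and names are hypothetical; they only illustrate the
     * "flowvq once created, otherwise cvq" ordering. */
    struct vnet_rff_state {
        bool flowvq_supported;   /* device offers flow filter VQs */
        bool flowvq_created;     /* driver has created at least one flow filter VQ */
    };

    enum rff_submit_queue { RFF_USE_FLOWVQ, RFF_USE_CVQ };

    static enum rff_submit_queue rff_pick_queue(const struct vnet_rff_state *s)
    {
        if (s->flowvq_supported && s->flowvq_created)
            return RFF_USE_FLOWVQ;   /* once created, RFF requests go only via flowvq */
        return RFF_USE_CVQ;          /* supported-but-not-created, or unsupported */
    }
    ```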

    > Oh, I got it, I think we should let people see more details in the next version.:)
    >
    > > For any new/undefined protocol, if user wants to say, In a packet at
    > > byte offset A, if you find pattern == 0x800, drop the packet.
    > > In a packet at byte offset B, if you find pattern == 0x8100, forward to rq 10.
    > >
    > > I didn't consider multiple matching patterns for now, though it is very useful.
    > > I am inclined to keep the option of _any_match to take up later, for now to do
    > only well defined match?
    > > WDYT?
    >
    > I think it's ok, but we don't seem to need the length here. But in any case, I
    > don't think it's that important to have or not ^^
    >
    Length will be needed to indicate how many bytes to match.
    Let's add this rule incrementally after we get the baseline done for the known RFC-defined fields.

    > > Lets keep rss context in future work as its orthogonal to it.
    > > Yes, your help for rss context will be good.
    > > Lets first finish RFF as its bit in the advance stage.
    >
    > Yes! But I worry that a spec that references a concept that doesn't exist in the
    > existing spec may be blocked, so if you have no objections, I will push this work
    > forward in the near future to help RFF possible blocking.

    I was probably not clear enough; I propose that we remove the RSS context from RFF for now.
    Once the RSS context is done, RFF can be enhanced at that point in the future.





  • 78.  Re: [virtio] RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-04-2023 07:51


    On 2023/8/4 3:30 PM, Parav Pandit wrote:
    >> From: Heng Qi <hengqi@linux.alibaba.com>
    >> Sent: Friday, August 4, 2023 12:47 PM
    >> Yes, but a small update:
    >> If flowvq is created, RFF must be done only on flowvq by the driver.
    >> If flowvq is supported, but not created, cvq can be used.
    >> If flowvq is not supported, cvq is used.
    >>
    > Looks fine, I want to think little more on the last point to make sure we are not missing something.
    > Will respond by Monday on it.

    Ok.

    > [..]
    >
    >> Oh, I got it, I think we should let people see more details in the next version.:)
    >>
    >>> For any new/undefined protocol, if user wants to say, In a packet at
    >>> byte offset A, if you find pattern == 0x800, drop the packet.
    >>> In a packet at byte offset B, if you find pattern == 0x8100, forward to rq 10.
    >>>
    >>> I didn't consider multiple matching patterns for now, though it is very useful.
    >>> I am inclined to keep the option of _any_match to take up later, for now to do
    >> only well defined match?
    >>> WDYT?
    >> I think it's ok, but we don't seem to need the length here. But in any case, I
    >> don't think it's that important to have or not ^^
    >>
    > Length will be needed to indicate how many bytes to match to.
    > Lets do this rule incrementally after we get the base line done for known RFC defined fields.
    >
    >>> Lets keep rss context in future work as its orthogonal to it.
    >>> Yes, your help for rss context will be good.
    >>> Lets first finish RFF as its bit in the advance stage.
    >> Yes! But I worry that a spec that references a concept that doesn't exist in the
    >> existing spec may be blocked, so if you have no objections, I will push this work
    >> forward in the near future to help RFF possible blocking.
    > I was probably not clear enough, I propose that lets remove the RSS context for now in the RFF.
    > Once RSS context is done, at that point in future to enhance RFF.

    I need some time to see if the rss context has an effect on our
    scenario, so please hold it for now. I'll sync it up as soon as possible.

    Thanks!







  • 80.  Re: [virtio-comment] Re: [virtio] RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-07-2023 07:23
    On Fri, Aug 04, 2023 at 03:51:16PM +0800, Heng Qi wrote:
    >
    >
    > On 2023/8/4 3:30 PM, Parav Pandit wrote:
    > >>From: Heng Qi <hengqi@linux.alibaba.com>
    > >>Sent: Friday, August 4, 2023 12:47 PM
    > >>Yes, but a small update:
    > >>If flowvq is created, RFF must be done only on flowvq by the driver.
    > >>If flowvq is supported, but not created, cvq can be used.
    > >>If flowvq is not supported, cvq is used.
    > >>
    > >Looks fine, I want to think little more on the last point to make sure we are not missing something.
    > >Will respond by Monday on it.
    >
    > Ok.
    >
    > >[..]
    > >
    > >>Oh, I got it, I think we should let people see more details in the next version.:)
    > >>
    > >>>For any new/undefined protocol, if user wants to say, In a packet at
    > >>>byte offset A, if you find pattern == 0x800, drop the packet.
    > >>>In a packet at byte offset B, if you find pattern == 0x8100, forward to rq 10.
    > >>>
    > >>>I didn't consider multiple matching patterns for now, though it is very useful.
    > >>>I am inclined to keep the option of _any_match to take up later, for now to do
    > >>only well defined match?
    > >>>WDYT?
    > >>I think it's ok, but we don't seem to need the length here. But in any case, I
    > >>don't think it's that important to have or not ^^
    > >>
    > >Length will be needed to indicate how many bytes to match to.
    > >Lets do this rule incrementally after we get the base line done for known RFC defined fields.
    > >
    > >>>Lets keep rss context in future work as its orthogonal to it.
    > >>>Yes, your help for rss context will be good.
    > >>>Lets first finish RFF as its bit in the advance stage.
    > >>Yes! But I worry that a spec that references a concept that doesn't exist in the
    > >>existing spec may be blocked, so if you have no objections, I will push this work
    > >>forward in the near future to help RFF possible blocking.
    > >I was probably not clear enough, I propose that lets remove the RSS context for now in the RFF.
    > >Once RSS context is done, at that point in future to enhance RFF.
    Hi, Parav.

    We need the RSS context, which can be combined with 'ethtool -X .. equal'
    to achieve traffic isolation, so please keep the RSS context. To allay
    your concern that the RSS context might block the n-tuple work, I'll be
    issuing a spec this week (or next week) for virtio support of the RSS
    context. As for the RSS context in n-tuple RFF, we can support it after
    RFF, as an enhancement to RFF.

    > >[..]

    > +9. The filter rule add/delete entries are usually short in size of few tens of
    > + bytes, for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    > + high, hence supplying fields inside the queue descriptor is preferred for
    > + up to a certain fixed size, say 56 bytes.

    '56B' does not seem to be enough. For example,
    src mac + dst mac + src-ip6 + dst-ip6 + src-port + dst-port + user-define
    (packet byte offset pattern, length and mask) + flow id + rule priority + destination + action =
    6 + 6 + 16 + 16 + 2 + 2 + 8 + 4 + 1 + 2 + 1 = 64 B,
    and we also have structure alignment, the quintuple mask, etc.

    > [..]
    > +5. The device should be able to expose if it support filter masks.
    > [..]
    > +7. Flow filter capabilities to query using a DMA interface:
    > +
    > +```
    > +struct flow_filter_capabilities {
    > + u8 flow_groups;
    > + u16 num_flow_filter_vqs;
    > + u16 start_vq_index;
    > + u32 max_flow_filters_per_group;
    > + u32 max_flow_filters;
    > + u64 supported_packet_field_mask_bmap[4];

    I think the function here is that after the user sends the mask, the
    driver should do an 'AND' operation with the mask supported by the device first?
    Meanwhile, is 32B enough:) :
    src-port + dst-port + src mac + dst mac + src-ip6 + dst-ip6 =
    2 + 2 + 6 + 6 + 16 + 16 = 48B

    Thanks!

    > +};
    > +
    > +```
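
    To sanity-check the rough 64-byte count above, here is a hypothetical packed C layout mirroring the field list in this mail; it is not a proposed wire format, only a way to verify the arithmetic:

    ```
    #include <assert.h>
    #include <stdint.h>

    /* Rough layout for 6+6+16+16+2+2+8+4+1+2+1 = 64 bytes; names are illustrative. */
    struct rff_rule_entry_example {
        uint8_t  dst_mac[6];
        uint8_t  src_mac[6];
        uint8_t  src_ip6[16];
        uint8_t  dst_ip6[16];
        uint16_t src_port;
        uint16_t dst_port;
        uint8_t  user_def[8];    /* byte offset + pattern + length, as counted above */
        uint32_t flow_id;
        uint8_t  rule_priority;
        uint16_t destination;    /* receive virtqueue index */
        uint8_t  action;         /* drop or forward */
    } __attribute__((packed));   /* GCC/Clang packing, just for the size check */

    static_assert(sizeof(struct rff_rule_entry_example) == 64,
                  "matches the rough 64B count in this mail");
    ```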







  • 82.  RE: [virtio-comment] Re: [virtio] RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-08-2023 07:13


    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Monday, August 7, 2023 12:53 PM

    > > >Once RSS context is done, at that point in future to enhance RFF.
    > Hi, Parav.
    >
    > We need the RSS context, which can be combined with 'ethtool -X .. equal'
    > to achieve the purpose of traffic isolation, so please keep the RSS context. To
    > allay your concerns that an RSS context might be blocking the work of n-tuples,
    > I'll be issuing a spec this week (or next
    > week) for virtio support for the RSS context. For n-tuple RFF context, we can
    > support it after RFF, as a point to enhance RFF.
    >
    Ok. sounds good.

    > > >[..]
    >
    > > +9. The filter rule add/delete entries are usually short in size of few tens of
    > > + bytes, for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    > > + high, hence supplying fields inside the queue descriptor is preferred for
    > > + up to a certain fixed size, say 56 bytes.
    >
    > '56B' does not seem to be enough. For example, src mac + dst mac + src-ip6 +
    > dst-ip6 + src-port + dst-port + user-define (packet byte offset pattern, length
    > and mask) + flow id + rule priority + destination + action =
    > 6 + 6 + 16 + 16 + 2 + 2 + 8 + 4 + 1 + 2 + 1 = 64 B, we also have structure alignment
    > and quintuple mask, etc.
    >
    Yes, and the 64B also needs the group id added to it.
    A practical finite limit would be fine; I took the 56B example based on the IPv4 case.
    I think 96B as a higher upper limit looks reasonable without considering the mask.
    As ARFS kinds of use cases are usually exact match, the mask is optional and does not need to be inline.

    > > [..]
    > > +5. The device should be able to expose if it support filter masks.
    > > [..]
    > > +7. Flow filter capabilities to query using a DMA interface:
    > > +
    > > +```
    > > +struct flow_filter_capabilities {
    > > + u8 flow_groups;
    > > + u16 num_flow_filter_vqs;
    > > + u16 start_vq_index;
    > > + u32 max_flow_filters_per_group;
    > > + u32 max_flow_filters;
    > > + u64 supported_packet_field_mask_bmap[4];
    >
    > I think the function here is that after the user sends the mask, the driver should
    > do an 'AND' operation with the mask supported by the device first?
    > Meanwhile, is 32B enough:) :
    > src-port + dst-port + src mac + dst mac + src-ip6 + dst-ip6 =
    > 2 + 2 + 6 + 6 + 16 + 16 = 48B
    >
    Oh, I didn't document it well.
    The field supported_packet_field_mask_bmap is a bitmap of well-defined fields, one bit for each field.
    For example:
    Bit_0 = dst_mac
    Bit_1 = src_mac
    Bit_2 = eth_type
    Bit_3 = vlan_tag
    Bit_4 = dst_ip
    And so on.

    So yes, the actual content such as src-port does not need a mask, but when the
    filter rule arrives from the ARFS or FC side, the metadata info coming in ethtool_rx_flow_spec is to be masked.

    It would make more sense to have a dedicated command for the bitmap, for two reasons:
    1. It does not get sandwiched in the future when new fields are added after the bitmap.
    2. When the bitmap needs to be extended, it is not fragmented across two or more places.
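
    A minimal sketch of the bitmap semantics described above, with bit positions taken from the examples in this mail (the real assignment would be defined by the spec, so treat the values as placeholders), plus the driver-side subset check:

    ```
    #include <stdbool.h>
    #include <stdint.h>

    /* One bit per well-defined field; placeholder bit positions. */
    #define RFF_FIELD_DST_MAC   (1ULL << 0)
    #define RFF_FIELD_SRC_MAC   (1ULL << 1)
    #define RFF_FIELD_ETH_TYPE  (1ULL << 2)
    #define RFF_FIELD_VLAN_TAG  (1ULL << 3)
    #define RFF_FIELD_DST_IP    (1ULL << 4)
    /* ... and so on for the remaining fields ... */

    /* A rule may only use/mask fields that the device advertises. */
    static bool rff_fields_supported(uint64_t supported_bmap, uint64_t requested_bmap)
    {
        return (requested_bmap & ~supported_bmap) == 0;
    }
    ```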





  • 84.  Re: [virtio] RE: [virtio-comment] Re: [virtio] RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-08-2023 08:19
    On Tue, Aug 08, 2023 at 07:13:13AM +0000, Parav Pandit wrote:
    >
    >
    > > From: Heng Qi <hengqi@linux.alibaba.com>
    > > Sent: Monday, August 7, 2023 12:53 PM
    >
    > > > >Once RSS context is done, at that point in future to enhance RFF.
    > > Hi, Parav.
    > >
    > > We need the RSS context, which can be combined with 'ethtool -X .. equal'
    > > to achieve the purpose of traffic isolation, so please keep the RSS context. To
    > > allay your concerns that an RSS context might be blocking the work of n-tuples,
    > > I'll be issuing a spec this week (or next
    > > week) for virtio support for the RSS context. For n-tuple RFF context, we can
    > > support it after RFF, as a point to enhance RFF.
    > >
    > Ok. sounds good.
    >
    > > > >[..]
    > >
    > > > +9. The filter rule add/delete entries are usually short in size of few tens of
    > > > + bytes, for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    > > > + high, hence supplying fields inside the queue descriptor is preferred for
    > > > + up to a certain fixed size, say 56 bytes.
    > >
    > > '56B' does not seem to be enough. For example, src mac + dst mac + src-ip6 +
    > > dst-ip6 + src-port + dst-port + user-define (packet byte offset pattern, length
    > > and mask) + flow id + rule priority + destination + action =
    > > 6 + 6 + 16 + 16 + 2 + 2 + 8 + 4 + 1 + 2 + 1 = 64 B, we also have structure alignment
    > > and quintuple mask, etc.
    > >
    > Yes, 64B also need to add group id to it.

    Yes, there are also fields such as vlan; I just did a rough
    calculation. :)

    > A practical finite limit would be fine, I took 56B example based on ipv4 case.
    > I think 96B at higher upper limit looks reasonable without considering mask.

    96B is enough without mask.

    > As ARFS kind of use cases usually are exact match, hence mask is optional that doest need to be inline.

    Yes, but RFF will carry the mask, so we need to consider the mask field when
    deciding the maximum fixed length.

    >
    > > > [..]
    > > > +5. The device should be able to expose if it support filter masks.
    > > > [..]
    > > > +7. Flow filter capabilities to query using a DMA interface:
    > > > +
    > > > +```
    > > > +struct flow_filter_capabilities {
    > > > + u8 flow_groups;
    > > > + u16 num_flow_filter_vqs;
    > > > + u16 start_vq_index;
    > > > + u32 max_flow_filters_per_group;
    > > > + u32 max_flow_filters;
    > > > + u64 supported_packet_field_mask_bmap[4];
    > >
    > > I think the function here is that after the user sends the mask, the driver should
    > > do an 'AND' operation with the mask supported by the device first?
    > > Meanwhile, is 32B enough:) :
    > > src-port + dst-port + src mac + dst mac + src-ip6 + dst-ip6 =
    > > 2 + 2 + 6 + 6 + 16 + 16 = 48B
    > >
    > Oh I didn’t document well.
    > The field supported_packet_field_mask_bmap is bitmap of well defined fields, one bit for each field.
    > Such as,
    > Bit_0 = dst_mac
    > Bit_1 = src_mac
    > Bit_2 = eth_type
    > Bit_3 = vlan_tag
    > Bit_4 = dst_ip
    > And so on.

    Ok, I got it. Then we don't need to reserve 256 bits (u64[4]) for
    supported_packet_field_mask_bmap. 64 bits can represent 64 different fields,
    which I think seems enough, or perhaps 128 bits?

    Considering the alignment of the structure, it should look like this:
    struct flow_filter_capabilities {
    u8 flow_groups;
    u8 padding1;
    u16 num_flow_filter_vqs;
    u16 start_vq_index;
    u16 padding2;
    u32 max_flow_filters_per_group;
    u32 max_flow_filters;
    u64 supported_packet_field_mask_bmap;
    };

    >
    > So yes, the actual content like src-port does not need to mask, but yes,
    > when the filter rule arrives from ARFS or FC side, that metdata info coming in ethtool_rx_flow_spec to be masked.
    >

    Yes.

    > It would make more sense to have a dedicated command for the bitmap for two reasons.
    > 1. It doesn't get sandwiched in the future when new fields are added after the bitmap.
    > 2. When bitmap needs to extend, it is not fragmented at two or more places.

    Agree! Allowing field extensions is good, especially the supported mask fields.

    Thanks!
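
    As a compile-time check of the padded layout sketched above (spelled with fixed-width types; still a draft from this discussion, not a spec definition):

    ```
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    struct flow_filter_capabilities_example {
        uint8_t  flow_groups;
        uint8_t  padding1;
        uint16_t num_flow_filter_vqs;
        uint16_t start_vq_index;
        uint16_t padding2;
        uint32_t max_flow_filters_per_group;
        uint32_t max_flow_filters;
        uint64_t supported_packet_field_mask_bmap;
    };

    static_assert(offsetof(struct flow_filter_capabilities_example,
                           supported_packet_field_mask_bmap) == 16,
                  "bitmap lands on a naturally aligned 8B boundary");
    static_assert(sizeof(struct flow_filter_capabilities_example) == 24,
                  "no compiler-inserted padding with this layout");
    ```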






  • 86.  Re: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-08-2023 08:22
    On Thu, Aug 03, 2023 at 09:59:54AM +0000, Parav Pandit wrote:
    >
    > > From: Heng Qi <hengqi@linux.alibaba.com>
    > > Sent: Wednesday, August 2, 2023 8:55 PM
    >
    > > Hi, Parav. Sorry for not responding to this in time due to other things recently.
    > >
    > > Yes, RFF has two scenarios, set_rxnfc and ARFS, both of which will affect the
    > > packet steering on the device side.
    > > I think manually configured rules should have higher priority than ARFS
    > > automatic configuration.
    > > This behavior is intuitive and consistent with other drivers. Therefore, the
    > > processing chain on a rx packet is:
    > > {mac,vlan,promisc rx filters} -> {set_rxnfc} -> {ARFS} -> {rss/hash config}.
    > >
    > Correct.
    > Within the RFF context, the priority among multiple RFF entries is governed by the concept of group.
    > So above two users of the RFF will create two groups and assign priority to it and achieve the desired processing order.
    >
    > > There are also priorities within set_rxnfc and ARFS respectively.
    > > 1. For set_rxnfc, which has the exact match and the mask match. Exact
    > > matches should have higher priority.
    > > Suppose there are two rules,
    > > rule1: {"tcpv4", "src-ip: 1.1.1.1"} -> rxq1
    > > rule2: {"tcpv4", "src-ip: 1.1.1.1", "dst-port: 8989"} -> rxq2 .For recieved
    > > rx packets whose src-ip is 1.1.1.1, should match rule2 instead of rule1.
    > >
    > Yes. Driver should be able to set the priority within the group as well for above scenario.
    >
    > > The rules of set_rxnfc come from manual configuration, the number of these
    > > rules is small and we may not need group grouping for this. And ctrlq can meet
    > > the configuration rate,
    > >
    > Yes, but having single interface for two use cases enables the device implementation to not build driver interface specific infra.
    > Both can be handled by unified interface.

    I reconsidered the number of groups; we don't necessarily have only
    two groups for the time being (one for RFF, the other for ARFS). For
    example, the driver may maintain groups with different priorities for
    RFF itself (for example, according to the number of fields contained in
    the n-tuple, etc.), and the driver may also maintain different groups with
    the same priority for different flow types of ARFS, etc.

    Thanks!

    >
    > > 2. For ARFS, which only has the exact match.
    > > For ARFS, since there is only one matching rule for a certain flow, so there is no
    > > need for group?
    > Groups are defining the priority between two types of rules.
    > Within ARFS domain we don't need group.
    >
    > However instead of starting with only two limiting groups, it is better to have some flexibility for supporting multiple groups.
    > A device can device one/two or more groups.
    > So in future if a use case arise, interface wont be limiting to it.
    >
    > > We may need different types of tables, such as UDPv4 flow table, TCPv4 flow
    > > table to speed up the lookup for differect flow types.
    > > Besides, the high rate and large number of configuration rules means that we
    > > need flow vq.
    > >
    > Yes, I am not sure if those tables should be exposed to the driver.
    > Thinking that a device may be able to decide on table count which it may be able to create.
    >
    > > Therefore, although set_rxnfc and ARFS share a set of infrastructure, there are
    > > still some differences, such as configuration rate and quantity. So do we need
    > > add two features (VIRTIO_NET_F_RXNFC and VIRTIO_NET_F_ARFS) for
    > > set_rxnfc and ARFS respectively, and ARFS can choose flow vq?
    > Not really, as one interface can fullfil both the needs without attaching it to a specific OS interface.
    >
    > > In this way, is it more conducive to advancing the work of RFF (such as
    > > accelerating the advancement of set_rxnfc)?
    > >
    > Both the use cases are equally immediately usable so we can advance it easily using single interface now.
    >
    > > > +
    > > > +### 3.4.1 control path
    > > > +1. The number of flow filter operations/sec can range from 100k/sec to
    > > 1M/sec
    > > > + or even more. Hence flow filter operations must be done over a queueing
    > > > + interface using one or more queues.
    > >
    > > This is only for ARFS, for devices that only want to support set_rxnfc, they don't
    > > provide VIRTIO_NET_F_ARFS and consider implementing flow vq.
    > >
    > Well once the device implements flow vq, it will service both cases.
    > A simple device implementation who only case for RXNFC, can implement flowvq in semi-software serving very small number of req/sec.
    >
    > > > +2. The device should be able to expose one or more supported flow filter
    > > queue
    > > > + count and its start vq index to the driver.
    > > > +3. As each device may be operating for different performance characteristic,
    > > > + start vq index and count may be different for each device. Secondly, it is
    > > > + inefficient for device to provide flow filters capabilities via a config space
    > > > + region. Hence, the device should be able to share these attributes using
    > > > + dma interface, instead of transport registers.
    > > > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > > > + will likely create these queues when flow filters are enabled.
    > >
    > > I understand that the number of flow vqs is not reflected in
    > > max_virtqueue_pairs. And a new vq is created at runtime, is this supported in
    > > the existing virtio spec?
    > >
    > We are extending the virtio-spec now if it is not supported.
    > But yes, it is supported because max_virtqueue_pairs will not expose the count of flow_vq.
    > (similar to how we did the AQ).
    > And flowvq anyway is not _pair_ so we cannot expose there anyway.
    >
    > > > +5. Flow filter operations are often accelerated by device in a hardware.
    > > Ability
    > > > + to handle them on a queue other than control vq is desired. This achieves
    > > near
    > > > + zero modifications to existing implementations to add new operations on
    > > new
    > > > + purpose built queues (similar to transmit and receive queue).
    > > > +6. The filter masks are optional; the device should be able to expose if it
    > > > + support filter masks.
    > > > +7. The driver may want to have priority among group of flow entries; to
    > > facilitate
    > > > + the device support grouping flow filter entries by a notion of a group. Each
    > > > + group defines priority in processing flow.
    > > > +8. The driver and group owner driver should be able to query supported
    > > device
    > > > + limits for the flow filter entries.
    > > > +
    > > > +### 3.4.2 flow operations path
    > > > +1. The driver should be able to define a receive packet match criteria, an
    > > > + action and a destination for a packet.
    > >
    > > When the user does not specify a destination when configuring a rule, do we
    > > need a default destination?
    > >
    > I think we should not give such option to driver.
    > A human/end user may not have the destination, but driver should be able to decide a predictable destination.
    >
    > > > For example, an ipv4 packet with a
    > > > + multicast address to be steered to the receive vq 0. The second example is
    > > > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > > > + be steered to receive vq 10.
    > > > +2. The match criteria should include exact tuple fields well-defined such as
    > > mac
    > > > + address, IP addresses, tcp/udp ports, etc.
    > > > +3. The match criteria should also optionally include the field mask.
    > > > +4. The match criteria may optionally also include specific packet byte offset
    > > > + pattern, match length, mask instead of RFC defined fields.
    > > > + length, and matching pattern, which may not be defined in the standard
    > > RFC.
    > >
    > > Is there a description error here?
    > >
    > Didn't follow your comment. Do you mean there is an error in above description?
    >
    > > > +5. Action includes (a) dropping or (b) forwarding the packet.
    > > > +6. Destination is a receive virtqueue index.
    > >
    > > Since the concept of RSS context does not yet exist in the virtio spec.
    > > Did we say that we also support carrying RSS context information when
    > > negotiating the RFF feature? For example, RSS context configuration commands
    > > and structures, etc.
    > >
    > > Or support RSS context functionality as a separate feature in another thread?
    > >
    > Support RSS context as separate feature.
    >
    > > A related point to consider is that when a user inserts a rule with an rss context,
    > > the RSS context cannot be deleted, otherwise the device will cause undefined
    > > behavior.
    > >
    > Yes, for now we can keep rss context as separate feature.
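
    To make the ordering discussed above concrete, here is a small sketch with hypothetical priority semantics (lower value matched first is an assumption, not spec text); it mirrors the {set_rxnfc} -> {ARFS} processing chain and the rule1/rule2 example:

    ```
    #include <stdint.h>

    struct rff_group {
        uint16_t group_id;
        uint16_t priority;    /* assumed: lower value == matched earlier */
    };

    /* Manually configured set_rxnfc rules are matched before ARFS rules. */
    static const struct rff_group rxnfc_group = { .group_id = 0, .priority = 0 };
    static const struct rff_group arfs_group  = { .group_id = 1, .priority = 1 };

    /* Groups are walked in ascending priority value. Within a group, the rule
     * carrying more match fields (e.g. src-ip + dst-port) gets a higher
     * rule-level priority, so rule2 wins over rule1 in the example above. */
    static int rff_group_cmp(const struct rff_group *a, const struct rff_group *b)
    {
        return (int)a->priority - (int)b->priority;
    }
    ```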





  • 88.  RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-14-2023 05:15
    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Tuesday, August 8, 2023 1:52 PM

    > > Yes, but having single interface for two use cases enables the device
    > implementation to not build driver interface specific infra.
    > > Both can be handled by unified interface.
    >
    > I reconsidered the number of groups, we don't necessarily have only two groups
    > for the time being (one is RFF, the other is ARFS). For example, the driver may
    > maintain groups with different priorities for RFF itself (for example, according
    > to the number of fields contained in ntuple, etc.), and the driver may also
    > maintain different groups with the same priority for different flow types of
    > ARFS, etc.

    This is fine and covered by the interface.
    The number of supported max groups is a device capability that is exposed by the device.

    How many groups to use and which priority to assign to each is the driver's decision.
    So more than 2 groups is fine and supported by the requirements.

    In the Linux net device example, 2 groups seem enough, but the spec is not limited to it.

    When/if there is a switch, it can also create a group and filter/prioritize messages before they reach further NIC processing.
    But we can keep this aside for now so as not to complicate the discussion further.

    So in a nutshell, a single interface is able to service the needs of both ARFS and ethtool programming.
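
    A tiny sketch of the capability/decision split described above, with illustrative names: the device only advertises a maximum group count, and the driver decides how many it actually uses:

    ```
    #include <stdint.h>

    /* Device advertises the maximum; the driver picks any count up to it. */
    static uint8_t rff_groups_to_create(uint8_t device_max_groups,
                                        uint8_t driver_wanted_groups)
    {
        return driver_wanted_groups < device_max_groups ?
               driver_wanted_groups : device_max_groups;
    }
    ```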





  • 90.  Re: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-14-2023 06:19
    On Mon, Aug 14, 2023 at 05:15:06AM +0000, Parav Pandit wrote:
    > > From: Heng Qi <hengqi@linux.alibaba.com>
    > > Sent: Tuesday, August 8, 2023 1:52 PM
    >
    > > > Yes, but having single interface for two use cases enables the device
    > > implementation to not build driver interface specific infra.
    > > > Both can be handled by unified interface.
    > >
    > > I reconsidered the number of groups, we don't necessarily have only two groups
    > > for the time being (one is RFF, the other is ARFS). For example, the driver may
    > > maintain groups with different priorities for RFF itself (for example, according
    > > to the number of fields contained in ntuple, etc.), and the driver may also
    > > maintain different groups with the same priority for different flow types of
    > > ARFS, etc.
    >
    > This is fine and covered with the interface.
    > Number of supported max groups is device capability that is exposed by the device.
    >
    > How many groups to use and which priority assign to each is driver's decision.
    > So more than 2 groups is fine and supported by the requirements.

    Yes, that's what I want to stress too.

    >
    > In Linux net device example, 2 groups seem enough,

    Sorry, I didn't understand this; are you referring to net device
    documentation or a driver implementation?

    > but spec is not limited to it.
    >
    > When/if there is switch, it can also create a group and filter prioritize message before it reaches further nic processing.
    > But we can keep this aside for now to not complicate the discussion more.

    Yes, I noticed Xuan's thread, so this can be discussed in his thread.

    Thanks!










  • 92.  RE: [PATCH requirements 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-14-2023 06:35

    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Monday, August 14, 2023 11:49 AM
    > > How many groups to use and which priority assign to each is driver's decision.
    > > So more than 2 groups is fine and supported by the requirements.
    >
    > Yes, that's what I want to stress too.
    >
    Ok.
    > >
    > > In Linux net device example, 2 groups seem enough,
    >
    > Sorry I didn't understand this, are you referring to a net device documentation
    > or a driver implementation?
    >
    Driver implementation.


