OASIS Virtual I/O Device (VIRTIO) TC


[PATCH requirements v5 0/7] virtio net requirements for 1.4

  • 1.  [PATCH requirements v5 0/7] virtio net requirements for 1.4

    Posted 08-18-2023 04:36
    Hi All,

    This document captures the virtio net device requirements for the upcoming
    release 1.4 that some of us are currently working on. It is a living document,
    to be updated over time as we work towards a design that can result in a draft
    specification.

    The objectives are:
    1. To consider these requirements when introducing the new features listed in
       the document, and to work towards the interface design followed by drafting
       the specification changes.
    2. To define a list of requirements that are practical to achieve incrementally
       in the 1.4 timeframe and that can actually be implemented.

    Please review patch 5 first, at the highest priority. Receive flow filters is
    the first item, apart from counters, to complete in this iteration so that we
    can start drafting the design spec. The rest of the requirements are largely
    untouched other than Stefan's comment.

    TODO:
    1. Some more refinement is needed for the rx low latency and header data split
       requirements.

    changelog:
    v4->v5:
    - Refined receive flow filter requirements to be ready for the spec draft
    - Updated the timestamp requirement based on feedback from Willem
    - Fixed the counters requirements based on comments from David
    v3->v4:
    - Receive flow filters requirements underwent major updates to take them to
      spec draft level
    - Addressed comments from Xuan, Heng, David, Satananda
    - Refined wording in the rest of the requirements
    v2->v3:
    - Addressed comments from Stefan for tx low latency and notification
    - Redrafted the notification requirements to use the rearm term and avoid
      queue enable confusion
    - Addressed all comments and refined the receive flow filters requirements to
      take them to design level
    v1->v2:
    - Major update of the receive flow filter requirements based on the last two
      design discussions in the community and offline research
    - Examples added
    - Link to use case and design goal added
    - Control and operation side requirements split
    - More verbose
    v0->v1:
    - Addressed comments from Heng Li
    - Addressed a few (not all) comments from Michael
    - Per patch changelog

    Parav Pandit (7):
      net-features: Add requirements document for release 1.4
      net-features: Add low latency transmit queue requirements
      net-features: Add low latency receive queue requirements
      net-features: Add notification coalescing requirements
      net-features: Add n-tuple receive flow filters requirements
      net-features: Add packet timestamp requirements
      net-features: Add header data split requirements

     net-workstream/features-1.4.md | 383 +++++++++++++++++++++++++++++++++
     1 file changed, 383 insertions(+)
     create mode 100644 net-workstream/features-1.4.md

    --
    2.26.2


  • 2.  [PATCH requirements v5 1/7] net-features: Add requirements document for release 1.4

    Posted 08-18-2023 04:37
    Add requirements document template for the virtio net features.

    Add virtio net device counters visible to the driver.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    changelog:
    v4->v5:
    - Fixed attributes query and counters query
    v3->v4:
    - Addressed comment from David
    - Added link to more counters that we are already discussing
    v0->v1:
    - removed tx dropped counter
    - updated requirements to mention the virtqueue interface for counters
      query
    ---
     net-workstream/features-1.4.md | 41 ++++++++++++++++++++++++++++++++++
     1 file changed, 41 insertions(+)
     create mode 100644 net-workstream/features-1.4.md

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    new file mode 100644
    index 0000000..ea36f09
    --- /dev/null
    +++ b/net-workstream/features-1.4.md
    @@ -0,0 +1,41 @@
    +# 1. Introduction
    +
    +This document describes the overall requirements for virtio net device
    +improvements for upcoming release 1.4. Some of these requirements are
    +interrelated and influence the interface design, hence reviewing them
    +together is desired while updating the virtio net interface.
    +
    +# 2. Summary
    +1. Device counters visible to the driver
    +
    +# 3. Requirements
    +## 3.1 Device counters
    +1. The driver should be able to query the device and/or per vq counters for
    +   debugging purpose using a virtqueue directly from driver to device, for
    +   example using a control vq.
    +2. The driver should be able to query which counters are supported using a
    +   virtqueue command, for example using an existing control vq.
    +3. If this device is migrated between two hosts, the driver should be able to
    +   get the counter values in the destination host from where they were left
    +   off in the source host.
    +4. If a virtio device is a group member device, it must be possible to query
    +   all of the group member counters via the group owner device.
    +5. If a virtio device is a group member device, it must be possible to query
    +   all of the group member counter attributes via the group owner device.
    +
    +### 3.1.1 Per receive queue counters
    +1. le64 rx_oversize_pkt_errors: Packets dropped due to the received packet
    +   being larger than the buffer size
    +2. le64 rx_no_buffer_pkt_errors: Packets dropped due to unavailability of a
    +   buffer in the receive queue
    +3. le64 rx_gso_pkts: Packets treated as a receive GSO sequence by the device
    +4. le64 rx_pkts: Total packets received by the device
    +
    +### 3.1.2 Per transmit queue counters
    +1. le64 tx_gso_pkts: Packets sent as a transmit GSO sequence
    +2. le64 tx_pkts: Total packets sent by the device
    +
    +### 3.1.3 More counters
    +More counters are discussed in [1].
    +
    +[1] https://lists.oasis-open.org/archives/virtio-comment/202308/msg00176.html
    --
    2.26.2
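    As a rough, non-normative sketch of how the per-queue counters in 3.1.1/3.1.2
    and a control-vq query command (requirement 1) could be laid out in C. All
    struct, field and command names below are hypothetical, not part of the patch:

    ```
    #include <stdint.h>
    #include <stdio.h>

    /* Per receive queue counters, mirroring section 3.1.1 (illustrative only). */
    struct vnet_rxq_counters {
        uint64_t rx_oversize_pkt_errors;
        uint64_t rx_no_buffer_pkt_errors;
        uint64_t rx_gso_pkts;
        uint64_t rx_pkts;
    };

    /* Per transmit queue counters, mirroring section 3.1.2. */
    struct vnet_txq_counters {
        uint64_t tx_gso_pkts;
        uint64_t tx_pkts;
    };

    /* Hypothetical driver-to-device command payload placed on the control vq,
     * asking for the counters of one virtqueue. */
    struct vnet_counters_get {
        uint16_t vq_index;
        uint16_t reserved[3];
    };

    int main(void)
    {
        /* Pretend the device wrote this reply into the device-writable buffer. */
        struct vnet_rxq_counters c = { .rx_pkts = 1000, .rx_no_buffer_pkt_errors = 2 };

        printf("rx_pkts=%llu rx_no_buffer_pkt_errors=%llu\n",
               (unsigned long long)c.rx_pkts,
               (unsigned long long)c.rx_no_buffer_pkt_errors);
        return 0;
    }
    ```

    A real layout would be driven by the "which counters are supported" query of
    requirement 2 rather than hard-coded as above.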


  • 3.  Re: [virtio-comment] [PATCH requirements v5 1/7] net-features: Add requirements document for release 1.4

    Posted 08-21-2023 10:45

    On Friday, 2023-08-18 at 07:35:51 +03, Parav Pandit wrote:
    > Add requirements document template for the virtio net features.
    >
    > Add virtio net device counters visible to driver.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>

    Acked-by: David Edmondson <david.edmondson@oracle.com>

    > ---
    > changelog:
    > v4->v5:
    > - Fixed attributes query and counters query
    > v3->v4:
    > - Addressed comment from David
    > - Added link to more counters that we are already discussing
    > v0->v1:
    > - removed tx dropped counter
    > - updated requirements to mention about virtqueue interface for counters
    > query
    > ---
    > net-workstream/features-1.4.md | 41 ++++++++++++++++++++++++++++++++++
    > 1 file changed, 41 insertions(+)
    > create mode 100644 net-workstream/features-1.4.md
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > new file mode 100644
    > index 0000000..ea36f09
    > --- /dev/null
    > +++ b/net-workstream/features-1.4.md
    > @@ -0,0 +1,41 @@
    > +# 1. Introduction
    > +
    > +This document describes the overall requirements for virtio net device
    > +improvements for upcoming release 1.4. Some of these requirements are
    > +interrelated and influence the interface design, hence reviewing them
    > +together is desired while updating the virtio net interface.
    > +
    > +# 2. Summary
    > +1. Device counters visible to the driver
    > +
    > +# 3. Requirements
    > +## 3.1 Device counters
    > +1. The driver should be able to query the device and/or per vq counters for
    > + debugging purpose using a virtqueue directly from driver to device for
    > + example using a control vq.
    > +2. The driver should be able to query which counters are supported using a
    > + virtqueue command, for example using an existing control vq.
    > +3. If this device is migrated between two hosts, the driver should be able
    > + get the counter values in the destination host from where it was left
    > + off in the source host.
    > +4. If a virtio device is a group member device, it must be possible to query
    > + all of the group member counters via the group owner device.
    > +5. If a virtio device is a group member device, it must be possible to query
    > + all of the group member counter attributes via the group owner device.
    > +
    > +### 3.1.1 Per receive queue counters
    > +1. le64 rx_oversize_pkt_errors: Packet dropped due to receive packet being
    > + oversize than the buffer size
    > +2. le64 rx_no_buffer_pkt_errors: Packet dropped due to unavailability of the
    > + buffer in the receive queue
    > +3. le64 rx_gso_pkts: Packets treated as receive GSO sequence by the device
    > +4. le64 rx_pkts: Total packets received by the device
    > +
    > +### 3.1.2 Per transmit queue counters
    > +1. le64 tx_gso_pkts: Packets send as transmit GSO sequence
    > +2. le64 tx_pkts: Total packets send by the device
    > +
    > +### 3.1.3 More counters
    > +More counters discussed in [1].
    > +
    > +[1] https://lists.oasis-open.org/archives/virtio-comment/202308/msg00176.html
    --
    I know a man called Sylvester, him have to wear a bullet proof vest y'all.




  • 5.  [PATCH requirements v5 4/7] net-features: Add notification coalescing requirements

    Posted 08-18-2023 04:37
    Add virtio net device notification coalescing improvements requirements.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Acked-by: David Edmondson <david.edmondson@oracle.com>
    ---
    changelog:
    v3->v4:
    - no change
    v1->v2:
    - addressed comments from Stefan
    - redrafted the requirements to use the rearm term and avoid queue enable
      confusion
    v0->v1:
    - updated the description
    ---
     net-workstream/features-1.4.md | 11 +++++++++++
     1 file changed, 11 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index bc9e971..72b4132 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -8,6 +8,7 @@ together is desired while updating the virtio net interface.
     # 2. Summary
     1. Device counters visible to the driver
     2. Low latency tx and rx virtqueues for PCI transport
    +3. Virtqueue notification coalescing re-arming support

     # 3. Requirements
     ## 3.1 Device counters
    @@ -170,3 +171,13 @@ struct vnet_rx_completion {
        which can be recycled by the driver when the packets from the completed
        page is fully consumed.
     8. The device should be able to consume multiple pages for a receive GSO stream.
    +
    +## 3.3 Virtqueue notification coalescing re-arming support
    +0. Design goal:
    +   a. Avoid constant notifications from the device even in conditions when
    +      the driver may not have acted on the previous pending notification.
    +1. When Tx and Rx virtqueue notification coalescing is enabled, and when such
    +   a notification is reported by the device, the device stops sending further
    +   notifications until the driver rearms the notifications of the virtqueue.
    +2. When the driver rearms the notification of the virtqueue, the device
    +   notifies again if the notification coalescing conditions are met.
    --
    2.26.2
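    A minimal driver-side sketch of the re-arm flow in section 3.3, assuming
    invented helper names and a toy virtqueue state rather than any real driver
    API:

    ```
    #include <stdbool.h>
    #include <stdio.h>

    /* Toy virtqueue state standing in for the real device interface. */
    struct vq {
        int pending;        /* completions the device has produced */
        bool notify_armed;  /* whether the device may send a notification */
    };

    static int process_completions(struct vq *vq)
    {
        int n = vq->pending;
        vq->pending = 0;
        return n;
    }

    static void rearm_notifications(struct vq *vq)
    {
        vq->notify_armed = true;  /* device may notify again per 3.3 item 2 */
    }

    static bool has_pending_completions(struct vq *vq)
    {
        return vq->pending > 0;
    }

    /* Driver notification handler: after notifying once, the device stays
     * silent until the driver re-arms (3.3 item 1). */
    static void vq_notification_handler(struct vq *vq)
    {
        vq->notify_armed = false;
        do {
            printf("processed %d completions\n", process_completions(vq));
            rearm_notifications(vq);
            /* Re-check to close the race between the last poll and the re-arm. */
        } while (has_pending_completions(vq));
    }

    int main(void)
    {
        struct vq vq = { .pending = 3, .notify_armed = true };
        vq_notification_handler(&vq);
        return 0;
    }
    ```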


  • 6.  [PATCH requirements v5 3/7] net-features: Add low latency receive queue requirements

    Posted 08-18-2023 04:37
    Add requirements for the low latency receive queue.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    changelog:
    v0->v1:
    - clarified the requirements further
    - added line for the gro case
    - added design goals as the motivation for the requirements
    ---
     net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
     1 file changed, 44 insertions(+), 1 deletion(-)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 1167ce2..bc9e971 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.

     # 2. Summary
     1. Device counters visible to the driver
    -2. Low latency tx virtqueue for PCI transport
    +2. Low latency tx and rx virtqueues for PCI transport

     # 3. Requirements
     ## 3.1 Device counters
    @@ -127,3 +127,46 @@ struct vnet_data_desc desc[2];

     9. A flow filter virtqueue also similarly need the ability to inline the short flow
        command header.
    +
    +### 3.2.2 Low latency rx virtqueue
    +0. Design goal:
    +   a. Keep packet metadata and buffer data together which is consumed by driver
    +      layer and make it available in a single cache line of cpu
    +   b. Instead of having per packet descriptors which is complex to scale for
    +      the device, supply the page directly to the device to consume it based
    +      on packet size
    +1. The device should be able to write a packet receive completion that consists
    +   of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
    +   PCIe TLP.
    +2. The device should be able to perform DMA writes of multiple packets
    +   completions in a single DMA transaction up to the PCIe maximum write limit
    +   in a transaction.
    +3. The device should be able to zero pad packet write completion to align it to
    +   64B or CPU cache line size whenever possible.
    +4. An example of the above DMA completion structure:
    +
    +```
    +/* Constant size receive packet completion */
    +struct vnet_rx_completion {
    +   u16 flags;
    +   u16 id; /* buffer id */
    +   u8 gso_type;
    +   u8 reserved[3];
    +   le16 gso_hdr_len;
    +   le16 gso_size;
    +   le16 csum_start;
    +   le16 csum_offset;
    +   u16 reserved2;
    +   u64 timestamp; /* explained later */
    +   u8 padding[];
    +};
    +```
    +5. The driver should be able to post constant-size buffer pages on a receive
    +   queue which can be consumed by the device for an incoming packet of any size
    +   from 64B to 9K bytes.
    +6. The device should be able to know the constant buffer size at receive
    +   virtqueue level instead of per buffer level.
    +7. The device should be able to indicate when a full page buffer is consumed,
    +   which can be recycled by the driver when the packets from the completed
    +   page is fully consumed.
    +8. The device should be able to consume multiple pages for a receive GSO stream.
    --
    2.26.2
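    A small sketch of the page-recycling rule implied by requirements 5-7 of
    3.2.2; struct rx_page and the helper functions are illustrative assumptions,
    not defined by the patch:

    ```
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* One driver-posted constant-size page of receive buffers. */
    struct rx_page {
        uint32_t pkts_outstanding; /* packets handed to the stack, not yet freed */
        bool     device_done;      /* device indicated the page is fully consumed */
    };

    static void rx_page_maybe_recycle(struct rx_page *p)
    {
        /* Requirement 7: recycle only after the device marked the page consumed
         * and the stack has released every packet carved out of it. */
        if (p->device_done && p->pkts_outstanding == 0)
            printf("page can be reposted to the receive queue\n");
    }

    static void rx_packet_received(struct rx_page *p)
    {
        p->pkts_outstanding++;
    }

    static void rx_packet_consumed(struct rx_page *p)
    {
        p->pkts_outstanding--;
        rx_page_maybe_recycle(p);
    }

    int main(void)
    {
        struct rx_page page = { 0 };

        rx_packet_received(&page);
        rx_packet_received(&page);
        page.device_done = true;   /* device reported the whole page as used */
        rx_packet_consumed(&page);
        rx_packet_consumed(&page); /* last packet freed -> page is recycled */
        return 0;
    }
    ```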


  • 7.  Re: [PATCH requirements v5 3/7] net-features: Add low latency receive queue requirements

    Posted 08-21-2023 10:47

    On Friday, 2023-08-18 at 07:35:53 +03, Parav Pandit wrote:
    > Add requirements for the low latency receive queue.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > changelog:
    > v0->v1:
    > - clarified the requirements further
    > - added line for the gro case
    > - added design goals as the motivation for the requirements
    > ---
    > net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
    > 1 file changed, 44 insertions(+), 1 deletion(-)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 1167ce2..bc9e971 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.
    >
    > # 2. Summary
    > 1. Device counters visible to the driver
    > -2. Low latency tx virtqueue for PCI transport
    > +2. Low latency tx and rx virtqueues for PCI transport
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -127,3 +127,46 @@ struct vnet_data_desc desc[2];
    >
    > 9. A flow filter virtqueue also similarly need the ability to inline the short flow
    > command header.
    > +
    > +### 3.2.2 Low latency rx virtqueue
    > +0. Design goal:
    > + a. Keep packet metadata and buffer data together which is consumed by driver
    > + layer and make it available in a single cache line of cpu

    Phrased like this, it seems to run counter to the "header data split"
    requirement.

    Is there an implicit guard that this only applies for very small payloads?

    > + b. Instead of having per packet descriptors which is complex to scale for
    > + the device, supply the page directly to the device to consume it based
    > + on packet size
    > +1. The device should be able to write a packet receive completion that consists
    > + of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
    > + PCIe TLP.
    > +2. The device should be able to perform DMA writes of multiple packets
    > + completions in a single DMA transaction up to the PCIe maximum write limit
    > + in a transaction.
    > +3. The device should be able to zero pad packet write completion to align it to
    > + 64B or CPU cache line size whenever possible.
    > +4. An example of the above DMA completion structure:
    > +
    > +```
    > +/* Constant size receive packet completion */
    > +struct vnet_rx_completion {
    > + u16 flags;
    > + u16 id; /* buffer id */
    > + u8 gso_type;
    > + u8 reserved[3];
    > + le16 gso_hdr_len;
    > + le16 gso_size;
    > + le16 csum_start;
    > + le16 csum_offset;
    > + u16 reserved2;
    > + u64 timestamp; /* explained later */
    > + u8 padding[];
    > +};
    > +```
    > +5. The driver should be able to post constant-size buffer pages on a receive
    > + queue which can be consumed by the device for an incoming packet of any size
    > + from 64B to 9K bytes.
    > +6. The device should be able to know the constant buffer size at receive
    > + virtqueue level instead of per buffer level.
    > +7. The device should be able to indicate when a full page buffer is consumed,
    > + which can be recycled by the driver when the packets from the completed
    > + page is fully consumed.
    > +8. The device should be able to consume multiple pages for a receive GSO stream.
    --
    Modern people tend to dance.




  • 9.  RE: [PATCH requirements v5 3/7] net-features: Add low latency receive queue requirements

    Posted 08-22-2023 06:12

    > From: David Edmondson <david.edmondson@oracle.com>
    > Sent: Monday, August 21, 2023 4:17 PM

    > > +### 3.2.2 Low latency rx virtqueue
    > > +0. Design goal:
    > > + a. Keep packet metadata and buffer data together which is consumed by
    > driver
    > > + layer and make it available in a single cache line of cpu
    >
    > Phrased like this, it seems to run counter to the "header data split"
    > requirement.
    >
    Mostly not. Currently, the packet metadata consumed by the driver is spread
    across two different DMAs at two different addresses.
    For split q: virtio_net_hdr + used ring.
    For packed q: virtio_net_hdr + desc.

    Instead, both should complete in a single PCIe DMA and be read in a single
    CPU cache line while processing.
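    To make the single-cache-line point concrete, the example completion from the
    patch can be written with fixed-width C types and checked at compile time; the
    le16/u64 spec notation is substituted and the layout remains illustrative:

    ```
    #include <stdint.h>

    /* Example completion from the patch, with fixed-width C types substituted.
     * The trailing u8 padding[] of the patch is what the device zero pads out
     * to 64B / a cache line. */
    struct vnet_rx_completion {
        uint16_t flags;
        uint16_t id;        /* buffer id */
        uint8_t  gso_type;
        uint8_t  reserved[3];
        uint16_t gso_hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
        uint16_t reserved2;
        uint64_t timestamp;
    };

    /* The fixed fields occupy 26 bytes (32 with natural alignment), so the
     * metadata plus the buffer id fit in one 64B cache line / one DMA write. */
    _Static_assert(sizeof(struct vnet_rx_completion) <= 64,
                   "completion must fit in a single cache line");
    ```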

    > Is there an implicit guard that this only applies for very small payloads?
    >
    No.
    All packet sizes benefit from it.

    > > + b. Instead of having per packet descriptors which is complex to scale for
    > > + the device, supply the page directly to the device to consume it based
    > > + on packet size
    > > +1. The device should be able to write a packet receive completion that
    > consists
    > > + of struct virtio_net_hdr (or similar) and a buffer id using a single DMA
    > write
    > > + PCIe TLP.
    > > +2. The device should be able to perform DMA writes of multiple packets
    > > + completions in a single DMA transaction up to the PCIe maximum write
    > limit
    > > + in a transaction.
    > > +3. The device should be able to zero pad packet write completion to align it
    > to
    > > + 64B or CPU cache line size whenever possible.
    > > +4. An example of the above DMA completion structure:
    > > +
    > > +```
    > > +/* Constant size receive packet completion */ struct
    > > +vnet_rx_completion {
    > > + u16 flags;
    > > + u16 id; /* buffer id */
    > > + u8 gso_type;
    > > + u8 reserved[3];
    > > + le16 gso_hdr_len;
    > > + le16 gso_size;
    > > + le16 csum_start;
    > > + le16 csum_offset;
    > > + u16 reserved2;
    > > + u64 timestamp; /* explained later */
    > > + u8 padding[];
    > > +};
    > > +```
    > > +5. The driver should be able to post constant-size buffer pages on a receive
    > > + queue which can be consumed by the device for an incoming packet of any
    > size
    > > + from 64B to 9K bytes.
    > > +6. The device should be able to know the constant buffer size at receive
    > > + virtqueue level instead of per buffer level.
    > > +7. The device should be able to indicate when a full page buffer is consumed,
    > > + which can be recycled by the driver when the packets from the completed
    > > + page is fully consumed.
    > > +8. The device should be able to consume multiple pages for a receive GSO
    > stream.
    > --
    > Modern people tend to dance.




  • 11.  Re: [virtio] [PATCH requirements v5 3/7] net-features: Add low latency receive queue requirements

    Posted 09-11-2023 13:47
    On Fri, Aug 18, 2023 at 07:35:53AM +0300, Parav Pandit wrote:
    > +### 3.2.2 Low latency rx virtqueue
    > +0. Design goal:
    > + a. Keep packet metadata and buffer data together which is consumed by driver
    > + layer and make it available in a single cache line of cpu
    > + b. Instead of having per packet descriptors which is complex to scale for
    > + the device, supply the page directly to the device to consume it based
    > + on packet size
    > +1. The device should be able to write a packet receive completion that consists
    > + of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
    > + PCIe TLP.
    > +2. The device should be able to perform DMA writes of multiple packets
    > + completions in a single DMA transaction up to the PCIe maximum write limit
    > + in a transaction.
    > +3. The device should be able to zero pad packet write completion to align it to
    > + 64B or CPU cache line size whenever possible.
    > +4. An example of the above DMA completion structure:
    > +
    > +```
    > +/* Constant size receive packet completion */
    > +struct vnet_rx_completion {
    > + u16 flags;
    > + u16 id; /* buffer id */
    > + u8 gso_type;
    > + u8 reserved[3];
    > + le16 gso_hdr_len;
    > + le16 gso_size;
    > + le16 csum_start;
    > + le16 csum_offset;
    > + u16 reserved2;
    > + u64 timestamp; /* explained later */
    > + u8 padding[];
    > +};
    > +```
    > +5. The driver should be able to post constant-size buffer pages on a receive
    > + queue which can be consumed by the device for an incoming packet of any size
    > + from 64B to 9K bytes.
    > +6. The device should be able to know the constant buffer size at receive
    > + virtqueue level instead of per buffer level.
    > +7. The device should be able to indicate when a full page buffer is consumed,
    > + which can be recycled by the driver when the packets from the completed
    > + page is fully consumed.
    > +8. The device should be able to consume multiple pages for a receive GSO stream.

    If I understand correctly there is no longer a 1:1 correspondence
    between driver-supplied rx pages (available buffers) and received
    packets (used buffers). Instead, the device consumes portions of
    driver-supplied rx pages as needed and notifies the driver, and the
    entire rx page is marked used later when it has been fully consumed.

    The virtqueue model is based on submitting available buffers and
    completing used buffers, not individual DMA transfers. It's not possible
    to do DMA piecewise in this model. If you think about a VIRTIO over TCP
    transport that uses message-passing for available and used buffers, then
    it's clear the rx page approach breaks the model because only entire
    virtqueue buffers can be marked used (there is a 1:1 correspondence
    between available buffers and used buffers).

    Two options:
    1. Extend the virtqueue model to support this.
    2. Document this violation of the virtqueue model clearly but treat it
    as an exception that may lead to complications in the future (e.g.
    incompatibility with VIRTIO over TCP).

    I think it's worth investigating #1 to see whether the virtqueue model
    can be extended cleanly.

    Stefan




  • 13.  RE: [virtio] [PATCH requirements v5 3/7] net-features: Add low latency receive queue requirements

    Posted 09-11-2023 16:03


    > From: Stefan Hajnoczi <stefanha@redhat.com>
    > Sent: Monday, September 11, 2023 7:17 PM

    >
    > If I understand correctly there is no longer a 1:1 correspondence between
    > driver-supplied rx pages (available buffers) and received packets (used buffers).
    > Instead, the device consumes portions of driver-supplied rx pages as needed
    > and notifies the driver, and the entire rx page is marked used later when it has
    > been fully consumed.
    >
    > The virtqueue model is based on submitting available buffers and completing
    > used buffers, not individual DMA transfers. It's not possible to do DMA
    > piecewise in this model. If you think about a VIRTIO over TCP transport that
    > uses message-passing for available and used buffers, then it's clear the rx page
    > approach breaks the model because only entire virtqueues buffers can be
    > marked used (there is a 1:1 correspondence between available buffers and used
    > buffers).
    >
    > Two options:
    > 1. Extend the virtqueue model to support this.
    > 2. Document this violation of the virtqueue model clearly but treat it
    > as an exception that may lead to complications in the future (e.g.
    > incompatibility with VIRTIO over TCP).
    >
    I don't think it is a violation; it is an extension with a new model. PCI and MMIO will support it.
    A TCP transport may not be able to support everything that exists today in PCI.
    But I am not fully sure at present that this is a limitation.

    I will consider #1 further later this month.
    This week is occupied with the LM and flow filters that we want to review in the Wednesday meeting.

    > I think it's worth investigating #1 to see whether the virtqueue model can be
    > extended cleanly.





  • 15.  [PATCH requirements v5 7/7] net-features: Add header data split requirements

    Posted 08-18-2023 04:37
    Add header data split requirements for the receive packets.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
     net-workstream/features-1.4.md | 13 +++++++++++++
     1 file changed, 13 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 31aa587..7a56fa8 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -11,6 +11,7 @@ together is desired while updating the virtio net interface.
     3. Virtqueue notification coalescing re-arming support
     4. Virtqueue receive flow filters (RFF)
     5. Device timestamp for tx and rx packets
    +6. Header data split for the receive virtqueue

     # 3. Requirements
     ## 3.1 Device counters
    @@ -368,3 +369,15 @@ c. If/when virtio switch object is implemented, support ingress/egress flow
        point of reception from the network.
     3. The device should provide a receive packet timestamp in a single DMA
        transaction along with the rest of the receive completion fields.
    +
    +## 3.6 Header data split for the receive virtqueue
    +1. The device should be able to DMA the packet header and data to two different
    +   memory locations; this enables the driver and networking stack to perform
    +   zero copy to application buffer(s).
    +2. The driver should be able to configure the maximum header buffer size per
    +   virtqueue.
    +3. The header buffer is to be in physically contiguous memory per virtqueue.
    +4. The device should be able to indicate header data split in the receive
    +   completion.
    +5. The device should be able to zero pad the header buffer when the received
    +   header is shorter than the CPU cache line size.
    --
    2.26.2
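    A minimal sketch, assuming invented structure and field names, of the
    per-virtqueue configuration and completion information that requirements 2-5
    above imply:

    ```
    #include <stdint.h>

    /* Per receive virtqueue header data split configuration (illustrative). */
    struct rx_vq_hds_config {
        uint16_t max_hdr_buf_size; /* maximum header buffer size per vq (req. 2) */
        uint64_t hdr_buf_base;     /* physically contiguous header area (req. 3) */
        uint64_t hdr_buf_len;
    };

    /* What a split receive completion could convey back to the driver. */
    struct rx_hds_result {
        uint64_t hdr_addr;   /* where the device wrote the packet headers */
        uint16_t hdr_len;    /* zero padded up to a cache line when short (req. 5) */
        uint8_t  hds_done;   /* device indicated the split in the completion (req. 4) */
        uint64_t data_addr;  /* separate data buffer, e.g. a page usable for zero copy */
        uint32_t data_len;
    };
    ```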


  • 16.  Re: [PATCH requirements v5 7/7] net-features: Add header data split requirements

    Posted 08-21-2023 10:46

    On Friday, 2023-08-18 at 07:35:57 +03, Parav Pandit wrote:
    > Add header data split requirements for the receive packets.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>

    Acked-by: David Edmondson <david.edmondson@oracle.com>

    > ---
    > net-workstream/features-1.4.md | 13 +++++++++++++
    > 1 file changed, 13 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 31aa587..7a56fa8 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -11,6 +11,7 @@ together is desired while updating the virtio net interface.
    > 3. Virtqueue notification coalescing re-arming support
    > 4 Virtqueue receive flow filters (RFF)
    > 5. Device timestamp for tx and rx packets
    > +6. Header data split for the receive virtqueue
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -368,3 +369,15 @@ c. If/when virtio switch object is implemented, support ingress/egress flow
    > point of reception from the network.
    > 3. The device should provide a receive packet timestamp in a single DMA
    > transaction along with the rest of the receive completion fields.
    > +
    > +## 3.6 Header data split for the receive virtqueue
    > +1. The device should be able to DMA the packet header and data to two different
    > + memory locations, this enables driver and networking stack to perform zero
    > + copy to application buffer(s).
    > +2. The driver should be able to configure maximum header buffer size per
    > + virtqueue.
    > +3. The header buffer to be in a physically contiguous memory per virtqueue
    > +4. The device should be able to indicate header data split in the receive
    > + completion.
    > +5. The device should be able to zero pad the header buffer when the received
    > + header is shorter than cpu cache line size.
    --
    Do I have to tell the story, of a thousand rainy days since we first met?




  • 18.  [PATCH requirements v5 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-18-2023 04:37
    Add virtio net device requirements for receive flow filters.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Satananda Burla <sburla@marvell.com>
    Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
    ---
    changelog:
    v4->v5:
    - Rewrote cvq and flow filter vq mutual exclusive text
    - added cvq command to enable flow filters on cvq
    - made commands more refined for priority, opcode and more
    - Addressed comments from Heng
    - restructured interface commands
    v3->v4:
    - Addressed comments from Satananda, Heng, David
    - removed context specific wording, replaced with destination
    - added group create/delete examples and updated requirements
    - added optional support to use cvq for flow filter commands
    - added example of transporting flow filter commands over cvq
    - made group size to be 16-bit
    - added concept of 0->n max flow filter entries based on max count
    - added concept of 0->n max flow group based on max count
    - split field bitmask to separate command from other filter capabilities
    - rewrote rx filter processing chain order with respect to existing
      filter commands and rss
    - made flow_id flat across all groups
    v1->v2:
    - split setup and operations requirements
    - added design goal
    - worded requirements more precisely
    v0->v1:
    - fixed comments from Heng Li
    - renamed receive flow steering to receive flow filters
    - clarified byte offset in match criteria
    ---
     net-workstream/features-1.4.md | 163 +++++++++++++++++++++++++++++++++
     1 file changed, 163 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 72b4132..330949c 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
     1. Device counters visible to the driver
     2. Low latency tx and rx virtqueues for PCI transport
     3. Virtqueue notification coalescing re-arming support
    +4. Virtqueue receive flow filters (RFF)

     # 3. Requirements
     ## 3.1 Device counters
    @@ -181,3 +182,165 @@ struct vnet_rx_completion {
        notifications until the driver rearms the notifications of the virtqueue.
     2. When the driver rearms the notification of the virtqueue, the device
        notifies again if the notification coalescing conditions are met.
    +
    +## 3.4 Virtqueue receive flow filters (RFF)
    +0. Design goal:
    +   To filter and/or to steer packets based on a specific pattern match to a
    +   specific destination, to support application/networking stack driven
    +   receive processing.
    +1. Two use cases are: to support Linux netdev set_rxnfc() for ETHTOOL_SRXCLSRLINS
    +   and to support the netdev feature NETIF_F_NTUPLE, aka ARFS.
    +
    +### 3.4.1 control path
    +1. The number of flow filter operations/sec can range from 100k/sec to 1M/sec
    +   or even more. Hence flow filter operations must be done over a queueing
    +   interface using one or more queues.
    +2. The device should be able to expose the count of one or more supported flow
    +   filter queues and their start vq index to the driver.
    +3. As each device may be operating with different performance characteristics,
    +   the start vq index and count may be different for each device. Secondly, it
    +   is inefficient for the device to provide flow filter capabilities via a
    +   config space region. Hence, the device should be able to share these
    +   attributes using a DMA interface, instead of transport registers.
    +4. Since flow filters are enabled much later in the driver life cycle, the
    +   driver will likely create these queues when flow filters are enabled.
    +5. Flow filter operations are often accelerated by the device in hardware. The
    +   ability to handle them on a queue other than the control vq is desired. This
    +   achieves near zero modifications to existing implementations when adding new
    +   operations on new purpose-built queues (similar to transmit and receive
    +   queues). Some devices may not support flow filter queues and may want to
    +   support flow filter operations over the existing cvq; this gives the ability
    +   to utilize an existing cvq. Therefore,
    +   a. Flow filter queues and flow filter commands on the cvq are mutually
    +      exclusive.
    +   b. When flow filter queues are supported, the driver should use the flow
    +      filter queues for flow filter operations.
    +      (Since the cvq is not enabled for flow filters, any flow filter command
    +      coming on the cvq must fail.)
    +   c. If the driver wants to use flow filters over the cvq, the driver must
    +      explicitly enable flow filters on the cvq via a command; when they are
    +      enabled on the cvq, the driver cannot use flow filter queues. This
    +      eliminates any synchronization needed by the device among different
    +      types of queues.
    +6. The filter masks are optional; the device should be able to expose whether
    +   it supports filter masks.
    +7. The driver may want to have priority among groups of flow entries; to
    +   facilitate this, the device supports grouping flow filter entries by a
    +   notion of a flow group. Each flow group defines a priority in processing
    +   flows.
    +8. The driver and group owner driver should be able to query the supported
    +   device limits for the receive flow filters.
    +9. The flow filter capabilities of a member device should be queryable by the
    +   owner device using an administrative command.
    +
    +### 3.4.2 flow operations path
    +1. The driver should be able to define a receive packet match criteria, an
    +   action and a destination for a packet. For example, an ipv4 packet with a
    +   multicast address is to be steered to receive vq 0. A second example is an
    +   ipv4, tcp packet matching a specified IP address and tcp port tuple to
    +   be steered to receive vq 10.
    +2. The match criteria should include well-defined exact tuple fields such as
    +   mac address, IP addresses, tcp/udp ports, etc.
    +3. The match criteria should also optionally include a field mask.
    +4. The action includes (a) dropping or (b) forwarding the packet.
    +5. The destination is a receive virtqueue index.
    +6. The receive packet processing chain is:
    +   a. filters programmed using the cvq commands VIRTIO_NET_CTRL_RX,
    +      VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_VLAN.
    +   b. filters programmed using the RFF functionality.
    +   c. filters programmed using the RSS VIRTIO_NET_CTRL_MQ_RSS_CONFIG command.
    +   Whichever filtering and steering functionality is enabled, it is applied
    +   in the above order.
    +7. If multiple entries are programmed which have overlapping filtering
    +   attributes for a received packet, the driver defines the location/priority
    +   of the entry.
    +8. The filter entries are usually short, a few tens of bytes in size,
    +   for example an IPv6 + TCP tuple would be 36 bytes, and the ops/sec rate is
    +   high, hence supplying fields inside the queue descriptor is preferred for
    +   up to a certain fixed size, say 96 bytes.
    +9. A flow filter entry consists of (a) match criteria, (b) action,
    +   (c) destination and (d) a unique 32 bit flow id, all supplied by the
    +   driver.
    +10. The driver should be able to query and delete a flow filter entry
    +    by the flow id.
    +
    +### 3.4.3 interface example
    +
    +1. Flow filter capabilities to query using a DMA interface such as the cvq,
    +   using two different commands.
    +
    +```
    +struct virtio_net_rff_cmd {
    +   u8 class;    /* RFF class */
    +   u8 commands; /* 0 = query cap
    +                 * 1 = query packet fields mask
    +                 * 2 = enable flow filter operations over cvq
    +                 * 3 = add flow group
    +                 * 4 = del flow group
    +                 * 5 = flow filter op.
    +                 */
    +   u8 command-specific-data[];
    +};
    +
    +/* command 1 (query) */
    +struct flow_filter_capabilities {
    +   le16 start_vq_index;
    +   le16 num_flow_filter_vqs;
    +   le16 max_flow_groups; /* valid group id = max_flow_groups - 1 */
    +   le16 max_group_priorities; /* max priorities of the group */
    +   le32 max_flow_filters_per_group;
    +   le32 max_flow_filters; /* max flow_id in add/del
    +                           * is equal to max_flow_filters - 1.
    +                           */
    +   u8 max_priorities_per_group;
    +   u8 cvq_supports_flow_filters_ops;
    +};
    +
    +/* command 2 (query packet field masks) */
    +struct flow_filter_fields_support_mask {
    +   le64 supported_packet_field_mask_bmap[1];
    +};
    +```
    +
    +2. Group add/delete cvq commands:
    +
    +```
    +/* command 3 */
    +struct virtio_net_rff_group_add {
    +   le16 priority; /* higher the value, higher the priority */
    +   le16 group_id;
    +};
    +
    +/* command 4 */
    +struct virtio_net_rff_group_delete {
    +   le16 group_id;
    +};
    +```
    +
    +3. Flow filter entry add/modify, delete over flow vq:
    +
    +```
    +struct virtio_net_rff_add_modify {
    +   u8 flow_op;
    +   u8 priority; /* higher the value, higher the priority */
    +   u16 group_id;
    +   le32 flow_id;
    +   struct match_criteria mc;
    +   struct destination dest;
    +   struct action action;
    +
    +   struct match_criteria mask; /* optional */
    +};
    +
    +struct virtio_net_rff_delete {
    +   u8 flow_op;
    +   u8 padding[3];
    +   le32 flow_id;
    +};
    +```
    +
    +### 3.4.4 For incremental future
    +a. The driver should be able to specify a specific packet byte offset, number
    +   of bytes and mask as match criteria.
    +b. Support an RSS context, in addition to a specific RQ.
    +c. If/when a virtio switch object is implemented, support ingress/egress flow
    +   filters at the switch port level.
    --
    2.26.2
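    A usage sketch for the second example in 3.4.2 (steering an ipv4/tcp tuple to
    receive vq 10). Only the outer fields of struct virtio_net_rff_add_modify come
    from the interface example above; the match_criteria, destination and action
    layouts and the opcode values are invented for illustration:

    ```
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    struct match_criteria {      /* illustrative exact-match tuple */
        uint32_t ipv4_dst;
        uint16_t tcp_dport;
        uint16_t reserved;
    };

    struct destination { uint16_t rx_vq_index; uint16_t reserved; };
    struct action      { uint8_t  op;          uint8_t  reserved[3]; }; /* 0=drop, 1=forward */

    struct virtio_net_rff_add_modify {
        uint8_t  flow_op;        /* hypothetical "add" opcode */
        uint8_t  priority;       /* higher the value, higher the priority */
        uint16_t group_id;
        uint32_t flow_id;        /* unique, driver supplied */
        struct match_criteria mc;
        struct destination dest;
        struct action action;
        struct match_criteria mask; /* optional */
    };

    int main(void)
    {
        struct virtio_net_rff_add_modify cmd;

        memset(&cmd, 0, sizeof(cmd));
        cmd.flow_op = 0;                           /* assumed add opcode */
        cmd.priority = 1;
        cmd.group_id = 0;
        cmd.flow_id = 42;
        cmd.mc.ipv4_dst = inet_addr("192.0.2.10");
        cmd.mc.tcp_dport = htons(443);
        cmd.dest.rx_vq_index = 10;                 /* steer to receive vq 10 */
        cmd.action.op = 1;                         /* forward */

        /* A driver would post &cmd as a descriptor on a flow filter vq, or on
         * the cvq when flow filter operations have been enabled there. */
        return 0;
    }
    ```

    Deleting the same flow would go through struct virtio_net_rff_delete, keyed
    only by the flow_id.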


  • 19.  Re: [PATCH requirements v5 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-21-2023 05:06


    On 2023/8/18 12:35 PM, Parav Pandit wrote:
    > Add virtio net device requirements for receive flow filters.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > Signed-off-by: Satananda Burla <sburla@marvell.com>
    > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
    > ---
    > changelog:
    > v4->v5:
    > - Rewrote cvq and flow filter vq mutual exclusive text
    > - added cvq command to enable flow filters on cvq
    > - made commands more refined for priority, opcode and more
    > - Addressed comments from Heng
    > - restructured interface commands
    > v3->v4:
    > - Addressed comments from Satananda, Heng, David
    > - removed context specific wording, replaced with destination
    > - added group create/delete examples and updated requirements
    > - added optional support to use cvq for flor filter commands
    > - added example of transporting flow filter commands over cvq
    > - made group size to be 16-bit
    > - added concept of 0->n max flow filter entries based on max count
    > - added concept of 0->n max flow group based on max count
    > - split field bitmask to separate command from other filter capabilities
    > - rewrote rx filter processing chain order with respect to existing
    > filter commands and rss
    > - made flow_id flat across all groups
    > v1->v2:
    > - split setup and operations requirements
    > - added design goal
    > - worded requirements more precisely
    > v0->v1:
    > - fixed comments from Heng Li
    > - renamed receive flow steering to receive flow filters
    > - clarified byte offset in match criteria
    > ---
    > net-workstream/features-1.4.md | 163 +++++++++++++++++++++++++++++++++
    > 1 file changed, 163 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 72b4132..330949c 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
    > 1. Device counters visible to the driver
    > 2. Low latency tx and rx virtqueues for PCI transport
    > 3. Virtqueue notification coalescing re-arming support
    > +4 Virtqueue receive flow filters (RFF)
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -181,3 +182,165 @@ struct vnet_rx_completion {
    > notifications until the driver rearms the notifications of the virtqueue.
    > 2. When the driver rearms the notification of the virtqueue, the device
    > to notify again if notification coalescing conditions are met.
    > +
    > +## 3.4 Virtqueue receive flow filters (RFF)
    > +0. Design goal:
    > + To filter and/or to steer packet based on specific pattern match to a
    > + specific destination to support application/networking stack driven receive
    > + processing.
    > +1. Two use cases are: to support Linux netdev set_rxnfc() for ETHTOOL_SRXCLSRLINS
    > + and to support netdev feature NETIF_F_NTUPLE aka ARFS.
    > +
    > +### 3.4.1 control path
    > +1. The number of flow filter operations/sec can range from 100k/sec to 1M/sec
    > + or even more. Hence flow filter operations must be done over a queueing
    > + interface using one or more queues.
    > +2. The device should be able to expose one or more supported flow filter queue
    > + count and its start vq index to the driver.
    > +3. As each device may be operating for different performance characteristic,
    > + start vq index and count may be different for each device. Secondly, it is
    > + inefficient for device to provide flow filters capabilities via a config space
    > + region. Hence, the device should be able to share these attributes using
    > + dma interface, instead of transport registers.
    > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > + will likely create these queues when flow filters are enabled.

    I recall that the flow vq was to be created in the probe
    phase, with a feature bit used to indicate dynamic creation.
    Or do we want to revert to a workaround similar to vq reset here?

    Thanks!

    > +5. Flow filter operations are often accelerated by device in a hardware. Ability
    > + to handle them on a queue other than control vq is desired. This achieves near
    > + zero modifications to existing implementations to add new operations on new
    > + purpose built queues (similar to transmit and receive queue). Some devices
    > + may not support flow filter queues and may want to support flow filter operations
    > + over existing cvq, this gives the ability to utilize an existing cvq.
    > + Therefore,
    > + a. Flow filter queues and flow filter commands on cvq are mutually exclusive.
    > + b. When flow filter queues are supported, the driver should use the flow filter
    > + queues for flow filter operations.
    > + (Since cvq is not enabled for flow filters, any flow filter command coming
    > + on cvq must fail).
    > + c. If driver wants to use flow filters over cvq, driver must explicitly
    > + enable flow filters on cvq via a command, when it is enabled on the cvq
    > + driver cannot use flow filter queues. This eliminates any synchronization
    > + needed by the device among different types of queues.
    > +6. The filter masks are optional; the device should be able to expose if it
    > + supports filter masks.
    > +7. The driver may want to have priority among group of flow entries; to facilitate
    > + the device support grouping flow filter entries by a notion of a flow group.
    > + Each flow group defines priority in processing flow.
    > +8. The driver and group owner driver should be able to query supported device
    > + limits for the receive flow filters.
    > +9. Query the flow filter capabilities of the member device by the owner device
    > + using administrative command.
    > +
    > +### 3.4.2 flow operations path
    > +1. The driver should be able to define a receive packet match criteria, an
    > + action and a destination for a packet. For example, an ipv4 packet with a
    > + multicast address to be steered to the receive vq 0. The second example is
    > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > + be steered to receive vq 10.
    > +2. The match criteria should include exact tuple fields well-defined such as mac
    > + address, IP addresses, tcp/udp ports, etc.
    > +3. The match criteria should also optionally include the field mask.
    > +4. Action includes (a) dropping or (b) forwarding the packet.
    > +5. Destination is a receive virtqueue index.
    > +6. Receive packet processing chain is:
    > + a. filters programmed using cvq commands VIRTIO_NET_CTRL_RX,
    > + VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_VLAN.
    > + b. filters programmed using RFF functionality.
    > + c. filters programmed using RSS VIRTIO_NET_CTRL_MQ_RSS_CONFIG command.
    > + Whichever filtering and steering functionality is enabled, they are applied
    > + in the above order.
    > +7. If multiple entries are programmed which has overlapping filtering attributes
    > + for a received packet, the driver to define the location/priority of the entry.
    > +8. The filter entries are usually short in size of few tens of bytes,
    > + for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    > + high, hence supplying fields inside the queue descriptor is preferred for
    > + up to a certain fixed size, say 96 bytes.
    > +9. A flow filter entry consists of (a) match criteria, (b) action,
    > + (c) destination and (d) a unique 32 bit flow id, all supplied by the
    > + driver.
    > +10. The driver should be able to query and delete flow filter entry
    > + by the flow id.
    > +
    > +### 3.4.3 interface example
    > +
    > +1. Flow filter capabilities to query using a DMA interface such as cvq
    > +using two different commands.
    > +
    > +```
    > +struct virtio_net_rff_cmd {
    > + u8 class; /* RFF class */
    > + u8 commands; /* 0 = query cap
    > + * 1 = query packet fields mask
    > + * 2 = enable flow filter operations over cvq
    > + * 3 = add flow group
    > + * 4 = del flow group
    > + * 5 = flow filter op.
    > + */
    > + u8 command-specific-data[];
    > +};
    > +
    > +/* command 1 (query) */
    > +struct flow_filter_capabilities {
    > + le16 start_vq_index;
    > + le16 num_flow_filter_vqs;
    > + le16 max_flow_groups; /* valid group id = max_flow_groups - 1 */
    > + le16 max_group_priorities; /* max priorities of the group */
    > + le32 max_flow_filters_per_group;
    > + le32 max_flow_filters; /* max flow_id in add/del
    > + * is equal = max_flow_filters - 1.
    > + */
    > + u8 max_priorities_per_group;
    > + u8 cvq_supports_flow_filters_ops;
    > +};
    > +
    > +/* command 2 (query packet field masks) */
    > +struct flow_filter_fields_support_mask {
    > + le64 supported_packet_field_mask_bmap[1];
    > +};
    > +
    > +```
    > +
    > +2. Group add/delete cvq commands:
    > +
    > +```
    > +/* command 3 */
    > +struct virtio_net_rff_group_add {
    > + le16 priority; /* higher the value, higher priority */
    > + le16 group_id;
    > +};
    > +
    > +
    > +/* command 4 */
    > +struct virtio_net_rff_group_delete {
    > + le16 group_id;
    > +
    > +```
    > +
    > +3. Flow filter entry add/modify, delete over flow vq:
    > +
    > +```
    > +struct virtio_net_rff_add_modify {
    > + u8 flow_op;
    > + u8 priority; /* higher the value, higher priority */
    > + u16 group_id;
    > + le32 flow_id;
    > + struct match_criteria mc;
    > + struct destination dest;
    > + struct action action;
    > +
    > + struct match_criteria mask; /* optional */
    > +};
    > +
    > +struct virtio_net_rff_delete {
    > + u8 flow_op;
    > + u8 padding[3];
    > + le32 flow_id;
    > +};
    > +
    > +```
    > +
    > +### 3.4.4 For incremental future
    > +a. Driver should be able to specify a specific packet byte offset, number
    > + of bytes and mask as match criteria.
    > +b. Support RSS context, in addition to a specific RQ.
    > +c. If/when virtio switch object is implemented, support ingress/egress flow
    > + filters at the switch port level.
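
As a side note on item 6 of the flow operations path quoted above, the ordering of the three filtering stages can be read as a simple device-side lookup sequence. The sketch below only illustrates that order; the function and type names are placeholders, not proposed interfaces.

```
#include <stdbool.h>
#include <stdint.h>

#define VNET_DROP UINT16_MAX   /* illustrative "drop" sentinel */

struct vnet_dev;
struct vnet_pkt;

/* Placeholders for the three stages, provided elsewhere in a device model. */
bool vnet_l2_filters_accept(struct vnet_dev *d, const struct vnet_pkt *p);
bool vnet_rff_lookup(struct vnet_dev *d, const struct vnet_pkt *p, uint16_t *vq);
uint16_t vnet_rss_hash_to_vq(struct vnet_dev *d, const struct vnet_pkt *p);

/* Receive steering order from 3.4.2 item 6: existing cvq RX/MAC/VLAN
 * filters first, then RFF entries (by group priority), then RSS. */
static uint16_t vnet_steer_rx(struct vnet_dev *d, const struct vnet_pkt *p)
{
    uint16_t vq;

    if (!vnet_l2_filters_accept(d, p))   /* a. VIRTIO_NET_CTRL_RX/MAC/VLAN */
        return VNET_DROP;
    if (vnet_rff_lookup(d, p, &vq))      /* b. RFF match: action + destination */
        return vq;
    return vnet_rss_hash_to_vq(d, p);    /* c. RSS picks the receive vq */
}
```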






  • 21.  RE: [PATCH requirements v5 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-21-2023 05:14


    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Monday, August 21, 2023 10:36 AM

    > > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > > + will likely create these queues when flow filters are enabled.
    >
    > I remember here that the creation phase of flow vq is in the probe phase, and a
    > feature is used to indicate dynamic creation.

    > Or do we want to revert to a workaround similar to vq reset here?

    As we discussed during v4, dynamic creation will use a new feature bit,
    say F_VQ_DYNAMIC_CREATE.
    I will introduce it as part of this series.
    Let's not build new features using workarounds.
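
A minimal sketch of how a driver might act on such a feature bit is below. The bit number, structure, and helper names are all placeholders for illustration; only the idea of gating dynamic flow filter vq creation on a feature bit, with cvq-based flow filters as the alternative, comes from the discussion above.

```
#include <stdbool.h>
#include <stdint.h>

/* All names here are illustrative placeholders, not existing virtio APIs. */
struct vnet_dev {
    uint64_t features;            /* negotiated feature bits */
    uint16_t rff_start_vq_index;  /* from flow_filter_capabilities */
    uint16_t rff_num_vqs;
};

#define VIRTIO_NET_F_VQ_DYNAMIC_CREATE 63ULL  /* hypothetical bit number */

static bool vnet_has_feature(const struct vnet_dev *d, uint64_t bit)
{
    return d->features & (1ULL << bit);
}

/* Stubs standing in for transport-specific work. */
static int vnet_enable_rff_over_cvq(struct vnet_dev *d) { (void)d; return 0; }
static int vnet_create_vqs(struct vnet_dev *d, uint16_t first, uint16_t n)
{ (void)d; (void)first; (void)n; return 0; }

static int vnet_enable_flow_filters(struct vnet_dev *dev)
{
    if (!vnet_has_feature(dev, VIRTIO_NET_F_VQ_DYNAMIC_CREATE))
        /* No dynamic vq creation: fall back to RFF commands over the
         * existing cvq, which is mutually exclusive with flow filter vqs. */
        return vnet_enable_rff_over_cvq(dev);

    /* Feature negotiated: create the flow filter vqs only now, when flow
     * filters are first enabled, rather than at probe time. */
    return vnet_create_vqs(dev, dev->rff_start_vq_index, dev->rff_num_vqs);
}
```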





  • 23.  RE: [PATCH requirements v5 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-22-2023 07:42
    Hi Michael, Jason,

    > From: Parav Pandit <parav@nvidia.com>
    > Sent: Friday, August 18, 2023 10:06 AM
    >
    > Add virtio net device requirements for receive flow filters.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > Signed-off-by: Satananda Burla <sburla@marvell.com>
    > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
    > ---
    > changelog:
    > v4->v5:
    > - Rewrote cvq and flow filter vq mutual exclusive text
    > - added cvq command to enable flow filters on cvq
    > - made commands more refined for priority, opcode and more
    > - Addressed comments from Heng
    > - restructured interface commands

    [..]

    We will be drafting the spec part for this patch, which is far more mature than the other requirements.
    It has undergone many rounds of reviews and discussions.
    Do you have any more comments?
    We do not want to discuss the requirements again during the spec review.
    So if you have comments, please ask now.


    > ---
    > net-workstream/features-1.4.md | 163
    > +++++++++++++++++++++++++++++++++
    > 1 file changed, 163 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 72b4132..330949c 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
    > 1. Device counters visible to the driver 2. Low latency tx and rx virtqueues for
    > PCI transport 3. Virtqueue notification coalescing re-arming support
    > +4. Virtqueue receive flow filters (RFF)
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -181,3 +182,165 @@ struct vnet_rx_completion {
    > notifications until the driver rearms the notifications of the virtqueue.
    > 2. When the driver rearms the notification of the virtqueue, the device
    > to notify again if notification coalescing conditions are met.
    > +
    > +## 3.4 Virtqueue receive flow filters (RFF) 0. Design goal:
    > + To filter and/or to steer packet based on specific pattern match to a
    > + specific destination to support application/networking stack driven receive
    > + processing.
    > +1. Two use cases are: to support Linux netdev set_rxnfc() for
    > ETHTOOL_SRXCLSRLINS
    > + and to support netdev feature NETIF_F_NTUPLE aka ARFS.
    > +
    > +### 3.4.1 control path
    > +1. The number of flow filter operations/sec can range from 100k/sec to
    > 1M/sec
    > + or even more. Hence flow filter operations must be done over a queueing
    > + interface using one or more queues.
    > +2. The device should be able to expose one or more supported flow filter
    > queue
    > + count and its start vq index to the driver.
    > +3. As each device may be operating for different performance characteristic,
    > + start vq index and count may be different for each device. Secondly, it is
    > + inefficient for device to provide flow filters capabilities via a config space
    > + region. Hence, the device should be able to share these attributes using
    > + dma interface, instead of transport registers.
    > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > + will likely create these queues when flow filters are enabled.
    > +5. Flow filter operations are often accelerated by device in a hardware. Ability
    > + to handle them on a queue other than control vq is desired. This achieves
    > near
    > + zero modifications to existing implementations to add new operations on
    > new
    > + purpose built queues (similar to transmit and receive queue). Some devices
    > + may not support flow filter queues and may want to support flow filter
    > operations
    > + over existing cvq, this gives the ability to utilize an existing cvq.
    > + Therefore,
    > + a. Flow filter queues and flow filter commands on cvq are mutually exclusive.
    > + b. When flow filter queues are supported, the driver should use the flow
    > filter
    > + queues for flow filter operations.
    > + (Since cvq is not enabled for flow filters, any flow filter command coming
    > + on cvq must fail).
    > + c. If driver wants to use flow filters over cvq, driver must explicitly
    > + enable flow filters on cvq via a command, when it is enabled on the cvq
    > + driver cannot use flow filter queues. This eliminates any synchronization
    > + needed by the device among different types of queues.
    > +6. The filter masks are optional; the device should be able to expose if it
    > + supports filter masks.
    > +7. The driver may want to have priority among group of flow entries; to
    > facilitate
    > + the device support grouping flow filter entries by a notion of a flow group.
    > + Each flow group defines priority in processing flow.
    > +8. The driver and group owner driver should be able to query supported
    > device
    > + limits for the receive flow filters.
    > +9. Query the flow filter capabilities of the member device by the owner device
    > + using administrative command.
    > +
    > +### 3.4.2 flow operations path
    > +1. The driver should be able to define a receive packet match criteria, an
    > + action and a destination for a packet. For example, an ipv4 packet with a
    > + multicast address to be steered to the receive vq 0. The second example is
    > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > + be steered to receive vq 10.
    > +2. The match criteria should include exact tuple fields well-defined such as
    > mac
    > + address, IP addresses, tcp/udp ports, etc.
    > +3. The match criteria should also optionally include the field mask.
    > +4. Action includes (a) dropping or (b) forwarding the packet.
    > +5. Destination is a receive virtqueue index.
    > +6. Receive packet processing chain is:
    > + a. filters programmed using cvq commands VIRTIO_NET_CTRL_RX,
    > + VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_VLAN.
    > + b. filters programmed using RFF functionality.
    > + c. filters programmed using RSS VIRTIO_NET_CTRL_MQ_RSS_CONFIG
    > command.
    > + Whichever filtering and steering functionality is enabled, they are applied
    > + in the above order.
    > +7. If multiple entries are programmed which has overlapping filtering attributes
    > + for a received packet, the driver to define the location/priority of the entry.
    > +8. The filter entries are usually short in size of few tens of bytes,
    > + for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    > + high, hence supplying fields inside the queue descriptor is preferred for
    > + up to a certain fixed size, say 96 bytes.
    > +9. A flow filter entry consists of (a) match criteria, (b) action,
    > + (c) destination and (d) a unique 32 bit flow id, all supplied by the
    > + driver.
    > +10. The driver should be able to query and delete flow filter entry
    > + by the flow id.
    > +
    > +### 3.4.3 interface example
    > +
    > +1. Flow filter capabilities to query using a DMA interface such as cvq
    > +using two different commands.
    > +
    > +```
    > +struct virtio_net_rff_cmd {
    > + u8 class; /* RFF class */
    > + u8 commands; /* 0 = query cap
    > + * 1 = query packet fields mask
    > + * 2 = enable flow filter operations over cvq
    > + * 3 = add flow group
    > + * 4 = del flow group
    > + * 5 = flow filter op.
    > + */
    > + u8 command-specific-data[];
    > +};
    > +
    > +/* command 1 (query) */
    > +struct flow_filter_capabilities {
    > + le16 start_vq_index;
    > + le16 num_flow_filter_vqs;
    > + le16 max_flow_groups; /* valid group id = max_flow_groups - 1 */
    > + le16 max_group_priorities; /* max priorities of the group */
    > + le32 max_flow_filters_per_group;
    > + le32 max_flow_filters; /* max flow_id in add/del
    > + * is equal = max_flow_filters - 1.
    > + */
    > + u8 max_priorities_per_group;
    > + u8 cvq_supports_flow_filters_ops;
    > +};
    > +
    > +/* command 2 (query packet field masks) */ struct
    > +flow_filter_fields_support_mask {
    > + le64 supported_packet_field_mask_bmap[1];
    > +};
    > +
    > +```
    > +
    > +2. Group add/delete cvq commands:
    > +
    > +```
    > +/* command 3 */
    > +struct virtio_net_rff_group_add {
    > + le16 priority; /* higher the value, higher priority */
    > + le16 group_id;
    > +};
    > +
    > +
    > +/* command 4 */
    > +struct virtio_net_rff_group_delete {
    > + le16 group_id;
    > +
    > +```
    > +
    > +3. Flow filter entry add/modify, delete over flow vq:
    > +
    > +```
    > +struct virtio_net_rff_add_modify {
    > + u8 flow_op;
    > + u8 priority; /* higher the value, higher priority */
    > + u16 group_id;
    > + le32 flow_id;
    > + struct match_criteria mc;
    > + struct destination dest;
    > + struct action action;
    > +
    > + struct match_criteria mask; /* optional */
    > +};
    > +
    > +struct virtio_net_rff_delete {
    > + u8 flow_op;
    > + u8 padding[3];
    > + le32 flow_id;
    > +};
    > +
    > +```
    > +
    > +### 3.4.4 For incremental future
    > +a. Driver should be able to specify a specific packet byte offset, number
    > + of bytes and mask as match criteria.
    > +b. Support RSS context, in addition to a specific RQ.
    > +c. If/when virtio switch object is implemented, support ingress/egress flow
    > + filters at the switch port level.
    > --
    > 2.26.2






  • 25.  [PATCH requirements v5 6/7] net-features: Add packet timestamp requirements

    Posted 08-18-2023 04:37
    Add tx and rx packet timestamp requirements. Signed-off-by: Parav Pandit <parav@nvidia.com> Acked-by: David Edmondson <david.edmondson@oracle.com> --- changelog: v4->v5: - relaxed mmio requirement on feedback from Willem v3->v4: - no change --- net-workstream/features-1.4.md 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md index 330949c..31aa587 100644 --- a/net-workstream/features-1.4.md +++ b/net-workstream/features-1.4.md @@ -10,6 +10,7 @@ together is desired while updating the virtio net interface. 2. Low latency tx and rx virtqueues for PCI transport 3. Virtqueue notification coalescing re-arming support 4 Virtqueue receive flow filters (RFF) +5. Device timestamp for tx and rx packets # 3. Requirements ## 3.1 Device counters @@ -344,3 +345,26 @@ a. Driver should be able to specify a specific packet byte offset, number b. Support RSS context, in addition to a specific RQ. c. If/when virtio switch object is implemented, support ingress/egress flow filters at the switch port level. + +## 3.5 Packet timestamp +1. Device should provide transmit timestamp and receive timestamp of the packets + at a per-packet level when the timestamping is enabled in the device. +2. Device should provide the current frequency and the frequency unit for the + software to synchronize the reference point of software and the device using + a control vq command. + +### 3.5.1 Transmit timestamp +1. Transmit completion must contain a packet transmission timestamp when the + device is enabled for it. +2. The device should record the packet transmit timestamp in the completion at + the farthest egress point towards the network. +3. The device must provide a transmit packet timestamp in a single DMA + transaction along with the rest of the transmit completion fields. + +### 3.5.2 Receive timestamp +1. Receive completion must contain a packet reception timestamp when the device + is enabled for it. +2. The device should record the received packet timestamp at the closest ingress + point of reception from the network. +3. The device should provide a receive packet timestamp in a single DMA + transaction along with the rest of the receive completion fields. -- 2.26.2
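
Requirement 2 of section 3.5 above (the device reporting its clock frequency and unit over a control vq command) is what lets a driver translate per-packet device timestamps into host time. A minimal sketch of that translation is below, assuming a Hz-style frequency and a single synchronization sample; the structure and field names are illustrative, not part of the proposal.

```
#include <stdint.h>

/* Illustrative only: the requirement says the device reports a frequency
 * and its unit; Hz and a one-shot sync sample are assumptions here. */
struct vnet_dev_clock {
    uint64_t freq_hz;      /* device clock frequency reported via cvq */
    uint64_t ref_device;   /* device timestamp sampled at sync time   */
    uint64_t ref_host_ns;  /* host clock in ns at the same sync time  */
};

/* Convert a per-packet device timestamp from a tx/rx completion into
 * host nanoseconds using the reported frequency. */
static uint64_t vnet_ts_to_host_ns(const struct vnet_dev_clock *c,
                                   uint64_t pkt_ts)
{
    uint64_t delta_ticks = pkt_ts - c->ref_device;
    /* 64-bit math; a real driver would guard against counter wraparound
     * and overflow for large deltas, or use 128-bit intermediates. */
    uint64_t delta_ns = (delta_ticks * 1000000000ULL) / c->freq_hz;
    return c->ref_host_ns + delta_ns;
}
```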