OASIS Virtual I/O Device (VIRTIO) TC


[PATCH requirements v4 0/7] virtio net requirements for 1.4

  • 1.  [PATCH requirements v4 0/7] virtio net requirements for 1.4

    Posted 08-15-2023 07:47
    Hi All,

    This document captures the virtio net device requirements for the
    upcoming release 1.4 that some of us are currently working on. This is
    a live document, to be updated over time as we work towards a design
    that can result in a draft specification.

    The objectives are:
    1. To consider these requirements when introducing the new features
       listed in the document (and otherwise), and to work towards the
       interface design, followed by drafting the specification changes.
    2. To define a practical list of requirements that can be achieved
       incrementally in the 1.4 timeframe and that can actually be
       implemented.

    Please review mainly patch 5 as the top priority. Receive flow filters
    is the first item, apart from counters, to complete in this iteration
    so that drafting of the design spec can start. The rest of the
    requirements are largely untouched other than Stefan's comment.

    TODO:
    1. Some more refinement needed for the rx low latency and header data
       split requirements.

    ---
    note: This week's update is late as I was unwell for several days, but
    I managed to cover all comments from Heng, David, and Satananda. We
    must close the flow filters requirements this week to take them to the
    spec drafting/design phase.

    ---
    changelog:
    v3->v4:
    - receive flow filters requirements undergo major updates to take to
      spec draft level
    - addressed comments from Xuan, Heng, David, Satananda
    - refined wording in the rest of the requirements

    v2->v3:
    - addressed comments from Stefan for tx low latency and notification
    - redrafted the requirements to use the rearm term and avoid queue
      enable confusion for notification
    - addressed all comments and refined receive flow filters requirements
      to take to design level

    v1->v2:
    - major update of receive flow filter requirements based on the last
      two design discussions in the community and offline research
    - examples added
    - link to use case and design goal added
    - control and operation side requirements split
    - more verbose

    v0->v1:
    - addressed comments from Heng Li
    - addressed a few (not all) comments from Michael
    - per patch changelog

    Parav Pandit (7):
      net-features: Add requirements document for release 1.4
      net-features: Add low latency transmit queue requirements
      net-features: Add low latency receive queue requirements
      net-features: Add notification coalescing requirements
      net-features: Add n-tuple receive flow filters requirements
      net-features: Add packet timestamp requirements
      net-features: Add header data split requirements

     net-workstream/features-1.4.md | 375 +++++++++++++++++++++++++++++++++
     1 file changed, 375 insertions(+)
     create mode 100644 net-workstream/features-1.4.md

    --
    2.26.2


  • 2.  [PATCH requirements v4 4/7] net-features: Add notification coalescing requirements

    Posted 08-15-2023 07:47
    Add virtio net device notification coalescing improvements requirements.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Acked-by: David Edmondson <david.edmondson@oracle.com>

    ---
    changelog:
    v3->v4:
    - no change
    v1->v2:
    - addressed comments from Stefan
    - redrafted the requirements to use rearm term and avoid queue enable
      confusion
    v0->v1:
    - updated the description
    ---
     net-workstream/features-1.4.md | 11 +++++++++++
     1 file changed, 11 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 72d04bd..cb72442 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -8,6 +8,7 @@ together is desired while updating the virtio net interface.
     # 2. Summary
     1. Device counters visible to the driver
     2. Low latency tx and rx virtqueues for PCI transport
    +3. Virtqueue notification coalescing re-arming support

     # 3. Requirements
     ## 3.1 Device counters
    @@ -172,3 +173,13 @@ struct vnet_rx_completion {
     which can be recycled by the driver when the packets from the completed
     page is fully consumed.
     8. The device should be able to consume multiple pages for a receive GSO stream.
    +
    +## 3.3 Virtqueue notification coalescing re-arming support
    +0. Design goal:
    + a. Avoid constant notifications from the device even in conditions when
    + the driver may not have acted on the previous pending notification.
    +1. When Tx and Rx virtqueue notification coalescing is enabled, and when such
    + a notification is reported by the device, the device stops sending further
    + notifications until the driver rearms the notifications of the virtqueue.
    +2. When the driver rearms the notification of the virtqueue, the device
    + to notify again if notification coalescing conditions are met.

    --
    2.26.2
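
    For readers, a minimal C sketch of the device-side behaviour that
    requirements 1 and 2 above describe (purely illustrative: the structure,
    the helper names and the rearm trigger are placeholders; the actual rearm
    interface is left to the design phase):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical per-VQ state; not part of the patch. */
    struct vq_notify_state {
        bool coalescing_enabled;  /* notification coalescing configured for this VQ */
        bool armed;               /* device may notify only while armed */
        uint32_t pending_used;    /* used buffers since the last notification */
        uint32_t max_used;        /* coalescing threshold (packets); timer omitted */
    };

    static void send_notification(void) { puts("notify driver"); } /* stand-in */

    static bool conditions_met(const struct vq_notify_state *st)
    {
        return st->pending_used >= st->max_used;
    }

    /* Device side: a buffer was marked used; notify if allowed. */
    static void vq_used_added(struct vq_notify_state *st)
    {
        st->pending_used++;
        if (st->coalescing_enabled && st->armed && conditions_met(st)) {
            send_notification();
            st->armed = false;        /* requirement 1: silent until rearmed */
            st->pending_used = 0;
        }
    }

    /* Device side: the driver performed the (to-be-defined) rearm operation. */
    static void vq_rearm(struct vq_notify_state *st)
    {
        st->armed = true;
        if (st->coalescing_enabled && conditions_met(st)) {  /* requirement 2 */
            send_notification();
            st->armed = false;
            st->pending_used = 0;
        }
    }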


  • 3.  Re: [PATCH requirements v4 4/7] net-features: Add notification coalescing requirements

    Posted 08-16-2023 08:31


    On 2023/8/15 3:45 PM, Parav Pandit wrote:
    > Add virtio net device notification coalescing improvements requirements.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > Acked-by: David Edmondson <david.edmondson@oracle.com>
    >
    > ---
    > changelog:
    > v3->v4:
    > - no change
    >
    > v1->v2:
    > - addressed comments from Stefan
    > - redrafted the requirements to use rearm term and avoid queue enable
    > confusion
    > v0->v1:
    > - updated the description
    > ---
    > net-workstream/features-1.4.md | 11 +++++++++++
    > 1 file changed, 11 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 72d04bd..cb72442 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -8,6 +8,7 @@ together is desired while updating the virtio net interface.
    > # 2. Summary
    > 1. Device counters visible to the driver
    > 2. Low latency tx and rx virtqueues for PCI transport
    > +3. Virtqueue notification coalescing re-arming support
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -172,3 +173,13 @@ struct vnet_rx_completion {
    > which can be recycled by the driver when the packets from the completed
    > page is fully consumed.
    > 8. The device should be able to consume multiple pages for a receive GSO stream.
    > +
    > +## 3.3 Virtqueue notification coalescing re-arming support
    > +0. Design goal:
    > + a. Avoid constant notifications from the device even in conditions when
    > + the driver may not have acted on the previous pending notification.
    > +1. When Tx and Rx virtqueue notification coalescing is enabled, and when such
    > + a notification is reported by the device, the device stops sending further
    > + notifications until the driver rearms the notifications of the virtqueue.
    > +2. When the driver rearms the notification of the virtqueue, the device
    > + to notify again if notification coalescing conditions are met.

    I'm wondering how this relates to the existing notification coalescing[1]
    and notification suppression[2]:

    [1]
    The device sends a used buffer notification once the notification
    conditions are met and if the notifications
    are not suppressed as explained in \ref{sec:Basic Facilities of a Virtio
    Device / Virtqueues / Used Buffer Notification Suppression}.

    [2]
    If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
    \begin{itemize}
    \item The driver MUST ignore the \field{avail_event} value.
    \item After the driver writes a descriptor index into the available ring:
       \begin{itemize}
             \item If \field{flags} is 1, the driver SHOULD NOT send a
    notification.
             \item If \field{flags} is 0, the driver MUST send a notification.
       \end{itemize}
    \end{itemize}

    Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
    \begin{itemize}
    \item The driver MUST ignore the lower bit of \field{flags}.
    \item After the driver writes a descriptor index into the available ring:
       \begin{itemize}
             \item If the \field{idx} field in the available ring (which
    determined
               where that descriptor index was placed) was equal to
               \field{avail_event}, the driver MUST send a notification.
             \item Otherwise the driver SHOULD NOT send a notification.
       \end{itemize}
    \end{itemize}
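
    (Both directions of the EVENT_IDX mechanism quoted in [2] come down to the
    same wrap-safe comparison; restated as a standalone C helper, essentially
    the vring_need_event() check:)

    #include <stdint.h>

    /* Returns non-zero when moving the ring index from old_idx to new_idx
     * crosses event_idx, i.e. a notification needs to be sent. The device
     * applies this with used_event before notifying the driver; the driver
     * applies it with avail_event before notifying the device.
     */
    static inline int need_event(uint16_t event_idx, uint16_t new_idx,
                                 uint16_t old_idx)
    {
        return (uint16_t)(new_idx - event_idx - 1) <
               (uint16_t)(new_idx - old_idx);
    }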

    Regarding notification suppression:
    1. When VIRTIO_F_EVENT_IDX is negotiated, even if the notification
    coalescing condition is met, we need to wait for the used_event
    notification condition to be met (the driver does not rearm the
    notification of the virtqueue now, and the avail ring has
    VRING_AVAIL_F_NO_INTERRUPT set in its flags).
    2. When VIRTIO_F_EVENT_IDX is not negotiated, if the driver turns off the
    notification, even if the notification condition is met, the device
    cannot send the notification.

    Therefore, if I'm not wrong, a device can issue a notification only if
    the device is not suppressed from notifying the driver.
    [1] and [2] seem to already cover this condition.

    Thanks!






  • 4.  RE: [PATCH requirements v4 4/7] net-features: Add notification coalescing requirements

    Posted 08-16-2023 10:46


    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Wednesday, August 16, 2023 2:01 PM
    >
    > On 2023/8/15 3:45 PM, Parav Pandit wrote:
    > > Add virtio net device notification coalescing improvements requirements.
    > >
    > > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > > Acked-by: David Edmondson <david.edmondson@oracle.com>
    > >
    > > ---
    > > changelog:
    > > v3->v4:
    > > - no change
    > >
    > > v1->v2:
    > > - addressed comments from Stefan
    > > - redrafted the requirements to use rearm term and avoid queue enable
    > > confusion
    > > v0->v1:
    > > - updated the description
    > > ---
    > > net-workstream/features-1.4.md | 11 +++++++++++
    > > 1 file changed, 11 insertions(+)
    > >
    > > diff --git a/net-workstream/features-1.4.md
    > > b/net-workstream/features-1.4.md index 72d04bd..cb72442 100644
    > > --- a/net-workstream/features-1.4.md
    > > +++ b/net-workstream/features-1.4.md
    > > @@ -8,6 +8,7 @@ together is desired while updating the virtio net interface.
    > > # 2. Summary
    > > 1. Device counters visible to the driver
    > > 2. Low latency tx and rx virtqueues for PCI transport
    > > +3. Virtqueue notification coalescing re-arming support
    > >
    > > # 3. Requirements
    > > ## 3.1 Device counters
    > > @@ -172,3 +173,13 @@ struct vnet_rx_completion {
    > > which can be recycled by the driver when the packets from the completed
    > > page is fully consumed.
    > > 8. The device should be able to consume multiple pages for a receive GSO
    > stream.
    > > +
    > > +## 3.3 Virtqueue notification coalescing re-arming support 0. Design
    > > +goal:
    > > + a. Avoid constant notifications from the device even in conditions when
    > > + the driver may not have acted on the previous pending notification.
    > > +1. When Tx and Rx virtqueue notification coalescing is enabled, and when
    > such
    > > + a notification is reported by the device, the device stops sending further
    > > + notifications until the driver rearms the notifications of the virtqueue.
    > > +2. When the driver rearms the notification of the virtqueue, the device
    > > + to notify again if notification coalescing conditions are met.
    >
    > I'm wondering how this relates to the existing notification coalesing[1] and
    > notification suppression[2]:
    >
    > [1]
    > The device sends a used buffer notification once the notification conditions are
    > met and if the notifications are not suppressed as explained in \ref{sec:Basic
    > Facilities of a Virtio Device / Virtqueues / Used Buffer Notification
    > Supppression}.
    >
    > [2]
    > If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
    > \begin{itemize}
    > \item The driver MUST ignore the \field{avail_event} value.
    > \item After the driver writes a descriptor index into the available ring:
    >    \begin{itemize}
    >          \item If \field{flags} is 1, the driver SHOULD NOT send a notification.
    >          \item If \field{flags} is 0, the driver MUST send a notification.
    >    \end{itemize}
    > \end{itemize}
    >
    > Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
    > \begin{itemize}
    > \item The driver MUST ignore the lower bit of \field{flags}.
    > \item After the driver writes a descriptor index into the available ring:
    >    \begin{itemize}
    >          \item If the \field{idx} field in the available ring (which determined
    >            where that descriptor index was placed) was equal to
    >            \field{avail_event}, the driver MUST send a notification.
    >          \item Otherwise the driver SHOULD NOT send a notification.
    >    \end{itemize}
    > \end{itemize}
    >
    > Regarding notification suppression:
    > 1.When there is VIRTIO_NET_F_EVENT_IDX, even if the notification coalesing
    > condition is met, we need to wait for the used_event notification condition to
    > be met(the driver does not rearms the notification of the virtqueue now and
    > the avail ring  is set VRING_AVAIL_F_NO_INTERRUPT in flag).
    > 2.When there is no VIRTIO_NET_F_EVENT_IDX, if the driver turns off the
    > notification, even if the notidication condition is met, the device cannot send
    > the notification.
    >
    > Therefore, if I'm not wrong, a device can issue a notification only if the device is
    > not suppressed from notifying the driver.
    > [1][2] seems to have met this condition.

    Notification suppression using _EVENT_IDX for a non-memory transport is just sub-optimal for two reasons.

    1. It requires the device to poll on the used event to learn when to un-suppress (arm).
    2. This bit also controls driver notifications, again demanding that the device arbitrarily poll for newly posted descriptors.

    Hence, a more efficient scheme is needed, and device notifications should be detached from driver notifications.
    And now that VQ-level notification coalescing, which suppresses the device notifications, is in place, it is logical to combine the re-arming with it.
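
    To make the contrast concrete, a rough device-side sketch (illustrative
    only; the struct, the helpers and the "armed" mechanism are assumptions,
    not an existing interface):

    #include <stdbool.h>
    #include <stdint.h>

    struct vq_dev {                      /* illustrative device-side VQ state */
        uint16_t used_idx;               /* next used index the device writes */
        uint16_t last_notified_used_idx; /* used index at the last notification */
        bool     armed;                  /* proposed: set by an explicit driver rearm */
    };

    /* Stand-ins for the transport access and the coalescing policy. */
    static uint16_t fetch_used_event_from_driver_memory(const struct vq_dev *vq)
    { (void)vq; return 0; }
    static bool coalescing_conditions_met(const struct vq_dev *vq)
    { (void)vq; return true; }

    /* Today, with VIRTIO_F_EVENT_IDX: every decision needs used_event from the
     * avail ring in driver memory, so the device keeps reading (polling) across
     * the transport to learn whether it is effectively "armed". */
    static bool should_notify_event_idx(const struct vq_dev *vq)
    {
        uint16_t used_event = fetch_used_event_from_driver_memory(vq);
        return (uint16_t)(vq->used_idx - used_event - 1) <
               (uint16_t)(vq->used_idx - vq->last_notified_used_idx);
    }

    /* Proposed: device-local armed state toggled by an explicit driver rearm,
     * decoupled from driver (available buffer) notifications. */
    static bool should_notify_rearm(const struct vq_dev *vq)
    {
        return vq->armed && coalescing_conditions_met(vq);
    }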






  • 5.  Re: [PATCH requirements v4 4/7] net-features: Add notification coalescing requirements

    Posted 08-16-2023 12:37


    On 2023/8/16 6:46 PM, Parav Pandit wrote:
    >
    >> From: Heng Qi <hengqi@linux.alibaba.com>
    >> Sent: Wednesday, August 16, 2023 2:01 PM
    >>
    >> On 2023/8/15 3:45 PM, Parav Pandit wrote:
    >>> Add virtio net device notification coalescing improvements requirements.
    >>>
    >>> Signed-off-by: Parav Pandit <parav@nvidia.com>
    >>> Acked-by: David Edmondson <david.edmondson@oracle.com>
    >>>
    >>> ---
    >>> changelog:
    >>> v3->v4:
    >>> - no change
    >>>
    >>> v1->v2:
    >>> - addressed comments from Stefan
    >>> - redrafted the requirements to use rearm term and avoid queue enable
    >>> confusion
    >>> v0->v1:
    >>> - updated the description
    >>> ---
    >>> net-workstream/features-1.4.md | 11 +++++++++++
    >>> 1 file changed, 11 insertions(+)
    >>>
    >>> diff --git a/net-workstream/features-1.4.md
    >>> b/net-workstream/features-1.4.md index 72d04bd..cb72442 100644
    >>> --- a/net-workstream/features-1.4.md
    >>> +++ b/net-workstream/features-1.4.md
    >>> @@ -8,6 +8,7 @@ together is desired while updating the virtio net interface.
    >>> # 2. Summary
    >>> 1. Device counters visible to the driver
    >>> 2. Low latency tx and rx virtqueues for PCI transport
    >>> +3. Virtqueue notification coalescing re-arming support
    >>>
    >>> # 3. Requirements
    >>> ## 3.1 Device counters
    >>> @@ -172,3 +173,13 @@ struct vnet_rx_completion {
    >>> which can be recycled by the driver when the packets from the completed
    >>> page is fully consumed.
    >>> 8. The device should be able to consume multiple pages for a receive GSO
    >> stream.
    >>> +
    >>> +## 3.3 Virtqueue notification coalescing re-arming support 0. Design
    >>> +goal:
    >>> + a. Avoid constant notifications from the device even in conditions when
    >>> + the driver may not have acted on the previous pending notification.
    >>> +1. When Tx and Rx virtqueue notification coalescing is enabled, and when
    >> such
    >>> + a notification is reported by the device, the device stops sending further
    >>> + notifications until the driver rearms the notifications of the virtqueue.
    >>> +2. When the driver rearms the notification of the virtqueue, the device
    >>> + to notify again if notification coalescing conditions are met.
    >> I'm wondering how this relates to the existing notification coalesing[1] and
    >> notification suppression[2]:
    >>
    >> [1]
    >> The device sends a used buffer notification once the notification conditions are
    >> met and if the notifications are not suppressed as explained in \ref{sec:Basic
    >> Facilities of a Virtio Device / Virtqueues / Used Buffer Notification
    >> Supppression}.
    >>
    >> [2]
    >> If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
    >> \begin{itemize}
    >> \item The driver MUST ignore the \field{avail_event} value.
    >> \item After the driver writes a descriptor index into the available ring:
    >>    \begin{itemize}
    >>          \item If \field{flags} is 1, the driver SHOULD NOT send a notification.
    >>          \item If \field{flags} is 0, the driver MUST send a notification.
    >>    \end{itemize}
    >> \end{itemize}
    >>
    >> Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
    >> \begin{itemize}
    >> \item The driver MUST ignore the lower bit of \field{flags}.
    >> \item After the driver writes a descriptor index into the available ring:
    >>    \begin{itemize}
    >>          \item If the \field{idx} field in the available ring (which determined
    >>            where that descriptor index was placed) was equal to
    >>            \field{avail_event}, the driver MUST send a notification.
    >>          \item Otherwise the driver SHOULD NOT send a notification.
    >>    \end{itemize}
    >> \end{itemize}
    >>
    >> Regarding notification suppression:
    >> 1.When there is VIRTIO_NET_F_EVENT_IDX, even if the notification coalesing
    >> condition is met, we need to wait for the used_event notification condition to
    >> be met(the driver does not rearms the notification of the virtqueue now and
    >> the avail ring  is set VRING_AVAIL_F_NO_INTERRUPT in flag).
    >> 2.When there is no VIRTIO_NET_F_EVENT_IDX, if the driver turns off the
    >> notification, even if the notidication condition is met, the device cannot send
    >> the notification.
    >>
    >> Therefore, if I'm not wrong, a device can issue a notification only if the device is
    >> not suppressed from notifying the driver.
    >> [1][2] seems to have met this condition.
    > Notification suppression using _EVENT_IDX for non-memory transport is just sub-optimal for two reasons.
    >
    > 1. It requires device to poll on the used event to learn about when to un-suppress. (arm)
    > 2. this bit also controls driver notifications yet again demand device to arbitrarily poll on new descriptors posting
    >
    > Hence, an efficient scheme is needed and device notifications to be detached from driver notification.
    > And now that VQ level notification coalescing is in place, which suppresses the device notifications, it is logical to combine it with VQ device notifications.
    >

    Let me summarize:
    1. When the used idx notification condition is satisfied but the coalescing
    condition is not, the driver continues to suppress device notifications.
    2. When the used idx notification condition is not satisfied, even if the
    coalescing condition is satisfied, the device still cannot notify the driver.
    I think that's what coalescing does, and the description below already
    covers this behavior:
    "The device sends a used buffer notification once the notification
    conditions are met and if the notifications are
    not suppressed as explained in \ref{sec:Basic Facilities of a Virtio
    Device / Virtqueues / Used Buffer Notification Supppression}."

    Or we want to say that it has nothing to do with the used idx
    notification. When the coalescing is satisfied and the driver
    rearms the notification of the virtqueue, the device now sends a
    notification.

    Thanks!





  • 6.  RE: [PATCH requirements v4 4/7] net-features: Add notification coalescing requirements

    Posted 08-17-2023 04:57


    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Wednesday, August 16, 2023 6:07 PM
    >
    >
    > On 2023/8/16 6:46 PM, Parav Pandit wrote:
    > >
    > >> From: Heng Qi <hengqi@linux.alibaba.com>
    > >> Sent: Wednesday, August 16, 2023 2:01 PM
    > >>
    > >> On 2023/8/15 3:45 PM, Parav Pandit wrote:
    > >>> Add virtio net device notification coalescing improvements requirements.
    > >>>
    > >>> Signed-off-by: Parav Pandit <parav@nvidia.com>
    > >>> Acked-by: David Edmondson <david.edmondson@oracle.com>
    > >>>
    > >>> ---
    > >>> changelog:
    > >>> v3->v4:
    > >>> - no change
    > >>>
    > >>> v1->v2:
    > >>> - addressed comments from Stefan
    > >>> - redrafted the requirements to use rearm term and avoid queue enable
    > >>> confusion
    > >>> v0->v1:
    > >>> - updated the description
    > >>> ---
    > >>> net-workstream/features-1.4.md | 11 +++++++++++
    > >>> 1 file changed, 11 insertions(+)
    > >>>
    > >>> diff --git a/net-workstream/features-1.4.md
    > >>> b/net-workstream/features-1.4.md index 72d04bd..cb72442 100644
    > >>> --- a/net-workstream/features-1.4.md
    > >>> +++ b/net-workstream/features-1.4.md
    > >>> @@ -8,6 +8,7 @@ together is desired while updating the virtio net
    > interface.
    > >>> # 2. Summary
    > >>> 1. Device counters visible to the driver
    > >>> 2. Low latency tx and rx virtqueues for PCI transport
    > >>> +3. Virtqueue notification coalescing re-arming support
    > >>>
    > >>> # 3. Requirements
    > >>> ## 3.1 Device counters
    > >>> @@ -172,3 +173,13 @@ struct vnet_rx_completion {
    > >>> which can be recycled by the driver when the packets from the
    > completed
    > >>> page is fully consumed.
    > >>> 8. The device should be able to consume multiple pages for a
    > >>> receive GSO
    > >> stream.
    > >>> +
    > >>> +## 3.3 Virtqueue notification coalescing re-arming support 0.
    > >>> +Design
    > >>> +goal:
    > >>> + a. Avoid constant notifications from the device even in conditions when
    > >>> + the driver may not have acted on the previous pending notification.
    > >>> +1. When Tx and Rx virtqueue notification coalescing is enabled, and
    > >>> +when
    > >> such
    > >>> + a notification is reported by the device, the device stops sending further
    > >>> + notifications until the driver rearms the notifications of the virtqueue.
    > >>> +2. When the driver rearms the notification of the virtqueue, the device
    > >>> + to notify again if notification coalescing conditions are met.
    > >> I'm wondering how this relates to the existing notification
    > >> coalesing[1] and notification suppression[2]:
    > >>
    > >> [1]
    > >> The device sends a used buffer notification once the notification
    > >> conditions are met and if the notifications are not suppressed as
    > >> explained in \ref{sec:Basic Facilities of a Virtio Device /
    > >> Virtqueues / Used Buffer Notification Supppression}.
    > >>
    > >> [2]
    > >> If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
    > >> \begin{itemize}
    > >> \item The driver MUST ignore the \field{avail_event} value.
    > >> \item After the driver writes a descriptor index into the available ring:
    > >>    \begin{itemize}
    > >>          \item If \field{flags} is 1, the driver SHOULD NOT send a notification.
    > >>          \item If \field{flags} is 0, the driver MUST send a notification.
    > >>    \end{itemize}
    > >> \end{itemize}
    > >>
    > >> Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
    > >> \begin{itemize}
    > >> \item The driver MUST ignore the lower bit of \field{flags}.
    > >> \item After the driver writes a descriptor index into the available ring:
    > >>    \begin{itemize}
    > >>          \item If the \field{idx} field in the available ring (which determined
    > >>            where that descriptor index was placed) was equal to
    > >>            \field{avail_event}, the driver MUST send a notification.
    > >>          \item Otherwise the driver SHOULD NOT send a notification.
    > >>    \end{itemize}
    > >> \end{itemize}
    > >>
    > >> Regarding notification suppression:
    > >> 1.When there is VIRTIO_NET_F_EVENT_IDX, even if the notification
    > >> coalesing condition is met, we need to wait for the used_event
    > >> notification condition to be met(the driver does not rearms the
    > >> notification of the virtqueue now and the avail ring  is set
    > VRING_AVAIL_F_NO_INTERRUPT in flag).
    > >> 2.When there is no VIRTIO_NET_F_EVENT_IDX, if the driver turns off
    > >> the notification, even if the notidication condition is met, the
    > >> device cannot send the notification.
    > >>
    > >> Therefore, if I'm not wrong, a device can issue a notification only
    > >> if the device is not suppressed from notifying the driver.
    > >> [1][2] seems to have met this condition.
    > > Notification suppression using _EVENT_IDX for non-memory transport is just
    > sub-optimal for two reasons.
    > >
    > > 1. It requires device to poll on the used event to learn about when to
    > > un-suppress. (arm) 2. this bit also controls driver notifications yet
    > > again demand device to arbitrarily poll on new descriptors posting
    > >
    > > Hence, an efficient scheme is needed and device notifications to be detached
    > from driver notification.
    > > And now that VQ level notification coalescing is in place, which suppresses
    > the device notifications, it is logical to combine it with VQ device notifications.
    > >
    >
    > Let me summarize:
    > 1. When used idx notification is satisfied, but coalescing is not satisfied, the
    > driver continues to suppress device notifications.
    Ack.

    > 2. When used idx notification is not satisfied, even if coalescing is satisfied, the
    > device still cannot notify the driver.
    Ack.

    > I think that's what coalescing does, and the description below has satisfied this
    > behavior:
    > "The device sends a used buffer notification once the notification conditions
    > are met and if the notifications are not suppressed as explained in \ref{sec:Basic
    > Facilities of a Virtio Device / Virtqueues / Used Buffer Notification
    > Supppression}."
    >
    Ack.
    The proposal here is not to use the EVENT_IDX scheme; instead the driver enables/disables notification coalescing in a different way, even when the notification coalescing parameters are configured.
    And this is to be done in a fairly fast way (not via a cvq command), for example like driver notifications.
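
    As an illustration of "fairly fast, like driver notifications": the rearm
    could be a single write to a per-VQ doorbell/notification address. Nothing
    below is specified anywhere; the flag and the encoding are invented:

    #include <stdint.h>

    /* Invented encoding: reuse the VQ notification address, with a flag that
     * means "rearm device notifications" instead of "new available buffers". */
    #define VQ_NOTIFY_REARM (1u << 31)

    static inline void mmio_write32(volatile uint32_t *addr, uint32_t val)
    {
        *addr = val;   /* one posted write, comparable in cost to a VQ kick */
    }

    static void vq_rearm_notifications(volatile uint32_t *vq_notify_addr,
                                       uint16_t vq_index)
    {
        mmio_write32(vq_notify_addr, VQ_NOTIFY_REARM | vq_index);
    }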

    > Or we want to say that it has nothing to do with the used idx notification.
    > When the coalescing is satisfied and the driver rearms the notification of the
    > virtqueue, the device now send a notification.
    >
    Right.
    F_NOTIFICATION_ARM is mutually exclusive with F_EVENT_IDX.
    (Like packed vq is mutually exclusive with split q.)
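
    A sketch of how that mutual exclusion could look at feature negotiation
    time (F_NOTIFICATION_ARM is the not-yet-defined bit discussed here; its
    bit number below is invented, only VIRTIO_F_EVENT_IDX = 29 is real):

    #include <stdbool.h>
    #include <stdint.h>

    #define VIRTIO_F_EVENT_IDX          29   /* existing feature bit */
    #define VIRTIO_F_NOTIFICATION_ARM   63   /* invented, for illustration only */

    /* Driver side: refuse a feature set that enables both schemes. */
    static bool features_valid(uint64_t negotiated)
    {
        bool event_idx = negotiated & (1ULL << VIRTIO_F_EVENT_IDX);
        bool notif_arm = negotiated & (1ULL << VIRTIO_F_NOTIFICATION_ARM);
        return !(event_idx && notif_arm);   /* mutually exclusive, like packed vs. split */
    }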




  • 7.  Re: [PATCH requirements v4 4/7] net-features: Add notification coalescing requirements

    Posted 08-17-2023 05:14


    On 2023/8/17 12:57 PM, Parav Pandit wrote:
    >
    >> From: Heng Qi <hengqi@linux.alibaba.com>
    >> Sent: Wednesday, August 16, 2023 6:07 PM
    >>
    >>
    >> On 2023/8/16 6:46 PM, Parav Pandit wrote:
    >>>> From: Heng Qi <hengqi@linux.alibaba.com>
    >>>> Sent: Wednesday, August 16, 2023 2:01 PM
    >>>>
    >>>> On 2023/8/15 3:45 PM, Parav Pandit wrote:
    >>>>> Add virtio net device notification coalescing improvements requirements.
    >>>>>
    >>>>> Signed-off-by: Parav Pandit <parav@nvidia.com>
    >>>>> Acked-by: David Edmondson <david.edmondson@oracle.com>
    >>>>>
    >>>>> ---
    >>>>> changelog:
    >>>>> v3->v4:
    >>>>> - no change
    >>>>>
    >>>>> v1->v2:
    >>>>> - addressed comments from Stefan
    >>>>> - redrafted the requirements to use rearm term and avoid queue enable
    >>>>> confusion
    >>>>> v0->v1:
    >>>>> - updated the description
    >>>>> ---
    >>>>> net-workstream/features-1.4.md | 11 +++++++++++
    >>>>> 1 file changed, 11 insertions(+)
    >>>>>
    >>>>> diff --git a/net-workstream/features-1.4.md
    >>>>> b/net-workstream/features-1.4.md index 72d04bd..cb72442 100644
    >>>>> --- a/net-workstream/features-1.4.md
    >>>>> +++ b/net-workstream/features-1.4.md
    >>>>> @@ -8,6 +8,7 @@ together is desired while updating the virtio net
    >> interface.
    >>>>> # 2. Summary
    >>>>> 1. Device counters visible to the driver
    >>>>> 2. Low latency tx and rx virtqueues for PCI transport
    >>>>> +3. Virtqueue notification coalescing re-arming support
    >>>>>
    >>>>> # 3. Requirements
    >>>>> ## 3.1 Device counters
    >>>>> @@ -172,3 +173,13 @@ struct vnet_rx_completion {
    >>>>> which can be recycled by the driver when the packets from the
    >> completed
    >>>>> page is fully consumed.
    >>>>> 8. The device should be able to consume multiple pages for a
    >>>>> receive GSO
    >>>> stream.
    >>>>> +
    >>>>> +## 3.3 Virtqueue notification coalescing re-arming support 0.
    >>>>> +Design
    >>>>> +goal:
    >>>>> + a. Avoid constant notifications from the device even in conditions when
    >>>>> + the driver may not have acted on the previous pending notification.
    >>>>> +1. When Tx and Rx virtqueue notification coalescing is enabled, and
    >>>>> +when
    >>>> such
    >>>>> + a notification is reported by the device, the device stops sending further
    >>>>> + notifications until the driver rearms the notifications of the virtqueue.
    >>>>> +2. When the driver rearms the notification of the virtqueue, the device
    >>>>> + to notify again if notification coalescing conditions are met.
    >>>> I'm wondering how this relates to the existing notification
    >>>> coalesing[1] and notification suppression[2]:
    >>>>
    >>>> [1]
    >>>> The device sends a used buffer notification once the notification
    >>>> conditions are met and if the notifications are not suppressed as
    >>>> explained in \ref{sec:Basic Facilities of a Virtio Device /
    >>>> Virtqueues / Used Buffer Notification Supppression}.
    >>>>
    >>>> [2]
    >>>> If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
    >>>> \begin{itemize}
    >>>> \item The driver MUST ignore the \field{avail_event} value.
    >>>> \item After the driver writes a descriptor index into the available ring:
    >>>>    \begin{itemize}
    >>>>          \item If \field{flags} is 1, the driver SHOULD NOT send a notification.
    >>>>          \item If \field{flags} is 0, the driver MUST send a notification.
    >>>>    \end{itemize}
    >>>> \end{itemize}
    >>>>
    >>>> Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
    >>>> \begin{itemize}
    >>>> \item The driver MUST ignore the lower bit of \field{flags}.
    >>>> \item After the driver writes a descriptor index into the available ring:
    >>>>    \begin{itemize}
    >>>>          \item If the \field{idx} field in the available ring (which determined
    >>>>            where that descriptor index was placed) was equal to
    >>>>            \field{avail_event}, the driver MUST send a notification.
    >>>>          \item Otherwise the driver SHOULD NOT send a notification.
    >>>>    \end{itemize}
    >>>> \end{itemize}
    >>>>
    >>>> Regarding notification suppression:
    >>>> 1.When there is VIRTIO_NET_F_EVENT_IDX, even if the notification
    >>>> coalesing condition is met, we need to wait for the used_event
    >>>> notification condition to be met(the driver does not rearms the
    >>>> notification of the virtqueue now and the avail ring  is set
    >> VRING_AVAIL_F_NO_INTERRUPT in flag).
    >>>> 2.When there is no VIRTIO_NET_F_EVENT_IDX, if the driver turns off
    >>>> the notification, even if the notidication condition is met, the
    >>>> device cannot send the notification.
    >>>>
    >>>> Therefore, if I'm not wrong, a device can issue a notification only
    >>>> if the device is not suppressed from notifying the driver.
    >>>> [1][2] seems to have met this condition.
    >>> Notification suppression using _EVENT_IDX for non-memory transport is just
    >> sub-optimal for two reasons.
    >>> 1. It requires device to poll on the used event to learn about when to
    >>> un-suppress. (arm) 2. this bit also controls driver notifications yet
    >>> again demand device to arbitrarily poll on new descriptors posting
    >>>
    >>> Hence, an efficient scheme is needed and device notifications to be detached
    >> from driver notification.
    >>> And now that VQ level notification coalescing is in place, which suppresses
    >> the device notifications, it is logical to combine it with VQ device notifications.
    >> Let me summarize:
    >> 1. When used idx notification is satisfied, but coalescing is not satisfied, the
    >> driver continues to suppress device notifications.
    > Ack.
    >
    >> 2. When used idx notification is not satisfied, even if coalescing is satisfied, the
    >> device still cannot notify the driver.
    > Ack.
    >
    >> I think that's what coalescing does, and the description below has satisfied this
    >> behavior:
    >> "The device sends a used buffer notification once the notification conditions
    >> are met and if the notifications are not suppressed as explained in \ref{sec:Basic
    >> Facilities of a Virtio Device / Virtqueues / Used Buffer Notification
    >> Supppression}."
    >>
    > Ack.
    > the proposal here is to not use EVENT_IDX scheme, instead driver to enable/disable notification coalescing in different way, even when notification coalescing parameters are configured.
    > And this to be done in fairly fast way (not like a cvq) command. For example like driver notifications.
    >
    >> Or we want to say that it has nothing to do with the used idx notification.
    >> When
    >> the coalescing is satisfied and the driver rearms the notification of the
    >> virtqueue, the device now send a notification.
    >>
    > Right.
    > F_NOTIFICATION_ARM is mutually exclusive with F_EVENT_IDX.

    OK, I think I get your point: F_NOTIFICATION_ARM is mutually exclusive
    with VIRTQ_AVAIL_F_NO_INTERRUPT / used idx / VIRTIO_F_NOTIFY_ON_EMPTY,
    and it seems that F_NOTIFICATION_ARM has the highest priority, and it needs
    a new feature bit. Am I right :)?

    Thanks!

    > (Like packed vq is mutually exclusive with split q.)





  • 13.  RE: [PATCH requirements v4 4/7] net-features: Add notification coalescing requirements

    Posted 08-17-2023 05:21


    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Thursday, August 17, 2023 10:44 AM


    > > F_NOTIFICATION_ARM is mutually exclusive with F_EVENT_IDX.
    >
    > OK, I think I get your point: F_NOTIFICATION_ARM is mutually exclusive with
    > VIRTQ_AVAIL_F_NO_INTERRUPT / used idx / VIRTIO_F_NOTIFY_ON_EMPTY, and it
    > seems that F_NOTIFICATION_ARM has the highest priority, and it needs a new
    > feature bit. Am I right :)?
    >
    Yes for the new feature bit.
    Since it's mutually exclusive, there is no notion of priority.

    > Thanks!
    >
    > > (Like packed vq is mutually exclusive with split q.)
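
    An editorial sketch of the re-arm scheme discussed above, assuming a hypothetical per-virtqueue re-arm doorbell and the proposed (not yet specified) F_NOTIFICATION_ARM feature; every name below is illustrative only, not spec text:

```
/* Hypothetical sketch of the proposed notification re-arm flow; nothing
 * here is part of the virtio specification yet.
 */
#include <stdbool.h>

struct vq_notify_state {
	bool armed;             /* driver has re-armed device notifications */
	bool coalesce_cond_met; /* per-VQ max_usecs/max_packets condition met */
};

/* Device side: notify only when armed, then stop until re-armed. */
void device_try_notify(struct vq_notify_state *vq)
{
	if (vq->armed && vq->coalesce_cond_met) {
		/* send the used buffer notification (e.g. MSI-X vector) */
		vq->armed = false;
		vq->coalesce_cond_met = false;
	}
}

/* Driver side: after consuming used buffers, re-arm with a fast per-VQ
 * write (like a driver notification), not a cvq command.
 */
void driver_rearm(struct vq_notify_state *vq)
{
	vq->armed = true; /* device may notify again once coalescing
	                   * conditions are met */
}
```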





  • 15.  [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-15-2023 07:47
Add virtio net device requirements for receive flow filters.

Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v3->v4:
- Addressed comments from Satananda, Heng, David
- removed context specific wording, replaced with destination
- added group create/delete examples and updated requirements
- added optional support to use cvq for flow filter commands
- added example of transporting flow filter commands over cvq
- made group size to be 16-bit
- added concept of 0->n max flow filter entries based on max count
- added concept of 0->n max flow group based on max count
- split field bitmask to separate command from other filter capabilities
- rewrote rx filter processing chain order with respect to existing filter commands and rss
- made flow_id flat across all groups
v1->v2:
- split setup and operations requirements
- added design goal
- worded requirements more precisely
v0->v1:
- fixed comments from Heng Li
- renamed receive flow steering to receive flow filters
- clarified byte offset in match criteria
---
 net-workstream/features-1.4.md | 151 +++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
index cb72442..78bb3d2 100644
--- a/net-workstream/features-1.4.md
+++ b/net-workstream/features-1.4.md
@@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
 1. Device counters visible to the driver
 2. Low latency tx and rx virtqueues for PCI transport
 3. Virtqueue notification coalescing re-arming support
+4. Virtqueue receive flow filters (RFF)

 # 3. Requirements
 ## 3.1 Device counters
@@ -183,3 +184,153 @@ struct vnet_rx_completion {
    notifications until the driver rearms the notifications of the virtqueue.
 2. When the driver rearms the notification of the virtqueue, the device
    to notify again if notification coalescing conditions are met.
+
+## 3.4 Virtqueue receive flow filters (RFF)
+0. Design goal:
+   To filter and/or to steer packet based on specific pattern match to a
+   specific destination to support application/networking stack driven receive
+   processing.
+1. Two use cases are: to support Linux netdev set_rxnfc() for ETHTOOL_SRXCLSRLINS
+   and to support netdev feature NETIF_F_NTUPLE aka ARFS.
+
+### 3.4.1 control path
+1. The number of flow filter operations/sec can range from 100k/sec to 1M/sec
+   or even more. Hence flow filter operations must be done over a queueing
+   interface using one or more queues.
+2. The device should be able to expose one or more supported flow filter queue
+   count and its start vq index to the driver.
+3. As each device may be operating for different performance characteristic,
+   start vq index and count may be different for each device. Secondly, it is
+   inefficient for device to provide flow filters capabilities via a config space
+   region. Hence, the device should be able to share these attributes using
+   dma interface, instead of transport registers.
+4. Since flow filters are enabled much later in the driver life cycle, driver
+   will likely create these queues when flow filters are enabled.
+5. Flow filter operations are often accelerated by device in a hardware. Ability
+   to handle them on a queue other than control vq is desired. This achieves near
+   zero modifications to existing implementations to add new operations on new
+   purpose built queues (similar to transmit and receive queue).
+   Therefore, when flow filter queues are supported, it is strongly recommended
+   to use it, when flow filter queues are not supported, if the device support
+   it using cvq, driver should be able to use over cvq.
+6. The filter masks are optional; the device should be able to expose if it
+   support filter masks.
+7. The driver may want to have priority among group of flow entries; to facilitate
+   the device support grouping flow filter entries by a notion of a flow group.
+   Each flow group defines priority in processing flow.
+8. The driver and group owner driver should be able to query supported device
+   limits for the receive flow filters.
+
+### 3.4.2 flow operations path
+1. The driver should be able to define a receive packet match criteria, an
+   action and a destination for a packet. For example, an ipv4 packet with a
+   multicast address to be steered to the receive vq 0. The second example is
+   ipv4, tcp packet matching a specified IP address and tcp port tuple to
+   be steered to receive vq 10.
+2. The match criteria should include exact tuple fields well-defined such as mac
+   address, IP addresses, tcp/udp ports, etc.
+3. The match criteria should also optionally include the field mask.
+5. Action includes (a) dropping or (b) forwarding the packet.
+6. Destination is a receive virtqueue index.
+7. Receive packet processing chain is:
+   a. filters programmed using cvq commands VIRTIO_NET_CTRL_RX,
+      VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_VLAN.
+   b. filters programmed using RFF functionality.
+   c. filters programmed using RSS VIRTIO_NET_CTRL_MQ_RSS_CONFIG command.
+   Whichever filtering and steering functionality is enabled, they are applied
+   in the above order.
+9. If multiple entries are programmed which has overlapping filtering attributes
+   for a received packet, the driver to define the location/priority of the entry.
+10. The filter entries are usually short in size of few tens of bytes,
+    for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
+    high, hence supplying fields inside the queue descriptor is preferred for
+    up to a certain fixed size, say 96 bytes.
+11. A flow filter entry consists of (a) match criteria, (b) action,
+    (c) destination and (d) a unique 32 bit flow id, all supplied by the
+    driver.
+12. The driver should be able to query and delete flow filter entry by the
+    device by the flow id.
+
+### 3.4.3 interface example
+
+1. Flow filter capabilities to query using a DMA interface such as cvq
+   using two different commands.
+
+```
+/* command 1 */
+struct flow_filter_capabilities {
+    le16 start_vq_index;
+    le16 num_flow_filter_vqs;
+    le16 max_flow_groups;
+    le16 max_group_priorities; /* max priorities of the group */
+    le32 max_flow_filters_per_group;
+    le32 max_flow_filters; /* max flow_id in add/del
+                            * is equal = max_flow_filters - 1.
+                            */
+    u8 max_priorities_per_group;
+};
+
+/* command 2 */
+struct flow_filter_fields_support_mask {
+    le64 supported_packet_field_mask_bmap[1];
+};
+
+```
+
+2. Group add/delete cvq commands:
+```
+
+struct virtio_net_rff_group_add {
+    le16 priority;
+    le16 group_id;
+};
+
+struct virtio_net_rff_group_delete {
+    le16 group_id;
+};
+
+```
+
+3. Flow filter entry add/modify, delete over flow vq:
+
+```
+struct virtio_net_rff_add_modify {
+    u8 flow_op;
+    u8 padding;
+    u16 group_id;
+    le32 flow_id;
+    struct match_criteria mc;
+    struct destination dest;
+    struct action action;
+
+    struct match_criteria mask; /* optional */
+};
+
+struct virtio_net_rff_delete {
+    u8 flow_op;
+    u8 padding[3];
+    le32 flow_id;
+};
+
+```
+
+4. Flow filter commands over cvq:
+
+```
+
+struct virtio_net_rff_cmd {
+    u8 class;    /* RFF class */
+    u8 commands; /* RFF cmd = A */
+    u8 command-specific-data[]; /* contains struct virtio_net_rff_add_modify or
+                                 * struct virtio_net_rff_delete
+                                 */
+};
+
+```
+
+### 3.4.4 For incremental future
+a. Driver should be able to specify a specific packet byte offset, number
+   of bytes and mask as match criteria.
+b. Support RSS context, in addition to a specific RQ.
+c. If/when virtio switch object is implemented, support ingress/egress flow
+   filters at the switch port level.
--
2.26.2
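
    As an editorial illustration of the flow operations path above (the "ipv4, tcp packet ... steered to receive vq 10" example), such an entry could be populated as below. The match_criteria, destination and action layouts are hypothetical, since the requirements draft deliberately leaves them undefined:

```
/* Illustrative only: hypothetical field layouts for the structs the draft
 * leaves undefined, filled in for the "ipv4+tcp tuple -> rx vq 10" example.
 */
#include <stdint.h>

struct match_criteria {
	uint32_t ipv4_dst;     /* destination IPv4 address */
	uint16_t tcp_dst_port; /* destination TCP port */
};

struct destination {
	uint16_t rx_vq_index;  /* receive virtqueue index */
};

struct action {
	uint8_t forward;       /* 1 = forward to destination, 0 = drop */
};

struct rff_add_modify_example {
	uint8_t flow_op;       /* hypothetical RFF_OP_ADD */
	uint8_t padding;
	uint16_t group_id;
	uint32_t flow_id;      /* unique, supplied by the driver */
	struct match_criteria mc;
	struct destination dest;
	struct action act;
};

static const struct rff_add_modify_example example = {
	.flow_op  = 0,
	.group_id = 0,
	.flow_id  = 1,
	.mc       = { .ipv4_dst = 0xc0a80001 /* 192.168.0.1 */,
	              .tcp_dst_port = 443 },
	.dest     = { .rx_vq_index = 10 },
	.act      = { .forward = 1 },
};
```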


  • 16.  RE: [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-16-2023 06:28
    Comments below from today's bi-weekly meeting to address in v5.


    > From: Parav Pandit <parav@nvidia.com>
    > Sent: Tuesday, August 15, 2023 1:16 PM
    >
    > Add virtio net device requirements for receive flow filters.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > changelog:
    > v3->v4:
    > - Addressed comments from Satananda, Heng, David
    > - removed context specific wording, replaced with destination
    > - added group create/delete examples and updated requirements
    > - added optional support to use cvq for flow filter commands
    > - added example of transporting flow filter commands over cvq
    > - made group size to be 16-bit
    > - added concept of 0->n max flow filter entries based on max count
    > - added concept of 0->n max flow group based on max count
    > - split field bitmask to separate command from other filter capabilities
    > - rewrote rx filter processing chain order with respect to existing
    > filter commands and rss
    > - made flow_id flat across all groups
    > v1->v2:
    > - split setup and operations requirements
    > - added design goal
    > - worded requirements more precisely
    > v0->v1:
    > - fixed comments from Heng Li
    > - renamed receive flow steering to receive flow filters
    > - clarified byte offset in match criteria
    > ---
    > net-workstream/features-1.4.md | 151
    > +++++++++++++++++++++++++++++++++
    > 1 file changed, 151 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index cb72442..78bb3d2 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
    > 1. Device counters visible to the driver 2. Low latency tx and rx virtqueues for
    > PCI transport 3. Virtqueue notification coalescing re-arming support
    > +4 Virtqueue receive flow filters (RFF)
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -183,3 +184,153 @@ struct vnet_rx_completion {
    > notifications until the driver rearms the notifications of the virtqueue.
    > 2. When the driver rearms the notification of the virtqueue, the device
    > to notify again if notification coalescing conditions are met.
    > +
    > +## 3.4 Virtqueue receive flow filters (RFF) 0. Design goal:
    > + To filter and/or to steer packet based on specific pattern match to a
    > + specific destination to support application/networking stack driven receive
    > + processing.
    > +1. Two use cases are: to support Linux netdev set_rxnfc() for
    > ETHTOOL_SRXCLSRLINS
    > + and to support netdev feature NETIF_F_NTUPLE aka ARFS.
    > +
    > +### 3.4.1 control path
    > +1. The number of flow filter operations/sec can range from 100k/sec to
    > 1M/sec
    > + or even more. Hence flow filter operations must be done over a queueing
    > + interface using one or more queues.
    > +2. The device should be able to expose one or more supported flow filter
    > queue
    > + count and its start vq index to the driver.
    > +3. As each device may be operating for different performance characteristic,
    > + start vq index and count may be different for each device. Secondly, it is
    > + inefficient for device to provide flow filters capabilities via a config space
    > + region. Hence, the device should be able to share these attributes using
    > + dma interface, instead of transport registers.
    > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > + will likely create these queues when flow filters are enabled.
    > +5. Flow filter operations are often accelerated by device in a hardware. Ability
    > + to handle them on a queue other than control vq is desired. This achieves
    > near
    > + zero modifications to existing implementations to add new operations on
    > new
    > + purpose built queues (similar to transmit and receive queue).
    > + Therefore, when flow filter queues are supported, it is strongly
    > recommended
    > + to use it, when flow filter queues are not supported, if the device support
    > + it using cvq, driver should be able to use over cvq.

    Rephrased as below.
    0. Flow filter queues and flow filter commands on cvq are mutually exclusive.

    1. When flow filter queues are supported, the driver should create flow filter queues and use them.
    (Since cvq is not enabled for flow filters, any flow filter command coming on cvq must fail).

    2. If the driver wants to use flow filters over cvq, it must explicitly enable flow filters on cvq via a command; once they are enabled on the cvq, the driver cannot use flow filter queues.
    This eliminates any synchronization needed by the device among different types of queues.
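
    A small editorial sketch of how a driver might act on this mutual exclusivity; the helper and the "enable RFF on cvq" command are hypothetical, not defined anywhere yet:

```
/* Sketch only: choosing where RFF commands are issued, per the rephrased
 * requirement above. All names are illustrative.
 */
#include <stdbool.h>

enum rff_transport { RFF_NONE, RFF_FLOW_VQ, RFF_CVQ };

enum rff_transport rff_select_transport(bool dev_has_flow_filter_vqs,
					bool dev_supports_rff_on_cvq)
{
	if (dev_has_flow_filter_vqs)
		return RFF_FLOW_VQ; /* RFF commands arriving on cvq must fail */
	if (dev_supports_rff_on_cvq)
		return RFF_CVQ;     /* driver first enables RFF on cvq via a
		                     * command; flow filter queues then stay
		                     * unused */
	return RFF_NONE;
}
```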


    > +6. The filter masks are optional; the device should be able to expose if it
    > + support filter masks.
    > +7. The driver may want to have priority among group of flow entries; to
    > facilitate
    > + the device support grouping flow filter entries by a notion of a flow group.
    > + Each flow group defines priority in processing flow.
    > +8. The driver and group owner driver should be able to query supported
    > device
    > + limits for the receive flow filters.
    > +
    > +### 3.4.2 flow operations path
    > +1. The driver should be able to define a receive packet match criteria, an
    > + action and a destination for a packet. For example, an ipv4 packet with a
    > + multicast address to be steered to the receive vq 0. The second example is
    > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > + be steered to receive vq 10.
    > +2. The match criteria should include exact tuple fields well-defined such as
    > mac
    > + address, IP addresses, tcp/udp ports, etc.
    > +3. The match criteria should also optionally include the field mask.
    > +5. Action includes (a) dropping or (b) forwarding the packet.
    > +6. Destination is a receive virtqueue index.
    > +7. Receive packet processing chain is:
    > + a. filters programmed using cvq commands VIRTIO_NET_CTRL_RX,
    > + VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_VLAN.
    > + b. filters programmed using RFF functionality.
    > + c. filters programmed using RSS VIRTIO_NET_CTRL_MQ_RSS_CONFIG
    > command.
    > + Whichever filtering and steering functionality is enabled, they are applied
    > + in the above order.
    > +9. If multiple entries are programmed which has overlapping filtering attributes
    > + for a received packet, the driver to define the location/priority of the entry.
    > +10. The filter entries are usually short in size of few tens of bytes,
    > + for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    > + high, hence supplying fields inside the queue descriptor is preferred for
    > + up to a certain fixed size, say 96 bytes.
    > +11. A flow filter entry consists of (a) match criteria, (b) action,
    > + (c) destination and (d) a unique 32 bit flow id, all supplied by the
    > + driver.
    > +12. The driver should be able to query and delete flow filter entry by the
    > + the device by the flow id.
    > +
    > +### 3.4.3 interface example
    > +
    > +1. Flow filter capabilities to query using a DMA interface such as cvq
    > +using two different commands.
    > +
    > +```
    > +/* command 1 */
    > +struct flow_filter_capabilities {
    > + le16 start_vq_index;
    > + le16 num_flow_filter_vqs;
    > + le16 max_flow_groups;
    > + le16 max_group_priorities; /* max priorities of the group */
    > + le32 max_flow_filters_per_group;
    > + le32 max_flow_filters; /* max flow_id in add/del
    > + * is equal = max_flow_filters - 1.
    > + */
    > + u8 max_priorities_per_group;
    > +};
    > +
    > +/* command 2 */
    > +struct flow_filter_fields_support_mask {
    > + le64 supported_packet_field_mask_bmap[1];
    > +};
    Explain this bitmap: it indicates well-known packet fields such as src mac, dest ip, etc.

    Also expose it on an AQ command so that the live migration/provisioning flow can decide which device to use.
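
    Purely as an illustration of such a bitmap, the well-known fields could be enumerated along these lines (bit positions are made up here, not proposed values):

```
/* Illustrative bit assignments for supported_packet_field_mask_bmap[0];
 * an example of "well known packet fields", not a spec proposal.
 */
#define RFF_FIELD_MAC_SRC       (1ULL << 0)
#define RFF_FIELD_MAC_DST       (1ULL << 1)
#define RFF_FIELD_VLAN_ID       (1ULL << 2)
#define RFF_FIELD_IPV4_SRC      (1ULL << 3)
#define RFF_FIELD_IPV4_DST      (1ULL << 4)
#define RFF_FIELD_IPV6_SRC      (1ULL << 5)
#define RFF_FIELD_IPV6_DST      (1ULL << 6)
#define RFF_FIELD_TCP_SRC_PORT  (1ULL << 7)
#define RFF_FIELD_TCP_DST_PORT  (1ULL << 8)
#define RFF_FIELD_UDP_SRC_PORT  (1ULL << 9)
#define RFF_FIELD_UDP_DST_PORT  (1ULL << 10)
```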

    > +
    > +```
    > +
    > +2. Group add/delete cvq commands:
    > +```
    > +
    > +struct virtio_net_rff_group_add {
    > + le16 priority;
    > + le16 group_id;
    > +};
    > +
    > +
    > +struct virtio_net_rff_group_delete {
    > + le16 group_id;
    > +
    > +```
    > +
    > +3. Flow filter entry add/modify, delete over flow vq:
    > +
    > +```
    > +struct virtio_net_rff_add_modify {
    > + u8 flow_op;
    > + u8 padding;
    > + u16 group_id;
    > + le32 flow_id;
    > + struct match_criteria mc;
    > + struct destination dest;
    > + struct action action;
    > +
    > + struct match_criteria mask; /* optional */
    > +};
    > +
    > +struct virtio_net_rff_delete {
    > + u8 flow_op;
    > + u8 padding[3];
    > + le32 flow_id;
    > +};
    > +
    > +```
    > +
    > +4. Flow filter commands over cvq:
    > +
    > +```
    > +
    > +struct virtio_net_rff_cmd {
    > + u8 class; /* RFF class */
    > + u8 commands; /* RFF cmd = A */
    > + u8 command-specific-data[]; /* contains struct
    > virtio_net_rff_add_modify or
    > + * struct virtio_net_rff_delete
    > + */ };
    > +
    > +```
    > +
    > +### 3.4.4 For incremental future
    > +a. Driver should be able to specify a specific packet byte offset, number
    > + of bytes and mask as match criteria.
    > +b. Support RSS context, in addition to a specific RQ.
    > +c. If/when virtio switch object is implemented, support ingress/egress flow
    > + filters at the switch port level.
    > --
    > 2.26.2





  • 18.  Re: [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-16-2023 07:38


    On 2023/8/16 2:27 PM, Parav Pandit wrote:
    > Comments below from today's bi-weekly meeting to address in v5.

    Thanks Parav!

    >
    >
    >> From: Parav Pandit <parav@nvidia.com>
    >> Sent: Tuesday, August 15, 2023 1:16 PM
    >>
    >> Add virtio net device requirements for receive flow filters.
    >>
    >> Signed-off-by: Parav Pandit <parav@nvidia.com>
    >> ---
    >> changelog:
    >> v3->v4:
    >> - Addressed comments from Satananda, Heng, David
    >> - removed context specific wording, replaced with destination
    >> - added group create/delete examples and updated requirements
    >> - added optional support to use cvq for flow filter commands
    >> - added example of transporting flow filter commands over cvq
    >> - made group size to be 16-bit
    >> - added concept of 0->n max flow filter entries based on max count
    >> - added concept of 0->n max flow group based on max count
    >> - split field bitmask to separate command from other filter capabilities
    >> - rewrote rx filter processing chain order with respect to existing
    >> filter commands and rss
    >> - made flow_id flat across all groups
    >> v1->v2:
    >> - split setup and operations requirements
    >> - added design goal
    >> - worded requirements more precisely
    >> v0->v1:
    >> - fixed comments from Heng Li
    >> - renamed receive flow steering to receive flow filters
    >> - clarified byte offset in match criteria
    >> ---
    >> net-workstream/features-1.4.md | 151
    >> +++++++++++++++++++++++++++++++++
    >> 1 file changed, 151 insertions(+)
    >>
    >> diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    >> index cb72442..78bb3d2 100644
    >> --- a/net-workstream/features-1.4.md
    >> +++ b/net-workstream/features-1.4.md
    >> @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
    >> 1. Device counters visible to the driver 2. Low latency tx and rx virtqueues for
    >> PCI transport 3. Virtqueue notification coalescing re-arming support
    >> +4 Virtqueue receive flow filters (RFF)
    >>
    >> # 3. Requirements
    >> ## 3.1 Device counters
    >> @@ -183,3 +184,153 @@ struct vnet_rx_completion {
    >> notifications until the driver rearms the notifications of the virtqueue.
    >> 2. When the driver rearms the notification of the virtqueue, the device
    >> to notify again if notification coalescing conditions are met.
    >> +
    >> +## 3.4 Virtqueue receive flow filters (RFF) 0. Design goal:
    >> + To filter and/or to steer packet based on specific pattern match to a
    >> + specific destination to support application/networking stack driven receive
    >> + processing.
    >> +1. Two use cases are: to support Linux netdev set_rxnfc() for
    >> ETHTOOL_SRXCLSRLINS
    >> + and to support netdev feature NETIF_F_NTUPLE aka ARFS.
    >> +
    >> +### 3.4.1 control path
    >> +1. The number of flow filter operations/sec can range from 100k/sec to
    >> 1M/sec
    >> + or even more. Hence flow filter operations must be done over a queueing
    >> + interface using one or more queues.
    >> +2. The device should be able to expose one or more supported flow filter
    >> queue
    >> + count and its start vq index to the driver.
    >> +3. As each device may be operating for different performance characteristic,
    >> + start vq index and count may be different for each device. Secondly, it is
    >> + inefficient for device to provide flow filters capabilities via a config space
    >> + region. Hence, the device should be able to share these attributes using
    >> + dma interface, instead of transport registers.
    >> +4. Since flow filters are enabled much later in the driver life cycle, driver
    >> + will likely create these queues when flow filters are enabled.

    Regarding this description, I want to say that ARFS will be enabled at
    runtime.
    But ethtool RFF will be used at any time as long as the device is ready.

    Combining what was discussed in today's meeting, flow vqs and ctrlq are
    mutually exclusive,
    so if flow vqs are supported, then ethtool RFF can use flow vq:

    "
    0. Flow filter queues and flow filter commands on cvq are mutually
    exclusive.

    1. When flow queues are supported, the driver should create flow filter
    queues and use it.
    (Since cvq is not enabled for flow filters, any flow filter command
    coming on cvq must fail).

    2. If driver wants to use flow filters over cvq, driver must explicitly
    enable flow filters on cvq via a command, when it is enabled on the cvq
    driver cannot use flow filter queues.
    This eliminates any synchronization needed by the device among different
    types of queues.
    "

    Well the "likely create these queues when flow filters are enabled"
    described here is confusing.
    Because if ethtool RFF is used, we need to create a flow vq in the probe
    stage, right?

    There are several other reasons:
    1. The behavior of dynamically creating a flow vq will break the current
       virtio spec. Please see the "Device Initialization" chapter. ctrlq, as a
       configuration queue similar to the flow vq, is also created in the probe
       phase. So if we support dynamic creation, we need to update the spec.

    2. A flow vq is similar to a transmit q and does not need to fill
       descriptors in advance, so its resource consumption is relatively small.

    3. Dynamic creation of virtqueues seems to be a new thread for the virtio
       spec, and it should also be applicable to rxqs and txqs. We can
       temporarily support creating the flow vq in the probe stage, and
       subsequent dynamic creation can be an extension.

    So, should we create the flow vqs at the initial stage of the driver probe?

    Thanks!


    >> +5. Flow filter operations are often accelerated by device in a hardware. Ability
    >> + to handle them on a queue other than control vq is desired. This achieves
    >> near
    >> + zero modifications to existing implementations to add new operations on
    >> new
    >> + purpose built queues (similar to transmit and receive queue).
    >> + Therefore, when flow filter queues are supported, it is strongly
    >> recommended
    >> + to use it, when flow filter queues are not supported, if the device support
    >> + it using cvq, driver should be able to use over cvq.
    > Rephase is like below.
    > 0. Flow filter queues and flow filter commands on cvq are mutually exclusive.
    >
    > 1. When flow queues are supported, driver should create flow filter queues and use it.
    > (Since cvq is not enabled for flow filters, any flow filter command coming on cvq must fail).
    >
    > 2. If driver wants to use flow filters over cvq, driver must explicitly enable flow filters on cvq via a command, when it is enabled on the cvq driver cannot use flow filter queues.
    > This eliminates any synchronization needed by the device among different types of queues.
    >
    >
    >> +6. The filter masks are optional; the device should be able to expose if it
    >> + support filter masks.
    >> +7. The driver may want to have priority among group of flow entries; to
    >> facilitate
    >> + the device support grouping flow filter entries by a notion of a flow group.
    >> + Each flow group defines priority in processing flow.
    >> +8. The driver and group owner driver should be able to query supported
    >> device
    >> + limits for the receive flow filters.
    >> +
    >> +### 3.4.2 flow operations path
    >> +1. The driver should be able to define a receive packet match criteria, an
    >> + action and a destination for a packet. For example, an ipv4 packet with a
    >> + multicast address to be steered to the receive vq 0. The second example is
    >> + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    >> + be steered to receive vq 10.
    >> +2. The match criteria should include exact tuple fields well-defined such as
    >> mac
    >> + address, IP addresses, tcp/udp ports, etc.
    >> +3. The match criteria should also optionally include the field mask.
    >> +5. Action includes (a) dropping or (b) forwarding the packet.
    >> +6. Destination is a receive virtqueue index.
    >> +7. Receive packet processing chain is:
    >> + a. filters programmed using cvq commands VIRTIO_NET_CTRL_RX,
    >> + VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_VLAN.
    >> + b. filters programmed using RFF functionality.
    >> + c. filters programmed using RSS VIRTIO_NET_CTRL_MQ_RSS_CONFIG
    >> command.
    >> + Whichever filtering and steering functionality is enabled, they are applied
    >> + in the above order.
    >> +9. If multiple entries are programmed which has overlapping filtering attributes
    >> + for a received packet, the driver to define the location/priority of the entry.
    >> +10. The filter entries are usually short in size of few tens of bytes,
    >> + for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    >> + high, hence supplying fields inside the queue descriptor is preferred for
    >> + up to a certain fixed size, say 96 bytes.
    >> +11. A flow filter entry consists of (a) match criteria, (b) action,
    >> + (c) destination and (d) a unique 32 bit flow id, all supplied by the
    >> + driver.
    >> +12. The driver should be able to query and delete flow filter entry by the
    >> + the device by the flow id.
    >> +
    >> +### 3.4.3 interface example
    >> +
    >> +1. Flow filter capabilities to query using a DMA interface such as cvq
    >> +using two different commands.
    >> +
    >> +```
    >> +/* command 1 */
    >> +struct flow_filter_capabilities {
    >> + le16 start_vq_index;
    >> + le16 num_flow_filter_vqs;
    >> + le16 max_flow_groups;
    >> + le16 max_group_priorities; /* max priorities of the group */
    >> + le32 max_flow_filters_per_group;
    >> + le32 max_flow_filters; /* max flow_id in add/del
    >> + * is equal = max_flow_filters - 1.
    >> + */
    >> + u8 max_priorities_per_group;
    >> +};
    >> +
    >> +/* command 2 */
    >> +struct flow_filter_fields_support_mask {
    >> + le64 supported_packet_field_mask_bmap[1];
    >> +};
    > Explain this bitmap that it indicates well known packet field such as src mac, dest ip, etc.
    >
    > Also expose it on AQ command so that live migration flow/provision flow can decide which device to use.
    >
    >> +
    >> +```
    >> +
    >> +2. Group add/delete cvq commands:
    >> +```
    >> +
    >> +struct virtio_net_rff_group_add {
    >> + le16 priority;
    >> + le16 group_id;
    >> +};
    >> +
    >> +
    >> +struct virtio_net_rff_group_delete {
    >> + le16 group_id;
    >> +
    >> +```
    >> +
    >> +3. Flow filter entry add/modify, delete over flow vq:
    >> +
    >> +```
    >> +struct virtio_net_rff_add_modify {
    >> + u8 flow_op;
    >> + u8 padding;
    >> + u16 group_id;
    >> + le32 flow_id;
    >> + struct match_criteria mc;
    >> + struct destination dest;
    >> + struct action action;
    >> +
    >> + struct match_criteria mask; /* optional */
    >> +};
    >> +
    >> +struct virtio_net_rff_delete {
    >> + u8 flow_op;
    >> + u8 padding[3];
    >> + le32 flow_id;
    >> +};
    >> +
    >> +```
    >> +
    >> +4. Flow filter commands over cvq:
    >> +
    >> +```
    >> +
    >> +struct virtio_net_rff_cmd {
    >> + u8 class; /* RFF class */
    >> + u8 commands; /* RFF cmd = A */
    >> + u8 command-specific-data[]; /* contains struct
    >> virtio_net_rff_add_modify or
    >> + * struct virtio_net_rff_delete
    >> + */ };
    >> +
    >> +```
    >> +
    >> +### 3.4.4 For incremental future
    >> +a. Driver should be able to specify a specific packet byte offset, number
    >> + of bytes and mask as match criteria.
    >> +b. Support RSS context, in addition to a specific RQ.
    >> +c. If/when virtio switch object is implemented, support ingress/egress flow
    >> + filters at the switch port level.
    >> --
    >> 2.26.2






  • 20.  RE: [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-16-2023 10:31


    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Wednesday, August 16, 2023 1:08 PM

    > >> From: Parav Pandit <parav@nvidia.com>
    > >> Sent: Tuesday, August 15, 2023 1:16 PM
    > >>
    > >> Add virtio net device requirements for receive flow filters.
    > >>
    > >> Signed-off-by: Parav Pandit <parav@nvidia.com>
    > >> ---
    > >> changelog:
    > >> v3->v4:
    > >> - Addressed comments from Satananda, Heng, David
    > >> - removed context specific wording, replaced with destination
    > >> - added group create/delete examples and updated requirements
    > >> - added optional support to use cvq for flor filter commands
    > >> - added example of transporting flow filter commands over cvq
    > >> - made group size to be 16-bit
    > >> - added concept of 0->n max flow filter entries based on max count
    > >> - added concept of 0->n max flow group based on max count
    > >> - split field bitmask to separate command from other filter
    > >> capabilities
    > >> - rewrote rx filter processing chain order with respect to existing
    > >> filter commands and rss
    > >> - made flow_id flat across all groups
    > >> v1->v2:
    > >> - split setup and operations requirements
    > >> - added design goal
    > >> - worded requirements more precisely
    > >> v0->v1:
    > >> - fixed comments from Heng Li
    > >> - renamed receive flow steering to receive flow filters
    > >> - clarified byte offset in match criteria
    > >> ---
    > >> net-workstream/features-1.4.md | 151
    > >> +++++++++++++++++++++++++++++++++
    > >> 1 file changed, 151 insertions(+)
    > >>
    > >> diff --git a/net-workstream/features-1.4.md
    > >> b/net-workstream/features-1.4.md index cb72442..78bb3d2 100644
    > >> --- a/net-workstream/features-1.4.md
    > >> +++ b/net-workstream/features-1.4.md
    > >> @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
    > >> 1. Device counters visible to the driver 2. Low latency tx and rx
    > >> virtqueues for PCI transport 3. Virtqueue notification coalescing
    > >> re-arming support
    > >> +4 Virtqueue receive flow filters (RFF)
    > >>
    > >> # 3. Requirements
    > >> ## 3.1 Device counters
    > >> @@ -183,3 +184,153 @@ struct vnet_rx_completion {
    > >> notifications until the driver rearms the notifications of the virtqueue.
    > >> 2. When the driver rearms the notification of the virtqueue, the device
    > >> to notify again if notification coalescing conditions are met.
    > >> +
    > >> +## 3.4 Virtqueue receive flow filters (RFF) 0. Design goal:
    > >> + To filter and/or to steer packet based on specific pattern match to a
    > >> + specific destination to support application/networking stack driven
    > receive
    > >> + processing.
    > >> +1. Two use cases are: to support Linux netdev set_rxnfc() for
    > >> ETHTOOL_SRXCLSRLINS
    > >> + and to support netdev feature NETIF_F_NTUPLE aka ARFS.
    > >> +
    > >> +### 3.4.1 control path
    > >> +1. The number of flow filter operations/sec can range from 100k/sec
    > >> +to
    > >> 1M/sec
    > >> + or even more. Hence flow filter operations must be done over a queueing
    > >> + interface using one or more queues.
    > >> +2. The device should be able to expose one or more supported flow
    > >> +filter
    > >> queue
    > >> + count and its start vq index to the driver.
    > >> +3. As each device may be operating for different performance
    > characteristic,
    > >> + start vq index and count may be different for each device. Secondly, it is
    > >> + inefficient for device to provide flow filters capabilities via a config space
    > >> + region. Hence, the device should be able to share these attributes using
    > >> + dma interface, instead of transport registers.
    > >> +4. Since flow filters are enabled much later in the driver life cycle, driver
    > >> + will likely create these queues when flow filters are enabled.
    >
    > Regarding this description, I want to say that ARFS will be enabled at runtime.
    > But ethtool RFF will be used at any time as long as the device is ready.
    >
    Yes, but ethtool RFS is a blocking callback in which slow tasks such as q creation can be done, only when one wants to add flows.
    ARFS is anyway controlled using set_features() callback.

    > Combining what was discussed in today's meeting, flow vqs and ctrlq are
    > mutually exclusive, so if flow vqs are supported, then ethtool RFF can use flow
    > vq:
    >
    > "
    > 0. Flow filter queues and flow filter commands on cvq are mutually exclusive.
    >
    > 1. When flow queues are supported, the driver should create flow filter queues
    > and use it.
    > (Since cvq is not enabled for flow filters, any flow filter command coming on cvq
    > must fail).
    >
    > 2. If driver wants to use flow filters over cvq, driver must explicitly enable flow
    > filters on cvq via a command, when it is enabled on the cvq driver cannot use
    > flow filter queues.
    > This eliminates any synchronization needed by the device among different types
    > of queues.
    > "
    >
    Ack.

    > Well the "likely create these queues when flow filters are enabled"
    > described here is confusing.
    > Because if ethtool RFF is used, we need to create a flow vq in the probe stage,
    > right?
    >
    The current spec wording limits one to creating queues before DRIVER_OK.
    But with the introduction of the _RESET bit one can create an empty queue and disable it (reset it! What a grand name).

    And re-enable it during the ethtool callbacks.
    This would be a workaround to dynamically create the queue.
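    For illustration, a minimal sketch of that sequence with made-up state and helpers; none of these names are existing virtio driver APIs:

```
/* Hypothetical sketch of the workaround above; the structure and helpers
 * only name the steps and are not existing virtio driver APIs. */
#include <stdbool.h>

struct flow_vq_state { int vq_index; int num_descs; bool enabled; };

/* Before DRIVER_OK: create the queue, then reset it so it stays inert. */
static void probe_time_setup(struct flow_vq_state *vq, int index)
{
    vq->vq_index = index;
    vq->num_descs = 0;
    vq->enabled = false;   /* queue exists but is reset (disabled) */
}

/* First ethtool flow filter call (blocking context): size and re-enable it. */
static void ethtool_rff_first_use(struct flow_vq_state *vq)
{
    vq->num_descs = 256;   /* illustrative size */
    vq->enabled = true;    /* re-enable the previously reset queue */
}
```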

    > There are several other reasons:
    > 1. The behavior of dynamically creating flow vq will break the current virtio
    > spec.
    >    Please see the "Device Initialization" chapter. ctrlq, as a configuration queue
    >    similar to flow vq, is also created in the probe phase. So if we support the
    > "dynamically creating",
    >    we need to update the spec.
    >
    > 2. Flow vq is similar to transmit q, and does not need to fill descriptors in
    > advance,
    >    so the consumption of resources is relatively small.
    >
    Only the queue descriptors' memory is consumed, which is not a lot.
    But the concept of creating a resource without consuming it is just bad.
    We learnt the lesson from the mlx5 driver that dynamic creation is efficient.
    Many parts of the Linux kernel are also moving in this direction, all the way up to dynamically allocating individual MSI-X vectors.

    So we should strive to enable them dynamically and improve the virtio spec.

    It should be an orthogonal feature; sadly, that is not how the RING_RESET feature was done. :(

    > 3. Dynamic creation of virtqueue seems to be a new thread of virtio spec, and it
    > should also be
    >    applicable to rxqs and txqs. We can temporarily support creating flow vq in
    > the probe stage,
    >    and subsequent dynamic creation can be an extension.
    >
    > So, should we create the flow vqs at the initial stage of the driver probe?
    One option is to follow the above workaround.
    The second option is to add a feature bit to indicate dynamic Q creation.





  • 22.  Re: [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-16-2023 11:11


    On 2023/8/16 6:31 PM, Parav Pandit wrote:
    >
    >> From: Heng Qi <hengqi@linux.alibaba.com>
    >> Sent: Wednesday, August 16, 2023 1:08 PM
    >>>> From: Parav Pandit <parav@nvidia.com>
    >>>> Sent: Tuesday, August 15, 2023 1:16 PM
    >>>>
    >>>> Add virtio net device requirements for receive flow filters.
    >>>>
    >>>> Signed-off-by: Parav Pandit <parav@nvidia.com>
    >>>> ---
    >>>> changelog:
    >>>> v3->v4:
    >>>> - Addressed comments from Satananda, Heng, David
    >>>> - removed context specific wording, replaced with destination
    >>>> - added group create/delete examples and updated requirements
    >>>> - added optional support to use cvq for flow filter commands
    >>>> - added example of transporting flow filter commands over cvq
    >>>> - made group size to be 16-bit
    >>>> - added concept of 0->n max flow filter entries based on max count
    >>>> - added concept of 0->n max flow group based on max count
    >>>> - split field bitmask to separate command from other filter
    >>>> capabilities
    >>>> - rewrote rx filter processing chain order with respect to existing
    >>>> filter commands and rss
    >>>> - made flow_id flat across all groups
    >>>> v1->v2:
    >>>> - split setup and operations requirements
    >>>> - added design goal
    >>>> - worded requirements more precisely
    >>>> v0->v1:
    >>>> - fixed comments from Heng Li
    >>>> - renamed receive flow steering to receive flow filters
    >>>> - clarified byte offset in match criteria
    >>>> ---
    >>>> net-workstream/features-1.4.md | 151
    >>>> +++++++++++++++++++++++++++++++++
    >>>> 1 file changed, 151 insertions(+)
    >>>>
    >>>> diff --git a/net-workstream/features-1.4.md
    >>>> b/net-workstream/features-1.4.md index cb72442..78bb3d2 100644
    >>>> --- a/net-workstream/features-1.4.md
    >>>> +++ b/net-workstream/features-1.4.md
    >>>> @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
    >>>> 1. Device counters visible to the driver 2. Low latency tx and rx
    >>>> virtqueues for PCI transport 3. Virtqueue notification coalescing
    >>>> re-arming support
    >>>> +4 Virtqueue receive flow filters (RFF)
    >>>>
    >>>> # 3. Requirements
    >>>> ## 3.1 Device counters
    >>>> @@ -183,3 +184,153 @@ struct vnet_rx_completion {
    >>>> notifications until the driver rearms the notifications of the virtqueue.
    >>>> 2. When the driver rearms the notification of the virtqueue, the device
    >>>> to notify again if notification coalescing conditions are met.
    >>>> +
    >>>> +## 3.4 Virtqueue receive flow filters (RFF) 0. Design goal:
    >>>> + To filter and/or to steer packet based on specific pattern match to a
    >>>> + specific destination to support application/networking stack driven
    >> receive
    >>>> + processing.
    >>>> +1. Two use cases are: to support Linux netdev set_rxnfc() for
    >>>> ETHTOOL_SRXCLSRLINS
    >>>> + and to support netdev feature NETIF_F_NTUPLE aka ARFS.
    >>>> +
    >>>> +### 3.4.1 control path
    >>>> +1. The number of flow filter operations/sec can range from 100k/sec
    >>>> +to
    >>>> 1M/sec
    >>>> + or even more. Hence flow filter operations must be done over a queueing
    >>>> + interface using one or more queues.
    >>>> +2. The device should be able to expose one or more supported flow
    >>>> +filter
    >>>> queue
    >>>> + count and its start vq index to the driver.
    >>>> +3. As each device may be operating for different performance
    >> characteristic,
    >>>> + start vq index and count may be different for each device. Secondly, it is
    >>>> + inefficient for device to provide flow filters capabilities via a config space
    >>>> + region. Hence, the device should be able to share these attributes using
    >>>> + dma interface, instead of transport registers.
    >>>> +4. Since flow filters are enabled much later in the driver life cycle, driver
    >>>> + will likely create these queues when flow filters are enabled.
    >> Regarding this description, I want to say that ARFS will be enabled at runtime.
    >> But ethtool RFF will be used at any time as long as the device is ready.
    >>
    > Yes, but ethool RFS is blocking callback in which slow task such as q creation can be done, only when one wants to add flows.
    > ARFS is anyway controlled using set_features() callback.
    >
    >> Combining what was discussed in today's meeting, flow vqs and ctrlq are
    >> mutually exclusive, so if flow vqs are supported, then ethtool RFF can use flow
    >> vq:
    >>
    >> "
    >> 0. Flow filter queues and flow filter commands on cvq are mutually exclusive.
    >>
    >> 1. When flow queues are supported, the driver should create flow filter queues
    >> and use it.
    >> (Since cvq is not enabled for flow filters, any flow filter command coming on cvq
    >> must fail).
    >>
    >> 2. If driver wants to use flow filters over cvq, driver must explicitly enable flow
    >> filters on cvq via a command, when it is enabled on the cvq driver cannot use
    >> flow filter queues.
    >> This eliminates any synchronization needed by the device among different types
    >> of queues.
    >> "
    >>
    > Ack.
    >
    >> Well the "likely create these queues when flow filters are enabled"
    >> described here is confusing.
    >> Because if ethtool RFF is used, we need to create a flow vq in the probe stage,
    >> right?
    >>
    > Current spec wording limits one to create queues before DRIVER_OK.
    > But with introduction of _RESET bit one can create an empty queue and disable it (reset it! What a grand name).
    >
    > And re-enable it during ethtool callbacks.
    > This would be workaround to dynamically create the queue.

    Yes, this is a workaround: we can just set the number of flow vqs for the
    device, but not allocate resources nor enable them.
    But this is not exhaustive, because XDP may also require dynamic queue
    creation/destruction.

    >
    >> There are several other reasons:
    >> 1. The behavior of dynamically creating flow vq will break the current virtio
    >> spec.
    >>    Please see the "Device Initialization" chapter. ctrlq, as a configuration queue
    >>    similar to flow vq, is also created in the probe phase. So if we support the
    >> "dynamically creating",
    >>    we need to update the spec.
    >>
    >> 2. Flow vq is similar to transmit q, and does not need to fill descriptors in
    >> advance,
    >>    so the consumption of resources is relatively small.
    >>
    > Only the queue descriptors memory is consumed, which is not a lot.
    > But concept of creating resource without consuming is just bad.
    > We learnt the lesson from mlx5 driver that dynamic creation is efficient.
    > Many part of Linux kernel also moving in this direction, all the way upto dynamically individual msix vector.

    Ok. I got it.

    >
    > So we should strive to enable them dynamically and improve the virtio spec.
    >
    > It should an orthogonal feature, sadly how the RING_RESET feature is done. :(

    RING_RESET is performed without changing the number of queues. But what
    you said above is a workaround.

    >
    >> 3. Dynamic creation of virtqueue seems to be a new thread of virtio spec, and it
    >> should also be
    >>    applicable to rxqs and txqs. We can temporarily support creating flow vq in
    >> the probe stage,
    >>    and subsequent dynamic creation can be an extension.
    >>
    >> So, should we create the flow vqs at the initial stage of the driver probe?
    > One option is to follow the above workaround.
    > Second option is to add feature bit to feature bit to indicate dynamic Q creation.

    I'm leaning towards the second option, which makes the work orthogonal
    and also works in the case of XDP.

    Thanks!







  • 24.  RE: [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-16-2023 11:19

    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Wednesday, August 16, 2023 4:41 PM


    > > One option is to follow the above workaround.
    > > Second option is to add feature bit to feature bit to indicate dynamic Q
    > creation.
    >
    > I'm leaning towards the second option, which makes the work orthogonal and
    > also works in the case of XDP.

    Yes. Let's add the bit. I will send a patch as part of this work.





  • 26.  Re: [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-16-2023 11:42
    Hi, Parav.

    There are some minor updates!

    On 2023/8/15 3:45 PM, Parav Pandit wrote:
    > Add virtio net device requirements for receive flow filters.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > changelog:
    > v3->v4:
    > - Addressed comments from Satananda, Heng, David
    > - removed context specific wording, replaced with destination
    > - added group create/delete examples and updated requirements
    > - added optional support to use cvq for flow filter commands
    > - added example of transporting flow filter commands over cvq
    > - made group size to be 16-bit
    > - added concept of 0->n max flow filter entries based on max count
    > - added concept of 0->n max flow group based on max count
    > - split field bitmask to separate command from other filter capabilities
    > - rewrote rx filter processing chain order with respect to existing
    > filter commands and rss
    > - made flow_id flat across all groups
    > v1->v2:
    > - split setup and operations requirements
    > - added design goal
    > - worded requirements more precisely
    > v0->v1:
    > - fixed comments from Heng Li
    > - renamed receive flow steering to receive flow filters
    > - clarified byte offset in match criteria
    > ---
    > net-workstream/features-1.4.md | 151 +++++++++++++++++++++++++++++++++
    > 1 file changed, 151 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index cb72442..78bb3d2 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -9,6 +9,7 @@ together is desired while updating the virtio net interface.
    > 1. Device counters visible to the driver
    > 2. Low latency tx and rx virtqueues for PCI transport
    > 3. Virtqueue notification coalescing re-arming support
    > +4 Virtqueue receive flow filters (RFF)
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -183,3 +184,153 @@ struct vnet_rx_completion {
    > notifications until the driver rearms the notifications of the virtqueue.
    > 2. When the driver rearms the notification of the virtqueue, the device
    > to notify again if notification coalescing conditions are met.
    > +
    > +## 3.4 Virtqueue receive flow filters (RFF)
    > +0. Design goal:
    > + To filter and/or to steer packet based on specific pattern match to a
    > + specific destination to support application/networking stack driven receive
    > + processing.
    > +1. Two use cases are: to support Linux netdev set_rxnfc() for ETHTOOL_SRXCLSRLINS
    > + and to support netdev feature NETIF_F_NTUPLE aka ARFS.
    > +
    > +### 3.4.1 control path
    > +1. The number of flow filter operations/sec can range from 100k/sec to 1M/sec
    > + or even more. Hence flow filter operations must be done over a queueing
    > + interface using one or more queues.
    > +2. The device should be able to expose one or more supported flow filter queue
    > + count and its start vq index to the driver.
    > +3. As each device may be operating for different performance characteristic,
    > + start vq index and count may be different for each device. Secondly, it is
    > + inefficient for device to provide flow filters capabilities via a config space
    > + region. Hence, the device should be able to share these attributes using
    > + dma interface, instead of transport registers.
    > +4. Since flow filters are enabled much later in the driver life cycle, driver
    > + will likely create these queues when flow filters are enabled.
    > +5. Flow filter operations are often accelerated by device in a hardware. Ability
    > + to handle them on a queue other than control vq is desired. This achieves near
    > + zero modifications to existing implementations to add new operations on new
    > + purpose built queues (similar to transmit and receive queue).
    > + Therefore, when flow filter queues are supported, it is strongly recommended
    > + to use it, when flow filter queues are not supported, if the device support
    > + it using cvq, driver should be able to use over cvq.
    > +6. The filter masks are optional; the device should be able to expose if it
    > + support filter masks.
    > +7. The driver may want to have priority among group of flow entries; to facilitate
    > + the device support grouping flow filter entries by a notion of a flow group.
    > + Each flow group defines priority in processing flow.
    > +8. The driver and group owner driver should be able to query supported device
    > + limits for the receive flow filters.
    > +
    > +### 3.4.2 flow operations path
    > +1. The driver should be able to define a receive packet match criteria, an
    > + action and a destination for a packet. For example, an ipv4 packet with a
    > + multicast address to be steered to the receive vq 0. The second example is
    > + ipv4, tcp packet matching a specified IP address and tcp port tuple to
    > + be steered to receive vq 10.
    > +2. The match criteria should include exact tuple fields well-defined such as mac
    > + address, IP addresses, tcp/udp ports, etc.
    > +3. The match criteria should also optionally include the field mask.
    > +5. Action includes (a) dropping or (b) forwarding the packet.
    > +6. Destination is a receive virtqueue index.
    > +7. Receive packet processing chain is:
    > + a. filters programmed using cvq commands VIRTIO_NET_CTRL_RX,
    > + VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_VLAN.
    > + b. filters programmed using RFF functionality.
    > + c. filters programmed using RSS VIRTIO_NET_CTRL_MQ_RSS_CONFIG command.
    > + Whichever filtering and steering functionality is enabled, they are applied
    > + in the above order.
    > +9. If multiple entries are programmed which has overlapping filtering attributes
    > + for a received packet, the driver to define the location/priority of the entry.
    > +10. The filter entries are usually short in size of few tens of bytes,
    > + for example IPv6 + TCP tuple would be 36 bytes, and ops/sec rate is
    > + high, hence supplying fields inside the queue descriptor is preferred for
    > + up to a certain fixed size, say 96 bytes.
    > +11. A flow filter entry consists of (a) match criteria, (b) action,
    > + (c) destination and (d) a unique 32 bit flow id, all supplied by the
    > + driver.
    > +12. The driver should be able to query and delete flow filter entry by the
    > + the device by the flow id.
    > +
    > +### 3.4.3 interface example
    > +
    > +1. Flow filter capabilities to query using a DMA interface such as cvq
    > +using two different commands.
    > +
    > +```
    > +/* command 1 */
    > +struct flow_filter_capabilities {
    > + le16 start_vq_index;
    > + le16 num_flow_filter_vqs;
    > + le16 max_flow_groups;
    > + le16 max_group_priorities; /* max priorities of the group */
    > + le32 max_flow_filters_per_group;
    > + le32 max_flow_filters; /* max flow_id in add/del
    > + * is equal = max_flow_filters - 1.
    > + */
    > + u8 max_priorities_per_group;

    + u8 padding[3];

    > +};
    > +
    > +/* command 2 */
    > +struct flow_filter_fields_support_mask {
    > + le64 supported_packet_field_mask_bmap[1];
    > +};
    > +
    > +```
    > +
    > +2. Group add/delete cvq commands:
    > +```
    > +
    > +struct virtio_net_rff_group_add {
    > + le16 priority;

    Please explicitly explain the relationship between the number and the
    priority, for example, the smaller the number, the higher the priority :)

    > + le16 group_id;
    > +};
    > +
    > +
    > +struct virtio_net_rff_group_delete {
    > + le16 group_id;
    > +
    > +```
    > +
    > +3. Flow filter entry add/modify, delete over flow vq:
    > +
    > +```
    > +struct virtio_net_rff_add_modify {
    > + u8 flow_op;
    > + u8 padding;

    s/padding/priority

    Each rule needs a priority.

    > + u16 group_id;
    > + le32 flow_id;
    > + struct match_criteria mc;
    > + struct destination dest;
    > + struct action action;
    > +
    > + struct match_criteria mask; /* optional */
    > +};
    > +
    > +struct virtio_net_rff_delete {
    > + u8 flow_op;
    > + u8 padding[3];
    > + le32 flow_id;
    > +};
    > +
    > +```
    > +
    > +4. Flow filter commands over cvq:
    > +
    > +```
    > +
    > +struct virtio_net_rff_cmd {
    > + u8 class; /* RFF class */
    > + u8 commands; /* RFF cmd = A */
    > + u8 command-specific-data[]; /* contains struct virtio_net_rff_add_modify or
    > + * struct virtio_net_rff_delete

    For the flow vq, we no longer distinguish operations by command, but by flow_op.
    But for ctrlq, this field will be carried. We should make it clear that
    when the command is delivered over ctrlq based on cmd, the flow_op field is ignored.

    Thanks!

    > + */
    > +};
    > +
    > +```
    > +
    > +### 3.4.4 For incremental future
    > +a. Driver should be able to specify a specific packet byte offset, number
    > + of bytes and mask as match criteria.
    > +b. Support RSS context, in addition to a specific RQ.
    > +c. If/when virtio switch object is implemented, support ingress/egress flow
    > + filters at the switch port level.






  • 28.  RE: [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-17-2023 04:52


    > From: Heng Qi <hengqi@linux.alibaba.com>
    > Sent: Wednesday, August 16, 2023 5:12 PM
    > > +/* command 1 */
    > > +struct flow_filter_capabilities {
    > > + le16 start_vq_index;
    > > + le16 num_flow_filter_vqs;
    > > + le16 max_flow_groups;
    > > + le16 max_group_priorities; /* max priorities of the group */
    > > + le32 max_flow_filters_per_group;
    > > + le32 max_flow_filters; /* max flow_id in add/del
    > > + * is equal = max_flow_filters - 1.
    > > + */
    > > + u8 max_priorities_per_group;
    >
    > + u8 padding[3];
    >
    Ack.

    > > +struct virtio_net_rff_group_add {
    > > + le16 priority;
    >
    > Please explicitly explain the relationship between the number and the priority,
    > for example, the smaller the number, the higher the priority :)
    >
    Right. Will do.
    I was thinking of the higher the value, the higher the priority, so that one doesn't need to invert it in their head every time they see the priority field. :)

    > > + le16 group_id;
    > > +};
    > > +
    > > +
    > > +struct virtio_net_rff_group_delete {
    > > + le16 group_id;
    > > +
    > > +```
    > > +
    > > +3. Flow filter entry add/modify, delete over flow vq:
    > > +
    > > +```
    > > +struct virtio_net_rff_add_modify {
    > > + u8 flow_op;
    > > + u8 padding;
    >
    > s/padding/priority
    >
    Ack.

    > Each rule needs a priority.
    >
    > > + u16 group_id;
    > > + le32 flow_id;
    > > + struct match_criteria mc;
    > > + struct destination dest;
    > > + struct action action;
    > > +
    > > + struct match_criteria mask; /* optional */
    > > +};
    > > +
    > > +struct virtio_net_rff_delete {
    > > + u8 flow_op;
    > > + u8 padding[3];
    > > + le32 flow_id;
    > > +};
    > > +
    > > +```
    > > +
    > > +4. Flow filter commands over cvq:
    > > +
    > > +```
    > > +
    > > +struct virtio_net_rff_cmd {
    > > + u8 class; /* RFF class */
    > > + u8 commands; /* RFF cmd = A */
    > > + u8 command-specific-data[]; /* contains struct
    > virtio_net_rff_add_modify or
    > > + * struct virtio_net_rff_delete
    >
    > For flow vq, we no longer distinguish operations by command, but by flow_op.
    > But for ctrlq, this field will be carried. We should make it clear that when ctrlq is
    > delivered based on cmd, the flow_op field is ignored.
    >
    Since cvq is only the communication medium for delivering the command, it is better to use flow_op as is, and the cvq commands field to be ignored.
    This way, the software layers are better organized, whether using cvq or flow vq.
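    For illustration, a minimal hypothetical sketch of that split: only the first byte of the payload (flow_op) selects the operation, regardless of whether it arrived on cvq or a flow vq. The opcode values are made up.

```
/* Hypothetical sketch: dispatch purely on flow_op; when the payload is
 * wrapped in struct virtio_net_rff_cmd on cvq, the cvq 'commands' byte is
 * not consulted. Opcode values are illustrative. */
enum rff_flow_op { RFF_OP_ADD = 1, RFF_OP_MODIFY = 2, RFF_OP_DELETE = 3 };

static int rff_dispatch(const unsigned char *payload)
{
    switch (payload[0]) {          /* first byte is flow_op */
    case RFF_OP_ADD:
    case RFF_OP_MODIFY:
        return 0;                  /* would parse struct virtio_net_rff_add_modify */
    case RFF_OP_DELETE:
        return 0;                  /* would parse struct virtio_net_rff_delete */
    default:
        return -1;                 /* unknown flow_op */
    }
}
```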






  • 30.  Re: [PATCH requirements v4 5/7] net-features: Add n-tuple receive flow filters requirements

    Posted 08-17-2023 05:15


    On 2023/8/17 12:52 PM, Parav Pandit wrote:
    >
    >> From: Heng Qi <hengqi@linux.alibaba.com>
    >> Sent: Wednesday, August 16, 2023 5:12 PM
    >>> +/* command 1 */
    >>> +struct flow_filter_capabilities {
    >>> + le16 start_vq_index;
    >>> + le16 num_flow_filter_vqs;
    >>> + le16 max_flow_groups;
    >>> + le16 max_group_priorities; /* max priorities of the group */
    >>> + le32 max_flow_filters_per_group;
    >>> + le32 max_flow_filters; /* max flow_id in add/del
    >>> + * is equal = max_flow_filters - 1.
    >>> + */
    >>> + u8 max_priorities_per_group;
    >> + u8 padding[3];
    >>
    > Ack.
    >
    >>> +struct virtio_net_rff_group_add {
    >>> + le16 priority;
    >> Please explicitly explain the relationship between the number and the priority,
    >> for example, the smaller the number, the higher the priority :)
    >>
    > Right. Will do.
    > I was thinking of higher the value higher the priority, so that one doesnt need to invert this in brain every time seeing the priority field. :)
    >

    It's ok :)

    >>> + le16 group_id;
    >>> +};
    >>> +
    >>> +
    >>> +struct virtio_net_rff_group_delete {
    >>> + le16 group_id;
    >>> +
    >>> +```
    >>> +
    >>> +3. Flow filter entry add/modify, delete over flow vq:
    >>> +
    >>> +```
    >>> +struct virtio_net_rff_add_modify {
    >>> + u8 flow_op;
    >>> + u8 padding;
    >> s/padding/priority
    >>
    > Ack.
    >
    >> Each rule needs a priority.
    >>
    >>> + u16 group_id;
    >>> + le32 flow_id;
    >>> + struct match_criteria mc;
    >>> + struct destination dest;
    >>> + struct action action;
    >>> +
    >>> + struct match_criteria mask; /* optional */
    >>> +};
    >>> +
    >>> +struct virtio_net_rff_delete {
    >>> + u8 flow_op;
    >>> + u8 padding[3];
    >>> + le32 flow_id;
    >>> +};
    >>> +
    >>> +```
    >>> +
    >>> +4. Flow filter commands over cvq:
    >>> +
    >>> +```
    >>> +
    >>> +struct virtio_net_rff_cmd {
    >>> + u8 class; /* RFF class */
    >>> + u8 commands; /* RFF cmd = A */
    >>> + u8 command-specific-data[]; /* contains struct
    >> virtio_net_rff_add_modify or
    >>> + * struct virtio_net_rff_delete
    >> For flow vq, we no longer distinguish operations by command, but by flow_op.
    >> But for ctrlq, this field will be carried. We should make it clear that when ctrlq is
    >> delivered based on cmd, the flow_op field is ignored.
    >>
    > Since cvq is only the communication medium for delivering of command, it is better use the flow_op as is, and cvq commands field to be ignored.
    > This way, software layers are more organized cvq or flow vq.

    Agree.

    Thanks!

    >






  • 32.  [PATCH requirements v4 3/7] net-features: Add low latency receive queue requirements

    Posted 08-15-2023 07:47
    Add requirements for the low latency receive queue.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    changelog:
    v0->v1:
    - clarified the requirements further
    - added line for the gro case
    - added design goals as the motivation for the requirements
    ---
    net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
    1 file changed, 44 insertions(+), 1 deletion(-)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 40fa07f..72d04bd 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.

     # 2. Summary
     1. Device counters visible to the driver
    -2. Low latency tx virtqueue for PCI transport
    +2. Low latency tx and rx virtqueues for PCI transport

     # 3. Requirements
     ## 3.1 Device counters
    @@ -129,3 +129,46 @@ struct vnet_data_desc desc[2];

     9. A flow filter virtqueue also similarly need the ability to inline the short flow
        command header.
    +
    +### 3.2.2 Low latency rx virtqueue
    +0. Design goal:
    +   a. Keep packet metadata and buffer data together which is consumed by driver
    +      layer and make it available in a single cache line of cpu
    +   b. Instead of having per packet descriptors which is complex to scale for
    +      the device, supply the page directly to the device to consume it based
    +      on packet size
    +1. The device should be able to write a packet receive completion that consists
    +   of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
    +   PCIe TLP.
    +2. The device should be able to perform DMA writes of multiple packets
    +   completions in a single DMA transaction up to the PCIe maximum write limit
    +   in a transaction.
    +3. The device should be able to zero pad packet write completion to align it to
    +   64B or CPU cache line size whenever possible.
    +4. An example of the above DMA completion structure:
    +
    +```
    +/* Constant size receive packet completion */
    +struct vnet_rx_completion {
    +   u16 flags;
    +   u16 id; /* buffer id */
    +   u8 gso_type;
    +   u8 reserved[3];
    +   le16 gso_hdr_len;
    +   le16 gso_size;
    +   le16 csum_start;
    +   le16 csum_offset;
    +   u16 reserved2;
    +   u64 timestamp; /* explained later */
    +   u8 padding[];
    +};
    +```
    +5. The driver should be able to post constant-size buffer pages on a receive
    +   queue which can be consumed by the device for an incoming packet of any size
    +   from 64B to 9K bytes.
    +6. The device should be able to know the constant buffer size at receive
    +   virtqueue level instead of per buffer level.
    +7. The device should be able to indicate when a full page buffer is consumed,
    +   which can be recycled by the driver when the packets from the completed
    +   page is fully consumed.
    +8. The device should be able to consume multiple pages for a receive GSO stream.
    --
    2.26.2
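    To illustrate requirements 2 and 3 above, here is a small hypothetical C sketch of a driver walking such constant-size, cache-line-aligned completions. The 64-byte stride and the explicit padding size are assumptions for illustration only, not part of the proposal.

```
/* Hypothetical sketch: iterate fixed-size rx completions written
 * back-to-back by the device. The 64B stride is an assumption. */
#include <stdint.h>

struct vnet_rx_completion_64 {
    uint16_t flags;
    uint16_t id;            /* buffer id */
    uint8_t  gso_type;
    uint8_t  reserved[3];
    uint16_t gso_hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t reserved2;
    uint64_t timestamp;
    uint8_t  padding[32];   /* brings sizeof to 64B with natural alignment */
};

/* Process 'count' completions written back-to-back by the device. */
static void process_completions(const struct vnet_rx_completion_64 *ring,
                                unsigned int count)
{
    for (unsigned int i = 0; i < count; i++) {
        const struct vnet_rx_completion_64 *c = &ring[i];
        /* c->id names the constant-size buffer page holding the packet;
         * the page is recycled once the device indicates it is fully
         * consumed (requirement 7). */
        (void)c;
    }
}
```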


  • 33.  Re: [PATCH requirements v4 3/7] net-features: Add low latency receive queue requirements

    Posted 08-15-2023 08:50

    On Tuesday, 2023-08-15 at 10:45:56 +03, Parav Pandit wrote:
    > Add requirements for the low latency receive queue.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>
    > ---
    > changelog:
    > v0->v1:
    > - clarified the requirements further
    > - added line for the gro case
    > - added design goals as the motivation for the requirements
    > ---
    > net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
    > 1 file changed, 44 insertions(+), 1 deletion(-)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 40fa07f..72d04bd 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.
    >
    > # 2. Summary
    > 1. Device counters visible to the driver
    > -2. Low latency tx virtqueue for PCI transport
    > +2. Low latency tx and rx virtqueues for PCI transport
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -129,3 +129,46 @@ struct vnet_data_desc desc[2];
    >
    > 9. A flow filter virtqueue also similarly need the ability to inline the short flow
    > command header.
    > +
    > +### 3.2.2 Low latency rx virtqueue
    > +0. Design goal:
    > + a. Keep packet metadata and buffer data together which is consumed by driver
    > + layer and make it available in a single cache line of cpu
    > + b. Instead of having per packet descriptors which is complex to scale for

    This really is "per buffer" rather than "per packet".

    > + the device, supply the page directly to the device to consume it based
    > + on packet size
    > +1. The device should be able to write a packet receive completion that consists
    > + of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
    > + PCIe TLP.
    > +2. The device should be able to perform DMA writes of multiple packets
    > + completions in a single DMA transaction up to the PCIe maximum write limit
    > + in a transaction.
    > +3. The device should be able to zero pad packet write completion to align it to
    > + 64B or CPU cache line size whenever possible.
    > +4. An example of the above DMA completion structure:
    > +
    > +```
    > +/* Constant size receive packet completion */
    > +struct vnet_rx_completion {
    > + u16 flags;
    > + u16 id; /* buffer id */
    > + u8 gso_type;
    > + u8 reserved[3];
    > + le16 gso_hdr_len;
    > + le16 gso_size;
    > + le16 csum_start;
    > + le16 csum_offset;
    > + u16 reserved2;
    > + u64 timestamp; /* explained later */
    > + u8 padding[];
    > +};
    > +```
    > +5. The driver should be able to post constant-size buffer pages on a receive
    > + queue which can be consumed by the device for an incoming packet of any size
    > + from 64B to 9K bytes.
    > +6. The device should be able to know the constant buffer size at receive
    > + virtqueue level instead of per buffer level.
    > +7. The device should be able to indicate when a full page buffer is consumed,
    > + which can be recycled by the driver when the packets from the completed
    > + page is fully consumed.
    > +8. The device should be able to consume multiple pages for a receive GSO stream.
    --
    Swimming around in a plastic bag.



  • 35.  [PATCH requirements v4 6/7] net-features: Add packet timestamp requirements

    Posted 08-15-2023 07:47
    Add tx and rx packet timestamp requirements.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Acked-by: David Edmondson <david.edmondson@oracle.com>
    ---
    changelog:
    v3->v4:
    - no change
    ---
    net-workstream/features-1.4.md | 26 ++++++++++++++++++++++++++
    1 file changed, 26 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 78bb3d2..82b907a 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -10,6 +10,7 @@ together is desired while updating the virtio net interface.
    2. Low latency tx and rx virtqueues for PCI transport
    3. Virtqueue notification coalescing re-arming support
    4 Virtqueue receive flow filters (RFF)
    +5. Device timestamp for tx and rx packets

    # 3. Requirements
    ## 3.1 Device counters
    @@ -334,3 +335,28 @@ a. Driver should be able to specify a specific packet byte offset, number
    b. Support RSS context, in addition to a specific RQ.
    c. If/when virtio switch object is implemented, support ingress/egress flow
       filters at the switch port level.
    +
    +## 3.5 Packet timestamp
    +1. Device should provide transmit timestamp and receive timestamp of the packets
    + at per packet level when the device is enabled.
    +2. Device should provide the current free running clock in the least latency
    + possible using an MMIO register read of 64-bit to have the least jitter.
    +3. Device should provide the current frequency and the frequency unit for the
    + software to synchronize the reference point of software and the device using
    + a control vq command.
    +
    +### 3.5.1 Transmit timestamp
    +1. Transmit completion must contain a packet transmission timestamp when the
    + device is enabled for it.
    +2. The device should record the packet transmit timestamp in the completion at
    + the farthest egress point towards the network.
    +3. The device must provide a transmit packet timestamp in a single DMA
    + transaction along with the rest of the transmit completion fields.
    +
    +### 3.5.2 Receive timestamp
    +1. Receive completion must contain a packet reception timestamp when the device
    + is enabled for it.
    +2. The device should record the received packet timestamp at the closet ingress
    + point of reception from the network.
    +3. The device should provide a receive packet timestamp in a single DMA
    + transaction along with the rest of the receive completion fields.
    --
    2.26.2
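    To show how a driver could use the free-running clock and the frequency
    reported over the control vq, here is a minimal C sketch of converting a raw
    device timestamp to nanoseconds and applying a device-to-host offset. The
    function and struct names are hypothetical, and the assumption that the
    frequency is expressed in Hz is editorial, not taken from the patch.

```c
/* Hypothetical sketch: converts a raw 64-bit device timestamp into
 * nanoseconds, assuming the device reports its clock frequency in Hz via the
 * cvq command described above. */
#include <stdint.h>

static inline uint64_t vnet_ts_to_ns(uint64_t raw_ts, uint64_t freq_hz)
{
    uint64_t secs = raw_ts / freq_hz;
    uint64_t rem  = raw_ts % freq_hz;

    /* Split the division so the intermediate stays within 64 bits for
     * clock rates up to roughly 18 GHz. */
    return secs * 1000000000ull + (rem * 1000000000ull) / freq_hz;
}

/* Assumed bookkeeping for synchronizing the device clock with the host
 * clock, built from the 64-bit MMIO clock read (requirement 2) and the
 * frequency reported over the cvq (requirement 3). */
struct vnet_clock_sync {
    uint64_t freq_hz;          /* device clock frequency */
    int64_t  dev_to_host_ns;   /* host_ns = dev_ns + dev_to_host_ns */
};

static inline uint64_t vnet_dev_ts_to_host_ns(const struct vnet_clock_sync *s,
                                              uint64_t raw_ts)
{
    return vnet_ts_to_ns(raw_ts, s->freq_hz) + (uint64_t)s->dev_to_host_ns;
}
```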


  • 36.  [PATCH requirements v4 7/7] net-features: Add header data split requirements

    Posted 08-15-2023 07:47
    Add header data split requirements for the receive packets.

    Signed-off-by: Parav Pandit <parav@nvidia.com>
    ---
    net-workstream/features-1.4.md | 13 +++++++++++++
    1 file changed, 13 insertions(+)

    diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    index 82b907a..5e359b6 100644
    --- a/net-workstream/features-1.4.md
    +++ b/net-workstream/features-1.4.md
    @@ -11,6 +11,7 @@ together is desired while updating the virtio net interface.
    3. Virtqueue notification coalescing re-arming support
    4 Virtqueue receive flow filters (RFF)
    5. Device timestamp for tx and rx packets
    +6. Header data split for the receive virtqueue

    # 3. Requirements
    ## 3.1 Device counters
    @@ -360,3 +361,15 @@ c. If/when virtio switch object is implemented, support ingress/egress flow
    point of reception from the network.
    3. The device should provide a receive packet timestamp in a single DMA
    transaction along with the rest of the receive completion fields.
    +
    +## 3.6 Header data split for the receive virtqueue
    +1. The device should be able to DMA the packet header and data to two different
    + memory locations, this enables driver and networking stack to perform zero
    + copy to application buffer(s).
    +2. The driver should be able to configure maximum header buffer size per
    + virtqueue.
    +3. The header buffer to be in a physically contiguous memory per virtqueue
    +4. The device should be able to indicate header data split in the receive
    + completion.
    +5. The device should be able to zero pad the header buffer when the received
    + header is shorter than cpu cache line size.
    --
    2.26.2
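    As a rough illustration of requirements 2 through 4 above, the sketch below
    shows one way a driver might track a physically contiguous per-virtqueue
    header area and locate the header buffer for a given completion. All names,
    the indexing scheme, and the completion flag are assumptions, not spec.

```c
/* Hypothetical per-rxq header data split bookkeeping; field and flag names
 * are illustrative only. */
#include <stddef.h>
#include <stdint.h>

struct vnet_rxq_hds {
    void     *hdr_area;       /* physically contiguous header area (req. 3) */
    uint32_t  hdr_buf_size;   /* max header buffer size per vq (req. 2) */
    uint32_t  num_entries;    /* number of header buffers in the area */
};

/* Header buffer for completion/descriptor index i, assuming a simple
 * fixed-stride layout inside the contiguous area. */
static inline void *vnet_hds_hdr(const struct vnet_rxq_hds *q, uint32_t i)
{
    return (uint8_t *)q->hdr_area + (size_t)i * q->hdr_buf_size;
}

/* Assumed receive completion flag indicating the split happened (req. 4). */
#define VNET_RX_F_HDR_SPLIT (1u << 0)
```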


  • 37.  Re: [PATCH requirements v4 7/7] net-features: Add header data split requirements

    Posted 08-15-2023 08:53

    On Tuesday, 2023-08-15 at 10:46:00 +03, Parav Pandit wrote:
    > Add header data split requirements for the receive packets.
    >
    > Signed-off-by: Parav Pandit <parav@nvidia.com>

    Acked-by: David Edmondson <david.edmondson@oracle.com>

    > ---
    > net-workstream/features-1.4.md | 13 +++++++++++++
    > 1 file changed, 13 insertions(+)
    >
    > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
    > index 82b907a..5e359b6 100644
    > --- a/net-workstream/features-1.4.md
    > +++ b/net-workstream/features-1.4.md
    > @@ -11,6 +11,7 @@ together is desired while updating the virtio net interface.
    > 3. Virtqueue notification coalescing re-arming support
    > 4 Virtqueue receive flow filters (RFF)
    > 5. Device timestamp for tx and rx packets
    > +6. Header data split for the receive virtqueue
    >
    > # 3. Requirements
    > ## 3.1 Device counters
    > @@ -360,3 +361,15 @@ c. If/when virtio switch object is implemented, support ingress/egress flow
    > point of reception from the network.
    > 3. The device should provide a receive packet timestamp in a single DMA
    > transaction along with the rest of the receive completion fields.
    > +
    > +## 3.6 Header data split for the receive virtqueue
    > +1. The device should be able to DMA the packet header and data to two different
    > + memory locations, this enables driver and networking stack to perform zero
    > + copy to application buffer(s).
    > +2. The driver should be able to configure maximum header buffer size per
    > + virtqueue.
    > +3. The header buffer to be in a physically contiguous memory per virtqueue
    > +4. The device should be able to indicate header data split in the receive
    > + completion.
    > +5. The device should be able to zero pad the header buffer when the received
    > + header is shorter than cpu cache line size.
    --
    I used to worry, thought I was goin' mad in a hurry.


