On 4.20 and newer kernels, VethPeerIndex() causes stack corruption because
the kernel copies more data to Go user space than originally expected.
This is due to a recent kernel commit that extends the veth driver's
ethtool stats for XDP:
https://git.kernel.org/torvalds/c/d397b9682c1c808344dd93b43de8750fa4d9f581
VethPeerIndex() wrongly assumes that ethtool stats are never extended in
the driver. Unfortunately, the only way around this in Go is to add
serialize/deserialize helpers for a dynamically sized ethtoolStats whose
uint64 data array has the size returned by the preceding
ETHTOOL_GSSET_INFO query. This ensures we don't run into a buffer overflow
triggered by the kernel's copy_to_user() in the ETHTOOL_GSTATS query
(ethtool_get_stats() in the kernel). For the deserialize operation, we
only care about the peer's ifindex, which is always stored in the first
uint64.
Fixes: 54ad9e3a4c ("Two new functions: LinkSetBondSlave and VethPeerIndex")
Reported-by: Jean Raby <jean@raby.sh>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: phob0s <git@phob0s.pl>
It's not needed for retrieving the veth peer ifindex, and we already
get the set count via the earlier ETHTOOL_GSSET_INFO call. Both copy the
result of veth_get_sset_count() up to user space in the veth case (which
is the only user of this anyway).
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This PR refers to the PR by @lebauce and adds some changes.
- Added some tests to retrieve bond slave information.
- Link.BondSlave is changed to LinkSlave interface.
- BondSlaveState.String() returns UPPER case. (same as iproute2)
- BondSlaveMiiStatus.String() returns UPPER case. (same as iproute2)
- Add a new Link type, IPoIB, that exposes the following IPoIB attributes:
* IFLA_IPOIB_PKEY
* IFLA_IPOIB_MODE
* IFLA_IPOIB_UMCAST
- Support deserializing IPoIB link attributes in LinkDeserialize()
- Support IPoIB attributes in LinkAdd()
Today the netlink package supports Get/Set of a VF's max TX rate
via the IFLA_VF_TX_RATE netlink attribute.
This patch adds support for Get/Set of a VF's min and max TX rate
via the IFLA_VF_RATE netlink attribute.
- Add support to set min/max tx rate for VF via IFLA_VF_RATE
- Added IFLA_VF_RATE min/max tx rate attributes to netlink.VfInfo
including parsing support in netlink.parseVfInfo()
NOTE: According to [1], IFLA_VF_RATE takes precedence over
IFLA_VF_TX_RATE. Dealing with the coexistence of these
netlink attributes is left for the user to handle.
[1] https://lists.openwall.net/netdev/2014/05/22/42
iproute2's own netlink library asserts that the sockaddr sender pid
has to be that of the kernel [0]. It also doesn't bail out on a pid
mismatch but only skips the message instead. We've seen cases where
the latter had a pid 0; in such a case we should skip to the next nl
message instead of bailing out hard.
[0] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/lib/libnetlink.c
rtnl_dump_filter_l(), __rtnl_talk_iov()
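A minimal sketch of the skip-instead-of-fail behavior described above. The
message type and the filterByPid helper are hypothetical stand-ins and do
not reflect the actual receive loop.

```go
package main

import "fmt"

// message is a hypothetical, simplified netlink message that carries only
// a pid and a payload.
type message struct {
	pid     uint32
	payload string
}

// filterByPid returns the payloads of the messages whose pid matches want.
// A message with a different pid is skipped rather than treated as a
// fatal error, so later valid messages are still processed.
func filterByPid(msgs []message, want uint32) []string {
	var out []string
	for _, m := range msgs {
		if m.pid != want {
			continue // skip to the next message, do not bail out
		}
		out = append(out, m.payload)
	}
	return out
}

func main() {
	msgs := []message{{42, "a"}, {0, "stray"}, {42, "b"}}
	fmt.Println(filterByPid(msgs, 42))
}
```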
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This deserializes the TX and RX queue counts on link
deserialization. We already supported them on serialization.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
For tuntap interfaces, return a TunTap interface instead of
a generic link when retrieving the interface.
Use netlink extended attributes to populate the Link attributes
for the tuntap link.
For older tun drivers that do not provide these attributes,
fall back to sysfs to retrieve them.
This commit also adds Owner and Group attributes for the TunTap
Link.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
XFRM interfaces are available in Linux kernel 4.19+.
When an IF_ID is applied to an XFRM policy and state, the corresponding
traffic will be sent through the virtual interface with the same IF_ID.
Currently each call to Receive() allocates a 64K buffer on the heap
for the data to receive from a netlink socket. This is rather costly
considering that in most cases only a fraction of this memory is actually
needed.
A quick fix is to make sure that the large buffer does not "escape",
i.e. that it is sufficient to have it allocated on the stack.
Then only the prefix of the buffer that was actually used
is copied to the heap.
Fix for issue: #379
Signed-off-by: Milan Lenco <milan.lenco@pantheon.tech>
chg: addtl comment and made minor logic optimization as discussed in PR #296
chg: flipped Persist to NonPersist
chg: comments, only unpersist tuntap if flag is set
chg: tuntap persist optional, allow empty intfc name
chg: added conditional build
Signed-off-by: Ralph Schmieder <ralph.schmieder@gmail.com>
Add support for setting the InfiniBand Node and Port GUID address
configuration of a VF when InfiniBand HCAs are used in SR-IOV mode.
Signed-off-by: Parav Pandit <parav@mellanox.com>
The IFLA_* constants in x/sys/unix were updated to Linux 4.15 in
golang/sys@88d2dcc510, so use these instead of locally duplicating
them.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Allow the caller to specify the desired link index at link creation.
This is equivalent to
ip link add link eth0 name testmacvtap index 1000 type macvtap
ip link add dummy1 index 1001 type dummy
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
For link, address, and route, add a `WithOptions` variant to the
`*Subscribe()` function to specify a namespace and an error
callback. Those options can be extended in the future without adding
more functions. For example, it could be possible to subscribe only
for a given family by adding a `Family` member to the appropriate
struct.
As a minor change, the private function is always suffixed by `At`,
since it was the case for route and raw netlink functions (but not for
address and link).
When a fatal error happens in a `*Subscribe*()` function, the error
was not available to the user. We add a callback function that will be
invoked when such an error happens.
This also modifies the behavior of the `AddrSubscribe*()` functions to turn
parse errors into fatal errors, as happens with the other functions.
On newer Linux kernels (4.12), netlink rejects a request to set an XDP
program with flags set to 0. Instead, the flags attribute must be omitted
when the flags are 0.
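A minimal sketch of omitting the flags attribute when it is zero. The attr
type and the xdpAttrs helper are hypothetical; the two constants follow the
kernel's IFLA_XDP_* enum in if_link.h.

```go
package main

import "fmt"

// Attribute types from the kernel's IFLA_XDP_* enum.
const (
	iflaXdpFd    = 1 // IFLA_XDP_FD
	iflaXdpFlags = 3 // IFLA_XDP_FLAGS
)

// attr is a hypothetical stand-in for a nested netlink attribute.
type attr struct {
	typ   uint16
	value uint32
}

// xdpAttrs builds the nested attributes for an XDP request. The flags
// attribute is appended only when flags is non-zero.
func xdpAttrs(fd int32, flags uint32) []attr {
	attrs := []attr{{typ: iflaXdpFd, value: uint32(fd)}}
	if flags != 0 {
		attrs = append(attrs, attr{typ: iflaXdpFlags, value: flags})
	}
	return attrs
}

func main() {
	fmt.Println(len(xdpAttrs(3, 0)), len(xdpAttrs(3, 1)))
}
```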
Add support for creating and managing GRE tunnels.
This is equivalent to
Point to Point:
ip tunnel add tun4 mode gre local 192.0.2.1 remote 203.0.113.6 key 123
Point to Multipoint:
ip tunnel add tun8 mode gre local 192.0.2.1 key 1234
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
Corrected the function signature to use the correct name, LinkSetVfTrust,
instead of LinkSetTrust.
This aligns with the code comment and the other VF functions.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Add support for setting the trust state of a VF. This allows restricting
certain operations on a VF when it is untrusted, such as disabling
promiscuous mode.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Add bond parameters corresponding to:
* IFLA_BOND_AD_ACTOR_SYS_PRIO
* IFLA_BOND_AD_USER_PORT_KEY
* IFLA_BOND_AD_ACTOR_SYSTEM
* IFLA_BOND_TLB_DYNAMIC_LB
These are available in new(ish) kernels.
A new error type, LinkNotFoundError, is returned instead
of the default error type to facilitate better error
handling by downstream consumers of this package.
* Multicast snooping and hello time are the only ones supported at the
moment
* Only pass values to kernel when user sets them, otherwise let kernel
decide default
* Can set multicast snooping on existing bridges
* Tests disabled on Travis CI as the kernel version is too old
* All bridge flags copied from Kernel code, but only the two mentioned
above work
(5a7ad1146c/include/uapi/linux/if_link.h (L232-L281))
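The "only pass values the user sets" behavior can be sketched with pointer
fields, where nil means "let the kernel decide". The bridgeConfig type and
the bridgeAttrs helper are hypothetical; only the attribute names are taken
from the kernel's if_link.h.

```go
package main

import "fmt"

// bridgeConfig uses pointers so that "unset" (nil) can be told apart from
// an explicit zero or false value.
type bridgeConfig struct {
	MulticastSnooping *bool
	HelloTime         *uint32
}

// bridgeAttrs returns only the attributes the user explicitly set, keyed
// by kernel attribute name. Unset fields are left out entirely.
func bridgeAttrs(c bridgeConfig) map[string]uint32 {
	attrs := map[string]uint32{}
	if c.MulticastSnooping != nil {
		var v uint32
		if *c.MulticastSnooping {
			v = 1
		}
		attrs["IFLA_BR_MCAST_SNOOPING"] = v
	}
	if c.HelloTime != nil {
		attrs["IFLA_BR_HELLO_TIME"] = *c.HelloTime
	}
	return attrs
}

func main() {
	off := false
	fmt.Println(len(bridgeAttrs(bridgeConfig{})))
	fmt.Println(bridgeAttrs(bridgeConfig{MulticastSnooping: &off}))
}
```

With a plain bool field, an explicit "disable snooping" would be
indistinguishable from "not specified".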
Signed-off-by: Petar Petrov <pppepito86@gmail.com>
Signed-off-by: Ed King <eking@pivotal.io>
Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
Signed-off-by: Will Martin <wmartin@pivotal.io>
Bridge ports can be set to use the proxy arp features by calling
either LinkSetBrProxyArp() or LinkSetBrProxyArpWiFi().
Signed-off-by: David Wilder <wilder@us.ibm.com>
Currently, LinkByName("bondX") doesn't return the bond-specific attributes.
parseBondData needs to update the link that is passed in so that
the bond's Mode, Miimon, etc. are populated correctly.
Retrieve the link type from the Netlink GetLink information.
Aim to return the same value as nl-link-list, for example:
gre0 gre <noarp,up,running,lowerup> slave-of NONE group 0 ipgre : gre0
gretap0 ether <broadcast,multicast> slave-of NONE group 0 ipgre : gretap0
dummy0 ether 36:d5:87:cf:eb:35 <broadcast,noarp> group 0
tun0 none <pointopoint,multicast,noarp> group 0
tap0 ether 4e:ce:43:4a:82:c2 <broadcast,multicast> group 0
Signed-off-by: Nicolas PLANEL <nplanel@redhat.com>
* Add netlink definitions for extra IFLAs
The relevant IFLA_* are defined in the kernel but not in the syscall
package.
* Parameterize the return value of loadSimpleBpf
Allow the return value of the bpf program created by loadSimpleBpf to
be specified by the caller. Before this, the value was hardcoded to 1.
* Add support for a new IFLA that enables using a bpf program as a
filter early in the driver path of some NICs.
* Add a test for set/get of an xdp program. Since the XDP IFLA is
currently optional, check that the hardware supports it before trying to
set the field.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
The mode on macvtap interfaces was not being set correctly.
As a result, the mode on macvtap was always set to the default.
Set the mode correctly and add unit tests to check this.
This fixes issue https://github.com/vishvananda/netlink/issues/136
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
- Package methods only need an empty handle,
not a regular Handle, which would create and
delete a couple of sockets.
Signed-off-by: Alessandro Boch <aboch@docker.com>
- Ties to a netlink socket. All client requests
will re-use same socket. Socket released at
handle deletion.
- A network namespace can also be specified during
handle creation. The socket will be opened in the
specified network namespace.
Signed-off-by: Alessandro Boch <aboch@docker.com>