mirror of
https://github.com/dynup/kpatch
synced 2025-01-05 12:39:26 +00:00
ebc776a4fc
In shadow-newpid.patch, a new structure member is created using the klp_shadow_get_or_alloc() routine. The simplified and explained version of it in the patch author guide has a typo that replaces klp_shadow_get_or_alloc() with klp_shadow_get(). Signed-off-by: Yannick Cote <ycote@redhat.com>
777 lines
29 KiB
Markdown
777 lines
29 KiB
Markdown
kpatch Patch Author Guide
|
|
=========================
|
|
|
|
Because kpatch-build is relatively easy to use, it can be easy to assume that a
|
|
successful patch module build means that the patch is safe to apply. But in
|
|
fact that's a very dangerous assumption.
|
|
|
|
There are many pitfalls that can be encountered when creating a live patch.
|
|
This document attempts to guide the patch creation process. It's a work in
|
|
progress. If you find it useful, please contribute!
|
|
|
|
Patch Analysis
|
|
--------------
|
|
|
|
kpatch provides _some_ guarantees, but it does not guarantee that all patches
|
|
are safe to apply. Every patch must also be analyzed in-depth by a human.
|
|
|
|
The most important point here cannot be stressed enough. Here comes the bold:
|
|
|
|
**Do not blindly apply patches. There is no substitute for human analysis and
|
|
reasoning on a per-patch basis. All patches must be thoroughly analyzed by a
|
|
human kernel expert who completely understands the patch and the affected code
|
|
and how they relate to the live patching environment.**
|
|
|
|
kpatch vs livepatch vs kGraft
|
|
-----------------------------
|
|
|
|
This document assumes that the kpatch-build tool is being used to create
|
|
livepatch kernel modules. Other live patching systems may have different
|
|
consistency models, their own guarantees, and other subtle differences.
|
|
The guidance in this document applies **only** to kpatch-build generated
|
|
livepatches.
|
|
|
|
Patch upgrades
|
|
--------------
|
|
|
|
Due to potential unexpected interactions between patches, it's highly
|
|
recommended that when patching a system which has already been patched, the
|
|
second patch should be a cumulative upgrade which is a superset of the first
|
|
patch.
|
|
|
|
Data structure changes
|
|
----------------------
|
|
|
|
kpatch patches functions, not data. If the original patch involves a change to
|
|
a data structure, the patch will require some rework, as changes to data
|
|
structures are not allowed by default.
|
|
|
|
Usually you have to get creative. There are several possible ways to handle
|
|
this:
|
|
|
|
### Change the code which uses the data structure
|
|
|
|
Sometimes, instead of changing the data structure itself, you can change the
|
|
code which uses it.
|
|
|
|
For example, consider this
|
|
[patch](http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=54a20552e1eae07aa240fa370a0293e006b5faed).
|
|
which has the following hunk:
|
|
|
|
```
|
|
@@ -3270,6 +3277,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
|
|
[SVM_EXIT_EXCP_BASE + PF_VECTOR] = pf_interception,
|
|
[SVM_EXIT_EXCP_BASE + NM_VECTOR] = nm_interception,
|
|
[SVM_EXIT_EXCP_BASE + MC_VECTOR] = mc_interception,
|
|
+ [SVM_EXIT_EXCP_BASE + AC_VECTOR] = ac_interception,
|
|
[SVM_EXIT_INTR] = intr_interception,
|
|
[SVM_EXIT_NMI] = nmi_interception,
|
|
[SVM_EXIT_SMI] = nop_on_interception,
|
|
```
|
|
|
|
`svm_exit_handlers[]` is an array of function pointers. The patch adds a
|
|
`ac_interception` function pointer to the array at index `[SVM_EXIT_EXCP_BASE +
|
|
AC_VECTOR]`. That change is incompatible with kpatch.
|
|
|
|
Looking at the source file, we can see that this function pointer is only
|
|
accessed by a single function, `handle_exit()`:
|
|
|
|
```
|
|
if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
|
|
|| !svm_exit_handlers[exit_code]) {
|
|
WARN_ONCE(1, "svm: unexpected exit reason 0x%x\n", exit_code);
|
|
kvm_queue_exception(vcpu, UD_VECTOR);
|
|
return 1;
|
|
}
|
|
|
|
return svm_exit_handlers[exit_code](svm);
|
|
```
|
|
|
|
So an easy solution here is to just change the code to manually check for the
|
|
new case before looking in the data structure:
|
|
|
|
```
|
|
@@ -3580,6 +3580,9 @@ static int handle_exit(struct kvm_vcpu *vcpu)
|
|
return 1;
|
|
}
|
|
|
|
+ if (exit_code == SVM_EXIT_EXCP_BASE + AC_VECTOR)
|
|
+ return ac_interception(svm);
|
|
+
|
|
return svm_exit_handlers[exit_code](svm);
|
|
}
|
|
```
|
|
|
|
Not only is this an easy solution, it's also safer than touching data since
|
|
`svm_exit_handlers[]` may be in use by tasks that haven't been patched
|
|
yet.
|
|
|
|
### Use a kpatch callback macro
|
|
|
|
Kpatch supports the kernel's livepatch [(Un)patching
|
|
callbacks](https://github.com/torvalds/linux/blob/master/Documentation/livepatch/callbacks.rst).
|
|
The kernel API requires callback registration through `struct klp_callbacks`,
|
|
but to do so through kpatch-build, `kpatch-macros.h` defines the following:
|
|
|
|
* `KPATCH_PRE_PATCH_CALLBACK` - executed before patching
|
|
* `KPATCH_POST_PATCH_CALLBACK` - executed after patching
|
|
* `KPATCH_PRE_UNPATCH_CALLBACK` - executed before unpatching, complements the
|
|
post-patch callback.
|
|
* `KPATCH_POST_UNPATCH_CALLBACK` - executed after unpatching, complements the
|
|
pre-patch callback.
|
|
|
|
A pre-patch callback routine has the following signature:
|
|
|
|
```
|
|
static int callback(patch_object *obj) { }
|
|
KPATCH_PRE_PATCH_CALLBACK(callback);
|
|
```
|
|
|
|
and any non-zero return status indicates failure to the kernel. For more
|
|
information on pre-patch callback failure, see the **Pre-patch return status**
|
|
section below.
|
|
|
|
Post-patch, pre-unpatch, and post-unpatch callback routines all share the
|
|
following signature:
|
|
|
|
```
|
|
static void callback(patch_object *obj) { }
|
|
KPATCH_POST_PATCH_CALLBACK(callback); /* or */
|
|
KPATCH_PRE_UNPATCH_CALLBACK(callback); /* or */
|
|
KPATCH_POST_UNPATCH_CALLBACK(callback);
|
|
```
|
|
|
|
Generally pre-patch callbacks are paired with post-unpatch callbacks, meaning
|
|
that anything the former allocates or sets up should be torn down by the former
|
|
callback. Likewise for post-patch and pre-unpatch callbacks.
|
|
|
|
#### Pre-patch return status
|
|
|
|
If kpatch is currently patching already loaded objects (vmlinux always by
|
|
definition as well as any currently loaded kernel modules), a non-zero pre-patch
|
|
callback status stops the current patch in progress. The kpatch-module
|
|
is rejected, completely reverted, and unloaded.
|
|
|
|
If an already loaded kpatch is patching an incoming kernel module, then
|
|
a failing pre-patch callback will result in the kernel module loader
|
|
rejecting the new module.
|
|
|
|
In both cases, if a pre-patch callback fails, none of its other
|
|
associated callbacks will be executed.
|
|
|
|
#### Callback context
|
|
|
|
* For patches to vmlinux or already loaded kernel modules, callback functions
|
|
will be run around the livepatch transitions in the `klp_enable_patch()`
|
|
callchain. This is executed automatically on kpatch module init.
|
|
|
|
* For patches to kernel modules which haven't been loaded yet, a
|
|
module-notifier will execute callbacks when the module is loaded into
|
|
the `MODULE_STATE_COMING` state. The pre and post-patch callbacks are
|
|
called before any module_init code.
|
|
|
|
Example: a kpatch fix for CVE-2016-5389 could utilize the
|
|
`KPATCH_PRE_PATCH_CALLBACK` and `KPATCH_POST_UNPATCH_CALLBACK` macros to modify
|
|
variable `sysctl_tcp_challenge_ack_limit` in-place:
|
|
|
|
```
|
|
+#include "kpatch-macros.h"
|
|
+
|
|
+static bool kpatch_write = false;
|
|
+static int kpatch_pre_patch_tcp_send_challenge_ack(patch_object *obj)
|
|
+{
|
|
+ if (sysctl_tcp_challenge_ack_limit == 100) {
|
|
+ sysctl_tcp_challenge_ack_limit = 1000;
|
|
+ kpatch_write = true;
|
|
+ }
|
|
+ return 0;
|
|
+}
|
|
static void kpatch_post_unpatch_tcp_send_challenge_ack(patch_object *obj)
|
|
+{
|
|
+ if (kpatch_write && sysctl_tcp_challenge_ack_limit == 1000)
|
|
+ sysctl_tcp_challenge_ack_limit = 100;
|
|
+}
|
|
+KPATCH_PRE_PATCH_CALLBACK(kpatch_pre_patch_tcp_send_challenge_ack);
|
|
+KPATCH_POST_UNPATCH_CALLBACK(kpatch_post_unpatch_tcp_send_challenge_ack);
|
|
```
|
|
|
|
Don't forget to protect access to data as needed. Spinlocks and mutexes /
|
|
sleeping locks **may be used** (this is a change of behavior from when kpatch
|
|
relied on the kpatch.ko support module and `stop_machine()` context.)
|
|
|
|
Be careful when upgrading. If patch A has a pre/post-patch callback which
|
|
writes to X, and then you load patch B which is a superset of A, in some cases
|
|
you may want to prevent patch B from writing to X, if A is already loaded.
|
|
|
|
|
|
### Use a shadow variable
|
|
|
|
If you need to add a field to an existing data structure, or even many existing
|
|
data structures, you can use the kernel's
|
|
[Shadow Variable](https://www.kernel.org/doc/html/latest/livepatch/shadow-vars.html) API.
|
|
|
|
Example: The `shadow-newpid.patch` integration test employs shadow variables
|
|
to add a rolling counter to the new `struct task_struct` instances. A
|
|
simplified version is presented here.
|
|
|
|
A shadow PID variable is allocated in `do_fork()`: it is associated with the
|
|
current `struct task_struct *p` value, given an ID of `KPATCH_SHADOW_NEWPID`,
|
|
sized accordingly, and allocated as per `GFP_KERNEL` flag rules. Note that
|
|
the shadow variable <obj, id> association is global -- hence it is best to
|
|
provide unique ID enumerations per kpatch as needed.
|
|
|
|
`klp_shadow_alloc()` returns a pointer to the shadow variable, so we can
|
|
dereference and make assignments as usual. In this patch chunk, the shadow
|
|
`newpid` is allocated then assigned to a rolling `ctr` counter value:
|
|
```
|
|
diff --git a/kernel/fork.c b/kernel/fork.c
|
|
index 9bff3b28c357..18374fd35bd9 100644
|
|
--- a/kernel/fork.c
|
|
+++ b/kernel/fork.c
|
|
@@ -1751,6 +1751,8 @@ struct task_struct *fork_idle(int cpu)
|
|
return task;
|
|
}
|
|
|
|
+#include <linux/livepatch.h>
|
|
+#define KPATCH_SHADOW_NEWPID 0
|
|
/*
|
|
* Ok, this is the main fork-routine.
|
|
*
|
|
@@ -1794,6 +1796,14 @@ long do_fork(unsigned long clone_flags,
|
|
if (!IS_ERR(p)) {
|
|
struct completion vfork;
|
|
struct pid *pid;
|
|
+ int *newpid;
|
|
+ static int ctr = 0;
|
|
+
|
|
+ newpid = klp_shadow_get_or_alloc(p, KPATCH_SHADOW_NEWPID,
|
|
+ sizeof(*newpid), GFP_KERNEL,
|
|
+ NULL, NULL);
|
|
+ if (newpid)
|
|
+ *newpid = ctr++;
|
|
|
|
trace_sched_process_fork(current, p);
|
|
```
|
|
|
|
A shadow variable may be accessed via `klp_shadow_get()`. Here the patch
|
|
modifies `task_context_switch_counts()` to fetch the shadow variable
|
|
associated with the current `struct task_struct *p` object and a
|
|
`KPATCH_SHADOW_NEWPID ID`. As in the previous patch chunk, the shadow
|
|
variable pointer may be accessed as an ordinary pointer type:
|
|
```
|
|
diff --git a/fs/proc/array.c b/fs/proc/array.c
|
|
index 39684c79e8e2..fe0259d057a3 100644
|
|
--- a/fs/proc/array.c
|
|
+++ b/fs/proc/array.c
|
|
@@ -394,13 +394,19 @@ static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
|
|
seq_putc(m, '\n');
|
|
}
|
|
|
|
+#include <linux/livepatch.h>
|
|
+#define KPATCH_SHADOW_NEWPID 0
|
|
static inline void task_context_switch_counts(struct seq_file *m,
|
|
struct task_struct *p)
|
|
{
|
|
+ int *newpid;
|
|
seq_printf(m, "voluntary_ctxt_switches:\t%lu\n"
|
|
"nonvoluntary_ctxt_switches:\t%lu\n",
|
|
p->nvcsw,
|
|
p->nivcsw);
|
|
+ newpid = klp_shadow_get(p, KPATCH_SHADOW_NEWPID);
|
|
+ if (newpid)
|
|
+ seq_printf(m, "newpid:\t%d\n", *newpid);
|
|
}
|
|
|
|
static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
|
|
```
|
|
|
|
A shadow variable is freed by calling `klp_shadow_free()` and providing
|
|
the object / enum ID combination. Once freed, the shadow variable is no
|
|
longer safe to access:
|
|
```
|
|
diff --git a/kernel/exit.c b/kernel/exit.c
|
|
index 148a7842928d..44b6fe61e912 100644
|
|
--- a/kernel/exit.c
|
|
+++ b/kernel/exit.c
|
|
@@ -791,6 +791,8 @@ static void check_stack_usage(void)
|
|
static inline void check_stack_usage(void) {}
|
|
#endif
|
|
|
|
+#include <linux/livepatch.h>
|
|
+#define KPATCH_SHADOW_NEWPID 0
|
|
void do_exit(long code)
|
|
{
|
|
struct task_struct *tsk = current;
|
|
@@ -888,6 +890,8 @@ void do_exit(long code)
|
|
check_stack_usage();
|
|
exit_thread();
|
|
|
|
+ klp_shadow_free(tsk, KPATCH_SHADOW_NEWPID, NULL);
|
|
+
|
|
/*
|
|
* Flush inherited counters to the parent - before the parent
|
|
* gets woken up by child-exit notifications.
|
|
```
|
|
Notes:
|
|
* `klp_shadow_alloc()` and `klp_shadow_get_or_alloc()` initialize only shadow
|
|
variable metadata. They allocate variable storage via `kmalloc` with the
|
|
`gfp_t` flags given, but otherwise leave the area untouched. Initialization
|
|
of a shadow variable is the responsibility of the caller.
|
|
* As soon as `klp_shadow_alloc()` or `klp_shadow_get_or_alloc()` create a shadow
|
|
variable, its presence will be reported by `klp_shadow_get()`. Care should be
|
|
taken to avoid any potential race conditions between a kernel thread that
|
|
allocates a shadow variable and concurrent threads that may attempt to use
|
|
it.
|
|
* Patches may need to call `klp_shadow_free_all()` from a post-unpatch handler
|
|
to safely cleanup any shadow variables of a particular ID. From post-unpatch
|
|
context, unloading kpatch module code (aside from .exit) should be
|
|
completely inactive. As long as these shadow variables were only accessed by
|
|
the unloaded kpatch, they are be safe to release.
|
|
|
|
Data semantic changes
|
|
---------------------
|
|
|
|
Part of the stable-tree [backport](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/fs/aio.c?h=linux-3.10.y&id=6745cb91b5ec93a1b34221279863926fba43d0d7)
|
|
to fix CVE-2014-0206 changed the reference count semantic of `struct
|
|
kioctx.reqs_active`. Associating a shadow variable to new instances of this
|
|
structure can be used by patched code to handle both new (post-patch) and
|
|
existing (pre-patch) instances.
|
|
|
|
(Note: this example is trimmed to highlight this use-case. Boilerplate code is
|
|
also required to allocate/free a shadow variable with enum ID
|
|
`KPATCH_SHADOW_REQS_ACTIVE_V2` whenever a new `struct kioctx` is
|
|
created/released. No values are ever assigned to the shadow variable.)
|
|
|
|
```
|
|
diff --git a/fs/aio.c b/fs/aio.c
|
|
index ebd06fd0de89..6a33b73c9107 100644
|
|
--- a/fs/aio.c
|
|
+++ b/fs/aio.c
|
|
@@ -280,6 +280,8 @@ static void free_ioctx_rcu(struct rcu_head *head)
|
|
* and ctx->users has dropped to 0, so we know no more kiocbs can be submitted -
|
|
* now it's safe to cancel any that need to be.
|
|
*/
|
|
+#include <linux/livepatch.h>
|
|
+#define KPATCH_SHADOW_REQS_ACTIVE_V2 1
|
|
static void free_ioctx(struct kioctx *ctx)
|
|
{
|
|
struct aio_ring *ring;
|
|
```
|
|
|
|
Shadow variable existence can be verified before applying the *new* data
|
|
semantic of the associated object:
|
|
```
|
|
@@ -678,6 +681,8 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
|
|
put_rq:
|
|
/* everything turned out well, dispose of the aiocb. */
|
|
aio_put_req(iocb);
|
|
+ if (klp_shadow_get(ctx, KPATCH_SHADOW_REQS_ACTIVE_V2))
|
|
+ atomic_dec(&ctx->reqs_active);
|
|
|
|
/*
|
|
* We have to order our ring_info tail store above and test
|
|
```
|
|
|
|
Likewise, shadow variable non-existence can be tested to continue applying the
|
|
*old* data semantic:
|
|
```
|
|
@@ -310,7 +312,8 @@ static void free_ioctx(struct kioctx *ctx)
|
|
|
|
avail = (head <= ctx->tail ? ctx->tail : ctx->nr_events) - head;
|
|
|
|
- atomic_sub(avail, &ctx->reqs_active);
|
|
+ if (!klp_shadow_get(ctx, KPATCH_SHADOW_REQS_ACTIVE_V2))
|
|
+ atomic_sub(avail, &ctx->reqs_active);
|
|
head += avail;
|
|
head %= ctx->nr_events;
|
|
}
|
|
@@ -757,6 +762,8 @@ static long aio_read_events_ring(struct kioctx *ctx,
|
|
pr_debug("%li h%u t%u\n", ret, head, ctx->tail);
|
|
|
|
atomic_sub(ret, &ctx->reqs_active);
|
|
+ if (!klp_shadow_get(ctx, KPATCH_SHADOW_REQS_ACTIVE_V2))
|
|
+ atomic_sub(ret, &ctx->reqs_active);
|
|
out:
|
|
mutex_unlock(&ctx->ring_lock);
|
|
```
|
|
|
|
The previous example can be extended to use shadow variable storage to handle
|
|
locking semantic changes. Consider the [upstream fix](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1d147bfa64293b2723c4fec50922168658e613ba)
|
|
for CVE-2014-2706, which added a `ps_lock` to `struct sta_info` to protect
|
|
critical sections throughout `net/mac80211/sta_info.c`.
|
|
|
|
When allocating a new `struct sta_info`, allocate a corresponding shadow
|
|
variable large enough to hold a `spinlock_t` instance, then initialize the
|
|
spinlock:
|
|
```
|
|
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
|
|
index decd30c1e290..758533dda4d8 100644
|
|
--- a/net/mac80211/sta_info.c
|
|
+++ b/net/mac80211/sta_info.c
|
|
@@ -287,6 +287,8 @@ static int sta_prepare_rate_control(struct ieee80211_local *local,
|
|
return 0;
|
|
}
|
|
|
|
+#include <linux/livepatch.h>
|
|
+#define KPATCH_SHADOW_PS_LOCK 2
|
|
struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
|
|
const u8 *addr, gfp_t gfp)
|
|
{
|
|
@@ -295,6 +297,7 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
|
|
struct timespec uptime;
|
|
struct ieee80211_tx_latency_bin_ranges *tx_latency;
|
|
int i;
|
|
+ spinlock_t *ps_lock;
|
|
|
|
sta = kzalloc(sizeof(*sta) + local->hw.sta_data_size, gfp);
|
|
if (!sta)
|
|
@@ -330,6 +333,10 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
|
|
rcu_read_unlock();
|
|
|
|
spin_lock_init(&sta->lock);
|
|
+ ps_lock = klp_shadow_alloc(sta, KPATCH_SHADOW_PS_LOCK,
|
|
+ sizeof(*ps_lock), gfp, NULL, NULL);
|
|
+ if (ps_lock)
|
|
+ spin_lock_init(ps_lock);
|
|
INIT_WORK(&sta->drv_unblock_wk, sta_unblock);
|
|
INIT_WORK(&sta->ampdu_mlme.work, ieee80211_ba_session_work);
|
|
mutex_init(&sta->ampdu_mlme.mtx);
|
|
```
|
|
|
|
Patched code can reference the shadow variable associated with a given `struct
|
|
sta_info` to determine and apply the correct locking semantic for that
|
|
instance:
|
|
```
|
|
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
|
|
index 97a02d3f7d87..0edb0ed8dc60 100644
|
|
--- a/net/mac80211/tx.c
|
|
+++ b/net/mac80211/tx.c
|
|
@@ -459,12 +459,15 @@ static int ieee80211_use_mfp(__le16 fc, struct sta_info *sta,
|
|
return 1;
|
|
}
|
|
|
|
+#include <linux/livepatch.h>
|
|
+#define KPATCH_SHADOW_PS_LOCK 2
|
|
static ieee80211_tx_result
|
|
ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
|
|
{
|
|
struct sta_info *sta = tx->sta;
|
|
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb);
|
|
struct ieee80211_local *local = tx->local;
|
|
+ spinlock_t *ps_lock;
|
|
|
|
if (unlikely(!sta))
|
|
return TX_CONTINUE;
|
|
@@ -478,6 +481,23 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
|
|
sta->sta.addr, sta->sta.aid, ac);
|
|
if (tx->local->total_ps_buffered >= TOTAL_MAX_TX_BUFFER)
|
|
purge_old_ps_buffers(tx->local);
|
|
+
|
|
+ /* sync with ieee80211_sta_ps_deliver_wakeup */
|
|
+ ps_lock = klp_shadow_get(sta, KPATCH_SHADOW_PS_LOCK);
|
|
+ if (ps_lock) {
|
|
+ spin_lock(ps_lock);
|
|
+ /*
|
|
+ * STA woke up the meantime and all the frames on ps_tx_buf have
|
|
+ * been queued to pending queue. No reordering can happen, go
|
|
+ * ahead and Tx the packet.
|
|
+ */
|
|
+ if (!test_sta_flag(sta, WLAN_STA_PS_STA) &&
|
|
+ !test_sta_flag(sta, WLAN_STA_PS_DRIVER)) {
|
|
+ spin_unlock(ps_lock);
|
|
+ return TX_CONTINUE;
|
|
+ }
|
|
+ }
|
|
+
|
|
if (skb_queue_len(&sta->ps_tx_buf[ac]) >= STA_MAX_TX_BUFFER) {
|
|
struct sk_buff *old = skb_dequeue(&sta->ps_tx_buf[ac]);
|
|
ps_dbg(tx->sdata,
|
|
```
|
|
|
|
Init code changes
|
|
-----------------
|
|
|
|
Any code which runs in an `__init` function or during module or device
|
|
initialization is problematic, as it may have already run before the patch was
|
|
applied. The patch may require a pre-patch callback which detects whether such
|
|
init code has run, and which rewrites or changes the original initialization to
|
|
force it into the desired state. Some changes involving hardware init are
|
|
inherently incompatible with live patching.
|
|
|
|
Header file changes
|
|
-------------------
|
|
|
|
When changing header files, be extra careful. If data is being changed, you
|
|
probably need to modify the patch. See "Data struct changes" above.
|
|
|
|
If a function prototype is being changed, make sure it's not an exported
|
|
function. Otherwise it could break out-of-tree modules. One way to
|
|
workaround this is to define an entirely new copy of the function (with
|
|
updated code) and patch in-tree callers to invoke it rather than the
|
|
deprecated version.
|
|
|
|
Many header file changes result in a complete rebuild of the kernel tree, which
|
|
makes kpatch-build have to compare every .o file in the kernel. It slows the
|
|
build down a lot, and can even fail to build if kpatch-build has any bugs
|
|
lurking. If it's a trivial header file change, like adding a macro, it's
|
|
advisable to just move that macro into the .c file where it's needed to avoid
|
|
changing the header file at all.
|
|
|
|
Dealing with unexpected changed functions
|
|
-----------------------------------------
|
|
|
|
In general, it's best to patch as minimally as possible. If kpatch-build is
|
|
reporting some unexpected function changes, it's always a good idea to try to
|
|
figure out why it thinks they changed. In many cases you can change the source
|
|
patch so that they no longer change.
|
|
|
|
Some examples:
|
|
|
|
* If a changed function was inlined, then the callers which inlined the
|
|
function will also change. In this case there's nothing you can do to
|
|
prevent the extra changes.
|
|
|
|
* If a changed function was originally inlined, but turned into a callable
|
|
function after patching, consider adding `__always_inline` to the function
|
|
definition. Likewise, if a function is only inlined after patching,
|
|
consider using `noinline` to prevent the compiler from doing so.
|
|
|
|
* If your patch adds a call to a function where the original version of the
|
|
function's ELF symbol has a .constprop or .isra suffix, and the corresponding
|
|
patched function doesn't, that means the patch caused gcc to no longer
|
|
perform an interprocedural optimization, which affects the function and all
|
|
its callers. If you want to prevent this from happening, copy/paste the
|
|
function with a new name and call the new function from your patch.
|
|
|
|
* Moving around source code lines can introduce unique instructions if any
|
|
`__LINE__` preprocessor macros are in use. This can be mitigated by adding
|
|
any new functions to the bottom of source files, using newline whitespace to
|
|
maintain original line counts, etc. A more exact fix can be employed by
|
|
modifying the source code that invokes `__LINE__` and hard-coding the
|
|
original line number in place.
|
|
|
|
Removing references to static local variables
|
|
---------------------------------------------
|
|
|
|
Removing references to static locals will fail to patch unless extra steps are taken.
|
|
Static locals are basically global variables because they outlive the function's
|
|
scope. They need to be correlated so that the new function will use the old static
|
|
local. That way patching the function doesn't inadvertently reset the variable
|
|
to zero; instead the variable keeps its old value.
|
|
|
|
To work around this limitation one needs to retain the reference to the static local.
|
|
This might be as simple as adding the variable back in the patched function in a
|
|
non-functional way and ensuring the compiler doesn't optimize it away.
|
|
|
|
Code removal
|
|
------------
|
|
|
|
Some fixes may replace or completely remove functions and references
|
|
to them. Remember that kpatch modules can only add new functions and
|
|
redirect existing functions, so "removed" functions will continue to exist in
|
|
kernel address space as effectively dead code.
|
|
|
|
That means this patch (source code removal of `cmdline_proc_show`):
|
|
```
|
|
diff -Nupr src.orig/fs/proc/cmdline.c src/fs/proc/cmdline.c
|
|
--- src.orig/fs/proc/cmdline.c 2016-11-30 19:39:49.317737234 +0000
|
|
+++ src/fs/proc/cmdline.c 2016-11-30 19:39:52.696737234 +0000
|
|
@@ -3,15 +3,15 @@
|
|
#include <linux/proc_fs.h>
|
|
#include <linux/seq_file.h>
|
|
|
|
-static int cmdline_proc_show(struct seq_file *m, void *v)
|
|
-{
|
|
- seq_printf(m, "%s\n", saved_command_line);
|
|
- return 0;
|
|
-}
|
|
+static int cmdline_proc_show_v2(struct seq_file *m, void *v)
|
|
+{
|
|
+ seq_printf(m, "%s kpatch\n", saved_command_line);
|
|
+ return 0;
|
|
+}
|
|
|
|
static int cmdline_proc_open(struct inode *inode, struct file *file)
|
|
{
|
|
- return single_open(file, cmdline_proc_show, NULL);
|
|
+ return single_open(file, cmdline_proc_show_v2, NULL);
|
|
}
|
|
|
|
static const struct file_operations cmdline_proc_fops = {
|
|
```
|
|
will generate an equivalent kpatch module to this patch (dead
|
|
`cmdline_proc_show` left in source):
|
|
```
|
|
diff -Nupr src.orig/fs/proc/cmdline.c src/fs/proc/cmdline.c
|
|
--- src.orig/fs/proc/cmdline.c 2016-11-30 19:39:49.317737234 +0000
|
|
+++ src/fs/proc/cmdline.c 2016-11-30 19:39:52.696737234 +0000
|
|
@@ -9,9 +9,15 @@ static int cmdline_proc_show(struct seq_
|
|
return 0;
|
|
}
|
|
|
|
+static int cmdline_proc_show_v2(struct seq_file *m, void *v)
|
|
+{
|
|
+ seq_printf(m, "%s kpatch\n", saved_command_line);
|
|
+ return 0;
|
|
+}
|
|
+
|
|
static int cmdline_proc_open(struct inode *inode, struct file *file)
|
|
{
|
|
- return single_open(file, cmdline_proc_show, NULL);
|
|
+ return single_open(file, cmdline_proc_show_v2, NULL);
|
|
}
|
|
|
|
static const struct file_operations cmdline_proc_fops = {
|
|
```
|
|
In both versions, `kpatch-build` will determine that only
|
|
`cmdline_proc_open` has changed and that `cmdline_proc_show_v2` is a
|
|
new function.
|
|
|
|
In some patching cases it might be necessary to completely remove the original
|
|
function to avoid the compiler complaining about a defined, but unused
|
|
function. This will depend on symbol scope and kernel build options.
|
|
|
|
"Once" macros
|
|
-------------
|
|
|
|
When adding a call to `printk_once()`, `pr_warn_once()`, or any other "once"
|
|
variation of `printk()`, you'll get the following eror:
|
|
|
|
```
|
|
ERROR: vmx.o: 1 unsupported section change(s)
|
|
vmx.o: WARNING: unable to correlate static local variable __print_once.60588 used by vmx_update_pi_irte, assuming variable is new
|
|
vmx.o: changed function: vmx_update_pi_irte
|
|
vmx.o: data section .data..read_mostly selected for inclusion
|
|
/usr/lib/kpatch/create-diff-object: unreconcilable difference
|
|
```
|
|
This error occurs because the `printk_once()` adds a static local variable to
|
|
the `.data..read_mostly` section. kpatch-build strict disallows any changes to
|
|
that section, because in some cases a change to this section indicates a bug.
|
|
|
|
To work around this issue, you'll need to manually implement your own "once"
|
|
logic which doesn't store the static variable in the `.data..read_mostly`
|
|
section.
|
|
|
|
For example, a `pr_warn_once()` can be replaced with:
|
|
```
|
|
static bool print_once;
|
|
...
|
|
if (!print_once) {
|
|
print_once = true;
|
|
pr_warn("...");
|
|
}
|
|
```
|
|
|
|
inline implies notrace
|
|
----------------------
|
|
|
|
The linux kernel defines its own version of "inline" in
|
|
include/linux/compiler_types.h which includes "notrace" as well:
|
|
|
|
```
|
|
#if !defined(CONFIG_OPTIMIZE_INLINING)
|
|
#define inline inline __attribute__((__always_inline__)) __gnu_inline \
|
|
__inline_maybe_unused notrace
|
|
#else
|
|
#define inline inline __gnu_inline \
|
|
__inline_maybe_unused notrace
|
|
#endif
|
|
```
|
|
|
|
With the implicit "notrace", use of "inline" in patch sources may lead
|
|
to kpatch-build errors like the following:
|
|
|
|
1. `__tcp_mtu_to_mss()` is marked as inline:
|
|
|
|
```
|
|
net/ipv4/tcp_output.c:
|
|
|
|
/* Calculate MSS not accounting any TCP options. */
|
|
static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu)
|
|
{
|
|
```
|
|
|
|
2. the compiler decides not to inline it and keeps it in its own
|
|
function-section. Then kpatch-build notices that it doesn't have an
|
|
fentry/mcount call:
|
|
|
|
```
|
|
% kpatch-build ...
|
|
|
|
tcp_output.o: function __tcp_mtu_to_mss has no fentry/mcount call, unable to patch
|
|
```
|
|
|
|
3. a peek at the generated code:
|
|
|
|
```
|
|
Disassembly of section .text.__tcp_mtu_to_mss:
|
|
|
|
0000000000000000 <__tcp_mtu_to_mss>:
|
|
0: 48 8b 87 60 05 00 00 mov 0x560(%rdi),%rax
|
|
7: 0f b7 50 30 movzwl 0x30(%rax),%edx
|
|
b: 0f b7 40 32 movzwl 0x32(%rax),%eax
|
|
f: 29 d6 sub %edx,%esi
|
|
11: 83 ee 14 sub $0x14,%esi
|
|
...
|
|
```
|
|
|
|
This could be a little confusing since one might have expected to see
|
|
changes to all of `__tcp_mtu_to_mss()` callers (ie, it was inlined as
|
|
requested). In this case, a simple workaround is to specify
|
|
`__tcp_mtu_to_mss()` as `__always_inline` to force the compiler to do so.
|
|
|
|
Jump labels
|
|
-----------
|
|
|
|
When modifying a function that contains a jump label, kpatch-build may
|
|
return an error like: `ERROR: oom_kill.o: kpatch_regenerate_special_section: 2109: Found a jump label at out_of_memory()+0x10a, using key cpusets_enabled_key. Jump labels aren't currently supported. Use static_key_enabled() instead.`
|
|
|
|
This is due to a limitation in the kernel to process static key
|
|
livepatch relocations (resolved by late-module patching). Older
|
|
versions of kpatch-build may have reported successfully building
|
|
kpatch module, but issue
|
|
[#931](https://github.com/dynup/kpatch/issues/931) revealed potentially
|
|
dangerous behavior if the static key value had been modified from its
|
|
compiled default.
|
|
|
|
The current workaround is to remove the jump label by explictly checking
|
|
the static key:
|
|
|
|
```
|
|
DEFINE_STATIC_KEY_TRUE(true_key);
|
|
DEFINE_STATIC_KEY_FALSE(false_key);
|
|
|
|
/* unsupported */
|
|
if (static_key_true(&true_key))
|
|
if (static_key_false(&false_key))
|
|
if (static_branch_likely(&key))
|
|
|
|
/* supported */
|
|
if (static_key_enabled(&true_key))
|
|
if (static_key_enabled(&false_key))
|
|
if (likely(static_key_enabled(&key)))
|
|
```
|
|
|
|
Sibling calls
|
|
-------------
|
|
|
|
GCC may generate sibling calls that are incompatible with kpatch, resulting in
|
|
an error like: `ERROR("Found an unsupported sibling call at foo()+0x123. Add __attribute__((optimize("-fno-optimize-sibling-calls"))) to foo() definition."`
|
|
|
|
For example, if function A() calls function B() at the end of A() and both
|
|
return similar data-types, GCC may deem them "sibling calls" and apply a tail
|
|
call optimization in which A() restores the stack to is callee state before
|
|
setting up B()'s arguments and jumping to B().
|
|
|
|
This may be an issue for kpatches on PowerPC which modify only A() or B() and
|
|
the function call crosses a kernel module boundary: the sibling call
|
|
optimization has changed expected calling conventions and (un)patched code may
|
|
not be similarly modified.
|
|
|
|
Commit [8b952bd77130](https://github.com/dynup/kpatch/commit/8b952bd77130)
|
|
("create-diff-object/ppc64le: Don't allow sibling calls") contains an
|
|
excellent example and description of this problem with annotated disassembly.
|
|
|
|
Adding `__attribute__((optimize("-fno-optimize-sibling-calls")))` instructs
|
|
GCC to turn off the optimization for the given function.
|