kpatch/doc/patch-author-guide.md

19 KiB

kpatch Patch Author Guide

Because kpatch-build is relatively easy to use, it can be easy to assume that a successful patch module build means that the patch is safe to apply. But in fact that's a very dangerous assumption.

There are many pitfalls that can be encountered when creating a live patch. This document attempts to guide the patch creation process. It's a work in progress. If you find it useful, please contribute!

Patch Analysis

kpatch provides some guarantees, but it does not guarantee that all patches are safe to apply. Every patch must also be analyzed in-depth by a human.

The most important point here cannot be stressed enough. Here comes the bold:

Do not blindly apply patches. There is no subsitute for human analysis and reasoning on a per-patch basis. All patches must be thoroughly analyzed by a human kernel expert who completely understands the patch and the affected code and how they relate to the live patching environment.

kpatch vs livepatch vs kGraft

This document assumes that the kpatch core module is being used. Other live patching systems (e.g., livepatch and kGraft) have different consistency models. Each comes with its own guarantees, and there are some subtle differences. The guidance in this document applies only to kpatch.

Patch upgrades

Due to potential unexpected interactions between patches, it's highly recommended that when patching a system which has already been patched, the second patch should be a cumulative upgrade which is a superset of the first patch.

Data structure changes

kpatch patches functions, not data. If the original patch involves a change to a data structure, the patch will require some rework, as changes to data structures are not allowed by default.

Usually you have to get creative. There are several possible ways to handle this:

Change the code which uses the data structure

Sometimes, instead of changing the data structure itself, you can change the code which uses it.

For example, consider this patch. which has the following hunk:

@@ -3270,6 +3277,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_EXCP_BASE + PF_VECTOR]	= pf_interception,
 	[SVM_EXIT_EXCP_BASE + NM_VECTOR]	= nm_interception,
 	[SVM_EXIT_EXCP_BASE + MC_VECTOR]	= mc_interception,
+	[SVM_EXIT_EXCP_BASE + AC_VECTOR]	= ac_interception,
 	[SVM_EXIT_INTR]				= intr_interception,
 	[SVM_EXIT_NMI]				= nmi_interception,
 	[SVM_EXIT_SMI]				= nop_on_interception,

svm_exit_handlers[] is an array of function pointers. The patch adds a ac_interception function pointer to the array at index [SVM_EXIT_EXCP_BASE + AC_VECTOR]. That change is incompatible with kpatch.

Looking at the source file, we can see that this function pointer is only accessed by a single function, handle_exit():

        if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
            || !svm_exit_handlers[exit_code]) {
                WARN_ONCE(1, "svm: unexpected exit reason 0x%x\n", exit_code);
                kvm_queue_exception(vcpu, UD_VECTOR);
                return 1;
        }

        return svm_exit_handlers[exit_code](svm);

So an easy solution here is to just change the code to manually check for the new case before looking in the data structure:

@@ -3580,6 +3580,9 @@ static int handle_exit(struct kvm_vcpu *vcpu)
                return 1;
        }

+       if (exit_code == SVM_EXIT_EXCP_BASE + AC_VECTOR)
+               return ac_interception(svm);
+
        return svm_exit_handlers[exit_code](svm);
 }

Not only is this an easy solution, it's also safer than touching data since kpatch creates a barrier between the calling of old functions and new functions.

Use a kpatch load hook

If you need to change the contents of an existing variable in-place, you can use the KPATCH_LOAD_HOOK macro to specify a function to be called when the patch module is loaded.

kpatch-macros.h provides KPATCH_LOAD_HOOK and KPATCH_UNLOAD_HOOK macros to define such functions. The signature of both hook functions is void foo(void). Their execution context is as follows:

  • For patches to vmlinux or already loaded kernel modules, hook functions will be run by stop_machine as part of applying or removing a patch. (Therefore the hooks must not block or sleep.)

  • For patches to kernel modules which haven't been loaded yet, a module-notifier will execute load hooks when the associated module is loaded into the MODULE_STATE_COMING state. The load hook is called before any module_init code.

Example: a kpatch fix for CVE-2016-5389 utilized the KPATCH_LOAD_HOOK and KPATCH_UNLOAD_HOOK macros to modify variable sysctl_tcp_challenge_ack_limit in-place:

+static bool kpatch_write = false;
+void kpatch_load_tcp_send_challenge_ack(void)
+{
+	if (sysctl_tcp_challenge_ack_limit == 100) {
+		sysctl_tcp_challenge_ack_limit = 1000;
+		kpatch_write = true;
+	}
+}
+void kpatch_unload_tcp_send_challenge_ack(void)
+{
+	if (kpatch_write && sysctl_tcp_challenge_ack_limit == 1000)
+		sysctl_tcp_challenge_ack_limit = 100;
+}
+#include "kpatch-macros.h"
+KPATCH_LOAD_HOOK(kpatch_load_tcp_send_challenge_ack);
+KPATCH_UNLOAD_HOOK(kpatch_unload_tcp_send_challenge_ack);

Don't forget to protect access to the data as needed.

Also be careful when upgrading. If patch A has a load hook which writes to X, and then you load patch B which is a superset of A, in some cases you may want to prevent patch B from writing to X, if A is already loaded.

Use a shadow variable

If you need to add a field to an existing data structure, or even many existing data structures, you can use the kpatch_shadow_*() functions:

  • kpatch_shadow_alloc - allocates a new shadow variable associated with a given object
  • kpatch_shadow_get - find and return a pointer to a shadow variable
  • kpatch_shadow_free - find and free a shadow variable

Example: The shadow-newpid.patch integration test demonstrates the usage of these functions.

A shadow PID variable is allocated in do_fork() : it is associated with the current struct task_struct *p value, given a string lookup key of "newpid", sized accordingly, and allocated as per GFP_KERNEL flag rules.

kpatch_shadow_alloc returns a pointer to the shadow variable, so we can dereference and make assignments as usual. In this patch chunk, the shadow newpid is allocated then assigned to a rolling ctr counter value:

+		int *newpid;
+		static int ctr = 0;
+
+		newpid = kpatch_shadow_alloc(p, "newpid", sizeof(*newpid),
+					     GFP_KERNEL);
+		if (newpid)
+			*newpid = ctr++;

A shadow variable may also be accessed via kpatch_shadow_get. Here the patch modifies task_context_switch_counts() to fetch the shadow variable associated with the current struct task_struct *p object and a "newpid" tag. As in the previous patch chunk, the shadow variable pointer may be accessed as an ordinary pointer type:

+	int *newpid;
+
 	seq_put_decimal_ull(m, "voluntary_ctxt_switches:\t", p->nvcsw);
 	seq_put_decimal_ull(m, "\nnonvoluntary_ctxt_switches:\t", p->nivcsw);
 	seq_putc(m, '\n');
+
+	newpid = kpatch_shadow_get(p, "newpid");
+	if (newpid)
+		seq_printf(m, "newpid:\t%d\n", *newpid);

A shadow variable is freed by calling kpatch_shadow_free and providing the object / string key combination. Once freed, the shadow variable is not safe to access:

 	exit_task_work(tsk);
 	exit_thread(tsk);
 
+	kpatch_shadow_free(tsk, "newpid");
+
 	/*
 	 * Flush inherited counters to the parent - before the parent
 	 * gets woken up by child-exit notifications.

Notes:

  • kpatch_shadow_alloc initializes only shadow variable metadata. It allocates variable storage via kmalloc with the gfp_t flags it is given, but otherwise leaves the area untouched. Initialization of a shadow variable is the responsibility of the caller.
  • As soon as kpatch_shadow_alloc creates a shadow variable, its presence will be reported by kpatch_shadow_get. Care should be taken to avoid any potential race conditions between a kernel thread that allocates a shadow variable and concurrent threads that may attempt to use it.

Data semantic changes

Part of the stable-tree backport to fix CVE-2014-0206 changed the reference count semantic of struct kioctx.reqs_active. Associating a shadow variable to new instances of this structure can be used by patched code to handle both new (post-patch) and existing (pre-patch) instances.

(This example is trimmed to highlight this use-case. Boilerplate code is also required to allocate/free a shadow variable called "reqs_active_v2" whenever a new struct kioctx is created/released. No values are ever assigned to the shadow variable.)

Shadow variable existence can be verified before applying the new data semantic of the associated object:

@@ -678,6 +688,9 @@ void aio_complete(struct kiocb *iocb, lo
 put_rq:
 	/* everything turned out well, dispose of the aiocb. */
 	aio_put_req(iocb);
+	reqs_active_v2 = kpatch_shadow_get(ctx, "reqs_active_v2");
+	if (reqs_active_v2)
+		atomic_dec(&ctx->reqs_active);
 
 	/*
 	 * We have to order our ring_info tail store above and test

Likewise, shadow variable non-existence can be tested to continue applying the old data semantic:

@@ -705,6 +718,7 @@ static long aio_read_events_ring(struct
 	unsigned head, pos;
 	long ret = 0;
 	int copy_ret;
+	int *reqs_active_v2;
 
 	mutex_lock(&ctx->ring_lock);
 
@@ -756,7 +770,9 @@ static long aio_read_events_ring(struct
 
 	pr_debug("%li  h%u t%u\n", ret, head, ctx->tail);
 
-	atomic_sub(ret, &ctx->reqs_active);
+	reqs_active_v2 = kpatch_shadow_get(ctx, "reqs_active_v2");
+	if (!reqs_active_v2)
+		atomic_sub(ret, &ctx->reqs_active);
 out:
 	mutex_unlock(&ctx->ring_lock);

The previous example can be extended to use shadow variable storage to handle locking semantic changes. Consider the upstream fix for CVE-2014-2706, which added a ps_lock to struct sta_info to protect critical sections throughout net/mac80211/sta_info.c.

When allocating a new struct sta_info, allocate a corresponding "ps_lock" shadow variable large enough to hold a spinlock_t instance, then initialize the spinlock:

@@ -333,12 +336,16 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
 	struct sta_info *sta;
 	struct timespec uptime;
 	int i;
+	spinlock_t *ps_lock;
 
 	sta = kzalloc(sizeof(*sta) + local->hw.sta_data_size, gfp);
 	if (!sta)
 		return NULL;
 
 	spin_lock_init(&sta->lock);
+	ps_lock = kpatch_shadow_alloc(sta, "ps_lock", sizeof(*ps_lock), gfp);
+	if (ps_lock)
+		spin_lock_init(ps_lock);
 	INIT_WORK(&sta->drv_unblock_wk, sta_unblock);
 	INIT_WORK(&sta->ampdu_mlme.work, ieee80211_ba_session_work);
 	mutex_init(&sta->ampdu_mlme.mtx);

Patched code can reference the "ps_lock" shadow variable associated with a given struct sta_info to determine and apply the correct locking semantic for that instance:

@@ -471,6 +475,23 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
 		       sta->sta.addr, sta->sta.aid, ac);
 		if (tx->local->total_ps_buffered >= TOTAL_MAX_TX_BUFFER)
 			purge_old_ps_buffers(tx->local);
+
+		/* sync with ieee80211_sta_ps_deliver_wakeup */
+		ps_lock = kpatch_shadow_get(sta, "ps_lock");
+		if (ps_lock) {
+			spin_lock(ps_lock);
+			/*
+			 * STA woke up the meantime and all the frames on ps_tx_buf have
+			 * been queued to pending queue. No reordering can happen, go
+			 * ahead and Tx the packet.
+			 */
+			if (!test_sta_flag(sta, WLAN_STA_PS_STA) &&
+			    !test_sta_flag(sta, WLAN_STA_PS_DRIVER)) {
+				spin_unlock(ps_lock);
+				return TX_CONTINUE;
+			}
+		}
+
 		if (skb_queue_len(&sta->ps_tx_buf[ac]) >= STA_MAX_TX_BUFFER) {
 			struct sk_buff *old = skb_dequeue(&sta->ps_tx_buf[ac]);
 			ps_dbg(tx->sdata,

Init code changes

Any code which runs in an __init function or during module or device initialization is problematic, as it may have already run before the patch was applied. The patch may require a load hook function which detects whether such init code has run, and which rewrites or changes the original initialization to force it into the desired state. Some changes involving hardware init are inherently incompatible with live patching.

Header file changes

When changing header files, be extra careful. If data is being changed, you probably need to modify the patch. See "Data struct changes" above.

If a function prototype is being changed, make sure it's not an exported function. Otherwise it could break out-of-tree modules. One way to workaround this is to define an entirely new copy of the function (with updated code) and patch in-tree callers to invoke it rather than the deprecated version.

Many header file changes result in a complete rebuild of the kernel tree, which makes kpatch-build have to compare every .o file in the kernel. It slows the build down a lot, and can even fail to build if kpatch-build has any bugs lurking. If it's a trivial header file change, like adding a macro, it's advisable to just move that macro into the .c file where it's needed to avoid changing the header file at all.

Dealing with unexpected changed functions

In general, it's best to patch as minimally as possible. If kpatch-build is reporting some unexpected function changes, it's always a good idea to try to figure out why it thinks they changed. In many cases you can change the source patch so that they no longer change.

Some examples:

  • If a changed function was inlined, then the callers which inlined the function will also change. In this case there's nothing you can do to prevent the extra changes.

  • If a changed function was originally inlined, but turned into a callable function after patching, consider adding __always_inline to the function definition. Likewise, if a function is only inlined after patching, consider using noinline to prevent the compiler from doing so.

  • If your patch adds a call to a function where the original version of the function's ELF symbol has a .constprop or .isra suffix, and the corresponding patched function doesn't, that means the patch caused gcc to no longer perform an interprocedural optimization, which affects the function and all its callers. If you want to prevent this from happening, copy/paste the function with a new name and call the new function from your patch.

  • Moving around source code lines can introduce unique instructions if any __LINE__ preprocessor macros are in use. This can be mitigated by adding any new functions to the bottom of source files, using newline whitespace to maintain original line counts, etc. A more exact fix can be employed by modifying the source code that invokes __LINE__ and hard-coding the original line number in place.

Removing references to static local variables

Removing references to static locals will fail to patch unless extra steps are taken. Static locals are basically global variables because they outlive the function's scope. They need to be correlated so that the new function will use the old static local. That way patching the function doesn't inadvertently reset the variable to zero; instead the variable keeps its old value.

To work around this limitation one needs to retain the reference to the static local. This might be as simple as adding the variable back in the patched function in a non-functional way and ensuring the compiler doesn't optimize it away.

Code removal

Some fixes may replace or completely remove functions and references to them. Remember that kpatch modules can only add new functions and redirect existing functions, so "removed" functions will continue to exist in kernel address space as effectively dead code.

That means this patch (source code removal of cmdline_proc_show):

diff -Nupr src.orig/fs/proc/cmdline.c src/fs/proc/cmdline.c
--- src.orig/fs/proc/cmdline.c	2016-11-30 19:39:49.317737234 +0000
+++ src/fs/proc/cmdline.c	2016-11-30 19:39:52.696737234 +0000
@@ -3,15 +3,15 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 
-static int cmdline_proc_show(struct seq_file *m, void *v)
-{
-	seq_printf(m, "%s\n", saved_command_line);
-	return 0;
-}
+static int cmdline_proc_show_v2(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%s kpatch\n", saved_command_line);
+	return 0;
+}
 
 static int cmdline_proc_open(struct inode *inode, struct file *file)
 {
-	return single_open(file, cmdline_proc_show, NULL);
+	return single_open(file, cmdline_proc_show_v2, NULL);
 }
 
 static const struct file_operations cmdline_proc_fops = {

will generate an equivalent kpatch module to this patch (dead cmdline_proc_show left in source):

diff -Nupr src.orig/fs/proc/cmdline.c src/fs/proc/cmdline.c
--- src.orig/fs/proc/cmdline.c	2016-11-30 19:39:49.317737234 +0000
+++ src/fs/proc/cmdline.c	2016-11-30 19:39:52.696737234 +0000
@@ -9,9 +9,15 @@ static int cmdline_proc_show(struct seq_
 	return 0;
 }
 
+static int cmdline_proc_show_v2(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%s kpatch\n", saved_command_line);
+	return 0;
+}
+
 static int cmdline_proc_open(struct inode *inode, struct file *file)
 {
-	return single_open(file, cmdline_proc_show, NULL);
+	return single_open(file, cmdline_proc_show_v2, NULL);
 }
 
 static const struct file_operations cmdline_proc_fops = {

In both versions, kpatch-build will determine that only cmdline_proc_open has changed and that cmdline_proc_show_v2 is a new function.

In some patching cases it might be necessary to completely remove the original function to avoid the compiler complaining about a defined, but unused function. This will depend on symbol scope and kernel build options.

Other issues

When adding a call to printk_once(), pr_warn_once(), or any other "once" variation of printk(), you'll get the following eror:

ERROR: vmx.o: 1 unsupported section change(s)
vmx.o: WARNING: unable to correlate static local variable __print_once.60588 used by vmx_update_pi_irte, assuming variable is new
vmx.o: changed function: vmx_update_pi_irte
vmx.o: data section .data..read_mostly selected for inclusion
/usr/lib/kpatch/create-diff-object: unreconcilable difference

This error occurs because the printk_once() adds a static local variable to the .data..read_mostly section. kpatch-build strict disallows any changes to that section, because in some cases a change to this section indicates a bug.

To work around this issue, you'll need to manually implement your own "once" logic which doesn't store the static variable in the .data..read_mostly section.

For example, a pr_warn_once() can be replaced with:

	static bool print_once;
	...
	if (!print_once) {
		print_once = true;
		pr_warn("...");
	}