1
0
mirror of https://github.com/mpv-player/mpv synced 2025-01-11 09:29:29 +00:00

client API: fix potential deadlock problems by throwing more shit at it

The render API (vo_libmpv) had potential deadlock problems with
MPV_RENDER_PARAM_ADVANCED_CONTROL. This required vd-lavc-dr to be
enabled (the default). I never observed these deadlocks in the wild
(doesn't mean they didn't happen), although I could specifically provoke
them with some code changes.

The problem was mostly about DR (direct rendering, letting the video
decoder write to OpenGL buffer memory). Allocating/freeing a DR image
needs to be done on the OpenGL thread, even though _lots_ of threads are
involved with handling images. Freeing a DR image is a special case that
can happen any time. dr_helper.c does most of the evil magic of
achieving this. Unfortunately, there was a (sort of) circular lock
dependency: freeing an image while certain internal locks are held would
trigger the user's context update callback, which in turn would call
mpv_render_context_update(), which processed all pending free requests,
and then acquire an internal lock - which the caller might not release
until a further DR image could be freed.

"Solve" this by making freeing DR images asynchronous. This is slightly
risky, but actually not much. The DR images will be free'd eventually.
The biggest disadvantage is probably that debugging might get trickier.

Any solution to this problem will probably add images to free to some
sort of queue, and then process it later. I considered making this more
explicit (so there'd be a point where the caller forcibly waits for all
queued items to be free'd), but discarded these ideas as this probably
would only increase complexity.

Another consequence is that freeing DR images on the GL thread is not
synchronous anymore. Instead, it mpv_render_context_update() will do it
with a delay. This seems roundabout, but doesn't actually change
anything, and avoids additional code.

This also fixes that the render API required the render API user to
remain on the same thread, even though this wasn't documented. As such,
it was a bug. OpenGL essentially forces you to do all GL usage on a
single thread, but in theory the API user could for example move the GL
context to another thread.

The API bump is because I think you can't make enough noise about this.
Since we don't backport fixes to old versions, I'm specifically stating
that old versions are broken, and I'm supplying workarounds.

Internally, dr_helper_create() does not use pthread_self() anymore, thus
the vo.c change. I think it's better to make binding to the current
thread as explicit as possible.

Of course it's not sure that this fixes all deadlocks (probably not).
This commit is contained in:
wm4 2019-09-26 14:07:33 +02:00
parent 3ee6d7db4e
commit 4d43c79e4c
7 changed files with 81 additions and 32 deletions

View File

@ -32,6 +32,11 @@ API changes
::
--- mpv 0.30.0 ---
1.105 - Fix deadlock problems with MPV_RENDER_PARAM_ADVANCED_CONTROL and if
the "vd-lavc-dr" option is enabled (which it is by default).
There were no actual API changes.
API users on older API versions and mpv releases should set
"vd-lavc-dr" to "no" to avoid these issues.
1.104 - Deprecate struct mpv_opengl_drm_params. Replaced by mpv_opengl_drm_params_v2
- Deprecate MPV_RENDER_PARAM_DRM_DISPLAY. Replaced by MPV_RENDER_PARAM_DRM_DISPLAY_V2.
1.103 - redo handling of async commands

View File

@ -223,7 +223,7 @@ extern "C" {
* relational operators (<, >, <=, >=).
*/
#define MPV_MAKE_VERSION(major, minor) (((major) << 16) | (minor) | 0UL)
#define MPV_CLIENT_API_VERSION MPV_MAKE_VERSION(1, 104)
#define MPV_CLIENT_API_VERSION MPV_MAKE_VERSION(1, 105)
/**
* The API user is allowed to "#define MPV_ENABLE_DEPRECATED 0" before

View File

@ -79,16 +79,6 @@ extern "C" {
* is logged. If you set MPV_RENDER_PARAM_ADVANCED_CONTROL, you promise that
* this won't happen, and must absolutely guarantee it, or a real deadlock
* will freeze the mpv core thread forever.
* - if MPV_RENDER_PARAM_ADVANCED_CONTROL is used, you currently must call all
* mpv_render*() API functions from the same thread on which the
* mpv_render_context was returned by mpv_render_context_create(). This
* requirement always existed. Not honoring it will lead to UB (deadlocks,
* use of invalid pthread_t handles). This requirement might be removed in
* the future, but will require some considerable work internal to libmpv.
* You can avoid this issue by setting "vd-lavc-dr" to "no".
* - MPV_RENDER_PARAM_ADVANCED_CONTROL has some other libmpv-internal problems,
* which may result in random deadlocks (see top of vo_libmpv.c).
* You can probably avoid this issue by setting "vd-lavc-dr" to "no".
*
* libmpv functions which are safe to call from a render thread are:
* - functions marked with "Safe to be called from mpv render API threads."
@ -104,6 +94,18 @@ extern "C" {
* - if the mpv_handle parameter refers to a different mpv core than the one
* you're rendering for (very obscure, but allowed)
*
* Note about old libmpv version:
*
* Before API version 1.105 (basically in mpv 0.29.x), simply enabling
* MPV_RENDER_PARAM_ADVANCED_CONTROL could cause deadlock issues. This can
* be worked around by setting the "vd-lavc-dr" option to "no".
* In addition, you were required to call all mpv_render*() API functions
* from the same thread on which mpv_render_context_create() was originally
* run (for the same the mpv_render_context). Not honoring it led to UB
* (deadlocks, use of invalid pthread_t handles), even if you moved your GL
* context to a different thread correctly.
* These problems were addressed in API version 1.105 (mpv 0.30.0).
*
* Context and handle lifecycle
* ----------------------------
*

View File

@ -12,7 +12,10 @@
#include "dr_helper.h"
struct dr_helper {
pthread_mutex_t thread_lock;
pthread_t thread;
bool thread_valid; // (POSIX defines no "unset" pthread_t value yet)
struct mp_dispatch_queue *dispatch;
atomic_ullong dr_in_flight;
@ -28,6 +31,8 @@ static void dr_helper_destroy(void *ptr)
// All references must have been freed on destruction, or we'll have
// dangling pointers.
assert(atomic_load(&dr->dr_in_flight) == 0);
pthread_mutex_destroy(&dr->thread_lock);
}
struct dr_helper *dr_helper_create(struct mp_dispatch_queue *dispatch,
@ -38,15 +43,34 @@ struct dr_helper *dr_helper_create(struct mp_dispatch_queue *dispatch,
struct dr_helper *dr = talloc_ptrtype(NULL, dr);
talloc_set_destructor(dr, dr_helper_destroy);
*dr = (struct dr_helper){
.thread = pthread_self(),
.dispatch = dispatch,
.dr_in_flight = ATOMIC_VAR_INIT(0),
.get_image = get_image,
.get_image_ctx = get_image_ctx,
};
pthread_mutex_init(&dr->thread_lock, NULL);
return dr;
}
void dr_helper_acquire_thread(struct dr_helper *dr)
{
pthread_mutex_lock(&dr->thread_lock);
assert(!dr->thread_valid); // fails on API user errors
dr->thread_valid = true;
dr->thread = pthread_self();
pthread_mutex_unlock(&dr->thread_lock);
}
void dr_helper_release_thread(struct dr_helper *dr)
{
pthread_mutex_lock(&dr->thread_lock);
// Fails on API user errors.
assert(dr->thread_valid);
assert(pthread_equal(dr->thread, pthread_self()));
dr->thread_valid = false;
pthread_mutex_unlock(&dr->thread_lock);
}
struct free_dr_context {
struct dr_helper *dr;
AVBufferRef *ref;
@ -66,13 +90,19 @@ static void dr_thread_free(void *ptr)
static void free_dr_buffer_on_dr_thread(void *opaque, uint8_t *data)
{
struct free_dr_context *ctx = opaque;
struct dr_helper *dr = ctx->dr;
pthread_mutex_lock(&dr->thread_lock);
bool on_this_thread =
dr->thread_valid && pthread_equal(ctx->dr->thread, pthread_self());
pthread_mutex_unlock(&dr->thread_lock);
// The image could be unreffed even on the DR thread. In practice, this
// matters most on DR destruction.
if (pthread_equal(ctx->dr->thread, pthread_self())) {
if (on_this_thread) {
dr_thread_free(ctx);
} else {
mp_dispatch_run(ctx->dr->dispatch, dr_thread_free, ctx);
mp_dispatch_enqueue(dr->dispatch, dr_thread_free, ctx);
}
}

View File

@ -3,18 +3,35 @@
// This is a helper for implementing thread-safety for DR callbacks. These need
// to allocate GPU buffers on the GPU thread (e.g. OpenGL with its forced TLS),
// and the buffers also need to be freed on the GPU thread.
// This is not a helpful "Dr.", rather it represents Satan in form of C code.
struct dr_helper;
struct mp_image;
struct mp_dispatch_queue;
// This MUST be called on the "target" thread (it will call pthread_self()).
// dr_helper_get_image() calls will use the dispatch queue to run get_image on
// the target thread too.
// a target thread, which processes the dispatch queue.
// Note: the dispatch queue must process outstanding async. work before the
// dr_helper instance can be destroyed.
struct dr_helper *dr_helper_create(struct mp_dispatch_queue *dispatch,
struct mp_image *(*get_image)(void *ctx, int imgfmt, int w, int h,
int stride_align),
void *get_image_ctx);
// Make DR release calls (freeing images) reentrant if they are called on this
// (pthread_self()) thread. That means any free call will directly release the
// image as allocated with get_image().
// Only 1 thread can use this at a time. Note that it would make no sense to
// call this on more than 1 thread, as get_image is assumed not thread-safe.
void dr_helper_acquire_thread(struct dr_helper *dr);
// This _must_ be called on the same thread as dr_helper_acquire_thread() was
// called. Every release call must be paired with an acquire call.
void dr_helper_release_thread(struct dr_helper *dr);
// Allocate an image by running the get_image callback on the target thread.
// Always blocks on dispatch queue processing. This implies there is no way to
// allocate a DR'ed image on the render thread (at least not in a way which
// actually works if you want foreign threads to be able to free them).
struct mp_image *dr_helper_get_image(struct dr_helper *dr, int imgfmt,
int w, int h, int stride_align);

View File

@ -1046,8 +1046,10 @@ static void *vo_thread(void *ptr)
mpthread_set_name("vo");
if (vo->driver->get_image)
if (vo->driver->get_image) {
in->dr_helper = dr_helper_create(in->dispatch, get_image_vo, vo);
dr_helper_acquire_thread(in->dr_helper);
}
int r = vo->driver->preinit(vo) ? -1 : 0;
mp_rendezvous(vo, r); // init barrier

View File

@ -46,21 +46,9 @@
* And: render thread > VO (wait for present)
* VO > render thread (wait for present done, via timeout)
*
* Inherent locking bugs with advanced_control:
*
* In theory, it can deadlock on ctx->lock. Consider if the VO thread calls
* forget_frames(), which it does under ctx->lock (e.g. VOCTRL_RESET).
* If a frame was a DR image, dr_helper.c will call mp_dispatch_run().
* This in turn will call the wakeup callback set with
* mpv_render_context_set_update_callback(). The API user will eventually
* call mpv_render_context_update(), which performs the dispatch queue work
* queued by dr_helper.c.
* And then the function tries to acquire ctx->lock. This can deadlock
* under specific circumstances. It will _not_ deadlock if the queued work
* made the caller release the lock. However, if the caller made queue
* some more work (like freeing a second frame), and will keep the lock
* until it gets a reply. Both threads will wait on each other, and no
* progress can be made.
* Locking gets more complex with advanced_control enabled. Use
* mpv_render_context.dispatch with care; synchronous calls can add lock
* dependencies.
*/
struct vo_priv {
@ -306,6 +294,11 @@ void mpv_render_context_free(mpv_render_context *ctx)
assert(!atomic_load(&ctx->in_use));
assert(!ctx->vo);
// With the dispatch queue not being served anymore, allow frame free
// requests from this thread to be served directly.
if (ctx->dr)
dr_helper_acquire_thread(ctx->dr);
// Possibly remaining outstanding work.
mp_dispatch_queue_process(ctx->dispatch, 0);