mpv/video/out/vulkan/malloc.h

#pragma once

#include "common.h"

void vk_malloc_init(struct mpvk_ctx *vk);
void vk_malloc_uninit(struct mpvk_ctx *vk);

// Represents a single "slice" of generic (non-buffer) memory, plus some
// metadata for accounting. This struct is essentially read-only.
struct vk_memslice {
    VkDeviceMemory vkmem;
    size_t offset;
    size_t size;
    void *priv;
};

void vk_free_memslice(struct mpvk_ctx *vk, struct vk_memslice slice);
bool vk_malloc_generic(struct mpvk_ctx *vk, VkMemoryRequirements reqs,
                       VkMemoryPropertyFlags flags, struct vk_memslice *out);

// Represents a single "slice" of a larger buffer
struct vk_bufslice {
    struct vk_memslice mem; // must be freed by the user when done
    VkBuffer buf;           // the buffer this memory was sliced from
    // For persistently mapped buffers, this points to the first usable byte of
    // this slice.
    void *data;
};

// Allocate a buffer slice. This is more efficient than vk_malloc_generic
// when the user needs lots of buffers, since it doesn't require
// creating/destroying lots of (little) VkBuffers.
bool vk_malloc_buffer(struct mpvk_ctx *vk, VkBufferUsageFlags bufFlags,
                      VkMemoryPropertyFlags memFlags, VkDeviceSize size,
                      VkDeviceSize alignment, struct vk_bufslice *out);
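
The buffer-slicing idea described in the comment on vk_malloc_buffer can be illustrated without Vulkan. What follows is a minimal sketch, not mpv's actual allocator: `demo_heap`, `demo_slice`, `demo_align_up` and `demo_slice_alloc` are hypothetical names, a plain byte array stands in for a persistently mapped VkBuffer, and the bump-pointer strategy with power-of-two alignment is an assumption chosen for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for one large backing buffer. A real allocator
// would also hold the VkBuffer and VkDeviceMemory handles here.
struct demo_heap {
    uint8_t *base; // start of the (persistently mapped) backing region
    size_t size;   // total bytes available in the backing region
    size_t used;   // bump pointer: bytes handed out so far
};

// Hypothetical analogue of struct vk_bufslice, minus the Vulkan handles.
struct demo_slice {
    size_t offset; // byte offset of the slice inside the backing region
    size_t size;   // usable bytes in the slice
    void *data;    // first usable byte of the slice
};

// Round x up to the next multiple of align (align must be a power of two).
static size_t demo_align_up(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

// Carve an aligned slice out of the heap. Returns false if the alignment is
// not a power of two or if the heap has too little space left.
static bool demo_slice_alloc(struct demo_heap *heap, size_t size,
                             size_t align, struct demo_slice *out)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return false;

    size_t start = demo_align_up(heap->used, align);
    if (start < heap->used || start > heap->size || size > heap->size - start)
        return false;

    heap->used = start + size;
    *out = (struct demo_slice){
        .offset = start,
        .size = size,
        .data = heap->base + start,
    };
    return true;
}
```

Each call hands out a sub-range of one backing region instead of creating a new object per request, which is the saving the comment on vk_malloc_buffer refers to. This sketch never frees individual slices; the `priv` field in struct vk_memslice suggests the real allocator keeps per-slice accounting, which is omitted here.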

vo_gpu: vulkan: initial implementation

This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!

Current problems / limitations / improvement opportunities:

1. The swapchain/flipping code violates the vulkan spec, by assuming that the presentation queue will be bounded (in cases where rendering is significantly faster than vsync). But apparently, there's simply no better way to do this right now, to the point where even the stupid cube.c examples from LunarG etc. do it wrong. (cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)

2. The memory allocator could be improved. (This is a universal constant)

3. Could explore using push descriptors instead of descriptor sets, especially since we expect to switch descriptors semi-often for some passes (like interpolation). Probably won't make a difference, but the synchronization overhead might be a factor. Who knows.

4. Parallelism across frames / async transfer is not well-defined, we either need to use a better semaphore / command buffer strategy or a resource pooling layer to safely handle cross-frame parallelism. (That said, I gave resource pooling a try and was not happy with the result at all - so I'm still exploring the semaphore strategy)

5. We aggressively use pipeline barriers where events would offer a much more fine-grained synchronization mechanism. As a result of this, we might be suffering from GPU bubbles due to too-short dependencies on objects. (That said, I'm also exploring the use of semaphores as an ordering tactic which would allow cross-frame time slicing in theory)

Some minor changes to the vo_gpu and infrastructure, but nothing consequential.

NOTE: For safety, all use of asynchronous commands / multiple command pools is currently disabled completely. There are some left-over relics of this in the code (e.g. the distinction between dev_poll and pool_poll), but that is kept in place mostly because this will be re-extended in the future (vulkan rev 2).

The queue count is also currently capped to 1, because the lack of cross-frame semaphores means we need the implicit synchronization from the same-queue semantics to guarantee a correct result.

2016-09-14 18:54:18 +00:00

vo_gpu: vulkan: normalize use of Flags and FlagBits

FlagBits is just the name of the enum. The actual data type representing a combination of these flags follows the Flags convention. (The relevant difference is that the latter is defined to be uint32_t instead of left implicit)

For consistency, use Flags everywhere instead of randomly switching between Flags and FlagBits.

Also fix a wrong type name on `stageFlags`, pointed out by @atomnuker

2017-09-26 22:24:03 +00:00