The advantage of binding between Surface and Context.
by Hirokazu Honda
Hi,
According to the libva documentation
(http://01org.github.io/libva_staging_doxygen/group__api__core.html#ga4af3...),
surfaces are bound to a context if they are passed as an argument when
the context is created.
Looking at the intel-vaapi-driver code, the surfaces are just stored in
object_context.render_targets.
The surface processed by
vaBeginPicture()-vaRenderPicture()-vaEndPicture() is specified in
vaBeginPicture() (object_context.current_render_target).
It looks like a surface can be processed using a context by being
specified in vaBeginPicture(), even if it is not bound to the context.
My questions are:
What is the advantage of binding?
In what circumstances do we need to associate the context with surfaces?
In which scenarios is passing surfaces to vaCreateContext required,
and in which is it not?
Best Regards,
Hirokazu Honda
3 years
[RFC 0/9] Per batch buffer balancing
by Tvrtko Ursulin
From: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Hi,
I have been working on some new i915 uAPI proposals which sound interesting
for the intel-vaapi-driver project.
Primarily this is about changing the way VCS engines are selected at execbuf
time, adding the i915 feature to load-balance those batches, and allowing
userspace to opt-in to such behaviour.
You can find these patches here:
https://patchwork.freedesktop.org/series/33706/
This series contains proof of concept patches against intel-vaapi-driver to make
use of these proposed i915 features.
I am not familiar with the code base, so there might be smaller or bigger
mistakes here; please be aware.
But in essence the code seems to pass the test shipped with the project and is
also able to utilize the features as intended when run for instance from ffmpeg:
root@sc:~/ffmpeg# VA_INTEL_CONCURRENT=0 perf stat -a -e i915/vcs0-busy/,i915/vcs1-busy/ ffmpeg -loglevel panic -hwaccel vaapi -hwaccel_output_format vaapi -i ~/bbb_sunflower_1080p_60fps_normal.mp4 -f null -
Performance counter stats for 'system wide':
57,568,097,358 ns i915/vcs0-busy/
0 ns i915/vcs1-busy/
57.585753514 seconds time elapsed
root@sc:~/ffmpeg# VA_INTEL_CONCURRENT=1 perf stat -a -e i915/vcs0-busy/,i915/vcs1-busy/ ffmpeg -loglevel panic -hwaccel vaapi -hwaccel_output_format vaapi -i ~/bbb_sunflower_1080p_60fps_normal.mp4 -f null -
Performance counter stats for 'system wide':
29,152,427,164 ns i915/vcs0-busy/
29,115,272,714 ns i915/vcs1-busy/
40.733992298 seconds time elapsed
This shows that when decoding a single stream on a SKL GT4 part, it is able to
utilize both VCS engines for a nice performance improvement (elapsed time drops
from about 57.6 to about 40.7 seconds).
It could also have a beneficial effect when multiple parallel streams are
decoded or encoded, but in this case it will depend on the exact streams,
because the current i915 RFC does only trivial round-robin engine balancing.
In the future we may decide to take the route of making the i915 scheduler
smarter in this respect.
The main question is whether you guys can see any flaws in this approach, in
both design and implementation, and whether you would be interested in having
such extensions in your code base?
Tvrtko Ursulin (9):
Create a dedicated context
Mark the context as concurrent
Use engine class when allocating batch buffers
Use class and instance in batch override
Store class/instance in batch and use for flushing
Use class/instance based execbuf
Store engine features in batch
Feature based VCS override
Use engine features for HEVC decode
src/gen6_mfc.c | 6 +-
src/gen6_mfd.c | 3 +-
src/gen75_mfc.c | 6 +-
src/gen75_mfd.c | 3 +-
src/gen75_vpp_gpe.c | 3 +-
src/gen75_vpp_vebox.c | 3 +-
src/gen7_mfd.c | 3 +-
src/gen8_mfc.c | 10 ++-
src/gen8_mfd.c | 5 +-
src/gen9_hevc_encoder.c | 4 +-
src/gen9_mfc_hevc.c | 7 +-
src/gen9_mfd.c | 11 ++-
src/gen9_vdenc.c | 2 +-
src/gen9_vp9_encoder.c | 2 +-
src/i965_avc_encoder.c | 4 +-
src/i965_drv_video.c | 8 ++-
src/i965_encoder.c | 3 +-
src/i965_encoder_vp8.c | 6 +-
src/i965_media.c | 3 +-
src/i965_post_processing.c | 3 +-
src/intel_batchbuffer.c | 176 +++++++++++++++++++++++++--------------------
src/intel_batchbuffer.h | 99 ++++++++++++++++++++-----
src/intel_driver.c | 10 ++-
src/intel_driver.h | 3 +
src/intel_memman.c | 13 +++-
src/intel_memman.h | 2 +-
26 files changed, 264 insertions(+), 134 deletions(-)
--
2.14.1
4 years, 7 months