libabigail/include/abg-ctf-reader.h

57 lines
1.4 KiB
C
Raw Normal View History

Add support for the CTF debug format to libabigail. CTF (C Type Format) is a lightweight debugging format that provides information about C types and the association between functions and data symbols and types. It is designed to be very compact and simple. More can be learned about it at https://ctfstd.org. This patch introduces support in libabigail to extract ABI information from CTF stored in ELF files. A few notes on this implementation: - The implementation is complete in terms of CTF support. Every CTF feature is processed and handled to generate libabigail IR. This includes basic types, typedefs, pointer, array and struct types. The CTF record of data objects (variables) and functions are also used in order to generate the corresponding libabigail IR artifacts. - The decoding of CTF data is done using the libctf library which is part of binutils. In order to link with it, binutils shall be built with --enable-shared for libctf.so to become available. - This initial implementation is aimed to simplicity. We have not tried to resolve any and every corner case that may require special handling. We have observed that the DWARF front-end (which is naturally way more complex as the scope is way bigger) is plagued with hacks to handle such situations. However, for the CTF support we prefer to proceed in a simpler and more modest way: we will handle these problems if/when we find them. The fact that CTF only supports C (currently) certainly helps there. - Likewise, in this basic support we are not handling symbol suppressions or other goodies that libabigail provides. We are new to libabigail and ABI analysis, and at this point we simply don't have a clear picture about what is most useful/relevant to support or not. With the maintainer's blesssing, we will tackle that functionaly after this basic support is applied upstream. - The implementation in abg-ctf-reader.{cc,h} is pretty much self-contained. As a result there is some duplication in terms of ELF handling with the DWARF reader, but since that logic is very simple and can be easily implemented, we don't consider this to be a big deal (for now.) Hopefully the maintainers agree. - The libabigail tools assume that ELF means to always use DWARF to generate the ABI IR. We added a new command-line option --ctf to the tools in order to make them to use the CTF debug info instead. We are definitely not sure whether this is the best user interface. In fact I would be suprised if it was ;) - We added support for --ctf to both abilint and abidiff. We are not sure whether it would make sense to add support for CTF to the other tools. Feedback welcome. - We are pondering about what to do in terms of testing. We have cursory tested this implementation using abilint and abidiff. We know we are generating IR corpus that seem to be ok. It would be good however to be able to run the libabigail testsuites using CTF. However the testsuites may need some non-trivial changes in order to make this possible. Let's talk about that :) * configure.ac: Check for libctf. * src/abg-ctf-reader.cc: New file. * include/abg-ctf-reader.h: Likewise. * src/Makefile.am (libabigail_la_SOURCES): Add abg-ctf-reader.cc conditionally. * include/Makefile.am (pkginclude_HEADERS): Add abg-ctf-reader.h conditionally. * tools/abilint.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * tools/abidiff.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * doc/manuals/abidiff.rst: Document --ctf. * doc/manuals/abilint.rst: Likewise. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-10-28 22:51:32 +00:00
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// -*- Mode: C++ -*-
//
// Copyright (C) 2021 Oracle, Inc.
//
// Author: Jose E. Marchesi
/// @file
///
/// This file contains the declarations of the entry points to
/// de-serialize an instance of @ref abigail::corpus from a file in
/// elf format, containing CTF information.
#ifndef __ABG_CTF_READER_H__
#define __ABG_CTF_READER_H__
#include <ostream>
#include "abg-corpus.h"
#include "abg-suppression.h"
#include "abg-elf-reader-common.h"
Add support for the CTF debug format to libabigail. CTF (C Type Format) is a lightweight debugging format that provides information about C types and the association between functions and data symbols and types. It is designed to be very compact and simple. More can be learned about it at https://ctfstd.org. This patch introduces support in libabigail to extract ABI information from CTF stored in ELF files. A few notes on this implementation: - The implementation is complete in terms of CTF support. Every CTF feature is processed and handled to generate libabigail IR. This includes basic types, typedefs, pointer, array and struct types. The CTF record of data objects (variables) and functions are also used in order to generate the corresponding libabigail IR artifacts. - The decoding of CTF data is done using the libctf library which is part of binutils. In order to link with it, binutils shall be built with --enable-shared for libctf.so to become available. - This initial implementation is aimed to simplicity. We have not tried to resolve any and every corner case that may require special handling. We have observed that the DWARF front-end (which is naturally way more complex as the scope is way bigger) is plagued with hacks to handle such situations. However, for the CTF support we prefer to proceed in a simpler and more modest way: we will handle these problems if/when we find them. The fact that CTF only supports C (currently) certainly helps there. - Likewise, in this basic support we are not handling symbol suppressions or other goodies that libabigail provides. We are new to libabigail and ABI analysis, and at this point we simply don't have a clear picture about what is most useful/relevant to support or not. With the maintainer's blesssing, we will tackle that functionaly after this basic support is applied upstream. - The implementation in abg-ctf-reader.{cc,h} is pretty much self-contained. As a result there is some duplication in terms of ELF handling with the DWARF reader, but since that logic is very simple and can be easily implemented, we don't consider this to be a big deal (for now.) Hopefully the maintainers agree. - The libabigail tools assume that ELF means to always use DWARF to generate the ABI IR. We added a new command-line option --ctf to the tools in order to make them to use the CTF debug info instead. We are definitely not sure whether this is the best user interface. In fact I would be suprised if it was ;) - We added support for --ctf to both abilint and abidiff. We are not sure whether it would make sense to add support for CTF to the other tools. Feedback welcome. - We are pondering about what to do in terms of testing. We have cursory tested this implementation using abilint and abidiff. We know we are generating IR corpus that seem to be ok. It would be good however to be able to run the libabigail testsuites using CTF. However the testsuites may need some non-trivial changes in order to make this possible. Let's talk about that :) * configure.ac: Check for libctf. * src/abg-ctf-reader.cc: New file. * include/abg-ctf-reader.h: Likewise. * src/Makefile.am (libabigail_la_SOURCES): Add abg-ctf-reader.cc conditionally. * include/Makefile.am (pkginclude_HEADERS): Add abg-ctf-reader.h conditionally. * tools/abilint.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * tools/abidiff.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * doc/manuals/abidiff.rst: Document --ctf. * doc/manuals/abilint.rst: Likewise. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-10-28 22:51:32 +00:00
ctf-reader: Add support to read CTF information from the Linux Kernel This patch is meant to extract ABI information from the CTF data stored in the Linux kernel build directory. It depends on the vmlinux.ctfa archive file. In order to generate the CTF information, the Linux Kernel build system must support the 'make ctf' command, which causes the compiler to be run with -gctf, thus emitting the CTF information for the Kernel. The target 'ctf' in the Linux Makefile generates a 'vmlinux.ctfa' file that will be used by the ctf reader in libabigail. The 'vmlinux.ctfa' archive has multiple 'ctf dictionaries' called "CTF archive members". There is one CTF archive member for built-in kernel modules (in `vmlinux') and one for each out-of-tree kernel module organized in a parent-child hierarchy. There is also a CTF archive member called `shared_ctf' which is a parent dictionary containing shared symbols and CTF types used by more than one kernel object. These common types are stored in 'types_map' in the ctf reader, ignoring the ctf dictionary name. The CTF API has the machinery for looking for a shared type in the parent dictionary referred to in a given child dictionary. This CTF layout can be dumped by using the objdump tool. Due to the fact that the _same_ ctf archive is used to build the vmlinux corpus the corpora of the kernel module (which, by the way, all belong to the same corpus group), the high number of open/close on the CTF archive is very time consuming during the ctf extraction. So, the performance is improved up to 300% (from ~2m:50s to ~50s) by keeping the ctf archive open for a given group, and thus, by using the same ctf_archive_t pointer while building all the various corpora. We just invoke `reset_read_context' for each new corpus. Note that the `read_context::ctfa` data member should be updated if the corpus::origin data member is set to `LINUX_KERNEL_BINARY_ORIGIN' and the file to be process is not 'vmlinux'. Note that `ctf_close' must be called after processing all group's members so it is executed from the destructor of `reader_context'. The basic algorithm used to generate the Linux corpus is the following: 1. Looking for: vmlinux, *.ko objects, and vmlinux.ctfa files. The first files are used to extract the ELF symbols, and the last one contains the CTF type information for non-static variables and functions symbols. 2. `process_ctf_archive' iterates on public symbols for vmlinux and its modules, using the name of the symbol, ctf reader search for CTF information in its dictionary, if the information was found it builds a `var_decl' or `function_decl' depending of `ctf_type_kind' result. This algorithm is also applied to ELF files (exec, dyn, rel), so instead of iterating on all ctf_types it just loops on the public symbols. * abg-elf-reader-common.h: Include ctf-api.h file. (read_and_add_corpus_to_group_from_elf, set_read_context_corpus_group) (reset_read_context, dic_type_key): Declare new member functions. * include/abg-ir.cc (types_defined_same_linux_kernel_corpus_public): Use bitwise to know the corpus `origin'. * src/abg-ctf-reader.cc: Include map, algorithms header files. (read_context::type_map): Change from unordered_map to std::map storing ctf dictionary name as part of the key. (read_context::is_elf_exec): Add new member variable. (read_context::{cur_corpus_, cur_corpus_group_}): Likewise. (read_context::unknown_types_set): Likewise. (read_context::{current_corpus_group, main_corpus_from_current_group, has_corpus_group, current_corpus_is_main_corpus_from_current_group, should_reuse_type_from_corpus_group}): Add new member functions. (read_context::{add_unknown_type, lookup_unknown_type, initialize}): Likewise. (read_context::{add_type, lookup_type}): Add new `ctf_dict_t' type argument. (ctf_reader::{process_ctf_typedef, process_ctf_base_type, process_ctf_function_type, process_ctf_forward_type, process_ctf_struct_type, process_ctf_union_type, process_ctf_array_type, process_ctf_qualified_type, process_ctf_enum_type}): Add code to `reuse' types already registered in main corpus `should_reuse_type_from_corpus_group'. Use new `lookup_type' and `add_type' operations on `read_context::types_map'. Replace function calls to the new ctf interface. Add verifier to not build types duplicated by recursive calling chain. (ctf_reader::process_ctf_type): Add code to return immediately if the ctf type is unknown. Add unknown types to `unknown_types_set'. (ctf_reader::process_ctf_archive): Change comment. Add code to iterate over global symbols, searching by symbol name in the ctf dictionary using `ctf_lookup_{variable,by_symbol_name}' depending of the ELF file type and corpus type, creating a `{var,fuc}_decl' using the return type of `ctf_type_kind'. Also close the ctf dict and call `canonicalize_all_types'. (slurp_elf_info): Set `is_elf_exec' depending of ELF type. Also return success if corpus origin is Linux and symbol table was read. (ctf_reader::read_corpus): Add current corpus. Set corpus origin to `LINUX_KERNEL_BINARY_ORIGIN' if `is_linux_kernel' returns true. Verify the ctf reader status, now the ctf archive is 'opened' using `ctf_arc{open,bufopen}' depending if the corpus origin has `corpus::LINUX_KERNEL_BINARY_ORIGIN' bit set. Use `sort_{function,variables}' calls after extract ctf information. `ctf_close' is called from `read_context' destructor. (read:context::{set_read_context_corpus_group, reset_read_context, read_and_add_corpus_to_group_from_elf, dic_type_key): Add new member function implementation. * include/abg-tools-utils.h (build_corpus_group_from_kernel_dist_under): Add `origin' parameter with default `corpus::DWARF_ORIGIN'. * src/abg-tools-utils.cc: Use `abg-ctf-reader.h' file. (maybe_load_vmlinux_dwarf_corpus): Add new function. (maybe_load_vmlinux_ctf_corpus): Likewise. (build_corpus_group_from_kernel_dist_under): Update comments. Add new `origin' argument. Use `maybe_load_vmlinux_dwarf_corpus' or `maybe_load_vmlinux_ctf_corpus' according to `origin' value. * src/abg-corpus.h (corpus::origin): Update `origin' type values in enum. * src/abg-corpus-priv.h (corpus::priv): Replace `origin' type from `corpus::origin' to `uint32_t'. * src/abg-corpus.cc (corpus::{get,set}_origin): Replace data type from `corpus::origin' to `uint32_t'. * tools/abidw.cc (main): Use of --ctf argument to set format debug. * tests/test-read-ctf.cc: Add new tests to harness. * tests/data/test-read-ctf/test-PR27700.abi: New test expected result. * tests/data/test-read-ctf/test-anonymous-fields.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-many-ctf.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum-many.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum-symbol-ctf.o.hash.abi: Likewise. * tests/data/test-read-common/test-PR26568-2.o: Adjust. * tests/data/test-read-ctf/test-PR26568-1.o.abi: Likewise. * tests/data/test-read-ctf/test-PR26568-2.o.abi: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-A.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.c: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.o: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-array-of-pointers.abi: Likewise. * tests/data/test-read-ctf/test-callback.abi: Likewise. * tests/data/test-read-ctf/test-callback2.abi: Likewise. * tests/data/test-read-ctf/test-conflicting-type-syms-a.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-conflicting-type-syms-b.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-dynamic-array.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-ctf.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-symbol.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum.o.abi: Likewise. * tests/data/test-read-ctf/test-forward-type-decl.abi: Likewise. * tests/data/test-read-ctf/test-functions-declaration.abi: Likewise. * tests/data/test-read-ctf/test-list-struct.abi: Likewise. * tests/data/test-read-ctf/test0: Likewise. * tests/data/test-read-ctf/test0.abi: Likewise. * tests/data/test-read-ctf/test0.c: Likewise. * tests/data/test-read-ctf/test0.hash.abi: Likewise. * tests/data/test-read-ctf/test1.so.abi: Likewise. * tests/data/test-read-ctf/test1.so.hash.abi: Likewise. * tests/data/test-read-ctf/test2.so.abi: Likewise. * tests/data/test-read-ctf/test2.so.hash.abi: Likewise. * tests/data/test-read-ctf/test3.so.abi: Likewise. * tests/data/test-read-ctf/test3.so.hash.abi: Likewise. * tests/data/test-read-ctf/test4.so.abi: Likewise. * tests/data/test-read-ctf/test4.so.hash.abi: Likewise. * tests/data/test-read-ctf/test5.o.abi: Likewise. * tests/data/test-read-ctf/test7.o.abi: Likewise. * tests/data/test-read-ctf/test8.o.abi: Likewise. * tests/data/test-read-ctf/test9.o.abi: Likewise. Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2022-05-04 22:29:30 +00:00
#include "ctf-api.h"
Add support for the CTF debug format to libabigail. CTF (C Type Format) is a lightweight debugging format that provides information about C types and the association between functions and data symbols and types. It is designed to be very compact and simple. More can be learned about it at https://ctfstd.org. This patch introduces support in libabigail to extract ABI information from CTF stored in ELF files. A few notes on this implementation: - The implementation is complete in terms of CTF support. Every CTF feature is processed and handled to generate libabigail IR. This includes basic types, typedefs, pointer, array and struct types. The CTF record of data objects (variables) and functions are also used in order to generate the corresponding libabigail IR artifacts. - The decoding of CTF data is done using the libctf library which is part of binutils. In order to link with it, binutils shall be built with --enable-shared for libctf.so to become available. - This initial implementation is aimed to simplicity. We have not tried to resolve any and every corner case that may require special handling. We have observed that the DWARF front-end (which is naturally way more complex as the scope is way bigger) is plagued with hacks to handle such situations. However, for the CTF support we prefer to proceed in a simpler and more modest way: we will handle these problems if/when we find them. The fact that CTF only supports C (currently) certainly helps there. - Likewise, in this basic support we are not handling symbol suppressions or other goodies that libabigail provides. We are new to libabigail and ABI analysis, and at this point we simply don't have a clear picture about what is most useful/relevant to support or not. With the maintainer's blesssing, we will tackle that functionaly after this basic support is applied upstream. - The implementation in abg-ctf-reader.{cc,h} is pretty much self-contained. As a result there is some duplication in terms of ELF handling with the DWARF reader, but since that logic is very simple and can be easily implemented, we don't consider this to be a big deal (for now.) Hopefully the maintainers agree. - The libabigail tools assume that ELF means to always use DWARF to generate the ABI IR. We added a new command-line option --ctf to the tools in order to make them to use the CTF debug info instead. We are definitely not sure whether this is the best user interface. In fact I would be suprised if it was ;) - We added support for --ctf to both abilint and abidiff. We are not sure whether it would make sense to add support for CTF to the other tools. Feedback welcome. - We are pondering about what to do in terms of testing. We have cursory tested this implementation using abilint and abidiff. We know we are generating IR corpus that seem to be ok. It would be good however to be able to run the libabigail testsuites using CTF. However the testsuites may need some non-trivial changes in order to make this possible. Let's talk about that :) * configure.ac: Check for libctf. * src/abg-ctf-reader.cc: New file. * include/abg-ctf-reader.h: Likewise. * src/Makefile.am (libabigail_la_SOURCES): Add abg-ctf-reader.cc conditionally. * include/Makefile.am (pkginclude_HEADERS): Add abg-ctf-reader.h conditionally. * tools/abilint.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * tools/abidiff.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * doc/manuals/abidiff.rst: Document --ctf. * doc/manuals/abilint.rst: Likewise. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-10-28 22:51:32 +00:00
namespace abigail
{
namespace ctf_reader
{
class read_context;
typedef shared_ptr<read_context> read_context_sptr;
Add support for the CTF debug format to libabigail. CTF (C Type Format) is a lightweight debugging format that provides information about C types and the association between functions and data symbols and types. It is designed to be very compact and simple. More can be learned about it at https://ctfstd.org. This patch introduces support in libabigail to extract ABI information from CTF stored in ELF files. A few notes on this implementation: - The implementation is complete in terms of CTF support. Every CTF feature is processed and handled to generate libabigail IR. This includes basic types, typedefs, pointer, array and struct types. The CTF record of data objects (variables) and functions are also used in order to generate the corresponding libabigail IR artifacts. - The decoding of CTF data is done using the libctf library which is part of binutils. In order to link with it, binutils shall be built with --enable-shared for libctf.so to become available. - This initial implementation is aimed to simplicity. We have not tried to resolve any and every corner case that may require special handling. We have observed that the DWARF front-end (which is naturally way more complex as the scope is way bigger) is plagued with hacks to handle such situations. However, for the CTF support we prefer to proceed in a simpler and more modest way: we will handle these problems if/when we find them. The fact that CTF only supports C (currently) certainly helps there. - Likewise, in this basic support we are not handling symbol suppressions or other goodies that libabigail provides. We are new to libabigail and ABI analysis, and at this point we simply don't have a clear picture about what is most useful/relevant to support or not. With the maintainer's blesssing, we will tackle that functionaly after this basic support is applied upstream. - The implementation in abg-ctf-reader.{cc,h} is pretty much self-contained. As a result there is some duplication in terms of ELF handling with the DWARF reader, but since that logic is very simple and can be easily implemented, we don't consider this to be a big deal (for now.) Hopefully the maintainers agree. - The libabigail tools assume that ELF means to always use DWARF to generate the ABI IR. We added a new command-line option --ctf to the tools in order to make them to use the CTF debug info instead. We are definitely not sure whether this is the best user interface. In fact I would be suprised if it was ;) - We added support for --ctf to both abilint and abidiff. We are not sure whether it would make sense to add support for CTF to the other tools. Feedback welcome. - We are pondering about what to do in terms of testing. We have cursory tested this implementation using abilint and abidiff. We know we are generating IR corpus that seem to be ok. It would be good however to be able to run the libabigail testsuites using CTF. However the testsuites may need some non-trivial changes in order to make this possible. Let's talk about that :) * configure.ac: Check for libctf. * src/abg-ctf-reader.cc: New file. * include/abg-ctf-reader.h: Likewise. * src/Makefile.am (libabigail_la_SOURCES): Add abg-ctf-reader.cc conditionally. * include/Makefile.am (pkginclude_HEADERS): Add abg-ctf-reader.h conditionally. * tools/abilint.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * tools/abidiff.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * doc/manuals/abidiff.rst: Document --ctf. * doc/manuals/abilint.rst: Likewise. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-10-28 22:51:32 +00:00
read_context_sptr
create_read_context(const std::string& elf_path,
ir::environment *env);
corpus_sptr
read_corpus(read_context *ctxt, elf_reader::status& status);
ctf-reader: Add support to read CTF information from the Linux Kernel This patch is meant to extract ABI information from the CTF data stored in the Linux kernel build directory. It depends on the vmlinux.ctfa archive file. In order to generate the CTF information, the Linux Kernel build system must support the 'make ctf' command, which causes the compiler to be run with -gctf, thus emitting the CTF information for the Kernel. The target 'ctf' in the Linux Makefile generates a 'vmlinux.ctfa' file that will be used by the ctf reader in libabigail. The 'vmlinux.ctfa' archive has multiple 'ctf dictionaries' called "CTF archive members". There is one CTF archive member for built-in kernel modules (in `vmlinux') and one for each out-of-tree kernel module organized in a parent-child hierarchy. There is also a CTF archive member called `shared_ctf' which is a parent dictionary containing shared symbols and CTF types used by more than one kernel object. These common types are stored in 'types_map' in the ctf reader, ignoring the ctf dictionary name. The CTF API has the machinery for looking for a shared type in the parent dictionary referred to in a given child dictionary. This CTF layout can be dumped by using the objdump tool. Due to the fact that the _same_ ctf archive is used to build the vmlinux corpus the corpora of the kernel module (which, by the way, all belong to the same corpus group), the high number of open/close on the CTF archive is very time consuming during the ctf extraction. So, the performance is improved up to 300% (from ~2m:50s to ~50s) by keeping the ctf archive open for a given group, and thus, by using the same ctf_archive_t pointer while building all the various corpora. We just invoke `reset_read_context' for each new corpus. Note that the `read_context::ctfa` data member should be updated if the corpus::origin data member is set to `LINUX_KERNEL_BINARY_ORIGIN' and the file to be process is not 'vmlinux'. Note that `ctf_close' must be called after processing all group's members so it is executed from the destructor of `reader_context'. The basic algorithm used to generate the Linux corpus is the following: 1. Looking for: vmlinux, *.ko objects, and vmlinux.ctfa files. The first files are used to extract the ELF symbols, and the last one contains the CTF type information for non-static variables and functions symbols. 2. `process_ctf_archive' iterates on public symbols for vmlinux and its modules, using the name of the symbol, ctf reader search for CTF information in its dictionary, if the information was found it builds a `var_decl' or `function_decl' depending of `ctf_type_kind' result. This algorithm is also applied to ELF files (exec, dyn, rel), so instead of iterating on all ctf_types it just loops on the public symbols. * abg-elf-reader-common.h: Include ctf-api.h file. (read_and_add_corpus_to_group_from_elf, set_read_context_corpus_group) (reset_read_context, dic_type_key): Declare new member functions. * include/abg-ir.cc (types_defined_same_linux_kernel_corpus_public): Use bitwise to know the corpus `origin'. * src/abg-ctf-reader.cc: Include map, algorithms header files. (read_context::type_map): Change from unordered_map to std::map storing ctf dictionary name as part of the key. (read_context::is_elf_exec): Add new member variable. (read_context::{cur_corpus_, cur_corpus_group_}): Likewise. (read_context::unknown_types_set): Likewise. (read_context::{current_corpus_group, main_corpus_from_current_group, has_corpus_group, current_corpus_is_main_corpus_from_current_group, should_reuse_type_from_corpus_group}): Add new member functions. (read_context::{add_unknown_type, lookup_unknown_type, initialize}): Likewise. (read_context::{add_type, lookup_type}): Add new `ctf_dict_t' type argument. (ctf_reader::{process_ctf_typedef, process_ctf_base_type, process_ctf_function_type, process_ctf_forward_type, process_ctf_struct_type, process_ctf_union_type, process_ctf_array_type, process_ctf_qualified_type, process_ctf_enum_type}): Add code to `reuse' types already registered in main corpus `should_reuse_type_from_corpus_group'. Use new `lookup_type' and `add_type' operations on `read_context::types_map'. Replace function calls to the new ctf interface. Add verifier to not build types duplicated by recursive calling chain. (ctf_reader::process_ctf_type): Add code to return immediately if the ctf type is unknown. Add unknown types to `unknown_types_set'. (ctf_reader::process_ctf_archive): Change comment. Add code to iterate over global symbols, searching by symbol name in the ctf dictionary using `ctf_lookup_{variable,by_symbol_name}' depending of the ELF file type and corpus type, creating a `{var,fuc}_decl' using the return type of `ctf_type_kind'. Also close the ctf dict and call `canonicalize_all_types'. (slurp_elf_info): Set `is_elf_exec' depending of ELF type. Also return success if corpus origin is Linux and symbol table was read. (ctf_reader::read_corpus): Add current corpus. Set corpus origin to `LINUX_KERNEL_BINARY_ORIGIN' if `is_linux_kernel' returns true. Verify the ctf reader status, now the ctf archive is 'opened' using `ctf_arc{open,bufopen}' depending if the corpus origin has `corpus::LINUX_KERNEL_BINARY_ORIGIN' bit set. Use `sort_{function,variables}' calls after extract ctf information. `ctf_close' is called from `read_context' destructor. (read:context::{set_read_context_corpus_group, reset_read_context, read_and_add_corpus_to_group_from_elf, dic_type_key): Add new member function implementation. * include/abg-tools-utils.h (build_corpus_group_from_kernel_dist_under): Add `origin' parameter with default `corpus::DWARF_ORIGIN'. * src/abg-tools-utils.cc: Use `abg-ctf-reader.h' file. (maybe_load_vmlinux_dwarf_corpus): Add new function. (maybe_load_vmlinux_ctf_corpus): Likewise. (build_corpus_group_from_kernel_dist_under): Update comments. Add new `origin' argument. Use `maybe_load_vmlinux_dwarf_corpus' or `maybe_load_vmlinux_ctf_corpus' according to `origin' value. * src/abg-corpus.h (corpus::origin): Update `origin' type values in enum. * src/abg-corpus-priv.h (corpus::priv): Replace `origin' type from `corpus::origin' to `uint32_t'. * src/abg-corpus.cc (corpus::{get,set}_origin): Replace data type from `corpus::origin' to `uint32_t'. * tools/abidw.cc (main): Use of --ctf argument to set format debug. * tests/test-read-ctf.cc: Add new tests to harness. * tests/data/test-read-ctf/test-PR27700.abi: New test expected result. * tests/data/test-read-ctf/test-anonymous-fields.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-many-ctf.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum-many.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum-symbol-ctf.o.hash.abi: Likewise. * tests/data/test-read-common/test-PR26568-2.o: Adjust. * tests/data/test-read-ctf/test-PR26568-1.o.abi: Likewise. * tests/data/test-read-ctf/test-PR26568-2.o.abi: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-A.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.c: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.o: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-array-of-pointers.abi: Likewise. * tests/data/test-read-ctf/test-callback.abi: Likewise. * tests/data/test-read-ctf/test-callback2.abi: Likewise. * tests/data/test-read-ctf/test-conflicting-type-syms-a.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-conflicting-type-syms-b.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-dynamic-array.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-ctf.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-symbol.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum.o.abi: Likewise. * tests/data/test-read-ctf/test-forward-type-decl.abi: Likewise. * tests/data/test-read-ctf/test-functions-declaration.abi: Likewise. * tests/data/test-read-ctf/test-list-struct.abi: Likewise. * tests/data/test-read-ctf/test0: Likewise. * tests/data/test-read-ctf/test0.abi: Likewise. * tests/data/test-read-ctf/test0.c: Likewise. * tests/data/test-read-ctf/test0.hash.abi: Likewise. * tests/data/test-read-ctf/test1.so.abi: Likewise. * tests/data/test-read-ctf/test1.so.hash.abi: Likewise. * tests/data/test-read-ctf/test2.so.abi: Likewise. * tests/data/test-read-ctf/test2.so.hash.abi: Likewise. * tests/data/test-read-ctf/test3.so.abi: Likewise. * tests/data/test-read-ctf/test3.so.hash.abi: Likewise. * tests/data/test-read-ctf/test4.so.abi: Likewise. * tests/data/test-read-ctf/test4.so.hash.abi: Likewise. * tests/data/test-read-ctf/test5.o.abi: Likewise. * tests/data/test-read-ctf/test7.o.abi: Likewise. * tests/data/test-read-ctf/test8.o.abi: Likewise. * tests/data/test-read-ctf/test9.o.abi: Likewise. Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2022-05-04 22:29:30 +00:00
corpus_sptr
read_corpus(const read_context_sptr &ctxt, elf_reader::status &status);
ctf-reader: Add support to read CTF information from the Linux Kernel This patch is meant to extract ABI information from the CTF data stored in the Linux kernel build directory. It depends on the vmlinux.ctfa archive file. In order to generate the CTF information, the Linux Kernel build system must support the 'make ctf' command, which causes the compiler to be run with -gctf, thus emitting the CTF information for the Kernel. The target 'ctf' in the Linux Makefile generates a 'vmlinux.ctfa' file that will be used by the ctf reader in libabigail. The 'vmlinux.ctfa' archive has multiple 'ctf dictionaries' called "CTF archive members". There is one CTF archive member for built-in kernel modules (in `vmlinux') and one for each out-of-tree kernel module organized in a parent-child hierarchy. There is also a CTF archive member called `shared_ctf' which is a parent dictionary containing shared symbols and CTF types used by more than one kernel object. These common types are stored in 'types_map' in the ctf reader, ignoring the ctf dictionary name. The CTF API has the machinery for looking for a shared type in the parent dictionary referred to in a given child dictionary. This CTF layout can be dumped by using the objdump tool. Due to the fact that the _same_ ctf archive is used to build the vmlinux corpus the corpora of the kernel module (which, by the way, all belong to the same corpus group), the high number of open/close on the CTF archive is very time consuming during the ctf extraction. So, the performance is improved up to 300% (from ~2m:50s to ~50s) by keeping the ctf archive open for a given group, and thus, by using the same ctf_archive_t pointer while building all the various corpora. We just invoke `reset_read_context' for each new corpus. Note that the `read_context::ctfa` data member should be updated if the corpus::origin data member is set to `LINUX_KERNEL_BINARY_ORIGIN' and the file to be process is not 'vmlinux'. Note that `ctf_close' must be called after processing all group's members so it is executed from the destructor of `reader_context'. The basic algorithm used to generate the Linux corpus is the following: 1. Looking for: vmlinux, *.ko objects, and vmlinux.ctfa files. The first files are used to extract the ELF symbols, and the last one contains the CTF type information for non-static variables and functions symbols. 2. `process_ctf_archive' iterates on public symbols for vmlinux and its modules, using the name of the symbol, ctf reader search for CTF information in its dictionary, if the information was found it builds a `var_decl' or `function_decl' depending of `ctf_type_kind' result. This algorithm is also applied to ELF files (exec, dyn, rel), so instead of iterating on all ctf_types it just loops on the public symbols. * abg-elf-reader-common.h: Include ctf-api.h file. (read_and_add_corpus_to_group_from_elf, set_read_context_corpus_group) (reset_read_context, dic_type_key): Declare new member functions. * include/abg-ir.cc (types_defined_same_linux_kernel_corpus_public): Use bitwise to know the corpus `origin'. * src/abg-ctf-reader.cc: Include map, algorithms header files. (read_context::type_map): Change from unordered_map to std::map storing ctf dictionary name as part of the key. (read_context::is_elf_exec): Add new member variable. (read_context::{cur_corpus_, cur_corpus_group_}): Likewise. (read_context::unknown_types_set): Likewise. (read_context::{current_corpus_group, main_corpus_from_current_group, has_corpus_group, current_corpus_is_main_corpus_from_current_group, should_reuse_type_from_corpus_group}): Add new member functions. (read_context::{add_unknown_type, lookup_unknown_type, initialize}): Likewise. (read_context::{add_type, lookup_type}): Add new `ctf_dict_t' type argument. (ctf_reader::{process_ctf_typedef, process_ctf_base_type, process_ctf_function_type, process_ctf_forward_type, process_ctf_struct_type, process_ctf_union_type, process_ctf_array_type, process_ctf_qualified_type, process_ctf_enum_type}): Add code to `reuse' types already registered in main corpus `should_reuse_type_from_corpus_group'. Use new `lookup_type' and `add_type' operations on `read_context::types_map'. Replace function calls to the new ctf interface. Add verifier to not build types duplicated by recursive calling chain. (ctf_reader::process_ctf_type): Add code to return immediately if the ctf type is unknown. Add unknown types to `unknown_types_set'. (ctf_reader::process_ctf_archive): Change comment. Add code to iterate over global symbols, searching by symbol name in the ctf dictionary using `ctf_lookup_{variable,by_symbol_name}' depending of the ELF file type and corpus type, creating a `{var,fuc}_decl' using the return type of `ctf_type_kind'. Also close the ctf dict and call `canonicalize_all_types'. (slurp_elf_info): Set `is_elf_exec' depending of ELF type. Also return success if corpus origin is Linux and symbol table was read. (ctf_reader::read_corpus): Add current corpus. Set corpus origin to `LINUX_KERNEL_BINARY_ORIGIN' if `is_linux_kernel' returns true. Verify the ctf reader status, now the ctf archive is 'opened' using `ctf_arc{open,bufopen}' depending if the corpus origin has `corpus::LINUX_KERNEL_BINARY_ORIGIN' bit set. Use `sort_{function,variables}' calls after extract ctf information. `ctf_close' is called from `read_context' destructor. (read:context::{set_read_context_corpus_group, reset_read_context, read_and_add_corpus_to_group_from_elf, dic_type_key): Add new member function implementation. * include/abg-tools-utils.h (build_corpus_group_from_kernel_dist_under): Add `origin' parameter with default `corpus::DWARF_ORIGIN'. * src/abg-tools-utils.cc: Use `abg-ctf-reader.h' file. (maybe_load_vmlinux_dwarf_corpus): Add new function. (maybe_load_vmlinux_ctf_corpus): Likewise. (build_corpus_group_from_kernel_dist_under): Update comments. Add new `origin' argument. Use `maybe_load_vmlinux_dwarf_corpus' or `maybe_load_vmlinux_ctf_corpus' according to `origin' value. * src/abg-corpus.h (corpus::origin): Update `origin' type values in enum. * src/abg-corpus-priv.h (corpus::priv): Replace `origin' type from `corpus::origin' to `uint32_t'. * src/abg-corpus.cc (corpus::{get,set}_origin): Replace data type from `corpus::origin' to `uint32_t'. * tools/abidw.cc (main): Use of --ctf argument to set format debug. * tests/test-read-ctf.cc: Add new tests to harness. * tests/data/test-read-ctf/test-PR27700.abi: New test expected result. * tests/data/test-read-ctf/test-anonymous-fields.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-many-ctf.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum-many.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum-symbol-ctf.o.hash.abi: Likewise. * tests/data/test-read-common/test-PR26568-2.o: Adjust. * tests/data/test-read-ctf/test-PR26568-1.o.abi: Likewise. * tests/data/test-read-ctf/test-PR26568-2.o.abi: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-A.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.c: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.o: Likewise. * tests/data/test-read-ctf/test-ambiguous-struct-B.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-array-of-pointers.abi: Likewise. * tests/data/test-read-ctf/test-callback.abi: Likewise. * tests/data/test-read-ctf/test-callback2.abi: Likewise. * tests/data/test-read-ctf/test-conflicting-type-syms-a.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-conflicting-type-syms-b.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-dynamic-array.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-ctf.o.abi: Likewise. * tests/data/test-read-ctf/test-enum-symbol.o.hash.abi: Likewise. * tests/data/test-read-ctf/test-enum.o.abi: Likewise. * tests/data/test-read-ctf/test-forward-type-decl.abi: Likewise. * tests/data/test-read-ctf/test-functions-declaration.abi: Likewise. * tests/data/test-read-ctf/test-list-struct.abi: Likewise. * tests/data/test-read-ctf/test0: Likewise. * tests/data/test-read-ctf/test0.abi: Likewise. * tests/data/test-read-ctf/test0.c: Likewise. * tests/data/test-read-ctf/test0.hash.abi: Likewise. * tests/data/test-read-ctf/test1.so.abi: Likewise. * tests/data/test-read-ctf/test1.so.hash.abi: Likewise. * tests/data/test-read-ctf/test2.so.abi: Likewise. * tests/data/test-read-ctf/test2.so.hash.abi: Likewise. * tests/data/test-read-ctf/test3.so.abi: Likewise. * tests/data/test-read-ctf/test3.so.hash.abi: Likewise. * tests/data/test-read-ctf/test4.so.abi: Likewise. * tests/data/test-read-ctf/test4.so.hash.abi: Likewise. * tests/data/test-read-ctf/test5.o.abi: Likewise. * tests/data/test-read-ctf/test7.o.abi: Likewise. * tests/data/test-read-ctf/test8.o.abi: Likewise. * tests/data/test-read-ctf/test9.o.abi: Likewise. Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2022-05-04 22:29:30 +00:00
corpus_sptr
read_and_add_corpus_to_group_from_elf(read_context*, corpus_group&, elf_reader::status&);
void
set_read_context_corpus_group(read_context& ctxt, corpus_group_sptr& group);
void
reset_read_context(read_context_sptr &ctxt,
const std::string& elf_path,
ir::environment* environment);
std::string
dic_type_key(ctf_dict_t *dic, ctf_id_t ctf_type);
Add support for the CTF debug format to libabigail. CTF (C Type Format) is a lightweight debugging format that provides information about C types and the association between functions and data symbols and types. It is designed to be very compact and simple. More can be learned about it at https://ctfstd.org. This patch introduces support in libabigail to extract ABI information from CTF stored in ELF files. A few notes on this implementation: - The implementation is complete in terms of CTF support. Every CTF feature is processed and handled to generate libabigail IR. This includes basic types, typedefs, pointer, array and struct types. The CTF record of data objects (variables) and functions are also used in order to generate the corresponding libabigail IR artifacts. - The decoding of CTF data is done using the libctf library which is part of binutils. In order to link with it, binutils shall be built with --enable-shared for libctf.so to become available. - This initial implementation is aimed to simplicity. We have not tried to resolve any and every corner case that may require special handling. We have observed that the DWARF front-end (which is naturally way more complex as the scope is way bigger) is plagued with hacks to handle such situations. However, for the CTF support we prefer to proceed in a simpler and more modest way: we will handle these problems if/when we find them. The fact that CTF only supports C (currently) certainly helps there. - Likewise, in this basic support we are not handling symbol suppressions or other goodies that libabigail provides. We are new to libabigail and ABI analysis, and at this point we simply don't have a clear picture about what is most useful/relevant to support or not. With the maintainer's blesssing, we will tackle that functionaly after this basic support is applied upstream. - The implementation in abg-ctf-reader.{cc,h} is pretty much self-contained. As a result there is some duplication in terms of ELF handling with the DWARF reader, but since that logic is very simple and can be easily implemented, we don't consider this to be a big deal (for now.) Hopefully the maintainers agree. - The libabigail tools assume that ELF means to always use DWARF to generate the ABI IR. We added a new command-line option --ctf to the tools in order to make them to use the CTF debug info instead. We are definitely not sure whether this is the best user interface. In fact I would be suprised if it was ;) - We added support for --ctf to both abilint and abidiff. We are not sure whether it would make sense to add support for CTF to the other tools. Feedback welcome. - We are pondering about what to do in terms of testing. We have cursory tested this implementation using abilint and abidiff. We know we are generating IR corpus that seem to be ok. It would be good however to be able to run the libabigail testsuites using CTF. However the testsuites may need some non-trivial changes in order to make this possible. Let's talk about that :) * configure.ac: Check for libctf. * src/abg-ctf-reader.cc: New file. * include/abg-ctf-reader.h: Likewise. * src/Makefile.am (libabigail_la_SOURCES): Add abg-ctf-reader.cc conditionally. * include/Makefile.am (pkginclude_HEADERS): Add abg-ctf-reader.h conditionally. * tools/abilint.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * tools/abidiff.cc (struct options): New option `use_ctf'. (display_usage): Documentation for --ctf. (parse_command_line): Handle --ctf. (main): Honour --ctf. * doc/manuals/abidiff.rst: Document --ctf. * doc/manuals/abilint.rst: Likewise. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-10-28 22:51:32 +00:00
} // end namespace ctf_reader
} // end namespace abigail
#endif // ! __ABG_CTF_READER_H__