Commit Graph

2286 Commits

Author SHA1 Message Date
Dodji Seketeli
d1a8eae8ed ir: Avoid infinite loop during type canonicalization
While looking at something else, I noticed an occurrence of infinite
loop during type canonicalization, especially when cancelling
canonical type propagation on some types.

Fixed thus.

This helps address https://bugzilla.redhat.com/show_bug.cgi?id=1951501

	* src/abg-ir-priv.h
	(environment::priv::collect_types_that_depends_on): Don't try to
	collect a type that has already been collected.
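
The fix is the classic visited-set guard for walking a graph that may contain cycles. Here is a minimal C++ sketch of that idea, not libabigail's code. Types are reduced to integer ids, and the `dependents` map and the function name are hypothetical.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical dependency graph: dependents[t] lists the types that
// directly depend on type t.  Recursive types make cycles possible.
using type_id = int;
using dependency_map = std::unordered_map<type_id, std::vector<type_id>>;

// Collect every type that (transitively) depends on 'target'.
// Skipping a type that is already in 'collected' is what keeps the
// walk from looping forever when the graph contains a cycle.
void
collect_types_that_depend_on(type_id target,
                             const dependency_map& dependents,
                             std::unordered_set<type_id>& collected)
{
  auto it = dependents.find(target);
  if (it == dependents.end())
    return;
  for (type_id t : it->second)
    {
      // insert().second is false when t was already collected.
      if (!collected.insert(t).second)
        continue;
      collect_types_that_depend_on(t, dependents, collected);
    }
}
```

On a cycle 1 -> 2 -> 3 -> 1, the walk stops as soon as it meets an already-collected type instead of going around forever.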

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-09-08 16:12:06 +02:00
Dodji Seketeli
4e029df894 writer: escape enum linkage name in abixml
While looking at something else, I stumbled across this bug where the
linkage names of enums are not escaped in abixml.  So "forbidden"
characters like '<' can sneak in.

Fixed thus.

This helps address https://bugzilla.redhat.com/show_bug.cgi?id=1951501

	* src/abg-writer.cc (write_enum_type_decl): Escape linkage name.
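
For illustration, escaping an attribute value means replacing XML's special characters with their predefined entity references. The sketch below is a standalone function named `escape_xml_attribute`, which is a hypothetical name and not the helper the writer actually uses.

```cpp
#include <cassert>
#include <string>

// Replace the characters that may not appear verbatim inside an XML
// attribute value with their predefined entity references.
std::string
escape_xml_attribute(const std::string& in)
{
  std::string out;
  out.reserve(in.size());
  for (char c : in)
    switch (c)
      {
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '&': out += "&amp;"; break;
      case '"': out += "&quot;"; break;
      case '\'': out += "&apos;"; break;
      default: out += c; break;
      }
  return out;
}
```

With this, a linkage name such as `foo<int>` is written as `foo&lt;int&gt;`.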
2021-09-08 16:11:58 +02:00
Dodji Seketeli
022faf705f RHBZ-1944096 - assertion failure during self comparison of systemd
When reading the abixml representing an enumerator whose value is
exactly either LLONG_MIN or LLONG_MAX, build_enum_type_decl fails
because we wrongly think that an underflow or overflow happened while
using strtoll.

This patch fixes the condition used to detect {under,over}flow
when using strtoll.

	* src/abg-reader.cc (build_enum_type_decl): When strtoll detects
	an underflow or overflow, it sets errno to ERANGE.  So take that
	into account.
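
The point is that LLONG_MIN and LLONG_MAX are legitimate results of strtoll. Only errno == ERANGE, after errno has been reset to 0 before the call, reliably signals an out-of-range value. Here is a minimal sketch of that check; the function name `parse_enumerator_value` is hypothetical, not the reader's code.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Parse a decimal long long.  LLONG_MIN and LLONG_MAX are valid
// results; only errno == ERANGE signals a real under/overflow.
bool
parse_enumerator_value(const std::string& s, long long& value)
{
  const char* begin = s.c_str();
  char* end = nullptr;
  errno = 0; // strtoll does not reset errno on success.
  long long v = std::strtoll(begin, &end, 10);
  if (end == begin || *end != '\0')
    return false; // No digits, or trailing garbage.
  if (errno == ERANGE)
    return false; // Genuine underflow or overflow.
  value = v;
  return true;
}
```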

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-09-08 16:11:09 +02:00
Dodji Seketeli
190350a35f Bug 27985 - abidiff: bad array types in report
Reporting the change in array type exhibits a glitch in the type name.
As the bug report says:

	The resulting abidiff output contains:

	                type of 'int numbers[2]' changed:
			 type name changed from 'void[2]' to 'void[3]'
			 array type size changed from 64 to 96
			 array type subrange 1 changed length from 2 to 3

	instead of

	                type of 'int numbers[2]' changed:
			 type name changed from 'int[2]' to 'int[3]'
			 array type size changed from 64 to 96
			 array type subrange 1 changed length from 2 to 3

The problem comes from array_type_def::get_qualified_name() where we
fail to generate a "new" qualified name once the type of the array is
canonicalized.

Fixed thus.

	* src/abg-ir.cc (array_type_def::get_qualified_name): Use the
	cache for temporary qualified names when the type is not yet
	canonicalized.  That way, the cache for (non-temporary) qualified
	names is used only for canonicalized types.
	* tests/data/test-abidiff/test-PR27985-report.txt: Reference
	output for the new test.
	* tests/data/test-abidiff/test-PR27985-v{0,1}.c: Source code for
	the new test binary inputs.
	* tests/data/test-abidiff/test-PR27985-v{0,1}.o: New test binary inputs.
	* tests/data/test-abidiff/test-PR27985-v{0,1}.o.abi: New test
	abixml input.
	* tests/data/Makefile.am: Add the new test materials above to
	source distribution.
	* tests/test-abidiff.cc (specs): Add the tests above to the harness.
	* tests/data/test-diff-pkg/nss-3.23.0-1.0.fc23.x86_64-report-0.txt:
	Adjust.
	* tests/data/test-abidiff-exit/qualifier-typedef-array-report-1.txt:
	Adjust.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-09-03 11:08:01 +02:00
Giuliano Procida
0907d84aef abg-writer: faster referenced type emission tests
When determining whether a referenced type should be emitted, various
tests are done:

- has the type been emitted already? hash table lookup
- does the translation unit match? string comparison
- is this the last translation unit? read bool variable

The translation unit tests were added in recent commits and followed
the hash table lookups. This resulted in a performance regression
affecting Android continuous integration tests.

The lookups require a hash calculation and an equality check if the
hash is present. The equality checks are expensive deep equalities
rather than pointer comparisons.

This change reorders the tests so that the lookups happen last. This
speeds up abidw by more than a factor of 10 for one Android library.
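
The principle is short-circuit ordering: run the cheap tests first, so that the expensive lookup only runs when its answer can still change the outcome. Below is a sketch under simplifying assumptions. The `emitted_types` wrapper, which counts its lookups, and `should_emit_referenced_type` are hypothetical stand-ins for the writer's data structures.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the costly "already emitted?" lookup
// (hashing plus deep equality); it counts how often it is invoked.
struct emitted_types
{
  std::unordered_set<std::string> set;
  mutable int lookups = 0;
  bool contains(const std::string& t) const
  {
    ++lookups;
    return set.count(t) != 0;
  }
};

// Cheap tests first (bool read, string comparison), lookup last.
bool
should_emit_referenced_type(const std::string& type_name,
                            const std::string& type_tu,
                            const std::string& current_tu,
                            bool is_last_tu,
                            const emitted_types& emitted)
{
  if (!is_last_tu && type_tu != current_tu)
    return false; // Decided without touching the hash table.
  return !emitted.contains(type_name);
}
```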

	* src/abg-writer.cc (write_translation_unit): Reorder
	referenced type emission tests for efficiency. Consolidate
	related comments.

Signed-off-by: Giuliano Procida <gprocida@google.com>
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-09-01 11:58:29 +02:00
Dodji Seketeli
ca08bae742 RHBZ 1925886 - Compare anonymous types without qualified names
An anonymous struct/union is, by definition, an entity that is not
named (unless a naming typedef is provided for it).

It turns out that in C++ binaries, there are anonymous types that are
logically equivalent (as far as ABI is concerned) because they have
the same members and layout, but are nonetheless evaluated as being
different because they are defined in different namespaces.  And
because they are not named, showing them as being different just
because of their namespace brings nothing but spurious error
reporting.

Consider the DWARF representing this:

struct S
{
  union
  {
   int a;
   int b;
  } member;
};

where the 'member' is of type S::<anonymous-union>.

Probably due to LTO, we see some DWARF that represents the type of
'member' as just <anonymous-union>, in some translation units.

I could not generate that DWARF from a small test case, myself.  But
it comes from the binary 'usr/bin/lto-dump', from the
https://bugzilla.redhat.com/show_bug.cgi?id=1925886 problem report.

So in that case, we want the S::<anonymous-union> to compare equal to
the <anonymous-union>, otherwise, this produces spurious type changes,
especially when doing self comparison.

This is what this patch does.
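
In other words, the enclosing scope takes part in the comparison of named types but not of anonymous ones. The simplified model below is hypothetical: members are flattened to strings and stand in for the full structural comparison.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified type representation.
struct type_desc
{
  std::string scope;                // e.g. "S" for S::<anonymous-union>
  std::string name;                 // empty for anonymous types
  bool is_anonymous;
  std::vector<std::string> members; // flattened member layout
};

// For anonymous types the enclosing scope is ignored: only the
// members (and thus the layout) are compared.
bool
types_equal(const type_desc& l, const type_desc& r)
{
  if (l.is_anonymous != r.is_anonymous)
    return false;
  if (!l.is_anonymous && (l.scope != r.scope || l.name != r.name))
    return false;
  return l.members == r.members;
}
```

So S::<anonymous-union> and a top-level <anonymous-union> with the same members compare equal, while two named types still differ by scope.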

	* include/abg-fwd.h (is_anonymous_type): Constify this function.
	* src/abg-ir.cc (equals): In the overload for decl_base, do not
	take scope of anonymous types into account.  In the overload for
	array_type_def do not peel off typedefs.  This is not directly
	related to anonymous types, but it makes comparison more robust
	against naming typedefs used for anonymous types in array
	elements.
	(get_type_name): Do not take into account the scope of anonymous
	types when building internal representation of types.  Note that
	the internal representation is what is used for canonicalization.
	This means that all anonymous types are compared against each
	other during type canonicalization.
	* src/abg-reader.cc (build_class_decl): Do not try to re-use
	anonymous types, just like we already do for DWARF.
	* tests/data/test-annotate/test17-pr19027.so.abi: Adjust.
	* tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
	Likewise.
	* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt:
	Likewise.
	* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-1.txt:
	Likewise.
	* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
	* tests/data/test-read-dwarf/test-libaaudio.so.abi: Likewise.
	* tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
	* tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi: Likewise.
	* tests/data/test-read-dwarf/test11-pr18828.so.abi: Likewise.
	* tests/data/test-read-dwarf/test12-pr18844.so.abi: Likewise.
	* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
	* tests/data/test-read-dwarf/test17-pr19027.so.abi: Likewise.
	* tests/data/test-read-dwarf/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-08-11 17:39:54 +02:00
Dodji Seketeli
dd0861c9a8 Bug 27236 - Don't forget to emit some referenced types
Since we arranged to only emit referenced types in translation units
where they belong, it appears that in some cases we forget to emit
some referenced types.

This is because some referenced types might belong to a translation
unit that is *already* emitted by the time we detect that a type is
referenced.

To fix this correctly, we should probably have a pass that walks the
corpus to detect referenced types, so that we have their set even
before we start emitting translation units.

But for now, the patch just detects when we are emitting the last
translation unit.  In that case all the non-emitted referenced types
are emitted.  It doesn't seem to be an issue if those don't belong to
that translation unit, compared to their original (from the DWARF)
type.
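
The following sketch shows the stop-gap described above. Each TU carries the types found to be referenced while emitting it. A type normally waits for its own TU, and anything still pending when the last TU is emitted gets flushed there. All names are hypothetical, and duplicate references are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a referenced type belongs to exactly one TU.
struct referenced_type
{
  std::string name;
  std::string tu;
};

// A TU, with the types found to be referenced while emitting it.
struct tu_desc
{
  std::string name;
  std::vector<referenced_type> references;
};

// Returns (tu, type) pairs in emission order.
std::vector<std::pair<std::string, std::string>>
emit_translation_units(const std::vector<tu_desc>& tus)
{
  std::vector<std::pair<std::string, std::string>> out;
  std::vector<referenced_type> pending;
  for (std::size_t i = 0; i < tus.size(); ++i)
    {
      const bool is_last_tu = (i + 1 == tus.size());
      pending.insert(pending.end(),
                     tus[i].references.begin(), tus[i].references.end());
      std::vector<referenced_type> still_pending;
      for (const referenced_type& t : pending)
        {
          // In the last TU, flush everything: a type whose own TU was
          // already written would otherwise never be emitted.
          if (is_last_tu || t.tu == tus[i].name)
            out.emplace_back(tus[i].name, t.name);
          else
            still_pending.push_back(t);
        }
      pending.swap(still_pending);
    }
  return out;
}
```

Here a type belonging to "a.c" but first referenced from "b.c" is emitted in "b.c", the last TU.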

	* include/abg-writer.h (write_translation_unit): Add a new
	parameter that says if we are emitting the last TU.
	* src/abg-writer.cc (write_translation_unit::{type_is_emitted,
	decl_only_type_is_emitted}): Constify these methods.
	(write_context::has_non_emitted_referenced_types): Define new
	member function using the const methods above.
	(write_translation_unit): When emitting the last TU, emit all the
	referenced types.
	(write_corpus): Set signal when emitting the last translation
	unit.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-08-11 17:39:49 +02:00
Dodji Seketeli
cc2574121f Bug 27236 - Allow updating classes from abixml
Some classes can be defined piece-wise in the abixml, in some rare
cases.  build_class_decl currently prevents that from happening,
leading to some spurious self comparison errors.

Fixed thus.

	* src/abg-reader.cc (build_class_decl): Keep going when the class
	has already been built.  The rest of the code knows how to add new
	stuff.
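
The sketch below illustrates "keep going and add new stuff": when a class id shows up again, the members not seen yet are merged into the existing record. `class_record`, `class_index` and `build_class` are hypothetical. The real build_class_decl works on XML nodes and IR types.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical reader state: classes indexed by their abixml id.
struct class_record
{
  std::vector<std::string> members;
};

using class_index = std::map<std::string, class_record>;

// Read one (possibly partial) definition of a class.  Instead of
// bailing out when the id was already seen, add missing members.
class_record&
build_class(class_index& index, const std::string& id,
            const std::vector<std::string>& members)
{
  class_record& rec = index[id]; // Creates the record on first sight.
  for (const std::string& m : members)
    if (std::find(rec.members.begin(), rec.members.end(), m)
        == rec.members.end())
      rec.members.push_back(m); // Only add what is not there yet.
  return rec;
}
```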
	* tests/data/test-abidiff/test-PR18791-report0.txt: Adjust.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-08-11 17:39:36 +02:00
Dodji Seketeli
39ba859603 Bug 27236 - Fix the canonical type propagation optimization
While working on another bug, it turned out the initial fix for the
bug https://sourceware.org/bugzilla/show_bug.cgi?id=27236 was just
papering over the real issue.

I think the real issue is that the "canonical type propagation"
optimization was being done even in cases where it shouldn't have
been.  This patch recognizes the limits of that optimization and
avoids performing it when those limits are exceeded.

So here is what that optimization is.  The text below is also present
in the comments in the source code.  I am putting it here to explain
the context.

During the canonicalization of a type T (which doesn't yet have a
canonical type), T is compared structurally (member-wise) against a
type C which already has a canonical type.  The comparison expression
is C == T.

During that structural comparison, if a subtype of C (which also
already has a canonical type) is structurally compared to a subtype of
T (which doesn't yet have a canonical type) and if they are equal,
then we can deduce that the canonical type of the subtype of T is the
canonical type of the subtype of C.

Thus, we can canonicalize the sub-type of T during the
canonicalization of T itself.  That canonicalization of the sub-type
of T is what we call "propagating the canonical type of the sub-type
of C onto the sub-type of T".  It's also called "on-the-fly
canonicalization".  It's on the fly because it happens during a
comparison -- which itself happens during the canonicalization of T.

So this is the general description of the "canonical type propagation
optimization".

Now we must recognize the limits of that optimization.  Said
otherwise, there is a case when a type is *NOT* eligible for this
canonical type propagation optimization.

The reason why a type is deemed NON-eligible to the canonical type
propagation optimization is that it "depends" on a recursively present
type.  Let me explain.

Suppose we have a type T that has sub-types named ST0, ST1 and ST2.
Suppose ST1 itself has a sub-type that is T itself.  In this case, we
say that T is a recursive type, because it has T (itself) as one of
its sub-types:

  T
  +-- ST0
  |
  +-- ST1
  |    +
  |    |
  |    +-- T
  |
  +-- ST2

ST1 is said to "depend" on T because it has T as a sub-type.  But
because T is recursive, then ST1 is said to depend on a recursive
type.  Notice however that ST0 does not depend on any recursive type.

Now suppose we are comparing T to a type T' that has the same
structure with sub-types ST0', ST1' and ST2'.  During the
comparison of ST1 against ST1', their sub-type T is compared
against T'.  Because T (resp. T') is a recursive type that is
already being compared, the comparison of T against T' (as
sub-types of ST1 and ST1') returns true, meaning they are
considered equal.  This is done so that we don't enter an infinite
recursion.
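
This cycle-breaking rule can be sketched as a structural comparison that remembers which pairs are currently being compared. When it meets such a pair again, it answers "equal" and lets the outermost comparison decide. The `type_node` model is hypothetical: a kind tag plus possibly cyclic sub-types.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical type node: a kind tag plus sub-types (maybe cyclic).
struct type_node
{
  int kind;
  std::vector<const type_node*> subtypes;
};

using pair_set = std::set<std::pair<const type_node*, const type_node*>>;

bool
structurally_equal(const type_node* l, const type_node* r,
                   pair_set& being_compared)
{
  if (l == r)
    return true;
  if (being_compared.count({l, r}))
    return true; // Comparison cycle: assume equal, don't recurse.
  if (l->kind != r->kind || l->subtypes.size() != r->subtypes.size())
    return false;
  being_compared.insert({l, r});
  bool result = true;
  for (std::size_t i = 0; result && i < l->subtypes.size(); ++i)
    result = structurally_equal(l->subtypes[i], r->subtypes[i],
                                being_compared);
  being_compared.erase({l, r});
  return result;
}
```

Note that ST1 vs ST1' yields "equal" through the cycle even when ST2 vs ST2' later makes T != T'. That is exactly why a canonical type propagated onto ST1' may have to be taken back.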

That means ST1 is also deemed equal to ST1'.  If we are in the
course of the canonicalization of T' and thus if T (as well as
all of its sub-types) is already canonicalized, then the canonical
type propagation optimization will make us propagate the canonical
type of ST1 onto ST1'.  So the canonical type of ST1' will be
equal to the canonical type of ST1 as a result of that
optimization.

But then, later down the road, when ST2 is compared against ST2',
let's suppose that we find out that they are different. Meaning
that ST2 != ST2'.  This means that T != T', i.e., the
canonicalization of T' failed for now.  But most importantly, it
means that the propagation of the canonical type of ST1 to ST1'
must now be invalidated.  Meaning, ST1' must now be considered as
not having any canonical type.

In other words, during type canonicalization, if ST1' depends on a
recursive type T', its propagated canonical type must be
invalidated (set to nullptr) if T' appears to be different from T,
a.k.a, the canonicalization of T' temporarily failed.

This means that any sub-type that depends on recursive types and
that has been the target of the canonical type propagation
optimization must be tracked.  If the dependent recursive type
fails its canonicalization, then the sub-type being compared must
have its propagated canonical type cleared.  In other words, its
propagated canonical type must be cancelled.

This concept of cancelling the propagated canonical type when needed
is what this patch introduces.
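
The sketch below shows the bookkeeping idea. Propagation onto a type that depends on a recursive type is recorded as "non-confirmed". It is either confirmed (kept) or cancelled (reset to null) once the outcome of the recursive comparison is known. The names are hypothetical, and the sketch simplifies: it confirms or cancels everything tracked, whereas the patch only cancels the types that depend on the recursive type whose canonicalization failed.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical type with a (possibly propagated) canonical type.
struct type_t
{
  const type_t* canonical = nullptr;
  bool depends_on_recursive_type = false;
};

struct propagation_tracker
{
  // Targets whose propagated canonical type is not yet confirmed.
  std::unordered_set<type_t*> non_confirmed;

  void propagate(type_t& target, const type_t& source)
  {
    target.canonical = source.canonical;
    if (target.depends_on_recursive_type)
      non_confirmed.insert(&target);
  }

  // The recursive type compared equal: keep the propagated types.
  void confirm_all() { non_confirmed.clear(); }

  // The recursive type compared different: undo the propagation.
  void cancel_all()
  {
    for (type_t* t : non_confirmed)
      t->canonical = nullptr;
    non_confirmed.clear();
  }
};
```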

New data members have been introduced to the environment::priv private
structure.  Those are to keep track of the stack of sub-types being
compared so that we can detect if a candidate for the canonical type
propagation optimization depends on a recursive type.

There is also a data structure in there to track the targets of the
canonical type propagation optimization that "might" need to see their
propagated canonical types be cancelled.

Then new functions have been introduced to detect when a type depends
on a recursive type, to cancel or confirm propagated canonical types
etc.

In abg-ir.cc, the RETURN* macros used in the equals() overloads have
been factorized using the newly introduced function template
return_comparison_result().  It now contains the bookkeeping that
was previously done (in the RETURN* macros) to detect recursive cycles
in the comparison, as well as the triggering of canonical type
propagation.  This is also where the logic that properly limits the
optimization is now implemented.

	* include/abg-ir.h (pointer_set): This typedef is now for an
	unordered_set<uintptr_t> rather than an unordered_set<size_t>.
	(environment::priv_): Make this public so that code in free-form
	functions from abg-ir.cc can access it.
	* src/abg-ir-priv.h (struct type_base::priv): Move this private
	structure here, from abg-ir.cc.
	(type_base::priv::{depends_on_recursive_type_,
	canonical_type_propagated_}): Added these two new data members.
	(type_base::priv::priv): Initialize the two new data members.
	(type_base::priv::{depends_on_recursive_type,
	set_depends_on_recursive_type,
	set_does_not_depend_on_recursive_type, canonical_type_propagated,
	set_canonical_type_propagated, clear_propagated_canonical_type}):
	Define new member functions.
	(struct environment::priv): Move this struct here, from abg-ir.cc.
	(environment::priv::{types_with_non_confirmed_propagated_ct_,
	left_type_comp_operands_, right_type_comp_operands_}): New data
	members.
	(environment::priv::{mark_dependant_types,
	mark_dependant_types_compared_until, confirm_ct_propagation,
	collect_types_that_depends_on, cancel_ct_propagation,
	remove_from_types_with_non_confirmed_propagated_ct}): New member
	functions.
	* src/abg-ir.cc (struct environment::priv, struct)
	(type_base::priv, struct class_or_union::priv): Move these structs
	to src/abg-ir-priv.h.
	(push_composite_type_comparison_operands)
	(pop_composite_type_comparison_operands)
	(mark_dependant_types_compared_until)
	(maybe_cancel_propagated_canonical_type): Define new functions.
	(notify_equality_failed, mark_types_as_being_compared): Re-indent.
	(is_comparison_cycle_detected, return_comparison_result): Define
	new function templates.
	(RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED): Define new macro.
	(equals(const function_type& l, const function_type& r)): Redefine
	the RETURN macro using the new return_comparison_result function
	template.  Use the new RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED
	and mark_types_as_being_compared functions.
	(equals(const class_or_union& l, const class_or_union&, change_kind*)):
	Likewise.
	(equals(const class_decl& l, const class_decl&, change_kind*)):
	Likewise.  Because this uses another equals() function to compare
	the class_or_union part of the type, ensure that no canonical type
	propagation occurs at that point.
	(types_are_being_compared): Remove as it's not used anymore.
	(maybe_propagate_canonical_type): Use the new
	environment::priv::propagate_ct() function here.
	(method_matches_at_least_one_in_vector): Ensure the
	right-hand-side operand of the equality stays on the right.  This
	is important because the equals() functions expect that.
	* src/abg-reader.cc (build_type): Ensure all types are
	canonicalized.
	* tests/data/test-diff-dwarf/PR25058-liblttng-ctl-report-1.txt:
	Adjust.
	* tests/data/test-diff-pkg/nss-3.23.0-1.0.fc23.x86_64-report-0.txt:
	Likewise.
	* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-2.txt:
	Likewise.
	* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
	Likewise.
	* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt:
	Likewise.
	* tests/data/test-read-dwarf/test-libaaudio.so.abi: Likewise.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-08-11 17:39:11 +02:00
Dodji Seketeli
46b1ab08b0 Bug 27995 - Self comparison error from abixml file
There are several self comparison issues uncovered by comparing the
file test-PR27995.abi (provided in the bug report) against itself.
This patch addresses them all, as well as the regressions induced in
some of the test suite, and then updates the other reference test
suite outputs that need it.

In the equals overload for decl_base, we compare the non-internal
versions of qualified decl names.  For var_decls of anonymous class or
union types, the non-internal version is the flat-representation of
the type.  Thus a benign change in a data member name of the anonymous
type might cause the equals function to consider the var_decls to be
wrongly different.  The internal version of the qualified decl name
should return a name that is stable for types, irrespective of these
benign variations.  The patch thus makes the equals overload for
decl_base to compare internal versions of qualified decl names instead.

The patch ensures that enum_type_decl::get_pretty_representation
returns an internal pretty representation that is "stable" for
anonymous types.  Basically, all anonymous enums will have the same
name, which looks like "__anonymous_enum__".  This is to ensure two
things: first, that all anonymous enums are compared against each
other during type canonicalization, ensuring that when two anonymous
enums are canonically different, it really is because of changes in
their enumerators or basic type, not because of anything having to do
with their artificial names.  Second, that in the equals overload for
decl_base, their internal qualified names always compare equal.  This
nullifies the risk of having anonymous types compare different because
of their (non-existent) name.  This is because libabigail's DWARF
reader assigns artificial unique names to anonymous types, so we don't
want to use these during actual type comparison.
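
The internal/non-internal split can be pictured with the sketch below. The internal form maps every anonymous enum to one fixed string, so they all share a canonicalization bucket and differ only by content. The `enum_desc` type, the function and the "enum " prefix are assumptions. The artificial names in the usage example are made up, not the DWARF reader's actual scheme.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum description; 'name' may be an artificial unique
// name assigned by the reader to an anonymous enum.
struct enum_desc
{
  std::string name;
  bool is_anonymous;
};

std::string
pretty_representation(const enum_desc& e, bool internal)
{
  // Internal form: all anonymous enums look alike.
  if (internal && e.is_anonymous)
    return "enum __anonymous_enum__";
  // Non-internal form: keep the (possibly artificial) name.
  return "enum " + e.name;
}
```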

We do something similar for class_decl::get_pretty_representation and
union_decl::get_pretty_representation where the pretty internal
representation for class/union decl would now be
__anonymous_{struct,union}__.

The patch scouts the uses of get_pretty_representation() to make sure
to avoid using the internal form of the pretty representations
when it's not warranted.  It also updates the doxygen comments of the
overloads of that function.

In the abixml reader, we were wrongly canonicalizing array types
early, even before they were fully created.  This was leading to
spurious type changes down the road.

The patch also fixes the caching of the name of function types by
making it consistent with caching of the names of the other types of
the system.  The idea is that we don't cache the name of a function
type until it's canonicalized.  This is because the type might be
edited during its pre-canonicalization lifetime; and that editing
might change its name.  However once the type is canonicalized, it
becomes immutable.  At that point we can cache its name, for
performance purposes.  Note that we need to do that both for the
"internal version" of the type name (used for canonicalization purposes)
and the "non-internal version" one, which is used for other purposes.

This caching scheme wasn't respected for function types, so we were
caching a potentially wrong name for the type after its
canonicalization.
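
The caching rule can be sketched like this: recompute the name while the type may still be edited, and cache it only once the type is canonicalized and therefore immutable. The class below is hypothetical and keeps a single cache. The patch keeps separate caches for the internal and non-internal names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical function type whose name derives from mutable parts.
class function_type_sketch
{
  std::string return_type_;
  std::string params_;
  bool canonicalized_ = false;
  mutable std::string cached_name_;

  std::string compute_name() const
  { return return_type_ + " (" + params_ + ")"; }

public:
  function_type_sketch(std::string ret, std::string params)
    : return_type_(std::move(ret)), params_(std::move(params)) {}

  void set_params(std::string p) { params_ = std::move(p); }
  void canonicalize() { canonicalized_ = true; }

  std::string get_name() const
  {
    if (!canonicalized_)
      return compute_name(); // Still editable: never cache.
    if (cached_name_.empty())
      cached_name_ = compute_name(); // Immutable now: cache once.
    return cached_name_;
  }
};
```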

Last but not least, there is a problem that makes canonical type
comparison different from structural type comparison.
Let's consider these two declarations:

    typedef int FirstInt;
    typedef int SecondInt;

Now, consider these two pointer types: FirstInt* and SecondInt*.
These two pointer types are canonically different because they have
different type names.  This is because during type canonicalization,
types with the same "pretty representation" are compared against each
other.  So types with different type names will certainly have
different pretty representations and won't be compared; they are thus
going to have different canonical types.

However, FirstInt* and SecondInt* do compare equal, structurally,
because the equals overload for pointer_type_def compares the
pointed-to types of pointers by peeling off typedefs.  So, here, as
both pointed-to types are 'int' when the typedefs are peeled off, the
two pointers structurally compare equal.  This discrepancy between
structural and canonical equality introduces subtle and spurious type
changes depending on the order in which types are canonicalized.  For
instance:

    struct {FirstInt* m0;};   /* First type.  */

    struct {SecondInt* m0;};  /* Second type. */

If FirstInt* and SecondInt* are canonicalized before their containing
anonymous types, then the two anonymous types will compare different
(because FirstInt* and SecondInt* compare different) and have
different canonical types.  If, however, the anonymous types are
canonicalized before FirstInt* and SecondInt*, then they will compare equal
because FirstInt* and SecondInt* are structurally equivalent.
FirstInt* and SecondInt* will be canonicalized later and have
different canonical types (because they have different type names)
despite being structurally equivalent.

The change in the order of canonicalization can happen when
canonicalizing types from a corpus coming from DWARF as opposed to
canonicalizing types from a corpus coming from abixml.

The patch fixes this discrepancy by not peeling off typedefs from the
pointed-to types when comparing pointers.  Note that this makes us
regress on bug https://sourceware.org/bugzilla/show_bug.cgi?id=27236,
where the typedef peeling was introduced.  In hindsight, introducing
that typedef peeling was a mistake.  I'll try to address that bug
again in a subsequent patch.
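
The two behaviors can be contrasted in a small sketch. Here type names stand in for full structural comparison, and the `ty` model and function names are hypothetical. Peeling typedefs makes FirstInt* equal to SecondInt*, which canonicalization (keyed on names) disagrees with. Comparing the pointed-to types as-is keeps the two in agreement.

```cpp
#include <cassert>
#include <string>

// Hypothetical type node: a named type, possibly a typedef of another.
struct ty
{
  std::string name;
  const ty* typedef_of; // non-null when this is a typedef
};

const ty*
peel_typedefs(const ty* t)
{
  while (t->typedef_of)
    t = t->typedef_of;
  return t;
}

// Old behavior: compare pointed-to types after peeling typedefs.
bool
pointers_equal_peeled(const ty* pointee_l, const ty* pointee_r)
{ return peel_typedefs(pointee_l)->name == peel_typedefs(pointee_r)->name; }

// New behavior: compare pointed-to types as they are.
bool
pointers_equal(const ty* pointee_l, const ty* pointee_r)
{ return pointee_l->name == pointee_r->name; }
```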

	* doc/manuals/abidiff.rst: Add documentation for the --debug
	option.
	* src/abg-ir.cc (equals): In the overload for decl_base consider
	the internal version of qualified decl name.  In the overload for
	pointer_type_def do not peel typedefs off from the compared
	pointed-to types.  In the overload for typedef_decl compare the
	typedef as a decl as well.  In the overload for var_decl, compare
	variables that have the same ELF symbols without taking into
	account their qualified name, rather than their name.  Stop
	comparing data members without considering their names.
	In the overload for class_or_union, when a decl-only class that is
	ODR-relevant is compared against another type, assume
	equality if names are equal.  This is useful in environments where
	some TUs are ODR-relevant and others aren't.
	(*::get_pretty_representation): Update doxygen comments.
	(enum_type_decl::get_pretty_representation): Return an internal
	pretty representation that is stable across all anonymous enums.
	(var_decl::get_anon_dm_reliable_name): Use the non-internal pretty
	representation for anonymous data members.
	(function_type::priv::temp_internal_cached_name_): New data
	member.
	(function_type::get_cached_name): Cache the internal name after
	the function type is canonicalized.  Make sure internal name and
	non-internal name are cached separately.
	(class_or_union::find_anonymous_data_member): Look for the anonymous
	data member by looking at its non-internal name.
	({class, union}_decl::get_pretty_representation): Use something like "class
	__anonymous_{union,struct}__" for all anonymous classes, so that they can
	all be compared against each other during type canonicalization.
	(type_has_sub_type_changes): Use non-internal pretty
	representation.
	(hash_type_or_decl, function_decl_is_less_than): Use internal
	pretty representation for comparison here.
	* src/abg-reader.cc (read_context::maybe_canonicalize_type): Don't
	early canonicalize array types.
	* src/abg-writer.cc (annotate): Use non-internal pretty
	representation.
	* tests/data/test-diff-filter/test-PR27995-report-0.txt: New
	reference report.
	* tests/data/test-diff-filter/test-PR27995.abi: New test input
	abixml file.
	* tests/data/Makefile.am: Add test-PR27995.abi,
	test-PR27995-report-0.txt to the source distribution.
	* tests/data/test-annotate/libtest23.so.abi: Adjust.
	* tests/data/test-diff-dwarf/test6-report.txt: Adjust.
	* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt: Adjust.
	* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-1.txt: Adjust.
	* tests/data/test-diff-filter/test41-report-0.txt: Adjust.
	* tests/data/test-diff-filter/test43-decl-only-def-change-leaf-report-0.txt: Adjust.
	* tests/data/test-diff-filter/test8-report.txt: Adjust.
	* tests/data/test-diff-pkg/libICE-1.0.6-1.el6.x86_64.rpm--libICE-1.0.9-2.el7.x86_64.rpm-report-0.txt:
	Adjust.
	* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-0.txt:
	Adjust.
	* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-2.txt:
	Adjust.
	* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-3.txt:
	Adjust.
	* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
	Adjust.
	* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt:
	Adjust.
	* tests/data/test-diff-suppr/test39-opaque-type-report-0.txt: Adjust.
	* tests/data/test-read-dwarf/PR22015-libboost_iostreams.so.abi: Adjust.
	* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Adjust.
	* tests/data/test-read-dwarf/libtest23.so.abi: Adjust.
	* tests/data/test-read-dwarf/test-libandroid.so.abi: Adjust.
	* tests/data/test-read-dwarf/test11-pr18828.so.abi: Adjust.
	* tests/data/test-read-dwarf/test12-pr18844.so.abi: Adjust.
	* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Adjust.
	* tests/test-diff-filter.cc (in_out_specs): Add the
	test-PR27995.abi to the test harness.
	* tools/abidiff.cc (options::do_debug): New data member.
	(options::options): Initialize it.
	(parse_command_line): Parse --debug.
	(main): Activate self comparison debug if the user provided
	--debug.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-08-11 17:38:14 +02:00
Dodji Seketeli
2d276b67ed ir: Tighten type comparison optimization for Linux kernel binaries
types_defined_same_linux_kernel_corpus_public() performs an
optimization while comparing two types in the context of the Linux
kernel.  If two types of the same kind and name are defined in the
same corpus and in the same file, then they ought to be equal.

For two anonymous classes that have naming typedefs, the function
forgets to ensure that the naming typedefs have the same name.

I have no binary that exhibits the potential issue; I stumbled upon
it while looking at something else.  This change doesn't impact any
of the binaries of the regression suite at the moment, though.
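
Here is a sketch of the tightened shortcut predicate. Same name and same defining file normally justify presuming equality. For anonymous classes the empty names match trivially, so the naming typedefs must match as well. The struct and function are hypothetical. Declining the shortcut for anonymous classes without a naming typedef (falling back to structural comparison) is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of a class type in a Linux kernel corpus.
struct kernel_class
{
  std::string name;           // empty when anonymous
  std::string naming_typedef; // empty when there is none
  std::string file;           // file where the type is defined
};

// Returns true when the "same name, same file" shortcut may be used.
bool
can_presume_equal(const kernel_class& l, const kernel_class& r)
{
  if (l.name != r.name || l.file != r.file)
    return false;
  if (l.name.empty())
    {
      // Anonymous without naming typedef: nothing to go on.
      if (l.naming_typedef.empty() || r.naming_typedef.empty())
        return false;
      // The fix: naming typedefs must have the same name too.
      if (l.naming_typedef != r.naming_typedef)
        return false;
    }
  return true;
}
```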

Fixed thus.

	* src/abg-ir.cc (types_defined_same_linux_kernel_corpus_public):
	Ensure that anonymous classes with naming typedefs have identical
	typedef names.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-08-11 17:37:50 +02:00
Dodji Seketeli
d71518dbf0 ir: Tighten the test for anonymous data member
In is_anonymous_data_member(), we only test that the name of the data
member is empty; we forget to test that decl_base::get_is_anonymous()
is true.  This might make us wrongly think that a data member is
anonymous in cases like in the equals() function for var_decl, where
we temporarily set the name of the compared var_decl to "" before
invoking the decl_base::operator==.  We do this to perform the
comparison without taking into account the name of the variable.

This hasn't yet happened on the binaries of the regression test suite,
but it's definitely wrong so I am fixing it here.
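
The tightened test amounts to requiring both conditions. The `data_member` type and signature below are invented. The function shares its name with the libabigail one only for readability.

```cpp
#include <cassert>
#include <string>

// Hypothetical data member: 'is_anonymous' mirrors a flag set when the
// member was created without a name.
struct data_member
{
  std::string name;
  bool is_anonymous;
};

// An empty name alone is not enough: it may have been blanked
// temporarily, e.g. to compare variables while ignoring their names.
bool
is_anonymous_data_member(const data_member& m)
{ return m.name.empty() && m.is_anonymous; }
```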

	* src/abg-ir.cc (is_anonymous_data_member): Consider
	decl_base::get_is_anonymous as well.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-08-11 17:37:02 +02:00
Dodji Seketeli
8926b2f3d1 ir: Improve the debugging facilities
While looking at something else, I stumbled across some minor issues
in the debugging facilities I use to track self comparison problems.

I added a missing ABG_RETURN macro in the stack of equals() functions
to better detect when there is a change, under the debugger.

I also fixed get_debug_representation() to properly display the
class/enum name (as expected) rather than their pretty representation.

	* src/abg-ir.cc (maybe_compare_as_member_decls): Add a missing
	ABG_RETURN.
	(get_debug_representation): Display the name of class and enums,
	not their pretty representation.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-08-11 17:33:32 +02:00
Giuliano Procida
cfd81dec10 PR28060 - Invalid offset for bitfields
Bitfield and other member offsets can be specified in DWARF using:

- DW_AT_data_bit_offset, or
- DW_AT_data_member_location and optionally DW_AT_bit_offset.

The code would only use the value of DW_AT_data_member_location if
there was no DW_AT_bit_offset.  This commit fixes this and adjusts
documentation and affected tests.
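
With DWARF 4, DW_AT_data_bit_offset already gives the offset in bits. With the older encoding, DW_AT_bit_offset counts from the most significant bit of a storage unit of DW_AT_byte_size bytes. On a little-endian target it must therefore be flipped, then added to (not substituted for) the byte offset. Here is a sketch of that arithmetic, assuming little-endian; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit offset of a member from the start of its enclosing struct.
//   data_member_location: byte offset of the storage unit.
//   byte_size:            size of that storage unit in bytes.
//   bit_offset, bit_size: DW_AT_bit_offset and DW_AT_bit_size.
// Assumes a little-endian target.
uint64_t
member_bit_offset(uint64_t data_member_location, bool has_bit_offset,
                  uint64_t byte_size, uint64_t bit_offset,
                  uint64_t bit_size)
{
  uint64_t offset = data_member_location * 8;
  if (has_bit_offset)
    // Convert "bits from the MSB" into "bits from the LSB".
    offset += byte_size * 8 - bit_offset - bit_size;
  return offset;
}
```

For instance, a 2-bit field at DW_AT_data_member_location 4, with byte_size 4 and DW_AT_bit_offset 27, lands at bit 32 + (32 - 27 - 2) = 35.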

	* src/abg-dwarf-reader.cc (read_and_convert_DW_at_bit_offset):
	Update documentation.
	(die_member_offset): Treat DW_AT_bit_offset as an optional
	adjustment to DW_AT_data_member_location.
	* tests/data/test-annotate/test13-pr18894.so.abi: Update.
	* tests/data/test-annotate/test15-pr18892.so.abi: Update.
	* tests/data/test-annotate/test17-pr19027.so.abi: Update.
	* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Update.
	* tests/data/test-annotate/test21-pr19092.so.abi: Update.
	* tests/data/test-diff-dwarf-abixml/PR25409-librte_bus_dpaa.so.20.0.abi:
	Regenerate.
	* tests/data/test-diff-pkg/libcdio-0.94-1.fc26.x86_64--libcdio-0.94-2.fc26.x86_64-report.1.txt:
	Report now empty.
	* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Update.
	* tests/data/test-read-dwarf/PR25042-libgdbm-clang-dwarf5.so.6.0.0.abi:
	Update.
	* tests/data/test-read-dwarf/test13-pr18894.so.abi: Update.
	* tests/data/test-read-dwarf/test15-pr18892.so.abi: Update.
	* tests/data/test-read-dwarf/test17-pr19027.so.abi: Update.
	* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Update.
	* tests/data/test-read-dwarf/test21-pr19092.so.abi: Update.
	* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
	Update.

Signed-off-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-07-19 13:17:29 +02:00
Giuliano Procida
e6fd6b8a57 abg-ir.h: add declaration of operator<< for elf_symbol::visibility
There is a formatted output operator for elf_symbol::visibility in
abg-ir.cc. However, it had no visible declaration and was not usable
by library users. This commit adds the declaration.

	* include/abg-ir.h (operator<<(elf_symbol::visibility)): Add
	declaration.

Signed-off-by: Giuliano Procida <gprocida@google.com>
2021-07-16 17:39:42 +02:00
Giuliano Procida
401ec26be6 ir: remove "is Linux string constant" property from elf_symbol
This boolean property was obsoleted by the new symtab reader
implementation. It has no users.

Following this change, the find_ksymtab_strings_section function joins
find_ksymtab_section and find_ksymtab_gpl_section in having no users.

	* include/abg-ir.h (elf_symbol::elf_symbol): Drop
	is_linux_string_cst argument.
	(elf_symbol::create): Likewise.
	(elf_symbol::get_is_linux_string_cst): Drop method.
	* src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab):
	Remove code that gets the index of the __ksymtab_strings
	section. Drop corresponding elf_symbol::create argument.
	(lookup_symbol_from_gnu_hash_tab): Likewise.
	(lookup_symbol_from_symtab): Likewise.
	(create_default_fn_sym): Drop false is_linux_string_cst
	argument to elf_symbol::create.
	* src/abg-ir.cc (elf_symbol::priv::is_linux_string_cst_): Drop
	member variable.
	(elf_symbol::priv default ctor): Drop initialisation of
	is_linux_string_cst_.
	(elf_symbol::priv normal ctor): Drop is_linux_string_cst
	argument and corresponding is_linux_string_cst_
	initialisation.
	(elf_symbol::elf_symbol ctor): Drop is_linux_string_cst
	argument and corresponding forwarding to priv ctor.
	(elf_symbol::create): Drop is_linux_string_cst argument and
	corresponding forwarding to ctor.
	(elf_symbol::get_is_linux_string_cst): Drop method.
	* src/abg-reader.cc (build_elf_symbol): Drop false
	is_linux_string_cst argument to elf_symbol::create.
	* src/abg-symtab-reader.cc (symtab::load): Likewise.

Signed-off-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-07-16 17:02:12 +02:00
Matthias Maennich
86c06ad684 Consistently use std::unique_ptr for private implementations (pimpl)
In the absence of non-refcounting smart pointers before C++11,
std::shared_ptr was commonly used instead. Now that the standard has
been bumped to C++11, we can use std::unique_ptr consistently and
avoid the costs of shared_ptr reference counting. Hence do that and
add default virtual destructors where required.
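Here is a minimal sketch of the unique_ptr-based pimpl pattern. It uses a hypothetical widget class, not an actual libabigail type. The destructor is declared in the class but defaulted only after priv is a complete type, because std::unique_ptr needs a complete type to delete it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class illustrating the pimpl idiom with std::unique_ptr.
class widget
{
  struct priv;                  // Incomplete here, as it would be in a header.
  std::unique_ptr<priv> priv_;  // Sole owner: no reference counting.

public:
  widget();
  virtual ~widget();            // Declared here, defined once priv is complete.
  int get_value() const;
  void set_value(int v);
};

// In a real project, everything below would live in the .cc file.
struct widget::priv
{
  int value = 0;
};

widget::widget()
  : priv_(new priv) // std::make_unique is C++14; plain new works in C++11.
{}

// Defaulted out of line so that the deleter of std::unique_ptr<priv>
// sees a complete type.
widget::~widget() = default;

int
widget::get_value() const
{return priv_->value;}

void
widget::set_value(int v)
{priv_->value = v;}
```

A class with a std::unique_ptr member is non-copyable by default. A pimpl class that must be copyable therefore needs an explicit copy constructor.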

	* include/abg-comparison.h (diff_maps): Use unique_ptr for priv_.
	(diff_context): Likewise.
	(diff_traversable_base): Likewise.
	(type_diff_base): Likewise.
	(decl_diff_base): Likewise.
	(distinct_diff): Likewise.
	(var_diff): Likewise.
	(pointer_diff): Likewise.
	(reference_diff): Likewise.
	(array_diff): Likewise.
	(qualified_type_diff): Likewise.
	(enum_diff): Likewise.
	(class_or_union_diff): Likewise.
	(class_diff): Likewise.
	(base_diff): Likewise.
	(scope_diff): Likewise.
	(fn_parm_diff): Likewise.
	(function_type_diff): Likewise.
	(function_decl_diff): Likewise.
	(typedef_diff): Likewise.
	(translation_unit_diff): Likewise.
	(diff_stats): Likewise.
	(diff_node_visitor): Likewise.
	* include/abg-corpus.h (corpus): Likewise.
	(exported_decls_builder): Likewise.
	(corpus_group): Likewise.
	* include/abg-ini.h (property): Likewise.
	(property_value): Likewise.
	(string_property_value): Likewise.
	(list_property_value): Likewise.
	(tuple_property_value): Likewise.
	(simple_property): Likewise.
	(list_property): Likewise.
	(tuple_property): Likewise.
	(config): Likewise.
	(section): Likewise.
	(function_call_expr): Likewise.
	* include/abg-interned-str.h (interned_string_pool): Likewise.
	* include/abg-ir.h (environment): Likewise.
	(location_manager): Likewise.
	(type_maps): Likewise.
	(translation_unit): Likewise.
	(elf_symbol::version): Likewise.
	(type_or_decl_base): Likewise.
	(scope_decl): Likewise.
	(qualified_type_def): Likewise.
	(pointer_type_def): Likewise.
	(array_type_def): Likewise.
	(subrange_type): Likewise.
	(enum_type_decl): Likewise.
	(enum_type_decl::enumerator): Likewise.
	(typedef_decl): Likewise.
	(dm_context_rel): Likewise.
	(var_decl): Likewise.
	(function_decl::parameter): Likewise.
	(function_type): Likewise.
	(method_type): Likewise.
	(template_decl): Likewise.
	(template_parameter): Likewise.
	(type_tparameter): Likewise.
	(non_type_tparameter): Likewise.
	(template_tparameter): Likewise.
	(type_composition): Likewise.
	(function_tdecl): Likewise.
	(class_tdecl): Likewise.
	(class_decl::base_spec): Likewise.
	(ir_node_visitor): Likewise.
	* include/abg-suppression.h (suppression_base): Likewise.
	(type_suppression::insertion_range): Likewise.
	(type_suppression::insertion_range::boundary): Likewise.
	(type_suppression::insertion_range::integer_boundary): Likewise.
	(type_suppression::insertion_range::fn_call_expr_boundary): Likewise.
	(function_suppression): Likewise.
	(function_suppression::parameter_spec): Likewise.
	(file_suppression): Likewise.
	* include/abg-tools-utils.h (temp_file): Likewise.
	(timer): Likewise.
	* include/abg-traverse.h (traversable_base): Likewise.
	* include/abg-workers.h (queue): Likewise.
	* src/abg-comparison.cc (diff_context): Add default destructor.
	(diff_maps): Likewise.
	(corpus_diff): Likewise.
	(diff_node_visitor): Likewise.
	(class_or_union_diff::get_priv): Adjust return type.
	(class_diff::get_priv): Adjust return type.
	* src/abg-corpus.cc (corpus): Add default destructor.
	* src/abg-ir.cc (location_manager): Likewise.
	(type_maps): Likewise.
	(elf_symbol::version): Likewise.
	(array_type_def::subrange_type): Likewise.
	(enum_type_decl::enumerator): Likewise.
	(function_decl::parameter): Likewise.
	(class_decl::base_spec): Likewise.
	(ir_node_visitor): Likewise.

Signed-off-by: Matthias Maennich <maennich@google.com>
2021-07-16 11:16:47 +02:00
Matthias Maennich
578ba12139 symtab-reader: add support for binaries compiled with CFI
Control-Flow Integrity (CFI), when enabled in clang-built binaries,
introduces an indirection when looking up ELF symbols. For DSOs, the
dynamic symbol table (.dynsym) still contains the symbols, but
additional symbols with the suffix .cfi are added to the full .symtab.
Unfortunately, the DWARF debug information refers to these symbols by
the addresses of their .cfi-suffixed variants, as those point to the
actual implementation.

When the DWARF reader determines whether to suppress variable or
function declarations, it does so by checking whether there is an
associated ELF symbol at the address read from DWARF. Unless we
know about the alternative address, this check fails and the type
information is suppressed.

Hence add the .cfi symbol values to the lookup map to associate their
address with the corresponding publicly exported symbol.
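The idea can be sketched as follows. The sketch assumes the lookup map goes from a symbol address to the name of the publicly exported symbol. The sym struct and add_cfi_alternative_addresses() are hypothetical and only loosely mirror the add_alternative_address_lookups method named in the ChangeLog. The real symtab reader works on elf_symbol objects.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record; not libabigail's elf_symbol.
struct sym
{
  std::string name;
  uint64_t address;
};

const std::string cfi_suffix = ".cfi";

// Given an address -> exported-symbol-name map, also map the address of
// each "NAME.cfi" entry of the full .symtab to the exported NAME, so a
// lookup by the address found in DWARF still finds the exported symbol.
void
add_cfi_alternative_addresses(const std::vector<sym>& full_symtab,
                              std::map<uint64_t, std::string>& addr_to_exported)
{
  // Index the exported symbols by name to resolve "NAME.cfi" -> NAME.
  std::map<std::string, uint64_t> exported_by_name;
  for (const auto& entry : addr_to_exported)
    exported_by_name[entry.second] = entry.first;

  for (const sym& s : full_symtab)
    {
      // Skip symbols that do not end with ".cfi".
      if (s.name.size() <= cfi_suffix.size()
          || s.name.compare(s.name.size() - cfi_suffix.size(),
                            cfi_suffix.size(), cfi_suffix) != 0)
        continue;
      std::string base = s.name.substr(0, s.name.size() - cfi_suffix.size());
      // Only alias .cfi variants of symbols that are actually exported;
      // emplace() leaves any existing entry for that address untouched.
      if (exported_by_name.count(base))
        addr_to_exported.emplace(s.address, base);
    }
}
```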

	* src/abg-symtab-reader.cc (symtab::load_): Use new
	add_alternative_address_lookups method.
	(add_alternative_address_lookups): New method.
	* src/abg-symtab-reader.h (add_alternative_address_lookups): New
	function declaration.
	* tests/data/test-read-dwarf/test-libaaudio.so: New test data.
	* tests/data/test-read-dwarf/test-libaaudio.so.abi: New test data.
	* tests/data/Makefile.am: Add the two new test inputs to the source
	distribution.
	* tests/test-read-dwarf.cc: New test case.

Reported-by: Dan Albert <danalbert@google.com>
Reviewed-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-07-15 17:47:33 +02:00
Matthias Maennich
3a22dfaff6 elf-helpers: refactor find_symbol_table_section
Refactor the acquisition of symtabs to explicitly provide functionality
to get the .symtab and .dynsym sections. A later patch will make use of
that to acquire .symtab while find_symbol_table_section() still provides
.dynsym as the default symbol table.

This also adds a new overload to find_section to acquire the first
section by type and adjusts find_symbol_table_section() to make use of
those functions.

	* src/abg-elf-helpers.cc (find_section): New overload.
	(find_symtab_section): New function.
	(find_dynsym_section): New function.
	(find_symbol_table_section): Use new find_*_section functions.
	* src/abg-elf-helpers.h (find_section): New overload declaration.
	(find_symtab_section): New function declaration.
	(find_dynsym_section): New function declaration.

Reviewed-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
2021-07-15 16:14:10 +02:00
Dodji Seketeli
e2e253e5b1 Bug 27980 - Fix updating of type scope upon type canonicalization
Once a type T is canonicalized, its scope is updated so that the
vector returned by scope_decl::get_canonical_types() now contains the
new canonical type of T.  This works, obviously, even when the scope
is itself a type.

This works well on binaries compiled using C only because, currently,
libabigail de-duplicates the DIEs of types.  This means that if the
scope of T is a non-anonymous type, the class of equivalence of that
scope contains just one element.  So updating the scope of T implies
updating just one scope.

On binaries where some files are compiled using C++, however, type DIEs
are not de-duplicated.  This is just because that feature hasn't yet
been implemented in libabigail.  In that case, if the scope of T is a
non-anonymous type, the class of equivalence of that scope contains
more than one element.  So updating the scope of T implies updating
all the elements of the class of equivalence of that scope.  In
practice, that means updating the canonical type of the scope of T.

Libabigail fails to update the canonical type of the scope of T.
Later, at abixml emission time, just emitting the canonical types of
the scope of T is not enough to emit the canonical type of T.  That's
how the abixml emitter forgets to emit some types, as reported in the
bug https://sourceware.org/bugzilla/show_bug.cgi?id=27980.

This patch fixes that issue.

I also noticed that when emitting abixml for unions, the emitter
fails to emit the canonical member types of the union, unlike what is
done for class types.  So that is fixed as well.

The binary provided in the bug report is added to the regression
testsuite.

	* src/abg-ir.cc (canonicalize): Update the
	scope_decl::get_canonical_types() of canonical type of the
	containing type of the newly canonicalized type.
	* src/abg-writer.cc (write_union_decl): Write the canonical types
	contained in the current union scope, just like we do for classes.
	* tests/data/test-read-dwarf/test16-pr18904.so.abi: Adjust.
	* tests/data/test-types-stability/pr27980-libc.so: New binary
	input file.
	* tests/data/Makefile.am: Add the test input file above to source
	distribution.
	* tests/test-types-stability.cc (elf_paths): Add the new test
	input file to this test harness.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-18 14:47:04 +02:00
Giuliano Procida
caf06d7e5c abg-reader: Create a fresh corpus object per corpus
Currently the XML reader reuses the same corpus object for all
corpora in a corpus group. This has an unwanted side-effect: any
abi-instr with the same path in different corpora will collide and
parts of the ABI will be lost.

Creating a new corpus object for every abi-corpus element seems like
the right thing to do. Testing with large ABIs containing many corpora
also shows a modest (~10%) abidiff speed improvement.

	* src/abg-reader.cc (read_corpus_from_input): Always create a
	fresh corpus object for each abi-corpus XML element.

Signed-off-by: Giuliano Procida <gprocida@google.com>
2021-06-10 18:50:10 +02:00
Giuliano Procida
25bd77e31e abg-reader: Ensure corpus always has a symtab reader
In the presence of an empty abi-corpus element and with the following
change to always allocate a fresh corpus object, such objects can
sometimes be left without a symtab reader, instead of inheriting one
from the previous corpus.

The reader is called to obtain sorted lists of symbols during ABI
comparisons. The simplest way to avoid a crash is to maintain the
invariant that a reader object is always present.

With this change, if bad XML prevents symbols from being read, no
error is raised (as before), but the logic has been tweaked so that
abi-instr parsing is nevertheless attempted.

	* src/abg-reader.cc (read_symbol_db_from_input): Fix
	documentation for this function. Allow "successful parsing" to
	include the case where no symbols were present in the input.
	(read_corpus_from_input): Unconditionally set a symtab reader
	on the corpus object. Unconditionally parse the abi-instr of a
	corpus.

Signed-off-by: Giuliano Procida <gprocida@google.com>
2021-06-10 18:48:17 +02:00
Giuliano Procida
5ccbfd4f29 dwarf-reader: Create new corpus unconditionally
The DWARF reader appears to create a new corpus object only if one is
not already present. However, the only case where there can be
multiple corpora is when build_corpus_group_from_kernel_dist_under is
called and this function clears down the reader context, including the
current corpus, between reading ELF objects.

So it's clearer to just create a fresh corpus object unconditionally
in the DWARF reader.

	* src/abg-dwarf-reader.cc (read_debug_info_into_corpus):
	Create new corpus object unconditionally.

Signed-off-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-10 17:53:03 +02:00
Ben Woodard via Libabigail
519d7ce8e5 Fix trivial typo when printing version string
When abicompat prints its version string, it does not terminate
it with a newline the way that other commands do. Contributed by
Bolo.

	* tools/abicompat.cc (main): Add a newline after version string.

Signed-off-by: Ben Woodard <woodard@redhat.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-10 15:27:35 +02:00
Dodji Seketeli
d604e33793 Revert "Fix trivial typo when printing version string"
This reverts commit ad619f14ea.
2021-06-10 15:26:36 +02:00
Ben Woodard via Libabigail
ad619f14ea Fix trivial typo when printing version string
When abicompat prints its version string, it does not terminate
it with a newline the way that other commands do. Contributed by
Bolo.

Signed-off-by: Ben Woodard <woodard@redhat.com>
2021-06-10 15:22:53 +02:00
Dodji Seketeli
e330b57a6a doc: Fix typo
David Marchand <dmarchand@redhat.com> found this typo.  Fixed thus.

	* doc/manuals/libabigail-concepts.rst: Fix typo.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-10 15:14:41 +02:00
Dodji Seketeli
9238ff4b07 abg-reader: Fix typo
* src/abg-reader.cc
	(read_context::maybe_check_abixml_canonical_type_stability): Fix
	typo.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 19:45:25 +02:00
Dodji Seketeli
923a355f16 abidw: Remove temporary .typeid files when using --debug-abidiff
I noticed that the temporary typeid file generated by abidw when using
the --debug-abidiff option was left behind.  This patch removes it.

	* tools/abidw.cc (load_corpus_and_write_abixml): Remove temporary
	typeid file after its use.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 19:45:22 +02:00
Dodji Seketeli
9681ab04d2 Fix recursive array type definition
This is a follow-up of the patch below:

commit b00ba10e1d
Author: Dodji Seketeli <dodji@redhat.com>
Date:   Sat May 22 01:07:26 2021 +0200

    xml reader: Fix recursive qualified & reference type definition

    This is a followup patch for the fix for
    https://bugzilla.redhat.com/show_bug.cgi?id=1944088, which was in the
    patch:

        commit 51ae965305
        Author: Dodji Seketeli <dodji@redhat.com>
        Date:   Fri May 21 23:55:44 2021 +0200

            abixml reader: Fix recursive type definition handling

This patch basically adjusts build_array_type_def to build the array
type early, without trying to create the array element type first.  The
array type is then registered, and only then is the array element type
created.  That way, if the element type indirectly needs the array
type being created, the already-registered array type is re-used.
Finally, the element type is set on the array once it has been created.

The patch adjusts the code of the array type to allow creating the
array without element types and then setting the element type later.
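The "create, register, then complete" ordering described above can be sketched with a hypothetical reader over a toy type graph, in which an array's element type is a pointer back to the array. The type_desc, type_node and reader names are illustrative only. They are not libabigail's abixml reader API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical input description: each type id maps to its kind and to
// the id of the type it refers to ("" for a leaf type).
struct type_desc
{
  std::string kind;
  std::string referred_id;
};

// Hypothetical, simplified IR node; not libabigail's array_type_def.
struct type_node
{
  std::string kind;
  type_node* referred = nullptr; // Element or pointee type; set once built.
};

class reader
{
  std::map<std::string, type_desc> input_;
  std::map<std::string, std::unique_ptr<type_node>> built_; // Owns the nodes.

public:
  explicit reader(std::map<std::string, type_desc> input)
    : input_(std::move(input))
  {}

  type_node*
  build(const std::string& id)
  {
    auto it = built_.find(id);
    if (it != built_.end())
      return it->second.get(); // Already registered: re-use it.

    const type_desc& d = input_.at(id);

    // Create the node without its referred type and register it first ...
    std::unique_ptr<type_node>& slot = built_[id];
    slot.reset(new type_node);
    slot->kind = d.kind;
    type_node* node = slot.get();

    // ... so that if building the referred type leads back to ID, the
    // lookup above returns NODE instead of recursing forever.
    if (!d.referred_id.empty())
      node->referred = build(d.referred_id);
    return node;
  }
};
```

Without the early registration in build(), building "a" in an input where "a" is an array of "p" and "p" is a pointer to "a" would recurse into "p", then into "a" again, indefinitely.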

	* include/abg-ir.h (array_type_def::update_size): Declare new
	private member function.
	(array_type_def::array_type_def): Declare ...
	* src/abg-ir.cc (array_type_def::array_type_def): ... a new
	constructor that takes no element type.
	(array_type_def::update_size): Define this helper private member
	function.
	(array_type_def::get_subrange_representation): Adjust for this to
	work when there is no element type setup yet.
	(array_type_def::{set_element_type, append_subranges}): Update the
	size and name of the array.
	* src/abg-reader.cc (build_array_type_def): Create the array type
	before the element type so that the latter can re-use the former.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 19:44:52 +02:00
Dodji Seketeli
1cfbff1b30 rhbz1951526 - SELF CHECK FAILED for 'gimp-2.10'
This is a fix for bug https://bugzilla.redhat.com/show_bug.cgi?id=1951526.

Although it's a patch for one bug, it addresses several different
issues that cause the observed self comparison failure.  As is often
the case with this kind of problem, the failure is difficult to
reproduce on a synthetic test case, so I'll explain the root causes in
this commit log.

There are 4 different root causes to this problem.  As I couldn't come
up with a reduced test case for each one of them I am adding the fixes
for those 4 issues in this commit, along with a new regression test
extracted from the initial bugzilla problem report.

So, overall, the symptom we are seeing here is that when we build an
IR for the input binary gimp-2.10, save that IR into abixml, and read
back that abixml into another IR, comparing the two IRs shows changes;
it should show no change whatsoever.  This is what we call in
libabigail jargon a self comparison (or self check) failure.

As alluded to in my introduction above, there appear to be 4 different
root causes for that self comparison failure.

1/ The first cause has to do with a situation about two anonymous enums
that are (wrongly) considered different from an ABI point of view.
Using the debugging capabilities recently gained by libabigail, I
noticed that the two enums are:

    (gdb) p debug(&l)
    enum __anonymous_enum__ : unnamed-enum-underlying-type-32
    {
      // size in bits: 32
      // translation unit: /usr/src/debug/gimp-2.10.22-2.el9.1.aarch64/app/<artificial>-757de
      // @: 0x698fb68, @canonical: 0

      GIMP_INTERPOLATION_NONE = 0,
      GIMP_INTERPOLATION_LINEAR = 1,
      GIMP_INTERPOLATION_CUBIC = 2,
      GIMP_INTERPOLATION_NOHALO = 3,
      GIMP_INTERPOLATION_LOHALO = 4,
    };

    $1 = (abigail::ir::decl_base *) 0x698fba0
    (gdb) p debug(&r)
    enum __anonymous_enum__ : unnamed-enum-underlying-type-32
    {
      // size in bits: 32
      // translation unit: /usr/src/debug/gimp-2.10.22-2.el9.1.aarch64/app/app.c
      // @: 0xa6d83e8, @canonical: 0

      GIMP_INTERPOLATION_NONE = 0,
      GIMP_INTERPOLATION_LINEAR = 1,
      GIMP_INTERPOLATION_CUBIC = 2,
      GIMP_INTERPOLATION_NOHALO = 3,
      GIMP_INTERPOLATION_LOHALO = 4,
      GIMP_INTERPOLATION_LANCZOS = 3,
    };

    $2 = (abigail::ir::decl_base *) 0xa6d8420
    (gdb)

Note how the second enum has a new enumerator named
'GIMP_INTERPOLATION_LANCZOS', but its value is '3', which is exactly
the value of the existing enumerator GIMP_INTERPOLATION_NOHALO.

During type canonicalization of the IR from the input binary,
libabigail (wrongly) considers these two enums as being different.
This leads to the type 'Gimp*' (or any type indirectly using either
of the anonymous enums above) coming from one translation unit
being considered different from a type 'Gimp*' coming from another
translation unit, just because they are not using the same version
of the anonymous enum above.

This leads to a *LOT* of spurious type changes from the first IR, that
are saved into abixml.

To fix this first problem, this patch introduces "two modes" of
comparing enums.  There is a binary-only mode which only looks at
enumerator values, not enumerator names.  And then there is the
source-level mode which looks at both enumerator names and values when
comparing enums.  The former mode is used during type
canonicalization.  However, when a change is detected between two
enums, the diff-IR built to describe the change is constructed
using the latter mode.  Using the latter mode makes it possible to
precisely describe things like enumerator insertion/removal by
referring to the names of the added/removed enumerators.
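The two comparison modes can be sketched as an order-insensitive check in which every enumerator of each enum must have a match in the other. This is loosely modelled on the is_enumerator_present_in_enum() approach listed in the ChangeLog of this commit. The enumerator struct, enums_equal() and the binary_only flag are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified enumerator; not libabigail's
// enum_type_decl::enumerator.
struct enumerator
{
  std::string name;
  int64_t value;
};

using enumerators = std::vector<enumerator>;

// Binary-only mode compares values only; source-level mode also
// compares names.
bool
same_enumerator(const enumerator& a, const enumerator& b, bool binary_only)
{
  return a.value == b.value && (binary_only || a.name == b.name);
}

// True if some enumerator of E matches X in the given mode.
bool
is_present(const enumerator& x, const enumerators& e, bool binary_only)
{
  for (const enumerator& y : e)
    if (same_enumerator(x, y, binary_only))
      return true;
  return false;
}

// Order-insensitive comparison.  Because each side only needs a match
// in the other, a redundant enumerator (a new name for an existing
// value) does not make two enums different in binary-only mode.
bool
enums_equal(const enumerators& l, const enumerators& r, bool binary_only)
{
  for (const enumerator& x : l)
    if (!is_present(x, r, binary_only))
      return false;
  for (const enumerator& x : r)
    if (!is_present(x, l, binary_only))
      return false;
  return true;
}
```

With the two GIMP interpolation enums above, the binary-only mode finds them equal because LANCZOS reuses the value 3. The source-level mode finds them different, which is what lets the diff report name the added enumerator.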

2/ The second root cause is that a struct, say, 'struct _GimpImage'
from a translation unit is considered different from a 'struct
_GimpImage' from another translation unit because the DWARF reader
wrongly assigns them different sizes.

Here is what it looks like in the debugger:

    (gdb) p debug(&l)
    struct _GimpImage
    {  // size in bits: 384
       // definition point: ../../app/core/gimpimage.h:39:1
       // translation unit: /usr/src/debug/gimp-2.10.22-2.el9.1.aarch64/app/<artificial>-757de
       // @: 0x69b9d10, @canonical: 0

      GimpViewable parent_instance;'
      Gimp* gimp;'
      GimpImagePrivate* priv;'
    };

    $8 = (abigail::ir::type_base *) 0x69b9d10
    (gdb) p debug(&r)
    struct _GimpImage
    {  // size in bits: 0
       // definition point: :0:0
       // translation unit: /usr/src/debug/gimp-2.10.22-2.el9.1.aarch64/app/<artificial>-8813f
       // @: 0x6ac7a50, @canonical: 0

    };

Notice how the second 'struct _GimpImage' has a size of zero.

This is because when reading the DWARF, we first encounter the DIE for
the first 'struct _GimpImage' and we properly build a type for it,
along with its declaration.  Then when we encounter another DIE
defining 'struct _GimpImage' again, from a different translation unit,
the DWARF reader recognizes that it's a DIE for a declaration of
'struct _GimpImage' and fails to re-use the previous definition for
'struct _GimpImage'.  So it wrongly builds a declaration-only 'struct
_GimpImage' for it, hence the second struct _GimpImage with a zero
size.

Here again that creates spurious changes (after type canonicalization)
in types using struct _GimpImage.  And that is a lot of types,
including things like 'Gimp*' and the like.

The fix for this root cause issue is to change
add_or_update_class_type in the DWARF reader to recognize that we are
seeing a type declaration for which there was already a definition and
return that definition instead of creating a new declaration.
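A minimal sketch of that behavior follows. It assumes classes can be keyed by name alone, as under the One Definition Rule. The class_type and class_registry types are hypothetical simplifications of what add_or_update_class_type manipulates.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified class type; not libabigail's class_decl.
struct class_type
{
  std::string name;
  uint64_t size_in_bits = 0;
  bool is_declaration_only = true;
};

class class_registry
{
  std::map<std::string, std::shared_ptr<class_type>> by_name_;

public:
  // Called for each class DIE.  A declaration-only DIE for a class that
  // is already defined yields that definition, not a new zero-sized
  // declaration.
  std::shared_ptr<class_type>
  add_or_update(const std::string& name, bool is_decl_only,
                uint64_t size_in_bits)
  {
    std::shared_ptr<class_type>& slot = by_name_[name];
    if (!slot)
      {
        slot = std::make_shared<class_type>();
        slot->name = name;
      }
    // A definition fills in (or updates) the existing object; a mere
    // declaration never overwrites what is already known.
    if (!is_decl_only)
      {
        slot->is_declaration_only = false;
        slot->size_in_bits = size_in_bits;
      }
    return slot;
  }
};
```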

3/ The third root cause is better explained with a "screen shot".
Consider these two 'versions' of the same struct _GdkDevice from two
different translation units:

    struct _GdkDevice
    {  // size in bits: 576
       // definition point: /usr/include/gtk-2.0/gdk/gdkinput.h:98:1
       // translation unit: /usr/src/debug/gimp-2.10.22-2.el9.1.aarch64/app/<artificial>-2d0352
       // @: 0x8820530, @canonical: 0

      GObject parent_instance;'
      gchar* name; // uses canonical type '@0x6892980'
      GdkInputSource source;'
      GdkInputMode mode;'
      gboolean has_cursor; // uses canonical type '@0x688dd00'
      gint num_axes; // uses canonical type '@0x688dd00'
      GdkDeviceAxis* axes;'
      gint num_keys; // uses canonical type '@0x688dd00'
      GdkDeviceKey* keys;'
    };

    $9 = (abigail::ir::type_base *) 0x8820530
    (gdb) p debug(&r)
    struct _GdkDevice
    {  // size in bits: 576
       // definition point: /usr/include/gtk-2.0/gdk/gdkinput.h:98:1
       // translation unit: /usr/src/debug/gimp-2.10.22-2.el9.1.aarch64/app/<artificial>-1fdb18
       // @: 0x7cd71e0, @canonical: 0

      GObject parent_instance;'
      gchar* _g_sealed__name; // uses canonical type '@0x6892980'
      GdkInputSource _g_sealed__source;'
      GdkInputMode _g_sealed__mode;'
      gboolean _g_sealed__has_cursor; // uses canonical type '@0x688dd00'
      gint _g_sealed__num_axes; // uses canonical type '@0x688dd00'
      GdkDeviceAxis* _g_sealed__axes;'
      gint _g_sealed__num_keys; // uses canonical type '@0x688dd00'
      GdkDeviceKey* _g_sealed__keys;'
    };

    $10 = (abigail::ir::type_base *) 0x7cd71e0
    (gdb)

Notice how the name of the second data member 'name' was changed to
'_g_sealed__name'.  A similar scheme applies to several other data
member names.  The offsets and types of the data members of struct
_GdkDevice haven't changed, however.  So from an ABI standpoint, the
two versions of that struct are equal.  Libabigail considers them
different, however.  Because that type is used by tons of other types
of the binary being analyzed, this leads to lots of spurious canonical
type differences that shouldn't be there.

These three issues are magnified by the fact that the gimp binary is
compiled using "link time optimization".  That brings in a lot more
opportunities to see these underlying issues that have been there for
a long time.

4/ The fourth and last root cause.  When the abixml writer emits
a translation unit (TU), it keeps track of the non-emitted referred-to
types of the currently emitted translation unit and emits them at
the end of each TU.  For instance, if the type 'Gimp*' (pointer to
Gimp) was emitted, and yet the referred-to type 'Gimp' wasn't emitted,
the TU writer makes sure to emit the referred-to 'Gimp' type at the
end of the TU.  This has been going on for quite some time now.

The problem, however, is that although the non-emitted referred-to type
was referred to in this current TU, it might not have been *DEFINED* in
this TU.  In that case, it should not be emitted in this TU.

Otherwise, the TU where that type is defined in the abixml might
appear different from where it is defined in the initial binary,
leading to self comparison failures down the road.

This patch ensures that a non-emitted referred-to type is always
emitted in the TU it belongs to.

5/ After doing all this, it appears that we were forgetting to emit
some function types that were defined in TUs emitted earlier and yet
were being referred-to later.  Looking closer, I realized that we
should just emit function types seen in a given TU, regardless of the
referred-to relation.

The problem with that is that function types are special in libabigail
because there are two situations in which they are created.

Basically, a function type is created by the DWARF DIE
DW_TAG_subroutine_type.  This is for instance how pointer to functions
are represented in DWARF, namely, by a DW_TAG_pointer_type that points
to a DW_TAG_subroutine_type.  That is represented in the libabigail ir
by an instance of the abigail::ir::function_type type.  This is
represented in abixml as a 'function-type' XML element.

But then, libabigail considers that all decls have a type.  This
obviously applies to variables and data members.  Right.  But then,
libabigail considers that a function is also a decl, which has a type.
And the type of a function is a function type, represented by the same
abigail::ir::function_type.  A practical difference with the former
situation is that function decls are *NOT* represented in abixml using
a 'function-type' element.  Instead a 'function-decl' XML element uses
return type and parameter elements to represent the types involved
with a function decl.

Said otherwise, the 'function type' concept used to represent the
type of function decls in the libabigail IR is artificial.  This
artificial-ness was not explicitly expressed in libabigail.  This
patch now expresses that artificial-ness for function types.  So the
abixml writer now simply decides not to emit artificial function
types, and emits all the non-artificial function types instead.

This addresses this last issue by making it possible to emit all
non-artificial function types defined in a given TU, without having to
bother with whether they are referred to or not.
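The writer-side rule can be sketched like this, together with the equivalence-class rule from the ChangeLog entry for maybe_adjust_canonical_type. The fn_type struct, function_types_to_emit() and class_is_artificial() are hypothetical names and are not part of libabigail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified function type; not libabigail's function_type.
struct fn_type
{
  std::string repr;
  bool is_artificial; // True when it only serves as the type of a function decl.
};

// Return the function types a TU writer would emit as 'function-type'
// elements: every non-artificial one seen in the TU, whether or not
// anything refers to it.
std::vector<std::string>
function_types_to_emit(const std::vector<fn_type>& types_in_tu)
{
  std::vector<std::string> out;
  for (const fn_type& t : types_in_tu)
    if (!t.is_artificial)
      out.push_back(t.repr);
  return out;
}

// A class of equivalence of function types is non-artificial as soon
// as one of its members is non-artificial.
bool
class_is_artificial(const std::vector<fn_type>& equivalents)
{
  for (const fn_type& t : equivalents)
    if (!t.is_artificial)
      return false;
  return true;
}
```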

Together, fixing these 5 problems fixes this reported problem.

The changes to the reference test outputs are adjustments needed
because the abixml output indeed changes.

	* include/abg-ir.h
	(environment::use_enum_binary_only_equality): Declare
	accessors.  (type_or_decl_base::{s,g}et_is_artificial):
	Likewise.  (decl_base::{s,g}et_is_artificial): Remove
	accessors.  * src/abg-ir.cc
	(environment::priv::use_enum_binary_only_equality): Define new
	data member.
	(environment::priv::use_enum_binary_only_equality): Define
	accessors.  (type_or_decl_base::priv::is_artificial_): Define
	new data member.  It has actually moved here from
	decl_base::priv::is_artificial_.
	(type_or_decl_base::priv::priv): Initialize it.
	(type_or_decl_base::{g,s}et_is_artificial): Define accessors.
	(decl_base::is_artificial_): Move this to
	type_or_decl_base::is_artificial_.
	(maybe_adjust_canonical_type): In a given class of equivalence
	of function types, if there is one non-artificial function
	type, then the entire class of equivalence is considered
	non-artificial; so flag the canonical function type as being
	non-artificial.  (is_enumerator_present_in_enum): Define new
	static function.  (equals): Re-arrange the overload for enums
	so the order of the enumerators doesn't count in the
	comparison.  Also, two enums with different numbers of
	enumerators can still be equal, with the right redundancy.  In
	the overload for var_decl, avoid taking into account the names
	of data members in the comparison.
	(enum_type_decl::enumerator::operator==): In the binary-level
	comparison mode, only compare the value of enumerators, not
	their name.  * src/abg-comparison.cc (compute_diff): In the
	overload for enum_type_decl, if the enums compare different
	using binary-level comparison, then use source-level
	comparison to build the diff-IR.  * src/abg-dwarf-reader.cc
	(read_context::compare_before_canonicalisation): Compare enums
	using binary-level comparison.  (add_or_update_class_type): If
	we are looking at the definition of an existing declaration
	that has been already defined then use the previous
	definition, in case we are going to need to update the
	definition.  Also, update the size only if it's needed.
	(build_function_type): By default, consider the newly built
	function type as artificial.  (build_ir_node_from_die): When
	looking at a DW_TAG_subroutine_type DIE, consider the built
	function type as non-artificial.  * src/abg-reader.cc
	(read_context::maybe_check_abixml_canonical_type_stability):
	Don't consider declaration-only classes in an ODR context
	because they don't have canonical types.
	(build_function_decl): Flag the function type of the function
	as artificial.  (build_class_decl): Make sure to reuse class
	types that were already created.  * src/abg-writer.cc
	(write_translation_unit): Allow emitting empty classes.  Make
	sure referenced types are emitting in the translation unit
	where they belong.  Avoid emitting artificial function types.
	* tests/data/test-alt-dwarf-file/rhbz1951526/rhbz1951526-report-0.txt:
	New test reference output.
	* tests/data/test-alt-dwarf-file/rhbz1951526/usr/bin/gimp-2.10:
	New reference test binary input.
	* tests/data/test-alt-dwarf-file/rhbz1951526/usr/lib/debug/.dwz/gimp-2.10.22-2.el9.1.aarch64:
	Likewise.
	* tests/data/test-alt-dwarf-file/rhbz1951526/usr/lib/debug/usr/bin/gimp-2.10-2.10.22-2.el9.1.aarch64.debug:
	Likewise.
	* tests/data/Makefile.am: Add the new test files to source
	directory.
	* tests/test-alt-dwarf-file.cc: Add the new test inputs to this
	test harness.
	* tests/data/test-abidiff/test-PR18791-report0.txt: Adjust.
	* tests/data/test-abidiff/test-enum0-report.txt: Likewise.
	* tests/data/test-annotate/libtest23.so.abi: Likewise.
	* tests/data/test-annotate/libtest24-drop-fns-2.so.abi: Likewise.
	* tests/data/test-annotate/libtest24-drop-fns.so.abi: Likewise.
	* tests/data/test-annotate/test-anonymous-members-0.o.abi: Likewise.
	* tests/data/test-annotate/test13-pr18894.so.abi: Likewise.
	* tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
	* tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
	* tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
	* tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
	Likewise.
	* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi:
	Likewise.
	* tests/data/test-annotate/test21-pr19092.so.abi: Likewise.
	* tests/data/test-diff-dwarf-abixml/test0-pr19026-libvtkIOSQL-6.1.so.1.abi:
	Likewise.
	* tests/data/test-diff-dwarf/PR25058-liblttng-ctl-report-1.txt:
	Likewise.
	* tests/data/test-diff-dwarf/test6-report.txt: Likewise.
	* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-0.txt:
	Likewise.
	* tests/data/test-diff-filter/test31-pr18535-libstdc++-report-1.txt:
	Likewise.
	* tests/data/test-diff-filter/test8-report.txt: Likewise.
	* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-2.txt:
	Likewise.
	* tests/data/test-diff-pkg/spice-server-0.12.4-19.el7.x86_64-0.12.8-1.el7.x86_64-report-3.txt:
	Likewise.
	* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
	Likewise.
	* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-1.txt:
	Likewise.
	* tests/data/test-read-dwarf/PR22015-libboost_iostreams.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
	* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Likewise.
	* tests/data/test-read-dwarf/PR25042-libgdbm-clang-dwarf5.so.6.0.0.abi:
	Likewise.
	* tests/data/test-read-dwarf/PR26261/PR26261-exe.abi: Likewise.
	* tests/data/test-read-dwarf/libtest23.so.abi: Likewise.
	* tests/data/test-read-dwarf/libtest24-drop-fns-2.so.abi: Likewise.
	* tests/data/test-read-dwarf/libtest24-drop-fns.so.abi: Likewise.
	* tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
	* tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi: Likewise.
	* tests/data/test-read-dwarf/test12-pr18844.so.abi: Likewise.
	* tests/data/test-read-dwarf/test13-pr18894.so.abi: Likewise.
	* tests/data/test-read-dwarf/test14-pr18893.so.abi: Likewise.
	* tests/data/test-read-dwarf/test15-pr18892.so.abi: Likewise.
	* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
	* tests/data/test-read-dwarf/test17-pr19027.so.abi: Likewise.
	* tests/data/test-read-dwarf/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test20-pr19025-libvtkParallelCore-6.1.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test21-pr19092.so.abi: Likewise.
	* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise.
	* tests/data/test-read-write/test28-without-std-fns-ref.xml:
	Likewise.
	* tests/data/test-read-write/test28-without-std-vars-ref.xml:
	Likewise.
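
To illustrate the binary-level enum comparison described in the ChangeLog above, here is a minimal C++ sketch.  It is not libabigail's code: enumerator, enum_type and binary_level_enums_equal are hypothetical stand-ins, and the sketch assumes that "with the right redundancy" means every enumerator value of each enum appears in the other one, regardless of enumerator order or names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-ins for an enumerator and an enum type; these are
// NOT libabigail's enum_type_decl classes.
struct enumerator
{
  std::string name;
  std::int64_t value;
};

struct enum_type
{
  std::string name;
  std::vector<enumerator> enumerators;
};

// Binary-level check: an enumerator "is present" in an enum if some
// enumerator of that enum carries the same value; names are ignored.
static bool
is_enumerator_present_in_enum(const enumerator& e, const enum_type& en)
{
  for (const enumerator& candidate : en.enumerators)
    if (candidate.value == e.value)
      return true;
  return false;
}

// Order-insensitive comparison: every enumerator of each enum must be
// present in the other.  Enums with different numbers of enumerators
// can thus still compare equal when some values are redundant.
static bool
binary_level_enums_equal(const enum_type& l, const enum_type& r)
{
  for (const enumerator& e : l.enumerators)
    if (!is_enumerator_present_in_enum(e, r))
      return false;
  for (const enumerator& e : r.enumerators)
    if (!is_enumerator_present_in_enum(e, l))
      return false;
  return true;
}
```

With this scheme, {A = 0, B = 1} and {X = 0, Y = 1, Z = 1} compare equal, while {X = 0, Y = 2} does not.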

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 18:43:02 +02:00
Dodji Seketeli
ed87d0a29b reader: Canonicalizing a type once is enough
While looking at something else, I noticed that the abixml reader was
trying to canonicalize each type twice.  Once should be enough.

	* src/abg-reader.cc (build_type): Don't try to canonicalize the
	type here because all the sub-routines of this function (which
	actually build the type) already try to canonicalize it.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 18:42:24 +02:00
Dodji Seketeli
f7ad3366fb ir: make 'debug(artefact)' support showing enums
While debugging something else, I realized that 'debug(artifact)'
couldn't show the enumerators of an enum.  I also realized that we were
not showing the 'declaration-only-ness' of the artifact either.  This
patch fixes that.

	* src/abg-ir.cc (get_debug_representation): Add support for
	showing details for enums.  Also show declaration-only-ness for
	classes or unions.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 18:37:26 +02:00
Dodji Seketeli
f7e6ce3160 location::expand() shouldn't crash when no location manager available
While debugging, I noticed that trying to expand a location that is not
yet associated with any location manager would crash.

This patch fixes that.
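
The fix can be sketched with simplified stand-ins for the location and location manager classes.  These are hypothetical types, assuming (as this sketch does) that a location is an opaque index plus a pointer to the manager able to decode it; only the idea of expanding to an empty location when there is no manager comes from this patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified location manager: it records
// (path, line, column) triples and hands out 1-based indexes.
struct location_manager
{
  struct entry { std::string path; unsigned line; unsigned column; };
  std::vector<entry> entries;

  unsigned
  create_new_location(const std::string& path, unsigned line, unsigned column)
  {
    entries.push_back({path, line, column});
    return static_cast<unsigned>(entries.size()); // 0 means "no location"
  }
};

// Simplified location: an opaque index plus a (possibly null) pointer
// to the manager that can decode it.
struct location
{
  unsigned value = 0;
  const location_manager* loc_manager = nullptr;

  void
  expand(std::string& path, unsigned& line, unsigned& column) const
  {
    if (!loc_manager || value == 0)
      {
        // No manager (or no location at all): expand to an empty
        // location instead of dereferencing a null pointer.
        path.clear();
        line = 0;
        column = 0;
        return;
      }
    // Assumes 'value' was produced by this very manager.
    const location_manager::entry& e = loc_manager->entries[value - 1];
    path = e.path;
    line = e.line;
    column = e.column;
  }
};
```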

	* src/abg-ir.cc (location::expand): When no location manager is
	present, just expand to an empty location.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 18:20:53 +02:00
Dodji Seketeli
d1b4247c16 Add environment::{get_type_id_from_pointer,get_canonical_type_from_type_id}
When debugging self comparison issues, once the abixml file is read
back into memory, I often want to get the type-id of an artifact that
was read from abixml or get the canonical type of an artifact whose
type-id is known.

Part of that information is indirectly present in the data member
abigail::reader::reader_context::m_pointer_type_id_map after the
.typeid file is loaded from file into memory.  The problem is that the
instance of abigail::reader::reader_context is transient as it's
destroyed quickly after the abixml file is read.  We want it to stay
alive longer.  So this patch moves that data member into
abigail::environment instead, along with its accessors.  The patch
then adds the new member functions
environment::{get_type_id_from_pointer,get_canonical_type_from_type_id}
to get the type-id of an artifact de-serialized from abixml and the
canonical type of an artifact for which we know the type-id string.
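
A minimal C++ sketch of the idea, assuming simplified types: environment and type_base below are hypothetical stand-ins, and the std::string-based signatures are assumptions rather than libabigail's actual declarations.  The point is only that the pointer -> type-id map now lives in the long-lived environment, so both lookups keep working after the transient reader context is gone.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a type; not libabigail's type_base.
struct type_base
{
  std::string name;
  const type_base* canonical_type = nullptr;
};

// Hypothetical stand-in for the environment: it owns the
// pointer -> type-id map instead of a transient reader context.
class environment
{
  std::unordered_map<const type_base*, std::string> pointer_type_id_map_;

public:
  std::unordered_map<const type_base*, std::string>&
  get_pointer_type_id_map()
  {return pointer_type_id_map_;}

  // Return the type-id string recorded for a de-serialized artifact,
  // or an empty string if none was recorded.
  std::string
  get_type_id_from_pointer(const type_base* t) const
  {
    auto i = pointer_type_id_map_.find(t);
    return i == pointer_type_id_map_.end() ? std::string() : i->second;
  }

  // Return the canonical type of the artifact that was read with the
  // given type-id, or nullptr.  A linear scan keeps the sketch short.
  const type_base*
  get_canonical_type_from_type_id(const std::string& type_id) const
  {
    for (const auto& entry : pointer_type_id_map_)
      if (entry.second == type_id)
        return entry.first->canonical_type;
    return nullptr;
  }
};
```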

	* include/abg-ir.h (environment::{get_pointer_type_id_map,
	get_type_id_from_pointer, get_canonical_type_from_type_id}):
	Declare new member functions.
	* src/abg-ir.cc (environment::{get_pointer_type_id_map,
	get_type_id_from_pointer, get_canonical_type_from_type_id}):
	Define member functions.
	(environment::priv::pointer_type_id_map_): Move
	this data member here from ...
	* src/abg-reader.cc (read_context::m_pointer_type_id_map):
	... here.
	(read_context::get_pointer_type_id_map): Remove this as it's now
	defined in environment::get_pointer_type_id_map.
	(read_context::maybe_check_abixml_canonical_type_stability):
	Adjust.
	(build_type): Likewise.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 18:19:59 +02:00
Dodji Seketeli
e9e3e06454 ir: Enable setting breakpoint on first type inequality
When debugging type canonicalization in
type_base::get_canonical_type_for, I more often than not want to know
why a type compares different to another.  Until now, I've been doing
that by stepping in the debugger.  I figure a much more efficient way
of doing that is to be able to set a breakpoint on the first
occurrence of type inequality.

To do that, I am adding a few macros to use in the 'equals' functions
to return their value: ABG_RETURN(value), ABG_RETURN_EQUAL(l,r) and
ABG_RETURN_FALSE.  Those invoke a new function called
'notify_equality_failed' when the result of the comparison is false.
This makes it possible to just set a debugger breakpoint on
'notify_equality_failed' to know when and why the type comparison
fails.

These macros invoke notify_equality_failed only if the
WITH_DEBUG_SELF_COMPARISON macro is defined.  Otherwise, they do what
the code was doing previously.  In other words, this whole shebang is
enabled only when the code is configured with
--enable-debug-self-comparison.
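
A minimal sketch of how such macros can be written, assuming the enclosing 'equals' overload names its operands 'l' and 'r'.  The macro bodies, the toy_type struct and the equality_failure_count counter are illustrative assumptions, not the definitions from src/abg-ir.cc.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// In a real build this would come from the configure step; the sketch
// defines it by hand so that the debugging path is compiled in.
#define WITH_DEBUG_SELF_COMPARISON 1

// Hypothetical counter making the hook observable outside a debugger.
static int equality_failure_count = 0;

#ifdef WITH_DEBUG_SELF_COMPARISON
// Set a breakpoint on this function to stop at the first inequality.
static void
notify_equality_failed(const void* l, const void* r)
{
  ++equality_failure_count;
  std::fprintf(stderr, "equality failed: %p vs %p\n", l, r);
}

// Uses the 'l' and 'r' parameters of the enclosing 'equals' overload.
#define ABG_RETURN(value)                       \
  do                                            \
    {                                           \
      bool abg_result_ = (value);               \
      if (!abg_result_)                         \
        notify_equality_failed(&l, &r);         \
      return abg_result_;                       \
    }                                           \
  while (false)
#else
// Without the debugging feature, behave like a plain return.
#define ABG_RETURN(value) return (value)
#endif

#define ABG_RETURN_EQUAL(a, b) ABG_RETURN((a) == (b))
#define ABG_RETURN_FALSE ABG_RETURN(false)

// A toy type and one 'equals' overload written with the macros.
struct toy_type
{
  std::string name;
  unsigned size_in_bits;
};

static bool
equals(const toy_type& l, const toy_type& r)
{
  if (l.size_in_bits != r.size_in_bits)
    ABG_RETURN_FALSE;
  ABG_RETURN_EQUAL(l.name, r.name);
}
```

In a debugger, "break notify_equality_failed" then stops at the first comparison that returns false.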

This patch incurs no functional change.

	* src/abg-ir.cc (notify_equality_failed): Define new static
	function if WITH_DEBUG_SELF_COMPARISON is defined.
	(ABG_RETURN_EQUAL, ABG_RETURN_FALSE, ABG_RETURN): Define new macros.
	(try_canonical_compare): Use ABG_RETURN_EQUAL rather than just
	returning the result of a comparison.
	(equals): In all the overloads, use the new ABG_RETURN* macros,
	rather than just returning boolean values.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-06-09 18:18:49 +02:00
Dodji Seketeli
b00ba10e1d xml reader: Fix recursive qualified & reference type definition
This is a followup patch for the fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1944088, which was in the
patch:

    commit 51ae965305
    Author: Dodji Seketeli <dodji@redhat.com>
    Date:   Fri May 21 23:55:44 2021 +0200

	abixml reader: Fix recursive type definition handling

After that patch, I noticed that qualified and reference types also
need to be able to handle the case where their underlying/pointed-to
type recursively refers to the type being created, just like typedef
and pointer types in that patch.

This patch thus adjusts build_qualified_type_decl and
build_reference_type_def to support that.  It also adjusts the
qualified_type_def and reference_type_def classes to support being
created without underlying/pointed-to type initially.

	* include/abg-ir.h (qualified_type_def::qualified_type_def):
	Declare a constructor with no underlying type.
	(reference_type_def::reference_type_def): Declare a constructor
	with no pointed-to type.
	(reference_type_def::set_pointed_to_type): Declare new method.
	* src/abg-ir.cc (qualified_type_def::priv::priv): Define a
	constructor that takes no underlying type.
	(qualified_type_def::build_name): Make this work even on
	incomplete types with no underlying type.  In that case, this
	behaves as if the underlying type were "void".
	(qualified_type_def::qualified_type_def): Define a constructor
	that takes no underlying type.
	(qualified_type_def::get_size_in_bits): Make this work on
	incomplete types with no underlying type.
	(qualified_type_def::set_underlying_type): Adjust to properly
	update this type when a new underlying type is set.  Particularly,
	its name and the lookup maps from the type scope.
	(reference_type_def::reference_type_def): Define a constructor
	that takes no pointed-to type.
	(reference_type_def::set_pointed_to_type): Define new function.
	* src/abg-reader.cc (build_qualified_type_decl): Construct the
	qualified type early before we try to construct its underlying
	type.  Associate this incomplete type with the type-id.  Then try
	to construct the underlying type.  During its construction, if
	this incomplete qualified type is needed due to recursion then it
	can be used, leading to just one qualified type being used as it
	should be.
	(build_reference_type_def): Likewise for building incomplete
	reference type first before its pointed-to type.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 12:48:12 +02:00
Dodji Seketeli
7a9fa3fe5a abixml reader: Fix recursive type definition handling
This should fix self comparison bug https://bugzilla.redhat.com/show_bug.cgi?id=1944088

This arose from a self comparison check failing on the library
libgvpr.so.2 from the graphviz-2.44.0-17.el9.aarch64.rpm package.

Now that we have facilities to see what type (instantiated from the
abixml representation of the libgvpr.so library) exactly the
canonicalization process is failing for, I decided to use it ;-)

I extracted the package and its associated debug info into a
directory named 'extract' and ran abidw --debug-abidiff on it:

    $ build/tool/abidw  --debug-abidiff -d extract/usr/lib/debug extract/usr/lib64/libgvpr.so.2

That yielded the output below:

    error: problem detected with type 'typedef Vmalloc_t' from second corpus
    error: canonical type for type 'typedef Vmalloc_t' of type-id 'type-id-170' changed from 'd72618' to '14a7448'
    error: problem detected with type 'Vmalloc_t*' from second corpus
    error: canonical type for type 'Vmalloc_t*' of type-id 'type-id-188' changed from 'd72ba8' to '14a7968'
    [...]

This tells me that "typedef Vmalloc_t", created from the abixml
compares different from its originating peer that was created from the
binary directly.  The same goes for the pointer type "Vmalloc_t*", etc.

Using the new debugging/logging functionalities from the command line
of the debugger, I could see that in the abixml reader,
build_typedef_decl can fail subtly when the underlying type of the
typedef refers to the typedef itself.  In that case, we need to ensure
that the typedef created by build_typedef_decl is the same one that is
used by the underlying type, which is not the case at the moment.
Currently, the underlying type would create a new typedef beside the
one currently being created by build_typedef_decl.  That leads to more
than one typedef in the system to designate "typedef Vmalloc_t".  And
that wreaks havoc later down the road.

This patch arranges so that build_typedef_decl creates the typedef
"early" before the underlying type is created.  That typedef
temporarily has no underlying type.  It's registered as being the
typedef for the type-id string that identifies it in the abixml.  And
then the function goes to create the underlying type.  This
arrangement ensures that if the underlying type refers to the typedef
being created (via its type-id string), then the typedef that was
created early is effectively re-used.  This ensures that a typedef
which recursively refers to itself is properly represented.  It's only
when the underlying type is fully created that it's added to the
typedef.

Something similar is done for pointer types, in
build_pointer_type_def.

Note that to do this, the patch adjusts the typedef_decl and
pointer_type_def classes so that they can be created with no
underlying/pointed-to types.  The underlying/pointed-to type can thus
be added later.
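
The early-registration scheme can be sketched as follows.  This is a toy model, not the abixml reader: xml_element, type_node and reader are hypothetical, each element is assumed to refer to exactly one other type-id, and the assignment to 'ref' plays the role of setting the underlying or pointed-to type once it is available.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified model of the abixml elements involved:
// each element has a kind, a name, and refers to one other type-id.
struct xml_element
{
  std::string kind;    // "typedef", "pointer" or "struct"
  std::string name;
  std::string ref_id;  // underlying, pointed-to or member type-id
};

struct type_node
{
  std::string kind;
  std::string name;
  type_node* ref = nullptr;  // null while the type is still incomplete
};

class reader
{
  const std::map<std::string, xml_element>& elements_;
  std::map<std::string, type_node*> built_;  // type-id -> built type
  std::vector<std::unique_ptr<type_node>> storage_;

public:
  explicit reader(const std::map<std::string, xml_element>& elements)
    : elements_(elements)
  {}

  std::size_t
  num_types_built() const
  {return storage_.size();}

  type_node*
  build_type(const std::string& id)
  {
    // Re-use the type already associated with this type-id, even if
    // it is still incomplete.  This is what breaks the recursion.
    auto i = built_.find(id);
    if (i != built_.end())
      return i->second;

    const xml_element& e = elements_.at(id);
    // 1. Create the type early, without its referenced type...
    storage_.push_back(std::unique_ptr<type_node>(new type_node{e.kind, e.name}));
    type_node* t = storage_.back().get();
    // 2. ...associate it with its type-id...
    built_[id] = t;
    // 3. ...and only then build the referenced type, which may recurse
    // back to 't' through 'id'.
    t->ref = build_type(e.ref_id);
    return t;
  }
};
```

For a cycle such as typedef Vmalloc_t -> struct -> pointer -> typedef Vmalloc_t, build_type creates exactly three types, and the pointer refers back to the very typedef being built rather than to a second copy of it.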

I believe this patch is the minimal patch necessary to fix this issue.
The graphviz RPM is added to the regression test suite for good
measure.

After visual inspection, I realized that there are other types besides
typedef and pointer types that exhibit the same class of problem even
if they are not involved in this issue on this particular binary.  A
subsequent patch is going to address the problem for those types,
namely, qualified and reference types.

	* include/abg-ir.h (pointer_type_def::pointer_type_def): Declare a
	constructor with no pointed-to type.
	(pointer_type_def::set_pointed_to_type): Declare new method.
	(typedef_decl::typedef_decl): Declare a constructor with no
	underlying type.
	* src/abg-ir.cc (pointer_type_def::pointer_type_def): Define a
	constructor with no pointed-to type.  The pointed-to type can thus
	later be set when it becomes available.
	(pointer_type_def::set_pointed_to_type): Define new method.
	(pointer_type_def::get_qualified_name): Make this work on a
	pointer type that (momentarily) has no pointed-to type.
	(typedef_decl::typedef_decl): Define a constructor with no
	underlying type.
	(typedef_decl::get_size_in_bits): Make this work on a typedef that
	has (momentarily) no underlying type.
	(typedef_decl::set_underlying_type): Update the size and alignment
	of the typedef from its new underlying type.
	* src/abg-reader.cc (build_pointer_type_def): Construct the
	pointer type early /BEFORE/ we even try to construct its
	pointed-to type.  Associate this incomplete type with the type-id.
	Then try to construct the pointed-to type.  During the
	construction of the pointed-to type, if this pointer is needed
	(due to recursion) then the incomplete pointer type can be used,
	leading to just one pointer type used (recursively) as it should
	be.
	(build_typedef_decl): Likewise for building typedef type early
	without its underlying type so that it can be used by the
	underlying type if needed.
	* tests/data/test-diff-pkg/graphviz-2.44.0-18.el9.aarch64-self-check-report-0.txt:
	New test reference output.
	* tests/data/test-diff-pkg/graphviz-2.44.0-18.el9.aarch64.rpm: New
	binary test input.
	* tests/data/test-diff-pkg/graphviz-debuginfo-2.44.0-18.el9.aarch64.rpm: Likewise.
	* tests/data/Makefile.am: Add the new test material above to
	source distribution.
	* tests/test-diff-pkg.cc (in_out_specs): Add the test inputs above
	to this test harness.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 12:39:57 +02:00
Dodji Seketeli
d94947440e Introduce artificial locations
When an abixml file is "read in" and the resulting in-memory internal
representation is saved back into abixml, the saved result can often
differ from the initial input in a non deterministic manner.  That
read-write instability is undesirable because it generates
unnecessary changes that cloud our ability to build reliable
regression tests, among other things.  It also needlessly
increases the changes to the existing regression test reference
outputs, leading to a lot more churn than necessary.

This patch tries to minimize that abixml read-write instability in
preparation for patches that would otherwise cause too much churn in
reference output files of the regression test suite.

The main reason why this read-write instability occurs is that a lot
of type definitions don't have source location.

For instance, all the types that are not user defined fall into that
category.  Those types can't be topologically sorted by using their
location as a sorting criterion.  Instead, we are currently using the
order in which those location-less types are processed by the reader
as the output (i.e., write-time) order.  The problem with that approach
is that the processing order can depend on the order in which
OTHER TYPES, like class types, are processed.  And that order can be
changed by future patches.  That in and of itself shouldn't change the
write order of these types.

For instance, if a class Foo has data members and member functions
whose types are non-user-defined types, then the order in which those
data members are processed can possibly determine the order in which
those non-user-defined types are processed.

This patch thus introduces the concept of artificial location.

A *NON-ARTIFICIAL* location is a source location that was emitted by
the original emitter of the type meta-data.  In the case of DWARF type
meta-data, the compiler originally emitted source location.  That
location when read is considered non-artificial, or natural, if you
prefer.

In the case of abixml however, an artificial location would be the
source location at which an XML element is encountered.

For instance, consider the abixml file "path/to/example.abi" below:

     1	    <abi-corpus version='2.0' path='path/to/example.abi'>
     2	      <abi-instr address-size='64' path='test24-drop-fns.cc' language='LANG_C_plus_plus'>
     3		<type-decl name='bool' size-in-bits='8' id='type-id-1'/>
     4	      </abi-instr>
     5	    </abi-corpus>

I've added line numbers for ease of reading.

At line 3 of that file, the non-user-defined type named "bool" is
defined using the XML element "type-decl".  Note how that element
lacks the "filepath", "line" and "column" attributes that would collectively
define the source location of that type.  So this type "bool" doesn't
carry any natural location.

The abixml reader can however generate an artificial location for it.
The filepath of that artificial location would be the path to that
ABI corpus, i.e., "path/to/example.abi".  The line number would be 3.
The column would be left at zero.

That artificial location will never be explicitly written down as
an XML attribute, as it can always be implicitly retrieved by
construction.

The patch changes the internal representation so that each ABI
artifact of the internal representation can now carry both an
artificial and a natural location.

When two artifacts both have an artificial location, it is used to
topologically sort them.  The one that is defined topologically
"earlier" obviously comes first.

When two artifacts both have a natural location, it is used to
topologically sort them.

Otherwise, they are sorted lexicographically.

This makes the output of abilint a lot more read-write stable.
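
A minimal C++ sketch of the location handling and the sort criterion described above.  The location and decl structs, make_artificial_location and decl_topo_less are hypothetical simplifications; in particular, the sketch assumes that locations of different kinds, or in different files, are not comparable and fall back to ordering by name.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Simplified location: (path, line, column) plus the artificial flag.
// Hypothetical; not libabigail's location class.
struct location
{
  std::string path;
  unsigned line = 0;
  unsigned column = 0;
  bool is_artificial = false;

  bool
  is_set() const
  {return line != 0;}
};

// An artificial location is the path of the abixml file plus the line
// of the XML element; the column is left at zero.
location
make_artificial_location(const std::string& abixml_path, unsigned xml_line)
{
  location loc;
  loc.path = abixml_path;
  loc.line = xml_line;
  loc.is_artificial = true;
  return loc;
}

struct decl
{
  std::string name;
  location natural_location;
  location artificial_location;
};

// Prefer the artificial location, then the natural one.
const location&
get_artificial_or_natural_location(const decl& d)
{
  return d.artificial_location.is_set()
    ? d.artificial_location
    : d.natural_location;
}

// Sort by location when both decls have a location of the same kind in
// the same file; otherwise fall back to lexicographic name order.
bool
decl_topo_less(const decl& l, const decl& r)
{
  const location& ll = get_artificial_or_natural_location(l);
  const location& rl = get_artificial_or_natural_location(r);
  if (ll.is_set() && rl.is_set()
      && ll.is_artificial == rl.is_artificial
      && ll.path == rl.path)
    return std::tie(ll.line, ll.column) < std::tie(rl.line, rl.column);
  return l.name < r.name;
}
```

Mixing location and name criteria like this is not guaranteed to be a strict weak ordering in general, so a real implementation would need more care before handing such a comparator to std::sort.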

	* include/abg-fwd.h (get_artificial_or_natural_location): Declare
	new function.
	* include/abg-ir.h (location::location): Initialize & copy ...
	(location::is_artificial_): ... a new data member.
	(location::{g,s}et_is_artificial): New accessors.
	(location::{operator=}): Adjust.
	(type_or_decl_base::{set,get,has}_artificial_location): Declare
	new member functions.
	* src/abg-ir.cc (decl_topo_comp::operator()): In the overload for
	decl_base*, use artificial location for topological sort in
	priority.  Otherwise, use natural location.  Otherwise, sort
	lexicographically.
	(type_topo_comp::operator()): In the overload for type_base*, use
	lexicographical sort only for types that don't have location at
	all.
	(type_or_decl_base::priv::artificial_location_): Define new data
	member.
	(type_or_decl_base::{set,get,has}_artificial_location): Define new
	member functions.
	(decl_base::priv): Allow a constructor without location.  That one
	sets no natural location to the artifact.
	(decl_base::decl_base): Use decl_base::set_location in the
	constructor now.
	(decl_base::set_location): Adjust this to support setting a
	natural or an artificial location.
	(get_debug_representation): Emit debugging log showing the
	location of an artifact, using its artificial location in
	priority.
	(get_artificial_or_natural_location): Define new function.
	* src/abg-reader.cc (read_artificial_location)
	(maybe_set_artificial_location): Define new static functions.
	(read_location): Read artificial location when no natural location
	was found.
	(build_namespace_decl, build_function_decl, build_type_decl)
	(build_qualified_type_decl, build_pointer_type_def)
	(build_reference_type_def, build_subrange_type)
	(build_array_type_def, build_enum_type_decl, build_typedef_decl)
	(build_class_decl, build_union_decl, build_function_tdecl)
	(build_class_tdecl, build_type_tparameter)
	(build_non_type_tparameter, build_template_tparameter): Read and
	set artificial location.
	* src/abg-writer.cc (write_location): Don't serialize artificial
	locations.
	(write_namespace_decl): Topologically sort member declarations
	before serializing them.
	* tests/data/test-read-write/test28-without-std-fns-ref.xml:
	Adjust.
	* tests/data/test-read-write/test28-without-std-vars-ref.xml:
	Likewise.
	* tests/data/test-annotate/libtest23.so.abi: Likewise.
	* tests/data/test-annotate/libtest24-drop-fns-2.so.abi: Likewise.
	* tests/data/test-annotate/libtest24-drop-fns.so.abi: Likewise.
	* tests/data/test-annotate/test0.abi: Likewise.
	* tests/data/test-annotate/test13-pr18894.so.abi: Likewise.
	* tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
	* tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
	* tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
	* tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
	Likewise.
	* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi:
	Likewise.
	* tests/data/test-annotate/test21-pr19092.so.abi: Likewise.
	* tests/data/test-read-dwarf/PR22015-libboost_iostreams.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
	* tests/data/test-read-dwarf/PR25007-sdhci.ko.abi: Likewise.
	* tests/data/test-read-dwarf/PR25042-libgdbm-clang-dwarf5.so.6.0.0.abi:
	Likewise.
	* tests/data/test-read-dwarf/PR26261/PR26261-exe.abi: Likewise.
	* tests/data/test-read-dwarf/libtest23.so.abi: Likewise.
	* tests/data/test-read-dwarf/libtest24-drop-fns-2.so.abi: Likewise.
	* tests/data/test-read-dwarf/libtest24-drop-fns.so.abi: Likewise.
	* tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
	* tests/data/test-read-dwarf/test-suppressed-alias.o.abi: Likewise.
	* tests/data/test-read-dwarf/test0.abi: Likewise.
	* tests/data/test-read-dwarf/test0.hash.abi: Likewise.
	* tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi: Likewise.
	* tests/data/test-read-dwarf/test11-pr18828.so.abi: Likewise.
	* tests/data/test-read-dwarf/test12-pr18844.so.abi: Likewise.
	* tests/data/test-read-dwarf/test13-pr18894.so.abi: Likewise.
	* tests/data/test-read-dwarf/test14-pr18893.so.abi: Likewise.
	* tests/data/test-read-dwarf/test15-pr18892.so.abi: Likewise.
	* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
	* tests/data/test-read-dwarf/test17-pr19027.so.abi: Likewise.
	* tests/data/test-read-dwarf/test18-pr19037-libvtkRenderingLIC-6.1.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test20-pr19025-libvtkParallelCore-6.1.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test21-pr19092.so.abi: Likewise.
	* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 12:31:14 +02:00
Dodji Seketeli
27d2927107 Detect abixml canonical type instability during abidw --debug-abidiff
In the debugging mode of self comparison induced by the invocation of
"abidw --debug-abidiff <binary>", it's useful to be able to ensure the
following invariant:

    The pointer value of the canonical type of a type T that is
    serialized into abixml with the id string "type-id-12" (for
    instance) must keep the same canonical type pointer value when
    that abixml file is de-serialized back into memory.  This is
    possible mainly because libabigail stays loaded in memory all the
    time during both serialization and de-serialization.

This patch adds support for detecting when that invariant is not
respected.

In other words, it detects when the type built from de-serializing the
type whose id is "type-id-12" (for instance) has a canonical type
whose pointer value is different from the pointer value of the
canonical type (of the type) that was serialized as having the type id
"type-id-12".

This is done in three phases.

The first phase happens in the code of abidw itself; after the abixml
is written on disk, another file called the "typeid file" is written
on disk as well.  That latter file contains a set of records; each
record associates a "type id string" (like the type IDs that appear in
the abixml file) to the pointer value of the canonical type that
matches that type id string.  That file is thus now available for
manual inspection during a later debugger session.  This is done by
invoking the new function write_canonical_type_ids.

The second phase happens right before abixml loading time.  The typeid
file is read back and each "type-id string" <-> "canonical type pointer
value" association is stored in a hash map that is returned by
environment::get_type_id_canonical_type_map().  This is done by
invoking the new function load_canonical_type_ids.

The third phase happens right after the canonicalization (triggered in
the abixml reader) of a type coming from abixml, corresponding to a
given type id.  It checks if the pointer value of the canonical type
just computed is the same as the one associated to that type id in the
map returned by environment::get_type_id_canonical_type_map.

This is a way of verifying the "stability" of a canonical type during
its serialization and de-serialization to and from abixml and it's
done as part of "abidw --debug-abidiff <binary>".
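
The three phases can be sketched in C++ as below.  The on-disk format (one "type-id hex-pointer" record per line) and the names write_type_id_records, load_type_id_records and check_canonical_type_stability are assumptions for illustration; they stand in for write_canonical_type_ids, load_canonical_type_ids and the post-canonicalization check, whose real signatures and file format may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical typeid-file record format, one record per line:
//   <type-id-string> <canonical-type-pointer-value-in-hex>
using type_id_map = std::map<std::string, std::uintptr_t>;

// Phase 1: after writing the abixml, dump the
// type-id -> canonical type pointer associations.
void
write_type_id_records(const type_id_map& ids, std::ostream& out)
{
  for (const auto& record : ids)
    out << record.first << ' ' << std::hex << record.second << std::dec << '\n';
}

// Phase 2: before loading the abixml, read the associations back.
type_id_map
load_type_id_records(std::istream& in)
{
  type_id_map ids;
  std::string id;
  std::uintptr_t ptr = 0;
  while (in >> id >> std::hex >> ptr >> std::dec)
    ids[id] = ptr;
  return ids;
}

// Phase 3: after canonicalizing a type read from abixml, check that its
// canonical type pointer matches the one recorded for its type-id.
bool
check_canonical_type_stability(const type_id_map& recorded,
                               const std::string& type_id,
                               const std::string& type_name,
                               std::uintptr_t new_canonical,
                               std::ostream& err)
{
  auto i = recorded.find(type_id);
  if (i == recorded.end() || i->second == new_canonical)
    return true;
  err << "error: canonical type for type '" << type_name
      << "' of type-id '" << type_id << "' changed from '"
      << std::hex << i->second << "' to '" << new_canonical << std::dec
      << "'\n";
  return false;
}
```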

Just as an example, here is the kind of error output that I am getting
on a real life debugging session on a binary that exhibits self
comparison error:

    $ abidw  --debug-abidiff -d <some-binary>
    error: problem detected with type 'typedef Vmalloc_t' from second corpus
    error: canonical type for type 'typedef Vmalloc_t' of type-id 'type-id-179' changed from '1a083e8' to '21369b8'
    [...]
    $

From this output, I see that the first type for which libabigail
exhibits an instability on the pointer value of the canonical type is
the type 'typedef Vmalloc_t'.  In other words, when that type is saved
to abixml, the type we read back is different.  This needs further
debugging but at least it pinpoints exactly what type we are seeing
the core issue on first.  This is a tremendous help in the root
cause analysis needed to understand why the self comparison is
failing.

	* include/abg-ir.h (environment::get_type_id_canonical_type_map):
	Declare new method.
	* src/abg-ir.cc (environment::priv::type_id_canonical_type_map_):
	Define new data member.
	(environment::get_type_id_canonical_type_map): Define new method.
	* include/abg-reader.h (load_canonical_type_ids): Declare new
	function.
	* src/abg-reader.cc (read_context::m_pointer_type_id_map):
	Define new data member.
	(read_context::{get_pointer_type_id_map,
	maybe_check_abixml_canonical_type_stability}): Define new methods.
	(read_context::{maybe_canonicalize_type,
	perform_late_type_canonicalizing}): Invoke
	maybe_check_abixml_canonical_type_stability after
	canonicalization to perform canonical type stability
	checking.
	(build_type): Associate the pointer value for the newly built type
	with the type id string identifying it in the abixml.  Once the
	abixml representation is dropped from memory and we are about to
	perform type canonicalization, we can still know what the type id
	of a given type coming from abixml was; it's thus possible to
	verify that the canonical type associated to that type id is the
	same as the one stored in the typeid file.
	(read_type_id_string): Define new static function.
	(load_canonical_type_ids): Define new function.
	* include/abg-writer.h (write_canonical_type_ids): Likewise.
	* src/abg-writer.cc (write_canonical_type_ids): Define new
	function overloads.
	* tools/abidw.cc (options::type_id_file_path): New data member.
	(load_corpus_and_write_abixml): Write and read back the typeid
	file.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 12:24:26 +02:00
Dodji Seketeli
104468d1a4 Detect failed self comparison in type canonicalization of abixml
During the self comparison triggered by "abidw --abidiff <binary>",
some comparison errors can happen when canonicalizing types that are
"de-serialized" from the abixml that was serialized from the input
binary.

This patch adds some debugging checks and messaging to emit a message
when a type from the abixml appears to not "match" the original type
from the initial corpus it originated from.

Here is a more detailed description:

Let's consider a type T coming from the corpus of the input binary.

That input corpus is serialized into abixml and de-serialized again
into a second corpus that we shall name the abixml corpus.  From that
second corpus, let's consider the type T' that is the result of
serializing T into abixml and de-serializing it again.  T is said to
be the original type of T'.  If T is a canonical type, then T' should
equal T.  Otherwise, if T is not a canonical type, its canonical type
should equal the canonical type of T'.

For the sake of simplicity, let's consider that T is a canonical
type.  During the canonicalization of T', T' should equal T.  Each and
every canonical type coming from the abixml corpus should be equal to its
original type from the binary corpus.
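The invariant above can be illustrated with a toy round trip.  This is
a minimal sketch only: toy_type, serialize and deserialize are
hypothetical names, and the line-based text format merely stands in for
abixml.  It shows that T' = deserialize(serialize(T)) is expected to
compare equal to T.

```cpp
// Hypothetical toy model of the self-comparison invariant; none of these
// names are libabigail APIs.
#include <sstream>
#include <string>

struct toy_type
{
  std::string name;        // e.g. "typedef Vmalloc_t"
  unsigned size_in_bits;
};

bool operator==(const toy_type& a, const toy_type& b)
{ return a.name == b.name && a.size_in_bits == b.size_in_bits; }

// Serialize T into a tiny two-line text format (stand-in for abixml).
std::string serialize(const toy_type& t)
{
  std::ostringstream o;
  o << t.size_in_bits << '\n' << t.name;
  return o.str();
}

// De-serialize the text back into T'.
toy_type deserialize(const std::string& s)
{
  std::istringstream in(s);
  toy_type t{};
  in >> t.size_in_bits;
  in.ignore();              // skip the newline separating the two fields
  std::getline(in, t.name); // the name may contain spaces
  return t;
}

// What the debug mode checks for a canonical type T: T' must equal T.
bool round_trips(const toy_type& original)
{
  toy_type reread = deserialize(serialize(original));
  return reread == original;
}
```

A mismatch reported by the debug mode corresponds to round_trips()
returning false for some type, i.e. information lost or altered on the
way through the serialized form.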

If a T' is different from its original type T, then there is an
"equality problem" between T and T'.  In other words, there is a
mismatch between T and T'.  We want to be notified of that problem so
that we can debug it further and fix it.

So this patch introduces the option "abidw --debug-abidiff <binary>"
to trigger the "debug self comparison mode".  At canonicalization
time, when that mode is on, each type from the abixml corpus is
compared against its counterpart from the original corpus, and a
message is emitted whenever the two compare different.

This debugging capability can be enabled at configure time with a new
--enable-debug-self-comparison configure option.  That option defines
a new WITH_DEBUG_SELF_COMPARISON compile time macro that is used to
conditionally compile the implementation of this debugging feature.
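The compile-time gating can be pictured roughly as follows.  This is a
sketch under assumptions: only the WITH_DEBUG_SELF_COMPARISON macro name
and the error message text come from this commit message;
toy_environment, set_self_comparison_debug_on and maybe_report_mismatch
are hypothetical.

```cpp
// Hypothetical sketch of a debug check compiled in only when
// WITH_DEBUG_SELF_COMPARISON is defined (e.g. by configure).
#include <iostream>
#include <string>

class toy_environment
{
#ifdef WITH_DEBUG_SELF_COMPARISON
  bool self_comparison_debug_on_ = false;
#endif

public:
#ifdef WITH_DEBUG_SELF_COMPARISON
  void set_self_comparison_debug_on(bool f) { self_comparison_debug_on_ = f; }
  bool self_comparison_debug_is_on() const { return self_comparison_debug_on_; }
#else
  // Feature compiled out: the query is constant false, so the check in
  // maybe_report_mismatch becomes dead code.
  bool self_comparison_debug_is_on() const { return false; }
#endif
};

// Called for a type of the second (abixml) corpus; returns true if a
// mismatch message was emitted.
bool maybe_report_mismatch(const toy_environment& env,
                           const std::string& type_name,
                           bool equals_original)
{
  if (env.self_comparison_debug_is_on() && !equals_original)
    {
      std::cerr << "error: problem detected with type '" << type_name
                << "' from second corpus\n";
      return true;
    }
  return false;
}
```

Keeping the state and the check behind the macro means regular builds
pay nothing for the feature.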

So, one example of this might look like this:

    $ abidw --debug-abidiff bin
    error: problem detected with type 'typedef Vmalloc_t' from second corpus
    error: problem detected with type 'Vmalloc_t*' from second corpus
    [...]

This means that the "typedef Vmalloc_t" read from the abixml compares
different from its original type, although it should not.

So armed with this new insight, I know I need to debug that comparison
in particular to see why it wrongly results in two different types.

	* doc/manuals/abidw.rst: Add documentation for the --debug-abidiff
	option.
	* include/abg-ir.h (environment::{set_self_comparison_debug_input,
	get_self_comparison_debug_inputs, self_comparison_debug_is_on}):
	Declare new methods.
	* configure.ac: Define a new --enable-debug-self-comparison option
	that is disabled by default.  That option defines a new
	WITH_DEBUG_SELF_COMPARISON preprocessor macro.
	* src/abg-ir.cc
	(environment::priv::{first_self_comparison_corpus_,
	second_self_comparison_corpus_, self_comparison_debug_on_}): New
	data members.  Also, re-indent the data members.
	(environment::{set_self_comparison_debug_input,
	get_self_comparison_debug_inputs, self_comparison_debug_is_on}):
	Define new methods.
	(type_base::get_canonical_type_for): In the "debug self comparison
	mode", if a type coming from the second corpus compares different
	from its counterpart coming from the first corpus then log a debug
	message.
	* src/abg-dwarf-reader.cc (read_debug_info_into_corpus): When
	loading the first corpus, if the debug self comparison mode is on,
	then save that corpus on the side in the environment.
	* src/abg-reader.cc (read_corpus_from_input): When loading the
	second corpus, if the debug self comparison mode is on, then save
	that corpus on the side in the environment.
	* tools/abidw.cc: Include the config.h file for preprocessor
	macros defined at configure time.
	(options::debug_abidiff): New data member.
	(parse_command_line): Parse the --debug-abidiff option.
	(load_corpus_and_write_abixml): Switch the self comparison debug mode on when
	the --debug-abidiff option is provided.  Use a read_context for
	the abixml loading.  That is going to be useful for subsequent
	patches.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 12:16:25 +02:00
Dodji Seketeli
6eee409137 Add primitives callable from the command line of the debugger
During debugging it can be extremely useful to be able to visualize
the data members of a class type, i.e. an instance of
abigail::ir::class_decl*.  It's actually useful to visualize the
pretty representation (type name and kind) of all types and decls that
inherit abigail::ir::type_or_decl_base, basically.

Today, in the debugger, if we have a variable defined as
"abigail::ir::type_or_decl_base* t", we can type:

    (gdb) p t->get_pretty_representation(true, true);

This would display something like:

    typedef foo_t

However, if 't' is declared as:
"abigail::ir::class_decl* t", then if we type:

   (gdb) p t->get_pretty_representation(true, true);

We'll get something like:

   class foo_klass
   (gdb)

So we get the kind and the name of the ABI artifact; but in case of a
class, we don't get the details of its data members.

This patch introduces a function named "debug" which would be invoked
on the 't' above like this:

   (gdb) p debug(t)

It would yield:

    struct tm
    {  // size in bits: 448
       // translation unit: test24-drop-fns.cc
       // @: 0x5387a0, @canonical: 0x5387a0

      int tm_sec; // uses canonical type '@0x538270'
      int tm_min; // uses canonical type '@0x538270'
      int tm_hour; // uses canonical type '@0x538270'
      int tm_mday; // uses canonical type '@0x538270'
      int tm_mon; // uses canonical type '@0x538270'
      int tm_year; // uses canonical type '@0x538270'
      int tm_wday; // uses canonical type '@0x538270'
      int tm_yday; // uses canonical type '@0x538270'
      int tm_isdst; // uses canonical type '@0x538270'
      long int tm_gmtoff; // uses canonical type '@0x461200'
      const char* tm_zone; // uses canonical type '@0x544528'
    };
    (gdb)

This gives much more information to understand what 't' designates.
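The idea of a debugger-callable helper can be sketched as below.  This
is only an illustration: the names debug and get_debug_representation
come from this commit, but their signatures here, and the
toy_class_decl/toy_data_member types standing in for
abigail::ir::class_decl, are assumptions.

```cpp
// Hypothetical sketch of free functions gdb can call as "p debug(t)".
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct toy_data_member
{
  std::string type_name;
  std::string name;
};

struct toy_class_decl
{
  std::string name;
  unsigned size_in_bits;
  std::vector<toy_data_member> data_members;
};

// Build a multi-line representation similar to the output shown above.
std::string get_debug_representation(const toy_class_decl* c)
{
  std::ostringstream o;
  o << "struct " << c->name << "\n{  // size in bits: "
    << c->size_in_bits << "\n";
  for (const toy_data_member& m : c->data_members)
    o << "  " << m.type_name << " " << m.name << ";\n";
  o << "};";
  return o.str();
}

// Print to stderr and return the argument, so that "p debug(t)" shows the
// details and still leaves a usable value in gdb's value history.
const toy_class_decl* debug(const toy_class_decl* c)
{
  std::cerr << get_debug_representation(c) << std::endl;
  return c;
}
```

Making these plain functions with external linkage (rather than inline
members) is what allows gdb to find and call them by name.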

The patch also provides functions to retrieve one data member from a
given type that happens to designate a class type.  For instance:

    (gdb) p get_data_member(t, "tm_sec")

This would yield:

    $19 = std::shared_ptr<abigail::ir::var_decl> (use count 4, weak count 0) = {get() = 0x9d9a80}

We could visualize that data member by doing:

    (gdb) p debug(get_data_member(t, "tm_sec")._M_ptr)
    int tm::tm_sec
    (gdb)
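A lookup helper in the spirit of get_data_member might look like the
following sketch.  The toy_var_decl and toy_class types and the exact
signature are hypothetical; the shared_ptr return type mirrors the
std::shared_ptr<abigail::ir::var_decl> printed by gdb above.

```cpp
// Hypothetical sketch: find a data member by name, returning an empty
// shared_ptr when there is no such member.
#include <memory>
#include <string>
#include <vector>

struct toy_var_decl
{
  std::string qualified_name; // e.g. "int tm::tm_sec"
  std::string name;           // e.g. "tm_sec"
};

using toy_var_decl_sptr = std::shared_ptr<toy_var_decl>;

struct toy_class
{
  std::vector<toy_var_decl_sptr> data_members;
};

// Taking a plain const char* lets gdb pass a string literal directly.
toy_var_decl_sptr get_data_member(const toy_class* klass,
                                  const char* member_name)
{
  if (!klass || !member_name)
    return toy_var_decl_sptr();
  for (const toy_var_decl_sptr& dm : klass->data_members)
    if (dm->name == member_name)
      return dm;
  return toy_var_decl_sptr();
}
```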

The patch also provides a new 'debug_equals' function that allows us to
easily perform an artifact comparison from the command line of the
debugger, as well as methods on the environment type to poke at the
canonical types available in the environment.

These new debugging primitives already proved priceless while
debugging issues that are fixed by subsequent patches to come.

	* include/abg-fwd.h (get_debug_representation, get_data_member)
	(debug, debug_equals): Declare new functions.
	* include/abg-ir.h (environment::{get_canonical_types,
	get_canonical_type}): Declare new member functions.
	* src/abg-ir.cc (environment::{get_canonical_types,
	get_canonical_type}): Define new member functions.
	(get_debug_representation, get_data_member)
	(debug, debug_equals): Define new functions.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 12:08:10 +02:00
Dodji Seketeli
e89bf5abe8 Peel array types when peeling pointers from a type
In peel_typedef_pointer_or_reference_type, we want to peel typedefs
and pointer types (in general) from a given type.  Array types need to
be peeled too, as they are conceptually pointer-like types.

This patch does that.
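The peeling loop can be sketched on a toy type model.  This is a
minimal illustration, assuming a hypothetical toy_type with a single
'under' link; it is not libabigail's class hierarchy, only the
behaviour of also treating arrays as a layer to peel.

```cpp
// Hypothetical sketch: peel typedefs, pointers, references and arrays.
#include <memory>
#include <string>

enum class toy_kind { basic, typedef_, pointer, reference, array };

struct toy_type
{
  toy_kind kind;
  std::string name;                // meaningful for basic types
  std::shared_ptr<toy_type> under; // underlying, pointee or element type
};

using toy_type_sptr = std::shared_ptr<toy_type>;

toy_type_sptr peel_typedef_pointer_or_reference_type(toy_type_sptr t)
{
  for (;;)
    {
      if (!t)
        return t;
      switch (t->kind)
        {
        case toy_kind::typedef_:
        case toy_kind::pointer:
        case toy_kind::reference:
        case toy_kind::array: // newly peeled: arrays are pointer-like
          t = t->under;
          break;
        default:
          return t;           // nothing left to peel
        }
    }
}
```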

	* src/abg-ir.cc (peel_typedef_pointer_or_reference_type): In the
	overloads for type_base_sptr and type_base*, peel array type off
	as well.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 11:58:02 +02:00
Dodji Seketeli
fa5ff32afb Fix DWARF type DIE canonicalization
While looking at something else, I noticed that the DWARF type DIE
canonicalization code wasn't taking the type of array elements into
account when comparing arrays.

This patch fixes that.
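The logic of the fix can be illustrated with a toy DIE model.  This is
a sketch only: toy_die is hypothetical and is not libdw's Dwarf_Die, and
the real compare_dies handles many more tags.  It shows why two array
DIEs with the same subranges must still differ when their element types
differ.

```cpp
// Hypothetical sketch of comparing array type DIEs, element type included.
#include <memory>
#include <string>
#include <vector>

enum class toy_tag { base_type, array_type };

struct toy_die
{
  toy_tag tag;
  std::string name;                       // for base types
  unsigned byte_size = 0;                 // for base types
  std::shared_ptr<toy_die> element_type;  // for array types
  std::vector<unsigned> subrange_lengths; // for array types
};

bool compare_dies(const toy_die& l, const toy_die& r)
{
  if (l.tag != r.tag)
    return false;
  switch (l.tag)
    {
    case toy_tag::base_type:
      return l.name == r.name && l.byte_size == r.byte_size;
    case toy_tag::array_type:
      if (l.subrange_lengths != r.subrange_lengths)
        return false;
      // The step that was missing: compare the element types as well.
      if (!l.element_type || !r.element_type)
        return l.element_type == r.element_type;
      return compare_dies(*l.element_type, *r.element_type);
    }
  return false;
}
```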

	* src/abg-dwarf-reader.cc (compare_dies): When comparing array
	type DIEs, take into account the type of the elements of the
	arrays.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 11:39:04 +02:00
Dodji Seketeli
073185e7ab Miscellaneous indentation and comments cleanups
While looking at something else, I did some indentation and comments cleanups.

	* src/abg-ir.cc (environment::priv::{config_, canonical_types_,
	sorted_canonical_types_, void_type_, variadic_marker_type_}):
	Re-indent these data members.
	(peel_typedef_pointer_or_reference_type): Fix comment.
	(var_decl::var_decl): Likewise.
	(function_decl::function_decl): Add a comment.
	* src/abg-reader.cc (handle_reference_type_def): Fix indentation
	of parameters.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 11:26:14 +02:00
Dodji Seketeli
26c41c060b Fix thinko in configure.ac
* configure.ac: Fix a thinko I spotted while looking at something
	else.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-25 10:38:16 +02:00
Dodji Seketeli
1656f9dd7b reader: Use xmlFirstElementChild/xmlNextElementSibling to iterate over children elements
Use xmlFirstElementChild/xmlNextElementSibling to iterate over element
children nodes rather than doing it by hand in the various for loops.
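With libxml2 itself, the resulting loop has the shape
"for (xmlNodePtr n = xmlFirstElementChild(node); n; n =
xmlNextElementSibling(n))".  To keep the example self-contained, the
sketch below uses a hypothetical toy_node that mimics only the relevant
xmlNode fields, plus two helpers that mimic what those libxml2 functions
do: skip non-element nodes such as whitespace text nodes.

```cpp
// Self-contained model of element-only child iteration (not libxml2).
#include <string>
#include <vector>

enum class toy_node_type { element, text };

struct toy_node
{
  toy_node_type type;
  std::string name;
  toy_node* children = nullptr;
  toy_node* next = nullptr;
};

// Mimics xmlNextElementSibling: the next sibling that is an element.
toy_node* next_element_sibling(toy_node* n)
{
  for (n = n ? n->next : nullptr; n; n = n->next)
    if (n->type == toy_node_type::element)
      return n;
  return nullptr;
}

// Mimics xmlFirstElementChild: the first child that is an element.
toy_node* first_element_child(toy_node* parent)
{
  if (!parent)
    return nullptr;
  for (toy_node* n = parent->children; n; n = n->next)
    if (n->type == toy_node_type::element)
      return n;
  return nullptr;
}

// The loop shape adopted by the patch: text nodes are never visited.
std::vector<std::string> element_child_names(toy_node* parent)
{
  std::vector<std::string> names;
  for (toy_node* n = first_element_child(parent); n;
       n = next_element_sibling(n))
    names.push_back(n->name);
  return names;
}
```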

	* src/abg-reader.cc (walk_xml_node_to_map_type_ids)
	(read_translation_unit, read_translation_unit_from_input)
	(read_symbol_db_from_input, build_needed)
	(read_elf_needed_from_input, read_corpus_group_from_input)
	(build_namespace_decl, build_elf_symbol_db, build_function_decl)
	(build_function_type, build_array_type_def, build_enum_type_decl)
	(build_class_decl, build_union_decl, build_function_tdecl)
	(build_class_tdecl, build_type_composition)
	(build_template_tparameter): Use
	xmlFirstElementChild/xmlNextElementSibling rather than poking at
	xmlNode::children and looping over xmlNode::next by hand.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-03 17:15:25 +02:00
Dodji Seketeli
09c7a773a3 reader: Use xmlFirstElementChild and xmlNextElementSibling rather than xml::advance_to_next_sibling_element
The xml::advance_to_next_sibling_element is redundant with the
xmlNextElementSibling API of libxml.  Similarly, xmlFirstElementChild
is redundant with using xml::advance_to_next_sibling_element on the
xmlNode::children data member.  Let's use the libxml API instead.

	* include/abg-libxml-utils.h (advance_to_next_sibling_element):
	Remove the declaration of this function.
	* src/abg-libxml-utils.cc (go_to_next_sibling_element_or_stay)
	(advance_to_next_sibling_element): Remove definitions of these functions.
	* src/abg-reader.cc (read_translation_unit_from_input)
	(read_elf_needed_from_input, read_corpus_group_from_input): Use xmlNextElementSibling instead
	of xml::advance_to_next_sibling_element.
	(read_corpus_from_input): Likewise.  Also, use
	xmlFirstElementChild instead of
	xml::advance_to_next_sibling_element on the xmlNode::children data
	member.
	(read_corpus_group_from_input): Use xmlFirstElementChild instead
	of xml::advance_to_next_sibling_element on the xmlNode::children
	data member.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-03 17:15:22 +02:00
Dodji Seketeli
dd55550355 reader: Handle 'abi-corpus' element being possibly empty
This problem was reported at https://sourceware.org/bugzilla/show_bug.cgi?id=27616.

The abixml reader wrongly assumes that the 'abi-corpus' element is
always non-empty.  Note that until now, the only emitter of abixml
consumed in practice was abg-writer.cc and it only emits non-empty
'abi-corpus' elements.  So the issue wasn't exposed.

So, the reader assumes that an 'abi-corpus' element has at least a
text node.

For instance, consider this minimal input file named test-v0.abi:

    $ cat test-v0.abi

    <abi-corpus-group architecture='elf-arm-aarch64'>
     <abi-corpus path='vmlinux' architecture='elf-arm-aarch64'>
     </abi-corpus>
    </abi-corpus-group>

    $

Now, compare it to this file where the abi-corpus element is an empty
element (doesn't even contain any text):

    $ cat test-v1.abi

    <abi-corpus-group architecture='elf-arm-aarch64'>
     <abi-corpus path='vmlinux'/>
    </abi-corpus-group>

    $

Comparing the two files with abidiff (wrongly) reports:

    $ abidiff test-v0.abi test-v1.abi
    ELF architecture changed
    Functions changes summary: 0 Removed, 0 Changed, 0 Added function
    Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

    architecture changed from 'elf-arm-aarch64' to ''
    $

What's happening is that read_corpus_from_input returns early when it
sees that the node is empty.  This happens at:

    xmlNodePtr node = ctxt.get_corpus_node();
    [...]
        corp.set_soname(reinterpret_cast<char*>(soname_str.get()));
      }

    if (!node->children)  // <---- we get out early here and we
      return nil;         // forget about the properties of
                          // the current empty corpus element node

So, at its core, fixing the issue at hand involves avoiding the early
return there.
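The intended control flow can be sketched as follows, assuming
hypothetical toy_xml_element and toy_corpus types rather than libxml2
nodes and libabigail's corpus: the element's properties are read first,
unconditionally, and an empty element simply ends the processing
instead of discarding those properties.

```cpp
// Hypothetical sketch: an empty corpus element keeps its properties.
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct toy_xml_element
{
  std::map<std::string, std::string> attributes;
  std::vector<toy_xml_element> children;
};

struct toy_corpus
{
  std::string path;
  std::string architecture;
  std::size_t num_children_processed = 0;
};

toy_corpus read_corpus(const toy_xml_element& node)
{
  toy_corpus corp;
  // Properties are read even for an empty '<abi-corpus .../>' element.
  auto it = node.attributes.find("path");
  if (it != node.attributes.end())
    corp.path = it->second;
  it = node.attributes.find("architecture");
  if (it != node.attributes.end())
    corp.architecture = it->second;

  // No children is not an error: there is just nothing more to read.
  if (node.children.empty())
    return corp;

  for (const toy_xml_element& child : node.children)
    {
      (void) child; // a real reader would build a translation unit here
      ++corp.num_children_processed;
    }
  return corp;
}
```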

But then, it turns out that's not enough.

In the current setting, the different abixml processing entry points
are designed to be used in a semi "streaming" mode.

So for instance, read_translation_unit_from_input can be invoked
repeatedly to "stream in" the next translation unit at each
invocation.

Alternatively, the lower level xmlTextReaderNext can be used to
iterate over XML nodes until we reach the translation unit XML element
we are interested in.  At that point xmlTextReaderExpand can be used
to expand the XML node; we then let the context know that this is
the current node of the corpus that needs to be processed, using
read_context::set_corpus_node.  Once we've done that,
read_translation_unit_from_input can be called to process that
particular corpus node.  Note that the corpus node at hand, which needs
to be processed, will be retrieved by read_context::get_corpus_node.

These two modes of operation are also available for
read_corpus_from_input, read_symbol_db_from_input,
read_elf_needed_from_input etc.

Today, these functions all assume that the current node returned by
read_context::get_corpus_node is the node /before/ the node of the
corpus to be processed.  So they all start looking at the /next sibling/
of the node returned by read_context::get_corpus_node.  So the code
was implicitly assuming that read_context::get_corpus_node was
pointing to a text node that was before the node of the corpus that we
want to process.

This is wrong.  read_context::get_corpus_node should just return the
current node of the corpus that needs to be processed and voila.

And so read_context::set_corpus_node should be used to set the current
node of the corpus to the current element node that needs to be processed.

That's the spirit of the change done by this patch.

As its name suggests, the existing
xml::advance_to_next_sibling_element is used to skip non-element XML
nodes (including text nodes) and move to the next element node to
process, which is set to the context using
read_context::set_corpus_node.

Then the actual processing functions like read_corpus_from_input get
the node to process, using read_context::get_corpus_node and process
it rather than processing the sibling node that comes after it.
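The new cursor semantics can be modelled with a small sketch.  Here
toy_read_context, toy_xml_node and read_next_element are hypothetical;
only the get_corpus_node/set_corpus_node roles are taken from the
description above.  The current node is the element to process now, and
after processing, the cursor is moved to the next element sibling.

```cpp
// Hypothetical model: get_corpus_node() is the node to process *now*.
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct toy_xml_node
{
  bool is_element;
  std::string name;
};

class toy_read_context
{
  std::vector<toy_xml_node> nodes_;
  std::size_t current_ = 0; // == nodes_.size() when nothing is left

public:
  explicit toy_read_context(std::vector<toy_xml_node> nodes)
    : nodes_(std::move(nodes))
  { set_corpus_node(next_element_from(0)); }

  const toy_xml_node* get_corpus_node() const
  { return current_ < nodes_.size() ? &nodes_[current_] : nullptr; }

  void set_corpus_node(std::size_t i) { current_ = i; }

  std::size_t current_index() const { return current_; }

  // Index of the first element node at or after i, skipping text nodes.
  std::size_t next_element_from(std::size_t i) const
  {
    while (i < nodes_.size() && !nodes_[i].is_element)
      ++i;
    return i;
  }
};

// Process the current node itself (not its next sibling), then advance.
// Returns the processed element's name, or "" when nothing is left.
std::string read_next_element(toy_read_context& ctxt)
{
  const toy_xml_node* node = ctxt.get_corpus_node();
  if (!node)
    return "";
  std::string name = node->name;
  ctxt.set_corpus_node(ctxt.next_element_from(ctxt.current_index() + 1));
  return name;
}
```

Because each call leaves the cursor on an element node, repeated calls
"stream in" one element at a time without ever assuming that a text
node precedes the element of interest.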

The other changes prevent related crashes that I noticed while doing
various tests, update the abilint tool used to read and debug abixml
input files, and add better documentation.

	* src/abg-reader.cc (read_context::get_corpus_node): Add comment
	to this member function.
	(read_translation_unit_from_input, read_symbol_db_from_input)
	(read_elf_needed_from_input): Start processing the current node of
	the corpus that needs to be processed rather than its next
	sibling.  Once the processing is done, set the new "current node
	of the corpus to be processed" properly by skipping to the next
	element node to be processed.
	(read_corpus_from_input): Don't get out early when the
	'abi-corpus' element is empty.  If, however, it has child nodes,
	skip to the first child element and flag it -- using
	read_context::set_corpus_node -- as being the element node to be
	processed by the processing facilities of the reader.  If we are
	in a mode where we called xmlTextReaderExpand ourselves to get the
	node to process, then it means we need to free that node
	indirectly by calling xmlTextReaderNext.  In that case, that node
	should not be flagged by read_context::set_corpus_node.  Add more
	comments.
	* src/abg-corpus.cc (corpus::is_empty): Do not crash when no
	symtab is around.
	* src/abg-libxml-utils.cc (go_to_next_sibling_element_or_stay):
	Fix typo in comment.
	(advance_to_next_sibling_element): Don't crash when given a nil
	node.
	* tests/data/test-abidiff/test-PR27616-squished-v0.abi: Add new
	test input.
	* tests/data/test-abidiff/test-PR27616-squished-v1.abi: Likewise.
	* tests/data/test-abidiff/test-PR27616-v0.xml: Likewise.
	* tests/data/test-abidiff/test-PR27616-v1.xml: Likewise.
	* tests/data/Makefile.am: Add the new test inputs above to source
	distribution.
	* tests/test-abidiff.cc (specs): Add the new tests inputs above to
	this harness.
	* tools/abilint.cc (main): Support writing corpus groups.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-05-03 17:13:31 +02:00
Dodji Seketeli
b215a21153 dwarf-reader: properly set artificial-ness in opaque types
get_opaque_version_of_type forgets to set the "is-artificial" property
according to the initial type the opaque type is derived from.  This
can lead to some instability in the abixml output.

Fixed thus.
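The shape of the fix can be sketched on toy types.  This is a minimal
illustration only: toy_class_type is hypothetical, and the signature
used for get_opaque_version_of_type is an assumption, not the one in
abg-dwarf-reader.cc.  The point is that the opaque copy inherits the
is-artificial flag from the type it is derived from.

```cpp
// Hypothetical sketch: propagate artificial-ness to the opaque type.
#include <memory>
#include <string>

struct toy_class_type
{
  std::string name;
  bool is_declaration_only = false;
  bool is_artificial = false;
};

std::shared_ptr<toy_class_type>
get_opaque_version_of_type(const toy_class_type& original)
{
  auto opaque = std::make_shared<toy_class_type>();
  opaque->name = original.name;
  opaque->is_declaration_only = true; // opaque means declaration-only
  // The previously missing step: copy the artificial-ness.
  opaque->is_artificial = original.is_artificial;
  return opaque;
}
```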

	* src/abg-dwarf-reader.cc (get_opaque_version_of_type): Propagate
	the artificial-ness of the original type here.
	* tests/data/test-read-dwarf/PR27700/test-PR27700.abi: Adjust.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2021-04-13 16:28:27 +02:00