Bug 19658 - Type canonicalization slow for the 2nd binary loaded

When loading two binaries (e.g, when the library is used by abidiff),
and when the second one does have deep types (e.g, classes with
recursively deep hierarchies) with lots of duplicated types in lots of
translation units, canonicalizing the types of the second binaries can
take a *lot* of time, given the quadratic nature of the structural
type comparisons that take place and the cheer number of those type
comparisons (because of the duplication).

There is already an optimization based on the One Definition Rule in
the canonicalization code.  That optimization avoids structural
comparison of types of the same corpus which have the same name.  But
then, this optimization only works on types of the first corpus.

As soon as we are loading a second corpus, all types being
canonicalized are coming from a corpus that is different from the
first corpus, by definition.  So a structural comparison is taking
place for *all* those types.

The patch extends the existing optimization to make it work on the
second corpus being loaded.  Once a type from the second corpus is
canonicalized, the canonical type is cached inside the corpus.  Then,
later, when a type with the same name has to be canonicalized, the
system looks inside the cache of that corpus to see if there is a
canonicalized type the same name.

I tested the patch on this command:

    abipkgdiff --d1 nss-debuginfo-3.19.1-8.el6_7.i686.rpm \
               --d2 nss-debuginfo-3.21.0-0.1.el6_7.i686.rpm \
               nss-3.19.1-8.el6_7.i686.rpm \
               nss-3.21.0-0.1.el6_7.i686.rpm

I whitnessed a x10 speedup, at least.

On binaries that don't have a lot of duplicated deep types, the patch
doesn't have any noticeable effect.  At lesat It doesn't slow things
down in that case.

	* include/abg-corpus.h (corpus::{record_canonical_type,
	lookup_canonical_type}): Declare new member functions.
	* src/abg-corpus.cc (corpus::priv::canonical_types_): New data
	member.
	(corpus::{record_canonical_type, lookup_canonical_type}): Define
	new member functions.
	* src/abg-ir.cc (type_base::get_canonical_type_for): Cache the
	canonical type inside the corpus of the type being canonicalized.
	Then later when canonicalizing another type, lookup in the cache
	inside its corpus to see if there is a type with the same name.
	* tests/data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64--tbb-4.3-3.20141204.fc23.x86_64-report-0.txt:
	Adjust.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
This commit is contained in:
Dodji Seketeli 2016-02-18 15:40:09 +01:00
parent 9cef1838c9
commit e62901963f
4 changed files with 82 additions and 8 deletions

View File

@ -76,6 +76,9 @@ private:
corpus();
void record_canonical_type(const type_base_sptr&) const;
type_base_sptr lookup_canonical_type(const string& qualified_name) const;
public:
corpus(const string&, ir::environment*);
@ -264,6 +267,8 @@ public:
exported_decls_builder_sptr
get_exported_decls_builder() const;
friend class type_base;
};// end class corpus.
/// Abstracts the building of the set of exported variables and

View File

@ -850,6 +850,7 @@ corpus::exported_decls_builder::maybe_add_var_to_exported_vars(var_decl* var)
struct corpus::priv
{
mutable unordered_map<string, type_base_sptr> canonical_types_;
environment* env;
corpus::exported_decls_builder_sptr exported_decls_builder;
origin origin_;
@ -1130,6 +1131,46 @@ corpus::priv::build_unreferenced_symbols_tables()
}
}
/// Record a canonical type which has been computed for the current
/// corpus.
///
/// This function associates the name of the canonical type to the
/// canonical type, so that further lookup of this canonical using
/// corpus::lookup_canonical_type succeeds in a faster manner.
///
/// @param t a canonical type for a type present in the current
/// corpus. This type can later be retrieved from the current corpus
/// using corpus::lookup_canonical_type().
void
corpus::record_canonical_type(const type_base_sptr& t) const
{
string n = get_pretty_representation(t, /*internal=*/true);
priv_->canonical_types_[n] = t;
}
/// Retrieve a canonical type present in the current corpus, from its
/// name.
///
/// Note that the type must have been cached inside the corpus by an
/// invocation to corpus::record_canonical_type().
///
/// @param qualified_name the name of the canonical type to retrieve.
/// It must have been computed using the function
/// get_pretty_representation(type_base_sptr, true).
///
/// @return the canonical type which name is @p qualified_name and
/// which has been previously cached inside the corpus by an
/// invocation to corpus::record_canonical_type().
type_base_sptr
corpus::lookup_canonical_type(const string& qualified_name) const
{
unordered_map<string, type_base_sptr>::const_iterator i =
priv_->canonical_types_.find(qualified_name);
if (i == priv_->canonical_types_.end())
return type_base_sptr();
return i->second;
}
/// Constructor of the @ref corpus type.
///
/// @param path the path to the file containing the ABI corpus.

View File

@ -35,6 +35,7 @@
#include <tr1/unordered_map>
#include "abg-sptr-utils.h"
#include "abg-ir.h"
#include "abg-corpus.h"
namespace
{
@ -6469,6 +6470,13 @@ type_base::get_canonical_type_for(type_base_sptr t)
// composite types which would have "class Foo" as a sub-type.
string repr = ir::get_pretty_representation(t, /*internal=*/true);
// This is the corpus of the type we want to canonicalize.
const corpus* t_corpus = t->get_corpus();
// If 't' already has a canonical type 'inside' its corpus
// (t_corpus), then this variable is going to contain that canonical
// type.
type_base_sptr canonical_type_present_in_corpus;
environment::canonical_types_map_type& types =
env->get_canonical_types_map();
@ -6483,7 +6491,6 @@ type_base::get_canonical_type_for(type_base_sptr t)
}
else
{
const corpus* t_corpus = t->get_corpus();
vector<type_base_sptr> &v = i->second;
// Let's compare 't' structurally (i.e, compare its sub-types
// recursively) against the canonical types of the system. If it
@ -6512,10 +6519,10 @@ type_base::get_canonical_type_for(type_base_sptr t)
// canonical types are present in this vector (e.g, when
// comparing two ABI corpora), the canonical types of the
// second corpus are going to be near the end the vector.
// As this function is likely to called during the loading
// of the ABI corpus, looking from the end of the vector
// maximizes the changes of triggering the optimization,
// even when we are reading the second corpus.
// As this function is likely to be called during the
// loading of the ABI corpus, looking from the end of the
// vector maximizes the changes of triggering the
// optimization, even when we are reading the second corpus.
if (t_corpus
// We are not doing the optimizatin for anymous types
// because, well, two anonymous type have the same name
@ -6539,7 +6546,16 @@ type_base::get_canonical_type_for(type_base_sptr t)
{
if (const corpus* it_corpus = (*it)->get_corpus())
{
if (it_corpus == t_corpus
// This is true if the canonical type candidate and
// the type being canonicalized are both from the
// same corpus.
bool same_corpus = (it_corpus == t_corpus);
if (!same_corpus)
// Maybe a canonical type for 't' has already been
// computed from this corpus?
canonical_type_present_in_corpus =
t_corpus->lookup_canonical_type(repr);
if ((same_corpus || canonical_type_present_in_corpus)
// Let's add one more size constraint to rule
// out programs that break the One Definition
// Rule too easily.
@ -6551,7 +6567,9 @@ type_base::get_canonical_type_for(type_base_sptr t)
// be equal. Using that rule would saves us
// from a potentially expensive type comparison
// here.
result = *it;
result = same_corpus
? *it
: canonical_type_present_in_corpus;
break;
}
}
@ -6568,6 +6586,16 @@ type_base::get_canonical_type_for(type_base_sptr t)
result = t;
}
}
if (result && t_corpus && !canonical_type_present_in_corpus)
// So we have found a canonical type for 't'. Let's cache that
// canonical type inside the corpus of t. So that next time we
// come across a type of that corpus which has the same name as
// this canonical type, we know that both types ought to be the
// same (per the One Definition Rule), potentially saving us from
// expensive structural type comparisons.
t_corpus->record_canonical_type(result);
return result;
}

View File

@ -1,5 +1,5 @@
================ changes of 'libtbb.so.2'===============
Functions changes summary: 0 Removed, 8 Changed (17 filtered out), 17 Added functions
Functions changes summary: 0 Removed, 8 Changed (16 filtered out), 17 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info
Variable symbols changes summary: 3 Removed, 0 Added variable symbols not referenced by debug info