mirror of
git://sourceware.org/git/libabigail.git
synced 2024-12-18 07:54:36 +00:00
912eb7e36b
Recursive type hashing was showing up as the major hot spot of performance profiles. After spending a few days trying to speed it up, I have officially declared recursive tree node hashing a slow process and I am giving up on it. I have thus decided not to use it at type canonicalization time. Rather, I am proposing a new type canonicalization routine where types are first hashed by hashing their pretty representation string. Basically, if N is the total number of types in the system and C the number of equivalence classes (that is, the number of canonical types), the number of type comparisons done by a naive type canonicalization routine is N*C. In the worst case C equals N itself, so the worst number of comparisons is N*N. By using a hash table to store the canonical types, keyed by a hash of their pretty representation string, the number of type comparisons can be brought down to N*P, where P is the largest number of types whose pretty representation string hashes collide. That number P is usually small; my measurements show that P usually goes from 1 to 3. Moreover, computing the hash of the pretty representation string of a type is way faster than using the recursive type hash! As a result, running abidw on the libcilkrts.so library from GCC goes from 12 minutes to 0.4 seconds! Incidentally, now that we are not trying to speed up the recursive type hashing process, all the complicated business we had around caching the result of the hashing is gone! I was thinking that hash caching was inherently a bad idea, especially for recursive types (those that refer to themselves directly or indirectly), because in those cases, depending on when you cached the hash value, the value of the hash can be different. The abixml writer's code doesn't use the recursive type hash anymore either; it uses the pointer value of the canonical type as hash. Super fast too!
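The bucketing scheme described above can be illustrated with a minimal sketch. This is not libabigail's code: `toy_type`, `deep_equal` and `canonicalizer` are hypothetical names invented for the illustration, and the member list stands in for whatever the real structural comparison inspects. The only point is that a type is deep-compared against the candidates sharing its pretty representation string, never against every canonical type seen so far.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR type; NOT libabigail's type_base.
struct toy_type
{
  std::string pretty_repr;           // e.g. "struct S"
  std::vector<std::string> members;  // stands in for the full structure
  const toy_type* canonical;         // set by canonicalization

  toy_type(std::string repr, std::vector<std::string> m)
    : pretty_repr(std::move(repr)), members(std::move(m)), canonical(nullptr)
  {}
};

// Stand-in for the expensive structural comparison of two types.
static bool
deep_equal(const toy_type& a, const toy_type& b)
{return a.pretty_repr == b.pretty_repr && a.members == b.members;}

// Canonical types are stored in buckets keyed by pretty representation
// string; std::unordered_map hashes that string to find the bucket.
class canonicalizer
{
  std::unordered_map<std::string, std::vector<const toy_type*>> buckets_;
  std::size_t comparisons_ = 0;

public:
  // Return the canonical type of T, registering T as a new canonical
  // type if no structurally equal candidate is in its bucket.
  const toy_type*
  canonicalize(toy_type* t)
  {
    assert(t != nullptr);
    std::vector<const toy_type*>& candidates = buckets_[t->pretty_repr];
    for (const toy_type* c : candidates)
      {
        ++comparisons_;  // at most P deep comparisons per type
        if (deep_equal(*c, *t))
          return t->canonical = c;
      }
    candidates.push_back(t);
    return t->canonical = t;
  }

  // Number of deep comparisons performed so far.
  std::size_t
  comparisons() const
  {return comparisons_;}
};
```

Once every type carries a canonical pointer, two canonicalized types are equivalent exactly when those pointers are equal, which is why a consumer such as a writer can use the canonical pointer value itself as a cheap hash.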
The patch had to fix pieces here and there to comply with the fact that canonical types are now used across the board in a mandatory fashion. * include/abg-ir.h (canonical_types_map_type): Adjust this typedef to make it point to an unordered_map whose key is now a string and whose value is a vector of types. (type_or_decl_base::{get_cached_hash_value, set_cached_hash_value, cached_hash}): Remove these member functions and type. (struct type_base::cached_hash): Remove. * src/abg-ir.cc (struct type_or_decl_base::priv::hash_): Remove. (type_or_decl_base::priv::priv): Adjust. (type_or_decl_base::{g,s}et_cached_hash_value): Remove. (type_base::get_canonical_type_for): For declaration-only classes, look at their definition for the canonical_type. Do not use recursive type hashing anymore. Rather, use the pretty representation string, and hash that. (class_decl::base_spec::get_hash): Do away with hash value caching here. (class_decl::operator==): For decl-only classes, look at their definitions for canonical types. (hash_type_or_decl): Adjust comment. Use the canonical type pointer value for type hash. That's the fast path. Otherwise, if not available, fall back to a slow path which is the recursive type hash we were using before. * src/abg-dwarf-reader.cc (maybe_canonicalize_type): Schedule all classes and typedefs to classes for late canonicalization. * src/abg-hash.cc (type_base::dynamic_hash::operator()): There is no hash value caching anymore. (type_base::cached_hash::operator()): Remove. * src/abg-reader.cc (read_context::get_type): Slight style adjustment. (read_translation_unit_from_file) (read_translation_unit_from_buffer): Do not forget to canonicalize types when reading just one translation unit. (build_type_tparameter, build_template_tparameter): Canonicalize the type. * src/abg-writer.cc (struct type_hasher): New hasher type. (type_ptr_map): Use a deep pointer comparison equal operator functor, and canonical types as type hash values.
(write_class_decl): Do not write size and alignment on decl-only classes. Do not record decl-only classes as being emitted. Their definition must be emitted before. * tests/test-read-write.cc (main): Do not do abi testing on translation units (as opposed to doing it on abi corpora) as that code is not wet yet. We need to know how to diff namespaces. * tests/data/test-abidiff/test-PR18791-report0.txt: Adjust. * tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise. * tests/data/test-read-dwarf/test10-pr18818-gcc.so.abi: Likewise. * tests/data/test-read-dwarf/test12-pr18844.so.abi: Likewise. * tests/data/test-read-dwarf/test13-pr18894.so.abi: Likewise. * tests/data/test-read-dwarf/test14-pr18893.so.abi: Likewise. * tests/data/test-read-dwarf/test15-pr18892.so.abi: Likewise. * tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise. Signed-off-by: Dodji Seketeli <dodji@redhat.com> |
||
---|---|---|
.. | ||
data | ||
Makefile.am | ||
print-diff-tree.cc | ||
runtestcanonicalizetypes.sh.in | ||
test-abicompat.cc | ||
test-abidiff.cc | ||
test-alt-dwarf-file.cc | ||
test-core-diff.cc | ||
test-diff2.cc | ||
test-diff-dwarf.cc | ||
test-diff-filter.cc | ||
test-diff-pkg.cc | ||
test-diff-suppr.cc | ||
test-dot.cc | ||
test-ir-walker.cc | ||
test-lookup-syms.cc | ||
test-read-dwarf.cc | ||
test-read-write.cc | ||
test-svg.cc | ||
test-utils.cc | ||
test-utils.h | ||
test-write-read-archive.cc |