libabigail/include/abg-interned-str.h
Dodji Seketeli cf8eba68c3 Implement string interning for Libabigail
This patch implements string interning optimization.  One can read
about the principles of this optimization at
https://en.wikipedia.org/wiki/String_interning.

The patch introduces an abigail::interned_string type, as well as an
abigail::interned_string_pool type.  Each environment type owns a
string pool and strings are interned in that pool for all types and
decls of that environments.  The interned_string has methods to
interact seemingly with std::string including a hashing function.  Of
course hashing and comparing interned_string is faster than for
std::string.

To enable ABI artifacts to intern strings, each constructor of ABI
artifacts now takes the environment it's constructed in as parameter.
From the environment, it can thus use the interned string pool.

The patch then changes declaration names to be of type
interned_string, and performs the necessary adjustments.  The hash
maps that hash strings coming from those declaration names are
adjusted to hash interned_string.

	* include/Makefile.am: Add the new abg-interned-str.h file to
	source distribution.
	* include/abg-corpus.h (corpus::corpus): Re-arrange the order of
	* src/abg-corpus.cc
	(corpus::exported_decls_builder::priv::get_id): Return
	interned_string rather than std::string.
	(corpus::corpus): Re-arrange the order of parameters: take an
	environment as first parameter.  parameters: take an environment
	as first parameter.
	* include/abg-dwarf-reader.h (lookup_symbol_from_elf)
	(lookup_public_function_symbol_from_elf): Likewise.
	* src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
	(lookup_symbol_from_gnu_hash_tab)
	(lookup_symbol_from_elf_hash_tab, lookup_symbol_from_symtab)
	(lookup_symbol_from_elf, lookup_public_function_symbol_from_elf)
	(lookup_public_variable_symbol_from_elf, lookup_symbol_from_elf)
	(lookup_public_function_symbol_from_elf): Take an environment as
	first parameter and adjust.
	(build_translation_unit_and_add_to_ir)
	(build_namespace_decl_and_add_to_ir, build_type_decl)
	(build_enum_type, finish_member_function_reading)
	(build_class_type_and_add_to_ir, build_function_type)
	(read_debug_info_into_corpus, read_corpus_from_elf): Adjust.
	* include/abg-fwd.h: Include abg-interned-str.h
	(get_type_name, get_function_type_name, get_method_type_name):
	Return a interned_string, rather than a std::string.
	* include/abg-interned-str.h: New declarations for interned strings
	and their pool.
	* include/abg-ir.h (environment::intern): Declare new method.
	(elf_symbol::{g,s}et_environment): Likewise.
	(type_or_decl_base::type_or_decl_base): Make the default
	constructor private.
	({translation, type_or_decl_base}::set_environment)
	(set_environment_for_artifact): Take a const environment*.
	(elf_symbol::elf_symbol)
	(elf_symbol::create)
	(type_or_decl_base::type_or_decl_base)
	(translation::translation, decl_base::decl_base)
	(scope_decl::scope_decl, type_base::type_base)
	(type_decl::type_decl, scope_type_decl::scope_type_decl)
	(namespace_decl::namespace_decl)
	(enum_type_decl::enumerator::enumerator)
	(function_type::function_type, method_type::method_type)
	(template_decl::template_decl, function_tdecl::function_tdecl)
	(class_tdecl::class_tdecl, class_decl::class_decl): Take an
	environment.
	(type_or_decl_base::operator=)
	(enum_type_decl::enumerator::get_environment): Declare new method.
	(decl_base::{peek_qualified_name, peek_temporary_qualified_name,
	get_qualified_name, get_name, get_qualified_parent_name,
	get_linkage_name}, qualified_type_def::get_qualified_name)
	(reference_type_def::get_qualified_name)
	(array_type_def::get_qualified_name)
	(enum_type_decl::enumerator::{get_name, get_qualified_name})
	({var,function}_decl::get_id)
	(function_decl::parameter::{get_type_name, get_name_id}): Return
	an interned_string, rather than a std::string.
	(decl_base::{set_qualified_name, set_temporary_qualified_name,
	get_qualified_name, set_linkage_name})
	(qualified_type_def::get_qualified_name)
	(reference_type_def::get_qualified_name)
	(array_type_def::get_qualified_name)
	(function_decl::parameter::get_qualified_name): Take an
	interned_string, rather than a std::string.
	(class_decl::member_{class,function}_template::member_{class,function}_template):
	Adjust.
	* src/abg-ir.cc (environment_setter::env_): Make this be a pointer
	to const environment.
	(environment_setter::visit_begin): Adjust.
	(interned_string_pool::priv): Define new type.
	(interned_string_pool::*): Define the method declared in
	abg-interned-str. h.
	(operator==, operator!=, operator+): Define operator for interned_string and
	std::string
	(operator<<): Define for interned_string.
	(translation_unit::priv::env_): Make this be a pointer to const
	environment.
	(translation_unit::priv::priv): Take a pointer to const
	environment.
	(elf_symbol::priv::env_): New data member.
	(elf_symbol::priv::priv): Adjust.  Make an overoad take an
	environment.
	(translation_unit::{g,s}et_environment): Adjust.
	(interned_string_bool_map_type): New typedef.
	(environment::priv::classes_being_compared_): Make this hastable
	of string be a hashtable of interned_string.
	(environment::priv::string_pool_): New data member.
	(environment::{get_void_type_decl,
	get_variadic_parameter_type_decl}): Adjust.
	(type_or_decl_base::priv::env_): Make this be a pointer to const
	environment.
	(type_or_decl::base::priv::priv): Adjust.
	(type_or_decl_base::set_environment)
	(set_environment_for_artifact): Take a pointer to const
	environment.
	(elf_symbol::{g,s}et_environment, environment::intern)
	(type_or_decl_base::operator=): Define new methods.
	(decl_base::priv::{name_, qualified_parent_name_,
	temporary_qualified_name_, qualified_name_, linkage_name_}): Make
	these data member be of tpe interned_string.
	(decl_base::priv::priv): Make this take an environment. Adjust.
	(decl_base::{peek_qualified_name, peek_temporary_qualified_name,
	get_linkage_name, get_qualified_parent_name, get_name,
	get_qualified_name}, get_type_name, get_function_type_name)
	(get_method_type_name, get_node_name)
	(qualified_type_def::get_qualified_name)
	(pointer_type_def::get_qualified_name)
	(array_type_def::get_qualified_name)
	(enum_type_decl::enumerator::get_qualified_name)
	(var_decl::get_id, function_decl::get_id)
	(function_decl::parameter::get_{name_id, type_name}): Return an
	interned_string.
	(decl_base::{set_qualified_name, set_temporary_qualified_name})
	(qualified_type_def::get_qualified_name)
	(pointer_type_def::get_qualified_name)
	(reference_type_def::get_qualified_name)
	(array_type_def::get_qualified_name)
	(function_decl::parameter::get_qualified_name): Take an
	interned_string.
	(decl_base::{set_name, set_linkage_name}): Intern the std::string
	passed in parameter.
	(equals): In the overload for decl_base, adjust for a little speed
	optimization that is justified by profiling.
	(pointer_type_def::priv::{internal_qualified_name_,
	temp_internal_qualified_name_}): Make these data member be
	interned_string.
	(enum_type_decl::enumerator::priv::env_): New data member.
	(enum_type_decl::enumerator::priv::{name_, qualified_name}): Make
	these data member be of type interned_string.
	(enum_type_decl::enumerator::get_environment): New method.
	(enum_type_decl::enumerator::priv::priv) Adjust.
	(typedef_decl::operator==): Implement a little speed optimization.
	(var_decl::priv::nake_type_): New data member.
	(var_decl::priv::id_): Make this data member be of type
	interned_string.
	(equals): In the overload for var_decl, function_type,
	function_decl, adjust for the use of interned_string.
	(function_decl::priv::id_): Make this be of type interned_string.
	(scope_decl::{add_member_decl, insert_member_decl})
	(lookup_function_type_in_translation_unit)
	(synthesize_type_from_translation_unit, lookup_node_in_scope)
	(lookup_type_in_scope, scope_decl::scope_decl)
	(qualified_type_def::qualified_type_def)
	(qualified_type_def::get_qualified_name)
	(pointer_type_def::pointer_type_def)
	(reference_type_def::reference_type_def)
	(array_type_def::array_type_def, array_type_def::append_subrange)
	(array_type_def::get_qualified_name)
	(enum_type_decl::enum_type_decl)
	(enum_type_decl::enumerator::get_qualified_name)
	(enum_type_decl::enumerator::set_name)
	(typedef_decl::typedef_decl, var_decl::var_decl)
	(function_type::function_type, method_type::method_type)
	(function_decl::function_decl)
	(function_decl::parameter::parameter)
	(class_decl::priv::comparison_started)
	(class_decl::add_base_specifier)
	(class_decl::base_spec::base_spec)
	(class_decl::method_decl::method_decl)
	(type_tparameter::type_tparameter)
	(non_type_tparameter::non_type_tparameter)
	(template_tparameter::template_tparameter)
	(type_composition::type_composition)
	(function_tdecl::function_tdecl, class_tdecl::class_tdecl)
	(qualified_name_setter::do_update): Adjust.
	(translation_unit::translation_unit, elf_symbol::elf_symbol)
	(elf_symbol::create, type_or_decl_base::type_or_decl_base)
	(decl_base::decl_base, type_base::type_base)
	(type_decl::type_decl, scope_type_decl::scope_type_decl)
	(namespace_decl::namespace_decl)
	(enum_type_decl::enumerator::enumerator, class_decl::class_decl)
	(template_decl::template_decl, function_tdecl::function_tdecl)
	(class_tdecl::class_tdecl): Take an environment.
	* src/abg-comparison.cc
	(function_suppression::suppresses_function): Adjust.
	* src/abg-reader.cc (read_translation_unit)
	(read_corpus_from_input, build_namespace_decl, build_elf_symbol)
	(build_function_parameter, build_function_decl, build_type_decl)
	(build_function_type, build_enum_type_decl, build_enum_type_decl)
	(build_class_decl, build_function_tdecl, build_class_tdecl)
	(read_corpus_from_native_xml): Likewise.
	* src/abg-writer.cc (id_manager::m_cur_id): Make this mutable.
	(id_manager::m_env): New data member.
	(id_manager::id_manager): Adjust.
	(id_manager::get_environment): New method.
	(id_manager::{get_id, get_id_with_prefix}): Return an
	interned_string.
	(type_ptr_map): Make this be a hash map of type_base* ->
	interned_string, rather a type_base* -> string.
	(write_context::m_env): New data member.
	(write_context::m_type_id_map): Make this data member be mutable.
	(write_context::m_emitted_type_id_map): Make this be a hash map of
	interned_string -> bool, rather than string -> bool.
	(write_context::write_context): Take an environment and adjust.
	(write_context::get_environment): New method.
	(write_context::get_id_manager): New const overload.
	(write_context::get_id_for_type): Return an interned_string; adjust.
	(write_context::{record_type_id_as_emitted,
	record_type_as_referenced}): Adjust.
	(write_context::type_id_is_emitted): Take an interned_string.
	(write_context::{type_is_emitted,
	record_decl_only_type_as_emitted}): Adjust.
	(write_translation_unit, write_corpus_to_native_xml, dump):
	Adjust.
	* tools/abisym.cc (main): Adjust.
	* tests/data/test-read-write/test22.xml: Adjust.
	* tests/data/test-read-write/test23.xml: Adjust.
	* tests/data/test-read-write/test26.xml: Adjust.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2016-02-24 15:13:20 +01:00

248 lines
6.6 KiB
C++

// -*- Mode: C++ -*-
//
// Copyright (C) 2016 Red Hat, Inc.
//
// This file is part of the GNU Application Binary Interface Generic
// Analysis and Instrumentation Library (libabigail). This library is
// free software; you can redistribute it and/or modify it under the
// terms of the GNU Lesser General Public License as published by the
// Free Software Foundation; either version 3, or (at your option) any
// later version.
// This library is distributed in the hope that it will be useful, but
// WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// General Lesser Public License for more details.
// You should have received a copy of the GNU Lesser General Public
// License along with this program; see the file COPYING-LGPLV3. If
// not, see <http://www.gnu.org/licenses/>.
//
// Author: Dodji Seketeli
/// @file
///
/// Declaration of types pertaining to the interned string pool used
/// throughout Libabigail, for performance reasons.
///
/// For the record, the concept of the String Interning method is
/// explained at https://en.wikipedia.org/wiki/String_interning.
#ifndef __ABG_INTERNED_STR_H__
#define __ABG_INTERNED_STR_H__
#include <tr1/memory>
#include <tr1/functional>
#include <ostream>
#include <string>
using std::tr1::shared_ptr;
namespace abigail
{
/// The abstraction of an interned string.
///
/// It's a wrapper around a pointer to a std::string, along with a set
/// of method that helps make this string integrate with std::string
/// seamlessly. For instance, the type provides equality operators
/// that help compare it against std::string.
///
/// Note that this @ref interned_string type is design to have the
/// same size as a pointer to a string.
class interned_string
{
std::string* raw_;
/// Constructor.
///
/// @param raw the pointer to string that this interned_string
/// wraps.
interned_string(std::string* raw)
: raw_(raw)
{}
public:
/// Default constructor.
///
/// Constructs an empty pointer to string.
interned_string()
: raw_()
{}
/// Copy constructor.
///
/// @param o the other instance to copy from.
interned_string(const interned_string& o)
{raw_ = o.raw_;}
/// Test if the current instance of @ref interned_string is empty.
///
/// @return true iff the currentisntance of @ref interned_string is
/// empty.
bool
empty() const
{return !raw_;}
/// Return the underlying pointer to std::string that this
/// interned_string wraps.
///
/// @return a pointer to the underlying std::string, or 0 if this
/// interned_string is empty.
const std::string*
raw() const
{return raw_;}
/// Compare the current instance of @ref interned_string against
/// another instance of @ref interned_string.
///
/// Note that this comparison is done in O(1), because it compares
/// the pointer values of the two underlying pointers to std::string
/// held by each instances of @ref interned_string.
///
/// @param o the other @ref interned_string to compare against.
///
/// @return true iff the current instance equals @p o.
bool
operator==(const interned_string& o) const
{return raw_ == o.raw_;}
/// Inequality operator.
///
/// @param o the other @ref interned_string to compare the current
/// instance against.
///
/// @return true iff the current instance is different from the @p
/// o.
bool
operator!=(const interned_string& o) const
{return !operator==(o);}
/// Compare the current instance of @ref interned_string against
/// an instance of std::string.
///
/// Note that this comparison is done in O(N), N being the size (in
/// number of characters) of the strings being compared.
///
/// @param o the instance of std::string to compare against.
///
/// @return true iff the current instance equals @p o.
bool
operator==(const std::string& o) const
{
if (raw_)
return *raw_ == o;
return o.empty();
}
/// Inequality operator.
///
/// Takes the current instance of @ref interned_string and an
/// instance of std::string.
///
/// @param o the instance of std::string to compare the current
/// instance of @ref interned_string against.
///
/// @return true if the current instance of @ref interned_string is
/// different from @p o.
bool
operator!=(const std::string& o) const
{return ! operator==(o);}
/// "Less than" operator.
///
/// Lexicographically compares the current instance of @ref
/// interned_string against another instance.
///
/// @param o the other instance of @ref interned_string to compare
/// against.
///
/// @return true iff the current instance of interned_string is
/// lexicographycally less than the string @p o.
bool
operator<(const interned_string& o) const
{return static_cast<std::string>(*this) < static_cast<std::string>(o);}
/// Conversion operator to string.
///
/// @return the underlying string this instance refers too.
operator std::string() const
{
if (!raw_)
return "";
return *raw_;
}
friend class interned_string_pool;
}; // end class interned_string
bool
operator==(const std::string& l, const interned_string& r);
bool
operator!=(const std::string& l, const interned_string& r);
std::ostream&
operator<<(std::ostream& o, const interned_string& s);
std::string
operator+(const interned_string& s1,const std::string& s2);
std::string
operator+(const std::string& s1, const interned_string& s2);
/// A functor to hash instances of @ref interned_string.
struct hash_interned_string
{
/// The hash operator.
///
/// It's super fast because hashing an interned string amounts to
/// hashing the pointer to it's underlying string. It's because
/// every distinct string is present only in one copy in the
/// environment.
///
/// @param s the instance of @ref interned_string to hash.
///
/// @return the returned hash value.
size_t
operator()(const interned_string& s) const
{
std::tr1::hash<size_t> hash_size_t;
return hash_size_t(reinterpret_cast<size_t>(s.raw()));
}
}; // end struct hash_interned_string
/// The interned string pool.
///
/// This is where all the distinct strings represented by the interned
/// strings leave. The pool is the actor responsible for creating
/// interned strings.
class interned_string_pool
{
struct priv;
typedef shared_ptr<priv> priv_sptr;
priv_sptr priv_;
public:
interned_string_pool();
interned_string
create_string(const std::string&);
bool
has_string(const char* s) const;
const char*
get_string(const char* s) const;
~interned_string_pool();
}; // end class interned_string_pool
} // end namespace abigail
#endif // __ABG_INTERNED_STR_H__