The Git repository of the Libabigail Project
Go to file
Dodji Seketeli 7b35e89315 Bug 19139 - DWARF reader doesn't handle garbage in function names
In this bug, the DWARF debug info of the binary (generated by Intel's
ICC compiler) has interesting constructs like:

     [ 6b5a0]    subprogram
		 decl_line            (data2) 787
		 decl_column          (data1) 15
		 decl_file            (data1) 46
		 declaration          (flag)
		 accessibility        (data1) public (1)
		 type                 (ref4) [ 6b56a]
		 prototyped           (flag)
		 name                 (string) "ldiv"
		 MIPS_linkage_name    (string) "ldiv"
     [ 6b5b6]      formal_parameter
		   type                 (ref4) [ 5f2aa]
		   name                 (string) "$Ë2"
     [ 6b5bf]      formal_parameter
		   type                 (ref4) [ 5f2aa]
		   name                 (string) "$Ë3"

Note the strings that make up the name of the formal parameters of the
function, near the end:

     [ 6b5b6]      formal_parameter
		   type                 (ref4) [ 5f2aa]
		   name                 (string) "$Ë2"
     [ 6b5bf]      formal_parameter
		   type                 (ref4) [ 5f2aa]
		   name                 (string) "$Ë3"

The strings "$Ë2" and $Ë3" (which are the names of the
parameters of the function) are garbage.

Libabigail's DWARF reader naively uses those strings as names for the
function parameters, in the type of the function.

Then, the abixml writer emits an XML document, with these strings as
property values, representing the name of the type of the function.

And of course, the XML later chokes when it tries to read that XML
document, saying that the property is not valid UTF-8.

This patch addresses the issue by dropping those garbage names on the
floor, for function type names.  In that context, any string that is
not made of ASCII characters is considered as being garbage, for now.

The patch, in the abixml writer, also escapes function parameters
names so that they don't contain characters that are not allowed in
XML.  The abixml reader already handles the un-escaping of the names
it reads, so I think there is nothing to do there.

Ultimately, I guess I should get the unicode value of the characters
of that string, encode the string into UTF-8 and use the result as the
name for the parameter.  That would mean using UTF-8 strings for
function parameter names, and, for all declarations names.  But that
is too much for worfk too little gain for now.  The great majority of
the binaries we are dealing with are still using ASCII for declaration
names.

The patch also introduces a new test harness that runs "abidw
--abidiff" on a bunch of input binaries.  This harness runs over the
binaries that were submitted in this bug report.

	* include/abg-tools-utils.h (string_is_ascii): Declare new
	function ...
	* src/abg-tools-utils.cc (string_is_ascii): ... and define it.
	* src/abg-writer.cc (write_function_type): Escape forbidden XML
	characters in function type names.
	* src/abg-dwarf-reader.cc (build_function_type):  If a parameter
	name is not ascii, drop it on the floor.
	* tests/data/test-types-stability/pr19139-DomainNeighborMapInst.o:
	New test input binary.
	* tests/data/test-types-stability/pr19202-libmpi_gpfs.so.5.0:
	Likewise.
	* tests/data/Makefile.am: Add the new binaries above to the build
	system.
	* tests/test-types-stability.cc: New test harness.
	* tests/Makefile.am: Add the new test harness to the build system.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
2015-11-05 16:40:22 +01:00
doc Fix minor warnings when building documentation. 2015-10-05 09:27:16 +02:00
include Bug 19139 - DWARF reader doesn't handle garbage in function names 2015-11-05 16:40:22 +01:00
m4 Delete ltsugar.m4 and pkg.m4 files from m4/ 2015-01-06 09:54:45 +01:00
scripts Initial DOT work. 2013-07-23 23:13:55 +02:00
src Bug 19139 - DWARF reader doesn't handle garbage in function names 2015-11-05 16:40:22 +01:00
tests Bug 19139 - DWARF reader doesn't handle garbage in function names 2015-11-05 16:40:22 +01:00
tools Make abidw --abidiff not show definitely harmless changes 2015-10-15 13:50:49 +02:00
.gitignore Update .gitignore 2014-11-01 12:10:06 +01:00
abigail.m4 For usage from within GCC set header path to $includedir/libabigail 2013-08-14 16:10:15 +02:00
AUTHORS Initial AUTHORS and README 2013-02-28 13:25:20 +01:00
ChangeLog Update ChangeLog file 2015-06-25 08:13:21 +02:00
COMMIT-LOG-GUIDELINES Allow introductory text in commit log and ignore it when generating ChangeLog 2014-11-18 23:18:06 +01:00
COMPILING Encourage people to use autoreconf -i 2015-10-01 10:40:51 +02:00
config.h.in Make abipkgdiff compare tar archives containing binaries 2015-08-22 14:32:20 +02:00
configure.ac Fix activation of Debian package support 2015-10-15 13:50:56 +02:00
CONTRIBUTING Update the CONTRIBUTING file 2015-03-19 12:47:59 +01:00
COPYING Use a better wording for the COPYING file 2015-04-22 09:53:18 +02:00
COPYING-GPLV3 Update licence texts 2015-04-20 13:51:21 +02:00
COPYING-LGPLV2 Initial import of gen-changelog.py 2014-11-18 23:18:06 +01:00
COPYING-LGPLV3 LGPLv3 License the library 2013-07-23 23:13:55 +02:00
gen-changelog.py [gen-changelog] Make subject line always come first 2014-11-18 23:18:06 +01:00
install-sh Add missing autoconfiscation files into version control 2013-03-01 00:47:49 +01:00
libabigail.pc.in Make libxml2 a private dependency wrt pkconfig 2013-08-22 17:41:29 +02:00
ltmain.sh Add missing autoconfiscation files into version control 2013-03-01 00:47:49 +01:00
Makefile.am Update licence texts 2015-04-20 13:51:21 +02:00
README Fix wording in README 2015-09-05 10:30:00 +02:00

This is the Application Binary Interface Generic Analysis and
Instrumentation Library.

It aims at constructing, manipulating, serializing and de-serializing
ABI-relevant artifacts.

The set of artifacts that we are intersted is made of quantities like
types, variable, fonctions and declarations of a given library or
program.  For a given library or program this set of quantities is
called an ABI corpus.

This library aims at (among other things) providing a way to compare
two ABI Corpora (apparently the plural of corpus is copora, heh,
that's cool), provide detailed information about their differences,
and help build tools to infer interesting conclusions about these
differences.

You are welcome to contribute to this project after reading the files
CONTRIBUTING and COMMIT-LOG-GUIDELINES files in the source tree.

Communicating with the maintainers of this project -- including
sending patches to be include to the source code -- happens via email
at libabigail@sourceware.org.