Add support for reading XZ-compressed files

In preparation for an upcoming patch about comparing sets of packages
(especially kernel packages that can contain xz-compressed kernel
modules) this patch adds support for detecting the content of an input
file compressed with the XZ tool.

Note that the various libabigail tools need to know the kind of input
file they are given in order to know what front-end to instantiate to
handle the input file.  The determination of the kind of file is done
by the function tools_utils::guess_file_type either by looking at the
suffix of the file name (for certain special files like tar files) or
by opening the file and looking at its magic bytes.
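
To illustrate the magic-byte detection described above, here is a
minimal sketch.  It is not the actual guess_file_type code: the
sniffed_type enum and the sniff_magic function are hypothetical names
made up for this example, and only the ELF and XZ magic numbers are
recognized.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical, reduced stand-in for the file_type enum.
enum class sniffed_type { unknown, elf, xz };

// Peek at the first bytes of IN, restore the read position, and
// classify the content by its magic number.
inline sniffed_type
sniff_magic(std::istream& in)
{
  // Precondition: the caller hands us a readable stream.
  assert(in.good());

  unsigned char buf[6] = {};
  const std::streampos initial_pos = in.tellg();
  in.read(reinterpret_cast<char*>(buf), sizeof(buf));
  const std::streamsize nb_read = in.gcount();
  in.clear();            // a short read sets eofbit/failbit; undo that
  in.seekg(initial_pos); // so the real reader starts from the beginning

  // ELF magic: 7F 'E' 'L' 'F'.
  if (nb_read >= 4
      && buf[0] == 0x7F && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F')
    return sniffed_type::elf;

  // XZ stream header magic: FD '7' 'z' 'X' 'Z' 00.
  if (nb_read >= 6
      && buf[0] == 0xFD && buf[1] == '7' && buf[2] == 'z'
      && buf[3] == 'X' && buf[4] == 'Z' && buf[5] == 0)
    return sniffed_type::xz;

  return sniffed_type::unknown;
}
```

Storing the bytes in an unsigned char buffer matters here: comparing a
plain (possibly signed) char against 0xFD would never match.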

This patch uses liblzma to decompress xz-ed files.  The decompression
is done by a custom-written std::streambuf class.  That decompressor
streambuf is instantiated with the compressed input ifstream.  Then an
istream is constructed on top of that decompressor, so readers of that
istream can transparently read from it without knowing anything about
the upstream decompressor streambuf.
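
The streambuf layering described above can be sketched without
liblzma.  In the example below, a toy run-length decoder stands in for
the real xz decoder: the input is a sequence of (count, byte) pairs.
The rle_decompressor class and the decode_all helper are hypothetical
names for this illustration only; what carries over to the real
xz_decompressor_type is the pattern of overriding underflow() and
publishing the decoded bytes with setg().

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Toy stand-in for a decompressor streambuf: it decodes (count, byte)
// pairs read from the wrapped stream.  This is NOT the xz format.
class rle_decompressor : public std::streambuf
{
  std::istream& source_; // the "compressed" input
  std::string   outbuf_; // decoded bytes currently exposed to readers

public:
  explicit rle_decompressor(std::istream& source) : source_(source) {}

protected:
  int_type
  underflow() override
  {
    if (gptr() < egptr())
      return traits_type::to_int_type(*gptr());

    // Decode one pair per refill, skipping zero-length runs.
    outbuf_.clear();
    while (outbuf_.empty())
      {
        char pair[2];
        if (!source_.read(pair, 2))
          return traits_type::eof();
        const std::size_t count = static_cast<unsigned char>(pair[0]);
        outbuf_.assign(count, pair[1]);
      }
    assert(!outbuf_.empty());

    // Publish the decoded bytes to the std::streambuf machinery.
    char* begin = &outbuf_[0];
    setg(begin, begin, begin + outbuf_.size());
    return traits_type::to_int_type(*gptr());
  }
};

// Hypothetical helper: drain a decoding stream into a string.  The
// reader only sees a plain std::istream.
inline std::string
decode_all(std::istream& compressed)
{
  rle_decompressor decoder(compressed);
  std::istream decoded(&decoder);
  std::ostringstream out;
  out << decoded.rdbuf();
  return out.str();
}
```

Note the use of traits_type::to_int_type(): returning a raw char from
underflow() could make a 0xFF byte look like end-of-file on platforms
where char is signed.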

The existing code of guess_file_type has been adapted to instantiate
the right decompressor streambuf.  Once it has that streambuf and has
constructed the (decompressed) istream, the rest of the code remains
unchanged and uses the istream as before.

A new function tools_utils::get_decompressed_streambuf is introduced
to get the right streambuf decompressor for the given input file.
That function must be amended to support new compression schemes to
come.
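
As a rough sketch of that extension point, the factory below maps a
compression kind to a decoding streambuf and returns a null pointer
when no decoder applies.  All names here (compression_scheme,
rot13_streambuf, make_decoder, read_first_line) are hypothetical, and
ROT13 is merely a stand-in for a real compression scheme so that the
example needs nothing beyond the standard library.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical counterpart of the compression_kind enum.
enum class compression_scheme { none, rot13 };

inline char
rot13(char c)
{
  if (c >= 'a' && c <= 'z')
    return static_cast<char>('a' + (c - 'a' + 13) % 26);
  if (c >= 'A' && c <= 'Z')
    return static_cast<char>('A' + (c - 'A' + 13) % 26);
  return c;
}

// Toy decoder streambuf; a real one would call into liblzma instead.
class rot13_streambuf : public std::streambuf
{
  std::istream& source_;
  char          buf_[64];

public:
  explicit rot13_streambuf(std::istream& source) : source_(source) {}

protected:
  int_type
  underflow() override
  {
    if (gptr() < egptr())
      return traits_type::to_int_type(*gptr());

    source_.read(buf_, sizeof(buf_));
    const std::streamsize n = source_.gcount();
    if (n <= 0)
      return traits_type::eof();
    assert(n <= static_cast<std::streamsize>(sizeof(buf_)));

    for (std::streamsize i = 0; i < n; ++i)
      buf_[i] = rot13(buf_[i]);
    setg(buf_, buf_, buf_ + n);
    return traits_type::to_int_type(*gptr());
  }
};

// The factory: supporting a new scheme means adding a case here.
inline std::shared_ptr<std::streambuf>
make_decoder(std::istream& source, compression_scheme scheme)
{
  switch (scheme)
    {
    case compression_scheme::rot13:
      return std::make_shared<rot13_streambuf>(source);
    case compression_scheme::none:
      break;
    }
  return nullptr;
}

// Read the first line of RAW, decoding on the fly when needed.
inline std::string
read_first_line(std::istream& raw, compression_scheme scheme)
{
  std::shared_ptr<std::streambuf> decoder = make_decoder(raw, scheme);
  std::istream plain(decoder ? decoder.get() : raw.rdbuf());
  std::string line;
  std::getline(plain, line);
  return line;
}
```

The caller only deals with a std::streambuf pointer, so the code that
consumes the resulting istream does not change when a scheme is added.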

With this patch, libabigail can now transparently read xz-ed ABIXML
files as well as xz-ed ELF files.  Note that both elfutils and libxml2
know how to read xz-ed input, so I haven't had to do much for them to
handle the xz-ed input files.  Pretty neat.

The patch also adds regression tests that read xz-compressed ABIXML
and ELF input files.

	* configure.ac: Detect the liblzma library.
	* include/abg-tools-utils.h (enum file_type): Add a new
	FILE_TYPE_XZ enumerator.
	(class xz_decompressor_type): Declare a new custom std::streambuf
	class.
	* src/abg-elf-helpers.h: Include elfutils/libdwelf.h.
	* src/abg-elf-reader.cc (get_type_of_elf_file): Use
	dwelf_elf_begin instead of elf_begin to transparently handle
	compressed input file.
	* src/abg-tools-utils.cc (struct xz_decompressor_type::priv):
	Define private type.
	(xz_decompressor_type::{xz_decompressor_type,
	~xz_decompressor_type, underflow}): Define new methods for
	xz_decompressor_type.
	(operator<<(ostream&, file_type)): Add
	support for the new FILE_TYPE_XZ enumerator.
	(enum compression_kind): Define new enum.
	(is_compressed_file_type, get_decompressed_streambuf): Define new
	static functions.
	(guess_file_type): In the overload for std::istream, detect the XZ file
	type.  In the overload for std::string, use the new
	is_compressed_file_type and get_decompressed_streambuf to
	decompress a compressed file on the fly and hand the resulting
	decompressed istream to the overload for std::istream.
	* tests/data/test-read-dwarf/test0.xz: New input binary test.
	* tests/data/test-read-dwarf/test0.xzbinary: Likewise.
	* tests/data/test-read-write/test28.xml.xz: Likewise.
	* tests/data/test-read-write/test28.xml.xzed: Likewise.
	* tests/data/Makefile.am: Add the new test input to source
	distribution.
	* tests/test-read-dwarf.cc (in_out_specs): Add the new test input
	to this test harness.
	* tests/test-read-write.cc (in_out_specs): Likewise.
	* tools/abicompat.cc (read_corpus): Add support for the new
	abigail::tools_utils::FILE_TYPE_XZ.
	* tools/abidiff.cc (main): Likewise.
	* tools/abilint.cc (main): Likewise.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
Dodji Seketeli 2025-01-29 14:51:52 +01:00
parent b93b6c5379
commit e57abd7828
15 changed files with 355 additions and 19 deletions


@ -530,6 +530,11 @@ PKG_CHECK_MODULES(XXHASH, libxxhash >= $XXHASH_VERSION)
AC_SUBST(XXHASH_CFLAGS)
AC_SUBST(XXHASH_LIBS)
dnl Check for dependency: liblzma
LIBLZMA_VERSION=5.2.5
PKG_CHECK_MODULES(LZMA, liblzma >= $LIBLZMA_VERSION)
AC_SUBST(LZMA_CFLAGS)
AC_SUBST(LZMA_LIBS)
dnl Check for some programs like rm, mkdir, etc ...
AC_CHECK_PROG(HAS_RM, rm, yes, no)
@ -912,7 +917,7 @@ AM_CONDITIONAL(ENABLE_RUNNING_TESTS_WITH_PY3, test x$RUN_TESTS_WITH_PY3 = xyes)
AM_CONDITIONAL(ENABLE_PYTHON3_INTERPRETER, test x$PYTHON3_INTERPRETER != xno)
AC_SUBST(PYTHON)
DEPS_CPPFLAGS="$XML_CFLAGS $XXHASH_CFLAGS $ELF_CFLAGS $DW_CFLAGS"
DEPS_CPPFLAGS="$XML_CFLAGS $XXHASH_CFLAGS $ELF_CFLAGS $DW_CFLAGS $LZMA_CFLAGS"
AC_SUBST(DEPS_CPPFLAGS)
dnl Check for the presence of doxygen program
@ -954,7 +959,7 @@ AX_VALGRIND_CHECK
dnl Set the list of libraries libabigail depends on
DEPS_LIBS="$XML_LIBS $ELF_LIBS $DW_LIBS $CTF_LIBS $BPF_LIBS XXHASH_LIBS"
DEPS_LIBS="$XML_LIBS $ELF_LIBS $DW_LIBS $CTF_LIBS $BPF_LIBS $XXHASH_LIBS $LZMA_LIBS"
AC_SUBST(DEPS_LIBS)
if test x$ABIGAIL_DEVEL != x; then


@ -233,7 +233,22 @@ enum file_type
FILE_TYPE_DIR,
/// A tar archive. The archive can be compressed with the popular
/// compression schemes recognized by GNU tar.
FILE_TYPE_TAR
FILE_TYPE_TAR,
// All non-tarred compression schemes go under here.  When one of
// these is returned, the goal is to look into the uncompressed
// stream to get what format has been compressed, then return an
// enumerator for that compressed format instead.
//
// Please note that each time a new enumerator is added here, one
// needs to add a corresponding enumerator to the @ref
// compression_kind enum in abg-tools-utils.cc and update the
// is_compressed_file_type and get_decompressed_streambuf functions
// accordingly.
/// The XZ (lzma) compression scheme.
FILE_TYPE_XZ
};
/// Exit status for abidiff and abicompat tools.
@ -370,6 +385,49 @@ create_best_elf_based_reader(const string& elf_file_path,
bool show_all_types,
bool linux_kernel_mode = false);
/// This is a custom std::streambuf that knows how to decompress an
/// input stream that was compressed using xz.
///
/// The code was inspired by the example in the source code of the xz
/// project at
/// https://github.com/tukaani-project/xz/blob/master/doc/examples/02_decompress.c.
///
/// Here is an example of how user code would use this custom
/// streambuf to decode an xz'ed file and emit its content to stdout.
///
/// ifstream input_file("/path/to/a/compressed/file.xz", ifstream::binary);
/// xz_decompressor_type xzed_streambuf(input_file);
/// istream input_stream(&xzed_streambuf);
///
/// const size_t BUFFER_SIZE = 1024 * 4;
/// vector<char> decompressed_data(BUFFER_SIZE);
/// input_stream.read(decompressed_data.data(), BUFFER_SIZE);
/// size_t nb_bytes_read = input_stream.gcount();
/// while (nb_bytes_read && !input_stream.bad())
/// {
/// // Emit only the bytes that were actually read, then refill.
/// std::cout.write(decompressed_data.data(), nb_bytes_read);
/// input_stream.read(decompressed_data.data(), BUFFER_SIZE);
/// nb_bytes_read = input_stream.gcount();
/// }
/// input_file.close();
///
/// Voila.
class xz_decompressor_type : public std::streambuf
{
struct priv;
std::unique_ptr<priv> priv_;
public:
xz_decompressor_type(std::istream& xz_istream);
~xz_decompressor_type();
protected:
int_type
underflow() override;
}; // end class xz_decompressor_type.
}// end namespace tools_utils
/// A macro that expands to aborting the program when executed.


@ -13,6 +13,7 @@
#include "config.h"
#include <elfutils/libdwfl.h>
#include <elfutils/libdwelf.h>
#include <gelf.h>
#include <string>


@ -1071,7 +1071,9 @@ get_type_of_elf_file(const string& path, elf::elf_type& type)
return false;
elf_version (EV_CURRENT);
Elf *elf = elf_begin (fd, ELF_C_READ_MMAP, NULL);
// Note that the dwelf_elf_begin function supports decompressing the
// content of the input file, which is pretty cool.
Elf *elf = dwelf_elf_begin(fd);
type = elf_file_type(elf);
elf_end(elf);
close(fd);


@ -35,6 +35,7 @@
#include <libgen.h>
#include <libxml/parser.h>
#include <libxml/xmlversion.h>
#include <lzma.h>
#include <algorithm>
#include <cstdlib>
#include <cstring>
@ -1555,12 +1556,43 @@ operator<<(ostream& output,
case FILE_TYPE_TAR:
repr = "GNU tar archive type";
break;
case FILE_TYPE_XZ:
repr = "XZ compressed file";
break;
}
output << repr;
return output;
}
/// The kind of compression we want a de-compression std::streambuf
/// for.
///
/// This enum must be amended to add support for new compression
/// schemes, especially whenever a new enumerator is added to the enum
/// @ref file_type.
enum compression_kind
{
COMPRESSION_KIND_UNKNOWN,
/// The LZMA compression (used by the xz tool).
COMPRESSION_KIND_XZ
}; //end enum compression_kind
/// Test if one of the enumerators of @ref file_type designates a
/// compression scheme.
///
/// This helper function needs to be updated whenever a new
/// compression-related enumerator is added to @ref file_type.
///
/// @return the kind of compression designated by @p t.
static compression_kind
is_compressed_file_type(file_type t)
{
if (t == FILE_TYPE_XZ)
return COMPRESSION_KIND_XZ;
return COMPRESSION_KIND_UNKNOWN;
}
/// Guess the type of the content of an input stream.
///
/// @param in the input stream to guess the content type for.
@ -1572,11 +1604,11 @@ guess_file_type(istream& in)
const unsigned BUF_LEN = 264;
const unsigned NB_BYTES_TO_READ = 263;
char buf[BUF_LEN];
unsigned char buf[BUF_LEN];
memset(buf, 0, BUF_LEN);
std::streampos initial_pos = in.tellg();
in.read(buf, NB_BYTES_TO_READ);
in.read(reinterpret_cast<char*>(buf), NB_BYTES_TO_READ);
in.seekg(initial_pos);
if (in.gcount() < 4 || in.bad())
@ -1588,6 +1620,17 @@ guess_file_type(istream& in)
&& buf[3] == 'F')
return FILE_TYPE_ELF;
// XZ format. Described at
// https://tukaani.org/xz/xz-file-format.txt.
if (in.gcount() >= 6
&& buf[0] == 0xFD
&& buf[1] == '7'
&& buf[2] == 'z'
&& buf[3] == 'X'
&& buf[4] == 'Z'
&& buf[5] == 0)
return FILE_TYPE_XZ;
if (buf[0] == '!'
&& buf[1] == '<'
&& buf[2] == 'a'
@ -1596,7 +1639,7 @@ guess_file_type(istream& in)
&& buf[5] == 'h'
&& buf[6] == '>')
{
if (strstr(buf, "debian-binary"))
if (strstr(reinterpret_cast<char*>(buf), "debian-binary"))
return FILE_TYPE_DEB;
else
return FILE_TYPE_AR;
@ -1674,6 +1717,42 @@ guess_file_type(istream& in)
return FILE_TYPE_UNKNOWN;
}
/// The factory of an std::streambuf aimed at decompressing data
/// coming from an input stream compressed with a particular
/// compression scheme.
///
/// This function must be amended to add support for new compression
/// schemes.
///
/// @param compressed_input the compressed input to create the
/// decompressor std::streambuf for.
///
/// @param compr the compression scheme kind.
///
/// @return a pointer to the std::streambuf to use for decompression.
/// If the compression scheme is not supported, the function returns
/// nil.
static shared_ptr<std::streambuf>
get_decompressed_streambuf(std::istream& compressed_input,
compression_kind compr)
{
shared_ptr<std::streambuf> result;
switch(compr)
{
case COMPRESSION_KIND_UNKNOWN:
ABG_ASSERT_NOT_REACHED;
break;
case COMPRESSION_KIND_XZ:
shared_ptr<std::streambuf> r(new xz_decompressor_type(compressed_input));
result = r;
break;
}
return result;
} // end get_decompressed_streambuf
/// Guess the type of the content of an file.
///
/// @param file_path the path to the file to consider.
@ -1702,9 +1781,57 @@ guess_file_type(const string& file_path)
|| string_ends_with(file_path, ".tz"))
return FILE_TYPE_TAR;
ifstream in(file_path.c_str(), ifstream::binary);
file_type r = guess_file_type(in);
in.close();
file_type r = FILE_TYPE_UNKNOWN;
compression_kind compr_kind = COMPRESSION_KIND_UNKNOWN;
shared_ptr<std::streambuf> decompressor_streambuf;
if (string_ends_with(file_path, ".lzma")
|| string_ends_with(file_path, ".lz")
|| string_ends_with(file_path, ".xz"))
compr_kind = COMPRESSION_KIND_XZ;
// else if there are other compression schemes supported, recognize
// their file suffix here!
do
{
shared_ptr<ifstream> input_fstream(new ifstream(file_path.c_str(),
ifstream::binary));
shared_ptr<istream> input_stream = input_fstream;
if (compr_kind != COMPRESSION_KIND_UNKNOWN)
decompressor_streambuf = get_decompressed_streambuf(*input_stream,
compr_kind);
if (decompressor_streambuf)
input_stream.reset(new istream(decompressor_streambuf.get()));
r = guess_file_type(*input_stream);
input_fstream->close();
if (!decompressor_streambuf)
{
// So we haven't attempted to decompress the input stream.
//
// Have we found out that it was compressed nonetheless?
compr_kind = is_compressed_file_type(r);
if (compr_kind)
{
// Yes, we found out the input file is compressed, and we
// do have the means to decompress it.  However, we
// haven't yet gotten the de-compressor; that is because
// the compression was detected from the content of the
// file rather than from its file name suffix.  Let's go
// back to calling get_decompressed_streambuf to get the
// decompressor.
;
}
else
// No, the file is not compressed; let's get out of here.
break;
break;
}
} while (!decompressor_streambuf && compr_kind);
return r;
}
@ -3288,6 +3415,115 @@ create_best_elf_based_reader(const string& elf_file_path,
return result;
}
/// ---------------------------------------------------
/// <xz_decompressor definition>
///----------------------------------------------------
/// The private data of the @ref xz_decompressor_type class.
struct xz_decompressor_type::priv
{
std::istream& xz_istream;
lzma_stream lzma;
// A 10k bytes buffer for xz data coming from the
// xz'ed istream. That buffer is going to be fed into the lzma
// decoding machinery.
char inbuf[1024 * 10] = {};
// A 10k bytes buffer for decompressed data coming
// out of the lzma machinery
char outbuf[1024 * 10] = {};
priv(std::istream& i)
: xz_istream(i),
lzma(LZMA_STREAM_INIT)
{}
};// end xz_decompressor_type::priv
/// Constructor of the @ref xz_decompressor_type class.
///
/// @param xz_istream the input stream containing the xz-compressed
/// data to decompress.
xz_decompressor_type::xz_decompressor_type(std::istream& xz_istream)
: priv_(new priv(xz_istream))
{
// Initialize the native LZMA stream to decompress.
lzma_ret status = lzma_stream_decoder(&priv_->lzma,
UINT64_MAX,
LZMA_CONCATENATED);
ABG_ASSERT(status == LZMA_OK);
}
/// Destructor of the @ref xz_decompressor_type class.
xz_decompressor_type::~xz_decompressor_type()
{
lzma_end(&priv_->lzma);
}
/// The implementation of the virtual protected
/// std::streambuf::underflow method.  This method is invoked by the
/// std::streambuf facility to re-fill its internal buffers with data
/// coming from the associated input stream and to update the gptr()
/// and egptr() pointers by using the std::streambuf::setg method.
///
/// This is where the decompression using the lzma library is
/// performed.
std::streambuf::int_type
xz_decompressor_type::underflow()
{
if (gptr() < egptr())
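// Convert through traits_type so that a 0xFF byte is not mistaken
// for EOF where char is signed.
return traits_type::to_int_type(*gptr());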
// Let's read 'nr' bytes of xz data into inbuf
priv_->xz_istream.read(priv_->inbuf, sizeof(priv_->inbuf));
size_t nr = priv_->xz_istream.gcount();
if (nr == 0)
{
// Tell the lzma machinery that we've reached the end of the
// data.
lzma_ret result = lzma_code(&priv_->lzma, LZMA_FINISH);
ABG_ASSERT(result == LZMA_OK || result == LZMA_STREAM_END);
return EOF;
}
// Let's prepare the lzma input/output stream/machinery.
priv_->lzma.avail_in = nr;
priv_->lzma.next_in = reinterpret_cast<uint8_t*>(priv_->inbuf);
priv_->lzma.avail_out = sizeof(priv_->outbuf);
priv_->lzma.next_out = reinterpret_cast<uint8_t*>(priv_->outbuf);
// Let's now ask the lzma machinery to decompress the inbuf and
// put the result into outbuf.
lzma_ret result = lzma_code(&priv_->lzma, LZMA_RUN);
if (result != LZMA_OK && result != LZMA_STREAM_END)
{
// TODO: list the possible error codes and tell them explicitly
// to the user, just like what is done in
// https://github.com/tukaani-project/xz/blob/master/doc/examples/02_decompress.c.
std::ostringstream o;
o << "LZMA decompression failed;"
<< " return code of lzma_code() is : "
<< result;
throw std::runtime_error(o.str());
}
// Let's get the number of bytes decompressed by the lzma
// machinery. I got this from the example in the xz code base at
// https://github.com/tukaani-project/xz/blob/master/doc/examples/02_decompress.c.
size_t nr_decompressed_bytes = sizeof(priv_->outbuf) - priv_->lzma.avail_out;
// Now set the relevant index pointers of this streambuf.
setg(priv_->outbuf, priv_->outbuf, priv_->outbuf + nr_decompressed_bytes);
if (nr_decompressed_bytes > 0)
return traits_type::to_int_type(*gptr());
return EOF;
}
/// ---------------------------------------------------
/// </xz_decompressor definition>
///----------------------------------------------------
}//end namespace tools_utils
using abigail::ir::function_decl;


@ -29,6 +29,8 @@ test-read-write/test25.xml \
test-read-write/test26.xml \
test-read-write/test27.xml \
test-read-write/test28.xml \
test-read-write/test28.xml.xz \
test-read-write/test28.xml.xzed \
test-read-write/test28-drop-std-fns.abignore \
test-read-write/test28-without-std-fns.xml \
test-read-write/test28-without-std-fns-ref.xml \
@ -714,14 +716,16 @@ test-read-common/PR27700/test-PR27700.c \
test-read-common/PR27700/test-PR27700.o \
\
test-read-dwarf/test0 \
test-read-dwarf/test0.abi \
test-read-dwarf/test0.xzbinary \
test-read-dwarf/test0.xz \
test-read-dwarf/test0.abi \
test-read-dwarf/test0.hash.abi \
test-read-dwarf/test0.cc \
test-read-dwarf/test0.cc \
test-read-dwarf/test1 \
test-read-dwarf/test1.abi \
test-read-dwarf/test1.abi \
test-read-dwarf/test1.hash.abi \
test-read-dwarf/test1.cc \
test-read-dwarf/test2.h \
test-read-dwarf/test1.cc \
test-read-dwarf/test2.h \
test-read-dwarf/test2-0.cc \
test-read-dwarf/test2-1.cc \
test-read-dwarf/test2.so \

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.


@ -46,6 +46,24 @@ static InOutSpec in_out_specs[] =
"output/test-read-dwarf/test0.abi",
NULL,
},
{
"data/test-read-dwarf/test0.xz",
"",
"",
SEQUENCE_TYPE_ID_STYLE,
"data/test-read-dwarf/test0.abi",
"output/test-read-dwarf/test0-xz.abi",
NULL,
},
{
"data/test-read-dwarf/test0.xzbinary",
"",
"",
SEQUENCE_TYPE_ID_STYLE,
"data/test-read-dwarf/test0.abi",
"output/test-read-dwarf/test0-xzbinary.abi",
NULL,
},
{
"data/test-read-dwarf/test0",
"",


@ -229,6 +229,18 @@ InOutSpec in_out_specs[] =
"data/test-read-write/test28-without-std-fns-ref.xml",
"output/test-read-write/test28-without-std-fns.xml"
},
{
"data/test-read-write/test28.xml.xz",
"data/test-read-write/test28-drop-std-fns.abignore",
"data/test-read-write/test28-without-std-fns-ref.xml",
"output/test-read-write/test28-without-std-fns-xz.xml"
},
{
"data/test-read-write/test28.xml.xzed",
"data/test-read-write/test28-drop-std-fns.abignore",
"data/test-read-write/test28-without-std-fns-ref.xml",
"output/test-read-write/test28-without-std-fns-xzed.xml"
},
{
"data/test-read-write/test28.xml",
"data/test-read-write/test28-drop-std-vars.abignore",


@ -899,6 +899,7 @@ read_corpus(options opts,
case abigail::tools_utils::FILE_TYPE_DIR:
case abigail::tools_utils::FILE_TYPE_TAR:
case abigail::tools_utils::FILE_TYPE_NATIVE_BI:
case abigail::tools_utils::FILE_TYPE_XZ:
break;
}


@ -1487,6 +1487,7 @@ main(int argc, char* argv[])
case abigail::tools_utils::FILE_TYPE_DEB:
case abigail::tools_utils::FILE_TYPE_DIR:
case abigail::tools_utils::FILE_TYPE_TAR:
case abigail::tools_utils::FILE_TYPE_XZ:
break;
}
@ -1576,6 +1577,7 @@ main(int argc, char* argv[])
case abigail::tools_utils::FILE_TYPE_DEB:
case abigail::tools_utils::FILE_TYPE_DIR:
case abigail::tools_utils::FILE_TYPE_TAR:
case abigail::tools_utils::FILE_TYPE_XZ:
break;
}


@ -836,14 +836,11 @@ main(int argc, char* argv[])
}
break;
case abigail::tools_utils::FILE_TYPE_RPM:
break;
case abigail::tools_utils::FILE_TYPE_SRPM:
break;
case abigail::tools_utils::FILE_TYPE_DEB:
break;
case abigail::tools_utils::FILE_TYPE_DIR:
break;
case abigail::tools_utils::FILE_TYPE_TAR:
case abigail::tools_utils::FILE_TYPE_XZ:
break;
}