RepoMirrors/musl

mirror of git://git.musl-libc.org/musl synced 2025-02-22 13:56:49 +00:00

Author	SHA1	Message	Date
Rich Felker	54941eddfd	update case mappings to unicode 10.0 the mapping tables and code are not automatically generated; they were produced by comparing the output of towupper/towlower against the mappings in the UCD, ignoring characters that were previously excluded from case mappings or from alphabetic status (micro sign and circled letters), and adding table entries or code for everything else missing. based very loosely on a patch by Reini Urban.	2017-12-18 19:34:21 -05:00
Rich Felker	c72c1c52bc	update ctype tables to unicode 10.0	2017-12-18 18:05:23 -05:00
Rich Felker	d3f23337ee	reformat ctype tables to be diff-friendly, match tool output the new version of the code used to generate these tables forces a newline every 256 entries, whereas at the time these files were originally generated and committed, it only wrapped them at 80 columns. the new behavior ensures that localized changes to the tables, if they are ever needed, will produce localized diffs. commit `d060edf6c5` made the corresponding changes to the iconv tables.	2017-12-18 18:01:42 -05:00
Natanael Copa	179766aa2e	towupper/towlower: fast path for ascii chars Make a fast path for ascii chars, which are assumed to be the most common case. This has a significant performance benefit on xml, json and similar	2017-05-31 21:54:22 -04:00
Rich Felker	1507ebf837	byte-based C locale, phase 1: multibyte character handling functions this patch makes the functions which work directly on multibyte characters treat the high bytes as individual abstract code units rather than as multibyte sequences when MB_CUR_MAX is 1. since MB_CUR_MAX is presently defined as a constant 4, all of the new code added is dead code, and optimizing compilers' code generation should not be affected at all. a future commit will activate the new code. as abstract code units, bytes 0x80 to 0xff are represented by wchar_t values 0xdf80 to 0xdfff, at the end of the surrogates range. this ensures that they will never be misinterpreted as Unicode characters, and that all wctype functions return false for these "characters" without needing locale-specific logic. a high range outside of Unicode such as 0x7fffff80 to 0x7fffffff was also considered, but since C11's char16_t also needs to be able to represent conversions of these bytes, the surrogate range was the natural choice.	2015-06-16 05:28:48 +00:00
Rich Felker	3d7e32d28d	add macro version of ctype.h isascii function presumably internal code (ungetwc and fputwc) was written assuming a macro implementation existed; otherwise use of isascii is just a pessimization.	2015-06-06 18:16:22 +00:00
Rich Felker	4674809bdf	fix case mapping for U+00DF (ß) U+00DF ('ß') has had an uppercase form (U+1E9E) available since Unicode 5.1, but Unicode lacks the case mappings for it due to stability policy. when I added support for the new character in commit `1a63a9fc30`, I omitted the mapping in the lowercase-to-uppercase direction. this choice was not based on any actual information, only assumptions. this commit adds bidirectional case mappings between U+00DF and U+1E9E, and removes the special-case hack that allowed U+00DF to be identified as lowercase despite lacking a mapping. aside from strong evidence that this is the "right" behavior for real-world usage of these characters, several factors informed this decision: - the other "potentially correct" mapping, to "SS", is not representable in the C case-mapping system anyway. - leaving one letter in lowercase form when transforming a string to uppercase is obviously wrong. - having a character which is nominally lowercase but which is fixed under case mapping violates reasonable invariants.	2014-09-05 03:28:00 -04:00
Szabolcs Nagy	b04971d91a	add inline isspace in ctype.h as an optimization isspace can be a bottleneck in a simple parser, inlining it gives slightly smaller and faster code src/locale/pleval.o already had this optimization, the size change for other libc functions for i386 is src/internal/intscan.o 2134 2118 -16 src/locale/dcngettext.o 1562 1552 -10 src/network/res_msend.o 1961 1940 -21 src/network/lookup_name.o 2627 2608 -19 src/network/getnameinfo.o 1814 1811 -3 src/network/lookup_serv.o 643 624 -19 src/stdio/vfscanf.o 2675 2663 -12 src/stdlib/atoll.o 117 107 -10 src/stdlib/atoi.o 95 91 -4 src/stdlib/atol.o 95 91 -4 src/time/strptime.o 1515 1503 -12 (TOTALS) 432451 432321 -130	2014-08-13 16:47:51 +02:00
Rich Felker	d89fdec51b	consolidate *_l ctype/wctype functions into their non-_l source files the main practical purposes of this commit are to remove a huge amount of clutter from the src/locale directory, to cut down on the length of the $(AR) and $(LD) command lines, and to reduce the amount of space wasted by object file headers in the static libc.a. build time may also be reduced, though this has not been measured. as an additional justification, if there ever were a need for the behavior of these functions to vary by locale, it would be necessary for the non-_l versions to call the _l versions, so that linking the former without the latter would not be possible anyway.	2014-07-02 21:16:05 -04:00
Szabolcs Nagy	571744447c	include cleanups: remove unused headers and add feature test macros	2013-12-12 05:09:18 +00:00
rofl0r	d8e8f1464c	iswspace: fix handling of 0	2013-11-11 05:44:47 +01:00
Rich Felker	da1442c9a8	fix types for wctype_t and wctrans_t wctype_t was incorrectly "int" rather than "long" on x86_64. not only is this an ABI incompatibility; it's also a major design flaw if we ever wanted wctype_t to be implemented as a pointer, which would be necessary if locales support custom character classes, since int is too small to store a converted pointer. this commit fixes wctype_t to be unsigned long on all archs, matching the LSB ABI; this change does not matter for C code, but for C++ it affects mangling. the same issue applied to wctrans_t. glibc/LSB defines this type as const __int32_t *, but since no such definition is visible, I've just expanded the definition, int, everywhere. it would be nice if these types (which don't vary by arch) could be in wctype.h, but the OB XSI requirement in POSIX that wchar.h expose some types and functions from wctype.h precludes doing so. glibc works around this with some hideous hacks, but trying to duplicate that would go against the intent of musl's headers.	2013-03-04 19:22:14 -05:00
rofl0r	c50925071c	make some arrays const this way they'll go into .rodata, decreasing memory pressure.	2013-02-02 03:19:25 +01:00
Rich Felker	b0fc78520d	fix argument type error on wcwidth function since the correct declaration was not visible, and since the representation of the types wchar_t and wint_t always match, a compiler would have to go out of its way to make this bug manifest, but better to fix it anyway.	2012-08-02 21:02:34 -04:00
Rich Felker	ac4fb51dde	fix broken wcwidth tables unicode char data has both "W" and "F" wide types and the old table only included the "W" ones. this omitted U+3000 (ideographic space) and all the wide-ascii, etc.	2012-06-20 15:22:03 -04:00
Rich Felker	908bed20cd	fix ctype abi junk (pointer should point to 0 slot, not -128 slot)	2012-06-05 19:42:33 -04:00
Rich Felker	9372655e88	add LSB abi junk for ctype functions this should be the last major fix needed to support running glibc-linked conforming POSIX programs with musl in place of glibc, as long as musl provides the features they need and they don't use pthread cancellation (which is implemented as c++ exceptions in glibc, and fundamentally incompatible with musl).	2012-06-02 17:49:14 -04:00
Rich Felker	1b0ce9af6d	new wcwidth implementation (fast table-based) i tried to go with improving the old binary-search-based algorithm, but between growth in the number of ranges, bad performance, and lack of confidence in the binary search code's stability under changes in the table, i decided it was worth the extra 1.8k to have something clean and maintainable. also note that, like the alpha and punct tables, there's definitely room to optimize the nonspacing/wide tables by overlapping subtables. this is not a high priority, but i've begun looking into how to do it, and i suspect the table sizes can be roughly halved. if that turns out to be true, the new, fast, table-based implementation will be roughly the same size as if i had just extended the old binary search one.	2012-04-24 04:23:55 -04:00
Rich Felker	1a63a9fc30	sync case mappings with unicode 6.1 also special-case ß (U+00DF) as lowercase even though it does not have a mapping to uppercase. unicode added an uppercase version of this character but does not map it, presumably because the uppercase version is not actually used except for some obscure purpose...	2012-04-23 19:19:26 -04:00
Rich Felker	38b5d7d052	optimize iswprint	2012-04-23 16:10:36 -04:00
Rich Felker	640fe75ce8	fix spurious punct class for some surrogate codepoints (invalid) this happened due to their entries in UnicodeData.txt	2012-04-23 16:02:46 -04:00
Rich Felker	7e38b1ea2b	destubify iswalpha and update iswpunct to unicode 6.1 alpha is defined as unicode property "Alphabetic" plus category Nd minus ASCII digits minus 2 special-cased Thai punctuation marks supposedly misclassified by Unicode as letters. punct is defined as all of unicode except control, alphanumeric, and space characters. the tables were generated by a simple tool based on the code posted previously to the mailing list. in the future, this and other code used for maintaining locale/iconv/i18n data will be published either in the main source repository or in a separate locale data generation repository.	2012-04-23 15:25:23 -04:00
Rich Felker	ed2911a113	document iswspace and remove wrongly-included zwsp character	2012-02-09 00:27:19 -05:00
Rich Felker	520f3ee2b6	fix typo in iswspace space list table	2012-02-09 00:20:24 -05:00
Rich Felker	c247ebdd98	more header fixes, minor warning fix	2011-02-14 19:33:11 -05:00
Rich Felker	0b44a0315b	initial check-in, version 0.5.0	2011-02-12 00:22:29 -05:00
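
Commit `1507ebf837` above describes representing bytes 0x80 to 0xff as wchar_t values 0xdf80 to 0xdfff when MB_CUR_MAX is 1. A minimal sketch of that mapping in C follows. The helper names `byte_to_unit`, `is_byte_unit`, and `unit_to_byte` are hypothetical and are not taken from the musl source; the sketch only mirrors the ranges stated in the commit message.

```c
#include <wchar.h>

/* Hypothetical helpers illustrating the mapping from the commit message;
 * the names are illustrative and not musl's own. */

/* Map one byte to its abstract code unit: ASCII stays as-is,
 * high bytes 0x80..0xff land on 0xdf80..0xdfff. */
static wchar_t byte_to_unit(unsigned char b)
{
    return b < 0x80 ? (wchar_t)b : (wchar_t)(0xdf00 + b);
}

/* True only for values in the reserved 0xdf80..0xdfff range. */
static int is_byte_unit(wchar_t wc)
{
    return wc >= 0xdf80 && wc <= 0xdfff;
}

/* Recover the original byte from an ASCII value or a code unit. */
static unsigned char unit_to_byte(wchar_t wc)
{
    return (unsigned char)(wc & 0xff);
}
```

Because the reserved range sits at the end of the surrogates, a round trip such as `unit_to_byte(byte_to_unit(0xe9))` yields the original byte, while the intermediate value can never collide with a valid Unicode character.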

26 Commits
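
Commit `908bed20cd` is terse about the "0 slot" versus the "-128 slot". The sketch below illustrates the general idea, assuming a classification table of 384 entries that callers index from -128 to 255. That layout is an assumption made for illustration, and the names `ctype_storage`, `ctype_ptr`, `init_table`, `lookup_is_digit`, and the flag `F_DIGIT` are hypothetical, not the actual ABI symbols.

```c
#define F_DIGIT 0x1  /* hypothetical flag bit */

/* 384 entries: 128 for indices -128..-1, then 256 for indices 0..255. */
static unsigned short ctype_storage[384];

/* The exported pointer must address the entry for character 0, so that
 * ctype_ptr[c] is valid for every c in -128..255. Pointing at the start
 * of the storage (the -128 slot) would shift every lookup by 128. */
static const unsigned short *const ctype_ptr = ctype_storage + 128;

/* Mark the decimal digits; all other entries stay zero. */
static void init_table(void)
{
    for (int c = '0'; c <= '9'; c++)
        ctype_storage[128 + c] |= F_DIGIT;
}

/* Look up a character through the exported pointer. */
static int lookup_is_digit(int c)
{
    return (ctype_ptr[c] & F_DIGIT) != 0;
}
```

With the pointer at the 0 slot, negative arguments such as EOF or a sign-extended high byte still index inside the array instead of reading before it.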