RepoMirrors/musl - musl

Commit Graph

Author	SHA1	Message	Date
Rich Felker	23ab8c2555	mbrtowc: do not leave mbstate_t in permanent-fail state after EILSEQ the standard is clear that the old behavior is conforming: "In this case, [EILSEQ] shall be stored in errno and the conversion state is undefined." however, the specification of mbrtowc has one peculiarity when the source argument is a null pointer: in this case, it's required to behave as mbrtowc(NULL, "", 1, ps). no motivation is provided for this requirement, but the natural one that comes to mind is that the intent is to reset the mbstate_t object. for stateful encodings, such behavior is actually specified: "If the corresponding wide character is the null wide character, the resulting state described shall be the initial conversion state." but in the case of UTF-8 where the mbstate_t object contains a partially-decoded character rather than a shift state, a subsequent '\0' byte indicates that the previous partial character is incomplete and thus an illegal sequence. naturally, applications using their own mbstate_t object should clear it themselves after an error, but the standard presently provides no way to clear the builtin mbstate_t object used when the ps argument is a null pointer. I suspect this issue may be addressed in the future by specifying that a null source argument resets the state, as this seems to have been the intent all along. for what it's worth, this change also slightly reduces code size.	2013-04-08 23:09:11 -04:00
Rich Felker	ea34b1b90c	implement mbtowc directly, not as a wrapper for mbrtowc the interface contract for mbtowc admits a much faster implementation than mbrtowc can achieve; wrapping mbrtowc with an extra call frame only made the situation worse. since the regex implementation uses mbtowc already, this change should improve regex performance too. it may be possible to improve performance in other places internally by switching from mbrtowc to mbtowc.	2013-04-08 23:01:32 -04:00
Rich Felker	a49e038bab	optimize mbrtowc this simple change, in my measurements, makes about a 7% performance improvement. at first glance this change would seem like a compiler-specific hack, since the modified code is not even used. however, I suspect the reason is that I'm eliminating a second path into the main body of the code, allowing the compiler more flexibility to optimize the normal (hot) path into the main body. so even if it weren't for the measurable (and quite notable) difference in performance, I think the change makes sense.	2013-04-08 22:49:59 -04:00
Rich Felker	8f06ab0eb9	fix out-of-bounds access in UTF-8 decoding SA and SB are used as the lowest and highest valid starter bytes, but the value of SB was one past the last valid starter. this caused access past the end of the state table when the illegal byte '\xf5' was encountered in a starter position. the error did not show up in full-character decoding tests, since the bogus state read from just past the table was unlikely to admit any continuation bytes as valid, but would have shown up had we tested feeding '\xf5' to the byte-at-a-time decoding in mbrtowc: it would cause the function to wrongly return -2 rather than -1. I may eventually go back and remove all references to SA and SB, replacing them with the values; this would make the code more transparent, I think. the original motivation for using macros was to allow misguided users of the code to redefine them for the purpose of enlarging the set of accepted sequences past the end of Unicode...	2013-04-08 22:29:46 -04:00
Rich Felker	771c6cead0	cleanup wcstombs remove redundant headers and comments; this file is completely trivial now. also, avoid temp var.	2013-04-04 14:55:42 -04:00
Rich Felker	b5a527f9ff	cleanup mbstowcs wrapper remove unneeded headers. this file is utterly trivial now and there's no sense in having a comment to state that it's in the public domain.	2013-04-04 14:53:53 -04:00
Rich Felker	f62b12d051	minor optimization to mbstowcs there is no need to zero-fill an mbstate_t object in the caller; mbsrtowcs will automatically treat a null pointer as the initial state.	2013-04-04 14:51:05 -04:00
Rich Felker	40b2b5fa94	fix incorrect range checks in wcsrtombs negative values of wchar_t need to be treated in the non-ASCII case so that they can properly generate EILSEQ rather than getting truncated to 8bit values and stored in the output.	2013-04-04 14:48:48 -04:00
Rich Felker	50d9661d9b	overhaul mbsrtowcs these changes fix at least two bugs: - misaligned access to the input as uint32_t for vectorized ASCII test - incorrect src pointer after stopping on EILSEQ in addition, the text of the standard makes it unclear whether the mbstate_t object is to be modified when the destination pointer is null; previously it was cleared either way; now, it's only cleared when the destination is non-null. this change may need revisiting, but it should not affect most applications, since calling mbsrtowcs with non-zero state can only happen when the head of the string was already processed with mbrtowc. finally, these changes shave about 20% size off the function and seem to improve performance by 1-5%.	2013-04-04 14:42:35 -04:00
Rich Felker	400c5e5c83	use restrict everywhere it's required by c99 and/or posix 2008 to deal with the fact that the public headers may be used with pre-c99 compilers, __restrict is used in place of restrict, and defined appropriately for any supported compiler. we also avoid the form [restrict] since older versions of gcc rejected it due to a bug in the original c99 standard, and instead use the form *restrict.	2012-09-06 22:44:55 -04:00
Rich Felker	6436b371af	fix failure of mbsinit(0) (not UB; required to return nonzero) issue reported by Richard Pennington; slightly simpler fix applied	2012-05-26 18:02:45 -04:00
Rich Felker	485fb14ab4	fix longstanding exit logic bugs in mbsnrtowcs and wcsnrtombs these are POSIX 2008 (previously GNU extension) functions that are rarely used. apparently they had never been tested before, since the end-of-string logic was completely missing. mbsnrtowcs is used by modern versions of bash for its glob implementation, and this bug was causing tab completion to hang in an infinite loop.	2012-05-02 13:59:48 -04:00
Rich Felker	78e79d9d50	new attempt at working around the gcc 3 visibility bug since gcc is failing to generate the necessary ".hidden" directive in the output asm, generate it explicitly with an __asm__ statement...	2012-02-24 20:07:21 -05:00
Rich Felker	7fa29920ed	remove useless attribute visibility from definitions this was a failed attempt at working around the gcc 3 visibility bug affecting x86_64. subsequent patch will address it with an ugly but working hack.	2012-02-24 20:02:42 -05:00
Rich Felker	bae2e52bfd	cleanup and work around visibility bug in gcc 3 that affects x86_64 in gcc 3, the visibility attribute must be placed on both the declaration and on the definition. if it's omitted from the definition, the compiler fails to emit the ".hidden" directive in the assembly, and the linker will either generate textrels (if supported, such as on i386) or refuse to link (on targets where certain types of textrels are forbidden or impossible without further assumptions about memory layout, such as on x86_64). this patch also unifies the decision about when to use visibility into libc.h and makes the visibility in the utf-8 state machine tables based on libc.h rather than a duplicate test.	2012-02-23 21:24:56 -05:00
Rich Felker	9ae8d5fc71	fix all implicit conversion between signed/unsigned pointers sadly the C language does not specify any such implicit conversion, so this is not a matter of just fixing warnings (as gcc treats it) but actual errors. i would like to revisit a number of these changes and possibly revise the types used to reduce the number of casts required.	2011-03-25 16:34:03 -04:00
Rich Felker	015d33c507	cleanup utf-8 multibyte code, use visibility if possible this code was written independently of musl, with support for the backwards, nonstandard "31-bit unicode" some libraries/apps might want. unfortunately the extra code (inside #ifdef) makes the source harder to read and makes code that should be simple look complex, so i'm removing it. anyone who wants to use the old code can find it in the history or from elsewhere. also, change the visibility of the __fsmu8 state machine table to hidden, if supported. this should improve performance slightly in shared-library builds.	2011-02-27 00:28:59 -05:00
Rich Felker	cfcbea1e43	remove sample utf-8 code that's not part of the standard library	2011-02-21 15:43:26 -05:00
Rich Felker	f9d880d258	cleanup multibyte stuff to remove ugly casts, sanitize the ptr align casts	2011-02-13 23:08:18 -05:00
Rich Felker	0b44a0315b	initial check-in, version 0.5.0	2011-02-12 00:22:29 -05:00
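
The off-by-one described in commit 8f06ab0eb9 can be illustrated with a small sketch. This is not musl's code: the names STARTER_LO, STARTER_HI, TABLE_LEN, starter_index and continuation_bytes are hypothetical, and the only facts assumed are that valid UTF-8 multibyte starter bytes run from 0xc2 through 0xf4 inclusive and that a table indexed by (byte - lowest starter) needs an inclusive upper bound. If the upper bound were one past the last starter, 0xf5 would pass the check and index one slot beyond the table.

```c
#include <stddef.h>

/* Hypothetical names; bounds are the inclusive range of valid
 * UTF-8 multibyte starter bytes. */
enum {
    STARTER_LO = 0xc2,                        /* lowest valid starter  */
    STARTER_HI = 0xf4,                        /* highest valid starter */
    TABLE_LEN  = STARTER_HI - STARTER_LO + 1  /* 51 table slots        */
};

/* Return the table index for a starter byte, or -1 if the byte is not
 * a valid starter. The unsigned subtraction wraps around for bytes
 * below STARTER_LO, so a single comparison rejects both ends. */
static int starter_index(unsigned char c)
{
    unsigned idx = (unsigned)c - STARTER_LO;
    if (idx > (unsigned)(STARTER_HI - STARTER_LO))
        return -1;
    return (int)idx;
}

/* Number of continuation bytes a valid starter expects, or -1 for an
 * invalid starter. Computed by range here instead of a real table. */
static int continuation_bytes(unsigned char c)
{
    if (starter_index(c) < 0)
        return -1;
    if (c < 0xe0)
        return 1;
    if (c < 0xf0)
        return 2;
    return 3;
}
```

With an inclusive bound, starter_index(0xf5) is rejected before any table lookup could happen, which corresponds to the decoder reporting an illegal sequence instead of an incomplete one.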

20 Commits
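
Commit 6436b371af notes that mbsinit(0) is not undefined behavior and must return nonzero. A minimal sketch of that contract, using only the standard mbsinit interface from <wchar.h>, might look like the following. The wrapper names null_state_is_initial and zeroed_state_is_initial are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* A null pointer is treated as describing the initial conversion
 * state, so mbsinit must return nonzero here. */
static int null_state_is_initial(void)
{
    return mbsinit(NULL) != 0;
}

/* A zero-filled mbstate_t also describes the initial conversion
 * state, so the result should be nonzero as well. */
static int zeroed_state_is_initial(void)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    return mbsinit(&st) != 0;
}
```

Both helpers should return 1 on a conforming C library; a library with the bug described in the commit would instead fault or return 0 for the null case.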