From 23ab8c255543a7e0876c7e1858ef0d4bbd562729 Mon Sep 17 00:00:00 2001
From: Rich Felker
Date: Mon, 8 Apr 2013 23:09:11 -0400
Subject: [PATCH] mbrtowc: do not leave mbstate_t in permanent-fail state after
 EILSEQ

the standard is clear that the old behavior is conforming: "In this
case, [EILSEQ] shall be stored in errno and the conversion state is
undefined."

however, the specification of mbrtowc has one peculiarity when the
source argument is a null pointer: in this case, it's required to
behave as mbrtowc(NULL, "", 1, ps). no motivation is provided for this
requirement, but the natural one that comes to mind is that the intent
is to reset the mbstate_t object. for stateful encodings, such
behavior is actually specified: "If the corresponding wide character
is the null wide character, the resulting state described shall be the
initial conversion state." but in the case of UTF-8 where the
mbstate_t object contains a partially-decoded character rather than a
shift state, a subsequent '\0' byte indicates that the previous
partial character is incomplete and thus an illegal sequence.

naturally, applications using their own mbstate_t object should clear
it themselves after an error, but the standard presently provides no
way to clear the builtin mbstate_t object used when the ps argument is
a null pointer. I suspect this issue may be addressed in the future by
specifying that a null source argument resets the state, as this seems
to have been the intent all along.

for what it's worth, this change also slightly reduces code size.
---
 src/multibyte/mbrtowc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/multibyte/mbrtowc.c b/src/multibyte/mbrtowc.c
index ec323859..db803661 100644
--- a/src/multibyte/mbrtowc.c
+++ b/src/multibyte/mbrtowc.c
@@ -51,7 +51,7 @@ loop:
 	*(unsigned *)st = c;
 	return -2;
 ilseq:
-	*(unsigned *)st = FAILSTATE;
+	*(unsigned *)st = 0;
 	errno = EILSEQ;
 	return -1;
 }