improper UTF-8 characters handling by ICC on Windows

Hello, everyone,

Building ICU4C with ICC on Windows fails with the following error:

sh-4.4$ (INSTALLDIR="$PWD/../../ICC64RH" && (CC="icl" CFLAGS="-MD" CXX="icl" CXXFLAGS="-MD" LD="xilink" ./configure --prefix="$INSTALLDIR" --disable-debug --enable-release --enable-shared --disable-static >_configure.log && make >_make.log) 2>_stderr.log)
[snip]
sh-4.4$ make tests
[snip]
icl   -DHAVE_DLOPEN=0 -DU_HAVE_ATOMIC=1 -DU_HAVE_MMAP=0 -DU_HAVE_DIRENT_H=0 -DU_HAVE_POPEN=0 -DU_HAVE_STRTOD_L=0  -DU_RELEASE=1 -D_CRT_SECURE_NO_DEPRECATE -I. -I../../common -I../../i18n -I../../tools/toolutil -I../../tools/ctestfw -DUNISTR_FROM_CHAR_EXPLICIT= -DUNISTR_FROM_STRING_EXPLICIT= -DUCHAR_TYPE=char16_t -DU_ATTRIBUTE_DEPRECATED= -DWIN32 -DCYGWINMSVC -D'U_TOPSRCDIR="../../"' -D'U_TOPBUILDDIR="/c/libICU4C-59.1/build/source/"' -MD   -GF -nologo -EHsc -Zc:wchar_t -c   -Forbbitst.o rbbitst.cpp
rbbitst.cpp
rbbitst.cpp(1282): error: too many characters in character constant
              if (c == u'???') {
                       ^

compilation aborted for rbbitst.cpp (code 2)
make[2]: *** [../../config/mh-msys-msvc:142: rbbitst.o] Error 2
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/c/libICU4C-59.1/build/source/test/intltest'
make[1]: *** [Makefile:68: all-recursive] Error 2
make[1]: Leaving directory '/c/libICU4C-59.1/build/source/test'
make: *** [Makefile:213: tests] Error 2

This relates to the following code:

if (c == u'•') {

in the specified file and to the '-utf-8' compiler option in the MSVC toolchain (both were introduced together in ICU4C 59.1).

ICC doesn't support this option and emits a warning:

icl: command line warning #10006: ignoring unknown option '/utf-8'

A similar error can be reproduced with MSVC if ICU is built without the '-utf-8' option:

sh-4.4$ (INSTALLDIR="$PWD/../../MSVC64RH" && (CC="cl" CFLAGS="-MD" CXX="cl" CXXFLAGS="-MD" LD="link" ./configure --prefix="$INSTALLDIR" --disable-debug --enable-release --enable-shared --disable-static >_configure.log && make >_make.log) 2>_stderr.log)
[snip]
sh-4.4$ make tests
[snip]
make[2]: Entering directory '/c/libICU4C-59.1/build/source/test/intltest'
cl   -DHAVE_DLOPEN=0 -DU_HAVE_ATOMIC=1 -DU_HAVE_MMAP=0 -DU_HAVE_DIRENT_H=0 -DU_HAVE_POPEN=0 -DU_HAVE_TZNAME=0 -DU_HAVE_STRTOD_L=0  -DU_RELEASE=1 -D_CRT_SECURE_NO_DEPRECATE -I. -I../../common -I../../i18n -I../../tools/toolutil -I../../tools/ctestfw -DUNISTR_FROM_CHAR_EXPLICIT= -DUNISTR_FROM_STRING_EXPLICIT= -DUCHAR_TYPE=char16_t -DU_ATTRIBUTE_DEPRECATED= -DWIN32 -DCYGWINMSVC -D'U_TOPSRCDIR="../../"' -D'U_TOPBUILDDIR="/c/libICU4C-59.1/build/source/"' -MD   -GF -nologo -EHsc -Zc:wchar_t -c   -Forbbitst.o rbbitst.cpp
rbbitst.cpp
rbbitst.cpp(1282): error C2015: too many characters in constant
make[2]: *** [../../config/mh-msys-msvc:142: rbbitst.o] Error 2
make[2]: Leaving directory '/c/libICU4C-59.1/build/source/test/intltest'
make[1]: *** [Makefile:68: all-recursive] Error 2
make[1]: Leaving directory '/c/libICU4C-59.1/build/source/test'
make: *** [Makefile:213: tests] Error 2

mingw-w64 does not reproduce this error, even without the '-fexec-charset=' and '-finput-charset=' options.

Using the undocumented ICC option '-Qoption,cpp,"--uliterals"' (see threads ICC v13 Beta Questions, C++11, Intel Parallel Studio 13 XE & Unicode String Literals, etc.) doesn't solve the issue.

 

Since ICC on Windows emulates MSVC, would it be possible to add support for the '-utf-8' option? That would make ICC's handling of UTF-8 characters on Windows identical to MSVC's.

 

Environment:

  • Windows 10 x64,
  • IPSXE 2017 Update 2,
  • VS 2015 Update 3,
  • Windows SDK 10.0.14393.33,
  • MSYS2 20161025,
  • ICU4C 59.1.

 

Alexander

 

I don't know what ICU4C on Windows is, but using "icl" on Windows when the reference compiler is MSVC++ 2015 should enable UTF-8 character literals by default, e.g.:

128% cat utf8.cpp

int main() {
 char c = 'a';

 if (c == u'a');

 return 0;
}

129% icl -c utf8.cpp
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0 Beta Build x
Built Apr  5 2017 13:14:29 by jward4 on JWARD4-DESK1 in D:/workspaces/17_0cfe/dev
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

utf8.cpp
130%

Can you attach your code so we can take a look? You said it only compiles with MSVC++ 2015 if you add the /utf-8 option, right?

You are right, we should add the Microsoft /utf-8 option.

Thanks

Judy

Thank you for the quick response. I added the "ICU4C" code to your test:

int main() {
    char c = u'•';

    return 0;
}

and reproduced the error using both ICC and MSVC:

c:\TEST>icl -c utf8.cpp
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0 Build 20170105
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

utf8.cpp
utf8.cpp(2): error: too many characters in character constant
      char c = u'???';
               ^

compilation aborted for utf8.cpp (code 2)

c:\TEST>icl -c -Qoption,cpp,"--uliterals" utf8.cpp
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0 Build 20170105
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

utf8.cpp
utf8.cpp(2): error: too many characters in character constant
      char c = u'???';
               ^

compilation aborted for utf8.cpp (code 2)

c:\TEST>cl -c utf8.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24210 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

utf8.cpp
utf8.cpp(2): error C2015: too many characters in constant

c:\TEST>cl -c -utf-8 utf8.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24210 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

utf8.cpp

c:\TEST>

The test built successfully with MSVC after adding the '-utf-8' option to the build command.

 

Alexander

 

Thanks for reporting this.

I have submitted cmplrs-43403 to hook up the /utf-8 switch to the necessary front end support.

In the meantime you should be able to use the following:

   -Qoption,cpp,--unicode_source_kind,"UTF-8"

Thank you. The only remaining flaw is that the special UTF-8 character '•' is displayed incorrectly in ICC's console output:

c:\TEST>icl -c -Qoption,cpp,--unicode_source_kind,"UTF-8" utf8.cpp
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.4.210 Build 20170411
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

utf8.cpp
utf8.cpp(2): warning #69: integer conversion resulted in truncation
      char c = u'???';
               ^

The Windows console itself, however, displays special UTF-8 characters correctly:

c:\TEST>echo '•'
'•'

 

Alexander
