How do C++ and g++ deal with Unicode?

I'm trying to figure out the proper way to deal with Unicode in C++. I want to understand how g++ handles literal wide character strings, and regular C strings containing Unicode characters. I have set up some basic tests, and I don't really understand what is happening.

    wstring ws1(L"«¬.txt"); // these first two characters correspond to 0xAB, 0xAC
    string s1("«¬.txt");

    ifstream in_file( s1.c_str() );
    // wifstream in_file( s1.c_str() ); // this throws an exception when I
                                        // call in_file >> s;
    string s;
    in_file >> s; // s now contains «¬

    wstring ws = textToWide(s);

    wcout << ws << endl; // these two lines work independently of each other,
                         // but combining them makes the second one print incorrectly
    cout << s << endl;

    printf( "%s", s.c_str() ); // same case here, these work independently of one another,
                               // but calling one after the other makes the second call
                               // print incorrectly
    wprintf( L"%s", ws.c_str() );

    wstring textToWide(string s)
    {
        mbstate_t mbstate = mbstate_t(); // must start in the initial conversion state
        char *cc = new char[s.length() + 1];
        strcpy(cc, s.c_str());
        cc[s.length()] = 0;
        const char *src = cc; // mbsrtowcs advances/nulls this pointer, so keep cc for delete[]
        size_t numbytes = mbsrtowcs(0, &src, 0, &mbstate);
        wchar_t *buff = new wchar_t[numbytes + 1];
        mbsrtowcs(buff, &src, numbytes + 1, &mbstate);
        wstring ws = buff;
        delete [] cc;
        delete [] buff;
        return ws;
    }

It seems that calls to wcout and wprintf corrupt the stream somehow, and that it is always safe to call cout and printf as long as the strings are encoded as UTF-8.

Would the best way to handle Unicode be to convert all input to wide before processing, and convert all output to UTF-8 before sending it to output?

The most comprehensive way to handle Unicode is to use a Unicode library such as ICU. Unicode has many more aspects than a bunch of encodings. C++ does not offer APIs to work with any of these aspects. ICU does.

If you only want to handle encodings, a somewhat working way is to use the built-in C++ methods correctly. This includes calling

    std::setlocale(LC_ALL,
                   /* some system-specific locale name, e.g. */ "en_US.UTF-8")

at the beginning of your program. It also means not using cout/printf and wcout/wprintf in the same program. (You can use regular and wide stream objects other than the standard handles in the same program.)
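A likely reason for the "don't mix" rule is stream orientation in the C library: the first operation on a C stream such as stdout fixes it as byte-oriented or wide-oriented, and cout/wcout normally share stdout with printf/wprintf. The sketch below is only an illustration of that mechanism. `init_locale` and `stream_orientation` are hypothetical helper names, and `setlocale(LC_ALL, "")` (take the locale from the environment) is an assumed alternative to hard-coding a locale name.

```cpp
#include <clocale>
#include <cstdio>
#include <cwchar>

// Select the locale once, before any I/O. The empty string asks the C
// library to use the environment (for example LANG=en_US.UTF-8).
// Returns false if the request was rejected.
bool init_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Query the orientation of a C stream without changing it
// (fwide with mode 0 only reports the current state).
// Returns 1 for wide-oriented, -1 for byte-oriented, 0 if not yet set.
int stream_orientation(std::FILE* stream) {
    int mode = std::fwide(stream, 0);
    if (mode > 0) return 1;
    if (mode < 0) return -1;
    return 0;
}
```

Checking `stream_orientation(stdout)` before and after the first printf or wprintf call is one way to see which family "claimed" the stream. Once it is claimed, output from the other family is not expected to work on that stream.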

Converting all input to wide and converting all output to UTF-8 is a reasonable strategy. Working in UTF-8 is reasonable too. A lot depends on your application. C++11 has built-in UTF-8, UTF-16 and UTF-32 string types that simplify the task somewhat.
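To make the "decode on input, encode on output" strategy concrete, here is a minimal hand-written sketch that converts between UTF-8 bytes and a C++11 `std::u32string`. It is not a standard library API: `utf8_to_utf32` and `utf32_to_utf8` are hypothetical names, and validation is deliberately thin (overlong encodings and surrogate code points are not rejected). A real program would more likely rely on ICU or a similar library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into UTF-32 code points (illustrative, minimal checks).
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes that must follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F); // append six payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Encode UTF-32 code points back into UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::runtime_error("code point out of range");
        }
    }
    return out;
}
```

With helpers like these, the file name from the question, stored as the UTF-8 bytes C2 AB C2 AC followed by ".txt", decodes to six code points starting with U+00AB and U+00AC, and encodes back to the same bytes.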

Whatever you do, don't use elements of the extended character set in string literals. (In C++11 it's OK to use them in UTF-8/16/32 string literals.)
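As a small sketch of the C++11 literal forms, the snippet below writes the same text as `u8`, `u` and `U` literals and counts the code units in each. It uses universal character names (`\u00AB`, `\u00AC`) so the source file stays pure ASCII, which is a choice made for this example rather than something the answer requires. `code_units` is a hypothetical helper.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// The same text, "«¬.txt", in the three Unicode literal forms.
// Note: since C++20 the u8 literal has element type char8_t instead of char.
constexpr std::size_t utf8_units  = code_units(u8"\u00AB\u00AC.txt"); // 2 + 2 + 4 bytes
constexpr std::size_t utf16_units = code_units(u"\u00AB\u00AC.txt");  // one char16_t each
constexpr std::size_t utf32_units = code_units(U"\u00AB\u00AC.txt");  // one char32_t each
```

The encoding of these literals is fixed by the prefix, so their contents do not depend on the compiler's execution character set the way a plain "..." or L"..." literal can.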

