c++ - getting a substring of a std::wstring
How can I get a substring of a std::wstring that includes non-ASCII characters?
The following code does not output anything:
(The Arabic word in the text has 4 characters, each taking 2 bytes, plus the word "hello".)
```cpp
#include <iostream>
#include <string>
using namespace std;

int main()
{
    // Wide string literal: one wchar_t per character
    wstring s = L"سلام hello";
    wcout << s.substr(0,3) << endl;
    wcout << s.substr(4,5) << endl;
    return 0;
}
```
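In a std::wstring, the offsets and counts passed to substr are measured in wchar_t elements, not bytes, so the 2 bytes each Arabic letter takes in a UTF-8 source file no longer matter once the literal is a wide string. The check below is a minimal sketch of that. The \u escapes, the name kSample, and the function check_substr_counts_elements are my own choices, made so the example does not depend on the source file's encoding. It assumes the word is U+0633 U+0644 U+0627 U+0645, all in the Basic Multilingual Plane, so each letter fits in one wchar_t whether wchar_t is 16 or 32 bits wide. If these checks pass, the substrings themselves are fine, and the missing output comes from how they are printed, which is what the answer below works around.

```cpp
#include <cassert>
#include <string>

// The question's string spelled with \u escapes (an assumption of this sketch):
// U+0633 U+0644 U+0627 U+0645 is the Arabic word, followed by " hello".
const std::wstring kSample = L"\u0633\u0644\u0627\u0645 hello";

// Hypothetical helper: asserts that substr counts wchar_t elements, not bytes.
void check_substr_counts_elements()
{
    assert(kSample.size() == 10);                         // 4 letters + ' ' + 5 letters
    assert(kSample.substr(0, 3) == L"\u0633\u0644\u0627"); // only the first three letters
    assert(kSample.substr(4, 5) == L" hell");              // starts at the space
}
```

Calling check_substr_counts_elements() from main completes silently when assertions are enabled.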
This should work (Live On Coliru):
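```cpp
#include <algorithm>   // std::copy
#include <iostream>
#include <iterator>    // std::back_inserter
#include <string>
#include <boost/regex/pending/unicode_iterator.hpp>
using namespace std;

// Encodes the elements of `in` (treated as Unicode code points) as UTF-8
// by copying them through Boost's utf8_output_iterator.
template <typename C>
std::string to_utf8(C const& in)
{
    std::string result;
    auto out = std::back_inserter(result);
    auto utf8out = boost::utf8_output_iterator<decltype(out)>(out);
    std::copy(begin(in), end(in), utf8out);
    return result;
}

int main()
{
    wstring s = L"سلام hello";
    auto first = s.substr(0,3);
    auto second = s.substr(4,5);
    // Print the UTF-8 bytes through the narrow stream instead of wcout
    cout << to_utf8(first) << endl;
    cout << to_utf8(second) << endl;
}
```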
This prints:
```
سلا
 hell
```
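If Boost is not available, the UTF-8 step can be written by hand (std::wstring_convert with std::codecvt_utf8 would be the standard-library route, but both are deprecated since C++17). The function below is a hypothetical sketch, not the answer's code. It assumes every wchar_t holds a complete code point, which holds where wchar_t is 32 bits wide (e.g. Linux); on Windows, where wchar_t is 16-bit UTF-16, surrogate pairs would first need to be combined, and this sketch does not do that.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes each wchar_t as UTF-8.
// Assumption: each wchar_t is a full code point (no UTF-16 surrogate handling).
std::string to_utf8_manual(const std::wstring& in)
{
    std::string out;
    for (wchar_t wc : in) {
        const auto cp = static_cast<char32_t>(wc);
        if (cp < 0x80) {            // 1 byte:  0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes, valid only up to U+10FFFF
            assert(cp <= 0x10FFFF && "not a Unicode code point");
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In the answer's main, `cout << to_utf8_manual(first) << endl;` would take the place of the Boost-based `to_utf8(first)` call. Each Arabic letter in this range encodes to 2 bytes, which matches the question's observation about the source file.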
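Frankly, though, I think your substr calls make odd assumptions: substr(0,3) takes only the first three of the four Arabic letters, and substr(4,5) starts at the space rather than at "hello". Let me suggest a fix in a minute.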
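The promised fix never follows. One possible approach, sketched here under my own assumptions and not taken from the answer, is to locate the space with find instead of hard-coding offsets; for this particular string the hard-coded equivalent would be substr(0, 4) and substr(5, 5). The name split_at_space is hypothetical, and the sketch assumes the input contains a space.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: splits a wide string at its first space into
// (text before the space, text after it).
std::pair<std::wstring, std::wstring> split_at_space(const std::wstring& s)
{
    const std::wstring::size_type pos = s.find(L' ');
    assert(pos != std::wstring::npos && "input must contain a space");
    // substr(pos + 1) without a count runs to the end of the string
    return {s.substr(0, pos), s.substr(pos + 1)};
}
```

Because find returns a position in wchar_t elements, the split stays correct however many bytes each letter needs in UTF-8.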