YOU CAN CODE!

 

With The Case Of UCanCode.net  Release The Power OF  Visual C++ !   Home Products | Purchase Support | Downloads  
View in English
View in Japanese
View in
참고
View in Français
View in Italiano
View in 中文(繁體)

Download Free Trial
Pricing & Purchase?
E-XD++Visual C++/ MFC Products
ActiveX COM Products
Technical Support

Links

Get Ready to Unleash the Power of UCanCode .NET

VC++ Converting ANSI to Unicode with _MSC_VER, MBCS, Multiple Byte
Having just looked at ASCII strings to Unicode in C++[^], here's my preferred solution to this part of the never-ending story of string conversion:
 
 
#include <locale>
#include <string>
std::wstring widen(const std::string& str)
{
    std::wstring wstr(str.size(), 0);
#if _MSC_VER >= 1400    // use Microsofts Safe libraries if possible (>=VS2005)
    std::use_facet<std::ctype<wchar_t> >(std::locale())._Widen_s
        (&str[0], &str[0]+str.size(), &wstr[0], wstr.size());
#else
    std::use_facet<std::ctype<wchar_t> >(std::locale()).widen
        (&str[0], &str[0]+str.size(), &wstr[0]);
#endif
    return wstr;
}
 
std::string narrow(const std::wstring& wstr, char rep = '_')
{
    std::string str(wstr.size(), 0);
#if _MSC_VER >= 1400
    std::use_facet<std::ctype<wchar_t> >(std::locale())._Narrow_s
        (&wstr[0], &wstr[0]+wstr.size(), rep, &str[0], str.size());
#else
    std::use_facet<std::ctype<wchar_t> >(std::locale()).narrow
        (&wstr[0], &wstr[0]+wstr.size(), rep, &str[0]);
#endif
    return str;
}
 
Yes, it does look nasty - but it is the way to go in pure C++. Funny enough, I never found any good and comprehensive documentation on C++ locales, most books tend to leave the topic unharmed.
 
By using the standard constructor of
std::locale in the functions, the "C" locale defines the codepage for the conversion. The current codepage can be applied by calling std::locale::global(std::locale("")); before any call to narrow(...) or widen(...).
 
One possible problem with this code is the use of multi-byte character sets. The predefined size of the function output strings expects a 1:1 relationship in
size() between the string formats.
I found another option using the code_cvt facet. This code is a "bit" more complex but will also work with MBCSs such as codepage 932 (Japanese). I have tested it with some central european and japanese characters on VS2008:
 
 
#include <sstream>
#include <locale>
#include <string>

template<size_t buf_size = 100>
class cp_converter {
    const std::locale loc;
public:
    cp_converter(const std::locale& loc) :
        loc(loc)
    {
    }
    std::wstring widen(const std::string& in) {
        return convert<char, wchar_t>(in);
    }
    std::string narrow(const std::wstring& in) {
        return convert<wchar_t, char>(in);
    }
private:
    typedef std::codecvt<wchar_t, char, mbstate_t> codecvt_facet;
 
    // widen
    inline codecvt_facet::result cv(
        const codecvt_facet& facet,
        mbstate_t& s,
        const char* f1, const char* l1, const char*& n1,
        wchar_t* f2, wchar_t* l2, wchar_t*& n2) const
    {
        return facet.in(s, f1, l1, n1, f2, l2, n2);
    }
 
    // narrow
    inline codecvt_facet::result cv(
        const codecvt_facet& facet,
        mbstate_t& s,
        const wchar_t* f1, const wchar_t* l1, const wchar_t*& n1,
        char* f2, char* l2, char*& n2) const
    {
        return facet.out(s, f1, l1, n1, f2, l2, n2);
    }
    template<class ct_in, class ct_out>
    std::basic_string<ct_out> convert(const std::basic_string<ct_in>& in)
    {
        using namespace std;
        const codecvt_facet& facet = use_facet<codecvt_facet>(loc);
        basic_stringstream<ct_out> os;
        ct_out buf[buf_size];
        mbstate_t state = {0};
        codecvt_facet::result result;
        const ct_in* ipc = &in[0];
        do {
            ct_out* opc = 0;
            result = cv(facet, state,
                ipc, &in[0] + in.size(), ipc,
                buf, buf + buf_size, opc);
            os << basic_string<ct_out>(buf, opc - buf);
        } while ((ipc < &in[0] + in.size()) && (result != codecvt_facet::error));
        if (codecvt_facet::ok != result) throw std::exception("result is not ok!");
        return os.str();
    }
};
 
In order to use the class template, create an object from it with the locale you want to use. The
widen(...) member will the convert text from the selected locale/code page to std::wstring. The narrow(...) will convert wide characters to a std::string in the selected locale/code page.
 
cp_converter<> conv_polish(std::locale("Polish"));
 
// test LATIN SMALL LETTER D WITH STROKE
assert(conv_polish.widen("\xF0") == L"\x0111");
 
// When a wide character is converted to a code page where it is not represented, 
// it is not possible to control the outcome manually
cp_converter<> conv_english(std::locale("English"));
// VC++ 2008 will return LATIN SMALL LETTER D wich is close match.
assert(conv_english.narrow(L"\x0111") == "\x64");
 
Important: Setting
buf_size to odd values (e.g. 99) may result in buffer overflows in buf. It seems that VC++ 2008 disregards the value of l2 in facet.out(...) if a wchar_t is expanded to more than one char and this happens to be at the last byte of the buffer. As mcbs windows code pages are limited (that is just a guess, correct me if i'm wrong) to 1 or 2 bytes for each character, it should be safe to set buf_size to any even number.
 
Important II: The C++ standard does not state what the native encoding should be - neither for
char nor for wchar_t. It just happens to be ANSI and UTF-16 in VC++ on Windows. Other C++ compilers may be standard compliant but using other encodings. Then this tip would still convert encodings but the result won't be as expected.

 

Ask any questions by MSN: ucancode@hotmail.com Yahoo: ucan_code@yahoo.com


 

Copyright ?1998-2011 UCanCode.Net Software , all rights reserved.
Other product and company names herein may be the trademarks of their respective owners.

Please direct your questions or comments to webmaster@ucancode.net