How to load a file containing non-Latin chars?


This topic contains 9 replies, has 4 voices, and was last updated by  nerobot 1 year, 4 months ago.

  • #11886

    nerobot
    Participant

    How to load a file containing non-Latin chars?

    It’s a very old problem that I’ve faced since BlitzMax.

    In Monkey 2, libc.stat( path,Varptr st ) returns -1, which means an error.

    But why does it happen? And how can I fix it?

    I’m on Windows 10 and have Cyrillic chars in the path.

    #11890

    Mark Sibly
    Keymaster

    Basically, the libc module (and more…) needs to be rewritten for Windows to use the ‘wide char’ versions of various OS filesystem APIs.

    This is not a problem on any other OS because they all very sensibly use UTF-8 C strings for all APIs, which is backward compatible with old school ASCII and means I can use ‘standard C’ APIs everywhere – except Windows!
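
    To illustrate the backward-compatibility point, here is a minimal C sketch (not taken from the Monkey 2 source; the function name is_ascii is made up for this example). Every ASCII character keeps its single-byte value in UTF-8, while each Cyrillic letter becomes two bytes that are both 0x80 or above, so byte-oriented C functions pass such names through untouched:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of s is below 0x80,
   i.e. the UTF-8 string is also a valid plain-ASCII string. */
int is_ascii(const char *s) {
    for (; *s; ++s) {
        if ((unsigned char)*s >= 0x80) {
            return 0; /* lead or continuation byte of a multi-byte sequence */
        }
    }
    return 1;
}
```

    For example, is_ascii("file.txt") gives 1, while the UTF-8 bytes of the Cyrillic name "файл.txt" ("\xD1\x84\xD0\xB0\xD0\xB9\xD0\xBB.txt") give 0.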

    It’ll happen eventually…

    #11941

    nerobot
    Participant

    While searching for a solution I found the _wfopen() function (the wide-char sibling of fopen()), but it requires a wchar_t *, and I can’t convert a Monkey string into a wchar_t Ptr.

    I gave up..

    #11970

    scurty
    Participant

    I think I’m onto something. x’D

    I’ve at least converted the strings to wchar_t.

    (Libc)ore.monkey2

    Not sure if this works as it’s coded. It’s kinda hacky and probably doesn’t work. Lol. Almost there… Darn.

    #11986

    nerobot
    Participant

    Yes, it doesn’t work. 🙁

    Waiting for Mark’s implementation of utf-8. 🙂

    #11987

    DruggedBunny
    Participant

    I’m not sure it would need to be rewritten entirely. I read this web site a while back, and the recommendation seems to be “just convert to/from WCHAR” only at the point where the Win32 API requires it:

    utf8everywhere.org (Should jump to “Our Conclusions”.)

     

    Portability, cross-platform interoperability and simplicity are more important than interoperability with existing platform APIs. So, the best approach is to use UTF-8 narrow strings everywhere and convert them back and forth when using platform APIs that don’t support UTF-8 and accept wide strings (e.g. Windows API). Performance is seldom an issue of any relevance when dealing with string-accepting system APIs (e.g. UI code and file system APIs), and there is a great advantage to using the same encoding everywhere else in the application, so we see no sufficient reason to do otherwise.

     

    #11992

    Mark Sibly
    Keymaster

    “just convert to/from WCHAR” only at the point where the Win32 API requires it

    This is the basic idea but it’s not trivial!

    I should have it mostly done now though – just pushed to develop branch so feel free to check it out.

    There is also a minor chance that I’ve borked something in the process, as this stuff is used in lots of places. Ted2go is a pretty good test of libc/filesystem stuff and that seems to be going OK but still…use with caution.

    Also found this in the process:

    http://utf8everywhere.org/

    A pretty good read on all the issues, and I agree with about 97% of it I think…

    #11994

    DruggedBunny
    Participant

    That’s the web site I linked!

    (Yeah, no doubt much more complex to implement than it appears anyway.)

    #11995

    Mark Sibly
    Keymaster

    That’s the web site I linked!

    That’ll teach me to skim read before coffee at 7.00 am…

    And the wchar stuff isn’t all that complex to implement, you just need to be really careful of 2 huge C bogey-men – buffer over-runs and memory leaks!
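
    As a rough illustration of the two pitfalls mentioned above, here is a portable C sketch of a UTF-8 to UTF-16 conversion. It is not the code that was pushed to the develop branch; the names utf8_decode and utf8_to_utf16 are invented for this example. It sizes the buffer with a counting pass before writing anything (to avoid over-runs) and returns a single malloc'd buffer that the caller must free (so ownership is clear and leaks are easier to avoid):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: decode one UTF-8 sequence at s into *cp.
   Returns the number of bytes consumed, or 0 on malformed input.
   Short-circuit checks stop at a NUL, so it never reads past the end. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp) {
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        if ((s[1] & 0xC0) != 0x80) return 0;
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return *cp >= 0x80 ? 2 : 0; /* reject overlong forms */
    }
    if ((s[0] & 0xF0) == 0xE0) {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return 0;
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
        return (*cp >= 0x800 && (*cp < 0xD800 || *cp > 0xDFFF)) ? 3 : 0;
    }
    if ((s[0] & 0xF8) == 0xF0) {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80) return 0;
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
        return (*cp >= 0x10000 && *cp <= 0x10FFFF) ? 4 : 0;
    }
    return 0; /* stray continuation byte or invalid lead byte */
}

/* Hypothetical converter: returns a NUL-terminated UTF-16 copy of utf8,
   or NULL on malformed input or allocation failure. Caller must free(). */
uint16_t *utf8_to_utf16(const char *utf8) {
    const unsigned char *s = (const unsigned char *)utf8;
    const unsigned char *p;
    size_t units = 0, n, i = 0;
    uint32_t cp;
    uint16_t *out;

    /* Pass 1: validate and count the UTF-16 code units needed. */
    for (p = s; *p; p += n) {
        n = utf8_decode(p, &cp);
        if (n == 0) return NULL; /* nothing allocated yet, nothing to leak */
        units += (cp >= 0x10000) ? 2 : 1;
    }

    out = malloc((units + 1) * sizeof *out); /* +1 for the terminator */
    if (out == NULL) return NULL;

    /* Pass 2: write exactly the number of units counted above. */
    for (p = s; *p; p += n) {
        n = utf8_decode(p, &cp);
        if (cp >= 0x10000) { /* encode as a surrogate pair */
            cp -= 0x10000;
            out[i++] = (uint16_t)(0xD800 | (cp >> 10));
            out[i++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[i++] = (uint16_t)cp;
        }
    }
    out[i] = 0;
    return out;
}
```

    On Windows, where wchar_t is 16 bits wide, a buffer like this could be handed to a wide API such as _wfopen() and then freed. Real code would more likely let MultiByteToWideChar with CP_UTF8 do the conversion, using the same pattern of asking for the required size first and allocating afterwards.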

    #12001

    nerobot
    Participant

    I should have it mostly done now though – just pushed to develop branch so feel free to check it out.

    Very nice news! It works for me.

    And my modest $15 sent to this gentleman! 🙂
