Gammon Forum

Unicode Encoding issues with world.WindowTextWidth

It is now over 60 days since the last post. This thread is closed.


Posted by Mr.lundmark   (46 posts)
Date Thu 14 Oct 2010 06:41 PM (UTC)

Amended on Thu 14 Oct 2010 06:44 PM (UTC) by Mr.lundmark

Message
Hi.

Calling world.WindowTextWidth() with a string containing an a with a circle above (this stupid forum won't even accept it, wtf?) will not work: it returns -3 because of bad UTF-8 format.

Trying to encode it gives me this error instead:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 58: ordinal not in range(128)

Posted by Nick Gammon   Australia  (21,322 posts)   Forum Administrator
Date Reply #1 on Thu 14 Oct 2010 08:44 PM (UTC)

Amended on Thu 14 Oct 2010 08:45 PM (UTC) by Nick Gammon

Message
You can't put Unicode characters directly into strings; you have to UTF-8 encode them.

For example:


print (utils.tohex (utils.utf8encode (0xe5)))   --> C3A5

t = utils.utf8decode (utils.fromhex ("C3A5"))
print (t [1])  --> 229  (which is 0xE5)
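
For readers scripting in Python, the same round trip can be sketched roughly as follows. This is plain Python 3, not MUSHclient's utils library, and the helper names utf8_hex and encode_two_byte are hypothetical. The second helper spells out the two-byte UTF-8 layout (110xxxxx 10xxxxxx) and is only valid for code points 0x80 to 0x7FF.

```python
def utf8_hex(code_point):
    """Return the UTF-8 encoding of one code point as upper-case hex."""
    return chr(code_point).encode("utf-8").hex().upper()


def encode_two_byte(code_point):
    """Hand-rolled UTF-8 for code points 0x80..0x7FF (illustration only)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only two-byte code points are handled here")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx: top five bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx: low six bits
    return bytes([lead, trail])


print(utf8_hex(0xE5))                            # C3A5
print(encode_two_byte(0xE5) == b"\xc3\xa5")      # True

decoded = bytes.fromhex("C3A5").decode("utf-8")
print(ord(decoded))                              # 229 (which is 0xE5)
```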


So to display the a-with-a-circle (sorry about the forum problem) this works:



win = "test_" .. GetPluginID () 

WindowCreate (win, 0, 0, 200, 200, miniwin.pos_center_all, 0, ColourNameToRGB("white"))  -- create window
WindowShow (win,  true)  -- show it 

WindowFont (win, "f", "Trebuchet MS", 14, true, false, false, false) -- define font

s = "Test<" .. utils.fromhex ("C3A5") .. ">"

width   = WindowTextWidth (win, "f", s, true)  -- width of text

print (width)  --> 71 pixels

WindowText (win, "f", 
                s,   -- text
                5, 20, 0, 0,        -- rectangle
                ColourNameToRGB ("darkgreen"), -- colour
                true)              -- Unicode



- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (21,322 posts)   Forum Administrator
Date Reply #2 on Thu 14 Oct 2010 10:11 PM (UTC)

Amended on Fri 15 Oct 2010 06:26 AM (UTC) by Nick Gammon

Message
Alternatively, if you only want to display characters in the range 0x00 to 0xFF, don't use Unicode mode. This works for your test character:


win = "test_" .. GetPluginID () 

WindowCreate (win, 0, 0, 200, 200, miniwin.pos_center_all, 0, ColourNameToRGB("white"))  -- create window
WindowShow (win,  true)  -- show it 

WindowFont (win, "f", "Trebuchet MS", 14, true, false, false, false) -- define font

s = "Test<" .. string.char (0xe5) .. ">"

width   = WindowTextWidth (win, "f", s, false)  -- width of text

print (width)  --> 71 pixels

WindowText (win, "f", 
                s,   -- text
                5, 20, 0, 0,        -- rectangle
                ColourNameToRGB ("darkgreen"), -- colour
                false)              -- not Unicode


It's only for characters with a code point of 0x100 (256) upwards that you need to use UTF-8 encoding.
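
As a rough illustration in Python 3 (not MUSHclient code), the check below tests whether a string contains any character at or above that 0x100 threshold. The helper name needs_utf8 is hypothetical, and the sketch assumes the non-Unicode path treats each byte as one character in the 0x00 to 0xFF range.

```python
def needs_utf8(text):
    """Return True if any character has a code point of 0x100 or above."""
    return any(ord(ch) >= 0x100 for ch in text)


print(needs_utf8("Test<\xe5>"))     # False: 0xE5 fits in a single byte
print(needs_utf8("Test<\u0101>"))   # True: 0x101 is above the one-byte range
```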

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Mr.lundmark   (46 posts)
Date Reply #3 on Fri 15 Oct 2010 11:47 AM (UTC)
Message
Ah, great. My issue, though, is that I'm matching text from the world, so I actually get the Unicode string from a trigger. Is there any way to decode it properly so that WindowTextWidth can measure it? Currently I have to do a replace on all the letters that I know cause issues, which is both time-consuming and ugly.

Thanks for the fast answers!

Posted by Nick Gammon   Australia  (21,322 posts)   Forum Administrator
Date Reply #4 on Fri 15 Oct 2010 09:22 PM (UTC)
Message
When you say "Unicode string" do you mean the MUD is sending UTF-8? Or just characters in the range 0x80 to 0xFF? They aren't really Unicode because Unicode and ASCII only share the values 0x00 to 0x7F. After that it has to be UTF-8 encoded (or sent as two bytes per character, or some other method).

But if they are just sending stuff like 0xE5 for the a-with-a-circle-on-top character, just tell WindowTextWidth that it isn't Unicode. The Unicode argument really means "is it UTF-8?".
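
A small Python 3 sketch of that "is it UTF-8?" question, for illustration only. The helper is_valid_utf8 is a hypothetical name and says nothing about how MUSHclient validates its input internally. It shows that a lone 0xE5 byte is not valid UTF-8, while the two-byte sequence C3 A5 is.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"Test<\xe5>"))       # False: lone 0xE5 is not valid UTF-8
print(is_valid_utf8(b"Test<\xc3\xa5>"))   # True: C3 A5 encodes code point 0xE5
```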

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Mr.lundmark   (46 posts)
Date Reply #5 on Sat 16 Oct 2010 07:39 AM (UTC)
Message
Yeah, it's UTF-8. When I send it to WindowTextWidth with the Unicode argument set to false, it says that it can't decode those characters. I think the string received from a trigger is actually in Python's unicode format (u"blah" instead of "blah"). I can't do a UTF-8 decode on it because it complains about the 0x80 to 0xFF range.

Posted by Worstje   Netherlands  (899 posts)
Date Reply #6 on Sat 16 Oct 2010 02:32 PM (UTC)
Message
A u"Something" string is not in any specific encoding. It merely represents Unicode code points. If you want it as UTF-8, which is one way to represent Unicode code points, you'll want to .encode it to UTF-8, not decode it: decoding applies to a normal "" string, which is in a certain encoding.
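
A minimal sketch of the encode/decode distinction, written for Python 3, where every str is a Unicode string and the u"..." prefix from Python 2 is still accepted. The helper name to_utf8 is hypothetical and not part of MUSHclient or Python.

```python
def to_utf8(text):
    """Encode a Unicode string to UTF-8 bytes (hypothetical helper)."""
    return text.encode("utf-8")


line = u"Test<\xe5>"        # Unicode code points, no byte encoding yet
encoded = to_utf8(line)     # UTF-8 bytes: code point 0xE5 becomes C3 A5

print(len(line))                         # 7 code points
print(len(encoded))                      # 8 bytes
print(encoded.decode("utf-8") == line)   # True: decoding the bytes round-trips
```

The encoded byte string is what a function expecting UTF-8 input would be given; decode goes the other way, from bytes in a known encoding back to code points.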

Posted by Mr.lundmark   (46 posts)
Date Reply #7 on Sat 16 Oct 2010 02:57 PM (UTC)
Message
That works perfectly Worstje, thanks!

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.
