Full Unicode support


Posted by Atltais (8 posts)
Date Reply #45 on Mon 02 Jun 2008 11:30 PM (UTC)
Message
I just use alt+left shift to switch between languages.

Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #46 on Tue 03 Jun 2008 12:36 AM (UTC)

Amended on Sun 05 Jun 2011 09:39 PM (UTC) by Nick Gammon

Message
Ah OK, I get the picture now. You type "say" in EN mode, switch to RU or whatever, and type what you want to say in Russian? It all becomes clearer now. :)

I think this plugin below might help. The basic problem is to get the input window to send Unicode, which it isn't designed to do.

What this plugin does is take the code-page characters and turn them into UTF-8 for sending. It builds up a table on-the-fly from the conversion table downloaded from www.unicode.org. In this particular case I used the 1251 code page (Cyrillic); however, the general idea could be used for any code page, as you just substitute the correct table from www.unicode.org.

For example, taking the entry for:


0xC0 0x0410 #CYRILLIC CAPITAL LETTER A


The script parses the line and extracts the 0xC0 and 0x0410. The 0xC0 is turned into a single byte, which becomes the key of a table entry (this is the byte the input box produces when you type CYRILLIC CAPITAL LETTER A on the keyboard). The 0x0410 is then converted into a UTF-8 sequence by calling utils.utf8encode. This is the value that 0xC0 "maps to". In this case it is 0xD0 0x90.
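As a rough illustration outside MUSHclient, the parsing step for one mapping line can be sketched in Python. The name parse_mapping_line is made up for this sketch, and Python's built-in UTF-8 encoder stands in for utils.utf8encode; the plugin itself does this in Lua.

```python
import re

# Matches lines such as "0xC0 0x0410 #CYRILLIC CAPITAL LETTER A".
MAPPING_RE = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)")


def parse_mapping_line(line):
    """Return (code_page_byte, utf8_bytes), or None if the line has no mapping.

    Hypothetical helper for illustration only.
    """
    m = MAPPING_RE.match(line)
    if m is None:
        return None
    code_page_value = int(m.group(1), 16)  # e.g. 0xC0
    code_point = int(m.group(2), 16)       # e.g. 0x0410
    # Key: the single code-page byte. Value: the UTF-8 bytes it maps to.
    return bytes([code_page_value]), chr(code_point).encode("utf-8")


key, value = parse_mapping_line("0xC0 0x0410 #CYRILLIC CAPITAL LETTER A")
print(key.hex(), "->", value.hex())  # prints: c0 -> d090
```

The printed pair matches the 0xC0 to 0xD0 0x90 mapping described above.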

Now we are ready to roll. The plugin then intercepts all text sent to the MUD by using OnPluginSend. It does a table lookup to convert the code-page values into UTF-8. The original text is dropped (by returning false) and the new text is sent instead.
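The lookup step can be sketched the same way in Python. This is only an illustration of the idea: TABLE here is a small hand-built subset (the plugin builds all 128 entries), to_utf8 is a made-up name, and the "?" fallback is an assumption needed because the subset is incomplete.

```python
import re

# Code-page 1251 byte -> Unicode code point, for the letters of one test word.
SUBSET = {0xCF: 0x041F, 0xF0: 0x0440, 0xE8: 0x0438,
          0xE2: 0x0432, 0xE5: 0x0435, 0xF2: 0x0442}

# Same shape as the plugin's table: one byte in, a UTF-8 sequence out.
TABLE = {bytes([b]): chr(cp).encode("utf-8") for b, cp in SUBSET.items()}

# Any byte with the high-order bit set.
HIGH_BYTE = re.compile(rb"[\x80-\xff]")


def to_utf8(outgoing):
    """Replace each high byte with its UTF-8 equivalent; ASCII passes through."""
    return HIGH_BYTE.sub(lambda m: TABLE.get(m.group(0), b"?"), outgoing)


sent = to_utf8(b"say \xcf\xf0\xe8\xe2\xe5\xf2")
print(sent.decode("utf-8"))  # prints: say Привет
```

Plain ASCII such as the "say " prefix is untouched, which is why commands typed in EN mode still work.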

To make this work, you need to configure Windows to display the correct code page, by using Control Panel -> Regional and Language Options -> Advanced. Set the "Language for non-Unicode programs" (such as MUSHclient) to the appropriate language (I used Russian for my test).

Now, when you switch the keyboard to Russian mode (Alt+Left-Shift), you see Russian characters in the input box as you type. With the plugin installed they are converted to UTF-8 on their way out to the MUD.

To see them correctly displayed on the way back (eg. when you say something and the MUD echoes the said text), you need to check the UTF-8 (Unicode) check box in the Output window configuration.

Copy between the lines below and save this text as Translate_Unicode.xml - then use File -> Plugins to install it as a MUSHclient plugin. For a different language than Russian simply find the correct table from the Unicode web site.


<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE muclient>
<!-- Saved on Tuesday, June 03, 2008, 10:16 AM -->
<!-- MuClient version 4.25 -->

<!-- Plugin "Translate_Unicode" generated by Plugin Wizard -->

<muclient>
<plugin
   name="Translate_Unicode_RU"
   author="Nick Gammon"
   id="bb1c8d004c596b19748fc66c"
   language="Lua"
   purpose="Translate sent text into UTF-8 (for Russian)"
   date_written="2008-06-03 10:11:10"
   date_modified="2008-06-04 13:20:00"
   requires="4.25"
   version="1.1"
   >

</plugin>


<!--  Script  -->


<script>
<![CDATA[
-- see http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT

--  <------------- replace here for other languages ------------->
conversion = [[
0x80	0x0402	#CYRILLIC CAPITAL LETTER DJE
0x81	0x0403	#CYRILLIC CAPITAL LETTER GJE
0x82	0x201A	#SINGLE LOW-9 QUOTATION MARK
0x83	0x0453	#CYRILLIC SMALL LETTER GJE
0x84	0x201E	#DOUBLE LOW-9 QUOTATION MARK
0x85	0x2026	#HORIZONTAL ELLIPSIS
0x86	0x2020	#DAGGER
0x87	0x2021	#DOUBLE DAGGER
0x88	0x20AC	#EURO SIGN
0x89	0x2030	#PER MILLE SIGN
0x8A	0x0409	#CYRILLIC CAPITAL LETTER LJE
0x8B	0x2039	#SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C	0x040A	#CYRILLIC CAPITAL LETTER NJE
0x8D	0x040C	#CYRILLIC CAPITAL LETTER KJE
0x8E	0x040B	#CYRILLIC CAPITAL LETTER TSHE
0x8F	0x040F	#CYRILLIC CAPITAL LETTER DZHE
0x90	0x0452	#CYRILLIC SMALL LETTER DJE
0x91	0x2018	#LEFT SINGLE QUOTATION MARK
0x92	0x2019	#RIGHT SINGLE QUOTATION MARK
0x93	0x201C	#LEFT DOUBLE QUOTATION MARK
0x94	0x201D	#RIGHT DOUBLE QUOTATION MARK
0x95	0x2022	#BULLET
0x96	0x2013	#EN DASH
0x97	0x2014	#EM DASH
0x98	      	#UNDEFINED
0x99	0x2122	#TRADE MARK SIGN
0x9A	0x0459	#CYRILLIC SMALL LETTER LJE
0x9B	0x203A	#SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C	0x045A	#CYRILLIC SMALL LETTER NJE
0x9D	0x045C	#CYRILLIC SMALL LETTER KJE
0x9E	0x045B	#CYRILLIC SMALL LETTER TSHE
0x9F	0x045F	#CYRILLIC SMALL LETTER DZHE
0xA0	0x00A0	#NO-BREAK SPACE
0xA1	0x040E	#CYRILLIC CAPITAL LETTER SHORT U
0xA2	0x045E	#CYRILLIC SMALL LETTER SHORT U
0xA3	0x0408	#CYRILLIC CAPITAL LETTER JE
0xA4	0x00A4	#CURRENCY SIGN
0xA5	0x0490	#CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0xA6	0x00A6	#BROKEN BAR
0xA7	0x00A7	#SECTION SIGN
0xA8	0x0401	#CYRILLIC CAPITAL LETTER IO
0xA9	0x00A9	#COPYRIGHT SIGN
0xAA	0x0404	#CYRILLIC CAPITAL LETTER UKRAINIAN IE
0xAB	0x00AB	#LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xAC	0x00AC	#NOT SIGN
0xAD	0x00AD	#SOFT HYPHEN
0xAE	0x00AE	#REGISTERED SIGN
0xAF	0x0407	#CYRILLIC CAPITAL LETTER YI
0xB0	0x00B0	#DEGREE SIGN
0xB1	0x00B1	#PLUS-MINUS SIGN
0xB2	0x0406	#CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0xB3	0x0456	#CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0xB4	0x0491	#CYRILLIC SMALL LETTER GHE WITH UPTURN
0xB5	0x00B5	#MICRO SIGN
0xB6	0x00B6	#PILCROW SIGN
0xB7	0x00B7	#MIDDLE DOT
0xB8	0x0451	#CYRILLIC SMALL LETTER IO
0xB9	0x2116	#NUMERO SIGN
0xBA	0x0454	#CYRILLIC SMALL LETTER UKRAINIAN IE
0xBB	0x00BB	#RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xBC	0x0458	#CYRILLIC SMALL LETTER JE
0xBD	0x0405	#CYRILLIC CAPITAL LETTER DZE
0xBE	0x0455	#CYRILLIC SMALL LETTER DZE
0xBF	0x0457	#CYRILLIC SMALL LETTER YI
0xC0	0x0410	#CYRILLIC CAPITAL LETTER A
0xC1	0x0411	#CYRILLIC CAPITAL LETTER BE
0xC2	0x0412	#CYRILLIC CAPITAL LETTER VE
0xC3	0x0413	#CYRILLIC CAPITAL LETTER GHE
0xC4	0x0414	#CYRILLIC CAPITAL LETTER DE
0xC5	0x0415	#CYRILLIC CAPITAL LETTER IE
0xC6	0x0416	#CYRILLIC CAPITAL LETTER ZHE
0xC7	0x0417	#CYRILLIC CAPITAL LETTER ZE
0xC8	0x0418	#CYRILLIC CAPITAL LETTER I
0xC9	0x0419	#CYRILLIC CAPITAL LETTER SHORT I
0xCA	0x041A	#CYRILLIC CAPITAL LETTER KA
0xCB	0x041B	#CYRILLIC CAPITAL LETTER EL
0xCC	0x041C	#CYRILLIC CAPITAL LETTER EM
0xCD	0x041D	#CYRILLIC CAPITAL LETTER EN
0xCE	0x041E	#CYRILLIC CAPITAL LETTER O
0xCF	0x041F	#CYRILLIC CAPITAL LETTER PE
0xD0	0x0420	#CYRILLIC CAPITAL LETTER ER
0xD1	0x0421	#CYRILLIC CAPITAL LETTER ES
0xD2	0x0422	#CYRILLIC CAPITAL LETTER TE
0xD3	0x0423	#CYRILLIC CAPITAL LETTER U
0xD4	0x0424	#CYRILLIC CAPITAL LETTER EF
0xD5	0x0425	#CYRILLIC CAPITAL LETTER HA
0xD6	0x0426	#CYRILLIC CAPITAL LETTER TSE
0xD7	0x0427	#CYRILLIC CAPITAL LETTER CHE
0xD8	0x0428	#CYRILLIC CAPITAL LETTER SHA
0xD9	0x0429	#CYRILLIC CAPITAL LETTER SHCHA
0xDA	0x042A	#CYRILLIC CAPITAL LETTER HARD SIGN
0xDB	0x042B	#CYRILLIC CAPITAL LETTER YERU
0xDC	0x042C	#CYRILLIC CAPITAL LETTER SOFT SIGN
0xDD	0x042D	#CYRILLIC CAPITAL LETTER E
0xDE	0x042E	#CYRILLIC CAPITAL LETTER YU
0xDF	0x042F	#CYRILLIC CAPITAL LETTER YA
0xE0	0x0430	#CYRILLIC SMALL LETTER A
0xE1	0x0431	#CYRILLIC SMALL LETTER BE
0xE2	0x0432	#CYRILLIC SMALL LETTER VE
0xE3	0x0433	#CYRILLIC SMALL LETTER GHE
0xE4	0x0434	#CYRILLIC SMALL LETTER DE
0xE5	0x0435	#CYRILLIC SMALL LETTER IE
0xE6	0x0436	#CYRILLIC SMALL LETTER ZHE
0xE7	0x0437	#CYRILLIC SMALL LETTER ZE
0xE8	0x0438	#CYRILLIC SMALL LETTER I
0xE9	0x0439	#CYRILLIC SMALL LETTER SHORT I
0xEA	0x043A	#CYRILLIC SMALL LETTER KA
0xEB	0x043B	#CYRILLIC SMALL LETTER EL
0xEC	0x043C	#CYRILLIC SMALL LETTER EM
0xED	0x043D	#CYRILLIC SMALL LETTER EN
0xEE	0x043E	#CYRILLIC SMALL LETTER O
0xEF	0x043F	#CYRILLIC SMALL LETTER PE
0xF0	0x0440	#CYRILLIC SMALL LETTER ER
0xF1	0x0441	#CYRILLIC SMALL LETTER ES
0xF2	0x0442	#CYRILLIC SMALL LETTER TE
0xF3	0x0443	#CYRILLIC SMALL LETTER U
0xF4	0x0444	#CYRILLIC SMALL LETTER EF
0xF5	0x0445	#CYRILLIC SMALL LETTER HA
0xF6	0x0446	#CYRILLIC SMALL LETTER TSE
0xF7	0x0447	#CYRILLIC SMALL LETTER CHE
0xF8	0x0448	#CYRILLIC SMALL LETTER SHA
0xF9	0x0449	#CYRILLIC SMALL LETTER SHCHA
0xFA	0x044A	#CYRILLIC SMALL LETTER HARD SIGN
0xFB	0x044B	#CYRILLIC SMALL LETTER YERU
0xFC	0x044C	#CYRILLIC SMALL LETTER SOFT SIGN
0xFD	0x044D	#CYRILLIC SMALL LETTER E
0xFE	0x044E	#CYRILLIC SMALL LETTER YU
0xFF	0x044F	#CYRILLIC SMALL LETTER YA
]]
--  <------------- end of part to be replaced for other languages ------------->


-- convert from above code page into UTF-8

unicode_table = {}

function OnPluginInstall ()

  require "getlines"

  for line in getlines (conversion) do
    local from, to = string.match (line, "^0x(%x+)%s+0x(%x+)")
    if from and to then
      from = tonumber (from, 16)  -- convert from hex to decimal
      to = tonumber (to, 16)  -- ditto
      unicode_table [string.char (from)] = utils.utf8encode (to)
    else  -- look for an undefined code point
      from = string.match (string.lower (line), "^0x(%x+)%s+%#undefined")
      if from then
        from = tonumber (from, 16)  -- convert from hex to decimal
        unicode_table [string.char (from)] = "?"  -- don't want bad UTF-8
      end -- if undefined code
    end -- if found
  end -- for
  
  ColourNote ("white", "green", 
              GetPluginInfo (GetPluginID (), 1) .. " plugin installed")
  
end -- OnPluginInstall 


-- replace bytes with high-order bit set with UTF-8 equivalents

function OnPluginSend (sText)
  Send ((string.gsub (sText, "[\128-\255]", unicode_table)))
  return false
end -- OnPluginSend 


]]>
</script>

</muclient>

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais (8 posts)
Date Reply #47 on Tue 03 Jun 2008 12:59 AM (UTC)
Message
Neat, thanks! (Though while it's not a complete fix, any sort of progress is progress :D)
I'm not actually Russian. (though I do know a bit of it) For an all-Russian MUD I'd imagine people would at least attempt to change commands to be more native.

On a game that I work on we've been looking at using UTF8 for various stuff and have a working implementation of it MUD-side, but many of us use MUSHclient so it's a bit tricky. :)

Have you taken a look at Uniscribe? http://www.microsoft.com/typography/developers/uniscribe/default.htm

Sadly, I have no GUI experience to speak of, so I'm not really of much help.

Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #48 on Tue 03 Jun 2008 01:19 AM (UTC)
Message
I hadn't read it but am doing so now.

Internationalization of MUDs is somewhat of a tricky issue - I had to recompile SMAUG to even test my code, as it assumed that every character over 0x7F should be discarded.

Anyway, my suggested solution has raised its own issues. It seems that once you turn on UTF-8 in the Output display configuration, any text entered into the command window, other than straight ASCII (ie. less than 0x80) then fails alias processing (if you have any aliases at all), with a message about "Error execution regular expression: Bad UTF8".

I hadn't noticed this before, because I normally type English text.

It seems I have to release a new version of MUSHclient that doesn't bother attempting to match UTF-8 in the *command* window (that is, aliases), because the text there won't be UTF-8; it will be text localized to a particular code page.
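To see why a UTF-8-aware matcher rejects command-window text, here is a small Python sketch. It uses Python's own cp1251 codec as a stand-in for what the Windows input box produces, and is_valid_utf8 is a made-up helper name.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# What the command window holds: code-page 1251 bytes, not UTF-8.
typed = "say Привет".encode("cp1251")

print(is_valid_utf8(typed))                         # False
print(is_valid_utf8("say Привет".encode("utf-8")))  # True
print(is_valid_utf8(b"say hello"))                  # True
```

Straight ASCII is valid in both encodings, which fits the observation that the problem only shows up with characters of 0x80 and above.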

There is also an issue of copying and pasting from the output window to the command window (eg. copying some text to echo it, or a player's name) - if the text in the output window is UTF-8 then it turns into gibberish in the command window. So, the conversion from code page to UTF-8 has to be reversed when copying and pasting.
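The reverse direction can be sketched in Python as follows. This is an illustration only: REVERSE is a small hand-built subset of the mapping, to_code_page is a made-up name, and sequences with no entry are left unchanged.

```python
import re

# Code-page 1251 byte -> Unicode code point, for the letters of one test word.
SUBSET = {0xCF: 0x041F, 0xF0: 0x0440, 0xE8: 0x0438,
          0xE2: 0x0432, 0xE5: 0x0435, 0xF2: 0x0442}

# Inverted table: UTF-8 sequence in, single code-page byte out.
REVERSE = {chr(cp).encode("utf-8"): bytes([b]) for b, cp in SUBSET.items()}

# A UTF-8 lead byte followed by one or more continuation bytes.
MULTIBYTE = re.compile(rb"[\xc0-\xf7][\x80-\xbf]+")


def to_code_page(data):
    """Turn known UTF-8 sequences back into code-page bytes."""
    return MULTIBYTE.sub(lambda m: REVERSE.get(m.group(0), m.group(0)), data)


copied = "say Привет".encode("utf-8")
print(to_code_page(copied))  # prints: b'say \xcf\xf0\xe8\xe2\xe5\xf2'
```

Each match is one whole character, because a continuation byte can never start a new sequence.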



Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #49 on Tue 03 Jun 2008 01:21 AM (UTC)

Amended on Wed 04 Jun 2008 02:50 AM (UTC) by Nick Gammon

Message
The plugin below lets you copy selected text from the output window, if it is in UTF-8 format, and converts it back to the appropriate code-page symbols.


<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE muclient>
<!-- Saved on Friday, August 03, 2007, 2:06  -->
<!-- MuClient version 4.14 -->

<!-- Plugin "CopyScript" generated by Plugin Wizard -->

<!-- Amended slightly by Nick Gammon, from Worstje's version, on 17 Feb 2008 -->

<!-- Also amended on 3rd June 2008 to convert UTF-8 back to non-UTF-8 for use in the command window -->


<muclient>
<plugin
   name="Unicode_Copy_Output"
   author="Worstje"
   id="29ce226131a0af3140c35141"
   language="Lua"
   purpose="Allows you to use CTRL+C for the output window if 'All typing goes to command window' is turned on."
   save_state="n"
   date_written="2007-08-03 02:04:12"
   requires="4.00"
   version="2.0"
   >

</plugin>

<aliases>
  <alias
    match="^Copy_Output:Copy:29ce226131a0af3140c35141$"
    enabled="y"
    regexp="y"
    omit_from_output="y"
    sequence="100"
    script="CopyScript"
  >
  </alias>
</aliases>


<!--  Script  -->

<script>
<![CDATA[

-- THIS VERSION CONVERTS UTF-8 back to code-page text
-- See: http://www.gammon.com.au/forum/?id=2681&page=4

-- Thank you, Shaun Biggs, for taking your time to write the CopyScript
-- (formerly Copy2) function below. It was slightly altered by me to suit
-- my usage (wordwrapped lines and no \r\n at start of selection).

-- See forum: http://www.gammon.com.au/forum/?id=8052


-- see http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT

--  <------------- replace here for other languages ------------->
conversion = [[
0x80	0x0402	#CYRILLIC CAPITAL LETTER DJE
0x81	0x0403	#CYRILLIC CAPITAL LETTER GJE
0x82	0x201A	#SINGLE LOW-9 QUOTATION MARK
0x83	0x0453	#CYRILLIC SMALL LETTER GJE
0x84	0x201E	#DOUBLE LOW-9 QUOTATION MARK
0x85	0x2026	#HORIZONTAL ELLIPSIS
0x86	0x2020	#DAGGER
0x87	0x2021	#DOUBLE DAGGER
0x88	0x20AC	#EURO SIGN
0x89	0x2030	#PER MILLE SIGN
0x8A	0x0409	#CYRILLIC CAPITAL LETTER LJE
0x8B	0x2039	#SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C	0x040A	#CYRILLIC CAPITAL LETTER NJE
0x8D	0x040C	#CYRILLIC CAPITAL LETTER KJE
0x8E	0x040B	#CYRILLIC CAPITAL LETTER TSHE
0x8F	0x040F	#CYRILLIC CAPITAL LETTER DZHE
0x90	0x0452	#CYRILLIC SMALL LETTER DJE
0x91	0x2018	#LEFT SINGLE QUOTATION MARK
0x92	0x2019	#RIGHT SINGLE QUOTATION MARK
0x93	0x201C	#LEFT DOUBLE QUOTATION MARK
0x94	0x201D	#RIGHT DOUBLE QUOTATION MARK
0x95	0x2022	#BULLET
0x96	0x2013	#EN DASH
0x97	0x2014	#EM DASH
0x98	      	#UNDEFINED
0x99	0x2122	#TRADE MARK SIGN
0x9A	0x0459	#CYRILLIC SMALL LETTER LJE
0x9B	0x203A	#SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C	0x045A	#CYRILLIC SMALL LETTER NJE
0x9D	0x045C	#CYRILLIC SMALL LETTER KJE
0x9E	0x045B	#CYRILLIC SMALL LETTER TSHE
0x9F	0x045F	#CYRILLIC SMALL LETTER DZHE
0xA0	0x00A0	#NO-BREAK SPACE
0xA1	0x040E	#CYRILLIC CAPITAL LETTER SHORT U
0xA2	0x045E	#CYRILLIC SMALL LETTER SHORT U
0xA3	0x0408	#CYRILLIC CAPITAL LETTER JE
0xA4	0x00A4	#CURRENCY SIGN
0xA5	0x0490	#CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0xA6	0x00A6	#BROKEN BAR
0xA7	0x00A7	#SECTION SIGN
0xA8	0x0401	#CYRILLIC CAPITAL LETTER IO
0xA9	0x00A9	#COPYRIGHT SIGN
0xAA	0x0404	#CYRILLIC CAPITAL LETTER UKRAINIAN IE
0xAB	0x00AB	#LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xAC	0x00AC	#NOT SIGN
0xAD	0x00AD	#SOFT HYPHEN
0xAE	0x00AE	#REGISTERED SIGN
0xAF	0x0407	#CYRILLIC CAPITAL LETTER YI
0xB0	0x00B0	#DEGREE SIGN
0xB1	0x00B1	#PLUS-MINUS SIGN
0xB2	0x0406	#CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0xB3	0x0456	#CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0xB4	0x0491	#CYRILLIC SMALL LETTER GHE WITH UPTURN
0xB5	0x00B5	#MICRO SIGN
0xB6	0x00B6	#PILCROW SIGN
0xB7	0x00B7	#MIDDLE DOT
0xB8	0x0451	#CYRILLIC SMALL LETTER IO
0xB9	0x2116	#NUMERO SIGN
0xBA	0x0454	#CYRILLIC SMALL LETTER UKRAINIAN IE
0xBB	0x00BB	#RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xBC	0x0458	#CYRILLIC SMALL LETTER JE
0xBD	0x0405	#CYRILLIC CAPITAL LETTER DZE
0xBE	0x0455	#CYRILLIC SMALL LETTER DZE
0xBF	0x0457	#CYRILLIC SMALL LETTER YI
0xC0	0x0410	#CYRILLIC CAPITAL LETTER A
0xC1	0x0411	#CYRILLIC CAPITAL LETTER BE
0xC2	0x0412	#CYRILLIC CAPITAL LETTER VE
0xC3	0x0413	#CYRILLIC CAPITAL LETTER GHE
0xC4	0x0414	#CYRILLIC CAPITAL LETTER DE
0xC5	0x0415	#CYRILLIC CAPITAL LETTER IE
0xC6	0x0416	#CYRILLIC CAPITAL LETTER ZHE
0xC7	0x0417	#CYRILLIC CAPITAL LETTER ZE
0xC8	0x0418	#CYRILLIC CAPITAL LETTER I
0xC9	0x0419	#CYRILLIC CAPITAL LETTER SHORT I
0xCA	0x041A	#CYRILLIC CAPITAL LETTER KA
0xCB	0x041B	#CYRILLIC CAPITAL LETTER EL
0xCC	0x041C	#CYRILLIC CAPITAL LETTER EM
0xCD	0x041D	#CYRILLIC CAPITAL LETTER EN
0xCE	0x041E	#CYRILLIC CAPITAL LETTER O
0xCF	0x041F	#CYRILLIC CAPITAL LETTER PE
0xD0	0x0420	#CYRILLIC CAPITAL LETTER ER
0xD1	0x0421	#CYRILLIC CAPITAL LETTER ES
0xD2	0x0422	#CYRILLIC CAPITAL LETTER TE
0xD3	0x0423	#CYRILLIC CAPITAL LETTER U
0xD4	0x0424	#CYRILLIC CAPITAL LETTER EF
0xD5	0x0425	#CYRILLIC CAPITAL LETTER HA
0xD6	0x0426	#CYRILLIC CAPITAL LETTER TSE
0xD7	0x0427	#CYRILLIC CAPITAL LETTER CHE
0xD8	0x0428	#CYRILLIC CAPITAL LETTER SHA
0xD9	0x0429	#CYRILLIC CAPITAL LETTER SHCHA
0xDA	0x042A	#CYRILLIC CAPITAL LETTER HARD SIGN
0xDB	0x042B	#CYRILLIC CAPITAL LETTER YERU
0xDC	0x042C	#CYRILLIC CAPITAL LETTER SOFT SIGN
0xDD	0x042D	#CYRILLIC CAPITAL LETTER E
0xDE	0x042E	#CYRILLIC CAPITAL LETTER YU
0xDF	0x042F	#CYRILLIC CAPITAL LETTER YA
0xE0	0x0430	#CYRILLIC SMALL LETTER A
0xE1	0x0431	#CYRILLIC SMALL LETTER BE
0xE2	0x0432	#CYRILLIC SMALL LETTER VE
0xE3	0x0433	#CYRILLIC SMALL LETTER GHE
0xE4	0x0434	#CYRILLIC SMALL LETTER DE
0xE5	0x0435	#CYRILLIC SMALL LETTER IE
0xE6	0x0436	#CYRILLIC SMALL LETTER ZHE
0xE7	0x0437	#CYRILLIC SMALL LETTER ZE
0xE8	0x0438	#CYRILLIC SMALL LETTER I
0xE9	0x0439	#CYRILLIC SMALL LETTER SHORT I
0xEA	0x043A	#CYRILLIC SMALL LETTER KA
0xEB	0x043B	#CYRILLIC SMALL LETTER EL
0xEC	0x043C	#CYRILLIC SMALL LETTER EM
0xED	0x043D	#CYRILLIC SMALL LETTER EN
0xEE	0x043E	#CYRILLIC SMALL LETTER O
0xEF	0x043F	#CYRILLIC SMALL LETTER PE
0xF0	0x0440	#CYRILLIC SMALL LETTER ER
0xF1	0x0441	#CYRILLIC SMALL LETTER ES
0xF2	0x0442	#CYRILLIC SMALL LETTER TE
0xF3	0x0443	#CYRILLIC SMALL LETTER U
0xF4	0x0444	#CYRILLIC SMALL LETTER EF
0xF5	0x0445	#CYRILLIC SMALL LETTER HA
0xF6	0x0446	#CYRILLIC SMALL LETTER TSE
0xF7	0x0447	#CYRILLIC SMALL LETTER CHE
0xF8	0x0448	#CYRILLIC SMALL LETTER SHA
0xF9	0x0449	#CYRILLIC SMALL LETTER SHCHA
0xFA	0x044A	#CYRILLIC SMALL LETTER HARD SIGN
0xFB	0x044B	#CYRILLIC SMALL LETTER YERU
0xFC	0x044C	#CYRILLIC SMALL LETTER SOFT SIGN
0xFD	0x044D	#CYRILLIC SMALL LETTER E
0xFE	0x044E	#CYRILLIC SMALL LETTER YU
0xFF	0x044F	#CYRILLIC SMALL LETTER YA
]]
--  <------------- end of part to be replaced for other languages ------------->


-- convert from above code page into UTF-8

unicode_table = {}

function OnPluginInstall ()

  require "getlines"
 
  for line in getlines (conversion) do
    local from, to = string.match (line, "^0x(%x+)%s+0x(%x+)")
    if from and to then
      from = tonumber (from, 16)  -- convert from hex to decimal
      to = tonumber (to, 16)  -- ditto
      unicode_table [utils.utf8encode (to)] = string.char (from)
    end -- if found
  end -- for
  
end -- OnPluginInstall 



-- some long alias that no-one will ever want to type
Accelerator ("Ctrl+C", "Copy_Output:Copy:29ce226131a0af3140c35141")

function CopyScript(name, line, wildcs)

  -- find selection in output window, if any
  local first_line, last_line = GetSelectionStartLine(), 
                                math.min (GetSelectionEndLine(), GetLinesInBufferCount ())

  local first_column, last_column = GetSelectionStartColumn(), GetSelectionEndColumn()
  
  -- nothing selected, do normal copy
  if first_line <= 0 then
    DoCommand("copy")
    return
  end -- if nothing to copy from output window
  
  local copystring = ""
  
  -- iterate to build up copy text
  for line = first_line, last_line do
  
    if line < last_line then
      copystring = copystring .. GetLineInfo(line).text:sub (first_column)  -- copy rest of line
      first_column = 1
      
      -- Is this a new line or merely the continuation of a paragraph?
      if GetLineInfo (line, 3) then
        copystring = copystring .. "\r\n"
      end  -- new line
      
    else
      copystring = copystring .. GetLineInfo(line).text:sub (first_column, last_column - 1)
    end -- if
        
  end  -- for loop
  
  -- Get rid of a spurious extra new line at the start.
  if copystring:sub (1, 2) == "\r\n" then
    copystring = copystring:sub (3)
  end   -- if newline at start
  
  -- correct UTF-8
  -- see: http://www.gammon.com.au/forum/?id=2681&page=4

  copystring = string.gsub (copystring, "[\192-\247][\128-\191]+", unicode_table)

  -- finally can set clipboard contents
  SetClipboard(copystring)
  
end -- function CopyScript
]]>
</script>

</muclient>




Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #50 on Tue 03 Jun 2008 01:27 AM (UTC)
Message
Quote:

I'm not actually Russian.


Me neither. ;)

If the general idea works, you should be able to publish customized plugins for the languages which are in use on your MUD. I presume there aren't hundreds.

Basically look up the code-page translations on unicode.org, insert them into the plugin at the appropriate place, and save with some sort of suffix (eg. Translate_Unicode.RU.xml).


Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #51 on Tue 03 Jun 2008 01:36 AM (UTC)
Message
I should point out that this is, fairly obviously, a hybrid "work around" solution.

The main problem really is that MUSHclient is not Unicode-enabled, and despite my best efforts previously, it was hard to get Unicode out of the command window.

What you end up with, if you set the code page appropriately (in International settings), is one language you can use in the Command window (presumably your native language). By enabling UTF-8 in the Output window, however, all languages can be displayed (with a suitable font).

The proposed "copy from output window" plugin would let you copy your *own* language from the output window, for editing and resending. Copying a different code set would result in gibberish still, because the Command window is not Unicode.
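The "gibberish" case is easy to reproduce in a Python sketch. Here Greek text arrives as UTF-8 and is then displayed as if it were code page 1251, which is roughly what a non-Unicode command window would do with it. Python's cp1251 codec is used as a stand-in for the Windows code page.

```python
# Three Greek letters (alpha, beta, gamma) as the MUD would send them.
greek = "\u03b1\u03b2\u03b3"
utf8_bytes = greek.encode("utf-8")

# A code-page 1251 window treats every byte as one Cyrillic-page character.
shown = utf8_bytes.decode("cp1251")

print(len(greek), len(shown))  # prints: 3 6
print(shown == greek)          # prints: False
```

Each two-byte UTF-8 sequence turns into two unrelated code-page characters, so the text doubles in length and loses its meaning.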


Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #52 on Tue 03 Jun 2008 04:02 AM (UTC)
Message
Version 4.26 has now been released, which should work properly with copying and pasting from the output window, if you install the above plugin.


Posted by Castamir (2 posts)
Date Reply #53 on Tue 03 Jun 2008 08:44 PM (UTC)
Message
Alas, pasting from other programs is still broken, and what's most important, so is typing things from the keyboard.

My knowledge of Windows is sketchy (I'm a Unix guy), but I would tackle the problem the following way:
* no 16-bit internal strings. That's a pain in the rear, and as you said, there are 6k strings you would have to change, not to mention all the function calls and whatnot...
* the input box itself needs to be a Unicode window, though
* when taking data from it, you would call GetWindowTextW() (always!), then WideCharToMultiByte() to CP_UTF8 if MUSHclient is in UTF-8 mode and to CP_ACP otherwise.
* ... and the same with MultiByteToWideChar() and SetWindowTextW() the other way
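The conversion logic in that proposal can be sketched in Python. This is not Win32 code: a Python str plays the role of the wide string you would get from GetWindowTextW(), str.encode stands in for WideCharToMultiByte(), and bytes.decode for MultiByteToWideChar(). The function names and the cp1251 default are assumptions for the sketch.

```python
def text_for_mud(text, utf8_mode, ansi_codec="cp1251"):
    """Wide text from the input box -> bytes to send.

    utf8_mode True corresponds to CP_UTF8, False to CP_ACP in the proposal.
    """
    if utf8_mode:
        return text.encode("utf-8")
    # Characters missing from the code page become "?" rather than raising.
    return text.encode(ansi_codec, errors="replace")


def text_for_input_box(data, utf8_mode, ansi_codec="cp1251"):
    """Bytes held internally -> wide text to put back into the input box."""
    codec = "utf-8" if utf8_mode else ansi_codec
    return data.decode(codec, errors="replace")


print(text_for_mud("Привет", utf8_mode=True))
print(text_for_mud("Привет", utf8_mode=False))
```

The point of the design is that only the edges of the input box deal with wide characters, so the internal strings can stay 8-bit.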

Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #54 on Tue 03 Jun 2008 11:05 PM (UTC)
Message

You should at least be able to get typing into the window working. I am no expert, but I managed it after trying a couple of things.

First, I added a second language as a keyboard option:

Then, I set my code page to Russian, so that when I used my extra keyboard settings, Russian is what I would see:

Now in the bottom-right corner I see a keyboard code (EN) which shows what I type will appear in English:

I type "say " into the command window (which appears as such), and then hit the keyboard modifier to switch to Russian (Alt+Left_Shift which is the default). Now the keyboard code changes:

Now what I type appears in Russian, which is what I want. However when I hit <Enter> to send it, the plugin switches the code page data into UTF-8, which is what arrives at the MUD.



Posted by Atltais (8 posts)
Date Reply #55 on Tue 03 Jun 2008 11:06 PM (UTC)
Message
Applocale is a much simpler way to change the code page, as an aside.

Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #56 on Tue 03 Jun 2008 11:10 PM (UTC)
Message
Quote:

the input box itself needs to be a Unicode window, though


Well, that is the hard bit. I tried for a few days previously, but I don't think you can make individual windows in a non-Unicode program be Unicode.

The source code is freely available, if someone can make it work I would be pleased to hear from them.

Quote:

Alas, pasting from other programs is still broken,


To make that work (I presume the text on the clipboard is UTF-8), you could make a plugin that does what my "copy from the output window" plugin does. You would hit a function key (eg. F8), the plugin kicks in, grabs the clipboard contents, converts them from UTF-8 to the code page, and puts them back. Or that is the theory at least.

I know this isn't perfect, but my initial tests with Russian seemed to show it worked smoothly enough, providing you didn't try to get too fancy and copy and paste from one application to another.

I suspect the copying/pasting thing might be related to clipboard "types" where the UTF-8 data is not necessarily stored in the TEXT data type (but I could be wrong).


Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #57 on Wed 04 Jun 2008 02:46 AM (UTC)
Message
As an example of another language, if you wanted Greek encoding, you could go to this page:

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1253.TXT

Grab the mappings for 0x80 to 0xFF (see below) and paste them into the two plugins instead of the Russian ones:


0x80	0x20AC	#EURO SIGN
0x81	      	#UNDEFINED
0x82	0x201A	#SINGLE LOW-9 QUOTATION MARK
0x83	0x0192	#LATIN SMALL LETTER F WITH HOOK
0x84	0x201E	#DOUBLE LOW-9 QUOTATION MARK
0x85	0x2026	#HORIZONTAL ELLIPSIS
0x86	0x2020	#DAGGER
0x87	0x2021	#DOUBLE DAGGER
0x88	      	#UNDEFINED
0x89	0x2030	#PER MILLE SIGN
0x8A	      	#UNDEFINED
0x8B	0x2039	#SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C	      	#UNDEFINED
0x8D	      	#UNDEFINED
0x8E	      	#UNDEFINED
0x8F	      	#UNDEFINED
0x90	      	#UNDEFINED
0x91	0x2018	#LEFT SINGLE QUOTATION MARK
0x92	0x2019	#RIGHT SINGLE QUOTATION MARK
0x93	0x201C	#LEFT DOUBLE QUOTATION MARK
0x94	0x201D	#RIGHT DOUBLE QUOTATION MARK
0x95	0x2022	#BULLET
0x96	0x2013	#EN DASH
0x97	0x2014	#EM DASH
0x98	      	#UNDEFINED
0x99	0x2122	#TRADE MARK SIGN
0x9A	      	#UNDEFINED
0x9B	0x203A	#SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C	      	#UNDEFINED
0x9D	      	#UNDEFINED
0x9E	      	#UNDEFINED
0x9F	      	#UNDEFINED
0xA0	0x00A0	#NO-BREAK SPACE
0xA1	0x0385	#GREEK DIALYTIKA TONOS
0xA2	0x0386	#GREEK CAPITAL LETTER ALPHA WITH TONOS
0xA3	0x00A3	#POUND SIGN
0xA4	0x00A4	#CURRENCY SIGN
0xA5	0x00A5	#YEN SIGN
0xA6	0x00A6	#BROKEN BAR
0xA7	0x00A7	#SECTION SIGN
0xA8	0x00A8	#DIAERESIS
0xA9	0x00A9	#COPYRIGHT SIGN
0xAA	      	#UNDEFINED
0xAB	0x00AB	#LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xAC	0x00AC	#NOT SIGN
0xAD	0x00AD	#SOFT HYPHEN
0xAE	0x00AE	#REGISTERED SIGN
0xAF	0x2015	#HORIZONTAL BAR
0xB0	0x00B0	#DEGREE SIGN
0xB1	0x00B1	#PLUS-MINUS SIGN
0xB2	0x00B2	#SUPERSCRIPT TWO
0xB3	0x00B3	#SUPERSCRIPT THREE
0xB4	0x0384	#GREEK TONOS
0xB5	0x00B5	#MICRO SIGN
0xB6	0x00B6	#PILCROW SIGN
0xB7	0x00B7	#MIDDLE DOT
0xB8	0x0388	#GREEK CAPITAL LETTER EPSILON WITH TONOS
0xB9	0x0389	#GREEK CAPITAL LETTER ETA WITH TONOS
0xBA	0x038A	#GREEK CAPITAL LETTER IOTA WITH TONOS
0xBB	0x00BB	#RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xBC	0x038C	#GREEK CAPITAL LETTER OMICRON WITH TONOS
0xBD	0x00BD	#VULGAR FRACTION ONE HALF
0xBE	0x038E	#GREEK CAPITAL LETTER UPSILON WITH TONOS
0xBF	0x038F	#GREEK CAPITAL LETTER OMEGA WITH TONOS
0xC0	0x0390	#GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
0xC1	0x0391	#GREEK CAPITAL LETTER ALPHA
0xC2	0x0392	#GREEK CAPITAL LETTER BETA
0xC3	0x0393	#GREEK CAPITAL LETTER GAMMA
0xC4	0x0394	#GREEK CAPITAL LETTER DELTA
0xC5	0x0395	#GREEK CAPITAL LETTER EPSILON
0xC6	0x0396	#GREEK CAPITAL LETTER ZETA
0xC7	0x0397	#GREEK CAPITAL LETTER ETA
0xC8	0x0398	#GREEK CAPITAL LETTER THETA
0xC9	0x0399	#GREEK CAPITAL LETTER IOTA
0xCA	0x039A	#GREEK CAPITAL LETTER KAPPA
0xCB	0x039B	#GREEK CAPITAL LETTER LAMDA
0xCC	0x039C	#GREEK CAPITAL LETTER MU
0xCD	0x039D	#GREEK CAPITAL LETTER NU
0xCE	0x039E	#GREEK CAPITAL LETTER XI
0xCF	0x039F	#GREEK CAPITAL LETTER OMICRON
0xD0	0x03A0	#GREEK CAPITAL LETTER PI
0xD1	0x03A1	#GREEK CAPITAL LETTER RHO
0xD2	      	#UNDEFINED
0xD3	0x03A3	#GREEK CAPITAL LETTER SIGMA
0xD4	0x03A4	#GREEK CAPITAL LETTER TAU
0xD5	0x03A5	#GREEK CAPITAL LETTER UPSILON
0xD6	0x03A6	#GREEK CAPITAL LETTER PHI
0xD7	0x03A7	#GREEK CAPITAL LETTER CHI
0xD8	0x03A8	#GREEK CAPITAL LETTER PSI
0xD9	0x03A9	#GREEK CAPITAL LETTER OMEGA
0xDA	0x03AA	#GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0xDB	0x03AB	#GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0xDC	0x03AC	#GREEK SMALL LETTER ALPHA WITH TONOS
0xDD	0x03AD	#GREEK SMALL LETTER EPSILON WITH TONOS
0xDE	0x03AE	#GREEK SMALL LETTER ETA WITH TONOS
0xDF	0x03AF	#GREEK SMALL LETTER IOTA WITH TONOS
0xE0	0x03B0	#GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
0xE1	0x03B1	#GREEK SMALL LETTER ALPHA
0xE2	0x03B2	#GREEK SMALL LETTER BETA
0xE3	0x03B3	#GREEK SMALL LETTER GAMMA
0xE4	0x03B4	#GREEK SMALL LETTER DELTA
0xE5	0x03B5	#GREEK SMALL LETTER EPSILON
0xE6	0x03B6	#GREEK SMALL LETTER ZETA
0xE7	0x03B7	#GREEK SMALL LETTER ETA
0xE8	0x03B8	#GREEK SMALL LETTER THETA
0xE9	0x03B9	#GREEK SMALL LETTER IOTA
0xEA	0x03BA	#GREEK SMALL LETTER KAPPA
0xEB	0x03BB	#GREEK SMALL LETTER LAMDA
0xEC	0x03BC	#GREEK SMALL LETTER MU
0xED	0x03BD	#GREEK SMALL LETTER NU
0xEE	0x03BE	#GREEK SMALL LETTER XI
0xEF	0x03BF	#GREEK SMALL LETTER OMICRON
0xF0	0x03C0	#GREEK SMALL LETTER PI
0xF1	0x03C1	#GREEK SMALL LETTER RHO
0xF2	0x03C2	#GREEK SMALL LETTER FINAL SIGMA
0xF3	0x03C3	#GREEK SMALL LETTER SIGMA
0xF4	0x03C4	#GREEK SMALL LETTER TAU
0xF5	0x03C5	#GREEK SMALL LETTER UPSILON
0xF6	0x03C6	#GREEK SMALL LETTER PHI
0xF7	0x03C7	#GREEK SMALL LETTER CHI
0xF8	0x03C8	#GREEK SMALL LETTER PSI
0xF9	0x03C9	#GREEK SMALL LETTER OMEGA
0xFA	0x03CA	#GREEK SMALL LETTER IOTA WITH DIALYTIKA
0xFB	0x03CB	#GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0xFC	0x03CC	#GREEK SMALL LETTER OMICRON WITH TONOS
0xFD	0x03CD	#GREEK SMALL LETTER UPSILON WITH TONOS
0xFE	0x03CE	#GREEK SMALL LETTER OMEGA WITH TONOS
0xFF	      	#UNDEFINED
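The Greek table has many more #UNDEFINED rows than the Russian one, so the handling of those rows matters more here. A Python sketch of building a table from such text, with undefined bytes mapped to "?" as in the first plugin, might look like this. SAMPLE is a three-line excerpt and build_table is a made-up name.

```python
import re

# Three rows in the same tab-separated layout as the table above.
SAMPLE = (
    "0x80\t0x20AC\t#EURO SIGN\n"
    "0x81\t      \t#UNDEFINED\n"
    "0xE1\t0x03B1\t#GREEK SMALL LETTER ALPHA\n"
)

DEFINED = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)")
UNDEFINED = re.compile(r"^0x([0-9A-Fa-f]+)\s+#undefined", re.IGNORECASE)


def build_table(text):
    """Map each code-page byte to UTF-8 bytes, or to b'?' if undefined."""
    table = {}
    for line in text.splitlines():
        m = DEFINED.match(line)
        if m:
            byte = bytes([int(m.group(1), 16)])
            table[byte] = chr(int(m.group(2), 16)).encode("utf-8")
            continue
        m = UNDEFINED.match(line)
        if m:
            # Avoid sending a raw high byte, which would be bad UTF-8.
            table[bytes([int(m.group(1), 16)])] = b"?"
    return table


table = build_table(SAMPLE)
for byte, utf8 in sorted(table.items()):
    print(byte.hex(), "->", utf8.hex())
```

Swapping in another code page only means changing the text passed to build_table, which mirrors pasting a different table into the plugins.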


Posted by Nick Gammon, Australia (22,973 posts), Forum Administrator
Date Reply #58 on Wed 04 Jun 2008 03:39 AM (UTC)
Message

An example of the first plugin (on this page) in operation is here:

This demonstrates how (with a Greek version, using the codes just above), I was able to type in the command window using Greek characters, and send them. The characters arrived in the MUD converted to UTF-8, which were then echoed back correctly in the output window.


Posted by Atltais (8 posts)
Date Reply #59 on Sun 08 Jun 2008 10:22 PM (UTC)
Message
It looks like defining UNICODE would (technically) do it, but it'll cough up a whole slew of errors (about 5000 or so) when you attempt this.