Gammon Forum

Hyperlink_URL2(SVD) -help?

It is now over 60 days since the last post. This thread is closed.


Posted by ThumpieBunnyEve (10 posts)
Date Mon 18 May 2009 07:48 PM (UTC)

Amended on Mon 18 May 2009 11:11 PM (UTC) by Nick Gammon

Message
Ok, I'm trying to work on a plug-in provided by Sketch, modified by Nick.

But I'm a noob at Lua.
Making logical sense of this plug-in is somewhat beyond me at this point. I'm scratching at the edges trying to make this work. I've made a little headway, but I really think it's just beyond me for now, not to mention it's a nightmare to even look at...

I need some help from higher up.

Here are my modifications. Yes, it's an XML file, but really it's just a standard trigger with a Lua script. (Having it be an XML plug-in appears to do little more than make it easier to disable, enable, or tweak, and it also provides room for comments on usage.) So don't be afraid of it if you've never beaten open a plug-in before.


Now before you scroll down, understand that I've got most of this working, except that the variable "cut", with relation to its OR evaluations, defaults to the first evaluation found; then, when it finds something matching the second evaluation, it never again searches for the first in the list of ORs. Actually... it's hard to explain. Just try out this plugin, and then in any world SAY the following line:


fission www.fusion.org starscape void ftp://folding.quantium.emersion singularity http://fantastic.voyage.exostential www.xor.com WWW.www.Wow interdimensional mailto://conceptual@therom.isometric http://www.worlds.density.wwww.interface inditerminate hypothetical wwww.water.earth.fire.air ftp://archive.entity.theroy WWWW.www.Wow www.womd.wrong


You'll note that it is searching backwards, and finds all the first www. matches. But as soon as it encounters an ftp:// match, it never again looks for a www. match (or any others between, like http://).

Here is the plug-in that I've spent the last 3 days trying to work with for the benefit of us all.

[Oh, and for the record, the MXP/Pueblo link parser thingie doesn't work for Tapestries Muck, so yes, I need to use this script.]

Here's the code.
PS: Nick Gammon, I tried to put your name in the author section but we are limited to 32 characters there. :p

Posted by ThumpieBunnyEve (10 posts)
Date Reply #1 on Mon 18 May 2009 07:48 PM (UTC)

Amended on Mon 18 May 2009 11:12 PM (UTC) by Nick Gammon

Message

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE muclient>
<!-- Saved on Saturday, 1 April 2006, 12:29 PM -->
<!-- MuClient version 3.73 -->
<muclient>
<plugin 
  name="Hyperlink_URL2(SVD)" 
  author="Sketch_and_ThumpieBunnyEve" 
  id="520bc4f29806f7af003d175d" 
  language="Lua" 
  purpose="Makes URLs on a line into hyperlinks." 
  date_written="2006-04-01" 
  date_modified="2009-05-18"
  requires="3.72" 
  version="2.0">

<description trim="y">

<![CDATA[
Hyperlink_URL2(SVD) (S.cripting V.ariable D.ependent)
Detects text starting with HTTP:xxx and makes that part into a hyperlink.
Limits: HTTP:// and the following character must be the same color

You will need to add 2 values to your  WorldProperties-Scripting-Variables  section.
All values indicated must be in Hexadecimal. 
If you are new to Hexadecimal color values, please visit the following link,
http://en.wikipedia.org/wiki/WebColor#HTML_color_names

I recommend the following values, for link color. 
name: urlbackcolour 
content: #000000
name: urltextcolour
content: #0080ff

]]>
</description>

</plugin>

<!--  Triggers  -->

<triggers>
  <trigger
    enabled="y"
    match="(.*)(((https?|mailto|ftp)://(?:[\w\d\.\?\/\%#@!&quot;&amp;_]+[/\w\d#]))|([wW]{3}+\.))(.*)$"
    omit_from_output="y"
    ignore_case="y"
    regexp="y"
    script="OnHyperlink"
    sequence="1"
  >
  </trigger>
</triggers>

<!--  Script  -->
<script>
<![CDATA[

function OnHyperlink (name, line, wildcards, styles)
 
  local hyperlinks = {}
  local newstyle = {}
  local i = 1
  local hyperlinkcount = 0
  local doingURL = 0
  
  while i <= table.getn(styles) do -- Doesn't use pairs() because of problems with field-injection.
    if doingURL == 0 then
     -- **** Not a URL **** --
      local vader = styles[i].text:lower()
      cut = vader:find("ftp://") or vader:find("http://") or vader:find("mailto://") or vader:find("[/w]-www\.")

      if cut == nil then -- If there's nothing to cut, copy the whole line
        table.insert(newstyle, {textcolour = styles[i].textcolour
                               ,backcolour = styles[i].backcolour
                               ,style = styles[i].style
                               ,text = styles[i].text})
      else
        table.insert(newstyle, {textcolour = styles[i].textcolour
                               ,backcolour = styles[i].backcolour
                               ,style = styles[i].style
                               ,text = string.sub(styles[i].text, 1, cut - 1)})
        table.insert(styles, i + 1, {textcolour = styles[i].textcolour
                                ,backcolour = styles[i].backcolour
                                ,style = styles[i].style
                                ,text = string.sub(styles[i].text, cut)})
        doingURL = 1
        hyperlinkcount = hyperlinkcount + 1
      end -- if

    else -- **** IS a URL **** --
      -- Search for a URL. If the string is completely a URL...
      -- Jump to the next table field. And keep doing such.
      cut, length, temp = string.find(styles[i].text, "^([%S]*[%w#/])")
      if cut ~= nil then
        if hyperlinks[hyperlinkcount] ~= nil then
          hyperlinks[hyperlinkcount] = hyperlinks[hyperlinkcount] .. temp
        else
          hyperlinks[hyperlinkcount] = temp
        end -- if
        table.insert(newstyle, {textcolour = styles[i].textcolour
                               ,backcolour = styles[i].backcolour
                               ,style = styles[i].style
                               ,text = string.sub(styles[i].text, 1, length)
                               ,hypernumber = hyperlinkcount})
        styles[i].text = string.sub(styles[i].text, length + 1)
        if styles[i].text ~= "" then
          i = i - 1    -- The first hyperlink was cut, so scan the same field for more.
          doingURL = 0
        else
          doingURL = 1
        end
      else 
        doingURL = 0
        i = i - 1
      end -- if (cut)
    end -- if (doingURL)
    i = i + 1
  end -- while
  
  for x, y in ipairs (newstyle) do -- x is the style number, y is the style-data table.
   NoteStyle (y.style)
   if y.hypernumber ~= nil then
     Hyperlink(hyperlinks[y.hypernumber], y.text, "Go to " .. hyperlinks[y.hypernumber]
               , GetPluginVariable ("", "urltextcolour"), GetPluginVariable ("", "urlbackcolour"), 1)
   else
     ColourTell (RGBColourToName(y.textcolour), RGBColourToName(y.backcolour), y.text)
   end
  end -- for
  Note ("") -- Insert a true newline at the end of the string.
  
end -- of hyperlink
]]>
</script>
</muclient>

Posted by Nick Gammon, Australia (21,322 posts), Forum Administrator
Date Reply #2 on Mon 18 May 2009 11:22 PM (UTC)

Amended on Mon 18 May 2009 11:24 PM (UTC) by Nick Gammon

Message
Hmmm, yes I see your problem.

This follows on from an earlier thread, doesn't it? You were trying to test for this OR that in a regexp. The solution offered was to use an or in conjunction with string.find, or use the rex library.

The "or" solution works in a way, but not the way you want. The "or" chain returns the result of the first string.find that succeeds, in the order the calls are written, not the match that occurs earliest in the line. So it finds the first instance of ftp:// (at column 39), thus skipping the earlier www. match.

You really need to use the rex library, which lets you build up a PCRE regular expression, so you can pass the this|that alternatives into a single regexp, thus stopping at the first instance of any of them.

The example code below shows the difference: the first one, using what you are doing, returns column 39; my rex example (which is shorter anyway) stops at column 9, which is the first match.



vader = "fission www.fusion.org starscape void ftp://folding.quantium.emersion singularity http://fantastic.voyage.exostential www.xor.com WWW.www.Wow interdimensional mailto://conceptual@therom.isometric http://www.worlds.density.wwww.interface inditerminate hypothetical wwww.water.earth.fire.air ftp://archive.entity.theroy WWWW.www.Wow www.womd.wrong"


cut = vader:find("ftp://") or vader:find("http://") or vader:find("mailto://") or vader:find("[/w]-www\.")

print (cut)  --> 39

-- "\\." hands PCRE a real backslash, so the dot after www is matched literally
cut = rex.new ("(ftp://)|(http://)|(mailto://)|(www\\.)"):match (vader)

print (cut)  --> 9
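
For anyone without the rex library to hand, the same "earliest match wins" idea can be approximated in plain Lua by trying every pattern and keeping the smallest start position. This is only a sketch: find_earliest and the prefixes list are names made up for illustration, and the patterns use Lua pattern syntax (%. for a literal dot) rather than PCRE.

```lua
-- Hypothetical helper (not part of the plug-in): try every pattern and
-- return the start, end and pattern of whichever match begins earliest.
local function find_earliest(text, patterns)
  local best_start, best_stop, best_pattern
  for _, pattern in ipairs(patterns) do
    local start, stop = text:find(pattern)
    if start and (best_start == nil or start < best_start) then
      best_start, best_stop, best_pattern = start, stop, pattern
    end
  end
  return best_start, best_stop, best_pattern
end

-- Assumed prefix list, written as Lua patterns
local prefixes = { "ftp://", "https?://", "mailto:", "www%." }

local vader = "fission www.fusion.org starscape void ftp://folding.quantium.emersion"
local start = find_earliest(vader, prefixes)
print(start)  --> 9
```

Unlike the "or" chain, the order of the entries in prefixes does not affect which match is returned, unless two patterns match at the same position.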


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon, Australia (21,322 posts), Forum Administrator
Date Reply #3 on Tue 19 May 2009 04:50 AM (UTC)
Message
Quote:

Nick Gammon, I tried to put your name in the author section but we are limited to 32 characters there. :p


Yes, ThumpieBunnyEve *is* a long name, isn't it? ;)

You could add "& NJG" to it.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by ThumpieBunnyEve (10 posts)
Date Reply #4 on Wed 20 May 2009 11:53 PM (UTC)
Message
Looks like it's working :}
Thank you Nick <3

Now before I wrap this up as a done deal:

Is there any way to capture the color of the text sent to the output window -after- the other triggers have done their checking?

[standard triggers]
I have all incoming pages using custom colour 16
I have all incoming page poses using custom colour 16

I have all incoming pages "Brenda" using custom colour 13
I have all incoming page poses "Brenda" using custom colour 12

I have all outgoing page using custom colour 4
I have all outgoing page poses using custom colour 4

all says from me using custom colour 1
all poses from me using custom colour 2

[plugin trigger]
Now when this plugin finds a link, it changes the output color of the line to the default color for incoming text, and it changes the color of the links to the urltextcolour and urlbackcolour values set under WorldOptions > Scripting > Variables.


That clearly cancels any changes made by [standard triggers].

How can I go about capturing the color set by [standard triggers] for use with this plugin?

Should I be trying to capture the [standard triggers] colour changes in something like scripting variables "msgtextcolour" and "msgbackcolour", and set the [plugin trigger]'s sequence to "1000"? (I presume a higher number means it gets processed later on.)

What do you think?
~tbe


Posted by ThumpieBunnyEve (10 posts)
Date Reply #5 on Thu 04 Jun 2009 03:14 PM (UTC)

Amended on Thu 04 Jun 2009 03:15 PM (UTC) by ThumpieBunnyEve

Message
Made a good bit of headway so far.
Still stuck on this line, however.

It causes MUSHclient to crash in a resource-hogging recursive loop of some sort or another:
cut = rex.new ("(ftp://)|(http://)|(https://)|(mailto://)|([^AaWw]+www+[.])"):match (vader)



Specifically, the line of evil is this:
([^AaWw]+www+[.])


I do -not- want it to pick up:
awww.
or
wwww.
or
awwwwwwwww.
or
wwwwwwwwwwwwwwww.

Only www.

I'm not sure how else to word it. What do you recommend?

Posted by David Haley, USA (3,881 posts)
Date Reply #6 on Thu 04 Jun 2009 07:12 PM (UTC)
Message
If you don't want it to pick up wwwwwww you shouldn't have a + after the w:

([^AaWw]+www+[.])

this means:

One or more of: anything except for A, a, W, w
followed by: ww
followed by: at least one w
followed by: a period


I would try something like:

[^A-Za-z0-9_](www[.])

which means:
a character that is not a letter, digit or underscore
followed by: www.

Note that you probably don't want to capture the first character; your earlier pattern was capturing it.

Of course this won't work for something like:

"www.foobar"
at the very start of a line, because there is no character at all in front of it, let alone a non-alphanumeric one...


what exactly are you trying to match here? I.e., do you have any sample lines you want to match?
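
One way to express "www. at the start of a word" in plain Lua is the frontier pattern %f[set], which matches at a transition into the set and also succeeds at the very start of the string. A rough sketch follows; find_www is a hypothetical helper, and note that %f works in Lua 5.1 but is only documented from 5.2 onwards.

```lua
-- Hypothetical helper: find "www." only where it begins a word.
local function find_www(text, init)
  -- %f[%w_] succeeds only where the previous character is NOT a letter,
  -- digit or underscore (or where there is no previous character at all).
  return text:lower():find("%f[%w_]www%.", init or 1)
end

print((find_www("a www.cute.com")))  --> 3
print((find_www("www.foobar")))      --> 1
print(find_www("awww.Kute!"))        --> nil
print(find_www("wwww.water"))        --> nil
```

Because the frontier consumes no characters, the match starts at the first w, so nothing in front of the link gets captured.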

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by ThumpieBunnyEve (10 posts)
Date Reply #7 on Thu 04 Jun 2009 07:47 PM (UTC)
Message
I'm trying to match more forms of hyperlinks. (URLs)

Originally with the prior plug-ins provided, you could only match http:// and https:// and mailto:

That left out links like:

tinyurl.com/13451

or
ftp://555.911.800.42/apachejunk/mork.jpeg


It left out

www.disney.com


And

fishing.net


I'm working to incorporate such things.

knowing the difference between
awww.Kute!

and
a www.cute.com

is kinda important.

also I'd like it to -not- highlight the whole line and use it if it encounters something like
wwwww.malformed.url

instead only highlighting the first 3 w's
such as
wwwww.malformed.url


wWw.WoW.eq/pizza


should work also.

I can post the entire plug in code if you like. It works with 4 script variables and at least one necessary trigger in mush-client.

I've redesigned it to forward color changes made to text by triggers, to the plugin, for use with found links, thus preserving the colors those triggers made the text into.

Posted by David Haley, USA (3,881 posts)
Date Reply #8 on Fri 05 Jun 2009 05:08 PM (UTC)
Message
Just to be clear, you don't want it to match "www.kute" here:

awww.Kute!

but you do want it to match "www.malformed.url" here:

wwwww.malformed.url

Is the fact that it's an a vs. a w important here? I don't really understand why the second is an acceptable match (a URL in the middle of a word) but the first isn't.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by ThumpieBunnyEve (10 posts)
Date Reply #9 on Sat 06 Jun 2009 07:12 PM (UTC)

Amended on Mon 08 Jun 2009 08:57 PM (UTC) by Nick Gammon

Message
OK, maybe I wasn't expressing myself well enough. I don't have a college education in the terminology required to easily and concisely express my problem. I do, however, have the reasoning power to figure out how to solve the problem myself, without colorful terminology. And here are the fruits of that labor.

So here is the code line for detecting URLs in a cut of text so far. (Note this is not the same as the trigger's match string. It only runs on a sub-string after the trigger has fired this script.)

cut = rex.new ("(ftp://[a-zA-Z0-9-_]+)|(http://[a-zA-Z0-9-_]+)|(https://[a-zA-Z0-9-_]+)|(mailto:[a-zA-Z0-9-_]+[@])|(www[.]([a-zA-Z0-9-_]+)[.])|([0-9]{1,3}[.]+[0-9]{1,3}[.]+[0-9]{1,3}[.]+[0-9]{1,3})|([a-zA-Z0-9][a-zA-Z0-9-_]+[.]+[a-zA-Z0-9-_][a-zA-Z0-9-_]+[.]+[a-zA-Z][a-zA-Z]+[:]+[0-9]*?[^ ])|(tinyurl.com/)"):match (vader)

It's working pretty well so far. I should note that after the start of a URL is detected, the whole string is passed on to a function that detects the end of the URL in the string (from the starting location found by cut). Said function goes from the start of cut to the first instance of a space or any character in the following string: "'\| It then dumps that data to the hyperlink generator.
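
The end-of-URL step described above might look roughly like this in plain Lua. The function name extract_url is made up for illustration and is not the plug-in's actual code; the stop characters are the space plus the four listed above (double quote, single quote, backslash, pipe).

```lua
-- Hypothetical sketch: given the start position found by "cut", return the
-- text up to (not including) the first space, ", ', \ or | character.
local function extract_url(text, cut)
  local stop = text:find("[ \"'\\|]", cut)
  if stop then
    return text:sub(cut, stop - 1)
  end
  return text:sub(cut)  -- no terminator found: the URL runs to end of line
end

print(extract_url('say "www.wet.net" now', 6))  --> www.wet.net
```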

Right now i'm activating the trigger with this testing line.
posted in a world:

p <SELF>=: nom nom, http://www.wichway.gov/bailout/flop.avi awwww. tinyurl.com/xw68xw mailto://James@Tiberius.Kirk "awwww." 111.222.333.444 www.wet.net 111.222.333.444:5555 awww.oneway.org 1.1.1.1 mailto:luke@leia.han wwww.whathappened.edu 1.2.3.4:5 mailto:the.dark.side wwwwww.nom http://zoool www.what-is-up.org tinyurl/12345 mailto://bill@nhy.edu wwwwa.tter.plz zod.selfip.com:8080 10.80.222.44:203012 www.selfip.com:77 www.n0_ip.com 1111.2222.3333.4444

Am I missing anything? Should I be including more in my URL and hyperlink detection?

Also, Nick Gammon, I'm getting the following errors:
popup: MUSHclient
Hyperlink action "www.n0_ip.com" - permission denied.
Hyperlink action "1.1.1.1" - permission denied.
Hyperlink action "www.oneway.org" - permission denied.
Hyperlink action "10.80.222.44:203012" - permission denied.
Hyperlink action "tinyurl/12345" - permission denied.


I think I have an idea why.
http:// may not be explicitly prepended to the strings when they are passed as hyperlinks. I'll work on that later, after I'm sure I'm detecting links properly.

David Haley, thank you for your effort to assist me thus far.
I hope for continued replies.
~tbe

Posted by David Haley, USA (3,881 posts)
Date Reply #10 on Mon 08 Jun 2009 03:39 PM (UTC)
Message
I have to admit that when I see that regex, my mind kind of boggles. ;) So I'm not sure if you've covered all possible cases. The problem with wanting to accept things that you know are hyperlinks but are really missing parts is that you can easily end up with lots of noise. I'm thinking of things like this: tinyurl.com/xw68xw. And actually I note that you have a special case in the regex precisely for tinyurl.com.

Before being able to talk about regular expressions, I would need to know what exactly you're trying to match and what you consider valid URLs to be. You seem to want to apply a form of "noisy" matching, where you accept lots of things that you "know" are URLs but are really malformed URLs.

If I wanted to have "noisy" matching, I would do something like this:

If I see a protocol identifier that I know, just use standard well-formed URL definitions until the end. That is, if I see e.g. http://, just assume that everything until the next non-URL character is part of the URL.

If I see "www." starting a word, I would make sure that I have a string of the form "www.XXX.YYY", where XXX and YYY are sequences of letters, numbers, dashes, etc. If you wanted to reduce noise (but throw out things that are in fact legitimate URLs according to the standard) you could test YYY against a known list of domain suffixes.

If I didn't see a www. in front, life gets pretty complicated, but basically I would do what I just described but definitely check the YYY against a list of suffixes, because this category of matches is likely to be very messy.

---

One thing I try to do is to test things separately to the extent that they can be tested separately, because monster expressions are very hard to understand, modify, test, share, etc. To really understand what your expression does would take me quite a lot of time, but I think you could simplify it by breaking it into pieces.
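
As a rough illustration of splitting the checks into small, separately testable pieces, here is a plain-Lua sketch of the three cases described above. Every name in it (KNOWN_SUFFIXES, has_known_protocol, is_www_host, has_known_suffix, looks_like_url) is hypothetical, the suffix list is a tiny made-up sample, and it classifies a single whitespace-delimited word rather than scanning a whole line.

```lua
-- Assumed sample of domain suffixes; a real list would be much longer.
local KNOWN_SUFFIXES = { com = true, org = true, net = true, edu = true, gov = true }

-- Case 1: a protocol we recognise, so trust the rest of the word.
local function has_known_protocol(word)
  return word:find("^https?://") ~= nil
      or word:find("^ftp://") ~= nil
      or word:find("^mailto:") ~= nil
end

-- Case 2: "www.XXX.YYY" at the start of the host part.
local function is_www_host(host)
  return host:find("^www%.[%w_-]+%.[%w_-]+") ~= nil
end

-- Case 3: no protocol and no www., so insist on a known suffix.
local function has_known_suffix(host)
  local suffix = host:match("^[%w_][%w_.-]*%.(%a+)$")
  return suffix ~= nil and KNOWN_SUFFIXES[suffix] == true
end

local function looks_like_url(word)
  local lowered = word:lower()
  if has_known_protocol(lowered) then return true end
  local host = lowered:match("^[^/:]+")  -- strip any path or port
  if host == nil then return false end
  return is_www_host(host) or has_known_suffix(host)
end

print(looks_like_url("tinyurl.com/xw68xw"))  --> true
print(looks_like_url("awww.Kute!"))          --> false
```

Each helper can be exercised on its own, which makes it easier to see which rule accepted or rejected a given word.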

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Gammon, Australia (21,322 posts), Forum Administrator
Date Reply #11 on Mon 08 Jun 2009 09:05 PM (UTC)
Message
I edited your post, ThumpieBunnyEve, to change the code tag to the mono tag, so the lines wrapped a bit better.

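I am inclined to agree that the regexp is a bit boggling, but if it works, well and good. One thing you might consider changing is the frequent use of
[a-zA-Z0-9-_]
to [\w-] (\w covers letters, digits and the underscore but not the hyphen, so the hyphen still has to be listed). Inside a Lua string literal the backslash needs to be doubled. From the regular expression documentation: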


Generic character types

Another use of backslash is for specifying generic character types. The
following are always recognized:

         \d     any decimal digit
         \D     any character that is not a decimal digit
         \s     any whitespace character
         \S     any character that is not a whitespace character
         \w     any "word" character

...

 A "word" character is an underscore or any character less than 256 that
is a letter or digit.


Similarly you might use \d instead of [0-9].

Quote:

Also, Nick Gammon, I'm getting the following errors:
popup: MUSHclient
Hyperlink action "www.n0_ip.com" - permission denied.


In the help for Hyperlink it says "If true, the action must start with "http://", "https://", or "mailto:" and if so, it is sent to the web browser.".

So to avoid those errors you must make sure the string starts with "http://" if it doesn't already. A simple string prefix test would ensure that.
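
A minimal sketch of such a prefix test, in plain Lua. The name ensure_scheme is made up for illustration. Strings that already carry some other scheme (ftp:// for example) are returned untouched here, which is an assumption on my part; going by the help text quoted above they would still be refused when sent to the browser.

```lua
-- Hypothetical helper: make sure a hyperlink action starts with a scheme
-- before it is handed to Hyperlink.
local function ensure_scheme(action)
  local lowered = action:lower()
  if lowered:find("^https?://") or lowered:find("^mailto:") then
    return action               -- already in an accepted form
  end
  if lowered:find("^%a[%w+.-]*://") then
    return action               -- some other scheme: leave it alone
  end
  return "http://" .. action    -- bare host such as www.n0_ip.com
end

print(ensure_scheme("www.n0_ip.com"))  --> http://www.n0_ip.com
```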

- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).
