newLISP Fan Club

Forum => newLISP in the real world => Topic started by: TedWalther on May 08, 2009, 06:50:23 PM

Title: Multiline support for regexes not working, 10.0.2 -> 10.0
Post by: TedWalther on May 08, 2009, 06:50:23 PM
I did this:



(setq x "hee hee\nhi hi\nho ho ho")
(find-all "hee.*hi" (| 2 4 8 512))
=> nil


Long and short, I tried almost every PCRE option and every combination, but nothing seemed to allow the . to match a newline, not even the option (4) that says it allows . to match newline.



Perhaps it is related to another thing I stumbled on: (trim "\n     foo   ")
=> "\n     foo"
Is it supposed to work like that?



I am running OpenBSD, Ubuntu, and Debian, all on 64-bit AMD or Intel platforms with two or more cores. All three show the problem.
Title:
Post by: HPW on May 09, 2009, 12:06:40 AM
Didn't you mean:

(setq x "hee hee\nhi hi\nho ho ho")
(find-all "hee.*hi" x $0 5)

("hee hee\nhi hi")


Maybe the doc is not clear:



find-all

syntax: (find-all str-pattern str-text [expr [int-option]])



It seems that [expr [int-option]] means that when int-option is needed, you have to provide both expr and int-option.
Title:
Post by: xytroxon on May 09, 2009, 01:15:15 AM
trim only removes characters of one value (default = space) at a time...



I use replace with a regex that looks for whitespace (\s) and replaces it with "" (empty string)...



newLISP v.10.0.4 on Win32 IPv4, execute 'newlisp -h' for more info

> (set 'raw_str "  \n     foo\r\n\tbar    \n ")
"  \n     foo\r\n\tbar    \n "
> (replace "^\\s+|\\s*$" raw_str "" 0)
"foo\r\n\tbar"
>


-- xytroxon



P.S. Or use this form to select which whitespace characters to strip...

(replace "^[ \t\r\n]+|[ \t\r\n]+$" raw_str "" 0)
Title:
Post by: cormullion on May 09, 2009, 02:17:42 AM
It's a common pattern in newLISP to have a function described in the documentation as, eg:


(func arg1 [arg2 [arg3]])

This can be called as:


(func arg1)

(func arg1 arg2)

(func arg1 arg2 arg3)


where arg2 is optional unless arg3 is needed. What you can't do is:


(func arg1 arg3)

because arg3 will be treated as if it was arg2.



find-all is worth studying in detail - Jeff wrote a good article about it: http://www.artfulcode.net/articles/using-newlisps-find-all/.



I don't think this nested bracket notation is described in the newLISP manual - it might be a useful addition one day. Although I think it's fairly standard...
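
To make the bracket notation concrete with find-all, whose documented syntax is (find-all str-pattern str-text [expr [int-option]]), here is a minimal sketch. The sample string and the name x are borrowed from the first post; the comments are my reading of the syntax line, not manual text.

```newlisp
; sample text from the first post
(setq x "hee hee\nhi hi\nho ho ho")

; pattern and text only: no options, so . does not match a newline
(find-all "hee.*hi" x)

; expr is supplied ($0, the whole match) so that int-option can follow it;
; 4 is PCRE_DOTALL, which lets . match a newline
(find-all "hee.*hi" x $0 4)

; not what it looks like: here 4 lands in the expr slot, not the option slot
(find-all "hee.*hi" x 4)
```

The last call is the (func arg1 arg3) mistake in disguise: the number is taken as expr, and no regex option is set at all.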
Title:
Post by: TedWalther on May 09, 2009, 07:02:46 AM
Thanks. I did try with and without the argument that precedes the final option. Now that I retry the example with that missing argument supplied, it sort of works. Now I'm wondering if PCRE has a "SUPERUNGREEDY" option, because with just UNGREEDY (512) it matches "hee hee\nhi" instead of "hee\nhi". But the fault is mine; I probably just need deeper knowledge of regexps.
Title:
Post by: Lutz on May 09, 2009, 07:15:41 AM
Cormullion:



How about adding this (improvements welcome ;-) ):



"Arguments enclosed in brackets [ and ] are optional. When arguments are separated by a vertical bar | then one of them must be chosen."



in section "2. Data types and names" of the manual:



http://www.newlisp.org/newlisp_manual.html#type_ids



as a third paragraph.



TedWalther:



Many options in PCRE can also be expressed inside the regular expression pattern instead of as a number. So this:


(find "(?i)newlisp" "the newLISP language" 0)

is the same as this:


(find "newlisp" "the newLISP language" 1)

People accustomed to regex in other languages may find this easier to read.



These are the option letters:



i  for PCRE_CASELESS

m  for PCRE_MULTILINE

s  for PCRE_DOTALL

x  for PCRE_EXTENDED
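
Applied to the string from the first post, a small sketch of the inline form next to its numeric equivalent (the name x comes from that post; take this as an illustration under those assumptions, not as manual text):

```newlisp
(setq x "hee hee\nhi hi\nho ho ho")

; (?s) switches on PCRE_DOTALL inside the pattern, so . also matches a newline
(find-all "(?s)hee.*hi" x)

; the numeric equivalent: 4 is PCRE_DOTALL, and $0 fills the expr slot before it
(find-all "hee.*hi" x $0 4)
```

Both calls should return ("hee hee\nhi hi"), the same result HPW got with option 5.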



See also here for a complete reference:



http://www.newlisp.org/downloads/pcrepattern.html
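
On the "SUPERUNGREEDY" question: as far as I know PCRE has no such option. The engine always reports the leftmost match, and UNGREEDY only shortens a match from the right, so the match still begins at the first "hee". One possible workaround, assuming the goal is the "hee" closest to "hi", is a negative lookahead that forbids another "hee" inside the match. This is only a sketch:

```newlisp
(setq x "hee hee\nhi hi\nho ho ho")

; (?s)            . matches newline too
; (?:(?!hee).)*?  as few characters as possible, none of which starts another "hee"
(find-all "(?s)hee(?:(?!hee).)*?hi" x)
```

With the sample string this should give ("hee\nhi"): the attempt starting at the first "hee" fails when it reaches the second one, so the match starts there instead.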