Multiline support for regexes not working, 10.0.2 -> 10.0

Started by TedWalther, May 08, 2009, 06:50:23 PM


TedWalther

I did this:



(setq x "hee hee\nhi hi\nho ho ho")
(find-all "hee.*hi" (| 2 4 8 512))
=> nil


Long and short, I tried almost every PCRE option and every combination, but nothing seemed to allow the . to match a newline, not even the option (4) that says it allows . to match newline.



Perhaps it is related to another thing I stumbled on, (trim "\n     foo   ")
=> "\n     foo"
Is it supposed to work like that?



I am running OpenBSD,  Ubuntu, and Debian.  All are on 64bit AMD or Intel platforms with dual or better core.  All three show the problem.
Cavemen in bearskins invaded the ivory towers of Artificial Intelligence.  Nine months later, they left with a baby named newLISP.  The women of the ivory towers wept and wailed.  "Abomination!" they cried.

HPW

#1
Didn't you mean:

(setq x "hee hee\nhi hi\nho ho ho")
(find-all "hee.*hi" x $0 5)

("hee hee\nhi hi")


Maybe the doc is not clear:



find-all

syntax: (find-all str-pattern str-text [expr [int-option]])



I think the nested brackets in [expr [int-option]] mean that when int-option is needed, you have to provide expr as well.
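
A minimal sketch of the difference, assuming x holds the string from the first post with its "\n" escapes written out, and that 4 is the "dot matches newline" option mentioned there:

```newlisp
(setq x "hee hee\nhi hi\nho ho ho")

; Three arguments: the 4 lands in the expr slot, so no regex
; option is set and . still stops at the newline.
(find-all "hee.*hi" x 4)

; Four arguments: $0 fills the expr slot (return the whole match),
; and 4 is now taken as int-option.
(find-all "hee.*hi" x $0 4)
```

The second call should return the same single match as the option 5 call above, since the extra 1 in 5 only adds case-insensitivity.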
Hans-Peter

xytroxon

#2
trim only removes characters of one value (default = space) at a time...



I use replace with a regex that looks for whitespace (\s) and replaces it with "" (empty string)...



newLISP v.10.0.4 on Win32 IPv4, execute 'newlisp -h' for more info

> (set 'raw_str "  \n     foo\r\n\tbar    \n ")
"  \n     foo\r\n\tbar    \n "
> (replace "^\\s+|\\s*$" raw_str "" 0)
"foo\r\n\tbar"
>
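
If this comes up often, the replace call can be wrapped in a small helper. A sketch, where strip-ws is a made-up name (not a built-in) and the backslashes are doubled so that the regex engine receives \s:

```newlisp
; strip-ws is a hypothetical helper name, not part of newLISP.
; replace is destructive, but str is a local copy of the argument,
; so the caller's string is left untouched.
(define (strip-ws str)
  (replace "^\\s+|\\s+$" str "" 0))

(strip-ws "  \n     foo\r\n\tbar    \n ")
```

Interior whitespace, such as the \r\n\t between foo and bar, is left alone.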


-- xytroxon



P.S. Or use this form to select which whitespace characters to strip...

(replace "^[ \t\r\n]+|[ \t\r\n]+$" raw_str "" 0)
"Many computers can print only capital letters, so we shall not use lowercase letters."

-- Let's Talk Lisp (c) 1976

cormullion

#3
It's a common pattern in newLISP to have a function described in the documentation as, eg:


(func arg1 [arg2 [arg3]])

This can be called as:


(func arg1)

(func arg1 arg2)

(func arg1 arg2 arg3)


where arg2 is optional unless arg3 is needed. What you can't do is:


(func arg1 arg3)

because arg3 will be treated as if it was arg2.
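
The same thing can be seen with a user-defined function, where parameters that are not supplied are simply nil. A small sketch (show-args is a made-up name for illustration):

```newlisp
; show-args is a hypothetical function used only for illustration.
(define (show-args arg1 arg2 arg3)
  (list arg1 arg2 arg3))

(show-args 1)      ; => (1 nil nil)
(show-args 1 2 3)  ; => (1 2 3)
(show-args 1 3)    ; => (1 3 nil), the 3 is bound to arg2
```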



find-all is worth studying in detail - Jeff wrote a good article about it: http://www.artfulcode.net/articles/using-newlisps-find-all/



I don't think this nested bracket notation is described in the newLISP manual - it might be a useful addition one day. Although I think it's fairly standard...

TedWalther

#4

Thanks. I did try with and without the argument before the last one. Now that I try the example again with that missing argument, it sort of works. Now I'm wondering if PCRE has a "SUPERUNGREEDY" option, because with just UNGREEDY (512) it matches "hee hee\nhi" instead of "hee\nhi". But the fault is mine; I probably just need to get deeper knowledge of regexps.

Lutz

#5
Cormullion:



How about adding this (improvements welcome ;-) ):



"Arguments enclosed in brackets [ and ] are optional. When arguments are separated by a vertical bar | then one of them must be chosen."



in section "2. Data types and names" of the manual:



http://www.newlisp.org/newlisp_manual.html#type_ids



as a third paragraph.



TedWalther:



Many options in PCRE can also be expressed inside the regular expression pattern instead of as a number. So this:


(find "(?i)newlisp" "the newLISP language" 0)

is the same as this:


(find "newlisp" "the newLISP language" 1)

People accustomed to regex in other languages may find this easier to read.



These are other option letters:



i  for PCRE_CASELESS

m  for PCRE_MULTILINE

s  for PCRE_DOTALL

x  for PCRE_EXTENDED
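
Applied to the string from the first post, the s letter should give the same effect as option 4. A sketch, assuming find-all needs no option number when the flag is written inside the pattern:

```newlisp
(setq x "hee hee\nhi hi\nho ho ho")

; (?s) turns on PCRE_DOTALL inside the pattern itself.
(find-all "(?s)hee.*hi" x)

; Letters can be combined: (?is) is caseless plus dotall,
; which corresponds to option 5 (1 + 4).
(find-all "(?is)HEE.*HI" x)
```

Both calls should return the whole "hee hee\nhi hi" match as a one-element list.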



See also here for a complete reference:



http://www.newlisp.org/downloads/pcrepattern.html
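
On the UNGREEDY question: a regex always reports the leftmost match, and UNGREEDY only controls how far that match extends once it has started, so the first "hee" wins. One way around it is to make the pattern stricter rather than lazier. A sketch, assuming the goal is "hee" followed only by whitespace and then "hi":

```newlisp
(setq x "hee hee\nhi hi\nho ho ho")

; Curly braces keep the backslash in \s literal.
; \s+ allows only whitespace (including the newline) between
; "hee" and "hi", so the match cannot start at the first "hee".
(find-all {hee\s+hi} x)
```

This is only one possible pattern; what counts as the "right" match depends on what is allowed between the two words.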