The attached files are test.lsp, b.txt, and c.txt.
test.lsp:
(set 's (read-file "c.txt"))
(println (find-all {(?s)target=_blank>(?:(?!target=_blank>).)*?在线观看_百度视频} s ) )
(exit)
b.txt and c.txt are HTML source files.
E:newlisp>newlisp
newLISP v.10.4.7 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for options.
> (load "test.lsp")
newLISP then terminates abnormally.
After changing test.lsp to read b.txt instead:
(set 's (read-file "b.txt"))
E:newlisp>newlisp
newLISP v.10.4.7 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for options.
> (load "test.lsp")
("target=_blank>銆?em>鍟﹀暒鍟﹀痉鐜涜タ浜?/em>銆嬪姩婕紙2瀛e叏锛夐珮娓呭湪绾
胯鐪媉鐧惧害瑙嗛")
Now the match is returned correctly (the console shows it in the wrong encoding).
In a UTF-8 environment the string is:
("target=_blank>《啦啦啦德玛西亚》动漫(2季全)高清在线观看_百度视频")
Testing with v10.4.5 gives the same result.
D:newlisp>newlisp
newLISP v.10.4.5 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for more info.
> (load "test.lsp") ;;read c.txt
D:newlisp>newlisp
newLISP v.10.4.5 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for more info.
> (load "test.lsp") ;;read b.txt
("target=_blank>銆?em>鍟﹀暒鍟﹀痉鐜涜タ浜?/em>銆嬪姩婕紙2瀛e叏锛夐珮娓呭湪绾
胯鐪媉鐧惧害瑙嗛")
Can anyone help?
I also tried increasing the newLISP stack size, like this:
E:newlisp>newlisp -s 100000 test.lsp
E:newlisp>newlisp -s 1000000 test.lsp
but it does not seem to change anything.
This is a problem in the PCRE library routines. See also here:
http://stackoverflow.com/questions/3613121/regular-expression-crashes-apache-due-to-pcre-limitations-need-some-help-optimis
and here:
http://newlispfanclub.alh.net/forum/viewtopic.php?f=16&t=3724&p=18722&hilit=regex+crash#p18722
On OSX this causes a crash, which occurs in pcre_exec(). It seems to have to do with nesting of HTML blocks.
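One possible workaround, as a rough sketch only: since the crash happens inside pcre_exec(), the same spans can probably be collected without any regex by splitting on the plain-text marker. The helper name extract-spans and the sample string are made up for illustration and are not from the thread. The idea is that every chunk produced by splitting on "target=_blank>" is already free of that marker, so the (?!target=_blank>) lookahead is no longer needed.

```newlisp
;; Hypothetical helper (not from the thread): collect every span that starts
;; at marker, contains no further marker, and ends at the first suffix.
(define (extract-spans text marker suffix)
  (let (result '())
    ;; parse with a plain string break (no regex option) does not call PCRE;
    ;; rest drops the text that precedes the first marker
    (dolist (chunk (rest (parse text marker)))
      (let (pos (find suffix chunk))
        ;; find and length both count bytes here, so slice cuts consistently
        (if pos
            (push (append marker (slice chunk 0 (+ pos (length suffix))))
                  result -1))))
    result))

;; Made-up sample input standing in for the HTML in c.txt
(set 'sample "<a target=_blank>x</a><a target=_blank>Title在线观看_百度视频</a>")
(println (extract-spans sample "target=_blank>" "在线观看_百度视频"))
```

To try it on the real file, pass (read-file "c.txt") in place of sample. Whether this avoids the crash on c.txt has not been verified here; it only sidesteps the regex engine.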
Thanks Lutz.