Find-all + Replace = Crash

Started by Fritz, October 14, 2009, 07:16:27 AM

Fritz

I ran into some problems while trying to create an (html-parse) function. Here are two functions that are supposed to do the same thing: both should extract the data from "td" tags.


(define (sacar-td linea)
  (set 'alveolos (find-all "(<td)(.*?)(</td>)" linea $0 1))
  (map (fn (x) (replace "</?td(.*?)>" x "" 1)) alveolos))

(define (crash-td linea)
    (find-all "(<td)(.*?)(</td>)" linea (replace "</?td(.*?)>" $0 "" 1) 1))

(set 'testrow "<tr><td class='kin'>Alpha</td><td>Gamma</td></tr>")


The longer one, "(sacar-td testrow)", works fine. The shorter one, "(crash-td testrow)", crashes the shell:



> (sacar-td testrow)
("Alpha" "Gamma")

> (crash-td testrow)
*** glibc detected *** /usr/bin/newlisp: double free or corruption (fasttop): 0x080cc808 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7e31a85]
/lib/tls/i686/cmov/libc.so.6(cfree+0x90)[0xb7e354f0]
...



That is the first strange thing.

Second problem: my regexps work only if I replace every "\n" with " " before searching.



Update: I accidentally found the solution to the second problem; it is the "(?s)" option. Now my "parse-html" function works:


; Usage (parse-html (get-url "http://www.newlisp.org/downloads/newlisp_manual.html"))
(define (parse-html texto)
  (map sacar-table (find-all "(?s)(<table)(.*?)(</table>)" texto $0 1)))

(define (sacar-td linea)
  (set 'alveolos (find-all "(<t[dh])(.*?)(</t[dh]>)" linea $0 1))
  (map (fn (x) (replace "</?t[dh](.*?)>" x "" 1)) alveolos))

(define (sacar-table linea)
  (map sacar-td (find-all "(?s)(<tr)(.*?)(</tr>)" linea $0 1)))
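
A minimal sketch of what "(?s)" changes, assuming a hypothetical cell string whose content spans two lines. Without the option the dot does not match a newline, so a pattern that has to cross a line break finds nothing:

```newlisp
; hypothetical sample: a cell whose content spans two lines
(set 'cell "<td>Alpha\nBeta</td>")

; without (?s) the dot stops at "\n", so the pattern cannot reach </td>
(find-all "<td>(.*?)</td>" cell $1)

; with (?s) the dot also matches "\n", so the cell content is captured
(find-all "(?s)<td>(.*?)</td>" cell $1)
```

This is why replacing the newlines with spaces beforehand also appeared to fix the search.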

Lutz

#1
Do it this way:



(define (crash-td linea)
    (find-all "(<td)(.*?)(</td>)" linea (replace "</?td(.*?)>" (copy $0) "" 1) 1))


> (set 'testrow "<tr><td>Alpha</td><td>Gamma</td></tr>")

> (crash-td testrow)
("Alpha" "Gamma")
>


Replace is trying to make a replacement in $0 while at the same time copying to it the piece to replace. This will throw a protection error in the future.



(define (mangle str)
    (replace "</td>" str "" 1))

(define (crash-td linea)
    (find-all "(<td)(.*?)(</td>)" linea (mangle str) 1))




Use (copy $0) or (copy $it).
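
As an alternative sketch (not something proposed in this thread), a capture group can pick out the cell text directly, so $0 is never modified and no copy is needed. The function name sacar-td-grupo and the pattern are assumptions made for illustration:

```newlisp
; sketch: capture the cell text in group 1 instead of editing $0
; (sacar-td-grupo is a hypothetical name)
(define (sacar-td-grupo linea)
    (find-all "<td[^>]*>(.*?)</td>" linea $1 1))

(sacar-td-grupo "<tr><td class='kin'>Alpha</td><td>Gamma</td></tr>")
; should return ("Alpha" "Gamma")
```

The `[^>]*` part skips any attributes in the opening tag, and option 1 keeps the search case-insensitive as in the original functions.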

cormullion

#2
Quote from: "Lutz"
In a future version $0 only the anaphoric system variable $it will contain the found piece. Trying to change $it will cause a protection error. You would then use (copy $it). Today both $0 and $it contain the found piece.


Not sure what this means? Are you proposing to change the operation of $0 in replace?

Lutz

#3
Sorry, I mistyped; it is now corrected.



Nothing will change for 'replace' or 'find' and all other functions using regular expressions.



Currently in 'set-ref' and 'set-ref-all', both $0 and $it are set to the found item. For the next version I only mention the usage of $it for these functions and took the usage of $0 out of the documentation. They will still work, but usage of $0 for 'set-ref' and 'set-ref-all' is deprecated and will be removed in 10.2 or 10.3, sometime in 2010 or 2011. This will be mentioned in the deprecation chapter (2) of the manual.



When doing 'replace' on $0 this can cause a crash and will be flagged with a protection error in the future.



In other words, in the future the usage of $0 to $15 will be limited to regular expression searches; all other situations will use the anaphoric $it.
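
A small sketch of the anaphoric $it in 'set-ref-all', using a hypothetical list of numbers. While the replacement expression is evaluated, $it refers to the element that was found:

```newlisp
; hypothetical data, for illustration only
(set 'nums '(1 2 3 2))

; $it holds each found element while the replacement expression is evaluated
(set-ref-all 2 nums (* $it 10))
```

The intent here is that every 2 in the list is replaced by 20; the same expression written with $0 is the usage being deprecated.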



There is one other usage of $0, as a count in 'replace' and 'read-expr', and I haven't decided yet whether this is good or not. Perhaps a more descriptive $count should be introduced?

Fritz

#4
Thank you, now it is a bit shorter:


(define (parse-html texto)
  (map (fn (x) (map (fn (y)
    (find-all "(?si)(<t[dh])(.*?)(</t[dh]>)" y
      ; the trailing 0 makes replace treat the pattern as a regex
      (replace "(?si)</?t[dh](.*?)>" (copy $it) "" 0)))
    (find-all "(?si)(<tr)(.*?)(</tr>)" x)))
  (find-all "(?si)(<table)(.*?)(</table>)" texto)))