Title: a standard way to parse the newlisp manual
Post by: Tim Johnson on October 30, 2006, 05:13:03 PM

Hi:

Lutz and I have been having a conversation off the forum, and we've decided that it is worth sharing.

FYI: I have written an application in elisp to create a newlisp syntax mode for emacs. Part of this project was to provide on-demand documentation from a keyword context in three different forms:

1) Verbose documentation in a popup window.
2) Verbose documentation in a temporary buffer.
3) A one-line description of the interface in the echo area.

To do this I had to make some modifications to newlisp_manual.html.

Here is what I wrote to Lutz:

"""

Lutz said:

> Regarding the doc strings: it should not be very difficult to write a script
> for extracting them from newlisp_manual.html, which strictly uses a limited
> HTML subset. You also could just use 'lynx -dump newlisp_manual.html' to
> generate a pure text file, which then is even easier to parse.

I said:

Actually, I've used that method. But I couldn't really find a pattern to use as a delimiter, so I manually inserted some markup: specifically, a pseudo-tag <break>.

Do you have any ideas for a search pattern?

I think that, given some time (:-) and guidance), I could come up with a method that generates a file usable as an intermediary for any number of editors, certainly emacs and vim. For emacs, a text file parsed by emacs itself would get around some of the problems with escaping.

Alternatively, here's an approach that would use the website and the documentation directly:

Rebol has a feature called load/markup that auto-parses a document or a string into an array of alternating tag and string datatypes. All it takes is:

load/markup http://www.newlisp.org/downloads/newlisp_manual.html

from the command line.

Anything like that in newlisp?

Maybe you would like to take this discussion to the forum?

:-) I've got the time.

"""
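A rough newLISP counterpart to load/markup can be sketched with `find-all` and a two-branch regex: one branch matches a tag, the other a run of text between tags. The names `markup-tokens`, `tag?` and `load-markup` are hypothetical, and the sketch assumes well-behaved HTML with no comments, CDATA sections, or literal `<` characters in text.

```newlisp
;; Sketch: split an HTML string into tag and text pieces, in document
;; order, loosely imitating Rebol's load/markup (not a newLISP built-in).

(define (markup-tokens html)
  ;; <[^>]*> matches one tag; [^<]+ matches a run of text between tags
  (find-all {<[^>]*>|[^<]+} html))

(define (tag? tok)
  (starts-with tok "<"))

;; label each piece so tags and text can be told apart later
(define (load-markup html)
  (map (fn (tok) (list (if (tag? tok) 'tag 'text) tok))
       (markup-tokens html)))

;; (load-markup "<h2><span>atan2</span></h2>")
;; returns ((tag "<h2>") (tag "<span>") (text "atan2") (tag "</span>") (tag "</h2>"))
```

To work on the online manual directly, the input string could come from `(get-url "http://www.newlisp.org/downloads/newlisp_manual.html")`.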

Lutz replied again:

"""

Finding some kind of standard way to parse newlisp_manual.html into pieces sounds like a wonderful idea. This idea should definitely be brought to the discussion group.

A short newLISP script using regular expressions should do the job. As you mentioned, perhaps some additions/changes to the manual will make it easier still.

Mention this idea on the discussion group. There are several people experimenting with newLISP development environments based on emacs, vi, gtk, etc. They all could benefit from a method to quickly extract relevant help from the manual.

"""

I'll add a couple of other thoughts:

1) There is one situation where keyword documentation is combined: the arithmetic operators.
2) One should be thinking about ways to include documentation for user libraries, third-party contributions, etc.

I'm sure there are many other ideas.

thanks

tim
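One way to handle a combined entry such as the arithmetic operators is to register the same section under every name its heading lists. This sketch assumes the heading text is a comma-separated list like `+, -, *, / ,%`; `heading-names`, `register-section` and the placeholder doc strings are hypothetical, not taken from the manual.

```newlisp
;; Sketch: give every name in a combined heading (such as the arithmetic
;; operators) its own entry pointing at the same documentation text.
;; Assumes headings list names separated by commas, e.g. "+, -, *, / ,%".

(define (heading-names heading)
  ;; split on commas, strip surrounding spaces, drop empty pieces
  (filter (fn (s) (!= s ""))
          (map trim (parse heading ","))))

(define (register-section table heading doc)
  ;; append one (name doc) pair per name listed in the heading
  (append table (map (fn (name) (list name doc)) (heading-names heading))))

;; hypothetical placeholder doc strings, not text from the manual
(set 'docs '())
(set 'docs (register-section docs "+, -, *, / ,%" "<shared arithmetic section>"))
(set 'docs (register-section docs "atan2" "<atan2 section>"))

;; (lookup "*" docs) now returns "<shared arithmetic section>"
```

Because the table is just an association list, entries for user libraries or third-party modules could be added with the same `register-section` call.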

Title:
Post by: cormullion on October 31, 2006, 12:30:47 AM

Sounds good. I've had some skirmishes with this myself. For the TextMate bundle I tinkered with, I wrote something like this:

```newlisp
; load the whole manual. Gulp.
(set 'file (read-file "/usr/share/newlisp/doc/newlisp_manual.html"))

; we're looking for the selected text
(set 'func-name "atan2")

; find the matching bit with regex; option 4 lets .*? run across newlines,
; and \Q...\E keeps names such as + or * from being read as regex syntax
(set 'doc-section
  (find (string {(<h2><span>)(\Q} func-name {\E)(</span></h2>)(.*?)(<h2><span>)}) file 4))

; found it, output it to Show as HTML ($1..$4 are the first four groups)
(if doc-section
   (println $1 $2 $3 $4)
   (println "couldn't find it"))
(exit)
```

TextMate has a nice HTML window available for online documents, so there's no need to strip out the markup.

It's obviously a sledgehammer approach. I'm looking forward to seeing the scalpel version.


Title:
Post by: Tim Johnson on October 31, 2006, 09:07:05 AM

<GRIN> That easy, huh?

I don't grok it all, but you have a pattern to parse on, right?

Title:
Post by: cormullion on October 31, 2006, 09:38:17 AM

It was looking for five stretches of text:

1) <h2><span>
2) atan2
3) </span></h2>
4) .*?
5) <h2><span>

and giving you back the first four. The fifth pattern is the start of the next function's entry, so it isn't wanted.

In fact, looking at the manual again, the source text is different now: there's a span class="function" to cater for.

I dunno, I'm no regex wiz...
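To cater for the `class="function"` attribute, the `<span>` match can simply accept any attributes. This sketch (the name `find-doc-section` is hypothetical) assumes headings look like `<h2><span class="function">name</span></h2>`. It quotes the name with PCRE's `\Q...\E` so operators match literally, and it ends the body with a lookahead, so the next heading is not consumed and the last entry in the file, which has no following heading, is still found.

```newlisp
;; Sketch: return the HTML body of one manual entry, or nil if absent.
;; Assumes headings look like <h2><span class="function">name</span></h2>;
;; [^>]* accepts any attributes on the <span>.
(define (find-doc-section html func-name)
  ;; \Q...\E makes the name literal, so "+" or "*" don't act as regex operators;
  ;; the lookahead stops at the next heading, or at the end of the text (\z)
  (if (regex (string {<h2><span[^>]*>\Q} func-name
                     {\E</span></h2>(.*?)(?=<h2><span|\z)})
             html 4)   ; option 4 (PCRE_DOTALL): "." also matches newlines
      $1))
```

Usage would be along the lines of `(find-doc-section (read-file "/usr/share/newlisp/doc/newlisp_manual.html") "atan2")`.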

Title:
Post by: Tim Johnson on October 31, 2006, 11:44:21 AM

Quote from: cormullion
> In fact, looking at the manual again, the source text is different now, there's a span class="function" to cater for now.

I'm seeing this pattern consistently:

```html
<a></a>
<h2><span>*function-name-here*</span></h2>
```

NOTE: This forum is obfuscating the anchor name and span class attributes, but I see a usable pattern emerging.

Quote:
> I dunno, I'm no regex wiz...

:-) Me neither, but count yer blessings: regexes are more of a headache in elisp.
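Building on the anchor-plus-heading pattern above, the whole manual could be turned into an association list in one pass. This is a sketch under assumptions: each entry is taken to begin with `<a name="..."></a>`, optional whitespace, then an `<h2><span ...>` heading; the anchor serves only as a delimiter, and each entry is keyed by its heading text. `index-sections` is a hypothetical name.

```newlisp
;; Sketch: index every entry as (heading-text body-html).
;; Assumes each entry starts with <a name="..."></a>, optional whitespace,
;; then an <h2><span ...>heading</span></h2> line.
(define (index-sections html)
  (find-all (string {<a name="[^"]+"></a>\s*<h2><span[^>]*>(.*?)</span></h2>}
                    {(.*?)(?=<a name="[^"]+"></a>\s*<h2>|\z)})
            html
            (list $1 $2)   ; evaluated per match: $1 = heading, $2 = body
            4))            ; option 4 (PCRE_DOTALL): "." also matches newlines
```

A lookup would then be `(lookup "atan2" (index-sections (read-file "/usr/share/newlisp/doc/newlisp_manual.html")))`. Combined headings such as the arithmetic operators come back as a single key and would still need splitting.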

Title:
Post by: Tim Johnson on November 01, 2006, 04:48:35 PM

See http://www.johnsons-web.com/demo/newlisp/parse-nl-docs.r.txt

The following labels are the operational components:

char-entities: ;; data structure
clean: ;; subroutine
parse-all: ;; subroutine

It's kind of quick-and-dirty, but I tried to write it in a way that a newlisper could easily follow, and I provided some explanatory comments.

If one can follow the logic:

1) I'd appreciate Lutz evaluating the accuracy of the logic.
2) It should be easy to write a newlisp script to accomplish the same.

See http://www.johnsons-web.com/demo/newlisp/newlisp-docs.txt for the output.
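As a starting point for a newLISP version, here is one guess at what the `char-entities` and `clean` steps might look like. It is not a port of the Rebol script, whose logic isn't reproduced in this thread; it only strips tags and decodes a handful of common entities, and the entity list is an assumption.

```newlisp
;; Sketch of a "clean" step: strip tags, then decode a few entities.
;; char-entities borrows a label from the Rebol script, but this entry
;; list is an assumption, not the script's actual table.
(set 'char-entities
  '(("&lt;" "<") ("&gt;" ">") ("&quot;" "\"") ("&nbsp;" " ") ("&amp;" "&")))

(define (clean html)
  (let (out html)
    ;; remove every tag (regex replace, option 0)
    (replace {<[^>]*>} out "" 0)
    ;; plain string replacements; &amp; is decoded last so that
    ;; "&amp;lt;" becomes the literal text "&lt;" rather than "<"
    (dolist (pair char-entities)
      (replace (first pair) out (last pair)))
    out))
```

A parse-all step could then map `clean` over the bodies of a section index to get plain-text help for an editor's echo area or temporary buffer.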

newLISP Fan Club

Forum => newLISP newS => Topic started by: Tim Johnson on October 30, 2006, 05:13:03 PM