Literate programming renewed - and newlisp?

Started by unixtechie, December 01, 2009, 01:31:15 AM

unixtechie

I first posted this as a reply to an earlier post by Kazimir about some newlisp project on GitHub, but let me repost the question here.



So, regarding "projects in newlisp on Github":

Could you please look at http://github.com/unixtechie/Literate-Molly/

This is a Perl script for literate programming. It extends the technique with HTML and JavaScript "folding", which helps manage the programmer's attention span and scales the previously unscalable flat literate source files.



It can be recoded in newlisp to become a totally standalone tool: turned into a newlisp pseudo-executable, it would not rely on "system installations" of any other tools, nor on a web server.



Give me your opinion, Kazimir and other readers, do you think it's worth it?



Read the "MOLLY.html" file first; it is both an example of the weaver output and a sort of documentation of the script and its usage.

cormullion

#1
Hey, that's an impressive piece of work! I'm going to play with it this week when I get time.



Should I already know what 'tangling' is? I can't find a definition of it anywhere... Or is it obvious if one's a literate programmer?

unixtechie

#2
Yes, the terms "weaving" (producing a formatted document from the Literate Source file) and "tangling" (producing the source code to be run, if it is a scripting language, or compiled) are standard in Literate Programming.



If you'd like an overview, read the Wikipedia article (which, incidentally, was written by me, too ;))))) ) at

http://en.wikipedia.org/wiki/Literate_programming



Basically, "L.P." is "writing programs in pseudocode" with phrases in a human language, which stand for other phrases and/or machine code.

Literate programming tools allow you to write the source and your thoughts in _arbitrary order_ (not the one imposed on you by the machine).



But basically, look at MOLLY.html as an example of a "woven" formatted document, and then try to produce both the documentation (MOLLY.html) and the code for the module (MOLLY.pl) yourself by running the tool on the Literate Source file (MOLLY.weave in the distribution). Do not clobber the existing MOLLY.pl, though; give the output a different name.



The sample file "MOLLY.weave" is written in "dot-html", which I'd discourage. Write your LitSource in regular HTML, which is the default now. Use the project template as your guide.



The Molly tool is not programming-specific, though, and the approach can be used for any project, even writing literature, if you wish ;).



-------------------



Basically, re-coding this in newlisp will make a truly standalone pseudo-executable with an embedded web server, which can be generated for any major platform.



So, yes, please look at it and tell me if in your opinion the tool is worth it.

Kazimir Majorinc

#3
Molly.html looks better than I expected. I'd like my code to look that way. My concern is the idea of LP itself: is it natural to write programs of significant size and their documentation at the same time? My instinct would be code first, documentation later, if needed. I don't know, I have never tried it. Unixtechie, do you feel it is a good way of programming? I guess you do, but I must ask about your subjective experience.
WWW site: http://kazimirmajorinc.com/; blog: http://kazimirmajorinc.blogspot.com

unixtechie

#4
Kazimir, it is not "documentation".



Let me give you another definition of "literate programming":



1. It is writing code from specifications in a human language.

It's not documentation that you produce. You begin programming by thinking of what a piece of code should do, then writing it down:

loop over lines of a target file

filter out non-code sections

process the code



Now, you continue to think in "human":

Mm... all I have to distinguish are headings, ends, bodies and references (I am talking about the program that implements the literate programming parsing itself, of course):



<<process the code>>=

<<process headings>>

<<...>>

...

@
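To make the chunk notation above concrete, here is a minimal tangler sketch in Python. It is not Molly's actual Perl code; the names `parse_chunks` and `expand` are hypothetical, and the sketch assumes noweb-style syntax: `<<name>>=` opens a chunk, a line starting with `@` closes it, and a reference sits on a line of its own.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # start of a chunk definition
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # a reference on its own line


def parse_chunks(lines):
    """Collect chunk bodies by name; everything outside a chunk is prose."""
    chunks = {}
    current = None
    for line in lines:
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            # Repeated definitions of one name are concatenated.
            chunks.setdefault(current, [])
        elif current is not None and (line == "@" or line.startswith("@ ")):
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(name, chunks, indent="", stack=()):
    """Recursively replace references with chunk bodies, keeping indentation."""
    if name in stack:
        raise ValueError(f"circular chunk reference: {name}")
    if name not in chunks:
        raise KeyError(f"undefined chunk: {name}")
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            out.extend(expand(ref.group(2), chunks,
                              indent + ref.group(1), stack + (name,)))
        else:
            out.append(indent + line)
    return out


SOURCE = """\
Notes about the main loop go here.
<<main>>=
for line in lines:
    <<process the code>>
@
Only printing is handled so far.
<<process the code>>=
print(line)
@
"""

tangled = "\n".join(expand("main", parse_chunks(SOURCE.splitlines())))
print(tangled)
```

The chunks may be defined in any order in the source; the tangler puts them into the order the machine needs.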



While you specify and implement, you add your commentaries alongside the code itself ("real" code and pseudocode references).

But those commentaries are NOT DOCUMENTATION. They are notes for you, the programmer, of your own thoughts, the references you need, some points you'll need to add, alternative ideas of how to implement etc. etc.



Another point is that you do it in the order your thinking imposes, top-down, bottom-up, or any mad sequence your brain demands. You do not bend your mind to think in the order of machine codes.



One more big point is that, your notes not being documentation, you are NOT OBLIGATED in any way to produce them. If a piece of code is obvious, you just write it out as code, without employing pseudocode insertions or comments alongside. You start breaking code when some piece interferes with the general flow, to postpone it.

So you can write down some idea more fluently, without getting bogged down in the details of implementing some of its parts.

You can always do them later.



I basically try to break code into pieces (a) to postpone, or (b) when I do not understand clearly how to write some snippet and flip back to the "writing from descriptions in a human lang" mode.



So: "programming from specs in a human language" or "programming directly in pseudocode" is good. We all know that pseudocode is good and highly understandable, as most if not all textbooks are written this way. In some sense, we learned the discipline of programming by reading "literate programs" in the first place.



2. Now, the demand that a literate program look like a "polished essay" is a big psychological block and must be dispensed with from the beginning.

However, your haphazard notes DO BECOME good documentation as you go along and refine the code.



They remain great documentation for you yourself, the author, all the time - you can restart your thinking immediately at any point. There is no situation when, a week or two later, you have to remember what the heck it all meant. Many programmers on the Net state that a month after writing a piece of code they are unable to really read it.



But to return to the original point of this paragraph, these haphazard notes do become good documentation as you refine your code, delete the helpless questions, and leave only what is relevant. And you NEVER actually WROTE any documentation at all - a task that, again, many (myself included) find psychologically discouraging. It all got produced "by itself", as a by-product of your thinking.



3. This way you drastically increase the SCOPE of your ATTENTION. A "regular" program becomes a strain at roughly 1000-1200 lines of code. A programmer has to keep a mental picture of the code to navigate it and to know which parts need to be changed when further developing the existing design.


The outline prevents that mental strain from happening.



I look at my code and notes, immediately produced, in a web browser while I am doing my thinking. Then I go to the referenced line in the editor, make changes, reload the page (Ctrl-R), and look and think again.

It was a surprise for me how much my brain got relieved from "housekeeping tasks", remembering, straining etc. when the script was functional enough for me to switch to using it.
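As a rough illustration of the weaving side of this workflow, here is a small Python sketch that turns a literate source into a browsable HTML page with foldable code chunks. It is only a sketch under stated assumptions: Molly itself is a Perl script that folds with JavaScript, whereas this example swaps in the plain HTML `<details>` element, and the function name `weave` is hypothetical.

```python
import html
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")


def weave(lines):
    """Prose lines become <p>; each code chunk becomes a collapsed <details>."""
    out = []
    name, code = None, None  # code is None while we are reading prose

    def fold():
        body = html.escape("\n".join(code))
        out.append(f"<details><summary>{html.escape(name)}</summary>"
                   f"<pre>{body}</pre></details>")

    for line in lines:
        definition = CHUNK_DEF.match(line)
        if code is None and definition:
            name, code = definition.group(1), []
        elif code is not None and (line == "@" or line.startswith("@ ")):
            fold()
            code = None
        elif code is not None:
            code.append(line)
        elif line.strip():
            out.append(f"<p>{html.escape(line)}</p>")
    if code is not None:  # chunk left open at end of input
        fold()
    return "\n".join(out)


page = weave([
    "Compare two numbers.",
    "<<main>>=",
    "x = 1 < 2",
    "@",
])
print(page)
```

Writing `page` to a file and reloading it in the browser after each edit gives the look, edit, reload loop described above, with every chunk folded until it is clicked.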



4. In a way, that answers your question of how applicable L.P. could be to larger programs.

(a) Outlining added to "traditional" LP greatly increases scalability. Everything not relevant to the task at hand is folded out of the way, even if it is megabytes of code (as long as your web browser does not choke).

(b) A Literate Source file, like the file you were reading, "MOLLY.html", can be split mechanically into parts if the code grows uncomfortably large (e.g. if the TOC section becomes unwieldy).

Just feed all the splinter file names to the tangler, and your code will be produced correctly:



"MOLLY.pl/notangle -R 'root chunk name' splinter1.litsource splinter2.litsource ... > my.code.txt"



(c) L.P. was invented by Knuth, who is famous for the painstaking thoroughness of his programming. He created a huge program, TeX, using L.P. techniques.



(d) In a way, pseudocode referencing is an alternative to creating artificial functions/procedures in languages, when and where they are used not because computing demands it (e.g. for recursive calls), but just as a crutch for the limits of human memory - "I'll have to stick this into a subroutine, or I won't remember how to use it in the sea of code my messy program will become".



In that sense, L.P. deserves to be called "another paradigm" of programming. In fact, it's just simple macro programming that lets you use specifications in a human language, which become precise new operators of the meta-language you introduce.
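Point 4(b) can be sketched the same way: if the tangler reads all splinter files as one stream before expanding the root chunk, a chunk may be referenced in one file and defined in another. This is a minimal Python sketch, not Molly's or noweb's code; `tangle_files` is a hypothetical name, and the file names are borrowed from the command line above.

```python
import re
import tempfile
from pathlib import Path

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle_files(root, paths):
    """Read the files in order as one stream, then expand the root chunk."""
    chunks, current = {}, None
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            definition = CHUNK_DEF.match(line)
            if definition:
                current = definition.group(1)
                chunks.setdefault(current, [])
            elif current is not None and (line == "@" or line.startswith("@ ")):
                current = None
            elif current is not None:
                chunks[current].append(line)

    def expand(name, indent=""):
        # No cycle detection here, to keep the sketch short.
        out = []
        for line in chunks[name]:
            ref = CHUNK_REF.match(line)
            if ref:
                out.extend(expand(ref.group(2), indent + ref.group(1)))
            else:
                out.append(indent + line)
        return out

    return "\n".join(expand(root))


# Demo: the root chunk lives in the first file, the postponed part in the second.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "splinter1.litsource")
    second = Path(tmp, "splinter2.litsource")
    first.write_text("<<root>>=\nsetup()\n<<details>>\n@\n", encoding="utf-8")
    second.write_text("Postponed part.\n<<details>>=\nwork()\n@\n", encoding="utf-8")
    result = tangle_files("root", [first, second])

print(result)
```

Because the chunk table is built before anything is expanded, the order of the files on the command line does not matter for chunks with a single definition.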

Kazimir Majorinc

#5
OK, unixtechie, I see. You are a strong advocate of LP, and I have found another one. One interesting discussion was on the Foyled blog (http://airfoyle.blogspot.com/2006/05/anti-literacy-program.html). (Both debaters developed their own LP tool in Lisp.) It appears that some LP-in-Lisp projects (1: http://209.85.129.132/search?q=cache:hyaqgPd2ELEJ:axiom-wiki.newsynthesis.org/images/mathaction/SandBoxCL-WEB.ps+Yapp+Hebisch&cd=2&hl=en&ct=clnk&gl=hr&client=firefox-a, 2: http://www.umcs.maine.edu/research/features/lit-prog-lisp.html) are quite alive and fresh. On the other hand, I visited http://www.literateprogramming.com, and although the site is OK, it appears that few people visit it. My Alexabar suggests that this site is only slightly more popular than my blog. Both sides of the coin make sense to me, so it might be one of those things one really needs experience to judge. In any case, I think that HTML and JavaScript are a better choice than TeX, since everything is much more practical, so LP might have a better chance of being adopted in the future than in the past.



I have a few more questions:



Do you plan it to be a tool for all languages, or for newlisp only?



Do you consider using the standard <<header>>= notation, or a Lisp-like one?

unixtechie

#6
Well, Kazimir, you sound as if it were all a big deal. It is not. Probably I myself am guilty: when I was writing the Molly script, I was too wordy (testing how various tricks would look in the output). I should probably cut the documentation (part II) to 20% of what it is.



Because using this stuff is incredibly easy.



I looked at your links; the Google cache failed on me, so I do not know if you also mentioned CLWEB, a lit prog add-on for Common Lisp.



Those guys twist the original idea to use it with REPL. I have not touched those tools at all.



To answer your question: I am not exactly planning, I've done it - ;)) - and the tool does not depend on the programming language of the code. Moreover, one can view it as an extension of the "noweb" tool, an alternative folding HTML weaver for it. (Although now it has a built-in tangler and is self-contained, too.)



And yes, I used the <<chunk name>> notation for chunk names in definitions and references.



Well, look at the template file in the distribution (there is a button above the file list on the GitHub page that will download a full tar.gz archive of the repository).

It is a very, very simple notion and it's very simple to use. A 10-minute trial with the template file will give you a good idea of how it feels.