Request for Closures - Contexts insufficient

Started by itistoday, March 27, 2009, 09:16:05 AM

Previous topic - Next topic

itistoday

This thread has taken a turn for the slightly different topic of "reference lists", see posts below..



Closures are a very useful thing, and unfortunately newLISP doesn't have support for real closures.



I'm aware of the existence of contexts, but they are, in my opinion, a poor replacement for closures. They solve some problems, but are not a proper solution for the sorts of problems that closures solve.



The main reason is that they must be named, and they cannot be created on-the-fly around code as closures can.  This has a huge impact on the readability of code, and on what newLISP can and cannot do.



Problems with regards to namespace as http://www.alh.net/newlisp/phpbb/viewtopic.php?p=11920">this thread on streams.  The current solution is poor as it involves doing a lookup in a red-black tree, and makes it inefficient to memoize functions where the arguments can be switched (and still have the same result, i.e. (+ 1 2) vs (+ 2 1)).



Contexts have to be named. If a context with that name already exists somewhere, you get a conflict. You can solve this by numbering them, but that's just an ugly hack and pollutes the namespace and readability of code. Contexts cannot have contexts within them. Closures would fix all of these problems, and make newLISP far more appealing to many people.



Is there any possibility of this happening? Perhaps something similar, like "anonymous contexts"? Granted, I may be wrong on some things here (it's been a while since I looked into the stream stuff), but I think most of these reasons still stand. Thoughts?
Get your Objective newLISP groove on.

Kazimir Majorinc

#1
We have mutable functions. I think mutable functions can do everything as closures can, but not vice versa. Because, mutable functions can be changed from outside, not only "from inside."



Problem is that functions do not know their names, so, they cannot change themselves. Except if we define them that way:



(set 'mn1 (lambda((my-name 'mn1))(println "my name is " my-name)))

(mn1)




This is horrible weapon! This function can change itself, and it can change its name, and it can change other functions and their names.



We just need some practice to learn how to use it, eventually develop some interface that reminds on closures.
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

itistoday

#2
That does not provide the functionality of closures, and actually illustrates another problem of newLISP, that of dynamic scope, which means that in newLISP you're pretty much always dealing with the global scope.



It's almost as if you were to write C++ code and declare global variables all over the place and never use classes.  People would look at you like you were crazy.  This makes writing large projects in newLISP difficult, and that in turn puts off a lot of people to the language.



Really this is a problem of namespace and newLISP's Once-Only-Referencing.  If newLISP supported lexical scope, that would be a huge step toward more adoption, and would increase the utility and power of the language.
Get your Objective newLISP groove on.

itistoday

#3
Would it not be possible for example, to define newLISP's "set" family of functions in such a way, such that whenever it's invoked, it checks to see if the symbol was previously referencing a closure, and then decrements the closure's ref-count?



Admittedly, I haven't thought this through very much, so you may treat me as if I'm speaking out of my ass, but could you do something like this:


(set 'a (closure (b) (setf b 5)))
(a) ; => 5
(println b) ; => nil
(set 'a 5)


Upon the fourth line, the closure would be released from memory, not for the reasons it is now, but because its refcount would be decremented to 0.  And you could also check for things such as closures referencing themselves to prevent situations where they'd stick around in memory forever.  The basic idea though is that you could make anonymous closures, and whenever they would be passed around, it would always be by-reference (no copying).
Get your Objective newLISP groove on.

Kazimir Majorinc

#4
itistoday, the function I described really can do everything closures can, one just needs to write some interface around it. That is why all these Newlisp metaprogramming features are here.



For lexical scope, perhaps you can use my genlet, genlocal, gensym, ... these give you functionality of lexical scope, and they still work with macros. Real lexical scope cannot work with Newlisp macros.


(load "http://www.instprog.com/Instprog.default-library.lsp")

(set 'outer
     (lambda-macro()
       (set 'i 1)))

(genlet((i 8))
      (outer i)
      (println i)) ;=> 8
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

itistoday

#5
Perhaps you can help me out, I'm trying to wrap my head around all of this but I'm always pressed for time... :-(



Can you define this in newLISP without using contexts:


; Scheme code, written using newLISP names like 'setf' and 'fn':

(setf f (let ((var 0)) (fn () (setf var (+ 1 var)) var)))
(f) ; => 1
(f) ; => 2
Get your Objective newLISP groove on.

itistoday

#6
Thought about this whole thing some more, and I think what I'm trying to get at really doesn't have so much to do with closures as it does with the concept of an anonymous object that can be passed around without copying it.



In otherwords, a list that doesn't get copied.  Is this possible to do in newLISP using the referencing scheme that I described above? I guess this question really goes to Lutz since he knows the internals of the language.



You could then do "FOOP", except now it wouldn't be "purely functional", you could have real objects in newLISP.



Closures would be nice too though, as they get rid of the need to use the letex function, and generating crazy symbol names.
Get your Objective newLISP groove on.

Lutz

#7
- Nested contexts: I have used nested namespaces in an earlier language project. RuleTalk (a kind of Prolog) could nest contexts to any to level you wished.



When starting newLISP I deliberately cut out nesting of namespaces. They introduced unnecessary complexity.



As a side note: The language also had tail recursion elimination because recursion was the main flow control mechanism. Limiting flow control to recursion and continuations was an unnatural limitation in newLISP, perhaps beautiful from a mathematician's point of view, but a pain to use and understand for people from other disciplines.



- Functions with state in newISP and state serialization:


> (define (foo:foo) (inc foo:cnt))
(lambda () (inc foo:cnt))
> (foo)
1
> (foo)
2
> (foo)
3
> foo:cnt
3
> (save "foo.lsp" 'foo)
true
>


start a new newLISP session later:


> (load "foo.lsp")
MAIN
> (foo)
4
>


How would you do serialization (to a human readable file) in Scheme or Common Lisp? Without extra code this would not be possible.



see also here: http://www.newlisp.org/index.cgi?Closures">http://www.newlisp.org/index.cgi?Closures

and here about RuleTalk: http://www.donlucio.net/index.cgi?page=Projects">http://www.donlucio.net/index.cgi?page=Projects further down on the page.

itistoday

#8
In the post I made above I'm not really referring to nested contexts, but the concept of a list that can be passed by reference.



Don't you think that'd be a neat and useful addition to the language?



By using some low-level implementation of the 'set' function, could you use automatic reference counting to keep track of it?


(set 'x (list-ref '(0 1 2))) ; reference count of one, because symbol of 'x'
(define (modify-list L) (setf (L 1) 2))
(modify-list x)
x ; => (0 2 2)
(setf x nil) ; => deallocates the list


You'd be able to do real OOP in newLISP then, with objects maintaining state, passing them around, keeping arrays of them, etc. and it would allow quickly passing around large lists without having to come up an entire context for them. There's probably some obvious memory-related issues to this that I'm forgetting, but perhaps it could be done?
Get your Objective newLISP groove on.

Lutz

#9
I realize this thread is now more about closures, OOP and reference passing, but nested context where mentioned in the first post and have been mentioned before by other posters, so I wanted to talk about it shortly.



Also some words about dynamic scoping and globals: If you mostly avoid globals (as most newLISP programmers do, except when writing short trivial scripts), then the rules of dynamic scoping in newLISP offer the same isolation as lexical scoping because their variable environment is only defined by the local function parameters and let/letn/letx/local/doxxxx/for -parameters. Variable capture is avoided by never using quoted symbols for reference passing, but using contexts.



But now to OOP and and reference counting:



Reference counting memory management algorithms take significant computing resources in memory and speed, and it is difficult to weave the work they perform smoothly with the main language VM process. There is no modern garbage collection algorithm, which doesn't display unexpected pauses in code execution.



ORO is based on the observation that for most memory objects created, you can already define their future life and point of deletion at the moment of creation. ORO performance is synchronous and predictable.



Lexical scoping in combination with garbage collection makes reference passing easier in a programming language. But there is reference passing and maintenance of state in newLISP too. It just works differently and demands a different programming style but is independent of traditional garbage collection and it's drawbacks.



By default newLISP does reference  passing only in and out nested, built-in functions, but for user-defined functions passing copies is the default. To pass data by reference to user-defined functions, either newLISP macros or namespaces must be used.



There is no such things as "real OOP" there are just different ways to encapsulate data and behavior ;-) ... different flavors of OOP. What you call "an entire context" is just the small overhead of a symbol holding a symbol-tree pointer. The overhead of a namespace doesn't require more resources than the overhead of a lexical function closure. The following variation of your example adds minimal overhead for wrapping the data in a context and the modifying function can stay the same.



> (set 'x:x '(0 1 2))
(0 1 2)
> (define (modify-list l) (setf (l 1) 2))
(lambda (l) (setf (l 1) 2))
> (modify-list x)
2
> x:x
(0 2 2)
>


What you are looking for: anonymous objects maintaining state, collected in arrays and lists and automatically memory-managed, can all be accomplished with FOOP.



Since 10.0.1 FOOP objects can be held in default functors too for reference passing. If you look into real world applications, there are few data-objects big enough to really need reference passing in the first place.



FOOP is closer to many other OOP flavors than it looks at first. A FOOP object is a data object with a special pointer to the collection of methods which can be used for it. The same is true for many other OOP system implementations.



The "F" in FOOP illustrates the fact, that the data object is at the same time the constructor functional expression to create the object:



The first member of the data expression can be interpreted as either a pointer to the context holding the class methods applicable to the object or at the same time can be interpreted as the default functor of that class holding the constructor function for objects of that class. Evaluating the data expression is evaluating the constructor to make this data object.



This whole idea is captured in the following object constructor definition (built into newLISP since 10.0.0):


(define (Class:Class) (cons (context) (args)))

Michael Michaels came up with this beautiful definition about a year ago and it basically defines all of FOOP! The only other thing necessary was the definition of the colon ":" operator implementing polymorphism which at that time existed as a newLISP macro and now is built-in natively. The colon operator takes the first part of the data objects to select a method from a specific class.

itistoday

#10
On FOOP, I'm aware of it and I do like it, but it doesn't allow certain common OOP paradigms, at least ones that are common in most other languages.



For example, if you have a "large object" (many methods and capabilities) that's used throughout your code by many entities, then quite often it will have methods that change it.  However, you want these changes to persist with that object *itself*, so that when other parts of your code talk to that *specific* object, they get the latest changes.



With FOOP you would need to replace the entire object with a modified version of itself.  And actually you wouldn't be able to have a "centralized object" that  other objects talk to/reference.  You can't do this sort of basic OOP in newLISP. The reason is because you can't have a reference to this object without devoting an entire context to it, without that, you'd have different copies of the object all over the place, all with different values.



And I say "entire context" not because of any performance issues, but because there is only one place where contexts live, and that is the global scope, and as such you're limited on available names.  If you're project is huge, and has multiple such objects, then it puts a huge burden on the developer to make sure that any code that he uses, and all of the libraries that he includes, don't result in name conflicts between the contexts.  They are unique, and that's a huge problem with newLISP.



Your code example for the 'x:x variable/context, is a perfect example of this issue.  Who knows what could have been in 'x:x already? Who knows what sort of havoc you could have wreaked on some third-party library just by doing that?  Multiply this by several thousand, and that's the situation that you have with a large project using newLISP (as I'm potentially considering embarking upon).



FOOP is neat, I'm not bashing it, I like it. But it's not a solution to all OOP problems, it restricts you to a minimal set of capabilities, and as I love the language you've created, I'm trying to argue for something that would improve it and also result in increased adoption (I think).



As for garbage collection, I'm honestly not sure if what I was saying about the "reference list" requires it.  If you implemented it the way that I described it there would be no garbage collector.  If you could guarantee that variables are set by only one or two functions, then all of the code to do memory management would be put into those one or two low-level "set" functions.



With one version of "set", it would increment/decrement the reference count (potentially resulting in the deallocation of the list), and another version of the "set" function would not affect the reference count at all, thereby allowing weak-references to these "reference lists".



I know that there may be other memory related issues here, and it's true that you would then be able to write newLISP code that could have memory leaks, but I think it's worth it because it would greatly improve the capabilities of the language.



These three things would greatly improve the capabilities of the language:



- Reference lists (would allow more OOP, faster execution, avoid need for contexts in many situations)

- Nested contexts (would make it easier to avoid name conflicts)

- Lexical "let"-type function (would allow lexical scope in newLISP, make it easier to write code that needs it, gets rid of need for confusing letex, makes solutions to certain types of problems faster)



The first two are probably more significant. Please let me know if I've gone wrong somewhere here, but so far I don't think that what I'm suggesting is crazy...
Get your Objective newLISP groove on.

Lutz

#11
QuoteIf you're project is huge, and has multiple such objects, then it puts a huge burden on the developer to make sure that any code that he uses, and all of the libraries that he includes, don't result in name conflicts between the contexts.


Not at all: except for the names of other contexts, you are free to choose any new symbol names inside a namespace. You can even overwrite the global built-in primitives by prefixing them with the context name (but will not be able to serialize the context).



The context system has had a thorough workout in a programming team/startup three years ago and some subtle improvements in the context system were made during that time.



The group was a mixture of experienced and unexperienced programmers building a big, distributed sentence search engine. We never had the feeling that there were dangers to step on each others feet. Code was organized in namespace modules. Module names had a convention. There was one module per file and they where pre-declared in a central location. Each programmer had responsibility for his own modules, but a few modules had multiple authors. Development was controlled using Subversion (SVN) customized with newLISP syntax highlighting. We had a small QA team and people working from remote locations too.



newLISP dosn't want to and never will be a full-blown OO language with all the paradigms you find in C++ or Java. There is no inheritance to share or overwrite methods higher in a class hierarchy, but you can do simple mix-ins using 'new'. Its main paradigms are functional programming and namespaces. Over the years both paradigms have been fused together quite well (i.e. in FOOP).



To your last tree points (in different order):



I talked about my bad experience with nested contexts earlier in this thread (-> RuleTalk).



About reference counting: its not as trivial, as it looks at first. Here are few of the problems, you have to deal with when implementing such a system:



Reference counting GC is memory intense and leaves a lot of referenced cells in memory, which are never used; there are several experimental studies about this. There is an overhead of a reference counter in each cell. If the counter is not wide enough, it may overflow. In long-running systems (like our sentence search machine, mentioned above) there is always that danger. Reference garbage (non-zero counter, but not used) can only be avoided when knowing about the semantics of the program and analyzing it otherwise. Processing time has reported to be high: each time a symbol has a reassignment, the counters in old and new target cells have to be updated and checked for zero or overflow.



Reference counting is one of the older garbage collection systems and many optimizations have been developed to deal with it's disadvantages. All these have negative impact on performance and memory usage. The main optimization idea is, to exclude cells, which can be reclaimed early. This is a big part of what ORO does. So ORO does leverage some of the techniques developed as optimization for other garbage collection algorithms. The only advantage of ref-counting is easy interleaving with the main VM process.



The average, real world impact on value-copy passing versus reference passing is greatly exaggerated by most proponents of classic garbage collection. After doing measurements in newLISP done not only on constructed examples but also real application code, I realized that the issue could be neglected. Memory and performance saving of ORO more than compensate for the negatives of value-copy passing.



Although 10.0 introduced reference passing in and out of all built-in functions by default, the overall impact on speed (measured with the Debian language shootout suite) was only about 5%. Of course in isolated examples the difference can be thousands of % on bigger objects, but in a typical code mixture its little. When dealing with user-defined functions, which still do value-copy by default, use context-wrappers for reference passing. Typically these cases are rare (in my own code anyway) amd I have not seen naming confusions becuase of this.



Lexical 'let'-functions don't have any advantage over dynamically scoped 'let'-functions, unless they contain free variables and you want to have closures based on this. 'letex' is a whole different animal, it's main usage is merging macro-like expansion and evaluation into one built-in function.



I know that in principal you like newLISP and you are always among the first to defend it :-). Perhaps, if you change your newLISP programming style, adapting to the way newLISP does certain things, you will see, that it very well can be a programming-team language and that newLISP can deal with bigger systems too. Perhaps you will see that using the functional and namespace paradigms, problems can be solved in an elegant way; problems on which OO would have been applied traditionally.



The method of using dynamic scoping inside namespaces opens up a whole new world of little explored programming paradigms. Things like Aspect Oriented Programming and Programming for Separation of Concerns come to mind. I was invited to ILC-2009 and wanted to talk about these topics in relation to newLISP, but had to decline because of time and scheduling conflicts. There will be an article about this on newlisp.org one day. Namespaces and dynamic scoping gain new value when discussed from these angles.

cormullion

#12
Hey Lutz - I said you should start blogging! These would make excellent blog posts. :)



Also, I remember you saying somewhere that the old way of writing large monolithic programs wasn't newLISP's way; instead we should be thinking about small agile agents running in their dozens and hundreds on multiple computer nodes. That would be too much of a challenge for some of us (I'm a scripter, not a programmer :-), but perhaps another way of approaching the design of a large project...

Kazimir Majorinc

#13
is there a chance for system variable, say $me that returns *reference* of the function or macro? (By reference I think on "any expression that evaluates to given function?")


(set 'x (lambda() $me)
(x)  => x ; i.e. symbol x


Or


(setf (L 1) (lambda() $me)
(L (- 2 1)) => (L 1) ; i.e. list (L 1)


Maybe I asked that already but I forgot ...
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

itistoday

#14
Quote from: "Lutz"
QuoteIf you're project is huge, and has multiple such objects, then it puts a huge burden on the developer to make sure that any code that he uses, and all of the libraries that he includes, don't result in name conflicts between the contexts.


Not at all: except for the names of other contexts


That's exactly my point, it's the names of the contexts that you have to worry about, not the names of symbols.  As I mentioned above, a simple and seemingly harmless line like this:


> (set 'x:x '(0 1 2))

Can become a real liability if some third-party library uses 'x:x to do important work. It could lead to hard to find bugs and puts a burden on the developer and team to be very careful when using contexts.  It means that each time you want to use a context to pass data by reference (and in a large project you'll do this all the time), you've got to make sure you're not using a context you shouldn't be.


QuotenewLISP dosn't want to and never will be a full-blown OO language with all the paradigms you find in C++ or Java. There is no inheritance to share or overwrite methods higher in a class hierarchy, but you can do simple mix-ins using 'new'. Its main paradigms are functional programming and namespaces. Over the years both paradigms have been fused together quite well (i.e. in FOOP).


I'm not calling for full-blown OOP, just reference lists, and I'm not even saying that all lists should be reference lists, there would be special functions to create them, perhaps a special syntax as well, for example, instead of the quote, you could use a backtick or something like that:


(set 'a-ref-list `(0 1 2))

QuoteI talked about my bad experience with nested contexts earlier in this thread (-> RuleTalk).


Not in much detail though... They're really not that important though if reference lists are implemented.


QuoteAbout reference counting: its not as trivial, as it looks at first. Here are few of the problems, you have to deal with when implementing such a system:



Reference counting GC is memory intense and leaves a lot of referenced cells in memory, which are never used; there are several experimental studies about this. There is an overhead of a reference counter in each cell. If the counter is not wide enough, it may overflow. In long-running systems (like our sentence search machine, mentioned above) there is always that danger. Reference garbage (non-zero counter, but not used) can only be avoided when knowing about the semantics of the program and analyzing it otherwise. Processing time has reported to be high: each time a symbol has a reassignment, the counters in old and new target cells have to be updated and checked for zero or overflow.



Reference counting is one of the older garbage collection systems and many optimizations have been developed to deal with it's disadvantages. All these have negative impact on performance and memory usage. The main optimization idea is, to exclude cells, which can be reclaimed early. This is a big part of what ORO does. So ORO does leverage some of the techniques developed as optimization for other garbage collection algorithms. The only advantage of ref-counting is easy interleaving with the main VM process.


I have to admit here that I don't know enough on the subject to really put up a strong defense.  However, based only on what you've said above, I'm still not convinced that it would impact performance or memory, or have any of the side-effects that you're describing.



First, there wouldn't be a GC since the reference counting is done in the 'set' function, therefore I would imagine you'd still have consistent execution times.



Second, I'm really skeptical when you say that it would impact performance.  I bet that those studies were done on languages where everything was reference counted, which is quite different from what I'm proposing, which would be to have only a single special type that was reference counted.  Your 'set' function would look (very roughly) like this:


ptr_t set(ptr_t symbol, ptr_t value)
{
if ( typeof(symbol) == TYPE_REFLIST ) {
releaseRefList(symbol);
}
if ( typeof(value) == TYPE_REFLIST ) {
retainRefList(value);
}
_set(symbol, value);
}

// set_weakRef defined here - does not release or retain


Seeing as this code would only be run on reference lists, I doubt it would impact performance at all, you'd still be able to write the same kinds of newLISP programs you can now, without ever using reference lists.  That functionality would simply be there when you need it.
Get your Objective newLISP groove on.