Request for Closures - Contexts insufficient

Started by itistoday, March 27, 2009, 09:16:05 AM

Previous topic - Next topic

Kazimir Majorinc

#15
You can try this


(set 'reflist74 '(0 1 2))
(set 'a-ref-list 'reflist74)


So, reflist74 is the reference. Code can be bit more complicated, because always some "dereference" is needed, for example, (push 3 (eval a-ref-list)) instead of (push 3 a-ref-list) but it doesn't look like big deal. Advantage is that semantics of the language stays simple.  With time, the functions like following one can be defined:


(set 'PUSH (lambda (x L)(push x (eval L)))))
(PUSH 10 a-ref-list) =>(10 0 1 2)


Adding new type, especially if similar to existing, make language more complicated and less comprehensible. Every general purpose function should then take care about both cases, "reference" and "nonreference" lists.
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

itistoday

#16
Kazimir, here is a really simple, common example of why that's inferior to reference lists:



Say you have a system that keeps track of some sort of a "node" (a chunk of various information), for example, a list of users, or a list of nodes representing various locations on a map (cities, restaurants, etc.).



Now, you're developing a very complicated system that uses these nodes all over the place, and has hundreds of functions for manipulating them.  But, you only have a single list of nodes in the system.



For example, let's stick with the "users" example and say our system is this forum.  You do not want to have multiple copies of users all over the place. All functions are given a "global user" object, and when they modify that user, that change is global.



Sure, you could do this in newLISP by storing them in a "UsersContext", perhaps either as huge list of lists, or maybe a red-black tree that uses a userID to refer to the user, but then you are not able to temporarily de-couple the individual users from that list.  In your example, you are forced to associate a list with a specific symbol because you can't use reference lists.



In this example of a forum, anytime a user needed to be modified, you would have to pass around the entire context to functions that normally would simply take the user itself, and then these functions would have to do a lookup on the context/list to modify the user.  This could result in a significant performance hit, because you've got 10,000 users, and each lookup is *at best* O(log(n)) (red-black tree, using the userID as a key).



Plus, if you don't want all of your user-modifying functions to look like this:


(define (modify-username user-list user-id new-name) ... )

Then you end up having to make sure that all functions make outgoing references to the UsersContext context, a *very* odd thing to do because it means that all of your functions are handicapped in that they can only be used to modify users that are stored in that particular context.



Thus, the lack of reference lists means that you're forced to write poor code that's unnecessarily slow too.



This is just a trivial example, I could give you thousands of others because the concept of "node" is so common in large projects.


Quote from: "Kazimir Majorinc"Adding new type, especially if similar to existing, make language more complicated and less comprehensible.


I do not think so.  Reference lists will behave exactly as normal lists do, it's just up to the programmer to make sure not to do something silly that could cause a memory leak, as many are already used to doing with languages like C, C++, and Objective-C.  You could also add functions like 'ref-list?' so that if for some reason you needed to distinguish between a list and reflist, you could.


QuoteEvery general purpose function should then take care about both cases, "reference" and "nonreference" lists.


They already distinguish amongst many other types, this wouldn't be difficult, and the two would be represented internally very similarly.  I think the advantages of reference lists would far outweigh (as far as I can tell), any disadvantages.



Edit: One thing you could do, is generate 10,000 symbols to represent your users and store them in the list and pass them around doing the silly eval trick on them.  This is not only poor programming style, but it would slow down the performance of the system greatly due to the constant eval function calls, not to mention because it would fill up symbol tree with 10,000 additional symbols (what if you had a million nodes??), slowing down every symbol lookup.  And then you'd still have to make sure that nobody else accidentally uses your symbols for some different purpose (again, part of the poor programming practice).
Get your Objective newLISP groove on.

Kazimir Majorinc

#17
QuoteEdit: One thing you could do, is generate 10,000 symbols to represent your users and store them in the list and pass them around doing the silly eval trick on them. This is not only poor programming style, but it would slow down the performance of the system greatly due to the constant eval function calls, not to mention because it would fill up symbol tree with 10,000 additional symbols (what if you had a million nodes??), slowing down every symbol lookup. And then you'd still have to make sure that nobody else accidentally uses your symbols for some different purpose (again, part of the poor programming practice).


Well, difference is probably that I think that it is not silly trick but adequate approach for Lisp. O(log(n)) is theoretically satisfactory, and in practice look up in the symbol table almost doesn't depend on size of table.



symbols: 1

expression: (begin (setf j0 1) (setf j0 1) (setf j0 1))

times: 1 000 000

total time: 316



symbols: 10

expression: (begin (setf j8 1) (setf j5 1) (setf j4 1))

times: 1 000 000

total time: 289



symbols: 100

expression: (begin (setf j35 1) (setf j89 1) (setf j82 1))

times: 1 000 000

total time: 288



symbols: 1000

expression: (begin (setf j746 1) (setf j174 1) (setf j858 1))

times: 1 000 000

total time: 319



symbols: 10000

expression: (begin (setf j7105 1) (setf j5135 1) (setf j3039 1))

times: 1 000 000

total time: 295



symbols: 100000

expression: (begin (setf j1498 1) (setf j9140 1) (setf j36445 1))

times: 1 000 000

total time: 326



symbols: 1000000

expression: (begin (setf j147312 1) (setf j165898 1) (setf j988525 1))

times: 1 000 000

total time: 321



symbols: 10000000

expression: (begin (setf j4456923 1) (setf j1190832 1) (setf j46693 1))

times: 1 000 000

total time: 297
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

itistoday

#18
Those are impressive benchmarks, I'm going to write some of my own benchmarks to verify these results as I suspect there may be some unreality to them, but if there's really virtually no performance hit then this would be an ugly (IMO) but viable way of doing things.



Right now I'm refreshing my newLISP skills (haven't programmed in it in a bit) by studying your http://kazimirmajorinc.blogspot.com/2009/03/gensym-and-genlet.html">genlet function (very impressive stuff), and then I'll be on to the benchmarks, so hopefully you'll hear from me soon. :-)
Get your Objective newLISP groove on.

Lutz

#19
Although Kazimir's deliberations about gensym etc. are enlightening and interesting because they reveal more about the inner workings and performance of newLISP, there is too much worry about variable capture in newLISP.



For the less initiated here some rules and explanations regarding variable capture and dangers perceived when there are none.



- there is no danger of variable capture when reusing variable names in nested functions, 'let' and nested loop expressions. All parameter variables are internally saved on an environment stack and restored after use. You can even do the following without danger:


(dotimes (i 3) (println "->" i)
    (dotimes (i 3) (println i)))


allthough the inner loop uses the same variable name than the outer loop, it is totally safe in newLISP and unsafe in many other languages.



The same is true for functions made with 'define'. Even the following somewhat crazy code will work flawlessly with 'let' reusing the the same variable names:


(define (foo x y)
(println x ":" y)
(let ( (x (div x 2)) (y (div y 2)) )
(println x "::" y))
(println x ":" y))

> (foo 3 4)
3:4
1.5::2
3:4
4  ;  return value of foo
>
)


the 'let' expression knows, that the left x is not the same as the right x. The left x is from the inner 'let' and the right x from the 'foo' parameter. After leaving 'let' everything is just like it was before. You also could call a function 'bar' with x and y parameter names from inside 'foo' or 'let' and nothing bad would ever happen.



When using normal functions made with 'define', variable capture can only occur when passing quoted symbols. Passing quoted symbols is something very rarely happening in newLISP anyway. If you have to do it, then either isolate that function in a context/namespace or use 'args' to retrieve the parameters. Both methods are safe against variable capture.



- the only real danger of variable capture occurs in fexpr's (define-macro in newLISP). Using something like 'gensym' and 'eval' to work around this will work, but you still affect performance and on repeated calling more and more symbols will be created by 'gensym' and waste memory. They could be deleted after use, but that would take even more CPU cycles. It is better to use 'args' and have no parameter variables at all or to enclose the function-macro and parameters in its own context:


(define-macro (foo:foo foo:x foo:y)
....
)

; or as an alternative

(context 'foo)
(define-macro (foo x y)
    ...
)
(context MAIN)

; and in either case you can call with the simple name

(foo ...)


'foo' is now a global macro (actually fexpr) and completely safe against variable capture, even when passing quoted symbols. There is no performance loss doing it this way, even when writing millions of macros.



- also any other function or macro (fexpr) in a module 'foo:this', 'foo:that' is completely safe against variable capture/confusion, even when passing quoted symbols.



- last not least: why does newLISP call fexprs macros? Because from a usage point of view they are used to accomplish the same thing: write functions which do not follow the usual parameter evaluation rules of evaluating all parameters at first but control parameter evaluation yourself using 'eval'. I find the application oriented definition more useful.

Kazimir Majorinc

#20
I cannot but agree with Lutz ...
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

itistoday

#21
I disagree on several points:


Quote from: "Lutz"- there is no danger of variable capture when reusing variable names in nested functions, 'let' and nested loop expressions. All parameter variables are internally saved on an environment stack and restored after use. You can even do the following without danger:


(dotimes (i 3) (println "->" i)
    (dotimes (i 3) (println i)))


allthough the inner loop uses the same variable name than the outer loop, it is totally safe in newLISP and unsafe in many other languages.


While I agree with you, I don't think anyone claimed that there was a danger of variable capture with normal functions. I also think that this is not only true for newLISP but for other LISPs as well, and other languages too (even C can do this). So there is nothing spectacular about the nested dotimes loop...



As you say, the real danger is with macros/fexprs, and in my studies I've discovered that this is doubly-so for newLISP (will provide further details on this below).


QuoteIt is better to use 'args' and have no parameter variables at all or to enclose the function-macro and parameters in its own context:


(define-macro (foo:foo foo:x foo:y)
....
)

; or as an alternative

(context 'foo)
(define-macro (foo x y)
    ...
)
(context MAIN)

; and in either case you can call with the simple name

(foo ...)


'foo' is now a global macro (actually fexpr) and completely safe against variable capture, even when passing quoted symbols. There is no performance loss doing it this way, even when writing millions of macros.


Here I have to strongly disagree.  The context solution that you present is not a real solution, it may be something to use when writing short, single-file programs in newLISP, but because newLISP doesn't support nested contexts you're begging for trouble.



If a language has aspirations of expansion and wider adoption (as newLISP does), then it had better assure the legions of developers out there that their code will work as expected, and that it won't be broken due to name conflicts with any of the hundreds of third-party libraries available to it.



C has this problem, but even it is equipped to handle this through its use of header files and the 'static' keyword.  The only solution available to this problem in newLISP is naming conventions and contexts.  You can't just blow away millions of available contexts, as that's exactly what you're proposing.



In semi-large project, or even a smaller one that links to many libraries, there are thousands of functions/macros.  If for every macro you dedicated an entire namespace you would almost be guaranteed to run into a conflict.  Run your code it works fine, load this library and suddenly everything is broken.  Even if by some fluke you managed to avoid any name conflicts, you would be under tremendous pressure for the rest of your development cycle to make sure that you watch which names you pick for your macros/contexts.  Personally, I do not accept the above as a real solution to the variable capture problem.


Quote- also any other function or macro (fexpr) in a module 'foo:this', 'foo:that' is completely safe against variable capture/confusion, even when passing quoted symbols.


I'm fairly certain that is not true when it's called by another function/macro in the same context...



From what I can tell, when writing safe macros (as most macros should be), you should never name your arguments and always access them through the args function.  That immediately avoids one obvious form of variable capture:


> (define-macro (foo x) (println (eval x)))
(lambda-macro (x) (println (eval x)))
> (setq x 5)
5
> (foo x)
x


However, there are many different kinds of variable capture, and I recently ran into a great example of one while studying a http://www.ccs.neu.edu/home/dorai/t-y-scheme/t-y-scheme-Z-H-10.html#node_chap_8">Scheme tutorial:


; Scheme
(define-macro my-or
(lambda (x y)
`(if ,x ,x ,y)))

; newLISP equivalent
(define-macro (my-or)
(let (temp (eval (args 0)))
(if temp temp (eval (args 1)))))


'my-or' is a function that should only evaluate its first argument once, if that evaluates to true then that value is returned, otherwise the second argument is returned.



my-or doesn't use any named arguments, but it creates a symbol called 'temp', and that leads to trouble:


> (setq temp 5)
5
> (my-or nil temp)
nil ; Ack! Should be 5!


I spent a while studying Kazimir's genlet function (http://www.alh.net/newlisp/phpbb/viewtopic.php?p=15205#15205">wrote my own version of it), but then I discovered that even genlet won't save you here because of the use of (args).  Because newLISP macros are fexprs, you cannot use the Scheme solution either:


(define-macro my-or
  (lambda (x y)
    (let ((temp (gensym)))
      `(let ((,temp ,x))
         (if ,temp ,temp ,y)))))


I eventually did figure out how to write my-or in newLISP, but that's a subject for http://www.alh.net/newlisp/phpbb/viewtopic.php?p=15200#15200">another thread.



While I do love many things about newLISP, I think there are valid points to be brought up about some of its weaknesses.  To me they are potential pit-falls of the language.  At the very least I think these are issues that people looking to learn the language should be aware of.
Get your Objective newLISP groove on.

Lutz

#22
QuoteI also think that this is not only true for newLISP but for other LISPs as well, and other languages too (even C can do this). So there is nothing spectacular about the nested dotimes loop


Other languages allow this only if you create a new scope using a new variable declaration, e.g. in JavaScript: for(var i; ...) or Java: for(int i; ...). In C you would have to create a new scope using braces: {int i; for(i; ...) {}}.



The special thing about newLISP is, that it creates the new scope for looping variables automatically. This is basically a "must" for a dynamically scoped language. If you wouldn't have that, you would have to enclose each 'dotimes', 'dolist' etc. in an extra 'let' to create a local scope for the looping variable.

itistoday

#23
I still fail to see how there's anything spectacular about dotimes creating a new scope "automatically"... With almost the same amount of characters, most other languages do that too, whether through the use of braces or what-not.



Regardless, there's still the unaddressed issue of name conflicts.  It would be nice if there were something in the manual or CodePatterns document that addressed this issue and perhaps advocated the use of organizational prefixes if default functors are to be the def-facto way of writing macros.



Edit: I think I'm now more with you on the macro situation, see http://www.alh.net/newlisp/phpbb/viewtopic.php?p=15224#15224">this post.
Get your Objective newLISP groove on.