RFC: A reference counted newLISP (and how to implement it).

Started by itistoday, October 28, 2009, 05:08:23 PM

Previous topic - Next topic

itistoday

I've been thinking about this for a while now, as I had a hint that it would be possible. Because of a discussion spawned from http://newlispfanclub.alh.net/forum/viewtopic.php?f=8&t=2961">$this thread (pun intended), I've finally sat down and have written out exactly how this can be done, and why it will likely result in a faster and more flexible newLISP. I do not ask that you accept it, all I ask is that you consider it. There may very well be a good reason not to do this, but first please hear me out.



The argument that is frequently brought up against reference counting is that it can result in complicated code, and memory leaks through cyclical referencing (AKA retain cycles).  I think that both of these two arguments can be avoided if reference counting is implemented in a certain way.



Remember, newLISP's ORO, while great in many ways, is slower than incrementing/decrementing integers (which is what reference counting is) because it results in a lot of copying of data, and in turn in a lot of allocation and deallocation of data. It also means that it's hard to do certain things in newLISP, like implement complicated object-oriented structures and relationships.



But first let's make sure we understand what the argument is against reference counting.



Circular references, or retain cycles, occur in other languages like so:



1) I have a reference to a "Car" object.
Car a = new Car(); // 'a' has a retain count of 1
2) The Car class, when instantiated, has an private member called "Filter" that it retains:
public Car() {
    this.filter = new Filter(this); // 'this' car has retains 'filter'
}

3) Filter, when instantiated as above, takes a Car and retains it too! So actually, I was wrong, in this line 'a' has a retain count of 2:
Car a = new Car(); // 'a' has retain count of 2!
release(a); // 'a' is not destroyed!


Even though 'a' should be deallocated there, it's not, because the car effectively retained itself.



Is there a way to fix this? Let's try:
public Car() {
    this.filter = new Filter(this); // 'this' car has retains 'filter'
    release(this); // we're back down to 1
}


Close, now when we create a new Car its retain count will be 1 instead of 2, but trouble starts if we try to destroy the car:
Car a = new Car(); // this time, 'a' does have a retain count of 1
release(a); // *car crash!*


When we release it the car, the car gets an RC of 0 and calls its deallocation method (finalize, dealloc, whatever), the car releases it reference to the filter, and the filter in turn releases its reference to the car again! That's not good because you've double-freed memory!



----------------------------------------------------------------------------------



So, is an alternative possible? I believe I can implement this exact scenario in a special dialect of newLISP that uses reference counting in:



1) A clean and efficient way.

2) Without worrying about retain cycles.



How this would work is shown in an example program below that does the same thing with the Car and the Filter, but without the retain cycle. Please remember that this is *not* newLISP code, but something similar:


; while reading this, you may wish to simultaneously read the first couple of comments
; down below after the function definitions.

(define (Car:Car this , filter)
; in this function, the symbols 'this' and 'filter' are not associated with the 'Car'
; context, but with an anonymous context created for each invocation of this function.
; The anonymous context disappears when all of its symbols are deallocated.
(set 'filter (Filter 'filter this))
(list (context) this filter)
; at the end of this function imagine: (delete 'filter 'this) which results in a decrement
; to the reference/retain count (RC) associated with the *symbols* "this" and "filter"
; which, again, exist in an anonymous context. When the function exists, their RC of "this" will
; be 2, because it's retained by the list and by the symbol in the list returned by Filter.
; The RC of the symbol "filter" will be 1 because it's only retained by the "this" symbol in
; the list returned by the call to Filter. The last thing in the list returned by this function is
; a *list* returned by the function Filter. Its retain count will be 1 only, because only the
; list here retains it. Notice that the list's RC is 1 and not 2 as you might expect. That is
; because when a list is created its RC is  "in limbo". It's really 0, but it's held for a short
; period after the list is created. If it isn't immediately assigned to another symbol it will be
; deallocated. This equally applies to the list returned by this function.
)
(define (Filter:Filter this car)
; here again "this" and "car" are symbols in a different anonymous context created when this function
; is called. they start out with a retain count of 1 and they retain their values! (other symbols passed in).
(list (context) this car)
; they are retained when they are passed into the list (RC:2). And then they are released when the
; function exits (RC:1)
)
(define (Car:filter car)
; here again, "car" is a different symbol from the one passed in, but it "points to" the same value
; as whatever symbol that's passed in. That value is the list from Car:Car!
(car 2)
; implicit indexing fetches the value from the symbol (which should point to a list) and then
; performs indexing on that *value*!
)
(define (Filter:car filter)
(filter 2)
)

(set 'car (Car 'car))
; => (Car <anonCtx2343>:this <anonCtx2343>:filter)
; Note! At this point MAIN:car's retain count is 2! It had an initial retain count for being brought into existence
; as a symbol, and then <anonCtx2343>:this retained it as well! BUT! The value pointed to by the symbol, the list, its
; RC is 1! Because the only thing holding onto it is the symbol MAIN:car!
(:filter car)
; => <anonCtx2343>:filter
(eval (:filter car))
; => (Filter <anonCtx223>:this <anonCtx223>:car)

; now we do what the other languages can't!
(set 'car nil)
; it's important to note that this call is *different* from what happens at the end of a function call! It is
; *not* equivalent to (delete 'car). The symbol MAIN:car still exists after (set 'car nil) is done.
; What happens instead is that the *value* pointed to by the symbol is fetched, and *its* retain count
; is decremented. That value was the list created by Car:Car, which only had a retain count of 1
; (because it was retained by MAIN:car). So the list is deallocated, and in the process it goes through
; and releases everything inside of it. Let's ignore the first thing in the list (the context Car)
; The next thing is <anonCtx2343>:this. The *symbol*'s RC is decremented, it is now 1. Next is actually
; not a symbol but a list! Its RC is 1! It's decremented! It's 0! As it is destroyed, everything inside of
; it is released too! So <anonCtx223>:this is released and destroyed! What it pointed to (the symbol
; <anonCtx2343>:filter), is released as well and it *too* is destroyed because its RC was only 1!
; Finally, <anonCtx223>:car, which only has an RC of 1 is released, and in its destruction the final
; symbol to survive, <anonCtx2343>:this, is destroyed as well!

; Memory Managed!


The code again (to show how clean it is) without gratuitous comments and exclamation points:


(define (Car:Car this , filter)
(set 'filter (Filter 'filter this))
(list (context) this filter)
)
(define (Filter:Filter this car)
(list (context) this car)
)
(define (Car:filter car)
(car 2)
)
(define (Filter:car filter)
(filter 2)
)
(set 'car (Car 'car))
(:filter car)
(eval (:filter car))
(set 'car nil)


It could be made even cleaner actually if certain "syntactic sugar" was adopted, like having a special way of creating instances that automatically passes in the self-referential symbol to the function/constructor.



Thoughts?
Get your Objective newLISP groove on.

cormullion

I really wish I had something more intelligent to say about this, but I'm really impressed by the obvious thought you've put in. (I have some recollection of some similar ideas from the Objective-C lectures I started watching (but didn't finish watching) on iTunes earlier this year...)



As a scripter rather than a programmer, I don't understand too much of what you're saying here. So, to stick to the simple and practical considerations first: is this a proposed addition or extension to newLISP or is it a change to the way it currently works? In other words, will an implementation of these ideas sit on top of what we already have or is it a pervasive rewrite of everything  such that all current code is no longer likely to work?



If newLISP moves in the direction of 'complicated object-oriented structures and relationships', is it also moving away from the purposes and uses that it currently professes (it's a 'pragmatic and casual scripting language')? Or do you think it's possible to embrace both these approaches in a single language?

itistoday

Quote from: "cormullion"As a scripter rather than a programmer, I don't understand too much of what you're saying here. So, to stick to the simple and practical considerations first: is this a proposed addition or extension to newLISP or is it a change to the way it currently works? In other words, will an implementation of these ideas sit on top of what we already have or is it a pervasive rewrite of everything  such that all current code is no longer likely to work?


It would probably result in a rewrite of a decent chunk of newLISP, and yes, code that was written for newLISP originally would probably need to be partially rewritten. However looking at the Car/Filter example, you can see that it's still surprisingly similar, so I think porting code from newLISP to this dialect wouldn't be too difficult.


QuoteIf newLISP moves in the direction of 'complicated object-oriented structures and relationships', is it also moving away from the purposes and uses that it currently professes (it's a 'pragmatic and casual scripting language')? Or do you think it's possible to embrace both these approaches in a single language?


This is an excellent question, and I'm pretty sure the answer is that you can embrace both of these approaches in a single language. I *love* newLISP *because* it allows for "pragmatic and casual scripting".  This dialect would allow for that as well!  If my suggestions are practically realizable then it should result in a language that is a super-set of newLISP. Meaning, you would be able to do everything that newLISP can do (and in a similar manner), but in addition, you would have the capability of delving into complex object relationships, and you wouldn't be using hacks to pass data by reference. I thought contexts were designed to group data and functions together, not to serve as a poor alternative to pointers!



Another amazing thing that I've noticed is that I'm pretty sure this approach would allow for safe macros! Remember the http://newlispfanclub.alh.net/forum/viewtopic.php?f=5&t=2712">contest on writing 'my-or' in newlisp?  If fexpr parameters were always in their own anonymous context the problem should disappear! :-)
Get your Objective newLISP groove on.

itistoday

Lutz suggested that I try and implement this to get an idea of the sorts of issues that this could present, so I think I will go ahead and attempt that to satisfy my curiosity. But first I just need to figure out a good name for it...
Get your Objective newLISP groove on.

itistoday

Might not be possible. See here for an attempt at this that uses a 5 node doubly-linked list:



https://gist.github.com/taoeffect/6847427">https://gist.github.com/taoeffect/6847427
Get your Objective newLISP groove on.

Astrobe

Maybe that work could be reused for a different thing.



I didn't really run into the issue as of yet, but a potential memory management problem may happen with external libraries that return allocated object, and you are responsible for freeing them. It seems there exist no way to "link" with Newlisp's memory management. That is, get the signal that a variable goes out of scope, so what it refers to should be deleted.



Maybe your work could find some reuse there.



If I run into that issue I would probably use some RAII and/or an idiom I saw once in some Lisp source:



(with-window (W 100 200) ; where 100 200 are dimensions of the window for instance
  (:show W))


In this example, one assumes that some GUI library provides a window creation primitive that return a pointer to a malloc'ed window object that must be freed when finished. The with-window calls this primitive, executes its body, and finally free the window object.



This protects from memory leaks, but the reference to the window object can still escape the local scope, and cause a crash if the user is reckless. While I don't imply that reckless users should be supported, supporting library authors is good for the ecosystem.

Lutz

C-library APIs are always very explicit about how to handle memory and there are two cases to handle.



In the first case the imported C-library allocates memory for certain resources and the imported API supplies a function to free those resources, e.g. in the module mysql.lsp "mysql_free_result" is imported to free memory in result handles.



The second case occurs if the C-API needs the caller to preallocate memory. As an example see the Windows SDK functions GetWindowsDirectoryA() and GetComputerNameA() in the reference manual for import. This type of memory would be managed automatically by newLISP.