Streams in newLISP

Started by itistoday, February 23, 2008, 03:51:06 PM

Previous topic - Next topic

itistoday

Note: This was originally http://www.alh.net/newlisp/phpbb/viewtopic.php?p=11844#11844">a response to Pavel in his thread, but I figure it deserves its own thread.


Quote from: "Elica"Now let me go back to newLisp. Does it support streams/sequences? (Not in the sense of I/O streams, but in the sense of infinite lists with delayed evaluation).


Thanks for asking this question.  I was planning on investigating this question myself a while ago but never got around to it.  Because of you I decided to spend the time finally. :-)



Short answer: Yes, but not elegantly like with Scheme.  I think this is mainly due to the newLISP's dynamic scope.  For example, trying to do memoization does not seem ... possible (actually it is, see several posts down).  You might be able to do something with contexts but I think it would turn out ugly...



Long answer](set 'the-empty-stream '())

(define-macro (cons-stream)
   (letex (x (args 0) y (args 1))
      (cons x (list (delay y)))
   )
)

(define-macro (delay)
   (letex (x (args 0)) (fn () x))
   ;(letex (x (args 0)) (memo-proc (fn () x)))
)

(define (memo-proc)
   (letex (func (args 0))
      (fn ()
         (if (not already-run?)
            (begin
               (set 'result (func))
               (set 'already-run true)
            )
         )
         result
      )
   )
)

(define (head s)
   (first s)
)

(define (tail p)
   (force (last p))
)

(define (force func)
   (func)
)

(define (nth-stream n s)
   (if (= n 0)
      (head s)
      (nth-stream (- n 1) (tail s))
   )
)

(define (empty-stream? s)
   (empty? s)
)

(define (print-stream s)
   (if (empty-stream? s)
      "done"
      (begin
         (print (head s) " ")
         (print-stream (tail s))
      )
   )
)

(define (enum-interval low high)
   (if (> low high)
      the-empty-stream
      (expand (cons-stream low (enum-interval (+ low 1) high)) 'low 'high)
   )
)

(define (integers-from n)
   (expand (cons-stream n (integers-from (+ n 1))) 'n)
)

(set 'integers (integers-from 1))
(println "nth 20: " (nth-stream 20 integers))

(while true
   (print "> ")
   (read-line)
   (if (= (current-line) "")
      (exit)
      (println (eval-string (current-line)))
   )
)
[/code]

I tried:


> (print-stream integers)

And it printed up to 678 and then crashed:


call stack overflow in function list : delay
called from user defined function integers-from
called from user defined function func
called from user defined function force
called from user defined function tail
called from user defined function print-stream
called from user defined function print-stream
called from user defined function print-stream
...




This is simply because newLISP doesn't have tail-call recursion.  You can change the stack size using the -s option when running newlisp.  Update]


(define (print-stream s)
(while (not (empty-stream? s))
(print (head s) " ")
(set 's (tail s))
)
"done"
)


You can have fun, here's an example using the above program's evaluator:


> (set 'ones (cons-stream 1 ones))
(1 (lambda () ones))
> (print-stream ones)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .... (it goes on.. and then crashes)
Get your Objective newLISP groove on.

itistoday

#1
One of the annoying things here is that it seems you have to use the letex or expand functions in most of the situations where you call (cons-stream).  I was trying to write map-stream but I'm encountering a weird problem:


(define (map-stream func s)
(if (empty-stream? s)
the-empty-stream
(letex ((a (func (head s)))
(b func)
(c (tail s)))
(cons-stream a (map-stream b c))
)
)
)


enum-interval works as demonstrated here:


> (set 'interval (enum-interval 10 100))
(10 (lambda () (enum-interval (+ 10 1) 100)))
> (print-stream interval)
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 done


But there's a problem when trying to use map-stream as demonstrated here:


> (set 'bigger (map-stream (fn (x) (* x 2)) interval))
(20 (lambda () (map-stream (lambda (x) (* x 2)) (11 (lambda () (enum-interval (+
       11 1) 100))))))
> (force (last bigger))
()


The problem occurs because of this letex expression from map-stream:


(letex ((a (func (head s)))
(b func)
(c (tail s)))
(cons-stream a (map-stream b c))
)


For some reason the symbol 'c' is being set to the empty list.  Help?
Get your Objective newLISP groove on.

itistoday

#2
nm - see below.
Get your Objective newLISP groove on.

itistoday

#3
Got it!



The problem is that this kind of list:


(11 (lambda () (enum-interval (+ 11 1) 100)))

Will get evaluated (duh!).  newlisp then interprets that as the command for doing an implicit slice of the list (everything from the 12th position onwards), and this returns '() obviously!  So I just quoted it and that fixed it:


(define (map-stream func s)
(if (empty-stream? s)
the-empty-stream
(letex (a (func (head s))
b func
c (tail s))
(cons-stream a (map-stream b 'c))
)
)
)


And now... ladies and gentlemen, map-stream in action!


> (set 'i (enum-interval 10 100))
(10 (lambda () (enum-interval (+ 10 1) 100)))
> (set 'b (map-stream (fn (x) (* x 2)) i))
(20 (lambda () (map-stream (lambda (x) (* x 2)) (quote (11 (lambda () (enum-interval
       (+ 11 1) 100)))))))
> (print-stream b)
20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110 112 114 116 118 120 122 124 126 128 130 132 134 136 138 140 142 144 146 148 150 152 154 156 158 160 162 164 166 168 170 172 174 176 178 180 182 184 186 188 190 192 194 196 198 200 done


Tada!
Get your Objective newLISP groove on.

itistoday

#4
Here are the new functions.  With these you can do all sorts of powerful things.  The only problem that exists with doing streams in newLISP is that so far I don't think you can do memoization, but I would be very happy if someone proved me wrong on that.  Anyways, here are map-stream, filter-stream, and accumulate:


(define (map-stream func s)
(if (empty-stream? s)
the-empty-stream
(letex (a (func (head s))
b func
c (tail s))
(cons-stream a (map-stream b 'c))
)
)
)

(define (filter-stream pred s)
(cond
((empty-stream? s) the-empty-stream)
((pred (head s))
(letex (a (head s)
b pred
c (tail s))
(cons-stream a (filter-stream b 'c))
)
)
(true (filter-stream pred (tail s)))
)
)

(define (accumulate combiner init-val s)
(if (empty-stream? s)
init-val
(combiner (head s) (accumulate combiner init-val (tail s)))
)
)


Here's an example straight out of SICP Lecture 6a that you can find http://swiss.csail.mit.edu/classes/6.001/abelson-sussman-lectures/">here:


(set 'odd (fn (x) (= (& x 1) 0)))

(define (fib n)
(let (f '(0 1))
(dotimes (i (- n 1)) (push (+ (f -1) (f -2)) f -1))
(f -1)
)
)

(define (odd-fibs n)
(accumulate cons '()
(filter-stream odd
(map-stream fib (enum-interval 1 n))
)
)
)


Results of running this in newlisp:


> (odd-fibs 30)
(2 8 34 144 610 2584 10946 46368 196418 832040)
> (time (odd-fibs 30))
5
Get your Objective newLISP groove on.

Elica

#5
Quote from: "itistoday"Note: This was originally a response to Pavel in his thread, but I figure it deserves its own thread.


Yes, absolutely. I was thinking to do the same. Anyway, is there a way to post few images? I have the paper which I'm talking about only as scanned images.



I'd like to show you the paper, so that you could see all the nice examples in it. After this I'll demonstrate how each lisp example can be done in Logo (of course, I mean Elica Logo), and then it might be easier to reimplement them in newLisp ... or it might just give some ideas for other features... or in the worst case by reimplementing you will have some proof-of-concept that newLisp's streams are fine...



So, the first problem is how to give you the images... I can upload them temporarily somewhere on http://www.elica.net">www.elica.net, but it would be nice if they are available for longer than this temporarity(sic)

Elica

#6
Quote from: "Elica"Anyway, is there a way to post few images?


The scanned paper is about 1000* pages, their size is around 1MB (all of them). I can try modifying the images to reduce the size, but this could make reading harder...



The paper is called "Concept oriented learning environments - a project using sequences in Scheme", written by Herbert Loethe, presented at EuroLogo 2003. The images I have are scanned from the proceedings of the conference.

_________

[size=75]* binary, of course[/size]

itistoday

#7
I haven't read that paper, but yeah if you want to post some links up go for it. :-)



In http://www.alh.net/newlisp/phpbb/viewtopic.php?p=11918#11918">another thread Lutz showed me an interesting function that I was able to use to do memoization with delay (thanks Lutz!).  He wrote two versions of this 'memoize' function, one uses linked-lists and another uses B-trees I think.



First, the new version of delay:


(define-macro (delay)
(let (memo-func-name (sym (string (args 0 0) "-memo")))
(if (nil? (sym memo-func-name memo-func-name nil))
(letex (a (args 0 0) b memo-func-name) (memoize a b)))

(letex (params (rest (args 0)))
(letex (f (cons memo-func-name 'params))
(fn () f)
)
)
)
)


It calls memoize which takes a function and creates a new memoized function with "-memo" tacked onto its name as shown here:


> (delay (+ 1 1))
(lambda () (+-memo 1 1))


Here are the two versions of memoize:


(define-macro (memoize func mem-func)
(set (sym mem-func mem-func)
(letex (f func m (sym "mem" mem-func))
(lambda ()
(or (and m (lookup (args) m))
(last (push (list (args) (apply f (args))) m))
)
)
)
)
)

; using contexts to do lookup
(define-macro (memoize func mem-func)
(set (sym mem-func mem-func)
(letex (f func  c mem-func)
(lambda ()
(or (context c (string (args)))
(context c (string (args)) (apply f (args)))
)
)
)
)
)


I used the odd-fibs function to do benchmarks, and discovered some interesting peculiarities.  First, this kind of memoization severely degraded performance on the first run, however, it also improved it greatly if the same exact code was executed again:


> (time (odd-fibs 50))
16
> (time (odd-fibs 50))
2


Previously, without memoization, you would get these kinds of results:


> (time (odd-fibs 50))
10
> (time (odd-fibs 50))
10


Also, the second version of 'memoize' (the one that uses contexts to do lookups), is just a little slower than the first (by about a millisecond in my tests).  However, since it has a faster Big-O time, I think that it'll be faster once there are more stored solutions inside of the memoized functions memory.



Compare these results to running the standard (non-stream) version of odd-fibs:


(define (odd-fibs-standard n , temp fiblist)
(for (i 1 n)
(set 'temp (fib i))
(if (odd temp)
(push temp fiblist -1)
)
)
fiblist
)


Which consistently got very fast times:


> (time (odd-fibs-standard 50))
1
> (time (odd-fibs-standard 50))
1


Note that the memoized version comes close to this when run a second time.



Is it worth it?  I'm not sure on that.  I think it depends on how you're using the streams, and what kind of algorithm you have, because just changing parameters to your function calls will likely result in slowdowns as shown here:


> (time (odd-fibs 10))
3
> (time (odd-fibs 15))
5
> (time (odd-fibs 50))
18
> (time (odd-fibs 49))
20
> (time (odd-fibs 49))
2
Get your Objective newLISP groove on.

itistoday

#8
Playing around some more with this I discovered something interesting.


> (time (odd-fibs 50))
16
> (time (odd-fibs 49))
20
> (time (odd-fibs 48))
21
> (time (odd-fibs 47))
21
> (time (odd-fibs 46))
26


I wanted to see what happened if I kept going, so I wrote a function (fibs-test) to do this for me:


(define (fibs-test n)
(for (i n 1)
(let (duration (time (odd-fibs i)))
(println i " " duration)
)
)
)


This function calls odd-fibs starting from n and goes down to 1 (I used 50 for n).  I ran it using both memoization functions.  The results are shown below.  The blue line represents the list-based memoization function, and the green line is the one that uses the context-based memoization function:



http://www.kinostudios.com/images/memoization.png">



"Huh, that's interesting" I thought.  I did a second test, this time going the other way, from 1 to 50:



http://www.kinostudios.com/images/memoization2.png">



The results from the context-based memoization are practically identical (which makes sense since it's a self-balancing tree), but this time the list-based one was even worse, clearly showing exponential properties. Let's look at the list-based ones overlapped on one another:



http://www.kinostudios.com/images/memoization3.png">



Edit: updated explanation, previous one was flawed



So what's happening?  In the first run (running odd-fibs from 50 to 1), (enum-interval 1 50) is called.  This gets memoized to enum-interval-memo, and the result is added to the list with '(1 50) as the key.  As this iteration continues, '(2 50) is added to the list, '(3 50), and so on.  In other words, for the entire run of (odd-fibs 50), no cached result is used, and so it just computes it normally.  Keep in mind however, that enum-interval-memo:mem keeps growing on each call, and because no key is ever used again in the way we're running the test, it has to search the entire list on each call!  This is because next iteration, the keys will be '(1 49), '(2 49), etc.  Since, on the first run, we started with the largest number (50), at that point the list is very small (initially empty), thus it doesn't take very long to traverse it.   So it should be obvious why when you go in the opposite direction, it takes such a long time.  When you start from 1 and go to 50, by the time you get to computing the most difficult fib-interval (1-to-50), the list is gigantic!



When we switched over to the context-based memoize function, even though again no cached results were used from enum-interval (since the key depends on both the first parameter and the second one, which changed each time), not having to traverse a list helped a *LOT*.  The red-black tree that contexts use is self-balancing, and so the order in which we add keys to it doesn't really affect the height of the tree, thus you get nice O(log(n)) search times.



Don't forget though, it's not just enum-interval that's being memoized, map-stream and filter-stream are being memoized too!  This helps a lot, especially on the second run (see below).



So indeed, based on this, I'd say definitely go with the second version of memoize! :D



Update:



What's nice about the way (delay) is written is that you can customize the memoized function if you want!  For example, since enum-interval doesn't really do much computation anyways, you can just do this:


(set 'enum-interval-memo:enum-interval-memo enum-interval)

There is very little difference between the performance of doing that or sticking to the default context-based memoized version of enum-interval.  The real savings comes from memoizing map-stream and filter-stream (which contain the results of (fib n)), as seen when you run the 50-to-1 test again... (see the red line).



http://www.kinostudios.com/images/memoization4.png">



btw, I realize the edit count on this post is getting ridiculous... I'll try to keep it in the single digits ;-)
Get your Objective newLISP groove on.

Elica

#9
Quote from: "itistoday"I haven't read that paper, but yeah if you want to post some links up go for it. :-)


I'm more than willing to show the paper, but cannot find where to post the scanned pages. This is the only form of the paper which I have. I do not have it online. And did not manage to find it on the web.



Do you know any free repository where the images will stay for a long time?

itistoday

#10
Quote from: "Elica"Do you know any free repository where the images will stay for a long time?


http://www.putfile.com/">Putfile.com?



Btw, I updated the explanation above, so if you (or anyone else) read the old version, I recommend reading it again, as I woke up this morning and realized I had made a mistake in my previous explanation.
Get your Objective newLISP groove on.

Elica

#11
Thanks. I'm now waiting for a reply from the paper's author. I do not like to post the paper without permission... and I hope that he has the paper online.



Meanwhile I will post the examples and the expected result as text. I have them as comments in my Logo program...



Warning 1. The LISP code is typed by me by hand. I have not verified it. So, there might be some typos...



Warning 2. I will split the examples in several posts...

Elica

#12
INTRO



While waiting for permission to post the original paper, I will show you the examples. I hope that you may find them interesting -- specially the more complex ones at the end.



I am curious whether you can reimplement them in newLisp. Well, I'm sure you can, because I have done them in Elica Logo.



NOTE! The author's terminology is to use 'sequence' instead of 'stream'.



So, let me first enumerate the sequence functions:



  seq - creates a sequence

  head-el - returns the first element

  tail-seq - returns the subsequence without the first element

  shs - shows a sequence

  sshs - shows a sequence of sequences



The semantic relations between these functions are:

 

  (seq (head-el x) (tail-seq x))   is  x

  (head-el (seq h t))   is h

  (tail-seq (seq h t))   is t

Elica

#13
SIMPLE SEQUENCES



Constant sequence consist only a constant value.
(define (constant-seq c) (seq c (constant-seq c)))

(shs (constant_seq 1))
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...




Arithmetic sequence has starting element and a rule (function) to generate the next element.
(define (arithmetic next-el a)
  (seq a
    (arithmetic next-el (next-el a))))


Let's define multiples of 5 by the rule:
(define (add-5 n) (n + 5))
(define multiples-5 (arithmetic add-5 5))

(shs (multiples-5))
5, 10, 15, 20, 25, 30, 35, 40, 45, 50, ...


Sequence of natural numbers is arithmetic sequence:
(define nat (arithmetic (lambda (n) (n + 1)) 1))

(shs (nat))
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

Elica

#14
MAP SEQUENCES



General rule-based sequence generator:
(define (seq-by-rule next-el a)
  (seq a (seq-by-rule next-el (next-el a))))


Let's map a square function on the sequence of natural numbers:
(define (map-seq f x)
  (seq (f (head-el x))
    (map-seq f (tail-seq x))))
(define (square n) (n * n))
(define seq-of-squares (map-seq square nat))

(shs (seq-of-squares))
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...


Let's use the same thinking for elementary operations on sequences; sums, differences and products.

(define (seq-op op x y)
  (seq (op (head x) (head y)
    (seq-op op (tail-seq x) (tail-seq y))))
(define (seq+ x y) (seq-op sum x y))
(define (seq- x y) (seq-op difference x y))
(define (seq* x y) (seq-op product x y))

(shs (seq+ (seq-of-squares) (nat)))
2, 6, 12, 20, 30, 42, 56, 72, 90, 110, ...

(shs (seq- (seq-of-squares) (nat)))
0, 2, 6, 12, 20, 30, 42, 56, 72, 90, ...

(shs (seq* (seq-of-squares) (nat)))
1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, ...






Note! The test cases are translated from Logo to Lisp, and I'm not good in Lisp, so I'm not sure about the parentheses. For example, I do not know which one is the correct:

(shs (seq* (seq-of-squares) (nat)))

or

(shs (seq* seq-of-squares nat))

So, please, fix the examples if needed.