new dictionaries in 9.3.4

Started by Cyril, March 18, 2008, 07:50:56 PM


Cyril

The short dictionary syntax introduced in 9.3.4 is very handy, but also very dangerous. I noticed this long ago, back in the 9.2.4 days, but at that time I was a complete newbie and assumed I just needed to learn more. I still assume I need to learn more, but now I am brave enough to claim that dictionaries implemented the current way are too bug-prone. Consider something like this:


(define table:table)

(while (read-line)
  (table (current-line) (length (current-line))))


Nice, eh? Only until a line consisting of the single word "table" appears in the input. This will lead to subtle errors on rare user data. (By the way, such data are not so rare: I often test text-processing scripts using their own source text as input.)



Another problem is that iterating over the dictionary seems unnatural to a casual programmer:


(dotree (key table)
  (if (eval key) (println (name key) " " (eval key))))


Why must I deal with symbols if my task is about strings? And why do I need the additional check?



Despite this, I think the new dictionary syntax is a good idea, but it should be elaborated a bit. My suggestion:



1. When the default functor is 'nil', it should act almost as in the current version, but prepend an underscore to the symbol name. I mean, (table "abc" 123) should create the symbol table:_abc, not table:abc. Of course, (table "abc") should return the value of table:_abc too. This is compatible with the currently undervalued 'bayes-train' function.



2. A new loop construct should be introduced. I suggest the name 'dokeys', but the exact name is not important. It should iterate only over the keys starting with an underscore, and yield them as strings, not as symbols. With this operator my second example becomes:


(dokeys (key table)
  (println key " " (table key)))


In short: I suggest that the data structure already used by the bayes family of functions be made the standard way to implement string-to-anything dictionaries. Pro: simpler user code, less error-prone. Contra: maybe a bit slower (depends on implementation details).
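To make the two suggestions concrete, here is a minimal sketch of how the underscore scheme could be emulated in user code today. The helper names dict-set and dict-get are hypothetical, not built-ins, and the sketch assumes the 9.3.x function names used in this thread ('name' in particular; later releases, I believe, call it 'term').

```newlisp
; dict-set and dict-get are hypothetical helper names, not newLISP built-ins.
; Each key is stored under a symbol whose name starts with "_", so the key
; "table" becomes table:_table and cannot collide with table:table.
(define (dict-set ctx key value)
  (set (sym (append "_" key) ctx) value))

(define (dict-get ctx key)
  (eval (sym (append "_" key) ctx)))

(define table:table)

(dict-set table "abc" 123)
(dict-set table "table" 7890)    ; the troublesome key from the first example

(println (dict-get table "abc"))
(println (dict-get table "table"))

; Walk only the underscore-prefixed symbols and hand the keys back as strings.
(dotree (s table)
  (if (starts-with (name s) "_")
      (println (1 (name s)) " " (eval s))))
```

The loop at the end is roughly what a built-in 'dokeys' would hide: the prefix test replaces the check for nil values, and the implicit slice (1 ...) strips the underscore so the body sees plain strings.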



P.S. This way additional attributes can be held in the dictionary without being confused with user data. Consider 'total' in the bayes functions.
With newLISP you can grow your lists from the right side!

hsmyers

#1
Consider that the new implementation makes less work for me. Consider that your suggestions make vastly more work for me. Consider that you can code well enough to understand the undervalued 'bayes-train' function. Suggestion: roll your own. Not trying to be hostile here (really I'm not ;) ), but I was unaware that this new addition was just for you (well, I suppose it could be; how much did it cost, because I've got some ideas of my own...). Solution offered as baseline HAS to be generic. Nothing else will allow maximal freedom for all users. I am particularly unimpressed with the underscore notion. Leave my keys and my data alone. And never arrange for two keys to lead to the same result. Likewise, there doesn't seem to be anything wrong with dotree; anything familiar to VB programmers and Perl programmers wouldn't seem unnatural to casual programmers (what is a 'casual' programmer doing walking through a table in newLISP anyway?).
"Censeo Toto nos in Kansa esse decisse." - D. Gale "ℑ♥λ" - Toto

Cyril

#2
Quote from: "hsmyers": Solution offered as baseline HAS to be generic.


The problem here is that the current solution is not generic enough. It has a strange edge case:


> (table "abc" 123)
123
> (table "def" 456)
456
> (table "table" 7890)
nil


Of course, when we see code like this, we can just say «well, do not pass the string "table" as a key». But in reality the data are coming from the outside world. To me it is a very bad idea that data from the outside world can accidentally be mixed with an identifier inside my program. They are simply on different levels of abstraction.


Quote from: "hsmyers": Consider that your suggestions make vastly more work for me.


Can you please give an example? I have failed to imagine any pattern of usage that leads to vastly more work under my suggestion. I am not trying to be hostile either, honest! I am just curious. It seems that I have overlooked something important. Please help me grasp this!

hsmyers

#3
Don't want to code like you. Want to code like me. Having to do it your way may in fact not be more work (although I think it might be, let me think some more about that), but it will certainly feel as though it is. Not only does the new feature need to be generic, it must also carry as little trace of style as possible. That is not to say that it can't be elegant or even stylish (not sure what stylish code is; elegance is usually only identified when you see it, and thereafter); no, here it means those things that are idiosyncratic to the programmer (as I fear some of your suggestions are...). You shouldn't be able to tell at a glance who coded it (whatever it is). Hmmm, I seem to be wandering off the point again, sorry!



I now understand your meta-problem with 'table'. I'd have to call it a BUG. Although in your short example you could certainly 'let' the line of text into a string with little loss (I think...)



OK. So we fix the bug. The rest, including underscores, can be added to the bayes version of the code. After all, all you have to do is write a default function for table:table that does both the initializer as well as the dual fetch. Should be fine. Not sure what to tell you about DoKeys, except that it has that macro feeling about it, so I wish you good luck if you go that way.



BTW, does this work? (I've not set up the new version yet)


(dotree (key table)
  (println key " " (table key)))

Cyril

#4
Quote from: "hsmyers": Don't want to code like you. Want to code like me.


Would you please show an example of "coding like you"? Before posting my suggestion I tried to imagine any code that would be broken by it, and failed. Can you point out where I am wrong?


Quote from: "hsmyers": OK. So we fix the bug.


The point is "how"? If you can suggest a better way to fix this, I will be glad to hear it.


Quote from: "hsmyers": After all, all you have to do is write a default function for table:table that does both the initializer as well as the dual fetch.


I have written such a function! ;-) Five lines for the getter-setter and six lines for 'dodict' (now I think this name is better than 'dokeys'). But I believe that this pattern is just too common and deserves to be the default. I predict too many naive users falling into this... bug (or whatever you call it).


Quote from: "hsmyers": BTW, does this work? (I've not set up the new version yet)


(dotree (key table)
  (println key " " (table key)))


Nope! The 'key' variable contains a symbol, not a string, and the 'table' default function accepts strings, not symbols. You should write:


(dotree (key table)
  (println key " " (eval key)))

; or

(dotree (key table)
  (println key " " (table (name key))))


And in both cases the pair 'nil nil' will be printed among others.

hsmyers

#5
You miss the point about the bug---I was talking about how table in your example is used in two different ways, which apparently is not handled correctly. This is a BUG---up to Lutz to find and fix, not us.



That you can't imagine any code that would be broken isn't the point. First of all, this is a NEW feature; no one is using it yet, so there is nothing for your changes to break. Second, you just admitted to be new (at one point) to Lisp---what makes your imagination (based on intuition and experience) all that good at predicting what may or may not be? Third, I said that I'd have to go back and think about it---our discussion has made me skeptical of my own statements (two reasons to re-think).



About things that are too 'hard' for the novice: pointers in C are often difficult for the new programmer. The solution is called C#, which I wouldn't touch with a 10-foot pole. The best way for beginners to handle hard things is to continue to bash their heads against them until they either quit and go away or see the light. It is how beginners become better.

Jeff

#6
Contexts are a mixed object.  They are multi-purpose binary trees that are also used for implementation of scopes.  dotree works the way it does because it is iterating over a tree.  By storing functions in that tree, you are also creating lexically scoped functions - Paul Graham's "a hash full of closures."



If you want to use the new syntax, it is predicated on the default functor being a get/set function. If you want to be able to store table:table as a regular key, use the regular context syntax. It hasn't disappeared.
Jeff

=====

Old programmers don't die. They just parse on...

Artful code: http://artfulcode.net

Lutz

#7
- Regarding (table (current-line)), always check user input:



see here: http://www.youtube.com/watch?v=MJNJjh4jORY

and here: http://www.unixwiz.net/techtips/sql-injection.html



- The function 'context' will also fail when trying to set the default symbol, only 'sym' and 'define' can be used to change the default symbol. Now that we have protection with the new (<context> <key> <value>) syntax, we could take away protection from 'context' and allow it to change the default symbol too.




(context foo "foo" 999) => nil ; also protects

(set (sym "foo" foo) 999) => 999 ; only this works

(define foo:foo 999) => 999 ; this works too


- A solution for better iterating through keys and values is coming, and a way to set key-value pairs from a list (see Jeff's request) is done for 9.3.5.

cormullion

#8
Quote from: "Cyril": But in reality the data are coming from the outside world. To me it is a very bad idea that data from the outside world can accidentally be mixed with an identifier inside my program.


Quote from: "hsmyers": what is a 'casual' programmer doing walking through a table in newLISP anyway?


Interesting discussion!  



By the way, the business of having to put underscores in front of symbol names in contexts to avoid them being confused with other symbols is sufficiently basic to make it into sections 7.5-6 of the Introduction to newLISP (http://newlisp.org/introduction-to-newlisp.html). I also remember hoping to find a way of going through all the entries in a context that weren't functions...



Also by the way, I'm interested in the mention of the 'casual programmer'. I think that newLISP is suited to casual programmers as well as other types (although what is meant by 'casual' here: relaxed, careless, temporary, informal?). I like the way that Lutz usually manages to accommodate both, although I also recognise that rigour is sometimes a good thing to be forced to learn.

Cyril

#9
Oops! The forum went down while I was typing the answer. But I have saved it in a file. ;-)


Quote from: "hsmyers": This is a BUG---up to Lutz to find and fix, not us.


Lutz is BDFL, but this should not prevent us from searching for our own solutions, and even suggesting them to Lutz! ;-) Of course it is up to Lutz to accept or decline them.


Quote from: "hsmyers": You just admitted to be new (at one point) to Lisp


Since 1985 or so. ;-) But the true point is that I am new to newlisp. Everyone is new to newlisp, because newlisp is new! ;-) I believe (of course it is only my humble opinion) that the strength of newlisp is its cross of elegance from the lisp camp and ease of everyday tasks from the scripting camp. The mission of newlisp (again, only my opinion) is to deliver the strength of traditional lisp to users of perl, awk and even visual basic, and the practical usefulness of perl and visual basic to traditional lisp users.


Quote from: "hsmyers": I said that I'd have to go back and think about it---our discussion has made me skeptical of my own statements (two reasons to re-think).


Me too. In fact I am already seeing what is wrong in my suggestion. I am rethinking it now. Thank you for your critique!

Cyril

#10
Quote from: "Jeff": If you want to use the new syntax, it is predicated on the default functor being a get/set function.


This makes sense to me. Now I think that, while my critique of the current solution was relevant, my suggestion to change it was far too radical. I am now working on a compromise suggestion, to be done in a few hours.



In fact the current solution is in the best spirit of newlisp: a data structure in function position should work as the "most natural" getter and/or setter for that structure. (mylist 0) is shorthand for (nth 0 mylist), and (mydict "abc") is shorthand for (context mydict "abc"). I buy this. But then, is there a good reason why this analogy is incomplete? Why does (context mydict 'abc) work, but (mydict 'abc) doesn't? If we are using contexts as namespaces (objects, closures and so on), symbols are at least as natural a choice for keys as strings.

Cyril

#11
Quote from: "Lutz": Regarding (table (current-line)), always check user input


There are two main reasons to check user input: (1) when the input is later interpreted as code in the same or another language (the SQL injection case); (2) when some operation is undefined for some data (the division-by-zero case). Using a string entered by the user as a dictionary key is neither of them. In fact this behavior introduces a strange and unnatural case of "undefined behavior" that depends not on the data itself, but on the program text. If, for example, some implementation of dicts does not allow an empty string as a key, this is annoying but reasonable: some data are good, some are bad, and it depends on the data itself. But when the acceptance of data depends on variable names in the program, it is a very bad thing. Or at least I believe so.

Lutz

#12
Quote: By the way, the business of having to put underscores in front of symbol names in contexts to avoid them being confused with other symbols is sufficiently basic to make it into sections 7.5-6 of the Introduction to newLISP.


In general I don't see a need for this. A namespace should be dedicated to a specific usage. If you don't use program module contexts at the same time for hashing strings of unknown form, then there is no problem. Nobody would do this anyway.



Most of the time hashes will be used for keys with a known syntax (allowed characters etc.). Where this is not the case, the programmer should check that the key is not the default symbol.



I would say it is enough to include a warning about hashing the default symbol and, in the case where the namespace is saved using 'save', the strings "set" and "sym" too.
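A check of that kind might look roughly like the sketch below. The names reserved-keys and checked-put are made up for illustration, and the list of reserved strings is just the three mentioned above; treat it as an assumption about what needs guarding, not a complete rule.

```newlisp
(define table:table)

; Keys that should not be hashed into this namespace: the default symbol,
; plus "set" and "sym" for the case where the namespace is written out
; with 'save'. (reserved-keys and checked-put are hypothetical names.)
(set 'reserved-keys '("table" "set" "sym"))

(define (checked-put key value)
  (if (find key reserved-keys)
      (println "skipping reserved key: " key)
      (table key value)))

(checked-put "abc" 3)
(checked-put "table" 5)   ; rejected before it reaches the dictionary
```

The guard lives in one place, so input-reading loops can call checked-put instead of (table key value) directly.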

cormullion

#13
I got the underscore thing from you, Lutz :)



http://www.alh.net/newlisp/phpbb/viewtopic.php?t=892



Perhaps I should revisit that whole thing now.

Lutz

#14
In that post it was about natural language tokens, where pretty much any kind of string can occur, and then it is wise to prepend an underscore. 'bayes-train' is a very special-purpose function which creates many symbols from a list, so an individual check is not possible, and the function is frequently used on natural language tokens; this is why an underscore is prepended by default.



I would say at the moment don't change any of the documentation in this area. Let us see how the discussion plays out on this topic and how development proceeds, implementing more features in this area.



These are development versions and nothing is cast in stone at this point ;-)