Revisiting namespaces as hashes and the underscore _ character

Started by Lutz, March 20, 2008, 08:31:35 AM


Lutz

This is a longer post. It is necessary because associative memory access via hashes is such a central tool in any scripting language, and many newLISP beginners still have trouble understanding contexts.



After rethinking the issue discussed in this thread: http://www.alh.net/newlisp/phpbb/viewtopic.php?t=2211, I am now convinced that Cyril's suggestion is the better solution for implementing hashes with contexts: automatically prepend an underscore character to each key string, as 'bayes-train' already does.



What finally convinced me is that ease of use from an application point of view should take priority over internal consistency in how context symbols are created and treated.



Here is the usage again, including two new forms introduced in version 9.3.5:


(define Foo:Foo) ; create namespace and default functor

(Foo "bar" 123)
(Foo "bar") => 123

; create hashes from a batch - new

(Foo '(("var" 123) ("bar" 456) ("baz" 789)))

; return association list from hash namespace - new

(Foo) => (("bar" 456) ("baz" 789) ("var" 123))

; internally symbols are created with a prepended underscore

(symbols Foo) => (Foo:Foo Foo:_bar Foo:_baz Foo:_var)
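To make the four call forms above concrete, here is a small Python model of a hash namespace. It is only a sketch that mimics the behavior shown in the newLISP session; the class `HashNamespace` and its `symbol_names` method are hypothetical names for illustration, not part of newLISP. Every key is stored as `"_" + key`, while the default functor lives under the bare context name.

```python
class HashNamespace:
    """Toy model of a newLISP hash context such as Foo (not newLISP's actual code)."""

    def __init__(self, name):
        self.name = name
        # The default functor (Foo:Foo) is stored under the bare context name.
        self.symbols = {name: None}

    def __call__(self, *args):
        if not args:
            # (Foo) -> association list of all hash entries, sorted by key
            return sorted([k[1:], v] for k, v in self.symbols.items()
                          if k.startswith("_"))
        if len(args) == 1 and isinstance(args[0], list):
            # (Foo '(("var" 123) ...)) -> create several entries at once
            for key, value in args[0]:
                self.symbols["_" + key] = value
            return None
        if len(args) == 1:
            # (Foo "bar") -> look up "_bar"; None stands in for newLISP's nil
            return self.symbols.get("_" + args[0])
        # (Foo "bar" 123) -> store under "_bar" and return the value
        key, value = args
        self.symbols["_" + key] = value
        return value

    def symbol_names(self):
        # Rough analogue of (symbols Foo): qualified names, sorted
        return sorted(f"{self.name}:{k}" for k in self.symbols)


Foo = HashNamespace("Foo")
Foo("bar", 123)
print(Foo("bar"))          # 123
Foo([["var", 123], ["bar", 456], ["baz", 789]])
print(Foo())               # [['bar', 456], ['baz', 789], ['var', 123]]
print(Foo.symbol_names())  # ['Foo:Foo', 'Foo:_bar', 'Foo:_baz', 'Foo:_var']
```

As in the newLISP session, the batch form overwrites the earlier value of "bar", and the prefix stripping in the no-argument form is what keeps the underscores invisible to the caller.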


Contexts or namespaces in newLISP are used for 4 different purposes:



(1) program modules organizing code

(2) static non-volatile objects packaging functions and data

(3) method classes in FOOP (functional object oriented programming)

(4) hash-like associative memory access



(1), (2) and (3) are similar because all of them use the namespace to package related functions together; (2) also packages data inside the namespace. This was used in the past for object-oriented programming, but the approach has been largely dropped since FOOP was introduced in version 9.3.0. FOOP holds object data in normal LISP lists that point to a class context of method functions. FOOP objects can be anonymous and volatile, and they are fully memory-managed. Today (2) is recommended only for non-volatile objects, which are rarely deleted.



In (4) the namespace is used purely for associative memory access to pieces of data referred to by string keys. This differs from (1), (2) and (3), which always involve function definitions and whose namespace contents are typically edited directly by programmers. In hash-like usage, the contents are created automatically as data is stored.



Prepending the underscore is completely transparent to the user. The old way of creating namespace symbols or hashes with the (context <context> <key> <value>) or (sym ...) syntax is still valid, but less convenient: when hashes are created in large quantities from anonymous sources, a key can accidentally overwrite the default functor or the symbols 'set' and 'sym'.
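The collision risk can be illustrated with a small Python model in which a namespace is just a dict. This is a sketch, not how newLISP stores symbols internally; the reserved names "Foo", "set" and "sym" are taken from the paragraph above, and the helper `clobbered_names` is hypothetical. Loading untrusted keys verbatim can overwrite a reserved entry, while prefixed keys cannot, as long as no reserved name itself starts with an underscore.

```python
RESERVED = ("Foo", "set", "sym")  # names the post says must not be overwritten


def clobbered_names(keys, prefix):
    """Load keys into a fresh namespace dict and report which reserved names changed."""
    namespace = {name: "<" + name + ">" for name in RESERVED}
    for key in keys:
        # With prefix=True every key lands on "_" + key, outside the reserved names
        namespace["_" + key if prefix else key] = "data"
    return [name for name in RESERVED if namespace[name] != "<" + name + ">"]


incoming = ["apple", "set", "Foo", "pear"]      # keys from an anonymous source
print(clobbered_names(incoming, prefix=False))  # ['Foo', 'set']
print(clobbered_names(incoming, prefix=True))   # []
```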



Hashes can now also work together with namespaces created by 'bayes-train', whose tokens are likewise prefixed with underscores. Because hashes follow the same underscore convention, hash functions can now be applied to dictionaries created with 'bayes-train':


(set 'txt [text]This is some text with many different words in it.
The 'bayes-train' function can be used to count the words and create
a namespace use as a dictionary with word frequencies[/text])

(bayes-train (parse txt "\W" 0) 'Lex)

> (Lex) ( ...
 ("to" (1))
 ("train" (1))
 ("use" (1))
 ("used" (1))
 ("with" (2)) ...)

> (Lex "words")
(2)
> Lex:total
(35)
>
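The reason a 'bayes-train' dictionary and a hash can share one accessor is that both store keys as `"_" + token`. A rough Python model of that shared convention follows. It is an assumption-laden sketch, not newLISP's implementation: `train`, `lookup` and `as_alist` are hypothetical helpers, and `re.split(r"\W", ...)` stands in for `(parse txt "\W" 0)`. Counts are kept as one-element lists for the single training category, and the total is kept under an unprefixed "total" entry, mirroring Lex:total.

```python
import re
from collections import Counter

txt = ("This is some text with many different words in it.\n"
       "The 'bayes-train' function can be used to count the words and create\n"
       "a namespace use as a dictionary with word frequencies")


def train(tokens):
    """Count tokens into one category, bayes-train style (hypothetical model)."""
    lex = {}
    for token, n in Counter(tokens).items():
        lex["_" + token] = [n]        # e.g. "_words" -> [2]
    lex["total"] = [len(tokens)]      # unprefixed, like Lex:total
    return lex


def lookup(namespace, key):
    """Hash-style read: apply the same underscore rule as the hash accessor."""
    return namespace.get("_" + key)


def as_alist(namespace):
    """Like (Lex): only underscore entries, prefix stripped, sorted by key."""
    return sorted([k[1:], v] for k, v in namespace.items() if k.startswith("_"))


Lex = train(re.split(r"\W", txt))  # rough stand-in for (parse txt "\W" 0)
print(lookup(Lex, "words"))  # [2]
print(Lex["total"])          # [35]
```

With this tokenization the model also arrives at 35 tokens, because adjacent delimiters such as ". " and " '" produce empty strings. Note that `lookup(Lex, "total")` returns None: the unprefixed total is not a hash entry, which is exactly the separation the underscore provides.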


Prepending the underscore is completely transparent to the programmer. Because usage (1), (2) and (3) is never mixed with usage (4) in the same namespace (although that is possible), this is of no concern.



What is still missing is an iterator function for convenient key<->value access. The following can be used as a workaround:



(dolist (item (Foo))
   (println "key: " (item 0) " value: " (item 1))
)


This prints all key-value pairs, sorted by key.
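For comparison, the kind of iterator the post says is missing could look like this in a Python model where the context's symbol table is a plain dict. This is only a sketch of the idea, not a newLISP function; `hash_items` and `foo_symbols` are hypothetical names. It skips the default functor and every other name without the underscore prefix, strips the prefix, and yields pairs in sorted key order.

```python
def hash_items(symbols):
    """Yield (key, value) pairs for hash entries only, sorted by key."""
    for name in sorted(k for k in symbols if k.startswith("_")):
        yield name[1:], symbols[name]


# Symbol table of the Foo example above, modeled as a plain dict
foo_symbols = {"Foo": None, "_bar": 456, "_baz": 789, "_var": 123}

for key, value in hash_items(foo_symbols):
    print("key:", key, "value:", value)
```

Unlike the (Foo) workaround, this does not build the full association list before the loop starts, although it still has to sort the key names.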

hsmyers

#1
Quote from: Lutz
Prepending the underscore is completely transparent to the programmer. As namespace usage as in (1), (2), (3) versus (4) gets never mixed (although possible) this is of no concern.

That takes care of any objection I had when Cyril and I were going back and forth. What happens behind closed doors is none of my concern...
"Censeo Toto nos in Kansa esse decisse." ("I think, Toto, we are no longer in Kansas.") - D. Gale; "ℑ♥λ" - Toto