Main Menu

Unify

Started by Kazimir Majorinc, October 27, 2008, 02:03:33 AM

Previous topic - Next topic

Kazimir Majorinc

In my last blogpost I used built in function "unify". Unify tends to become bottleneck, and implementation is not trivial, so keeping it in the core language has alot of sense and it is advantage over CL and Scheme. Example:
> (unify '(X (a b) z) '(Y X Z))
((Y (a b)) (X (a b)) (Z z))

If Y is replaced with ((a b) 3), X with (a b) and (Z with z) then these two lists are equal.

Unification makes a difference between "variable" and "constant" symbols. In existing version "variable" symbols start with uppercase. I have low priority proposal for additional version with explicit list of "variable" symbols:
(unify1 '(X (a b) z) '(Y X Z) '() '(X Y Z)) ; third parameter is already used
Unify1 because unify as it is has its place and it might be better to keep many non-polymorphic alternatives like unify1, unify2, ... unify78 than to search for a "real thing". Low priority because although it is nice feature, it is still minor issue and not many Newlisp users have strong interest. So, v11 or v10.5 would be good enough.
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

cormullion

#1
I don't think I've seen people use unify in their code. I don't myself understand how it can be used for something practical - I'd love to see some applications, since it looks cool...!

Dmi

#2
In simple words, unification is a method to match structured data to a template so

- binded symbols and constants will be compared to corresponding data structure elements

- unbinded symbols will be binded (assigned to) with corresponding data structure elements

If mathcing has no conflicts, them unification is successful.



The power of unification is that there can be several templates which are applied sequentally until success.



Some examples from Erlang world:

(note that in Erlang "Capitalized" words are variables and "regular" words are just constant names which means themselves)



Function arguments unification:
handle(create, What, Where) -> create_object(What, Where, other params);
handle(extract,sysdate, dual) -> now();
handle(extract,"hello", dual) -> string("Welcome dear " User);  
handle(extract, What,Where) -> find_object(Where, What).


next we can call:

X=sysdate,
Y=dual,
hanle(extract, X, Y)


This will result in the call to function now().



Real example:
start(mysql) ->
    % make db connection
    {};
start(mnesia) ->
    % make db connection
    {};
start(odbc) ->
    % make db connection
    {};


Example with case statement:
case Var of
  {"create", What, Where} -> create_object(What, Where, other params);
  {"extract","sysdate", "dual"} -> now();
  {"extract","hello", "dual"} -> string("Welcome dear " User);  
  {"extract", What,Where} -> find_object(Where, What)
end.

Var can be touple {"extract", some-object, some-table} etc.



While this technics are available the code can became very elegant sometimes...
WBR, Dmi

cormullion

#3
Thanks! I'll study this to try understand it better (although I would prefer it translated into newLISP - it's probably the only language I'm familiar with).

Dmi

#4
Thinking about unify example for newLISP, I found, that Erlang style unification can perfectly fit newLISP.

The very cool usage proposal for unification in Erlang is extracting parameters from complex list/touple structures.

I.e.:
{IP, Port, {Type, Request}} = {"127.0.0.1", 8658, {"GET", "something"}}
will result in bindings:

IP = "127.0.0.1", Port = 8658 and Type= "GET" and Request = "something".

In the other hand, if pattern structure and bound variables/constants will not match, the unification will fail.



But, let's return to newLISP...



newLISP have a good proposal for wide usage of similar unification because it's major and the best data structure is a list, which can be structured.

Say,
("127.0.0.1", 8658, ("GET", "something"))

Unfortunately, current unify syntax is not so laconic as Erlang.

In short, I propose the new unify engine where

- unification variables (that's are bounded or may be binded) will be represented as regular newLISP symbols, prefixed with "~" (or another one).

The "bound" state of this variables must be handled implicitly. And I suspect, that this variables possible should be local to current function in stack.

- all other newLISP symbols, used in unify will be treated as constant, given by value.

- the short name (say "~") for "unify" ;-)

- the form of define (say define~) which defines the function like an Erlang/Prolog predicate.

- the form of case (say case~) which have variants based on unification patterns.

Consider this virtual example:
(set 'ip "127.0.0.1")
(set 'rq '('test-assign 1234))
(~ (~ip ~port ~request) (ip 8658 ('SET rq)))
(print ~ip ":" ~port)
# will print 127.0.0.1:8658
(if (member ~ip allowed-ip-list)
  (execute-command ~request))
....
(set 'something '("def" 25 4))
(case~ something
  (("abc" ~second) (do-something ~second))
  (("def" ~second ~third) (+ ~second ~third))
# will return 29
....

(define~ (execute-command ('GET ~what))
  (lookup ~what global-prop-list))

# this one will be called for our example
(define~ (execute-command ('SET (~what ~to-what)))
  (push (list ~what ~to-what) global-prop-list))

(define~ (execute-command ('CHECK (~what ~to-what)))
  (u~ (~what ~to-what) (assoc ~what global-prop-list)))
# here identical to
# (= (list ~what ~to-what) (assoc ~what global-prop-list))
# but in common may be much powerful

I mean' that when we call execute-command, newLISP will pass through all versions of it, trying to unify arguments, and execute the first where unification succeeded.



To Lutz: Probably, this innovation may introduce the new interesting technics into a language, and also will not break other features and preserve backward compatibility.
WBR, Dmi

cormullion

#5
Interesting - although I'm not sure whether the abstract syntax would find favour (cf backtick...!) ... :)



If this is  Erlang:


{IP, Port, {Type, Request}} = {"127.0.0.1", 8658, {"GET", "something"}}

is this the newLISP equivalent?


(bind (unify '(IP Port (Type Request)) '("127.0.0.1" 8658 ("GET" "something"))))

It's only a bit less laconic...

Dmi

#6
Yes, they are equivalents... but only if we haven't already binded symbols.



If so, in Erlang this will be:
{IP, _}={"127.0.0.1", something},
....
{IP, Port, {Type, Request}} = {"127.0.0.1", 8658, {"GET", "something"}}

this is the actual form. Whether the unification fails, the exeception is thrown (excluding the "case" variants etc.)



and the actual form in nwelisp:
(set 'meaningless (unify '(IP Something) '("127.0.0.1" "something")))
....
(let (meaningless2 (unify
  '(IP Port (Type Request))
  '("127.0.0.1" 8658 ("GET" "something"))
  meaningless)))
  (if meaningless2 (bind meaningless2)
      (do-some-if-unify-fails)))


The one of points of my proposal is introducing of new namespace for binded values, which lives together by newlisp laws and by unify laws.



And the other points are the "case" and "define" forms which can be emulated with macros, but this isn't so useful.
WBR, Dmi

Kazimir Majorinc

#7
Making character ~ in the name significant breaks Lisp syntax. There is little doubt that special syntax is prettier, shorter, more readable, etc. Compare



x1:=(-b+sqrt(b*b-4*a*c))/(2*a) and



(set 'x1 (div (add (sub b) (sqrt (sub (mul b b) (mul 4 a c)))) (mul 2 a))).



But, s-expression syntax is easier for analysis and synthesis by programs and it is probably the main idea of Lisp. Also, even if one replaces, say, ~name with (÷ name), there is still ambivalence because (unify (÷ x) (÷ y)) can be iterpreted as (1.) x and y are variables for unification but also (2) x is variable that should be unified with expression (÷ y) etc.
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

Dmi

#8
I don't see how usage of ~something symbols may break newlisp syntax.



I already can do (newlisp shell example)
(set '~something 1)
~something
1

The idea is that actually no syntax breaking will occurs: We already can legally use ~-prefixed symbols as regular symbols in newlisp (also $-prefixed, %-prefixed etc.).

But inside unify exactly thoose ~-prefixed symbols will be subject of binding, which is immediately affected outside unify (without bind function).

So newlisp engine in proposed case should maintain binding status for ~-prefixed symbols for usage inside unify function.



There is already conformism about prefixing in unify - binding variables a capitalized. Capitalising is common for Prolog and Erlang, because it is their syntax. But not for newlisp where capitalizing is not affected language symbols behavior.
WBR, Dmi

Kazimir Majorinc

#9
Quote from: "Dmi"I don't see how usage of ~something symbols may break newlisp syntax.


Because one who uses such unify should encode information (~) in variable name. It is not good. Variable names should be just variable names. Imagine language such that programs look like:


(eval2 '_10_x=1
       '_20_x=x+1
       '_30_print_x
       '_40_goto_20)


Formally, it is still Lisp, but essentially it is not Lisp any more. On the other hand, this one is Lisp:


(eval2 (10 (set 'x 1))
       (20 (set 'x (+ x 1)))
       (30 (print x))
       (40 (goto 20)))


QuoteThere is already conformism about prefixing in unify - binding variables a capitalized. Capitalising is common for Prolog and Erlang, because it is their syntax. But not for newlisp where capitalizing is not affected language symbols behavior.


Exactly.
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

Dmi

#10
Quote from: "Kazimir Majorinc"
...Because one who uses such unify should encode information (~) in variable name. It is not good. Variable names should be just variable names. Imagine language such that programs look like:

Not so exactly because such (~) variables will not have any special meaning in newlisp code - just variables, like any others, and such prefix will have no special meaning for them in newlisp.



But inside unify forms there is no newlisp rules - there are unification rules, which require that variables must differ from constants and should maintain binding status. Currently for this capitalizing is used.

I only point an attention that prefixing is more useful here if we want unify to be interoperable with newlisp core engine - because ~-prefixing is more rarely used than capitalizing in regular newlisp code.



I suggest such result:

If U want to code without unify, U simply code without unify, even using ~prefixed symbols.

If U want to use unify forms, then using of ~prefixed symbols is the simplest way to:

- get results from unify directly into the code

- issue subsequent unify with implicit maintaining of binding status.
WBR, Dmi

Kazimir Majorinc

#11
I think we have two issues here. One is denoting unification variables with ~. I agree that it is better than uppercase. Still, it is encoding of information in symbol names. Imagine you have expressions F and G, both containing lot of symbols. Say,
F=(a (b (c ....)

G=(((z (y ...)))
[/list]
Suppose that the problem is: is it possible to unify F and G with substitution of only one symbol from F? How can one solve that? He has to generate
F0=(~a (b (c ...)

F0=(a (~b (c ...)

F0=(a (b (~c ...)
[/list]
and then to try (unify F0 G) for each F0. Don't you see that it is encoding information in symbol name?



Another issue is that you want that unification is associated with binding. It is special use of unification, that's why Erlang code is shorter. Not only that you bind results of unification, but existing binding influences unification.



Such unibind combination has lot of sense, but unify without binding - as it is now - has sense as well. One might want to see how unification looks like and analyze it. Or, if result of unification is ((x 7)) he might want to substitute x with 7, not to set value of x to 7. I think proposed combination of binding and unification has sense with any form of unification, one that exist, and one I proposed with explicit list of "unification variables" but it shouldnt be only form of unification.
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

Dmi

#12
Quote from: "Kazimir Majorinc"I think we have two issues here. One is denoting unification variables with ~. I agree that it is better than uppercase. Still, it is encoding of information in symbol names. Imagine you have expressions F and G...

[skopped]

Don't you see that it is encoding information in symbol name?

Yes, it is encoding. Anyway there is three cases there:

1. To encode information in symbol names with capitalizing and providing a binding list (current engine).

pros: it is current

cons:

  - unify is not so useful for fast scripting (which is still the horse of newlisp in common),

  - capitalized names are frequent in regular practice, so there will be some problems to map unify bindings to newlisp symbols. (of course we can always use names like UnSymbolName to avoid interference, what means introducing prefix again ;)

2. To encode information in symblol names with rare used prefix (my proposal).

pros: we can map bindings directly to newlisp symbol values.

cons: we lost some virtual language purity (but note, that current unify implementation alog with "bind" suggests exactly this way).

3. To use only binding list for both setting initial binding and enumerating symbols which must be treated as variables. Someting like:
(unify '(A B C) '(B C A) '((A 1) (B unbound)))
so there C will be treated as constant.

pros: This is the most pure way.

cons: It is much for academic cases.



My proposal is based on my interest to newlisp: fast coding of complex practical systems.



And, in common analisis about the case I gree with you, Kazimir :)
WBR, Dmi