Unify

Kazimir Majorinc · October 27, 2008, 02:03:33 AM

In my last blogpost I used built in function "unify". Unify tends to become bottleneck, and implementation is not trivial, so keeping it in the core language has alot of sense and it is advantage over CL and Scheme. Example:

Code Select Expand
> (unify '(X (a b) z) '(Y X Z))
((Y (a b)) (X (a b)) (Z z))

If Y is replaced with ((a b) 3), X with (a b) and (Z with z) then these two lists are equal.

Unification makes a difference between "variable" and "constant" symbols. In existing version "variable" symbols start with uppercase. I have low priority proposal for additional version with explicit list of "variable" symbols:

Code Select Expand
(unify1 '(X (a b) z) '(Y X Z) '() '(X Y Z)) ; third parameter is already used

Unify1 because unify as it is has its place and it might be better to keep many non-polymorphic alternatives like unify1, unify2, ... unify78 than to search for a "real thing". Low priority because although it is nice feature, it is still minor issue and not many Newlisp users have strong interest. So, v11 or v10.5 would be good enough.

cormullion · October 27, 2008, 11:50:32 AM

I don't think I've seen people use unify in their code. I don't myself understand how it can be used for something practical - I'd love to see some applications, since it looks cool...!

Dmi · November 11, 2008, 03:56:27 AM

In simple words, unification is a method to match structured data to a template so

- binded symbols and constants will be compared to corresponding data structure elements

- unbinded symbols will be binded (assigned to) with corresponding data structure elements

If mathcing has no conflicts, them unification is successful.

The power of unification is that there can be several templates which are applied sequentally until success.

Some examples from Erlang world:

(note that in Erlang "Capitalized" words are variables and "regular" words are just constant names which means themselves)

Function arguments unification:

Code Select Expand
handle(create, What, Where) -> create_object(What, Where, other params);
handle(extract,sysdate, dual) -> now();
handle(extract,"hello", dual) -> string("Welcome dear " User);   
handle(extract, What,Where) -> find_object(Where, What).

next we can call:

Code Select Expand

X=sysdate,
Y=dual,
hanle(extract, X, Y)

This will result in the call to function now().

Real example:

Code Select Expand
start(mysql) ->
    % make db connection
    {};
start(mnesia) ->
    % make db connection
    {};
start(odbc) ->
    % make db connection
    {};

Example with case statement:

Code Select Expand
case Var of
  {"create", What, Where} -> create_object(What, Where, other params);
  {"extract","sysdate", "dual"} -> now();
  {"extract","hello", "dual"} -> string("Welcome dear " User);   
  {"extract", What,Where} -> find_object(Where, What)
end.

Var can be touple {"extract", some-object, some-table} etc.

While this technics are available the code can became very elegant sometimes...

cormullion · November 11, 2008, 08:33:06 AM

Thanks! I'll study this to try understand it better (although I would prefer it translated into newLISP - it's probably the only language I'm familiar with).

Dmi · November 14, 2008, 04:24:58 PM

Thinking about unify example for newLISP, I found, that Erlang style unification can perfectly fit newLISP.

The very cool usage proposal for unification in Erlang is extracting parameters from complex list/touple structures.

I.e.:

Code Select Expand
{IP, Port, {Type, Request}} = {"127.0.0.1", 8658, {"GET", "something"}}

will result in bindings:

IP = "127.0.0.1", Port = 8658 and Type= "GET" and Request = "something".

In the other hand, if pattern structure and bound variables/constants will not match, the unification will fail.

But, let's return to newLISP...

newLISP have a good proposal for wide usage of similar unification because it's major and the best data structure is a list, which can be structured.

Say,

Code Select Expand
("127.0.0.1", 8658, ("GET", "something"))

Unfortunately, current unify syntax is not so laconic as Erlang.

In short, I propose the new unify engine where

- unification variables (that's are bounded or may be binded) will be represented as regular newLISP symbols, prefixed with "~" (or another one).

The "bound" state of this variables must be handled implicitly. And I suspect, that this variables possible should be local to current function in stack.

- all other newLISP symbols, used in unify will be treated as constant, given by value.

- the short name (say "~") for "unify" ;-)

- the form of define (say define~) which defines the function like an Erlang/Prolog predicate.

- the form of case (say case~) which have variants based on unification patterns.

Consider this virtual example:

Code Select Expand
(set 'ip "127.0.0.1")
(set 'rq '('test-assign 1234))
(~ (~ip ~port ~request) (ip 8658 ('SET rq)))
(print ~ip ":" ~port)
# will print 127.0.0.1:8658
(if (member ~ip allowed-ip-list)
  (execute-command ~request))
....
(set 'something '("def" 25 4))
(case~ something
  (("abc" ~second) (do-something ~second))
  (("def" ~second ~third) (+ ~second ~third))
# will return 29
....

(define~ (execute-command ('GET ~what))
  (lookup ~what global-prop-list))

# this one will be called for our example
(define~ (execute-command ('SET (~what ~to-what)))
  (push (list ~what ~to-what) global-prop-list))

(define~ (execute-command ('CHECK (~what ~to-what)))
  (u~ (~what ~to-what) (assoc ~what global-prop-list)))
# here identical to
# (= (list ~what ~to-what) (assoc ~what global-prop-list))
# but in common may be much powerful

I mean' that when we call execute-command, newLISP will pass through all versions of it, trying to unify arguments, and execute the first where unification succeeded.

To Lutz: Probably, this innovation may introduce the new interesting technics into a language, and also will not break other features and preserve backward compatibility.

cormullion · November 15, 2008, 01:07:50 AM

Interesting - although I'm not sure whether the abstract syntax would find favour (cf backtick...!) ... :)

If this is Erlang:

Code Select Expand
{IP, Port, {Type, Request}} = {"127.0.0.1", 8658, {"GET", "something"}}

is this the newLISP equivalent?

Code Select Expand
(bind (unify '(IP Port (Type Request)) '("127.0.0.1" 8658 ("GET" "something"))))

It's only a bit less laconic...

Dmi · November 15, 2008, 02:20:18 AM

Yes, they are equivalents... but only if we haven't already binded symbols.

If so, in Erlang this will be:

Code Select Expand
{IP, _}={"127.0.0.1", something},
....
{IP, Port, {Type, Request}} = {"127.0.0.1", 8658, {"GET", "something"}}

this is the actual form. Whether the unification fails, the exeception is thrown (excluding the "case" variants etc.)

and the actual form in nwelisp:

Code Select Expand
(set 'meaningless (unify '(IP Something) '("127.0.0.1" "something")))
....
(let (meaningless2 (unify
  '(IP Port (Type Request))
  '("127.0.0.1" 8658 ("GET" "something"))
  meaningless)))
  (if meaningless2 (bind meaningless2)
      (do-some-if-unify-fails)))

The one of points of my proposal is introducing of new namespace for binded values, which lives together by newlisp laws and by unify laws.

And the other points are the "case" and "define" forms which can be emulated with macros, but this isn't so useful.

Kazimir Majorinc · November 15, 2008, 07:38:54 AM

Making character ~ in the name significant breaks Lisp syntax. There is little doubt that special syntax is prettier, shorter, more readable, etc. Compare

x1:=(-b+sqrt(b*b-4*a*c))/(2*a) and

(set 'x1 (div (add (sub b) (sqrt (sub (mul b b) (mul 4 a c)))) (mul 2 a))).

But, s-expression syntax is easier for analysis and synthesis by programs and it is probably the main idea of Lisp. Also, even if one replaces, say, ~name with (÷ name), there is still ambivalence because (unify (÷ x) (÷ y)) can be iterpreted as (1.) x and y are variables for unification but also (2) x is variable that should be unified with expression (÷ y) etc.

Dmi · November 15, 2008, 12:13:42 PM

I don't see how usage of ~something symbols may break newlisp syntax.

I already can do (newlisp shell example)

Code Select Expand
(set '~something 1)
~something
1

The idea is that actually no syntax breaking will occurs: We already can legally use ~-prefixed symbols as regular symbols in newlisp (also $-prefixed, %-prefixed etc.).

But inside unify exactly thoose ~-prefixed symbols will be subject of binding, which is immediately affected outside unify (without bind function).

So newlisp engine in proposed case should maintain binding status for ~-prefixed symbols for usage inside unify function.

There is already conformism about prefixing in unify - binding variables a capitalized. Capitalising is common for Prolog and Erlang, because it is their syntax. But not for newlisp where capitalizing is not affected language symbols behavior.

Kazimir Majorinc · November 15, 2008, 04:34:47 PM

~~Quote from: "Dmi"~~I don't see how usage of ~something symbols may break newlisp syntax.

Because one who uses such unify should encode information (~) in variable name. It is not good. Variable names should be just variable names. Imagine language such that programs look like:

Code Select Expand
(eval2 '_10_x=1 
       '_20_x=x+1 
       '_30_print_x 
       '_40_goto_20)

Formally, it is still Lisp, but essentially it is not Lisp any more. On the other hand, this one is Lisp:

Code Select Expand
(eval2 (10 (set 'x 1))
       (20 (set 'x (+ x 1)))
       (30 (print x))
       (40 (goto 20)))

~~Quote~~There is already conformism about prefixing in unify - binding variables a capitalized. Capitalising is common for Prolog and Erlang, because it is their syntax. But not for newlisp where capitalizing is not affected language symbols behavior.

Exactly.

Dmi · November 16, 2008, 03:40:52 AM

~~Quote from: "Kazimir Majorinc"~~
...Because one who uses such unify should encode information (~) in variable name. It is not good. Variable names should be just variable names. Imagine language such that programs look like:

Not so exactly because such (~) variables will not have any special meaning in newlisp code - just variables, like any others, and such prefix will have no special meaning for them in newlisp.

But inside unify forms there is no newlisp rules - there are unification rules, which require that variables must differ from constants and should maintain binding status. Currently for this capitalizing is used.

I only point an attention that prefixing is more useful here if we want unify to be interoperable with newlisp core engine - because ~-prefixing is more rarely used than capitalizing in regular newlisp code.

I suggest such result:

If U want to code without unify, U simply code without unify, even using ~prefixed symbols.

If U want to use unify forms, then using of ~prefixed symbols is the simplest way to:

- get results from unify directly into the code

- issue subsequent unify with implicit maintaining of binding status.

Kazimir Majorinc · November 16, 2008, 07:29:24 AM

I think we have two issues here. One is denoting unification variables with ~. I agree that it is better than uppercase. Still, it is encoding of information in symbol names. Imagine you have expressions F and G, both containing lot of symbols. Say,

F=(a (b (c ....)

G=(((z (y ...))) [/list]Suppose that the problem is: is it possible to unify F and G with substitution of only one symbol from F? How can one solve that? He has to generate

F0=(~a (b (c ...)

F0=(a (~b (c ...)

F0=(a (b (~c ...)[/list]and then to try (unify F0 G) for each F0. Don't you see that it is encoding information in symbol name?

Another issue is that you want that unification is associated with binding. It is special use of unification, that's why Erlang code is shorter. Not only that you bind results of unification, but existing binding influences unification.

Such unibind combination has lot of sense, but unify without binding - as it is now - has sense as well. One might want to see how unification looks like and analyze it. Or, if result of unification is ((x 7)) he might want to substitute x with 7, not to set value of x to 7. I think proposed combination of binding and unification has sense with any form of unification, one that exist, and one I proposed with explicit list of "unification variables" but it shouldnt be only form of unification.

Dmi · November 16, 2008, 01:29:11 PM

~~Quote from: "Kazimir Majorinc"~~I think we have two issues here. One is denoting unification variables with ~. I agree that it is better than uppercase. Still, it is encoding of information in symbol names. Imagine you have expressions F and G...

[skopped]

Don't you see that it is encoding information in symbol name?

Yes, it is encoding. Anyway there is three cases there:

1. To encode information in symbol names with capitalizing and providing a binding list (current engine).

pros: it is current

cons:

- unify is not so useful for fast scripting (which is still the horse of newlisp in common),

- capitalized names are frequent in regular practice, so there will be some problems to map unify bindings to newlisp symbols. (of course we can always use names like UnSymbolName to avoid interference, what means introducing prefix again ;)

2. To encode information in symblol names with rare used prefix (my proposal).

pros: we can map bindings directly to newlisp symbol values.

cons: we lost some virtual language purity (but note, that current unify implementation alog with "bind" suggests exactly this way).

3. To use only binding list for both setting initial binding and enumerating symbols which must be treated as variables. Someting like:

Code Select Expand
(unify '(A B C) '(B C A) '((A 1) (B unbound)))

so there C will be treated as constant.

pros: This is the most pure way.

cons: It is much for academic cases.

My proposal is based on my interest to newlisp: fast coding of complex practical systems.

And, in common analisis about the case I gree with you, Kazimir :)

newLISP Fan Club

News:

Unify

Kazimir Majorinc

cormullion

Dmi

cormullion

Dmi

cormullion

Dmi

Kazimir Majorinc

Dmi

Kazimir Majorinc

Dmi

Kazimir Majorinc

Dmi