Alternative JSON -> newLISP conversion

Started by Kirill, November 12, 2009, 07:01:49 AM


Kirill

Hello!



There is a good module for converting between JSON and newLISP data structures. It's clean and very elegant. Unfortunately, it is not usable for converting bigger JSON structures to newLISP: it takes a long time and eventually crashes.



My thought was: hey, JSON looks very much like LISP. Sure, it has [ and {, but can't they be replaced with regular parens? If I just replace those brackets and braces with parens and replace the commas and colons with spaces, why wouldn't newLISP be able to parse it? So here's a very quick and dirty hack that implements most of JSON. It lacks support for objects/dictionaries (they are treated just like arrays for now), and I should probably check that the numbers are valid.



(set 'jsstr (trim (read-file "/home/km/conf/gcontacts.json")))
(set 'lspstr "")
(dostring (c jsstr)
          (let (x (char c))
            (if
              (= x "[") (write-buffer lspstr "(")
              (= x "{") (write-buffer lspstr "(")
              (= x "}") (write-buffer lspstr ")")
              (= x "]") (write-buffer lspstr ")")
              (= x ":") (write-buffer lspstr " ")
              (= x ",") (write-buffer lspstr " ")
                        (write-buffer lspstr x))))
; TODO: support for dicts, check that numbers are fine
(set 'result (eval-string (string "'" lspstr)))
(set-ref-all 'false result nil)
(set-ref-all 'null result nil)
;(println result)



-rw-r-----  1 km  km  898372 Nov 12 02:20 /home/km/conf/gcontacts.json


Parsing this on my machine takes under one second:



0.890u 0.022s 0:00.93 97.8%     313+854k 0+0io 0pf+0w


Trying the same with json.lsp's Json:json->lisp:



ERR: call or result stack overflow : = <805ADC0>
called from user defined function Json:read-string
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from user defined function Json:tokenize
called from
18.257u 8.775s 0:28.32 95.4%    314+896k 0+0io 0pf+0w


There's one more thing to take care of: []{},: might be inside quoted strings. They shouldn't be touched there. But that should be an easy fix.
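A quote-aware variant of the loop above could track whether the scanner is inside a string. This is only a sketch: it treats a quote preceded by a single backslash as escaped, so a sequence like \\" (escaped backslash followed by a closing quote) would still fool it, and the sample input is illustrative.

```newlisp
; transcode JSON to s-expression syntax, but leave []{},:
; alone when they appear inside quoted strings
(set 'jsstr "{\"a\": [1, \"x,y:z\"], \"b\": 2}") ; sample input
(set 'lspstr "")
(set 'in-str nil)   ; are we inside a quoted string?
(set 'prev "")      ; previous character, to spot escaped quotes
(dostring (c jsstr)
  (let (x (char c))
    (if (and (= x "\"") (!= prev "\\"))
        (set 'in-str (not in-str)))
    (if in-str
        (write-buffer lspstr x)          ; copy string contents verbatim
        (case x
          ("[" (write-buffer lspstr "("))
          ("{" (write-buffer lspstr "("))
          ("]" (write-buffer lspstr ")"))
          ("}" (write-buffer lspstr ")"))
          (":" (write-buffer lspstr " "))
          ("," (write-buffer lspstr " "))
          (true (write-buffer lspstr x))))
    (set 'prev x)))
(println lspstr)
```

The commas and colon inside "x,y:z" survive untouched, while the structural ones are converted as before.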



Kirill

Lutz

#1
If you go that route, I wonder if it would be much faster to use plain 'replace' to swap out those brackets, braces, colons and commas. You don't even need the regular expression option, making it even faster:


(replace "[" jsstr "(")
(replace "]" jsstr ")")
(replace "{" jsstr "(")
(replace "}" jsstr ")")
(replace ":" jsstr " ")
(replace "," jsstr " ")

Kirill

#2
I forgot one thing. Those []{},: might be inside strings as well. That's why I wanted to scan the string character by character. But then something distracted me, and I ended up with what you see :)

In any case - this was just a proposal for an alternative way to handle JSON data.

itistoday

#3
Quote from: "Kirill"I forgot one thing. Those []{},: might be inside strings as well. That's why I wanted to scan the string character by character. But then something distracted me, and I ended up with what you see :)

In any case - this was just a proposal for an alternative way to handle JSON data.


Awesome!



If you can get this to support JSON fully I'll add it to Dragonfly! :-)
Get your Objective newLISP groove on.

itistoday

#4
BTW, I think this is another great example that demonstrates the need for anonymous data structures in newLISP. There's no equivalent to JSON's/Clojure's anonymous dictionaries { key value }. A specific syntax to distinguish arrays from lists would also be nice (like Clojure's [a b c]).



How you're going to get those JSON dictionaries and objects converted properly I don't know, but I bet it's not going to be pretty. You might have to resort to using the slow and inappropriate associative lists...

Lutz

#5
The issues involved cannot be discussed without looking at the applications side and how JSON objects are used. The problems then are very similar to those when handling large XML files.



For smaller objects/arrays (up to about a hundred entries), association lists and simple lists are the right way to go. There is a reason association lists are so central in LISP: they are handled efficiently, and most association lists handled in a project are small. The same is true for lists versus arrays.



The few big key:value lists loaded from a JSON file can easily be converted to dictionaries in newLISP in one statement:
((new Tree 'Dictionary) '(("a" 1) ("b" 2) ("c" 3)))
Vice versa, an association list can be generated from the dictionary in one statement:
(Dictionary) => (("a" 1) ("b" 2) ("c" 3))
Typically those conversions would be done only once in a program, when loading or saving a bigger JSON data set.
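A concrete round trip following the two statements above might look like this (the key "d" and value 4 are just illustrative):

```newlisp
; load an association list into a Tree-based dictionary,
; use it, then get an association list back
((new Tree 'Dictionary) '(("a" 1) ("b" 2) ("c" 3)))
(Dictionary "b")      ; fast tree lookup => 2
(Dictionary "d" 4)    ; insert a new key/value pair
(Dictionary)          ; back to an association list
```

Because the dictionary is a context, lookups and inserts go through the tree rather than a linear list scan.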



The situation with arrays is similar. Look at arrays as lists optimized for random access. Normally it is best to start a project using only lists, then optimize by converting specific lists to arrays where random access is needed.



The fact that lists and arrays look the same (except during creation) is very practical: typically only one place in the code needs to be changed for the conversion.



In most applications there will be only one or very few large arrays (if at all) and those are easily converted once during saving and loading of a JSON data file.



Ps: Jeff's json.lsp module is well written and usable for small to medium JSON files. Only parts of the 'tokenize' function would have to be converted from a recursive to an iterative pattern to handle bigger JSON databases. The other functions can stay recursive.

Kirill

#6
Quote from: "Lutz"Ps: Jeff's json.lsp module is well written and usable for small to medium JSON files. Only parts of the 'tokenize' function would have to be converted from a recursive to an iterative pattern to handle bigger JSON databases. The other functions can stay recursive.


I second that. It's a pleasure to read Jeff's code. Having thought a bit about importing bigger JSON structures, I think it's better to use json.lsp. In my case I can use the approach I posted above, because the JSON structure will come from another script, and I know it won't contain surprises.



-- Kirill

itistoday

#7
Quote from: "Lutz"The issues involved cannot be discussed without looking at the applications side and how JSON objects are used.


They can't in newLISP, and that's exactly the problem, as in other languages this problem can be solved efficiently in a generic fashion, without knowing in advance the structure of the JSON data. In newLISP associative lists must be used in all cases, even when it would be far better to use hashtables/red-black trees.


QuoteThe few big key:values lists loaded from a JSON files can easily be converted to dictionaries in newLISP in one statement:
((new Tree 'Dictionary) '(("a" 1) ("b" 2) ("c" 3)))


What if you're dealing with a JSON data source that can have multiple large dictionaries, and you don't know where they're going to be?



That example is inappropriate for a generic JSON module.

Lutz

#8
No language knows in advance the type structure of the JSON string it is going to parse, but will do so during the parsing process. You can insert type identifiers into the s-expr during parsing and end up with something like this:


((vector (1 2 "foo" 3 4 5)) (association (("a" 1) ("b" 1))))

Because JSON handles only two composite data types (objects and vector-arrays), this is all you need. In the real world I have never seen a necessity for this kind of tagging. You always know enough about the structure of the data you are dealing with.



It's always possible to come up with edge cases and scenarios to argue that anonymous dictionaries are a necessity, but in practice you solve real-world problems just fine without them.



Here is a JSON parser in newLISP:



http://www.newlisp.org/syntax.cgi?code/json.lsp.txt



Most of this: http://json.org/ is implemented. The tokenizer (just a 'find-all') doesn't know all escaped characters, but covers most of them. One could pre-compile the patterns with 'regex-comp' and inline 'is-string' and 'is-number' for top performance.

itistoday

#9
Quote from: "Lutz"No language knows in advance the type structure of the JSON string it is going to parse, but will do so during the parsing process. You can insert type identifiers into the s-expr during parsing and end up with something like this:


((vector (1 2 "foo" 3 4 5)) (association (("a" 1) ("b" 1))))

Because JSON handles only two composite data types (objects and vector-arrays), this is all you need. In the real world I have never seen a necessity for this kind of tagging. You always know enough about the structure of the data you are dealing with.


Since when was it decided that lists are "the superior race of data structures"? I don't understand this over-reliance on using lists to solve all the problems.



Why are lists being used to represent maps and arrays, when maps and arrays are far more appropriate?



It is rare when this sort of one-size-fits-all mentality leads to an efficient or successful result. When it really does "fit all" then it's OK, but here clearly there is a far more elegant, faster, and more appropriate solution.



It is like trying to water a plant with beer. Sure, it could work, but water would probably lead to better results. The same dilemma arises when applying the list data structure to data that does not suit it.

Kirill

#10
Hello again, all.



The new standard json module (http://www.newlisp.org/code/modules/json.lsp) works pretty well.



In the docs (http://www.newlisp.org/code/modules/json.lsp.html), the function is referred to as jason2sexpr, while it really should be json2expr.



Thanks.



-- Kirill

kanen

#11
I also posted a Json.lsp Module on my web site, which goes from JSON->LISP and LISP->JSON.



It's a modified version of Lutz' module, updated and with code from John DeSanto (former kozoru Team Member).



It probably makes sense to fold the changes back into the "official" newLisp module.



http://www.lifezero.org/journal/2011/1/10/newlisp-update.html
. Kanen Flowers http://kanen.me .