Newbie question: RDBMS versus newLISP?

Started by gloodnc, October 28, 2007, 04:10:32 PM

Previous topic - Next topic

gloodnc

Hi,



I'm new to newLISP and slowly working my way through the tutorials. Soon I will be working with large transaction logs that need to be examined for relationships and data mining (which is too heavy a definition for the type of simple analysis that I'm supposed to be doing).



Typically, the work flow recommended to me has been to load the transaction logs into a RDBMS of some type (Oracle, sqlite3, mysql, monetdb, etc) and then run SQL queries against it. After learning more about newLISP, why could I not have a newLISP script run directly against the (text) transaction logs?



Each set of logs will be in the range of 10-35million rows with no more than 15 columns of information. What do RDBMS applications offer over newLISP (or other scripting language)?





Thanks in advance.

newdep

#1
Hello gloodnc,



It actualy all depends on what you want from the logs?



If you use SQL from within the database, thats quick and

flexible enough. If you want to parse and process individualy logs then

another language could be a good alternative indeed! being independent

from SQL.



35 Million rows is not a problem (except if you want to have them processed

within a minute ;-)



Norman.
-- (define? (Cornflakes))

cormullion

#2
I've been working with sqlite3 recently, and I liked it because it was easy to do quite complicated queries that didn't require too much thought. It seemed sensible to let sqlite concentrate on what it was good at, and leave newlisp to do what it was good at.



However - this doesn't really answer the question objectively, as to which is better at which task!



At the time, I did appreciate the fact that sqlite queries are easily built and tested - you can even test them in another application, or get help from sql-speaking folk. Perhaps you can try both out and see which shapes up...

Jeff

#3
The primary reason to use a relational database over text files is that text parsing is expensive and flaky.  A database product like MySQL is going to provide stable, reliable access to data, whereas , especially for large data sets, text files will require a lot of cpu time to read/parse, search, etc.



Generally, we rely on regular expressions to search text, which is also an expensive operation.



The fastest way to parse a regular text would be to write a lexer for that text, but you still would then have to store the data and create optimized search routines.



The community has spent years optimizing mysql's and sqllite's performance.  Of course, a row-based solution like most sql dbs (mysql and oracle included) isn't the fastest for searching.  Row-based storage is fastest for inserts and updates.  A column-oriented approach is fastest for searches, since the fields desired are all stored together, and you are essentially searching a text index of each row (http://en.wikipedia.org/wiki/Column-oriented_DBMS">http://en.wikipedia.org/wiki/Column-oriented_DBMS).
Jeff

=====

Old programmers don\'t die. They just parse on...



http://artfulcode.net\">Artful code

gloodnc

#4
Thanks for the feedback folks. Given the time constraints, I only have a chance to use one or the other approach -- I'll stick with the time tested extract/load/transform with a column-orientated database since I'm looking for fast searches and not looking to modify the transaction logs.





Ciao.

cormullion

#5
You can easily run sqlite/MySQL/Postgres/whatever under the control of newLISP? Then you get the best of both worlds...!

Jeff

#6
You can use newLisp to connect to your database and do the logic end of mining the data.  Generally, sql is not going to be powerful enough to do real analysis.  It can narrow your collection, but not make any intuitive choices about what to do with it.  Data returned from a query will become a list in newLisp, which can then be operated on using match, unify, or prolog-like logic.
Jeff

=====

Old programmers don\'t die. They just parse on...



http://artfulcode.net\">Artful code