C from newlisp?

Started by unixtechie, October 07, 2009, 03:02:15 AM


unixtechie

I saw some quoted discussion in Kazimir's blog, and there's something maybe relevant to the wider topic of how to use C from scripting languages in general, including newlisp.



There is an excellent open-source project called "tcc", a tiny C compiler. The binary is slightly over 100k (123k on my platform).

It understands all of ANSI C plus some extensions; it can compile libraries or standalone executables, and it can be used for "scripting" if the first line of the file is a shebang invocation such as #!/path/to/tcc -run (plus any further options, e.g. library includes).

TCC also contains an assembler.



The main point is that it compiles roughly an order of magnitude faster than GCC (the ratio was 9 in a test compilation of the source of the links web browser, if I am not mistaken).



It also compiles on Windows, i.e. it's cross-platform.



HOME PAGE: http://bellard.org/tcc/



--------------

THEREFORE there are basically several options if one wants to use C or even assembler from his scripting language.



(a) write your C, then invoke "tcc -run", piping the output back into your script.

"tcc -run ......" will compile it on the fly and not create an "a.out".

Alternatively, use a simple wrapper that checks whether your inline code was already compiled and reuses the generated binary, saved as a small file, so that compilation is not repeated on subsequent runs.



(b) write your C, then compile it with tcc as a library; use the newlisp built-in function to load the tiny lib you created on the fly and talk to it using newlisp facilities.



(c) TCC itself can be compiled as a static libtcc.a

Its APIs are outlined in its header file. It is possible, generally speaking, to produce an extended version of newlisp with this lib compiled in (just the way a library that implements httpd or some hashing is compiled into it).

I do not believe it is the best way, though, because of the need to learn a whole bunch of API functions and because, while not fattening the newlisp binary that much, such an add-on would prevent newlisp from remaining a standalone exec: it would tie it to some other files (e.g. headers), i.e. it would require a "system installation". This lack of dependencies is one feature that makes newlisp drastically different from (and better than) most scripting languages, in my view.
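As a rough illustration of option (b), the C side can be as small as one exported function. This is only a sketch under assumptions: the file name fib.c and the library name libfib.so are made up for the example. It would be built with something like `tcc -shared -o libfib.so fib.c` and then loaded from newlisp with the built-in `import`, e.g. `(import "./libfib.so" "fib")` followed by `(fib 20)`; exact paths and library suffixes depend on the platform.

```c
/* fib.c: hypothetical file name; a tiny library meant to be imported
   by newlisp rather than run as a program, so it has no main(). */

/* Plain int in, plain int out keeps the function easy to call
   through a foreign-function interface. */
int fib(int n)
{
    if (n <= 2)
        return 1;
    return fib(n - 1) + fib(n - 2);
}
```

Compared with option (a), no extra process is started per call; the compile step happens once, when the library is built.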



The amount of wrapping of the C code to be used with such a tiny fast on-the-fly compiler should be negligible, and I would say in most cases of practical use the need to run an extra process for the C sections will not affect the usability of the script.



Two more major points:

- tcc is so small, it generates straightforward stuff in microseconds (5-8 microseconds for something like the Fibonacci test prog). One CAN use that for DYNAMIC generation of C code from your script, not only for pre-compilation of some static parts of a program

- tcc can help in using external libraries which are difficult to use from newlisp itself, directly. One can write a simple wrapper in a few lines which will present the result of an invocation of library functions in the form convenient for passing over to newlisp (e.g. as a string or some list, whatever).
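To make the wrapper idea concrete, here is a minimal sketch in C. The standard C time functions stand in for "a library that is awkward to call directly" purely for illustration (they return a struct), and the function name date_as_list is hypothetical. The wrapper flattens the struct into a short string shaped like a Lisp list, which a script could capture and parse.

```c
#include <stdio.h>
#include <time.h>

/* Writes the UTC date of `t` into `buf` as "(year month day)".
   Returns 0 on success, -1 if the time cannot be converted or the
   buffer is too small. The name and output format are assumptions
   made for this example. */
int date_as_list(time_t t, char *buf, size_t size)
{
    struct tm *utc = gmtime(&t);
    int written;

    if (utc == NULL)
        return -1;

    written = snprintf(buf, size, "(%d %d %d)",
                       utc->tm_year + 1900, utc->tm_mon + 1, utc->tm_mday);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

A small main() that prints the buffer followed by a newline would make this usable with "tcc -run", and the same function could equally be exported from a library.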





----

There is another project (I'll check the name and add it), which fakes the same scripting approach. The "script" in C is in fact passed to the full GCC compiler on the first invocation, and the compiled a.out is called on subsequent ones.

This of course is (a) slower and (b) much heavier on IO at least.



So I believe the TCC road is practical and will cover most real-life uses: take a blazingly fast C compiler, which can of course link with any existing C libraries, and use it to write convenient wrappers whenever newlisp's operators run out, and/or to write pseudo-inlined sections of code in C or assembler that can then be used from the scripting language.



-----

P.S. Python people already went that way, as a matter of fact:

http://www.cs.tut.fi/~ask/cinpy/

"Cinpy is a Python library that allows you to implement functions with C in Python modules. The functions are compiled with tcc (Tiny C Compiler) in runtime. The results are made callable in Python through the ctypes library."

m35

#1
Ah, I see they still haven't fixed the bug I reported 2 years ago (http://lists.gnu.org/archive/html/tinycc-devel/2007-07/msg00013.html). :)



But it's good to see that project has made progress.

xytroxon

#2
I've played with TCC, but newLISP is faster!!!



(At least for n < 20 on Win 98 ;p)



; "tcc-fibo.nl"
; Fibonacci test for newLISP to TCC interface
; Tiny C Compiler (TCC) http://bellard.org/tcc/
;
; Author:        xytroxon
; Platform(s):   Win98
; Date (Y-M-D):  2009-01-01

; TCC installed (unzipped) by user at C:\Program Files\tcc
(set 'tcc_path (string (env "PROGRAMFILES") {\tcc\}))

(define (tcc-exe)
  (set 'tcc_args (join (map string (args)) " "))
  (set 'tcc_cmd (format {"%stcc.exe" %s} tcc_path tcc_args))
; (println tcc_cmd) ; for debugging command line
  (exec tcc_cmd)
)

; newLISP recursive Fibonacci method
(define (fibr n)
  (if (<= n 2) 1
    (+ (fibr (- n 1)) (fibr (- n 2)))
  )
)

; Fibonacci C code
(set 'fibo_c [text]
/* tcc fibo.c code */
#include <stdio.h>
#include <stdlib.h>  /* atoi */

int fib(int n){
  if (n <= 2)
    return 1;
  else
    return fib(n-1) + fib(n-2);
}

int main(int argc, char **argv){
  int n;

  if (argc < 2){
    printf("usage: fib n\n"
           "Compute nth Fibonacci number\n");
    return 1;
  }

  n = atoi(argv[1]);
  printf("%d\n", fib(n));
  return 0;
}
[/text]
)

; save .c for run script and pre-compiled modes
(write-file "fibo.c" fibo_c)

; compile .exe for pre-compiled mode
(tcc-exe "fibo.c")

(println "Fibonacci test for newLISP to TCC interface.")
(println "n: (fibr), fibo.c, fibo.exe.")

(for (num 1 30)
  (println
    num ": "
    ; newlisp recursive method
    (time (print (fibr num) " ")) "ms, "
    ; run script method
    (time (print (int (first (tcc-exe "-run" "fibo.c" num))) " ")) "ms, "
    ; pre-compiled method
    (time (print (int (first (exec (string "fibo.exe " num)))) " ")) "ms."
  )
)

(exit)


Fibonacci test for newLISP to TCC interface.
n: (fibr), fibo.c, fibo.exe.
1: 1 0ms, 1 380ms, 1 220ms.
2: 1 0ms, 1 390ms, 1 220ms.
3: 2 0ms, 2 330ms, 2 220ms.
4: 3 0ms, 3 440ms, 3 220ms.
5: 5 0ms, 5 440ms, 5 160ms.
6: 8 0ms, 8 390ms, 8 220ms.
7: 13 0ms, 13 380ms, 13 220ms.
8: 21 0ms, 21 380ms, 21 220ms.
9: 34 0ms, 34 390ms, 34 220ms.
10: 55 0ms, 55 330ms, 55 270ms.
11: 89 0ms, 89 330ms, 89 220ms.
12: 144 60ms, 144 330ms, 144 220ms.
13: 233 0ms, 233 490ms, 233 220ms.
14: 377 0ms, 377 380ms, 377 220ms.
15: 610 60ms, 610 330ms, 610 270ms.
16: 987 60ms, 987 380ms, 987 220ms.
17: 1597 170ms, 1597 380ms, 1597 280ms.
18: 2584 110ms, 2584 380ms, 2584 160ms.
19: 4181 220ms, 4181 330ms, 4181 220ms.
20: 6765 380ms, 6765 390ms, 6765 160ms.
21: 10946 610ms, 10946 380ms, 10946 220ms.
22: 17711 990ms, 17711 380ms, 17711 220ms.
23: 28657 1540ms, 28657 280ms, 28657 160ms.
24: 46368 2420ms, 46368 330ms, 46368 270ms.
25: 75025 3350ms, 75025 390ms, 75025 380ms.
26: 121393 9500ms, 121393 550ms, 121393 220ms.
27: 196418 11260ms, 196418 330ms, 196418 270ms.
28: 317811 13680ms, 317811 380ms, 317811 330ms.
29: 514229 22580ms, 514229 550ms, 514229 320ms.
30: 832040 37020ms, 832040 610ms, 832040 550ms.



-- xytroxon
"Many computers can print only capital letters, so we shall not use lowercase letters."

-- Let's Talk Lisp (c) 1976

xytroxon

#3
Almost forgot ;)



The problem I ran into was that while TCC also accepts code from stdin, on Windows (at least on Win98) newLISP won't capture stdout if you use the stdin mode of (exec). So the C code has to write a file, and newLISP has to read it, to get any output from the C program... This is not a problem with "real" OSes ;) where the stdin mode removes the need for temp files for the C script...



(define (tcc-run-stdin tcc_c)
  (set 'tcc_args (join (map string (args)) " "))
  (set 'tcc_cmd (format {"%stcc.exe" -run - %s} tcc_path tcc_args))
; (println tcc_cmd) ; for debugging command line
  (exec tcc_cmd tcc_c)
)

(for (num 1 30)
  (println
    num ": "
    ; run - stdin method
    (time (print (tcc-run-stdin fibo_c num) " ")) "ms, "
    ; ... other methods
  )
)


So for Windows, the fastest method is for newLISP to capture output from the compiled C program... But it does that from any command line program, negating the advantage of TCC...



But for "on the fly" newLISP generated C code, the "run script" method is still a good option over slower gcc, etc. methods...



One interesting application I have found for this method is to use TCC for imperative-style math calculations, which are easier to write and understand (now and later on) than the nested, prefix-operator lisp variety that non-lispers and "me-lisper" despise debugging ;)



Note: When using dynamic C code generation and execution, you can simplify the C math code by assigning the values directly to variables in the generated C script, thereby avoiding the argc/argv string-to-number conversion code. Also remember to printf a newline after each calculated result, error code, etc. so newLISP can capture all the results as a list...




cormullion

#4
Thanks for the code - better to see things laid out...!



I presume the timings reflect more the start/end overhead of launching another process, though?

xytroxon

#5
Quote from: "cormullion"

Thanks for the code - better to see things laid out...!



I presume the timings reflect more the start/end overhead of launching another process, though?


Yes, it would be nice to see what faster systems do rather than my lazy weekend hacking on Win98 :)



Stay tuned for creating dlls...




cormullion

#6
Sadly I can't find a pre-built binary package download for tcc. Otherwise I'd have played with it! (And no, I'm not getting into compiling it... ;)