Ability to import C functions from a function pointer?

Started by ryuo, November 02, 2014, 02:18:50 PM


ryuo

I've been experimenting with libtcc, part of the tcc compiler. One of its features is that it can compile C functions at runtime, and you can obtain a pointer to the compiled function much like you would when using dlopen to access functions in a shared object. I've been able to execute these functions once I cast them to an appropriate function pointer type.
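For readers unfamiliar with the pattern described above, here is a minimal C sketch of calling code through an untyped address that is cast back to a typed function pointer. The names `square`, `int_fn`, `address_of_square` and `call_via_address` are made up for illustration. In a real libtcc or dlopen program the address would come from the library, not from a function compiled into the same program.

```c
#include <stdint.h>

/* Hypothetical signature of the function we expect behind the address. */
typedef int (*int_fn)(int);

/* Stand-in for code that would normally be produced at runtime. */
int square(int x)
{
    return x * x;
}

/* Hand the function's address over as a plain integer, the way a
   runtime compiler or a dlsym-style lookup returns an untyped value. */
uintptr_t address_of_square(void)
{
    return (uintptr_t)&square;
}

/* Cast the untyped address back to a typed function pointer and call it.
   The caller must know the real signature; the compiler cannot check it. */
int call_via_address(uintptr_t addr, int arg)
{
    int_fn fn = (int_fn)addr;
    return fn(arg);
}
```

Calling `call_via_address(address_of_square(), 4)` goes through the same kind of cast that is needed for a pointer returned by libtcc.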



I was wondering if it would be possible to extend newLISP's import function so that it can import a function from a C function pointer. That would let me call C functions compiled while newLISP is running, so newLISP could execute C code that is generated and compiled on the fly, for example by a code generator of some kind.



Anyway, it's just a thought I had. Thank you for reading. :)

iNPRwANG

The newLISP module exports a function named "newlispCallback" that can make a callback from a C function pointer, which you can then call from newLISP.



For more information, see:

http://www.newlisp.org/downloads/CodePatterns.html

Lutz

Using function pointers as imported functions



You could experiment with cpymem:

http://www.newlisp.org/downloads/newlisp_manual.html#cpymem



The example using assembler works with the standard Windows newlisp.exe and with 32-bit builds of newLISP on most other OSes (on Mac OS X only up to Snow Leopard, not after).



Some OSes do not seem to allow executing code in data areas of memory. On Mac OS X the last working version is Snow Leopard. On XP and Windows 7 it works fine. I have not tried it on Windows 8.



Writing exit events (ryuo's earlier post in another thread)



You could do the following:



> (constant '_exit exit)
exit@408925
> (constant 'exit (fn () (println "Good Bye") (_exit)))
(lambda () (println "Good Bye") (_exit))
> (exit)
Good Bye
~>


So basically you save the original exit in _exit, then redefine it. When redefining primitive symbols, you have to use constant, because symbols of built-in functions are protected. Any primitive in newLISP can be redefined this way.

ryuo

Lutz, I examined that cpymem example and discovered how it works. It basically creates a new CDECL type cell with the function pointer as the contents.



However, I still must ask you: what do the next and aux fields do? In the example, it appears you copy a string pointer into the aux field, but I can't see why this is needed. I left it as a null pointer in my example.



So, do you know what the imported functions use the next and aux fields for? If nothing, then is it safe to leave them as a null pointer (0)? Thank you.

TedWalther

Lutz, have you looked at mprotect()?  It looks like it would allow the assembly language examples to work on Mac OSX again, and any other similar OS that enables NX by default.
Cavemen in bearskins invaded the ivory towers of Artificial Intelligence.  Nine months later, they left with a baby named newLISP.  The women of the ivory towers wept and wailed.  "Abomination!" they cried.

Lutz

Ted, thanks for the tip about mprotect(), I will look into it.



To sum it up for cpymem: both the aux and the next field in function cells must be as shown in the cpymem example and cannot be 0, else the system could crash.



The aux field in a function cell contains a pointer to the symbol name. It is used when printing a function cell or translating it to a string:

> (dump println)
(8266368 263 8262688 4476078 4216405)
> (dump nil)
(8262688 256 8262688 8262688 8262688)
> println
println@405655
> (string println)
"println@405655"          <=== the function address in hex
> (format "%X" 4216405)
"405655"
> (get-string ((dump println) 3))
"println"
>

On a 64-bit newLISP values could be bigger. The cpymem example in the manual is only for 32-bit newLISP.



The next field contains the address of the original nil cell used system-wide. This cell has all fields except the type field set to its own address, and that address is used in the next field to indicate "no link to next list member". When nils occur as list members or atoms, they are copies of that original nil cell with a different address.
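The cell layout described above can be sketched in C. This is a simplified model inferred from the dump output in this post (cell address, then type, next, aux and contents), not a copy of newLISP's source. The type tags and the names `init_nil_cell`, `make_function_cell` and `format_function_cell` are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified cell modeled on the (dump ...) output: dump prints the
   cell's own address followed by these four fields. Assumed layout. */
typedef struct cell {
    uintptr_t    type;      /* cell type tag */
    struct cell *next;      /* next list member, or the nil cell */
    uintptr_t    aux;       /* function cells: pointer to the name string */
    uintptr_t    contents;  /* function cells: the function address */
} CELL;

/* Hypothetical type tags, not newLISP's real values. */
enum { TYPE_NIL = 0, TYPE_FUNCTION = 1 };

/* The original nil cell: every field except type holds its own address. */
CELL nil_cell;

void init_nil_cell(void)
{
    nil_cell.type = TYPE_NIL;
    nil_cell.next = &nil_cell;
    nil_cell.aux = (uintptr_t)&nil_cell;
    nil_cell.contents = (uintptr_t)&nil_cell;
}

/* Fill in a function cell as described in the thread: next points at the
   nil cell (no link), aux at the name, contents holds the code address. */
void make_function_cell(CELL *cell, const char *name, uintptr_t code)
{
    cell->type = TYPE_FUNCTION;
    cell->next = &nil_cell;
    cell->aux = (uintptr_t)name;
    cell->contents = code;
}

/* Render a function cell as name@HEXADDRESS, like println@405655.
   Returns the number of characters snprintf wanted to write. */
int format_function_cell(const CELL *cell, char *buf, size_t size)
{
    return snprintf(buf, size, "%s@%lX",
                    (const char *)cell->aux,
                    (unsigned long)cell->contents);
}
```

The formatting function shows why a null aux field is unsafe in this model: turning the cell into a string dereferences the name pointer.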



PS: I updated the demo code in the manual:

http://www.newlisp.org/downloads/newlisp_manual.html#cpymem

The string pointer for the name is now taken from the symbol, as is the case with real functions, which also uses less memory.

ryuo

About the usage of mprotect. I did some digging into it. For it to work, the pointer you give it must be aligned on a page size boundary. This can only be done to my knowledge by allocating with posix_memalign. And to get the page size, you must use the sysconf function. So in all, to make use of mprotect, you must use 3 functions: sysconf, posix_memalign, and mprotect. Here is an example I drafted for usage with tcc:



#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <libtcc.h>

const char source[] = "int square(int x) { return x * x; }";

int main(void)
{
    TCCState *tcc = tcc_new();
    long pagesize;
    long bytes;
    void *data;
    int (*cb) (int);

    if(tcc == NULL)
    {
        puts("Cannot allocate TCC state!");
        return 1;
    }

    if(tcc_set_output_type(tcc, TCC_OUTPUT_MEMORY) == -1)
    {
        puts("Failed to set output type!");
        tcc_delete(tcc);
        return 1;
    }

    if(tcc_compile_string(tcc, source) == -1)
    {
        puts("Failed to compile source!");
        tcc_delete(tcc);
        return 1;
    }

    pagesize = sysconf(_SC_PAGE_SIZE);

    if(pagesize == -1)
    {
        puts("Failed to retrieve page size!");
        tcc_delete(tcc);
        return 1;
    }

    /* With a NULL pointer, tcc_relocate only reports the size needed. */
    bytes = tcc_relocate(tcc, NULL);

    if(bytes == -1)
    {
        puts("Failed to retrieve bytes!");
        tcc_delete(tcc);
        return 1;
    }

    /* posix_memalign returns 0 on success and an error number on
       failure, never -1. */
    if(posix_memalign(&data, pagesize, bytes) != 0)
    {
        puts("Failed to allocate memory!");
        tcc_delete(tcc);
        return 1;
    }

    if(tcc_relocate(tcc, data) == -1)
    {
        puts("Failed to relocate!");
        free(data);
        tcc_delete(tcc);
        return 1;
    }

    printf("%ld - %ld\n", bytes, pagesize);

    /* mprotect works on whole pages, so keep the region readable and
       writable too; PROT_EXEC alone could make data that shares the
       last page inaccessible. */
    if(mprotect(data, bytes, PROT_READ | PROT_WRITE | PROT_EXEC) == -1)
    {
        puts("Failed!");
        free(data);
        tcc_delete(tcc);
        return 1;
    }

    /* Look the function up by name instead of assuming it sits at the
       very start of the relocated block. */
    cb = tcc_get_symbol(tcc, "square");

    if(cb == NULL)
    {
        puts("Failed to find symbol!");
        free(data);
        tcc_delete(tcc);
        return 1;
    }

    printf("%d\n", cb(4));

    tcc_delete(tcc);
    free(data);

    return 0;
}
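The three-function recipe (sysconf, posix_memalign, mprotect) can also be isolated from libtcc. The following is a minimal sketch under that assumption. The helper names `round_up_to_page`, `alloc_page_aligned` and `make_executable` are hypothetical, and rounding the length up to whole pages is an added precaution, not something the program above does.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Round a byte count up to a whole number of pages. */
size_t round_up_to_page(size_t bytes, size_t pagesize)
{
    return (bytes + pagesize - 1) / pagesize * pagesize;
}

/* Allocate a page-aligned buffer covering at least `bytes` bytes.
   Returns NULL on failure; on success *length holds the rounded size.
   The caller releases the buffer with free(). */
void *alloc_page_aligned(size_t bytes, size_t *length)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    void *data = NULL;

    if (pagesize <= 0 || bytes == 0)
        return NULL;

    *length = round_up_to_page(bytes, (size_t)pagesize);

    /* posix_memalign returns 0 on success, an error number otherwise. */
    if (posix_memalign(&data, (size_t)pagesize, *length) != 0)
        return NULL;

    return data;
}

/* Mark the region readable, writable and executable. Returns 0 on
   success and -1 on failure; some hardened systems may refuse this
   for heap memory. */
int make_executable(void *data, size_t length)
{
    return mprotect(data, length, PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

Because the allocation covers whole pages, changing the protection cannot affect unrelated heap data that would otherwise share the last page.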

TedWalther

Good detective work ryuo, thank you.

ryuo

Lutz, I have a question about how newLISP reclaims previously allocated memory. Part of what is needed to be done to make a C function callable requires the copying of a string pointer into the aux field of a cell. Do I need to make this string part of a persistent object or such? I'm not sure if my hacked cell is going to end up with a dangling pointer in the aux field at some point. Essentially, I am not sure if newLISP's memory management will consider the string to still be in use, even if the last thing using it is my hacked cell. Please advise. Thank you.

Lutz

For all function pointer cells as in built-in primitives and imported library functions using the simple or extended import syntax, the name string pointer in the cell->aux field is copied from the symbol name string ptr in the symbol structure.



When the symbol holding the function pointer cell gets deleted, the memory for the name string pointed to by the symbol structure gets freed too. The function cell held by the symbol gets recycled to the newLISP lisp cell pool.



So no special action is required for your C-function faked imports if function symbols are deleted or reassigned.



You could make the symbol holding the function pointer cell a constant to avoid deletion of the function on deleting of the symbol. In the example program in the manual, instead of (set 'foo print) do (constant 'foo print).



The memory holding the code - 'foo-code' in the manual example - is managed by the program. If 'foo-code' gets re-assigned with something else the code memory would be reclaimed and the function pointer in the function cell contents field would be invalid. You also could protect it using constant.

ryuo

Forgive me, but not everything here makes much sense to me right now. I'm mostly familiar with imperative languages, so not all of these functional / Lisp concepts are familiar. Some of it I thought I understood, but it seems I still have much to learn about other programming concepts.


Quote
For all function pointer cells as in built-in primitives and imported library functions using the simple or extended import syntax, the name string pointer in the cell->aux field is copied from the symbol name string ptr in the symbol structure.

Your usage of pointers here is confusing. Do you mean that for all symbols that evaluate/resolve to a usable function? Or just the builtin functions and imported C functions? My experiments appear to show that for a symbol cell, the contents is a SYMBOL struct pointer. Inside this struct is a field containing a char * name field. This pointer appears to be copied to the aux field of the function cell that is stored within the symbol cell. Is this what you were getting at?



For example, for newLISP on linux amd64:

(println ((dump print) 3))
(println ((unpack {lu lu Lu} (address 'print)) 2))

should give the same address to the same C string.


Quote
When the symbol holding the function pointer cell gets deleted, the memory for the name string pointed to by the symbol structure gets freed too. The function cell held by the symbol gets recycled to the newLISP lisp cell pool.

So when this symbol is deleted, the memory allocated for the name gets reused for something else. And the function cell the symbol contained gets deleted as well? Otherwise the aux field would become a dangling pointer. Does this mean that there could still remain other symbols containing function cells that point to the same function address? It sounds like the memory containing the function is not reclaimed here.


Quote
You could make the symbol holding the function pointer cell a constant to avoid deletion of the function on deleting of the symbol. In the example program in the manual, instead of (set 'foo print) do (constant 'foo print).

Eh? How does this protect the contents of a symbol from deletion? You can change the contents of a constant symbol by another call to the constant function. If you did that, the original contents of the symbol may be left without any references. So, how does this protect the function from deletion if the symbol is deleted? For that matter, it sounds like the constant symbol can be deleted during runtime. It doesn't seem to work that way if you call the delete function.


Quote
The memory holding the code - 'foo-code' in the manual example - is managed by the program. If 'foo-code' gets re-assigned with something else the code memory would be reclaimed and the function pointer in the function cell contents field would be invalid. You also could protect it using constant.

In other words, the memory allocated for the code and stored in the symbol 'foo-code' is controlled by newLISP. If the symbol 'foo-code' loses the reference to this memory, it will be reclaimed if it was the last reference? As a result, the pointer to this memory would effectively become a dangling pointer? Thus, any function cell using it would become unusable? What happens then if I allocated this memory outside of newLISP? This is the case when using libtcc. The library or the programmer manages the memory used for storing the binary code compiled from the C function definition. I would assume that newLISP will not try to free it, as it was allocated outside of the scope of newLISP.



--

In the example from the manual, it appears to copy the string pointer from the symbol foo and move it into the aux field of the function cell within the symbol. This would mean that the aux field string is controlled by newLISP then. I was thinking that it was setting sym-name to the string "foo" and then using the address of that for the aux field. I was concerned about whether this would become a dangling pointer when the last reference that newLISP knows about is gone. But that appears to be a non-issue here. Correct me if I am wrong please.

--



From what limited understanding I have so far, it would appear to me that I could write a macro to convert the function pointer to a function cell. I could then fill in the function cell's data field with the function address and the aux field with the address of the name of the symbol that the caller wishes to associate the function cell with. This leaves me with only one piece of memory that would require manual tracking: the binary function data.



Please let me know if I am understanding how newLISP handles this situation correctly. I don't expect to fully understand how newLISP is implemented, but it is good to know how to properly format a function cell to call compiled runtime code.



Also, thank you for your patience, Lutz. It means a lot to me. My experience with Common Lisp was horrible. From my perspective, I was being unfairly beaten down by the elites of their community for daring to ask "stupid" questions. Good way to keep your language as a niche no one cares about. *cough*



I have learned primarily from C and shell scripts to date. After about 5-6 years of that, I am now trying to learn more about the mathematical theory behind what I've been doing this whole time. I don't know all the formal terms for the concepts I have come to understand through application, but hopefully that will change. I am also wanting to learn more about the functional paradigm and its roots in lambda calculus. I also hope to learn a practical Lisp dialect as an additional programming language. I feel this would be a good test of my adaptability -- to program using different syntaxes and paradigms. Any advice you have to give me would also be appreciated.



Thank you for reading this.

Lutz

You understand well. Here are some more comments:


Quote
Do you mean that for all symbols that evaluate/resolve to a usable function? Or just the builtin functions and imported C functions?


Yes, just builtin functions and imported C functions - and faked imports.


Quote
My experiments appear to show that for a symbol cell, the contents is a SYMBOL struct pointer. Inside this struct is a field containing a char * name field. This pointer appears to be copied to the aux field of the function cell that is stored within the symbol cell. Is this what you were getting at?


Yes, a symbol struct is pointed to by a symbol type lisp cell. The stuff assigned to the symbol is pointed to by the contents field of the symbol struct.


Quote
(println ((dump print) 3))
(println ((unpack {lu lu Lu} (address 'print)) 2))


Yes and:
(get-string ((dump print) 3)) => "print"

Quote
So when this symbol is deleted, the memory allocated for the name gets reused for something else. And the function cell the symbol contained gets deleted as well? Otherwise the aux field would become a dangling pointer.


Yes, the function cell gets deleted as well.


Quote
It sounds like the memory containing the function is not reclaimed here.


Yes, the memory containing the code is not reclaimed. In the fake import of the manual example, that memory would be held by foo-code and not be reclaimed until foo-code is reassigned or deleted.


Quote
Eh? How does this protect the contents of a symbol from deletion? You can change the contents of a constant symbol by another call to the constant function. If you did that, the original contents of the symbol may be left without any references.


Yes, using constant for the holding symbol assignment is only a weak protection; it can be overwritten by another constant assignment.


Quote
For that matter, it sounds like the constant symbol can be deleted during runtime. It doesn't seem to work that way if you call the delete function.


The delete function will not work on protected symbols and returns nil, but yes, you can delete the contents (the code memory) by re-assigning foo-code using constant.


Quote
If the symbol 'foo-code' loses the reference to this memory, it will be reclaimed if it was the last reference?


Yes and note that memory allocated by newLISP is always referenced only once in newLISP. When you normally import a C-function from a library, that code memory sits somewhere in the library and is not controlled by newLISP.


Quote
As a result, the pointer to this memory would effectively become a dangling pointer? Thus, any function cell using it would become unusable?


Yes.


Quote
What happens then if I allocated this memory outside of newLISP? This is the case when using libtcc. The library or the programmer manages the memory used for storing the binary code compiled from the C function definition. I would assume that newLISP will not try to free it, as it was allocated outside of the scope of newLISP.


Yes and yes to all the rest too.



One last comment on reclaiming memory in newLISP. Only the following gets reclaimed (see void deleteList() in newlisp.c):



- lisp cell memory
- string memory pointed to by cell->contents in string type cells
- bigint memory pointed to by cell->contents in bigint type cells
- lists of cell memory pointed to by cell->contents in a list/expression type cell
- arrays of cell memory pointed to by cell->contents in an array type cell
- symbol memory of SYMBOL structs
- string memory pointed to by symbol->name in SYMBOL structs
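To make the list above concrete, here is a rough C sketch of type-driven reclamation. It is not newLISP's deleteList(); the cell layout, the type tags, the `freed_blocks` counter and the use of NULL (in place of the nil cell) to end a list are all simplifying assumptions. The point it illustrates is that a function cell's contents and aux fields are left alone.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tags, not newLISP's real values. */
enum cell_type { CELL_NIL, CELL_STRING, CELL_LIST, CELL_FUNCTION };

typedef struct cell {
    enum cell_type type;
    struct cell   *next;      /* next list member; NULL ends a list here */
    uintptr_t      aux;
    uintptr_t      contents;
} CELL;

/* Counts freed blocks so the behaviour can be observed. */
int freed_blocks = 0;

static void release(void *p)
{
    free(p);
    freed_blocks++;
}

CELL *new_cell(enum cell_type type, uintptr_t contents)
{
    CELL *c = calloc(1, sizeof *c);
    if (c == NULL)
        abort();
    c->type = type;
    c->contents = contents;
    return c;
}

CELL *new_string_cell(const char *text)
{
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        abort();
    strcpy(copy, text);
    return new_cell(CELL_STRING, (uintptr_t)copy);
}

/* Free a cell and every cell linked after it. */
void delete_list(CELL *cell)
{
    while (cell != NULL) {
        CELL *next = cell->next;
        switch (cell->type) {
        case CELL_STRING:
            release((void *)cell->contents);      /* owned string memory */
            break;
        case CELL_LIST:
            delete_list((CELL *)cell->contents);  /* owned member cells */
            break;
        case CELL_FUNCTION:
            /* contents is a code address and aux a borrowed name
               pointer: neither is freed here. */
            break;
        default:
            break;
        }
        release(cell);
        cell = next;
    }
}
```

Deleting a list that holds one string cell and one function cell frees four blocks in this sketch: the string bytes and the three cells. The code behind the function cell stays untouched.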



Each lisp cell is only referenced once in the system, but the same symbol struct may be referenced by its address from numerous symbol type cells. This is why deleting a symbol in newLISP requires a complete cell-memory walk. Depending on the delete flag used, either all references to the symbol are replaced with nil, or the symbol does not get deleted while references exist.
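The cell-memory walk can be pictured with a small C sketch. It assumes a flat array as the cell pool and a boolean `force` flag standing in for the delete flag; both are simplifications, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical type tags, not newLISP's real values. */
enum cell_type { CELL_FREE, CELL_NIL, CELL_SYMBOL };

typedef struct symbol {
    const char *name;
} SYMBOL;

typedef struct cell {
    enum cell_type type;
    SYMBOL        *sym;   /* for CELL_SYMBOL: the shared symbol struct */
} CELL;

/* Count the symbol-type cells in the pool that point at `target`. */
size_t count_references(const CELL *pool, size_t n, const SYMBOL *target)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (pool[i].type == CELL_SYMBOL && pool[i].sym == target)
            count++;
    return count;
}

/* Walk the whole pool. With `force`, every reference to `target` is
   turned into nil. Without it, nothing is changed and the deletion is
   refused while references exist. Returns true when the symbol struct
   may be freed by the caller. */
bool delete_symbol(CELL *pool, size_t n, const SYMBOL *target, bool force)
{
    if (!force)
        return count_references(pool, n, target) == 0;

    for (size_t i = 0; i < n; i++) {
        if (pool[i].type == CELL_SYMBOL && pool[i].sym == target) {
            pool[i].type = CELL_NIL;
            pool[i].sym = NULL;
        }
    }
    return true;
}
```

Either mode has to visit every cell, because any symbol type cell anywhere in the pool may hold the address of the struct being deleted.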