script dir detection: any ideas for improvement?

Started by hartrock, September 14, 2015, 07:59:31 PM


hartrock

It takes some effort to detect the directory of a running script. This is what I currently use (it should work whether or not the script is started via a shebang line):
 ;; script dir detection
  (set 'Inspector:scriptname "startIt.lsp"
       'Inspector:dir ; be robust against CLI args not containing scriptname
       (0 (- (length Inspector:scriptname)) ; only leave dirpath
          (first (filter (fn (a) (find Inspector:scriptname a))
                         (main-args)))))
  (if (null? Inspector:dir)
      (set 'Inspector:dir ".")) ; cwd

This is not perfect:

- it's easy to forget to change the scriptname inside the script if the file is renamed outside;
- it may fail if an earlier argument contains the script name (e.g. a script loaded before);
- the code is quite long.

    Any ideas for improvement?

    rrq

    #1
    The following stanza at the top of a file seems to do the job, on Linux:


    (constant 'meself
              (letn ((base (format "/proc/%d/fd" (sys-info -3)))
                     (fd ((sort (directory base)) -2)))
                ((exec (format "readlink -f %s/%s" base fd)) 0)))


    It doesn't work on Mac of course, since it doesn't have a procfs, and it probably doesn't work on Windows either for the same or similar reasons. Maybe there are variants that do work; i.e., to determine the name of a most recently opened (and closed) file.



    I've sometimes wished this were available as a system function, especially to know the top-level script file currently being evaluated. It's just a nice-to-have, though, and not very important.

    xytroxon

    #2
    On Windows, when my newLISP script is executed from a .bat or .cmd file, I can use:



    (main-args)
    > ("C:\Program Files (x86)\newlisp\newlisp.exe" "C:\Users\Apps\newlisp\work\test.lsp")

    (set 'script_name (last (parse (main-args 1) "\\")))
    > test.lsp

    (set 'script_dir (join (0 -1 (parse (main-args 1) "\\")) "\\"))
    > C:\Users\Apps\newlisp\work


    Too bad this isn't standard behavior on all platforms. But I think it is operating system dependent.
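
The clipping shown above can be made independent of the path separator. This is a minimal sketch, assuming a helper named split-path (hypothetical, not a newLISP built-in) and assuming it is acceptable to return the directory part joined with forward slashes. It still relies on the caller knowing which argument is the script:

```newlisp
;; split-path is a hypothetical helper, not part of newLISP.
;; It splits a path into (directory file-name), accepting both
;; "/" and "\" as separators.
(define (split-path path)
  (let (parts (parse path {[/\\]} 0))   ; regex break: either separator
    (list (join (chop parts) "/")       ; directory part ("" if there is none)
          (last parts))))               ; file name part

;; usage with literal paths; a script would pass e.g. (main-args 1)
(split-path "C:\\Users\\Apps\\newlisp\\work\\test.lsp")
(split-path "/home/user/work/test.lsp")
```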



    -- xytroxon
    "Many computers can print only capital letters, so we shall not use lowercase letters."

    -- Let's Talk Lisp (c) 1976

    xytroxon

    #3
    Ugh! Of course the "current working directory" is easily found with real-path



    (set 'current-dir (real-path))
    > C:\Users\Apps\newlisp\work

    (change-dir "..")
    > true

    (set 'current-dir (real-path))
    > C:\Users\Apps\newlisp




    PS: Both of your code fragments are problematic on Windows ;o)




    rrq

    #4
    I understood hartrock's problem to be how to know the name of the script file currently being evaluated, without hard-coding its name into the script. There are of course a range of ways to understand this, including the ones we've solved.



    For example, hartrock initially refers to "the directory of the script", and then implements this in the way of attempting to locate and clip the command line path name used when referring to the hard-coded script name. He rightly points at the hard-coding of the name into the script as a maintenance problem.



    My suggestion, which (likely) is Linux only, tries to exploit procfs to find the last opened file descriptor (which hangs around a while in procfs even after being closed in the program), and then pick up the file name from there. This method, where it works, locates the canonical path name for that file (i.e., resolving any links), and assumes this to be the name of the script file currently being evaluated, whether mentioned on the command line or loaded recursively or manually.



    xytroxon's first approach chooses the second command line argument, and clips that path name. This is a simple and straight-forward method that usually works, except of course if the script is not the second argument.



    And xytroxon's second approach provides the current working directory, which also is good if you are sure that the script file resides there.



    Ideally, I think, the system load function should be modified so as to maintain a push-down list of the files being loaded while they are loaded, and there should be a function (maybe even just load without arguments) that returns that list. Through this, generic code could be written to work out where "sibling" files are for the files being loaded while they are loaded. But still, the benefit of having this might not outweigh the effort of doing that modification?
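
Until something like this exists in the interpreter, the push-down list can be approximated in user code. This is only a sketch under stated assumptions: tracked-load, load-stack, current-file and current-file-dir are hypothetical names, not newLISP built-ins, and the list is maintained only for files loaded through tracked-load, not through the built-in load or the command line:

```newlisp
;; Hypothetical helpers; the built-in load keeps no such list.
(define load-stack '())               ; innermost file first

(define (tracked-load path)
  (letn ((full (or (real-path path) path)) ; canonical name if it resolves
         (result nil)
         (ok nil))
    (push full load-stack)            ; record before evaluating the file
    (set 'ok (catch (load full) 'result))
    (pop load-stack)                  ; unwind even if the load failed
    (if ok
        result
        (throw-error result))))       ; re-raise with the caught message

(define (current-file)                ; file being loaded right now, or nil
  (if load-stack (first load-stack)))

(define (current-file-dir)            ; its directory, for "sibling" loads
  (if load-stack
      (join (chop (parse (first load-stack) {[/\\]} 0)) "/")))
```

Inside a file loaded this way, a sibling could then be loaded with something like (tracked-load (append (current-file-dir) "/sibling.lsp")), where sibling.lsp is a placeholder name.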

    hartrock

    #5
    First thanks to all for your input!


    Quote from: "ralph.ronnquist"
    I understood hartrock's problem to be how to know the name of the script file currently being evaluated, without hard-coding its name into the script. ...
    This is one use case, e.g. for printing the script name in logging and getopts output.
    Quote from: "ralph.ronnquist"
    For example, hartrock initially refers to "the directory of the script", ...
    This is another use case: being able to load code from other files located relative to the executed script.

    Storing code relative to NEWLISPDIR is not an alternative for code that is not considered suited or mature enough to stand on its own (*) as a published module. Moreover, it needs admin rights and risks overwriting something, if something has to be installed in this area as a precondition for getting some software to run (**).
    Quote from: "ralph.ronnquist"
    My suggestion, which (likely) is Linux only, tries to exploit procfs to find the last opened file descriptor (which hangs around a while in procfs even after being closed in the program), and then pick up the file name from there. This method, where it works, locates the canonical path name for that file (i.e., resolving any links), and assumes this to be the name of the script file currently being evaluated, whether mentioned on the command line or loaded recursively or manually.
    I think this can be a good solution if you are limited to Linux-like OSes. But adapting it to work on other systems, if someone wants to use the software on another kind of OS, is not trivial: it's Linux expert code, and I cannot say whether and which restrictions apply across different Linux or Linux-like OSes.
    Quote from: "ralph.ronnquist"
    xytroxon's first approach chooses the second command line argument, and clips that path name. This is a simple and straight-forward method that usually works, except of course if the script is not the second argument.
    This is a show stopper when newLISP is called with interpreter CLI arguments before the script and/or with multiple scripts.
    Quote from: "ralph.ronnquist"
    And xytroxon's second approach provides the current working directory, which also is good if you are sure that the script file resides there.
    Such an assumption may or may not hold, which makes it hard to have flexibility and robustness at the same time.



    My current use case is https://github.com/hartrock/Inspector:

    - installation of this app should be as easy as possible,
    - getting it to run should be as easy and robust as possible, without the need to write complicated explanations about
      - how to adapt paths in scripts for other configurations, and/or
      - restrictions on how to call start scripts.


    Quote from: "ralph.ronnquist"
    Ideally, I think, the system load function should be modified so as to maintain a push-down list of the files being loaded while they are loaded, and there should be a function (maybe even just load without arguments) that returns that list. Through this, generic code could be written to work out where "sibling" files are for the files being loaded while they are loaded.
    I like this idea!

    Besides making it possible to write platform-independent code, this would even allow checking for duplicate file loads, which are usually unwanted.
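
As an illustration of the duplicate-load check, here is a minimal user-level sketch. The names load-once and loaded-files are hypothetical, not newLISP built-ins, and the check only sees files loaded through load-once itself. Files are compared by the result of real-path, so two different relative spellings of the same file count as one:

```newlisp
;; Hypothetical helpers; not part of newLISP.
(define loaded-files '())             ; canonical names of files loaded so far

(define (load-once path)
  (let (full (or (real-path path) path)) ; fall back to the name as given
    (if (find full loaded-files)
        nil                           ; already loaded: skip, return nil
        (begin
          (push full loaded-files)
          (load full)))))
```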



    Another item on my wishlist is additional info about which (main-args) index the executed script has (the interpreter knows, the script does not). This is good for a robust getopts (robustness is very important for a good user experience, and developer experience too, so I'm stressing this point here).
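
Without interpreter support, the index can only be guessed. The following sketch assumes that the script is the first argument ending in ".lsp"; script-index and script-args are hypothetical helper names. The guess is wrong when several scripts are given on the command line and a later one is the one running, which is exactly why an interpreter query would be more robust:

```newlisp
;; Hypothetical helpers; newLISP itself does not report this index.
(define (script-index args)
  ;; index of the first argument ending in ".lsp", or nil if there is none
  (find ".lsp" args (fn (ext arg) (ends-with arg ext))))

(define (script-args args)
  ;; arguments after the script, i.e. what a getopts should look at
  (let (idx (script-index args))
    (if idx (slice args (+ idx 1)) '())))

;; usage with a literal list; a script would pass (main-args) instead
(script-index '("newlisp" "-s" "4000" "/opt/app/startIt.lsp" "--verbose"))
(script-args  '("newlisp" "-s" "4000" "/opt/app/startIt.lsp" "--verbose"))
```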


    Quote from: "ralph.ronnquist"
    But still, the benefit of having this might not outweigh the effort of doing that modification?

    The interpreter (and its creator ;-) ) has all the knowledge needed; so this shouldn't be very hard: it feels wrong to create hackish workarounds, due to not having important info about script properties, which could be provided by simple interpreter queries. (***)



    Footnotes:

    (*) There may be strong interdependencies between parts of the code, even though they are located in different files. It's not easy to refactor this into modules that each stand on their own.

    (**) This may make users hesitate to just try out some unknown software.

    (***) Meta: This is my technical opinion, arising among other things from practice; on the other hand, I know very well that even small changes can be impossible if there is no time to do them.

    hartrock

    #6
    Quote from: "hartrock"
    The interpreter (and its creator ;-) ) has all the knowledge needed; so this shouldn't be very hard:

    Perhaps it's harder than I have thought so far, because the process of loading and interpreting files may be more complicated than I know. It may be difficult to find the right place for building the suggested file load list: at least for me, it would take much time to understand the inner workings of the interpreter better before trying to do such a thing.

    It may be worth the effort (btw: it's interesting to understand the inner workings of the newLISP interpreter better, which is a nice piece of software), if there were a chance of getting such an interpreter extension accepted by Lutz...

    I'm more interested in developing solutions than in writing posts about non-existent ones, but such a solution has no chance unless it is accepted as part of the interpreter core.
    Quote from: "hartrock"
    it feels wrong to create hackish workarounds, due to not having important info about script properties, which could be provided by simple interpreter queries. (***)

    I have no idea how to do this reasonably as a module (which could be located at the official place inside NEWLISPDIR/modules/) in order to avoid changing the interpreter core. Any ideas? Possibly I'm missing opportunities here.


    Quote from: "hartrock"
    (***) Meta: This is my technical opinion, arising among other things from practice;
    I started out by locating parts of my software relative to NEWLISPDIR, but later changed this to user-local files/directories. Especially for parts not meant to be reusable by others (e.g. for lack of time to prepare them for this purpose), NEWLISPDIR seems to be the wrong approach to me: it would solve the 'find a directory for loading parts of the software' issue, but it has drawbacks:

    - admin rights needed,
    - namespace/overwrite issues,
    - experimental parts of software at an 'official' place,
    - all of the above make it harder for users to just try out some software without fear.

    If there were some policy about

    - how to name contexts with some kind of prefix, and
    - how to name subdirs of NEWLISPDIR/modules/,

    to avoid name clashes with other software, this could be a way to go.

    But there is no such policy, hence my interest in having reliable user-local places for installation (which would have their merits even with such a policy).
    Quote from: "hartrock"
    on the other hand, I know very well that even small changes can be impossible if there is no time to do them.

    And there is more motivation for taking the time needed for a contribution if there is a chance of getting it accepted...



    PS: Currently my solution (see the first post in this thread) still looks like a pragmatic one in the current situation. But it has its drawbacks, so I don't really like it:

    - it is not elegant,
    - it has maintenance risks/efforts, and
    - it is dangerous for scripts taking filename arguments (one of which could be a file named like the script itself, too).

    rrq

    #7
    For some time now, I have tended to package up my newlisp programs into tar files, and then install a newlisptar binary (embedding lsptar.lsp) at the production places. It's easy enough to set up a Makefile to collate various files into temporary local files for packing them into the tar appropriately. This has the advantage for me that the production application is held in a single file, and there is no risk that component files get misplaced or that there is confusion about which ones they are.



    Apart from making that comment, I basically agree with you; the lack of a prescribed scheme for modularising is great in terms of flexibility, but it means that everyone has to invent their own.