Hi -
I would like to know how to do one thing, please.
I would like to know how to remove string elements which are less than 4 characters long from my string. (By characters, I mean letters, numbers, and symbols such as ! # $ / , ' ; : etc.)
For example:
If my string is this:
("a bb ccc dddd eeeee ffffff")
I would like to know which function is best to make this list only a list of strings four or more characters long.
("dddd eeeee ffffff")
I tried using replace:
(replace "[.]" title "" 0)
and
(replace "[+]" title "" 0)
but that did not seem to clean out one character strings.
I tried various maneuvers with define and length but wound up lost and without a solution.
Am I simply missing the "magic" regex that means "characters less than 4 characters long"?
And is replace the correct function to "remove" these things?
Thank you for any guidance!
One method is to parse the line into words, define a small? predicate, then use the clean function to drop the words that match it. The join function can then be used to make a string again.
(setq input (parse "a bb ccc dddd eeeee ffffff"))
(println input)
(define (small? x) (< (length x) 4))
(setq output (clean small? input))
(println output)
(println (join output " "))
(exit)
-- xytroxon
Cool. I was thinking of something along the same lines:
(setf input "a bb ccc dddd eeeee ffffff")
(filter (fn (s) (>= (length s) 4)) (parse input))
Thanks xytroxon and bairui !
I appreciate the examples and now understand how to use them.
Two things -
I tried both examples with this string (containing random non letter/number characters):
(setq input (parse "! @ # $$$ *- a bb ccc dddd eeeee ffffff"))
(println input)
(define (small? x) (< (length x) 4))
(setq output (clean small? input))
(println output)
(println (join output " "))
(exit)
On xytroxon's example, I get a result of this:
("!" "@")
()
On bairui's example, I get this:
()
I am still after only the string components with 4 or more characters, meaning somehow strip out those exclamations, symbols, etc. Is this possible?
The second thing,
Taking the examples you both provided, am I close to the correct way to translate them to a list example?
(setq 'input '("a" "bb" "ccc" "dddd" "eeeee" "ffffff"))
(println input)
(define (small? x) (< (length x) 4))
(setq output (clean small? input))
(println output)
(println (join output " "))
(exit)
I am getting this:
nil
ERR: list expected in function clean : input
and for bairui's example:
(setq 'input '("a" "bb" "ccc" "dddd" "eeeee" "ffffff"))
(filter (fn (s) (>= (length s) 4)) (parse input))
(exit)
I get this, too:
ERR: string expected in function parse : input
Okay and thanks!
You have a symbol quoting error...
Use: (set 'input ...
or: (setq input ... or (setf input ...
but not: (setq 'input ... nor (setf 'input ...
(setq and (setf are the same as using (set '
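
As a minimal sketch of the list version asked about above (assuming the input is already a list of strings, so no parse step is needed), clean and filter can be applied to the list directly:

```newlisp
; input is already a list, so it goes straight to clean or filter
(setq input '("a" "bb" "ccc" "dddd" "eeeee" "ffffff"))

; predicate from the earlier example: true for strings shorter than 4
(define (small? x) (< (length x) 4))

; clean removes the elements for which the predicate is true
(println (clean small? input))

; filter keeps the elements for which the predicate is true
(println (filter (fn (s) (>= (length s) 4)) input))
```

Both println calls should show ("dddd" "eeeee" "ffffff"). Note that parse expects a string, so calling it on a list would still raise the "string expected" error even with the quoting fixed.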
-----------------
We can then also add a regex to parse to force it to break on one or more whitespace chars {\s+}
(setq input (parse "! @ # $$$ *- a bb ccc dddd eeeee ffffff" {\s+} 0))
(println input)
(define (small? x) (< (length x) 4))
(setq output (clean small? input))
(println output)
(println (join output " "))
(exit)
-- xytroxon
Joejoe, you should usually use parse with a string-break argument and optionally a regex option:
(parse string string-break regex-option)
otherwise you will see unexpected results, as newLISP tries to treat your input as source code.
> (parse "this is #1 in a list of 3")
("this" "is")
> (parse "well ; there's a thing!")
("well")
> (parse "[This sentence isn't going to be broken into words, whatever you do.]")
("[This sentence isn't going to be broken into words, whatever you do.]")
> (parse "0800-074-085")
("0" "800" "-074" "-0" "85")
>
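
For comparison, here is a small sketch of the same inputs parsed with an explicit break string, so that newLISP's source-code tokenizer is not involved. The choice of " " and "-" as break strings is only an assumption about how one would want these particular inputs split:

```newlisp
; break on a single space: "#" and ";" are no longer treated as comment markers
(println (parse "this is #1 in a list of 3" " "))
; => ("this" "is" "#1" "in" "a" "list" "of" "3")

(println (parse "well ; there's a thing!" " "))
; => ("well" ";" "there's" "a" "thing!")

; break on the hyphen: the digits are kept as strings, not read as numbers
(println (parse "0800-074-085" "-"))
; => ("0800" "074" "085")
```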
xytroxon, major thanks on using set, setq and setf properly! Got it!
cormullion, thanks for the parse guidance because I will be using that a lot! ;0)
You could also use 'find-all'. In that case the regular expression describes a class of tokens instead of break strings:
(set 'input "! @ # $$$$$ *- a bb ccc dddd eeeee ffffff")
(find-all {\w{4,}} input) => ("dddd" "eeeee" "ffffff")
(find-all "\\w{4,}" input) => ("dddd" "eeeee" "ffffff")
(find-all "[^ ]{4,}" input) => ("$$$$$" "dddd" "eeeee" "ffffff")
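
The difference between the two token classes shows up most clearly on text with punctuation attached to words. A short sketch, using a made-up sample sentence (the variable name sample is only for illustration):

```newlisp
(set 'sample "well, there's a thing!")

; \w matches only letters, digits and underscore, so punctuation
; ends a token and is never part of the result
(println (find-all {\w{4,}} sample))
; => ("well" "there" "thing")

; [^ ] matches anything except a space, so punctuation stays attached
; and counts toward the minimum length of 4
(println (find-all {[^ ]{4,}} sample))
; => ("well," "there's" "thing!")
```

Which pattern fits depends on whether symbols such as ! or $ should count as characters when measuring a token's length.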
Most excellent!
That is the magic regex of 4+ characters! :0)
Thanks very much Lutz and I will study the slight differences in your regexes.
Very much appreciated and thanks again!