Lisp - Splitting Input into Separate Strings

I'm trying to take user input and store it in a list, but instead of a list consisting of a single string, I want each word scanned in to be its own string. Example:

> (input)
This is my input. Hopefully this works

would return:

("this" "is" "my" "input" "hopefully" "this" "works")

Note that I don't want any spaces or punctuation in my final list.

Any input would be greatly appreciated.

Judgeship answered 13/3, 2013 at 18:43 Comment(2)
Check out cl-cookbook.sourceforge.net/strings.html; it has a bunch of functions for common use cases, one of which is a simple split on spaces that you could modify to remove punctuation and the like. – Radiotelephony
The Cookbook continues here: lispcookbook.github.io/cl-cookbook/strings.html – Waiter

split-sequence is the off-the-shelf solution.
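For reference, a minimal sketch of how split-sequence could cover the question, assuming Quicklisp is installed (the helper name input-words is hypothetical). split-sequence-if splits wherever the predicate is true, and :remove-empty-subseqs t drops the empty strings that adjacent delimiters such as ". " would otherwise produce. The functions live in the SPLIT-SEQUENCE package, so they need that prefix unless the package is used.

```lisp
;; Assumes Quicklisp is installed and loaded in this image.
(ql:quickload "split-sequence" :silent t)

;; Hypothetical helper: downcase, then split on every non-alphanumeric character.
(defun input-words (line)
  ;; SPLIT-SEQUENCE-IF also returns a second value (the index where it stopped);
  ;; VALUES keeps only the list of words.
  (values (split-sequence:split-sequence-if (complement #'alphanumericp)
                                            (string-downcase line)
                                            :remove-empty-subseqs t)))

(input-words "This is my input. Hopefully this works")
;; => ("this" "is" "my" "input" "hopefully" "this" "works")
```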

You can also roll your own:

(defun my-split (string &key (delimiterp #'delimiterp))
  ;; BEG is the start of the next token (first non-delimiter);
  ;; END is the delimiter that terminates it (NIL means the end of STRING)
  (loop :for beg = (position-if-not delimiterp string)
          :then (position-if-not delimiterp string :start (1+ end))
        :for end = (and beg (position-if delimiterp string :start beg))
        :when beg :collect (subseq string beg end)
        :while end))

where delimiterp checks whether you want to split on a given character, e.g.

(defun delimiterp (c) (or (char= c #\Space) (char= c #\,)))

or

(defun delimiterp (c) (position c " ,.;/"))

PS. Looking at your expected return value, you seem to want to call string-downcase before my-split.
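Putting the PS together with the question's "no spaces or punctuation" requirement gives something like the following minimal sketch. It assumes a delimiterp that treats every non-alphanumeric character as a separator (that choice is an assumption, not one of the two variants above), repeats my-split unchanged so the block runs on its own, and adds a hypothetical wrapper input-words.

```lisp
;; Assumed delimiterp: anything that is not a letter or digit
;; (space, period, comma, ...) separates words.
(defun delimiterp (c)
  (not (alphanumericp c)))

;; my-split from the answer above, unchanged.
(defun my-split (string &key (delimiterp #'delimiterp))
  (loop :for beg = (position-if-not delimiterp string)
          :then (position-if-not delimiterp string :start (1+ end))
        :for end = (and beg (position-if delimiterp string :start beg))
        :when beg :collect (subseq string beg end)
        :while end))

;; Hypothetical wrapper: downcase first, as the PS suggests, then split.
(defun input-words (line)
  (my-split (string-downcase line)))

(input-words "This is my input. Hopefully this works")
;; => ("this" "is" "my" "input" "hopefully" "this" "works")
```

Because a run of delimiters like ". " is skipped by position-if-not, no empty strings end up in the result.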

PPS. You can easily modify my-split to accept :start, :end, :delimiterp, etc.
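One way the PPS could look, as a sketch only: my-split-range is a hypothetical name, the default space-only delimiter is an assumption, and :end follows the usual Common Lisp convention where NIL means "to the end of the string". The key change from my-split is that a token with no terminating delimiter now stops at END rather than at the end of the whole string.

```lisp
;; Hypothetical variant of my-split restricted to STRING[START, END).
(defun my-split-range (string &key (delimiterp (lambda (c) (char= c #\Space)))
                                   (start 0) end)
  (let ((end (or end (length string))))   ; NIL means "to the end", as in CL sequence functions
    (loop :for beg = (position-if-not delimiterp string :start start :end end)
            :then (position-if-not delimiterp string :start (1+ stop) :end end)
          :for stop = (and beg (position-if delimiterp string :start beg :end end))
          ;; a token without a terminating delimiter runs up to END, not to (length string)
          :when beg :collect (subseq string beg (or stop end))
          :while stop)))

(my-split-range "a b c d" :start 2 :end 5)
;; => ("b" "c")
```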

PPPS. Sorry about the bugs in the first two versions of my-split. Please consider that an indicator that one should not roll one's own version of this function, but should use the off-the-shelf solution.

Fuze answered 13/3, 2013 at 18:58 Comment(10)
I find plenty of material on split-sequence, but apparently I need to import the cl-utilities package, which I just can't figure out how to do =/ #imanewb – Judgeship
@SeanEvans: careful! import is a CL function that you do not want here! What you need is to install the package using, e.g., Quicklisp: (ql:quickload "split-sequence") – Fuze
@sds: Your edit broke your code (for instance, test with "" and "a"). – Ancilla
To clarify, the first version can't handle strings that end with a delimiter (e.g. "abc "), and the second version most of the time fails to get the last token (e.g. "ab cd" -> ("ab")). – Ancilla
I think I fixed the code now. Sorry about the bugs. – Fuze
I don't know the details, but I have to change delimiterp into #'delimiterp for the code to work. – Factitive
This is useful. However, when I try to install "split-sequence", it seems to run successfully, but then complains that this function is unknown. What could cause this? (I am using Aquamacs with SBCL and SLIME.) – Temuco
Also, if one uses the "my-split" function from above, is it possible to split on an empty character? That is, is it possible to do a character-by-character split? – Temuco
A char-by-char split is easier done by coercing the string to a list. – Fuze
If you have problems installing split-sequence, you should ask the vendor for support rather than ask here; a separate question, for example, would be fine. – Fuze
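The char-by-char split mentioned in the comments can be sketched with standard Common Lisp alone; split-chars is a hypothetical name for the variant that returns one-character strings instead of characters.

```lisp
;; COERCE turns a string into a list of characters.
(coerce "abc" 'list)          ; => (#\a #\b #\c)

;; Hypothetical helper: one-character strings instead of characters.
;; STRING applied to a character returns a string of length 1.
(defun split-chars (string)
  (map 'list #'string string))

(split-chars "abc")           ; => ("a" "b" "c")
```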

For that task in Common Lisp I found (uiop:split-string str :separator " ") useful. The uiop package in general has a lot of utilities; take a look at the docs: https://common-lisp.net/project/asdf/uiop.html#index-split_002dstring
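:separator accepts a sequence of characters and splits on any of them, so punctuation can be handled in the same call. A minimal sketch, assuming an implementation that bundles ASDF 3 (and therefore UIOP), with input-words as a hypothetical name and an assumed punctuation set. Adjacent separators such as ". " produce empty strings, which are removed afterwards.

```lisp
(require :asdf)   ; UIOP ships with ASDF 3 in most current implementations

;; Hypothetical helper: split on spaces and some punctuation, then drop the
;; "" entries that consecutive separators leave behind.
(defun input-words (line)
  (remove "" (uiop:split-string (string-downcase line) :separator " .,;:!?")
          :test #'string=))

(input-words "This is my input. Hopefully this works")
;; => ("this" "is" "my" "input" "hopefully" "this" "works")
```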

Harping answered 14/1, 2019 at 13:12 Comment(2)
uiop:split-string is nice, but it can’t split by newline, sadly. – Dinar
@PhilippLudwig I don't believe that's true. I have a text file with a bunch of newlines and was able to run (uiop:split-string (uiop:read-file-string "input.txt") :separator uiop:+lf+). This returned a list of strings, one for each line in the file. – Negativism
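A smaller check of the newline case, as a sketch assuming UIOP is available through ASDF; text-lines is a hypothetical name, and a list containing #\Newline is used as the separator instead of uiop:+lf+.

```lisp
(require :asdf)   ; UIOP ships with ASDF 3 in most current implementations

;; Hypothetical helper: :SEPARATOR is a sequence of characters, so a list
;; holding only #\Newline splits the text into lines.
(defun text-lines (text)
  (uiop:split-string text :separator '(#\Newline)))

(text-lines (format nil "first~%second~%third"))
;; => ("first" "second" "third")
```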

There's cl-ppcre:split:

* (split "\\s+" "foo   bar baz
frob")
("foo" "bar" "baz" "frob")

* (split "\\s*" "foo bar   baz")
("f" "o" "o" "b" "a" "r" "b" "a" "z")

* (split "(\\s+)" "foo bar   baz")
("foo" "bar" "baz")

* (split "(\\s+)" "foo bar   baz" :with-registers-p t)
("foo" " " "bar" "   " "baz")

* (split "(\\s)(\\s*)" "foo bar   baz" :with-registers-p t)
("foo" " " "" "bar" " " "  " "baz")

* (split "(,)|(;)" "foo,bar;baz" :with-registers-p t)
("foo" "," NIL "bar" NIL ";" "baz")

* (split "(,)|(;)" "foo,bar;baz" :with-registers-p t :omit-unmatched-p t)
("foo" "," "bar" ";" "baz")

* (split ":" "a:b:c:d:e:f:g::")
("a" "b" "c" "d" "e" "f" "g")

* (split ":" "a:b:c:d:e:f:g::" :limit 1)
("a:b:c:d:e:f:g::")

* (split ":" "a:b:c:d:e:f:g::" :limit 2)
("a" "b:c:d:e:f:g::")

* (split ":" "a:b:c:d:e:f:g::" :limit 3)
("a" "b" "c:d:e:f:g::")

* (split ":" "a:b:c:d:e:f:g::" :limit 1000)
("a" "b" "c" "d" "e" "f" "g" "" "")

http://weitz.de/cl-ppcre/#split
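Applied to the original question, a single character class can act as the delimiter. A minimal sketch, assuming Quicklisp is installed (input-words is a hypothetical name). Any run of characters that are not lowercase ASCII letters counts as one separator, so digits and non-ASCII letters are dropped too, and input that starts with punctuation would yield a leading "" (only trailing empty strings are removed by default, as the examples above show).

```lisp
;; Assumes Quicklisp is installed and loaded in this image.
(ql:quickload "cl-ppcre" :silent t)

;; Hypothetical helper: downcase, then split on runs of non-letters.
(defun input-words (line)
  (cl-ppcre:split "[^a-z]+" (string-downcase line)))

(input-words "This is my input. Hopefully this works")
;; => ("this" "is" "my" "input" "hopefully" "this" "works")
```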

For common cases there is the (new, "modern and consistent") cl-str string manipulation library:

(str:words "a sentence    with   spaces") ; cut with spaces, returns words
(str:replace-all "," " " "a,sentence") ; replaces literal strings (old, new, string) without treating them as regexps (cl-ppcre treats them as regexps)

There is also cl-slug to remove non-ASCII characters and punctuation:

 (asciify "Eu André!") ; => "Eu Andre!"

as well as str:remove-punctuation (that uses cl-change-case:no-case).
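With cl-str the question can be handled by str:words once punctuation is turned into whitespace. A sketch, assuming Quicklisp is installed (the system is loaded as "str") and using standard substitute-if for the punctuation step rather than str:remove-punctuation; input-words is a hypothetical name.

```lisp
;; Assumes Quicklisp is installed and loaded in this image.
(ql:quickload "str" :silent t)

;; Hypothetical helper: replace every non-alphanumeric character with a space,
;; then let STR:WORDS split on (runs of) whitespace.
(defun input-words (line)
  (str:words (substitute-if #\Space (complement #'alphanumericp)
                            (string-downcase line))))

(input-words "This is my input. Hopefully this works")
;; => ("this" "is" "my" "input" "hopefully" "this" "works")
```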

Waiter answered 3/1, 2017 at 1:35 Comment(0)
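; AutoLISP usage: (splitStr "get off of my cloud" " ") returns ("get" "off" "of" "my" "cloud")
; DELIM must be a single-character string; consecutive delimiters yield empty strings.

(defun splitStr (src delim / wordlist cnt word letter)
  (setq wordlist (list))
  (setq cnt 1)                          ; substr positions are 1-based
  (while (<= cnt (strlen src))
    (setq word "")
    (setq letter (substr src cnt 1))
    ;; accumulate characters until DELIM or the end of SRC
    ;; (without the length check the loop would never end)
    (while (and (/= letter delim) (<= cnt (strlen src)))
      (setq word (strcat word letter))
      (setq cnt (+ cnt 1))
      (setq letter (substr src cnt 1))
    ) ; while
    (setq cnt (+ cnt 1))                ; skip the delimiter
    (setq wordlist (append wordlist (list word)))
  )
  wordlist                              ; return the list instead of only printing it
) ;defun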
Stephniestepladder answered 25/7, 2014 at 19:51 Comment(0)
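; AutoLISP: (splitStr "a,b,c" ",") returns ("a" "b" "c"); PAT may be several characters long
(defun splitStr (src pat / wordlist len start cnt word)
    (setq wordlist (list))
    (setq len (strlen pat))             ; length of the delimiter
    (setq start 0)                      ; 0-based index where the current word begins
    ;; vl-string-search returns the 0-based position of PAT, or nil when none is left
    (while (setq cnt (vl-string-search pat src start))
        (setq word (substr src (1+ start) (- cnt start)))  ; substr is 1-based
        (setq start (+ cnt len))
        (setq wordlist (append wordlist (list word)))
    )
    ;; the text after the last delimiter is the final word
    (setq wordlist (append wordlist (list (substr src (1+ start)))))
)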
Healthful answered 8/6, 2017 at 14:44 Comment(1)
While this may answer the question, it is always good to provide an explanation of your code and any references that may be helpful. Check out How to Answer for details on answering questions. – Velodrome
