Split String by Delimiter and Include Delimiter - Common Lisp

How can I split a string by a delimiter in Common Lisp, as SPLIT-SEQUENCE does, but also keep the delimiters in the resulting list of strings?

For example, I could write: (split-string-with-delimiter #\. "a.bc.def.com") and the result would be ("a" "." "bc" "." "def" "." "com").

I've tried the following code (make-adjustable-string makes a string that can be extended with vector-push-extend):

(defun make-adjustable-string (s)
  (make-array (length s)
    :fill-pointer (length s)
    :adjustable t
    :initial-contents s
    :element-type (array-element-type s)))

(defun split-str (string &key (delimiter #\Space) (keep-delimiters nil))
  "Splits a string into a list of strings, with the delimiter still
  in the resulting list."
  (let ((words nil)
        (current-word (make-adjustable-string "")))
    (do* ((i 0 (+ i 1))
          (x (char string i) (char string i)))
         ((= (+ i 1) (length string)) nil)
      (if (eql delimiter x)
        (unless (string= "" current-word)
          (push current-word words)
          (push (string delimiter) words)
          (setf current-word (make-adjustable-string "")))
        (vector-push-extend x current-word)))
    (nreverse words)))

But the result is missing the last substring/word. I'm not sure what's going on.

Thanks for the help ahead of time!

Reprieve asked 29/12, 2019 at 2:37

The problem is in the end condition of the do* loop. The end test is evaluated before the body, so when i reaches the last index of the string the loop exits without processing the last character x, and the current-word accumulated so far is never added to words. When the end condition is met, you need to add x to current-word and then push current-word onto words before exiting the loop:

(defun split-string-with-delimiter (string delimiter)
  "Splits a string into a list of strings, with the delimiter still
  in the resulting list."
  (let ((words nil)
        (current-word (make-adjustable-string "")))
    (do* ((i 0 (+ i 1))
          (x (char string i) (char string i)))
         ((>= (+ i 1) (length string))
          ;; Last character: add it to the word, then save the word.
          (vector-push-extend x current-word)
          (push current-word words))
      (if (eql delimiter x)
        (unless (string= "" current-word)
          (push current-word words)
          (push (string delimiter) words)
          (setf current-word (make-adjustable-string "")))
        (vector-push-extend x current-word)))
    (nreverse words)))

Note, however, that this version is still buggy: if the last character of string is a delimiter, it is included in the last word, e.g. (split-string-with-delimiter "a.bc.def." #\.) => ("a" "." "bc" "." "def."). It also signals an error on an empty string, because x is initialized with (char string 0) before the end test runs. I'll let you add these checks.
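
One way to handle the trailing delimiter and the empty string is sketched below. This is not the answerer's code, and the name split-string-keeping-delimiters is hypothetical. It iterates with LOOP ... ACROSS so that every character, including the last, goes through the same branch, and it flushes the buffered word at each delimiter and once more after the loop. Unlike the version above, it keeps every delimiter, even between adjacent delimiters, while still skipping empty words.

```lisp
;; Hypothetical name: a sketch of the character-by-character approach
;; with the trailing-delimiter and empty-string cases handled.
(defun split-string-keeping-delimiters (string delimiter)
  "Split STRING at each DELIMITER character, keeping every delimiter as a
one-character string. Empty words, e.g. between adjacent delimiters, are skipped."
  (let ((words '())
        ;; Adjustable buffer for the word being built, as in the question.
        (current-word (make-array 0 :element-type 'character
                                    :adjustable t :fill-pointer 0)))
    (flet ((flush-word ()
             ;; Save a fresh copy of the buffered word, if any, then empty the buffer.
             (when (plusp (length current-word))
               (push (copy-seq current-word) words)
               (setf (fill-pointer current-word) 0))))
      ;; LOOP ... ACROSS visits every character, including the last one.
      (loop for c across string
            do (cond ((char= c delimiter)
                      (flush-word)
                      (push (string delimiter) words))
                     (t
                      (vector-push-extend c current-word))))
      ;; The word after the last delimiter is still buffered here.
      (flush-word))
    (nreverse words)))
```

With this sketch, (split-string-keeping-delimiters "a.bc.def." #\.) returns ("a" "." "bc" "." "def" "."), and an empty string yields NIL.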

In any case, you could make this more efficient by looking ahead for the next delimiter (e.g. with position) and extracting all the characters between the current i and that delimiter at once, as a single substring (e.g. with subseq).

Marindamarinduque answered 29/12, 2019 at 7:57

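If you're just looking for a solution rather than an exercise, you can use cl-ppcre's split with :with-registers-p t, which also returns the text matched by each register (capture group) in the regex, here the delimiter: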

CL-USER> (cl-ppcre:split "(\\.)" "a.bc.def.com" :with-registers-p t)
("a" "." "bc" "." "def" "." "com")
Dumbarton answered 29/12, 2019 at 14:48

Something like this?

  • copy sub-strings using subseq
  • using LOOP makes collecting things easier

Example:

(defun split-string-with-delimiter (string
                                    &key (delimiter #\Space)
                                         (keep-delimiters nil)
                                    &aux (l (length string)))
  (loop for start = 0 then (1+ pos)
        for pos   = (position delimiter string :start start)

        ; no more delimiter found
        when (and (null pos) (not (= start l)))
        collect (subseq string start)

        ; while delimiter found
        while pos

        ;  some content found
        when (> pos start) collect (subseq string start pos)
        ;  optionally keep delimiter
        when keep-delimiters collect (string delimiter)))

Example:

CL-USER 120 > (split-string-with-delimiter "..1.2.3.4.."
                :delimiter #\. :keep-delimiters nil)
("1" "2" "3" "4")

CL-USER 121 > (split-string-with-delimiter "..1.2.3.4.."
                :delimiter #\. :keep-delimiters t)
("." "." "1" "." "2" "." "3" "." "4" "." ".")

CL-USER 122 > (split-string-with-delimiter "1.2.3.4"
                :delimiter #\. :keep-delimiters nil)
("1" "2" "3" "4")

CL-USER 123 > (split-string-with-delimiter "1.2.3.4"
                :delimiter #\. :keep-delimiters t)
("1" "." "2" "." "3" "." "4")

Or modified to work with any sequence (lists, vectors, strings, ...):

(defun split-sequence-with-delimiter (sequence delimiter
                                      &key (keep-delimiters nil)
                                      &aux (end (length sequence)))
  (loop for start = 0 then (1+ pos)
        for pos   = (position delimiter sequence :start start)

        ; no more delimiter found
        when (and (null pos) (not (= start end)))
        collect (subseq sequence start)

        ; while delimiter found
        while pos

        ;  some content found
        when (> pos start) collect (subseq sequence start pos)
        ;  optionally keep delimiter
        when keep-delimiters collect (subseq sequence pos (1+ pos))))
Plow answered 29/12, 2019 at 10:44

If you want to split on several delimiters and keep them:

(defun split-string-with-delims (str delims)
  (labels ((delim-p (c)
             (position c delims))
           (tokens (stri test)
             (when (> (length stri) 0)
               (let ((p (position-if test stri)))
                 (if p
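                     (if (= p 0)
                         ;; Delimiter at the front: emit it as a one-character token.
                         (cons (subseq stri 0 (1+ p))
                               (tokens (subseq stri (1+ p) nil) test))
                         ;; Otherwise emit the word before the next delimiter.
                         (cons (subseq stri 0 p)
                               (tokens (subseq stri p nil) test)))
                     ;; No delimiter left: the rest of the string is the last token.
                     (cons stri nil))))))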
    (tokens str #'delim-p)))

And you can call it either:

(split-string-with-delims ".,hello world,," '(#\. #\, #\Space))
  ; => ("." "," "hello" " " "world" "," ",")

or:

(split-string-with-delims ".,hello world,,!!" "., ")
  ; => ("." "," "hello" " " "world" "," "," "!!")
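
Because tokens recurses once per token and copies the remainder of the string with subseq at every step, long inputs cause quadratic copying, and since the recursive call is not in tail position, very many tokens may exhaust the stack. Below is a sketch of an iterative variant, assuming the same delims convention (a list or a string of characters); the name split-string-with-delims-iteratively is hypothetical. It scans ahead with position-if and :start and copies each token exactly once.

```lisp
;; Hypothetical name: an iterative sketch of the same multi-delimiter split.
(defun split-string-with-delims-iteratively (str delims)
  "Split STR at every character found in DELIMS (a list or a string of
characters), keeping each delimiter as a one-character string."
  (flet ((delim-p (c)
           (find c delims)))
    (loop with end = (length str)
          for start = 0 then next
          ;; Index of the next delimiter at or after START, or NIL.
          for pos = (position-if #'delim-p str :start start)
          ;; End index of the current token.
          for next = (cond ((null pos) end)         ; no delimiter left: rest of STR
                           ((= pos start) (1+ pos)) ; delimiter here: emit it alone
                           (t pos))                 ; word up to the next delimiter
          while (< start end)
          collect (subseq str start next))))
```

It produces the same results as the recursive version for the two calls shown above, and NIL for an empty string.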

Concerning your code: since there is subseq, I'd go for Rainer Joswig's approach (above) instead of your make-adjustable-string + vector-push-extend.

Lanfri answered 29/12, 2019 at 17:25
