Split a string even if the last character is a delimiter

I want to delete some characters at the end of a string.

I wrote this function:

(defun del-delimiter-at-end (string)
  (cond
    ((eq (delimiterp (char string (- (length string) 1))) nil) 
        string )
    (t 
        (del-delimiter-at-end (subseq string 0 (- (length string) 1))) ) ) )

with this helper:

(defun delimiterp (c) (position c " ,.;!?/"))

But I don't understand why it doesn't work. I get the following error:

Index must be positive and not -1

Note that I want to split a string into a list of strings. I already looked here:

Lisp - Splitting Input into Separate Strings

but it doesn't work if the string ends with a delimiter, which is why I'm trying to strip the trailing delimiters first.

What am I doing wrong? Thanks in advance.

Warring answered 2/6, 2014 at 17:57 Comment(1)
Based on the answer that you accepted, it sounds more like you wanted to know how to trim a string from the right side, but your original question title is "Split a string even if the last character is a delimiter". Can you clarify what exactly you're looking for? - Turley

The Easy Way

Just use string-right-trim:

(string-right-trim " ,.;!?/" s)
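As a quick illustration of how this behaves on the edge cases from the question, here is a small sketch. The variable name *delimiters* and the sample strings are only illustrative and do not come from the question.

```lisp
;; Hypothetical name for the delimiter bag used in the question.
(defparameter *delimiters* " ,.;!?/")

;; Trailing delimiters are removed; delimiters inside the string are kept.
(string-right-trim *delimiters* "Hello, world!? ")
;=> "Hello, world"

;; A string made only of delimiters, and the empty string, both give "".
(string-right-trim *delimiters* "?!")
;=> ""
(string-right-trim *delimiters* "")
;=> ""
```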

Your Error

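If you pass an empty string to your del-delimiter-at-end, (- (length string) 1) evaluates to -1, so you will be passing -1 as the second argument to char. The same thing happens when the string consists only of delimiters, because the recursion eventually reaches the empty string.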
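If you want to keep the recursive function from the question, one possible fix is to add a guard for the empty string. This is only a minimal sketch that assumes the delimiterp helper from the question; it does not address the efficiency of the recursion.

```lisp
(defun delimiterp (c)
  (position c " ,.;!?/"))

(defun del-delimiter-at-end (string)
  (cond
    ;; Guard: nothing left to inspect, so return before calling CHAR.
    ((zerop (length string)) string)
    ;; Last character is a delimiter: drop it and recurse.
    ((delimiterp (char string (- (length string) 1)))
     (del-delimiter-at-end (subseq string 0 (- (length string) 1))))
    ;; Last character is not a delimiter: we are done.
    (t string)))

(del-delimiter-at-end "hello!?")
;=> "hello"
(del-delimiter-at-end "?!")
;=> ""
```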

Your Code

  1. There is no reason to do (eq (delimiterp ...) nil); just use (delimiterp ...) instead (and switch the clauses!)

  2. It is more idiomatic to use if rather than cond when you have just two clauses and each has just one form.

  3. You call subseq on every recursive step, which means that you not only allocate memory for no reason, but your algorithm is also quadratic in the length of the string.
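To make points 1 and 3 concrete, here is one way the function could be rewritten so that it calls delimiterp directly and copies the string only once. This is a sketch, not the only reasonable design; the name del-delimiter-at-end-iter is made up for this example.

```lisp
(defun delimiterp (c)
  (position c " ,.;!?/"))

;; Hypothetical iterative variant of the function from the question.
(defun del-delimiter-at-end-iter (string)
  ;; END is one past the last character we intend to keep.
  (let ((end (length string)))
    ;; Move END left while the character before it is a delimiter.
    ;; The PLUSP check stops the loop before CHAR sees index -1.
    (loop while (and (plusp end)
                     (delimiterp (char string (1- end))))
          do (decf end))
    ;; A single SUBSEQ call, so the work is linear in the string length.
    (subseq string 0 end)))

(del-delimiter-at-end-iter "hello!?")
;=> "hello"
(del-delimiter-at-end-iter "")
;=> ""
```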

Cascara answered 2/6, 2014 at 18:25 Comment(1)
Thank you again for your response and your advice! I just want to mention that the error came from the fact that I didn't set string to the new value; I forgot (setq string (del-delimiter-at-end string)) - Warring

There are really two questions here. One is more specific, and is described in the body of the question. The other is more general, and is what the title asks about (how to split a sequence). I'll handle the immediate question that's in the body, of how to trim some elements from the end of a sequence. Then I'll handle the more general question of how to split a sequence in general, and how to split a list in the special case, since people who find this question based on its title may be interested in that.

Right-trimming a sequence

sds answered this perfectly if you're only concerned with strings. The language already includes string-right-trim, so that's probably the best way to solve this problem in that case.

A solution for sequences

That said, if you want a subseq-based approach that works with arbitrary sequences, it makes sense to use the other sequence manipulation functions that the language provides. Many of them take a :from-end argument and have -if-not variants that can help. In this case, you can use position-if-not to find the rightmost non-delimiter in your sequence, and then use subseq:

(defun delimiterp (c)
  (position c " ,.;!?/"))

(defun right-trim-if (sequence test)
  (let ((pos (position-if-not test sequence :from-end t)))
    (subseq sequence 0 (if (null pos) 0 (1+ pos)))))

(right-trim-if "hello!" 'delimiterp)    ; some delimiters to trim
;=> "hello"

(right-trim-if "hi_there" 'delimiterp)  ; nothing to trim, with other stuff
;=> "hi_there"

(right-trim-if "?" 'delimiterp)         ; only delimiters
;=> ""

(right-trim-if "" 'delimiterp)          ; nothing at all
;=> ""
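Since right-trim-if uses only generic sequence functions, it should also work on lists and vectors. The following sketch repeats the definition so that it is self-contained; the sample data is made up for illustration.

```lisp
(defun right-trim-if (sequence test)
  (let ((pos (position-if-not test sequence :from-end t)))
    (subseq sequence 0 (if (null pos) 0 (1+ pos)))))

;; A list: trailing even numbers are dropped, the even 2 inside is kept.
(right-trim-if '(1 2 3 5 2 4) 'evenp)
;=> (1 2 3 5)

;; A vector: trailing numbers are dropped.
(right-trim-if #(a b 1 2) 'numberp)
;=> #(A B)
```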

Using complement and position

Some people may point out that position-if-not is deprecated. If you don't want to use it, you can use complement and position-if to achieve the same effect. (I haven't noticed an actual aversion to the -if-not functions though.) The HyperSpec entry on complement says:

In Common Lisp, functions with names like xxx-if-not are related to functions with names like xxx-if in that

(xxx-if-not f . arguments) ==  (xxx-if (complement f) . arguments)

For example,

 (find-if-not #'zerop '(0 0 3)) == 
 (find-if (complement #'zerop) '(0 0 3)) =>  3

Note that since the xxx-if-not functions and the :test-not arguments have been deprecated, uses of xxx-if functions or :test arguments with complement are preferred.

That said, position-if and position-if-not take function designators, which means that you can pass the symbol delimiterp to them, as we did in

(right-trim-if "hello!" 'delimiterp)    ; some delimiters to trim
;=> "hello"

complement, though, doesn't take a function designator (i.e., a symbol or a function); it requires an actual function object. So you can define right-trim-if as

(defun right-trim-if (sequence test)
  (let ((pos (position-if (complement test) sequence :from-end t)))
    (subseq sequence 0 (if (null pos) 0 (1+ pos)))))

but you'll have to call it with the function object, not the symbol:

(right-trim-if "hello!" #'delimiterp)
;=> "hello"

(right-trim-if "hello!" 'delimiterp)
; Error
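If you would like the complement-based version to accept symbols as well, one option is to convert the designator to a function object first, for example with coerce. This is a sketch of that idea; the name right-trim-if-designator is made up for this example.

```lisp
(defun delimiterp (c)
  (position c " ,.;!?/"))

;; Hypothetical variant that accepts either a symbol or a function.
(defun right-trim-if-designator (sequence test)
  ;; COERCE turns a symbol naming a global function into the function
  ;; object, and returns a function object unchanged.
  (let* ((fn (coerce test 'function))
         (pos (position-if (complement fn) sequence :from-end t)))
    (subseq sequence 0 (if (null pos) 0 (1+ pos)))))

(right-trim-if-designator "hello!" 'delimiterp)
;=> "hello"
(right-trim-if-designator "hello!" #'delimiterp)
;=> "hello"
```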

Splitting a sequence

If you're not just trying to right-trim the sequence, then you can implement a split function without too much trouble. The idea is to advance a "start" pointer through the sequence. It first points to the beginning of the sequence. Then you find the first delimiter at or after the start and grab the subsequence between the start and that delimiter. Then find the next non-delimiter after that, and treat that as the new start point.

(defun split (sequence test)
  (do ((start 0) 
       (results '()))
      ((null start) (nreverse results))
    (let ((p (position-if test sequence :start start)))
      (push (subseq sequence start p) results)
      (setf start (if (null p) 
                      nil
                      (position-if-not test sequence :start p))))))

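This works on multiple kinds of sequences, and you don't end up with delimiters in your subsequences. A run of consecutive delimiters counts as a single separator, although a sequence that begins with a delimiter produces an empty first element: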

CL-USER> (split '(1 2 4 5 7) 'evenp)
((1) (5 7))
CL-USER> (split '(1 2 4 5 7) 'oddp)
(NIL (2 4))
CL-USER> (split "abc123def456" 'alpha-char-p)
("" "123" "456")
CL-USER> (split #(1 2 3 foo 4 5 6 let 7 8 list) 'symbolp)
(#(1 2 3) #(4 5 6) #(7 8))
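Applied to the original question, split together with delimiterp handles a string that ends in a delimiter. The sketch below repeats both definitions so that it is self-contained; the sample strings are made up, and removing empty strings with remove is just one possible way to deal with a leading delimiter.

```lisp
(defun delimiterp (c)
  (position c " ,.;!?/"))

(defun split (sequence test)
  (do ((start 0)
       (results '()))
      ((null start) (nreverse results))
    (let ((p (position-if test sequence :start start)))
      (push (subseq sequence start p) results)
      (setf start (if (null p)
                      nil
                      (position-if-not test sequence :start p))))))

;; The trailing "?" does not produce an extra empty string.
(split "Hello, world! How are you?" 'delimiterp)
;=> ("Hello" "world" "How" "are" "you")

;; A leading delimiter does produce an empty first element...
(split ",a" 'delimiterp)
;=> ("" "a")

;; ...which can be filtered out afterwards if it is unwanted.
(remove "" (split ",a" 'delimiterp) :test #'equal)
;=> ("a")
```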

Although this works for sequences of all types, it's not very efficient for lists, since subseq, position, etc., all have to traverse the list up to the start position. For lists, it's better to use a list specific implementation:

(defun split-list (list test)
  (do ((results '()))
      ((endp list)
       (nreverse results))
    (let* ((tail (member-if test list))
           (head (ldiff list tail)))
      (push head results)
      (setf list (member-if-not test tail)))))

CL-USER> (split-list '(1 2 4 5 7) 'oddp)
(NIL (2 4))
CL-USER> (split-list '(1 2 4 5 7) 'evenp)
((1) (5 7))

Instead of member-if and ldiff, you could also use cut from this answer to "Idiomatic way to group a sorted list of integers?".

Turley answered 2/6, 2014 at 20:9 Comment(0)
