Check for proper list in Common Lisp
Asked Answered
F

4

5

Is there a standard function in Common Lisp that can check against improper lists (i.e. circular and dotted lists) without signaling an error? list-length can check against circular lists (it returns nil for them), but signals type-error when given a dotted list.

Scheme's list? traverses the whole list to make sure it is not dotted or circular; Common Lisp's listp only checks that it's given nil or a cons cell.

Here's the simplest I could come up with:

(defun proper-list-p (x)
  (not (null (handler-case (list-length x) (type-error () nil)))))

Since several implementations have been suggested and many unexpected problems have been found, here's a test suite for aspiring proper-list-p writers:

(defun circular (xs)
  (let ((xs (copy-list xs)))
    (setf (cdr (last xs)) xs)
    xs))

(assert (eql t (proper-list-p '())))
(assert (eql t (proper-list-p '(1))))
(assert (eql t (proper-list-p '(1 2))))
(assert (eql t (proper-list-p '(1 2 3))))

(assert (not (proper-list-p 1)))
(assert (not (proper-list-p '(1 . 2))))
(assert (not (proper-list-p '(1 2 . 3))))
(assert (not (proper-list-p '(1 2 3 . 4))))

(assert (not (proper-list-p (circular '(1)))))
(assert (not (proper-list-p (circular '(1 2)))))
(assert (not (proper-list-p (circular '(1 2 3)))))
(assert (not (proper-list-p (list* 1 (circular '(2))))))
(assert (not (proper-list-p (list* 1 2 (circular '(3 4))))))
Fogged answered 16/2, 2020 at 11:13 Comment(0)
A
3

There is no standard function to do this, perhaps because such a function was seen as rather expensive if it was to be correct, but, really, this just seems like am omission from the language to me.

A minimal (not very performant) implementation, which does not rely on handling errors (Python people think that's a reasonable way to program, I don't, although this is a stylistic choice), is, I think

(defun proper-list-p (l)
  (typecase l
    (null t)
    (cons
     (loop for tail = l then (cdr tail)
           for seen = (list tail) then (push tail seen)
           do (cond ((null tail)
                     (return t))
                    ((not (consp tail))
                     (return nil))
                    ((member tail (rest seen))
                     (return nil)))))))

This takes time quadratic in the length of l, and conses proportional to the length of l. You can obviously do better using an hashtable for the occurs check, and you can use a tortoise-&-hare algorithm do avoid the occurs check (but I'm not sure what the complexity of that is off the top of my head).

I am sure there are much better functions than this in libraries. In particular Alexandria has one.


While thinking about this question, I also wrote this function:

(defun classify-list (l)
  "Classify a possible list, returning four values.

The first value is a symbol which is
- NULL if the list is empty;
- LIST if the list is a proper list;
- CYCLIC-LIST if it contains a cycle;
- IMPROPER-LIST if it does not end with nil;
- NIL if it is not a list.

The second value is the total number of conses in the list (following
CDRs only).  It will be 0 for an empty list or non-list.

The third value is the cons at which the cycle in the list begins, or
NIL if there is no cycle or the list isn't a list.

The fourth value is the number if conses in the cycle, or 0 if there is no cycle.

Note that you can deduce the length of the leading element of the list
by subtracting the total number of conses from the number of conses in
the cycle: you can then use NTHCDR to pull out the cycle."
  ;; This is written as a tail recursion, I know people don't like
  ;; that in CL, but I wrote it for me.
  (typecase l
    (null (values 'null 0 nil 0 0))
    (cons
     (let ((table (make-hash-table)))
       (labels ((walk (tail previous-tail n)
                  (typecase tail
                    (null
                     (values 'list n nil 0))
                    (cons
                     (let ((m (gethash tail table nil)))
                       (if m
                           (values 'cyclic-list n tail (- n m))
                         (progn
                           (setf (gethash tail table) n)
                           (walk (cdr tail) tail (1+ n))))))
                    (t
                     (values 'improper-list n previous-tail 0)))))
         (walk l nil 0))))
    (t (values nil 0 nil 0))))

This can be used to get a bunch of information about a list: how long it is, if it is proper, if not if it's cyclic, and where the cycle is. Beware that in the cases of cyclic lists this will return circular structure as its third value. I believe that you need to use an occurs check to do this – tortoise & hare will tell you if a list is cyclic, but not where the cycle starts.

Alejandro answered 16/2, 2020 at 14:12 Comment(13)
Here's the Alexandria function: gitlab.common-lisp.net/alexandria/alexandria/blob/master/… They use tortoise-and-hare.Fogged
@Lassi: yes, I think my bias against tortoise-and-hare is unfounded in fact.Alejandro
It may be that it's not possible to do better than t&h for a general-purpose function. O(a + b) runtime and only uses two pointers. en.wikipedia.org/wiki/Cycle_detection That page also gives an alternative O(a + b) algorithm, as well as others that take more storage.Fogged
Why would it matter that the function is expensive? It would only be expensive when the application chooses to call it. The language includes printing support for circular objects which is equally expensive. The solution is an opt-in switch in the form of the *print-circle* dynamic variable which defaults to nil.Grown
@Kaz: I agree. Especially given list-length which already does all the expensive bit but just fails to have the right conventions. I've amended my answer to change 'probably' to 'perhaps' and add some other wording around it. I've always believed the 'because its expensive' thing, but I think I was just wrong.Alejandro
While correctness is much more important than speed, a general-purpose library function should also be fast, especially if there's a simple, well-known fast algorithm. Also, a fast solution can implemented in more places, making the language more reliable overall. Most of Scheme's standard list functions check for cycles and cope in a well-defined manner. People don't complain that they are slow. O(a + b) is not a slow runtime unless the list is huge.Fogged
In the printer it's a non-issue since lists that long are not human-readable to begin with, and printing a circular list to a stream probably means a bug (or bad input to the program). E.g. *print-circle* defaulting to nil causes more problems in practice (by causing inadvertent infinite loops in the REPL) than it saves meaningful cycles in well-behaved code. On a more general note, quadratic algorithms should be avoided in standard libraries whenever possible. I respect that you admit your algorithm is quadratic :) And it passes the test suite, which is what really matters.Fogged
@Lassi: yes, the important bit of my answer was really the first bit (there's no standard function that does this in CL).Alejandro
@Fogged A list processing library function that ignores the possibility of cycles is not incorrect, if it is implementing a set of requirements that call for cycles to be ignored, and if those requirement are not widely regarded as unwanted/wrong.Grown
@Lassi: having *print-circle* on often makes output radically unreadable, because it does not only control printing oc objects with cycles, but printing objects with shared structure. Sometimes this is interesting to see but often it's not.Alejandro
@Kaz: While it's not incorrect, that argument has the same form as the argument against bignums and garbage collection :) @tfb: I didn't know this about *print-circle* - thanks! So ideally we would have 3 flavors of shared-structure detection - none, cycles only, and full.Fogged
@Fogged FWIW following our discussions in this comment thread & in the older one I've added a function called classify-list to this answer which tells you all sorts of stuff about lists and things which might be lists. I just wrote this because I was interested: it doesn't have any particular value I think.Alejandro
Good work. I also couldn't resist the temptation to write a length-tail in Scheme: github.com/lassik/junkcode/blob/master/lisp/length-tail.scm The algorithm is lightly modified tortoise-and-hare from Wikipedia/Alexandria. It's like CL code but (let foo ((a init) (b init) ...) ...) makes a recursive function foo and calls it with the given initial arguments.Fogged
W
2

in addition, something slightly less verbose, than the accepted answer:

(defun improper-tail (ls)
  (do ((x ls (cdr x))
       (visited nil (cons x visited)))
      ((or (not (consp x)) (member x visited)) x)))

(defun proper-list-p (ls)
  (null (improper-tail ls)))

or just like this:

(defun proper-list-p (ls)
  (do ((x ls (cdr x))
       (visited nil (cons x visited)))
      ((or (not (consp x)) (member x visited)) (null x))))

seen to pass all the op's test assertions

Waistline answered 16/2, 2020 at 18:22 Comment(1)
(proper-listp '(1 . 2)) signals a type-error so it's the same as using the standard list-length. To detect such a dotted list, you need to additionally check for a non-cons, non-null x at each iteration in the do loop.Fogged
I
1

After our hopeless attempts with tailp, here, sth which uses the sharp-representation of circular lists :) .

With regex (to detect circular sublist)

(setf *print-circle* t)

(ql:quickload :cl-ppcre)

(defun proper-listp (lst)
  (or (null lst)                                                   ; either a `'()` or:
      (and (consp lst)                                             ; a cons
           (not (cl-ppcre::scan "#\d+=(" (princ-to-string lst))))  ; not circular
           (null (cdr (last lst))))))                              ; not a dotted list

Without regex (cannot detect circular sublists)

(defun proper-listp (lst)
  (or (null lst)                                                   ; either a `'()` or:
      (and (consp lst)                                             ; a cons
           (not (string= "#" (subseq (princ-to-string lst) 0 1)))  ; not circular
           (null (cdr (last lst))))))                              ; not a dotted list
Interflow answered 17/2, 2020 at 0:25 Comment(7)
This should work unless the current readtable has been modified (it can change the syntax of Lisp). See clhs.lisp.se/Body/v_rdtabl.htm Using (with-standard-io-syntax (princ-to-string ...)) ensures a standard readtable. However, even with standard syntax, the behavior of the Lisp printer is very complex. Hence it's best to avoid depending on precise details of its output syntax. (char= #\# (char "foo" 0)) can be used to test the first character (char will signal an error if the string's length is zero).Fogged
Sorry, my comment about the readtable is probably wrong. It should only affect the reader, not the printer. However, the printer also has many settings. with-standard-io-syntax sets both reader and printer settings to default values.Fogged
From an algorithm complexity point-of-view, this will walk through the list twice - the first time by princ-to-string so it can reach all the items, and then another time by last. A standard cycle detection algorithm (e.g. tortoise and hare) can be modified so that it checks for a circular and a dotted list both in one pass. It simply stops immediately if it finds a non-cons cdr. This is what Alexandria's proper-list-p does: gitlab.common-lisp.net/alexandria/alexandria/blob/master/…Fogged
I thought maybe also to catch the inner circular lists to rely on regex ... but okay it is just a hack ...Interflow
You're right - the sublist cycles cause a problem with this approach as well. More generally, the string processing may take more time than the list-walking, even if you walk the list twice; and Lisp has to allocate some memory for the string. The regexp parsing will probably allocate more memory, and regexp behavior is very complex in general. The reader and printer are so complex that it's usually best to avoid them for any task where they are not absolutely needed :) There is probably nothing simpler than using `list-length, and the next simplest is probably the Alexandria version.Fogged
@Fogged yeah, I agree. Just wanted to try it out :D . Nothing is better than Alexandria I guess - these are so smart guys who built that. And true, at the end, your solution with list-length might be probably the faster and simplest one ... although not 'nice/best' practice. :)Interflow
The simplest and most tested solutions are usually the best, no matter which language we are using :) Though you can learn a lot from exploring more complex ones. If you want to explore printing circular lists, you could copy the Alexandria code and modify it to count the length of the cycle, or return the cons that starts the cycle (then it would be your own better version of tailp :)Fogged
I
0

(tailp l (cdr l)) is t for circular lists but nil for non-circular lists.

Credits to @tfp and @RainerJoswig who taught me this here .

So, your function would be:

(defun proper-listp (lst)
  (or (null lst)                           ; either a `'()` or:
      (and (consp lst)                     ; a cons
           (not (tailp lst (cdr lst)))     ; not circular
           (null (cdr (last lst))))))      ; not a dotted list

By the way, I use proper-listp by purpose. Correct would be - by convetion proper-list-p. However, this name is already occupied in the CLISP implementation by SYSTEM::%PROPER-LIST-Pwhy the definition of the function raises a continuable error.

Conclusion of our discussion in the comment section:

The behavior of tailp for circular lists is undefined. Therefore this answer is wrong! Thank you @Lassi for figuring this out!

Interflow answered 16/2, 2020 at 22:21 Comment(1)
Comments are not for extended discussion; this conversation has been moved to chat.Frady

© 2022 - 2024 — McMap. All rights reserved.