Can there be a (Java 7) FileSystem for which a Path .isAbsolute() but has a null root?

The javadoc for .isAbsolute() says:

Tells whether or not this path is absolute.
An absolute path is complete in that it doesn't need to be combined with other path information in order to locate a file.

Returns: true if, and only if, this path is absolute

The javadoc for .getRoot() says:

Returns the root component of this path as a Path object, or null if this path does not have a root component.

Returns: a path representing the root component of this path, or null

OK, so, I am at a loss here; are there any filesystems out there for which a path may be absolute without a root at all?


EDIT: note that there CAN be paths which have a root but are NOT absolute. For instance, these on Windows systems:

  • C:foo;
  • \foo\bar.

But I am asking for the reverse here: no root and absolute.
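
For reference, a minimal sketch for inspecting the two properties side by side on the default file system. The class and method names (PathKinds, describe) are made up for illustration, and the output depends on the platform: the C:foo case only has a root on Windows.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathKinds {

    /** Summarizes isAbsolute() and getRoot() for one path. */
    static String describe(Path p) {
        Path root = p.getRoot();
        return "absolute=" + p.isAbsolute()
                + ", root=" + (root == null ? "null" : root.toString());
    }

    public static void main(String[] args) {
        // Results differ between Windows and Unix-like default file systems.
        for (String s : new String[] {"foo", "/foo/bar", "C:foo", "C:\\foo"}) {
            System.out.println(s + " -> " + describe(Paths.get(s)));
        }
    }
}
```

A path for which describe reports absolute=true together with root=null would be the case asked about here.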

Kitsch answered 29/11, 2014 at 12:24 Comment(6)
I don't know of a real file system that has such behavior, but FileSystem is a public abstract class. You may want to extend it (and the Path instances it returns) for some reason to do that. - Torey
@SotiriosDelimanolis I know all this; I am currently developing a FileSystem helper library and exploring all the nooks and crannies of the API ;) - Kitsch
Down the rabbit hole you go. - Torey
Apologies, I must be misunderstanding something, but how could you possibly have an absolute path without a root? - Cancroid
@Rudi that is exactly what I'm asking; the javadoc of java.nio.file mentions nowhere that this is NOT possible. They must have had a reason to do it this way, right? Or was it only to satisfy Windows and its "broken" filesystem model? - Kitsch
@Kitsch The Oracle tutorial at docs.oracle.com/javase/tutorial/essential/io/path.html#relative states that an absolute path always contains the root element. So at least there they mention it. - Stow
G
12

The Definition

The interface states the following about roots:

A root component, that identifies a file system hierarchy, may also be present.

So as you see, the comment seems to imply that roots are used for file system hierarchies. Now we have to reason about what an absolute path is. The interface tells us the following:

An absolute path is complete in that it doesn't need to be combined with other path information in order to locate a file.

So, as you see, the definition of absolute paths does not mention roots at all. The only restriction is that we have to be able to locate the file without further information.

Hierarchical File Systems

Most file systems are hierarchical, i.e., they are trees (or graphs if we consider links) or forests. The root of a tree is a node that is not the child of another node (excluding links). Windows file systems, for example, are forests, as they have many roots (C:, D:, ...). Linux usually has only one root, which is /. Roots are very important, as without them it would be hard to start locating a file. In such file systems, you can usually rely on each absolute path having a root.
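
To see the tree-versus-forest difference on a given machine, the roots of the default file system can be listed. This is only a sketch; the class name ListRoots and its helper are made up for illustration, and the roots printed depend on the platform (typically a single / on Linux, one entry per drive on Windows).

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ListRoots {

    /** Collects the root directories reported by the default file system. */
    static List<Path> roots() {
        List<Path> result = new ArrayList<>();
        for (Path root : FileSystems.getDefault().getRootDirectories()) {
            result.add(root);
        }
        return result;
    }

    public static void main(String[] args) {
        for (Path root : roots()) {
            // On the default file systems, a root is absolute and is its own root.
            System.out.println(root + " absolute=" + root.isAbsolute()
                    + " ownRoot=" + root.equals(root.getRoot()));
        }
    }
}
```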

Non-Hierarchical File Systems

As long as we have a hierarchical file system, we can anticipate a root in an absolute path, but what if we don't have one? Then, an absolute path might not contain a root.

An example that comes to mind: distributed file systems like Chord. These are often not hierarchical, so the meaning of roots is usually undefined. Instead, a hash identifies a file (SHA-1 in Chord). So a valid Chord path might look like this:

cf23df2207d99a74fbe169e3eba035e633b65d94

This is an absolute path. One can retrieve the associated file without further information, so the path is absolute. However, I see no root. We could define the whole hash to be its own root (then each file would be its own root), but nobody can guarantee that every person that implements a Chord file system will agree to this. So there might be reasonable implementations that do not treat these hashes as roots. In such a file system, each path would be absolute, but none would contain a root.

If I were to implement a non-hierarchical file system, I would always return null as the root, as IMHO a root is not a defined concept in a non-hierarchical file system. Since I think like this, other devs might think so as well. Consequently, you may not assume that every absolute path has a root.

Note that distributed file systems are quite common in many areas, so this is not merely a corner case that will never be implemented. I think you have to anticipate it.
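
What anticipating it could look like in library code, as a rough sketch: the stub below fakes a hash-style Path with a dynamic proxy (it is not a real FileSystem implementation, and every method other than the few handled throws), and rootOrFallback shows a null-safe way to read the root. All names here (RootlessPathSketch, hashPath, rootOrFallback) are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.nio.file.Path;

final class RootlessPathSketch {

    /** Hypothetical stub: a Path that is always absolute and never has a root. */
    static Path hashPath(final String hash) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) {
                switch (method.getName()) {
                    case "isAbsolute": return true;            // locatable by hash alone
                    case "getRoot":    return null;            // no hierarchy, hence no root
                    case "toString":   return hash;
                    case "hashCode":   return hash.hashCode();
                    case "equals":     return proxy == args[0];
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            }
        };
        return (Path) Proxy.newProxyInstance(
                RootlessPathSketch.class.getClassLoader(),
                new Class<?>[] {Path.class},
                handler);
    }

    /** Reads the root without assuming that an absolute path has one. */
    static String rootOrFallback(Path p, String fallback) {
        Path root = p.getRoot();
        return root == null ? fallback : root.toString();
    }

    public static void main(String[] args) {
        Path p = hashPath("cf23df2207d99a74fbe169e3eba035e633b65d94");
        System.out.println(p + " absolute=" + p.isAbsolute()
                + " root=" + rootOrFallback(p, "<none>"));
    }
}
```

Code that calls p.getRoot().toString() directly after checking isAbsolute() would throw a NullPointerException on such a path, which is the assumption to avoid.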

Conclusion

  1. The interface does not mandate that each absolute path must have a root.
  2. There are reasonable file systems in which having no root makes sense.
  3. An Oracle tutorial, as mentioned in the comments, is not a contract for the interface. You should not rely on it.

So there will be people implementing file systems without roots; you should anticipate this.

Godship answered 15/12, 2014 at 15:56 Comment(7)
Good catch with hash paths :) Really interesting, and I totally agree that a filesystem with only absolute paths can be considered as having no root. - Stow
Just to play the devil's advocate, wouldn't each non-hierarchical file be its own root? - Cancroid
@Rudi: Sure, you could choose to define it this way. But you could also choose to define it differently. As the interface does not mandate any semantics, there will surely be people writing file systems using either of these definitions. - Godship
Surely it doesn't matter how either of us or the file system define it. Isn't it the JVM's definition that counts? - Cancroid
@Rudi: What counts is what is written in the contract of the interface (i.e., the javadoc of the class). Right now, there is not enough info in it. If Oracle wants to mandate this, they need to patch the javadoc. But then it will be too late and a lot of non-conforming implementations will already exist. - Godship
No disagreement there. - Cancroid
@Godship I considered distributed systems, but basically what I do in my implementations is that if you have foo://[hashcode or id] you can call foo:// the root and simply note it's not accessible, in which case the client will still think there is a root although the distributed system / server will think it doesn't exist. It's the same as http://foo/bar where http://foo/ doesn't exist. You might even state that http is the root in this case - but that's contrary to the Java interface I suppose... Eventually it'll boil down to the definition of 'exists'. - Shreeves
S
16

Well, there are some obscure things with file systems. I made a few enterprise search crawlers, and somewhere down the road you will notice some strange file system things going on with paths. BTW: these are all implementations of custom (overridden) file systems, so no standard ones, and you can definitely argue for hours about which of those things are good ideas and which are not... Still, I don't think you'll encounter any of these cases with the standard file systems.

Here are a few examples of strange things:

Files in container file systems (OLE2, ZIP, TAR, etc): c:\foo\bar\blah.zip\myfile

In this case, you can decide what item is 'the root':

  • 'c:\' ? That's not the root of the zip file containing the file...
  • 'c:\foo\bar\blah.zip' ? It might be the root of the file, but by doing that it might break your application.
  • 'blah.zip' ? Might be the root of the zip file - but regardless this might probably break your application as well.
  • '/' ? As in the '/' folder in the zip file? It might be possible, but that will give you a serious headache in the long run.
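
For comparison, the zip file system provider that ships with the JDK goes with the last option: paths obtained from it are rooted at '/' inside the archive. The sketch below checks this on a throwaway archive; it assumes that provider is available in the runtime, and the class name ZipRootSketch and the temp-directory handling are only for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class ZipRootSketch {

    /** Opens a fresh zip file system and returns the root of a path inside it (or null). */
    static String rootInsideZip(String entry) throws IOException {
        Path dir = Files.createTempDirectory("ziproot");
        Path zip = dir.resolve("blah.zip");
        Map<String, String> env = new HashMap<>();
        env.put("create", "true"); // provider option: create the archive if it is missing
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            Path inside = zipFs.getPath(entry);
            Path root = inside.getRoot();
            return root == null ? null : root.toString();
        } finally {
            // Clean up the throwaway archive and its directory.
            Files.deleteIfExists(zip);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("root of /myfile: " + rootInsideZip("/myfile"));
        System.out.println("root of myfile:  " + rootInsideZip("myfile"));
    }
}
```

A custom container file system is free to pick any of the other options instead, which is where the headaches start.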

Graph-like structures such as HTTP:

  • The fact that you have '/foo/bar' doesn't imply that '/foo' or even '/' exists (supposing that is your criterion). The only thing you can do is walk the graph...
  • Note that protocols like WebDAV are HTTP-based and can give you a similar headache. I have some examples here of custom WebDAV file systems that don't have a 'root' folder, but do have absolute paths.

Still, you can argue that the top-most common path (if that exists...) that you can reach is the root or that there is a root - but you simply cannot reach it (even though it's really non-existent).

Samba/netbios

If you see a complete Samba (Windows networking) network as a single file system, then you basically end up with a 'root' containing all workgroups, a workgroup containing all computers, a computer containing all shares, and then the files in the share.

However... the root and the workgroups don't really exist. They are things that are made up from a broadcast protocol (which is also quite unreliable if you have a network of over 1000 computers). From a crawler perspective, it makes all the sense in the world to treat the 'root' and 'workgroup' directories completely differently from the (reliable) rest.

However

These scenarios describe only paths where the root is unreachable, unreliable or something else. Theoretically, I suppose that in any URL you can think of, there is always a root. After all, it's made up as a string of characters defining a hierarchy, which therefore by definition has a start.

Shreeves answered 15/12, 2014 at 15:47 Comment(2)
Well, since I am developing custom filesystems myself over this API, I am interested... Your arguments make sense. I have done three already: Dropbox, box.com and FTP. But none of these "apply", so to speak. - Kitsch
@Kitsch Yes, that's what I did some time ago as well, including for dozens of other protocols and file formats. Basically you have hierarchical file systems - pretty much all of them will work and fit the criteria - and graph file systems - all of them will fail miserably with this (e.g. social media graphs, HTTP, etc). Hierarchical file systems are either sequential (e.g. tar.bz2) or random access (e.g. OLE2, ZIP, your C drive, Dropbox, etc). Oh, and obviously there are combinations. Design it with that in mind, and you'll be fine. - Shreeves
C
14

A Question of Semantics

From my understanding of the subject, a path can only be absolute if it can be traced back to its root. As such, there should never be an absolute path without a root. Ultimately this just comes down to semantics, and definitions that describe an absolute path in this way do exist.

The only real question left after this point is whether the definition in the Java API follows suit. The only place where I can find a definition of an absolute path that refers to the root element, in an official Oracle source, is the official Java tutorial, which says:

An absolute path always contains the root element

If this statement is to be believed, then no file system (no matter how obscure) can contain a Path that the Java API will consider absolute, unless it also considers it to contain a root.

You could argue that in some non-hierarchical file systems you might run into issues deciding whether a file can be its own root. However, by this definition in the Path API (emphasis mine), a path should not represent a non-hierarchical element:

A Path represents a path that is hierarchical and composed of a sequence of directory and file name elements

Cancroid answered 15/12, 2014 at 15:50 Comment(3)
@Kitsch - You could easily get into semantics on this with varying definitions of absolute, but I think the root question (no pun intended) is whether the Java API could ever consider a path absolute if it doesn't contain a root. In which case we can only trust the Oracle sources, in which case the answer is no (even if it is a fairly tenuous mention in the tutorials). - Cancroid
@Rudi: So you rely on the fact that every person that ever writes a reasonable file system has read the tutorial and follows it. A tutorial is no contract. Relying on this in a library feels quite brittle. - Godship
@Godship - Yes, but surely (and correct me if I am wrong), if a custom OS running a custom file system is to run a JVM it will have to use an existing JVM with a predetermined definition of absolute? - Cancroid