What's the correct regexp pattern to match a VMS filename?

The documentation at http://h71000.www7.hp.com/doc/731final/documentation/pdf/ovms_731_file_app.pdf (section 5-1) says the filename should look like this:

node::device:[root.][directory-name]filename.type;version

Most of the components are optional (such as node, device, and version), but I'm not sure exactly which ones, or how to write this correctly as a regexp that also handles the directory name. Some examples:

DISK1:[MYROOT.][MYDIR]FILE.DAT

DISK1:[MYDIR]FILE.DAT

[MYDIR]FILE.DAT

FILE.DAT;10

NODE::DISK5:[REMOTE.ACCESS]FILE.DAT
Warrantable answered 16/12, 2010 at 19:51 Comment(6)
Is the text in the "[]" brackets in the example above optional as well? Are "MYROOT.", "MYDIR" and "REMOTE.ACCESS" aliases for something or are those string literals? - Vitrescence
As far as I can tell, the square brackets are literals and not optional. I haven't found the exact specification of which directory names and characters are allowed. My understanding from the spec is that directory names take the form STRING dot STRING. - Warrantable
Also note that each directory name and the filename are limited to 9 characters or less. Node and device have length limits, too (but I don't remember what they are). - Outage
@Loadmaster, I don't believe the limit is as small as that; I can remember creating files with 31-character names. - Misfortune
DECnet Phase IV node names are limited, but Phase V node names can be FQDNs. On an ODS-2 volume, filenames and extensions are limited to 39 characters (upper case and limited special characters). But on an ODS-5 volume (extended filename support), the file names are virtually unlimited, case preserving and allow special characters, including whitespace... - Barbey
You don't plan on supporting UIC format directories: [100,377]? - Bakelite

See the documentation and source for the VMS::Filespec Perl module.

Angi answered 16/12, 2010 at 20:36 Comment(1)
Thanks - this looks pretty good: ([^:]*::)?([^:]*:)?([^>\]]*[>\]])?([^.;]*)(\.?[^.;]*)([.;]?\d*) - Warrantable

According to Wikipedia, the full form is actually a bit longer than that:

NODE"accountname password"::device:[directory.subdirectory]filename.type;ver

This one took a while, but here is an expression that should accept all valid variations, and place the components into capture groups.

(?:(?:(?:([^\s:\[\]]+)(?:"([^\s"]+) ([^\s"]+)")?::)?([^\s:\[\]]+):)?\[([^\s:\[\]]+)\])?([^\s:\[\]\.]+)(\.[^\s:\[\];]+)?(;\d+)?

Also, from what I can tell, your example of

DISK1:[MYROOT.][MYDIR]FILE.DAT

is not a valid name. I believe only one pair of brackets is allowed. I hope this helps!
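
To make the capture groups easier to follow, here is a minimal Python sketch that applies the expression above unchanged and labels the eight groups. The labels in FIELDS and the helper name split_spec are assumptions added for illustration; they are not part of the original answer. Note that, as written, the type group keeps its leading dot and the version group keeps its leading semicolon.

```python
import re

# The expression from the answer, split across lines but otherwise unchanged.
VMS_RE = re.compile(
    r'(?:(?:(?:([^\s:\[\]]+)(?:"([^\s"]+) ([^\s"]+)")?::)?'
    r'([^\s:\[\]]+):)?\[([^\s:\[\]]+)\])?'
    r'([^\s:\[\]\.]+)(\.[^\s:\[\];]+)?(;\d+)?'
)

# Hypothetical labels for the eight capture groups, in order.
FIELDS = ("node", "account", "password", "device",
          "directory", "name", "type", "version")


def split_spec(spec):
    """Return a dict of components, or None if the whole string does not match."""
    m = VMS_RE.fullmatch(spec)
    if m is None:
        return None
    return dict(zip(FIELDS, m.groups()))


for spec in ("NODE::DISK5:[REMOTE.ACCESS]FILE.DAT",
             "FILE.DAT;10",
             "DISK1:[MYROOT.][MYDIR]FILE.DAT"):
    print(spec, "->", split_spec(spec))
```

With fullmatch, the two-bracket form DISK1:[MYROOT.][MYDIR]FILE.DAT is rejected, which is consistent with the expression allowing only one bracket pair.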

Splendiferous answered 16/12, 2010 at 21:16 Comment(1)
I suspect that is an example of a rooted directory. - Braunite
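
If the two-bracket form is indeed a rooted directory, one way to accept it is to add an optional [ROOT.] group in front of the directory group. The following Python sketch is an assumption-laden illustration, not a complete VMS grammar: the group names and the helper parse_vms_spec are hypothetical, the access-control string (account and password) is left out, UIC-style directories such as [100,377] are not handled, and each component is made independently optional.

```python
import re

# A sketch only: every component except the file name is treated as optional.
VMS_SPEC = re.compile(r"""
    (?:(?P<node>[^\s:\[\]]+)::)?          # optional NODE::
    (?:(?P<device>[^\s:\[\]]+):)?         # optional DEVICE:
    (?:\[(?P<root>[^\s:\[\]]+)\.\])?      # optional rooted directory [ROOT.]
    (?:\[(?P<directory>[^\s:\[\]]+)\])?   # optional [DIR] or [DIR.SUBDIR]
    (?P<name>[^\s:\[\].;]+)               # file name
    (?:\.(?P<type>[^\s:\[\].;]+))?        # optional .TYPE
    (?:;(?P<version>\d+))?                # optional ;VERSION
""", re.VERBOSE)


def parse_vms_spec(spec):
    """Return the named components as a dict, or None if spec does not match."""
    m = VMS_SPEC.fullmatch(spec)
    return m.groupdict() if m else None


for spec in ("DISK1:[MYROOT.][MYDIR]FILE.DAT",
             "NODE::DISK5:[REMOTE.ACCESS]FILE.DAT",
             "FILE.DAT;10"):
    print(spec, "->", parse_vms_spec(spec))
```

Here the delimiters are kept outside the named groups, so type and version come back without the leading dot or semicolon.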

You could probably come up with a single complicated regex for this, but your code will be much easier to read if you work your way from left to right, stripping off each section if it is present. The following is some Python code that does just that:

import re

lines = ["DISK1:[MYROOT.][MYDIR]FILE.DAT", "DISK1:[MYDIR]FILE.DAT", "[MYDIR]FILE.DAT", "FILE.DAT;10", "NODE::DISK5:[REMOTE.ACCESS]FILE.DAT"]
# One pattern per component; group 1 captures the value without its delimiters.
node_re = r"(\w+)::"
device_re = r"(\w+):"
root_re = r"\[(\w+)\.]"
dir_re = r"\[([\w.]+)]"  # allow dots so that [REMOTE.ACCESS] matches
file_re = r"(\w+)\."
type_re = r"(\w+)"
version_re = r";(.*)"
re_dict = {"node": node_re, "device": device_re, "root": root_re, "directory": dir_re, "file": file_re, "type": type_re, "version": version_re}
order = ["node", "device", "root", "directory", "file", "type", "version"]
for line in lines:
    i = 0  # offset of the first character not yet consumed
    print(line)
    for item in order:
        # re.match anchors at the current offset, so a component is only
        # accepted if it starts exactly where the previous one ended.
        m = re.match(re_dict[item], line[i:])
        if m is not None:
            print("  " + item + ": " + m.group(1))
            i += m.end()

and the output is

DISK1:[MYROOT.][MYDIR]FILE.DAT
  device: DISK1
  root: MYROOT
  directory: MYDIR
  file: FILE
  type: DAT
DISK1:[MYDIR]FILE.DAT
  device: DISK1
  directory: MYDIR
  file: FILE
  type: DAT
[MYDIR]FILE.DAT
  directory: MYDIR
  file: FILE
  type: DAT
FILE.DAT;10
  file: FILE
  type: DAT
  version: 10
NODE::DISK5:[REMOTE.ACCESS]FILE.DAT
  node: NODE
  device: DISK5
  directory: REMOTE.ACCESS
  file: FILE
  type: DAT
Anatropous answered 16/12, 2010 at 20:47 Comment(2)
Sorry for not being clear - I am using the regexp in a lex file. - Warrantable
No problem, my answer was less about the language and more about the strategy. I would hate having ([^:]*::)?([^:]*:)?([^>\]]*[>\]])?([^.;]*)(\.?[^.;]*)([.;]?\d*) anywhere in my code; sometimes clarity is better than cleverness. - Anatropous
