Pseudorandom directory tree generation?
Asked Answered
A

3

6

I'm trying to write a program which will pseudorandomly autogenerate (based on a seed value so I can re-run the same test more than once) a growing directory structure consisting of files. (this is to stress test a source control database installation)

I was wondering if any of you were aware of something similar to the quasirandom "space-filling" sequences (e.g. van der Corput sequences or Halton sequences) that might work here.

edit: Or a fractal algorithm. This sounds suspiciously like a fractal algorithm.


edit 2: Never mind, I think I figured out the obvious solution, start with an empty tree, and just use sequential outputs of a pseudorandom generator to deterministically (based on the generated number and the state of the tree generated so far) do one of N actions, e.g. make a new subdirectory, add a new file, rename a file, delete a file, etc.

I want to do it this way rather than just sequentially dump files into a folder structure, because we're running into a situation where we are having some problems with large #s of files, and are not sure exactly what the cause is. (tree depth, # of renames, # of deletes, etc.)

It's not just 1 fixed tree I need to generate, the use strategy is: grow the tree structure a little bit, evaluate some performance statistics, grow the tree structure a little more, evaluate some performance statistics, etc.

Anstus answered 5/2, 2009 at 19:58 Comment(2)
If you get an answer, make sure only to use it for the power of good. Sounds like a fun problem to solve.Leman
"Do you use your powers for good, or for awesome?"Percyperdido
D
1

As you mention in your second edit, I would probably implement the whole thing as a file tree traversal, with the PRNG deciding "change to directory", "create directory", "move up one level", "create file", "delete file" and have another value to determine what file to delete, what directory to change to and to generate names for files and directories.

I used a similar method to stress-test a workflow server I wrote (though I didn't need to keep track of where work items were, just needed to randomly pick one to operate on).

Daytoday answered 5/2, 2009 at 20:39 Comment(1)
That's pretty much what I decided to do. In other words make it a finite state machine (almost a cellular automaton)Anstus
O
2

If this is just for testing, what is wrong with some simple, naive generation algorithm? Like, generate a random (1-10) amount of subdirectories, generate names for them, then for each directory recursively generate subdirectories and some amount of files.

This is easily customizable and you can control the seed for rand. For funkier needs, the distribution of the amounts of files/directories can be non-linear, but something that suits your needs more.

Sounds something that can be whipped up in half an hour and get done with. I fail to see a need for something mathemathical or complex. Unless this is just for fun, of course :-)

Orpah answered 5/2, 2009 at 20:12 Comment(0)
W
1

This is a set of different problems which makes it a fun puzzle.

First we have the pseudorandom number generator. There is a lot of stuff available. I only expect a function that creates a number in the range 0..n-1.

Then we have an algorithm to determine the number of subnodes on a single node. It is tempting to use a linear function but that is not a fair representation to reality. So you can create the following function:

randomsize() {
  int n = Random(0,10);
  if (n<10) return n;

  return Random(0,9) + 10 * random;
}

This function produces small numbers. Most will be in the range 0..9 but the top is virtually endless. If you want to have bigger numbers you could also use a bigger threshold

randomsize() {
  int n = Random(0,100);
  if (n<10) return n;

  return Random(0,9) + 10 * random;
}

The last problem is how to create a tree. This is rather simple. But you should keep in mind that the algorith has to end. So you need to do one of the following:

  • use a max depth
  • decrement the generated number based on the nesting level
  • determine the number of leaves as a percentage of the total subnodes. This percentage should increment at higher levels (10-50 at first level, 20-60 at second.. 50-100 at fifth, 60-100 at sixth, until 90-100 at nineth and higher.

Ofcourse you can tweak the parameters to create your required tree.

Watchful answered 5/2, 2009 at 20:28 Comment(0)
D
1

As you mention in your second edit, I would probably implement the whole thing as a file tree traversal, with the PRNG deciding "change to directory", "create directory", "move up one level", "create file", "delete file" and have another value to determine what file to delete, what directory to change to and to generate names for files and directories.

I used a similar method to stress-test a workflow server I wrote (though I didn't need to keep track of where work items were, just needed to randomly pick one to operate on).

Daytoday answered 5/2, 2009 at 20:39 Comment(1)
That's pretty much what I decided to do. In other words make it a finite state machine (almost a cellular automaton)Anstus

© 2022 - 2024 — McMap. All rights reserved.