Is it possible to get a list of files under a directory of a website? How?

7

93

Say I have a website www.example.com. Under the website directory there is a page secret.html. It can be accessed directly at www.example.com/secret.html, but there are no pages that link to it. Is it possible to discover this page, or will it remain hidden from the outside world?

Resemblance answered 24/10, 2010 at 20:0 Comment(2)
"Hidden" may not be the best word to use, especially when discussing it with any possible business owners/users/etc. "Unadvertised" perhaps?Begun
OP means orphanFunctional
40

If you have directory listing disabled in your webserver, then the only way somebody will find it is by guessing or by finding a link to it.

That said, I've seen hacking scripts attempt to "guess" a whole bunch of these common names. secret.html would probably be in such a guess list.

The more reasonable solution is to restrict access with a username/password via an .htaccess file (for Apache) or the equivalent setting for whatever web server you're using.
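For example, a minimal .htaccess sketch for Apache basic authentication (the realm name and the AuthUserFile path here are placeholders, and the password file itself would be created beforehand with the htpasswd utility):

    # Password file created beforehand, e.g.: htpasswd -c /path/to/.htpasswd someuser
    AuthType Basic
    AuthName "Restricted area"
    AuthUserFile /path/to/.htpasswd
    Require valid-user

With something like this in place, even somebody who guesses the file name still has to supply valid credentials.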

Destructor answered 24/10, 2010 at 20:5 Comment(1)
to enable/disable directory browsing in IIS: technet.microsoft.com/en-us/library/cc731109(WS.10).aspxResemblance
22

There are only two ways to find a web page: through a link or by listing the directory.

Usually, web servers disable directory listing, so if there is really no link to the page, then it cannot be found.

BUT: information about the page may get out in ways you don't expect. For example, if a user with Google Toolbar visits your page, then Google may know about the page, and it can appear in its index. That will be a link to your page.

Ardra answered 24/10, 2010 at 20:3 Comment(1)
The Google Toolbar seems to help really well. You can then use an advanced Google search query to search for files. In my case I wanted to know if there were more PDF files on a specific website. The above remark about the toolbar prompted me to search for: "site:<URL> filetype:pdf". Voilà, I got all the PDFs :)Colangelo
16

Yes, you can, but you need a few tools first. You need to know a little about basic coding, FTP clients, port scanners and brute-force tools, especially if the site has a .htaccess file.

If not, just try guessing common names: tgp.linkurl.htm or .html, i.e. default.html, www/home/siteurl/web/, or wap, /index/, /default/, /includes/, /main/, /files/, /images/, /pics/, /vids/ are all possible file locations on the server, so try combinations of them, such as www/home/siteurl/web/includes/.htaccess or default.html. You'll hit a file after a few tries and can then work off that. Yahoo has a site file viewer too: you can try to scan a site's file index with it.

Alternatively, try Brutus AET, trin00, trinity.x, or whiteshark airtool to crack the site's FTP login (but that is illegal and I do not condone it).

Oppression answered 20/6, 2014 at 2:53 Comment(0)
9

DirBuster is exactly the kind of hacking script that guesses a bunch of common names, as nsanders mentioned. It literally brute-forces lists of common words and file endings (.html, .php) and over time figures out the directory structure of a site. This could discover the page you described, but it would also discover many others.
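For a rough idea of what such a tool does under the hood, here is a simplified Python sketch (the target URL, wordlist and extensions are illustrative placeholders, not DirBuster's real lists):

# Simplified sketch of wordlist-based path guessing, the technique tools
# like DirBuster automate. Target URL and wordlist are placeholders.
import urllib.request
import urllib.error

base_url = "http://www.example.com"               # target site (placeholder)
wordlist = ["admin", "backup", "secret", "old"]   # tiny sample wordlist
extensions = ["", ".html", ".php"]

for word in wordlist:
    for ext in extensions:
        url = f"{base_url}/{word}{ext}"
        try:
            # Only the status code matters, so a HEAD request is enough
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=5) as resp:
                print(f"Found: {url} ({resp.status})")
        except urllib.error.HTTPError as err:
            if err.code != 404:
                print(f"Interesting response: {url} ({err.code})")
        except urllib.error.URLError:
            pass  # connection problem, skip this candidate

Real tools do the same thing with far larger wordlists, many threads and smarter handling of response codes.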

Whaley answered 17/10, 2019 at 2:17 Comment(2)
Please try to answer the question "Is it possible to get a list of files under a directory of a website? How?" rather than just describing a commercial solution. Try to understand that it comes across as advertisement.Virtuoso
Regarding the question itself, there are a few software solutions designed to "read" a domain and search it in order to find the root, extensions... This kind of software is called a spider: a spider is a program that visits websites and reads their pages and other information in order to create entries for a search engine index.Virtuoso
6

Any crawler or spider will read your index.htm (or equivalent) page that is exposed to the web, read its source code, and find everything that is linked from that page, including subdirectories. If it finds a "contact us" button, the path to the page or PHP script that handles the contact-us action may be included, so the crawler now has one more subdirectory/folder name to crawl and dig into. But even so, if that folder has an index.htm or equivalent file, the server will not list all the files in that folder.

If, by mistake, the programmer never included an index.htm file in such a folder, then all of its files will be listed on screen, and the crawler/spider can keep digging. But if you created a folder like www.yoursite.com/nombresinistro75crazyragazzo19/, put several files in it, and never published a button or exposed that folder address anywhere on the net, keeping it only in your head, chances are that nobody will ever find that path with a crawler or spider, however sophisticated it may be.

Except, of course, if they can enter your FTP or access your site control panel.
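
To make the link-following mechanism above concrete, here is a minimal Python sketch of what a crawler does with a page it can reach (the URL is a placeholder): it only learns about paths that are actually linked from somewhere, so an orphaned file never shows up.

# Minimal sketch of how a crawler discovers pages: fetch a page it
# already knows about and collect every href linked from it.
from html.parser import HTMLParser
import urllib.request

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Fetch a page the crawler can see (placeholder URL)
with urllib.request.urlopen("http://www.example.com/") as resp:
    charset = resp.headers.get_content_charset() or "utf-8"
    page = resp.read().decode(charset, errors="replace")

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # an unlinked secret.html can never appear in this list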

Unction answered 19/2, 2020 at 20:42 Comment(0)
4

If a website's directory does NOT have an "index...." file, AND .htaccess has NOT been used to block access to the directory itself, then Apache will create an "index of" page for that directory. You can save that page, and its icons, using "Save page as..." along with the "Web page, complete" option (Firefox example). If you own the website, temporarily rename any "index...." file, and reference the directory locally. Then restore your "index...." file.
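
For reference, that auto-generated listing is controlled by Apache's Indexes option, so a minimal .htaccess sketch (assuming mod_autoindex is enabled) looks like this:

    # Let Apache generate an "Index of ..." page when no index file exists
    Options +Indexes

    # Or switch the auto-generated listing off again
    # Options -Indexes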

Brodsky answered 7/9, 2013 at 5:26 Comment(0)
0

First of all, add this line to the <head> section of the HTML file that you want to hide from the outside world, including all search engines/bots.

<meta name="robots" content="noindex, nofollow" />

Then create a robots.txt file in your website root and add these lines; this also blocks access for all search bots. Note, however, that anyone who reads the robots.txt file can see the filename you Disallow, so it is up to you whether to add it to robots.txt or not.

User-agent: *
Disallow:  /yourFileNametoHide.html

You can also block access to the file even if someone learns its name from robots.txt. Simply add these lines to your .htaccess file. Replace xxx.xxx with the first two octets of your IP address if you have a dynamic IP that changes, so that only you can still access the file; if your IP address is always the same (static), you can simply replace it with your full static IP address.

Options All -Indexes

    <Files "YourFileNametoBlock.html">
            Order Deny,Allow
            Deny from all 
            Allow from localhost
            Allow from 127.0.0.1
            Allow from xxx.xxx.0.0/15
            Allow from xxx.xxx.*.*
            Options +Indexes                    
    </Files>

And if you want to deny access to the file for everyone, including yourself, simply add this to your .htaccess file:

Options All -Indexes

    <Files "YourFileNametoBlock.html">
                Order Deny,Allow  
                Deny from all       
     </Files>
Vibratile answered 27/3 at 18:52 Comment(1)
Note that noindex and nofollow only work if the bots actually listen to that. There is nothing saying they have to remove the page from their index, even if you put that on the page.Mcneil
