Writing an HTTP sniffer
C

6

2

I would like to write a program that extracts the URLs of websites visited by a system (an IP address) through packet capture. I think the URL will appear in the data section, i.e. not in any of the Ethernet, IP, or TCP/UDP headers. (Such programs are sometimes referred to as HTTP sniffers; I'm not supposed to use any existing tool.) As a beginner, I've just gone through this basic sniffer program: sniffex.c. Can anyone tell me in which direction I should proceed?

Cristobalcristobalite answered 15/1, 2010 at 16:38 Comment(2)
I've just edited my question, now that I understand it better. Cristobalcristobalite
Sounds exactly like my old CSE 409 assignment. Licha
T
4

Note: in the text below, GET stands for any HTTP method (POST, HEAD, and so on).

It's definitely going to be more work than looking at a single packet, but if you capture the entire stream you should be able to get the URL from the HTTP request headers the client sends.
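
Before any HTTP text can be read, the sniffer has to skip the link, network, and transport headers of each captured frame. The following is a minimal sketch of that step, not a complete capture loop. It assumes an untagged Ethernet II frame carrying IPv4 and TCP, and the function name tcp_payload is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns a pointer to the TCP payload inside an Ethernet II / IPv4 / TCP
 * frame and stores its length in *payload_len. Returns NULL when the frame
 * is not IPv4+TCP or is too short. Assumption: no VLAN tag is present. */
const uint8_t *tcp_payload(const uint8_t *frame, size_t frame_len,
                           size_t *payload_len)
{
    assert(frame != NULL && payload_len != NULL);

    const size_t eth_len = 14;            /* dst MAC, src MAC, EtherType */
    if (frame_len < eth_len + 20) return NULL;

    /* EtherType 0x0800 identifies IPv4. */
    if (frame[12] != 0x08 || frame[13] != 0x00) return NULL;

    const uint8_t *ip = frame + eth_len;
    if ((ip[0] >> 4) != 4) return NULL;   /* IP version field */
    size_t ip_hdr_len = (size_t)(ip[0] & 0x0F) * 4;  /* IHL, in 32-bit words */
    if (ip_hdr_len < 20) return NULL;
    if (ip[9] != 6) return NULL;          /* protocol 6 is TCP */

    /* Use the IP total length so Ethernet padding is not counted as data. */
    size_t ip_total = ((size_t)ip[2] << 8) | ip[3];
    if (ip_total < ip_hdr_len + 20) return NULL;
    if (eth_len + ip_total > frame_len) return NULL;

    const uint8_t *tcp = ip + ip_hdr_len;
    size_t tcp_hdr_len = (size_t)(tcp[12] >> 4) * 4; /* data offset field */
    if (tcp_hdr_len < 20 || ip_hdr_len + tcp_hdr_len > ip_total) return NULL;

    *payload_len = ip_total - ip_hdr_len - tcp_hdr_len;
    return tcp + tcp_hdr_len;
}
```

A request that spans several segments still has to be reassembled in sequence-number order before parsing; this sketch only locates the data within one frame.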

Look at the Host header, if one is present, and at what the GET line actually requests. The request target can be either a full URL or just a path on the server.

Also note that this has nothing to do with getting a domain name from an IP address. If you want the domain name, you have to dig into the data.

Quick example on my machine, from Wireshark:

GET http://www.google.ca HTTP/1.1
Host: www.google.ca
{other headers follow}

Another example, not from a browser, and with only a path in the GET:

GET /ccnet/XmlStatusReport.aspx HTTP/1.1
Host: example.com

In the second example, the actual URL is http://example.com/ccnet/XmlStatusReport.aspx
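
The two examples above suggest a simple rule: use the request target as-is when it is already a full URL, otherwise join it to the Host header. Here is a rough sketch of that rule, assuming the request head has already been reassembled into one NUL-terminated string. The function name http_request_url is hypothetical, and the parser only handles the two target forms shown above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Writes the requested URL into out. Returns 0 on success, -1 when the
 * text is not a usable HTTP/1.x request or out is too small. */
int http_request_url(const char *req, char *out, size_t out_len)
{
    assert(req != NULL && out != NULL);

    /* Request line: METHOD SP request-target SP HTTP-version CRLF */
    const char *sp1 = strchr(req, ' ');
    if (sp1 == NULL || sp1 == req) return -1;
    for (const char *p = req; p < sp1; p++)   /* any upper-case method name */
        if (*p < 'A' || *p > 'Z') return -1;

    const char *target = sp1 + 1;
    size_t target_len = strcspn(target, " \r\n");
    if (target_len == 0) return -1;
    if (strncmp(target + target_len, " HTTP/", 6) != 0) return -1;

    /* Full URL in the request line: nothing else is needed. */
    if (strncasecmp(target, "http://", 7) == 0) {
        int n = snprintf(out, out_len, "%.*s", (int)target_len, target);
        return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
    }
    if (target[0] != '/') return -1;          /* e.g. CONNECT host:port */

    /* Path only: scan the header lines for Host. */
    const char *line = strstr(req, "\r\n");
    while (line != NULL) {
        line += 2;
        if (line[0] == '\r' || line[0] == '\0') break;  /* end of headers */
        if (strncasecmp(line, "Host:", 5) == 0) {
            const char *host = line + 5;
            host += strspn(host, " \t");
            size_t host_len = strcspn(host, " \t\r\n");
            if (host_len == 0) return -1;
            int n = snprintf(out, out_len, "http://%.*s%.*s",
                             (int)host_len, host, (int)target_len, target);
            return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
        }
        line = strstr(line, "\r\n");
    }
    return -1;                                /* no Host header found */
}
```

Checking for an upper-case token rather than the literal GET means POST and the other methods are handled the same way, in line with the note at the top of this answer.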

Tortile answered 15/1, 2010 at 17:8 Comment(4)
Hmm, so I first need to parse the payload section, look for the GET and Host portions, and then extract the URLs. Is that correct? Cristobalcristobalite
Not just GET. There are other HTTP commands as well. You need to start by reading some documentation. Tbar
Wireshark being an open-source tool, will I be able to find some ready-made code for this? (I don't have much time to write it, as this is just one part of my project, which uses the URLs thus captured.) Cristobalcristobalite
Probably. I haven't perused the Wireshark source, but it'll be in there somewhere. Tortile
T
4

No, there is not enough information. A single IP can correspond to any number of domain names, and each of those domains could have literally an infinite number of URLs.

However, look at gethostbyaddr(3) to see how to do a reverse DNS lookup on the IP address and at least get the canonical name for it.
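
For reference, a reverse lookup might look like the sketch below. Note that it swaps in getnameinfo(3), the POSIX successor to gethostbyaddr(3), rather than the call named above. The wrapper name reverse_lookup and its flags parameter are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Resolves a dotted-quad IPv4 address to a host name.
 * flags is passed straight to getnameinfo: 0 asks for a name and falls
 * back to the numeric form, NI_NAMEREQD makes a missing name an error.
 * Returns 0 on success, -1 on a malformed address or failed lookup. */
int reverse_lookup(const char *ip, char *host, size_t host_len, int flags)
{
    assert(ip != NULL && host != NULL);

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;                       /* not a valid IPv4 address */

    int rc = getnameinfo((const struct sockaddr *)&sa, sizeof sa,
                         host, (socklen_t)host_len, NULL, 0, flags);
    return rc == 0 ? 0 : -1;
}
```

The name returned this way is whatever the PTR record says, which is often a hosting provider's name rather than the site the user typed, so it does not replace reading the Host header.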

Update: as you've edited the question, @aehiilrs has a much better answer.

Tbar answered 15/1, 2010 at 16:40 Comment(2)
You can't. You might as well ask how I can tell a person's hair colour from their email address. Tbar
What if it's something like [email protected]? Tortile
T
0

What you might want is a reverse DNS lookup. Call gethostbyaddr for that.

Tackett answered 15/1, 2010 at 16:40 Comment(0)
P
0

If you are using Linux, you can add an iptables rule that looks for packets containing HTTP GET requests and then pull the URL out of them.

The rule would work like this:

For each packet going on port 80 from localhost -> check if the packet contains GET request -> retrieve the url and save it

This approach works only for unencrypted HTTP. With HTTPS the request line and headers are encrypted, so they do not appear in the captured packets.

Prewitt answered 15/1, 2010 at 17:8 Comment(2)
Yes, I am using Linux. By rules, are you referring to conditional blocks within the program? Cristobalcristobalite
He's actually referring to iptables rules, iptables being the built-in firewall. Tortile
S
0

Have a look at PasTmon. http://pastmon.sourceforge.net

Sashasashay answered 16/1, 2010 at 0:0 Comment(0)
L
0

I was researching something similar and came across justniffer. It could be a good start if you are using Linux.

http://justniffer.sourceforge.net/

There is also a nice Python script for grabbing HTTP traffic that would help if you are looking to get information from HTTP requests.

Liber answered 8/7, 2011 at 3:10 Comment(0)
