I have a URL like this:
http://192.168.0.1:8080/servlet/rece
I want to parse the URL to get the values:
IP: 192.168.0.1
Port: 8080
page: /servlet/rece
How do I do that?
Write a custom parser, or use one of the string replace functions to replace the separator ':' and then use sscanf().
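A hand-written parser along these lines could look roughly like the sketch below. The function name split_http_url and its buffer-size parameters are hypothetical, and the sketch assumes a plain scheme://host[:port][/path] URL with no userinfo part and no IPv6 literal.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "scheme://host[:port][/path]" into parts.
 * Returns 0 on success, -1 on malformed input or if a buffer is too small.
 * *port is left untouched when the URL carries no port. */
int split_http_url(const char *url,
                   char *host, size_t host_size,
                   int *port,
                   char *path, size_t path_size)
{
    const char *p = strstr(url, "://");
    if (p == NULL)
        return -1;
    p += 3;                                  /* skip "://" */

    size_t authority_len = strcspn(p, "/");  /* host[:port] ends at the first '/' */
    const char *slash = p + authority_len;   /* points at '/' or at the NUL */
    const char *colon = memchr(p, ':', authority_len);

    size_t host_len = colon ? (size_t)(colon - p) : authority_len;
    if (host_len == 0 || host_len >= host_size)
        return -1;
    memcpy(host, p, host_len);
    host[host_len] = '\0';

    if (colon != NULL) {
        char *end;
        long value = strtol(colon + 1, &end, 10);
        /* the digits must run exactly up to the path (or the end of the string) */
        if (end == colon + 1 || end != slash || value < 1 || value > 65535)
            return -1;
        *port = (int)value;
    }

    /* an absent path is reported as "/" */
    const char *src = (*slash != '\0') ? slash : "/";
    if (strlen(src) >= path_size)
        return -1;
    strcpy(path, src);
    return 0;
}
```

Called with the URL from the question, a 64-byte host buffer and a 128-byte path buffer, this should yield 192.168.0.1, 8080 and /servlet/rece. Setting port to 80 before the call gives a default for URLs without an explicit port.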
Personally, I steal the HTParse.c module from the W3C (it is used in the Lynx web browser, for instance). Then you can do things like:
strncpy(hostname, HTParse(url, "", PARSE_HOST), size);
The important thing about using a well-established and debugged library is that you do not fall into the typical traps of URL parsing (many regexps fail when the host is an IP address, for instance, especially an IPv6 one).
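To illustrate the IPv6 trap: RFC 3986 writes an IPv6 host in square brackets, as in http://[2001:db8::1]:8080/x, so splitting at the first ':' cuts the address apart. Below is a minimal sketch of bracket-aware host extraction. The helper host_span is hypothetical and is not part of HTParse.c.

```c
#include <string.h>

/* Hypothetical helper, not part of HTParse: given the text that follows
 * "://", returns the length of the host part and sets *after to the first
 * character behind it (':' before a port, '/' before a path, or the NUL).
 * For "[...]" literals the returned length includes both brackets.
 * Returns 0, leaving *after untouched, if a bracketed literal is not closed. */
size_t host_span(const char *authority, const char **after)
{
    size_t len;

    if (authority[0] == '[') {
        /* IPv6 literal: colons inside the brackets belong to the host */
        const char *close = strchr(authority, ']');
        if (close == NULL)
            return 0;
        len = (size_t)(close - authority) + 1;
    } else {
        /* name or IPv4 address: the host ends at the port or the path */
        len = strcspn(authority, ":/");
    }
    *after = authority + len;
    return len;
}
```

For "[2001:db8::1]:8080/x" this reports a 13-character host and leaves *after pointing at ":8080/x", whereas a split at the first colon would return "[2001" as the host.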
HTParse.c has a number of dependencies. Any chance you can explain how you can "steal" this from the project easily? Maybe back in 2009 it did not ;) – Linkboy
I wrote some simple code using sscanf, which can parse very basic URLs.
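#include <stdio.h>

int main(void)
{
    const char text[] = "http://192.168.0.2:8888/servlet/rece";
    char ip[100];
    int port = 80;
    char page[100];

    /* All three fields must match; the %99 widths keep the writes
       inside the 100-byte ip and page buffers. */
    if (sscanf(text, "http://%99[^:]:%d/%99[^\n]", ip, &port, page) != 3) {
        fprintf(stderr, "could not parse \"%s\"\n", text);
        return 1;
    }
    printf("ip = \"%s\"\n", ip);
    printf("port = \"%d\"\n", port);
    printf("page = \"%s\"\n", page);
    return 0;
}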
./urlparse
ip = "192.168.0.2"
port = "8888"
page = "servlet/rece"
Use a regular expression if you want the easy way. Otherwise use Flex/Bison.
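As a rough sketch of the regular-expression route, the POSIX <regex.h> functions regcomp() and regexec() can do the matching. The helper names, the pattern and the buffer parameters below are assumptions made for illustration, and the pattern does not handle userinfo or IPv6 literals.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies one capture group of src into dst, truncating if needed. */
static void copy_group(const char *src, const regmatch_t *m,
                       char *dst, size_t dst_size)
{
    size_t len = (size_t)(m->rm_eo - m->rm_so);
    if (dst_size == 0)
        return;
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, src + m->rm_so, len);
    dst[len] = '\0';
}

/* Hypothetical helper: returns 0 on a match, -1 otherwise.
 * *port is left untouched when the URL carries no port. */
int regex_split_url(const char *url,
                    char *host, size_t host_size,
                    int *port,
                    char *path, size_t path_size)
{
    /* group 1: host, group 3: port digits, group 4: path */
    const char *pattern =
        "^[a-zA-Z][a-zA-Z0-9+.-]*://([^:/]+)(:([0-9]+))?(/.*)?$";
    regex_t re;
    regmatch_t m[5];

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, url, 5, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    copy_group(url, &m[1], host, host_size);
    if (m[3].rm_so != -1)                 /* optional group took part in the match */
        *port = atoi(url + m[3].rm_so);   /* atoi stops at the first non-digit */
    if (m[4].rm_so != -1)
        copy_group(url, &m[4], path, path_size);
    else
        snprintf(path, path_size, "/");
    return 0;
}
```

An optional group that did not take part in the match has rm_so set to -1, which is how the sketch tells a missing port or path from a present one.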
You could also use a URI parsing library.
This may be late, but what I have used is the http_parser_parse_url() function and the macros it requires, separated out from the Joyent HTTP parser library. That worked well, at about 600 LOC.
libcurl now has the curl_url_get() function, which can extract the host, path, etc.
Example code: https://curl.haxx.se/libcurl/c/parseurl.html
/* extract host name from the parsed URL */
uc = curl_url_get(h, CURLUPART_HOST, &host, 0);
if(!uc) {
printf("Host name: %s\n", host);
curl_free(host);
}
This one is small and worked excellently for me: http://draft.scyphus.co.jp/lang/c/url_parser.html. Just two files (*.c, *.h).
I had to adapt the code [1].
[1] Change all the function calls from http_parsed_url_free(purl) to parsed_url_free(purl):
//Rename the function called
//http_parsed_url_free(purl);
parsed_url_free(purl);
Pure sscanf()-based solution:
#include <stdio.h>

int
main (int argc, char *argv[])
{
    const char *uri = "http://192.168.0.1:8080/servlet/rece";
    char ip_addr[64], path[100];
    int port;
    /* The field widths (63, 99) keep sscanf from writing past the buffers. */
    int uri_scan_status = sscanf(uri, "%*[^:]%*[:/]%63[^:]:%d%99s", ip_addr, &port, path);
    printf("[info] URI scan status : %d\n", uri_scan_status);
    if( uri_scan_status == 3 )
    {
        printf("[info] IP Address : '%s'\n", ip_addr);
        printf("[info] Port: '%d'\n", port);
        printf("[info] Path : '%s'\n", path);
    }
    return 0;
}
However, keep in mind that this solution is tailor-made for URIs of the form [protocol_name]://[ip_address]:[port][/path]. To understand more about the components of URI syntax, you can head over to RFC 3986.
Now let's break down our tailor-made format string: "%*[^:]%*[:/]%63[^:]:%d%99s"
%*[^:] skips the protocol/scheme (e.g. http, https, ftp). It matches from the beginning of the string up to the first : character, and because of the * right after the %, the matched text is discarded rather than stored.
%*[:/] skips the separator that sits between the protocol and the IP address, i.e. ://
%63[^:] captures the text after the separator up to the next : character. This captured string is the IP address. The width 63 keeps it within the 64-byte ip_addr buffer.
:%d matches the : that ended the IP address and reads the number right after it, which is the port number.
%99s captures the rest of the string up to the first whitespace, which is the path of the resource you are looking for. The width 99 keeps it within the 100-byte path buffer.
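Since the format is tailor-made, the return value of sscanf() is the only sign that a URI did not fit it. The sketch below wraps the same format, with field widths added, in a hypothetical helper named scan_uri so that the status can be inspected for different inputs. It assumes host holds at least 64 bytes and path at least 100.

```c
#include <stdio.h>

/* Hypothetical wrapper around the tailor-made format string.
 * host must hold at least 64 bytes and path at least 100 bytes.
 * Returns the number of fields sscanf() stored: 3 means host, port and
 * path were all found; anything less means the URI did not fit. */
int scan_uri(const char *uri, char *host, int *port, char *path)
{
    return sscanf(uri, "%*[^:]%*[:/]%63[^:]:%d%99s", host, port, path);
}
```

For "http://192.168.0.1:8080/servlet/rece" the status is 3. For "http://192.168.0.1/servlet/rece", which has no port, %63[^:] swallows everything up to the end of the string and the status is 1, so the host buffer then holds "192.168.0.1/servlet/rece" and should not be trusted.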
This C gist could be useful. It implements a pure C solution with sscanf.
https://github.com/luismartingil/per.scripts/tree/master/c_parse_http_url
It uses
// Parsing the tmp_source char*
if (sscanf(tmp_source, "http://%99[^:]:%i/%199[^\n]", ip, &port, page) == 3) { succ_parsing = 1;}
else if (sscanf(tmp_source, "http://%99[^/]/%199[^\n]", ip, page) == 2) { succ_parsing = 1;}
else if (sscanf(tmp_source, "http://%99[^:]:%i[^\n]", ip, &port) == 2) { succ_parsing = 1;}
else if (sscanf(tmp_source, "http://%99[^\n]", ip) == 1) { succ_parsing = 1;}
(...)
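The cascade can be wrapped into a self-contained function roughly as follows. This is a sketch rather than the gist's actual code: the name parse_http_url is hypothetical, the host scan set is [^:/] instead of [^:] so that a colon inside the path is not mistaken for the port separator, and %d is used instead of %i so that a port with a leading zero is not read as octal.

```c
#include <stdio.h>

/* Sketch of the cascade. ip must hold at least 100 bytes and page at
 * least 200 bytes. Returns 1 on success and 0 if no pattern matched.
 * page is stored without its leading '/', and the port defaults to 80. */
int parse_http_url(const char *url, char *ip, int *port, char *page)
{
    /* 1. host:port/page */
    if (sscanf(url, "http://%99[^:/]:%d/%199[^\n]", ip, port, page) == 3)
        return 1;

    /* 2. host/page, default port */
    *port = 80;
    if (sscanf(url, "http://%99[^:/]/%199[^\n]", ip, page) == 2)
        return 1;

    /* 3. host:port, no page */
    page[0] = '\0';
    if (sscanf(url, "http://%99[^:/]:%d", ip, port) == 2)
        return 1;

    /* 4. bare host */
    *port = 80;
    return sscanf(url, "http://%99[^:/]", ip) == 1;
}
```

A failed attempt may already have stored some fields, which is why the port and page are reset before the later attempts instead of relying on their initial values.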
I wrote this
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
typedef struct
{
/* C does not allow default member initializers; split_url() sets every field */
const char* protocol;
const char* site;
const char* port;
const char* path;
} URL_INFO;
URL_INFO* split_url(URL_INFO* info, const char* url)
{
if (!info || !url)
return NULL;
info->protocol = strtok(strcpy((char*)malloc(strlen(url)+1), url), "://");
info->site = strstr(url, "://");
if (info->site)
{
info->site += 3;
char* site_port_path = strcpy((char*)calloc(1, strlen(info->site) + 1), info->site);
info->site = strtok(site_port_path, ":");
info->site = strtok(site_port_path, "/");
}
else
{
char* site_port_path = strcpy((char*)calloc(1, strlen(url) + 1), url);
info->site = strtok(site_port_path, ":");
info->site = strtok(site_port_path, "/");
}
char* URL = strcpy((char*)malloc(strlen(url) + 1), url);
info->port = strstr(URL + 6, ":");
char* port_path = 0;
char* port_path_copy = 0;
if (info->port && isdigit(*(port_path = (char*)info->port + 1)))
{
port_path_copy = strcpy((char*)malloc(strlen(port_path) + 1), port_path);
char * r = strtok(port_path, "/");
if (r)
info->port = r;
else
info->port = port_path;
}
else
info->port = "80";
if (port_path_copy)
info->path = port_path_copy + strlen(info->port ? info->port : "");
else
{
char* path = strstr(URL + 8, "/");
info->path = path ? path : "/";
}
int r = strcmp(info->protocol, info->site) == 0;
/* compare the port text, not the pointers */
if (r && strcmp(info->port, "80") == 0)
info->protocol = "http";
else if (r)
info->protocol = "tcp";
return info;
}
Test
int main()
{
URL_INFO info;
split_url(&info, "ftp://192.168.0.1:8080/servlet/rece");
printf("Protocol: %s\nSite: %s\nPort: %s\nPath: %s\n", info.protocol, info.site, info.port, info.path);
return 0;
}
Out
Protocol: ftp
Site: 192.168.0.1
Port: 8080
Path: /servlet/rece