I'm writing a little web crawler, and a lot of the links on the sites I'm crawling are relative (/robots.txt, for example). How do I convert these relative URLs to absolute URLs (so /robots.txt => http://google.com/robots.txt)? Does Go have a built-in way to do this?
Convert relative to absolute URLs in Go
Yes, the standard library can do this with the net/url package. Example (from the standard library):
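package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	// The relative reference found on the page.
	u, err := url.Parse("../../..//search?q=dotnet")
	if err != nil {
		log.Fatal(err)
	}
	// The absolute URL of the page the reference was found on.
	base, err := url.Parse("http://example.com/directory/")
	if err != nil {
		log.Fatal(err)
	}
	// Resolve the reference against the base to get an absolute URL.
	fmt.Println(base.ResolveReference(u))
}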
Note that you only need to parse the absolute (base) URL once; you can then reuse it to resolve as many relative URLs as you like.
Thank you @Not_a_Golfer. Great idea. – Bracket
Building on @Not_a_Golfer's solution: you can also call the base URL's Parse method, which accepts either a relative or an absolute URL and resolves it against the base.
package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	// parse only the base URL
	base, err := url.Parse("http://example.com/directory/")
	if err != nil {
		log.Fatal(err)
	}
	// and then use it to parse relative URLs
	u, err := base.Parse("../../..//search?q=dotnet")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u.String())
}
I think you are looking for the ResolveReference method.
package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	u, err := url.Parse("../../..//search?q=dotnet")
	if err != nil {
		log.Fatal(err)
	}
	base, err := url.Parse("http://example.com/directory/")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(base.ResolveReference(u))
}
// gives: http://example.com/search?q=dotnet
I use it in my crawler as well, and it works like a charm!