How to construct relative url, given two absolute urls in Python
Q

3

13

Is there a built-in function that returns a relative URL such as ../images.html, given a base URL such as http://www.example.com/faq/index.html and a target URL such as http://www.example.com/images.html?

I checked the urlparse module. What I want is the counterpart of the urljoin() function.
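
For reference, urljoin() resolves a relative URL against a base URL, so the function being asked for is its inverse. A small illustration, assuming Python 3, where the functions of the urlparse module live in urllib.parse:

```python
from urllib.parse import urljoin

base = 'http://www.example.com/faq/index.html'
# urljoin resolves the relative reference against the base document.
resolved = urljoin(base, '../images.html')
print(resolved)
```

The desired function would take base and the resolved URL and give back '../images.html'.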

Question answered 19/9, 2011 at 10:35 Comment(1)
Do you mean something like wget --convert-links? - Sori
T
10

You could use urlparse.urlparse to find the paths, and the posixpath version of os.path.relpath to find the relative path.

(Warning: This works for Linux, but may not for Windows):

import urlparse
import posixpath

def relurl(target,base):
    base=urlparse.urlparse(base)
    target=urlparse.urlparse(target)
    if base.netloc != target.netloc:
        raise ValueError('target and base netlocs do not match')
    # Prefix both paths with '.' so that an empty path (e.g. 'http://google.com')
    # is handled the same way as '/'.
    base_dir='.'+posixpath.dirname(base.path)
    target='.'+target.path
    return posixpath.relpath(target,start=base_dir)

tests=[
    ('http://www.example.com/images.html','http://www.example.com/faq/index.html','../images.html'),
    ('http://google.com','http://google.com','.'),
    ('http://google.com','http://google.com/','.'),
    ('http://google.com/','http://google.com','.'),
    ('http://google.com/','http://google.com/','.'), 
    ('http://google.com/index.html','http://google.com/','index.html'),
    ('http://google.com/index.html','http://google.com/index.html','index.html'), 
    ]

for target,base,answer in tests:
    try:
        result=relurl(target,base)
    except ValueError as err:
        print('{t!r},{b!r} --> {e}'.format(t=target,b=base,e=err))
    else:
        if result==answer:
            print('{t!r},{b!r} --> PASS'.format(t=target,b=base))
        else:
            print('{t!r},{b!r} --> {r!r} != {a!r}'.format(
                t=target,b=base,r=result,a=answer))
Tekla answered 19/9, 2011 at 10:48 Comment(10)
Doesn't this depend on the current operating system? I am getting ..\\images.html on Windows 7. - Question
@yasar11732: use posixpath.relpath() - Sori
Just set up a Linux virtual machine that exposes an XML-RPC method that returns the correct result of the function. :P :) [OK, I didn't think of the different path separators; now I'm trying to find another way to do that.] - Burlington
@J.F. Sebastian: looks like posixpath is only available on UNIX, so it has the same problem as above: docs.python.org/library/undoc.html?highlight=posixpath - Burlington
@redShadow: it is available everywhere, but on Unix it is also called os.path. On Windows you must call it posixpath. - Sori
@J.F.S. Yep, just tried that, you are right. The os module imports (posixpath|ntpath|...) as path depending on the current platform (or sys.builtin_module_names), but all the implementations are still available. Didn't know that, +1. - Burlington
posixpath.relpath() calls posixpath.abspath(), which calls os.getcwd(). This leads to incorrect results on Windows: paste.pocoo.org/show/wwrINjNV74wL6pSMBQFB. It might be better to copy relpath() from posixpath.py and remove the abspath() calls, since relurl() expects absolute URLs. - Sori
I'm sorry for the confusion, but my example is incorrect. It passes 'images.html' instead of '/images.html'. A relative path won't work even on Unix. So posixpath.relpath() should be used everywhere, with the assertion that target.startswith('/') and base_dir.startswith('/') (it is not always true, e.g., url='http://google.com', target=''). - Sori
@J.F. Sebastian: Thank you for your help. Instead of asserting startswith('/') I stumbled upon adding '.' in front of base_dir and target. This is the only way I've found to pass all tests on Linux (see above). I'm giving up on supporting Windows since I don't have a way to test code there. - Tekla
All tests pass on Windows 7 with Python 2.7. - Question
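
The comments above suggest copying relpath() and dropping the abspath() calls so that the result never depends on os.getcwd(). A minimal sketch of that idea, assuming Python 3 and absolute URL paths as input; the name relpath_no_cwd is hypothetical and this is not the code from posixpath.py:

```python
import posixpath

def relpath_no_cwd(target_path, base_dir):
    """Relative path from base_dir to target_path, both absolute URL paths."""
    # An empty path (as in 'http://google.com') is treated as the root.
    # normpath only manipulates the string; it never consults the filesystem.
    target_parts = [p for p in posixpath.normpath(target_path or '/').split('/') if p]
    base_parts = [p for p in posixpath.normpath(base_dir or '/').split('/') if p]
    # Count the leading segments the two paths share.
    common = 0
    for a, b in zip(target_parts, base_parts):
        if a != b:
            break
        common += 1
    # Climb out of the unshared part of base_dir, then descend into the target.
    parts = ['..'] * (len(base_parts) - common) + target_parts[common:]
    return '/'.join(parts) or '.'
```

Here base_dir is expected to be the directory part of the base URL's path, for example posixpath.dirname('/faq/index.html').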
B
5

The first solution that comes to mind is:

>>> os.path.relpath('/images.html', os.path.dirname('/faq/index.html'))
'../images.html'

Of course, a full solution requires parsing the URLs, comparing the domain names (!!), rewriting the path if they match, and re-adding the query and fragment.

Edit: a more complete version

import urlparse
import posixpath

def relative_url(destination, source):
    u_dest = urlparse.urlsplit(destination)
    u_src = urlparse.urlsplit(source)

    # Rebuild only scheme + netloc so the two origins can be compared
    _uc1 = urlparse.urlunsplit(u_dest[:2] + ('', '', ''))
    _uc2 = urlparse.urlunsplit(u_src[:2] + ('', '', ''))

    if _uc1 != _uc2:
        ## This is a different domain
        return destination

    _relpath = posixpath.relpath(u_dest.path, posixpath.dirname(u_src.path))

    return urlparse.urlunsplit(('', '', _relpath, u_dest.query, u_dest.fragment))

Then

>>> relative_url('http://www.example.com/images.html', 'http://www.example.com/faq/index.html')
'../images.html'
>>> relative_url('http://www.example.com/images.html?my=query&string=here#fragment', 'http://www.example.com/faq/index.html')
'../images.html?my=query&string=here#fragment'
>>> relative_url('http://www.example.com/images.html', 'http://www2.example.com/faq/index.html')
'http://www.example.com/images.html'
>>> relative_url('https://www.example.com/images.html', 'http://www.example.com/faq/index.html')
'https://www.example.com/images.html'

Edit: now using the posixpath implementation of os.path to make it work under Windows too.
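
The code above targets Python 2, where the module is called urlparse. A rough Python 3 sketch of the same approach, using urllib.parse; the name relative_url_py3 and the handling of an empty path as '/' are assumptions added here, not part of the original answer:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def relative_url_py3(destination, source):
    u_dest = urlsplit(destination)
    u_src = urlsplit(source)
    # A different scheme or host cannot be expressed as a relative path,
    # so the destination is returned unchanged.
    if (u_dest.scheme, u_dest.netloc) != (u_src.scheme, u_src.netloc):
        return destination
    # Treat an empty path as the site root so both paths are absolute
    # and relpath() has no reason to consult the current directory.
    dest_path = u_dest.path or '/'
    src_dir = posixpath.dirname(u_src.path or '/')
    rel = posixpath.relpath(dest_path, src_dir)
    # Re-attach the query and fragment of the destination.
    return urlunsplit(('', '', rel, u_dest.query, u_dest.fragment))
```

It can be called the same way as relative_url() in the examples above.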

Burlington answered 19/9, 2011 at 10:43 Comment(0)
C
0
import itertools
import urlparse

def makeRelativeUrl(sourceUrl, targetUrl):
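  '''
  Build a URL for targetUrl relative to sourceUrl.

  :param sourceUrl: the URL to start from, as a string
  :param targetUrl: the URL to point to, as a string
  :return: the path to targetUrl relative to sourceUrl, or targetUrl unchanged if it is at a different net location
  '''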
  # todo test
  parsedSource = urlparse.urlparse(sourceUrl)
  parsedTarget = urlparse.urlparse(targetUrl)

  if parsedSource.netloc == parsedTarget.netloc:
    # if target on same path but lower than source url
    if parsedTarget.path.startswith(parsedSource.path):
      return parsedTarget.path.replace(parsedSource.path, '.')
    # on same path
    elif parsedTarget.path.rsplit('/', 1)[0] == parsedSource.path.rsplit('/', 1)[0]:
      return './' + parsedTarget.path.rsplit('/', 1)[1]
    # same netloc, varying paths
    else:
      path = ''
      upCount = 0
      for item in list(itertools.izip_longest(parsedSource.path.rsplit('/'), parsedTarget.path.rsplit('/'))):
        if item[0] == item[1]:
          pass
        else:
          if item[0] is not None:
            upCount += 1
          if item[1] is not None:
            path += item[1] + '/'
      return upCount * '../' + path
  else:
    return targetUrl


if __name__ == '__main__':
  '''
  "tests" :p
  '''
  url1 = 'http://coolwebsite.com/questions/bobobo/bo/bo/1663807/how-can-i-iterate-through-two-lists-in-parallel-in-python'
  url2 = 'http://coolwebsite.com/questions/126524/iterate-a-list-with-indexes-in-python'

  print url1
  print url2
  print 'second relative to first:'
  print makeRelativeUrl(url1, url2)

  url1 = 'http://coolwebsite.com/questions/1663807/how-can-i-iterate-through-two-lists-in-parallel-in-python'
  url2 = 'http://coolwebsite.com/questions/1663807/bananas'

  print url1
  print url2
  print 'second relative to first:'
  print makeRelativeUrl(url1, url2)

  url1 = 'http://coolwebsite.com/questions/1663807/fruits'
  url2 = 'http://coolwebsite.com/questions/1663807/fruits/berries/bananas'

  print url1
  print url2
  print 'second relative to first:'
  print makeRelativeUrl(url1, url2)

Run 'tests' to see if it works :P

Checkup answered 21/4, 2015 at 21:41 Comment(0)
