Get PDF hyperlinks on iOS with Quartz

I've spent all day trying to get hyperlink metadata from PDFs in my iPad application. The CGPDF* APIs are a true nightmare, and the only piece of information I've found online is that I have to look for an "Annots" dictionary, but I just can't find it in my PDFs.

I even used the old Voyeur Xcode sample to inspect my test PDF file, but no trace of this "Annots" dictionary...

You know, this is a feature I see on every PDF reader - this same question has been asked multiple times here with no real practical answers. I usually never ask for sample code directly but apparently this time I really need it... anyone got this working, possibly with sample code?

Update: I just realized that the person who made my test PDF had inserted the URL as plain text, not as a real annotation. He tried adding an annotation, and my code works now. But that's not what I need, so it seems I'll have to analyze the text and search for URLs. But that's another story...

Update 2: So I finally came up with some working code. I'm posting it here so hopefully it'll help someone. It assumes the PDF document actually contains annotations.

for(int i=0; i<pageCount; i++) {
    CGPDFPageRef page = CGPDFDocumentGetPage(doc, i+1);

    CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(page);

    CGPDFArrayRef outputArray;
    if(!CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)) {
        continue; // no annotations on this page: skip it and keep scanning the rest
    }

    int arrayCount = CGPDFArrayGetCount( outputArray );
    if(!arrayCount) {
        continue;
    }

    for( int j = 0; j < arrayCount; ++j ) {
        CGPDFObjectRef aDictObj;
        if(!CGPDFArrayGetObject(outputArray, j, &aDictObj)) {
            continue; // skip this entry instead of aborting the whole scan
        }

        CGPDFDictionaryRef annotDict;
        if(!CGPDFObjectGetValue(aDictObj, kCGPDFObjectTypeDictionary, &annotDict)) {
            continue;
        }

        // Annotations without an "A" (action) entry, e.g. internal "Dest" links, are skipped.
        CGPDFDictionaryRef aDict;
        if(!CGPDFDictionaryGetDictionary(annotDict, "A", &aDict)) {
            continue;
        }

        // Only URI actions carry a "URI" string.
        CGPDFStringRef uriStringRef;
        if(!CGPDFDictionaryGetString(aDict, "URI", &uriStringRef)) {
            continue;
        }

        CGPDFArrayRef rectArray;
        if(!CGPDFDictionaryGetArray(annotDict, "Rect", &rectArray)) {
            continue;
        }

        // "Rect" should hold exactly four numbers; checking avoids overrunning coords[].
        size_t rectCount = CGPDFArrayGetCount( rectArray );
        if( rectCount != 4 ) {
            continue;
        }

        CGPDFReal coords[4];
        BOOL rectIsValid = YES;
        for( size_t k = 0; k < rectCount; ++k ) {
            // CGPDFArrayGetNumber accepts both integer and real entries.
            if(!CGPDFArrayGetNumber(rectArray, k, &coords[k])) {
                rectIsValid = NO;
                break;
            }
        }
        if( !rectIsValid ) {
            continue;
        }

        char *uriString = (char *)CGPDFStringGetBytePtr(uriStringRef);

        NSString *uri = [NSString stringWithCString:uriString encoding:NSUTF8StringEncoding];
        CGRect rect = CGRectMake(coords[0],coords[1],coords[2],coords[3]);

        CGPDFInteger pageRotate = 0;
        CGPDFDictionaryGetInteger( pageDictionary, "Rotate", &pageRotate ); 
        CGRect pageRect = CGRectIntegral( CGPDFPageGetBoxRect( page, kCGPDFMediaBox ));
        if( pageRotate == 90 || pageRotate == 270 ) {
            CGFloat temp = pageRect.size.width;
            pageRect.size.width = pageRect.size.height;
            pageRect.size.height = temp;
        }

        rect.size.width -= rect.origin.x;
        rect.size.height -= rect.origin.y;

        CGAffineTransform trans = CGAffineTransformIdentity;
        trans = CGAffineTransformTranslate(trans, 0, pageRect.size.height);
        trans = CGAffineTransformScale(trans, 1.0, -1.0);

        rect = CGRectApplyAffineTransform(rect, trans);

        // do whatever you need with the coordinates.
        // e.g. you could create a button and put it on top of your page
        // and use it to open the URL with UIApplication's openURL
    }
}
Raff answered 2/11, 2010 at 17:26 Comment(9)
Line 6: shouldn't that be continue instead of return? And why do you return after checking the object, value, dict, string, array, etc.? - Cathouse
That's just example code without any error checking. - Raff
PDF rects don't translate directly to native rects. See my thread for details; scroll down to 'Other PDF Features', 'Getting Links inside a PDF', 'Understanding the PDF Rect for link positioning': #3890134 - Cathouse
I'm doing rect.size.width -= rect.origin.x; rect.size.height -= rect.origin.y; to fix that, and it's working for me. - Raff
Yes, that works for the width and height, but the PDF spec states that the array takes the form [llx lly urx ury], specifying the lower-left x, lower-left y, upper-right x, and upper-right y coordinates of the rectangle, in that order. This means that your rect.origin.y is actually rect.origin.y + rect.size.height, because the Adobe rect's origin is the bottom left and not the top left that CGRect defaults to. It may not have been that noticeable, as it would probably have been only 20-30 px out and still registered your press. - Cathouse
It's also worth mentioning that I couldn't get a URI from the annotation, only a 'Dest'. I assume this is the default for internal document links? - Cathouse
Yeah, IIRC "Dest" is for internal page links. - Raff
See also #3046087 to get the page size and convert the coordinates from PDF values to iOS values. - Scrapple
@pt2ph8 Hi, did you manage to get all the links from the document? - Flyte
F
15

Here's the basic idea for getting to the Annots array for each page, at least. After that, you should be able to figure it out with help from Adobe's PDF spec.

1.) Get the CGPDFDocumentRef.

2.) Get each page.

3.) On each page, use CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray), where pageDictionary is the CGPDFDictionaryRef representing the CGPDFPage, and outputArray is the variable (a CGPDFArrayRef) that receives the Annots array of that page.

Fining answered 2/11, 2010 at 18:20 Comment(13)
@Jesse Naugher: Thanks a lot for your answer, but regarding "after that you should be able to figure it out with help from the PDF spec from Adobe": I couldn't find any useful information in that bloated mess that is Adobe's PDF spec. The only part of it where the word "annotation" appears is section 8, but again, I can't see any info that could help me here. Frustrating. - Raff
There's an entire section about every kind of annotation that can be in a PDF document, including the link annotation. Basically, when you get the Annots array, you loop through it, and each entry is a dictionary describing one annotation. These dictionaries have a key called 'Subtype' that determines the type of annotation; "Link" is one of them and is defined in the PDF spec. - Fining
@Jesse Naugher: Amazing, I just realized I was staring at the wrong document. Now I have the real PDF spec document. I'll check it out now, thanks (yeah, that's what happens when you're tired/frustrated). - Raff
@Jesse Naugher: CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray) is returning false for me... Here's how I get pageDictionary: CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(page); - Raff
Make sure you are getting the PDF itself correctly, and that the page you have is valid and has an annotation on it. You have to check each page for annotations separately. - Fining
@Jesse Naugher: Yeah, I'm doing that in a for loop. The CGPDFPageRef is valid, and the document is, too (I'm also drawing it, so I'm pretty sure about it). Also, there are three links on the page I'm testing with, and the Preview app is reading them. Here's my method: pastebin.com/69JW1Kkc I set a breakpoint inside the CGPDFDictionaryGetArray if, and it doesn't reach the CGPDFArrayGetCount call. - Raff
I also inspected the PDF with Xcode's Voyeur sample (github.com/below/PDF-Voyeur/network), which shows the tree of nodes in a PDF, and there's no Annots array... but the links are there, I can click them in Preview... - Raff
I'm not sure what to tell you; that should work. The only difference in my code is that I use a bool variable for the if check, but obviously that shouldn't make a difference. I'd try with a PDF you make in Adobe or something; perhaps the creator doesn't correctly create annotations for links? I'm not sure. - Fining
I updated my post, it was my PDF... Anyway, it seems I'll have to parse the text and search for URLs, as I need it in my app... Thanks anyway for your answers. - Raff
Good luck, you're going to need it :p - Fining
@Jesse Naugher: Now that I'm parsing annotations successfully, I need to display them. The only problem is that PDF obviously uses a different coordinate system. It seems that they are upside down or something. Any idea on how to fix this? - Raff
I figured it out. First, the Rect in the PDF is not in X,Y,W,H format; it's an array of four coordinates giving two opposite corners of the rectangle, so: CGRect rect = CGRectMake(coords[0],coords[1],coords[2]-coords[0],coords[3]-coords[1]). - Raff
Then the rect needs to be transformed in the same way the PDF itself is transformed when drawing (normally it would be upside down, since Quartz uses a different coordinate system). So the code is: CGAffineTransform trans = CGAffineTransformIdentity; trans = CGAffineTransformTranslate(trans, 0, pageRect.size.height); trans = CGAffineTransformScale(trans, 1.0, -1.0); rect = CGRectApplyAffineTransform(rect, trans); - Raff
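
The two steps described in the comments above (corners to origin-plus-size, then the vertical flip) can be sketched without Core Graphics. This is only a minimal sketch in plain C: `Rect` and `pdf_rect_to_top_left` are hypothetical names, not Core Graphics API, and it assumes an unrotated page whose box starts at (0, 0) and a "Rect" array already in lower-left/upper-right order.

```c
/* Hypothetical stand-in for CGRect so the sketch compiles without Core Graphics. */
typedef struct {
    double x, y, width, height;
} Rect;

/* Convert a PDF "Rect" array [llx lly urx ury] (origin at the bottom left,
 * y growing upwards) into a rect whose origin is the top-left corner of the
 * page (y growing downwards).
 * Assumption: the page is not rotated and its box starts at (0, 0). */
Rect pdf_rect_to_top_left(const double c[4], double page_height) {
    Rect r;
    r.x = c[0];               /* the left edge is unchanged by the flip */
    r.width = c[2] - c[0];    /* urx - llx */
    r.height = c[3] - c[1];   /* ury - lly */
    r.y = page_height - c[3]; /* the top edge (ury) measured from the page top */
    return r;
}
```

For a page 792 points high, the array [72 700 272 720] would map to x = 72, y = 72, width = 200, height = 20. This should match what the translate-then-scale transform in the comment above produces for an unrotated page.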
C
9

Great code, but I am having a little trouble working it into my project. It gets all the URLs correctly, but when I click on one nothing happens. Here is my code (I had to modify yours slightly to work with my project). Is there something missing?

- (void) renderPageAtIndex:(NSUInteger)index inContext:(CGContextRef)ctx {
//CGPDFPageRef page = CGPDFDocumentGetPage(pdf, index+1);

CGPDFPageRef page = CGPDFDocumentGetPage(pdf, index+1);
CGAffineTransform transform1 = aspectFit(CGPDFPageGetBoxRect(page, kCGPDFMediaBox),
                                         CGContextGetClipBoundingBox(ctx));
CGContextConcatCTM(ctx, transform1);
CGContextDrawPDFPage(ctx, page);

int pageCount = CGPDFDocumentGetNumberOfPages(pdf);
int i = 0;
while (i<pageCount) {
    i++;
    CGPDFPageRef page = CGPDFDocumentGetPage(pdf, i+1);

    CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(page);

    CGPDFArrayRef outputArray;
    if(!CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)) {
        return;
    }

    int arrayCount = CGPDFArrayGetCount( outputArray );
    if(!arrayCount) {
        continue;
    }

    for( int j = 0; j < arrayCount; ++j ) {
        CGPDFObjectRef aDictObj;
        if(!CGPDFArrayGetObject(outputArray, j, &aDictObj)) {
            return;
        }

        CGPDFDictionaryRef annotDict;
        if(!CGPDFObjectGetValue(aDictObj, kCGPDFObjectTypeDictionary, &annotDict)) {
            return;
        }

        CGPDFDictionaryRef aDict;
        if(!CGPDFDictionaryGetDictionary(annotDict, "A", &aDict)) {
            return;
        }

        CGPDFStringRef uriStringRef;
        if(!CGPDFDictionaryGetString(aDict, "URI", &uriStringRef)) {
            return;
        }

        CGPDFArrayRef rectArray;
        if(!CGPDFDictionaryGetArray(annotDict, "Rect", &rectArray)) {
            return;
        }

        int arrayCount = CGPDFArrayGetCount( rectArray );
        CGPDFReal coords[4];
        for( int k = 0; k < arrayCount; ++k ) {
            CGPDFObjectRef rectObj;
            if(!CGPDFArrayGetObject(rectArray, k, &rectObj)) {
                return;
            }

            CGPDFReal coord;
            if(!CGPDFObjectGetValue(rectObj, kCGPDFObjectTypeReal, &coord)) {
                return;
            }

            coords[k] = coord;
        }               

        char *uriString = (char *)CGPDFStringGetBytePtr(uriStringRef);

        NSString *uri = [NSString stringWithCString:uriString encoding:NSUTF8StringEncoding];
        CGRect rect = CGRectMake(coords[0],coords[1],coords[2],coords[3]);

        CGPDFInteger pageRotate = 0;
        CGPDFDictionaryGetInteger( pageDictionary, "Rotate", &pageRotate ); 
        CGRect pageRect = CGRectIntegral( CGPDFPageGetBoxRect( page, kCGPDFMediaBox ));
        if( pageRotate == 90 || pageRotate == 270 ) {
            CGFloat temp = pageRect.size.width;
            pageRect.size.width = pageRect.size.height;
            pageRect.size.height = temp;
        }

        rect.size.width -= rect.origin.x;
        rect.size.height -= rect.origin.y;

        CGAffineTransform trans = CGAffineTransformIdentity;
        trans = CGAffineTransformTranslate(trans, 0, pageRect.size.height);
        trans = CGAffineTransformScale(trans, 1.0, -1.0);

        rect = CGRectApplyAffineTransform(rect, trans);

        // do whatever you need with the coordinates.
        // e.g. you could create a button and put it on top of your page
        // and use it to open the URL with UIApplication's openURL
        NSURL *url = [NSURL URLWithString:uri];
        NSLog(@"URL: %@", url);
        CGPDFContextSetURLForRect(ctx, (CFURLRef)url, rect);
       // CFRelease(url);
        }
    }   


}

Thanks & great work BrainFeeder!

UPDATE:

For anybody using the Leaves project in their app, this is how I got the PDF links to work (it's not perfect, as the rect seems to fill the entire screen, but it's a start):

- (void) renderPageAtIndex:(NSUInteger)index inContext:(CGContextRef)ctx {

CGPDFPageRef page = CGPDFDocumentGetPage(pdf, index+1);
CGAffineTransform transform1 = aspectFit(CGPDFPageGetBoxRect(page, kCGPDFMediaBox),
                                         CGContextGetClipBoundingBox(ctx));
CGContextConcatCTM(ctx, transform1);
CGContextDrawPDFPage(ctx, page);


    CGPDFPageRef pageAd = CGPDFDocumentGetPage(pdf, index);

    CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(pageAd);

    CGPDFArrayRef outputArray;
    if(!CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)) {
        return;
    }

    int arrayCount = CGPDFArrayGetCount( outputArray );
    if(!arrayCount) {
        //continue;
    }

    for( int j = 0; j < arrayCount; ++j ) {
        CGPDFObjectRef aDictObj;
        if(!CGPDFArrayGetObject(outputArray, j, &aDictObj)) {
            return;
        }

        CGPDFDictionaryRef annotDict;
        if(!CGPDFObjectGetValue(aDictObj, kCGPDFObjectTypeDictionary, &annotDict)) {
            return;
        }

        CGPDFDictionaryRef aDict;
        if(!CGPDFDictionaryGetDictionary(annotDict, "A", &aDict)) {
            return;
        }

        CGPDFStringRef uriStringRef;
        if(!CGPDFDictionaryGetString(aDict, "URI", &uriStringRef)) {
            return;
        }

        CGPDFArrayRef rectArray;
        if(!CGPDFDictionaryGetArray(annotDict, "Rect", &rectArray)) {
            return;
        }

        int arrayCount = CGPDFArrayGetCount( rectArray );
        CGPDFReal coords[4];
        for( int k = 0; k < arrayCount; ++k ) {
            CGPDFObjectRef rectObj;
            if(!CGPDFArrayGetObject(rectArray, k, &rectObj)) {
                return;
            }

            CGPDFReal coord;
            if(!CGPDFObjectGetValue(rectObj, kCGPDFObjectTypeReal, &coord)) {
                return;
            }

            coords[k] = coord;
        }               

        char *uriString = (char *)CGPDFStringGetBytePtr(uriStringRef);

        NSString *uri = [NSString stringWithCString:uriString encoding:NSUTF8StringEncoding];
        CGRect rect = CGRectMake(coords[0],coords[1],coords[2],coords[3]);

        CGPDFInteger pageRotate = 0;
        CGPDFDictionaryGetInteger( pageDictionary, "Rotate", &pageRotate ); 
        CGRect pageRect = CGRectIntegral( CGPDFPageGetBoxRect( page, kCGPDFMediaBox ));
        if( pageRotate == 90 || pageRotate == 270 ) {
            CGFloat temp = pageRect.size.width;
            pageRect.size.width = pageRect.size.height;
            pageRect.size.height = temp;
        }

        rect.size.width -= rect.origin.x;
        rect.size.height -= rect.origin.y;

        CGAffineTransform trans = CGAffineTransformIdentity;
        trans = CGAffineTransformTranslate(trans, 0, pageRect.size.height);
        trans = CGAffineTransformScale(trans, 1.0, -1.0);

        rect = CGRectApplyAffineTransform(rect, trans);

            // do whatever you need with the coordinates.
            // e.g. you could create a button and put it on top of your page
            // and use it to open the URL with UIApplication's openURL
            NSURL *url = [NSURL URLWithString:uri];
            NSLog(@"URL: %@", url);
//          CGPDFContextSetURLForRect(ctx, (CFURLRef)url, rect);
            UIButton *button = [[UIButton alloc] initWithFrame:rect];
            [button setTitle:@"LINK" forState:UIControlStateNormal];
            [button addTarget:self action:@selector(openLink:) forControlEvents:UIControlEventTouchUpInside];
            [self.view addSubview:button];
           // CFRelease(url);
        }
}

Final update: below is the final code I used in my apps.

- (void) renderPageAtIndex:(NSUInteger)index inContext:(CGContextRef)ctx {
//If the view already contains a button control remove it
if ([[self.view subviews] containsObject:button]) {
    [button removeFromSuperview];
}

CGPDFPageRef page = CGPDFDocumentGetPage(pdf, index+1);
CGAffineTransform transform1 = aspectFit(CGPDFPageGetBoxRect(page, kCGPDFMediaBox),
                                         CGContextGetClipBoundingBox(ctx));
CGContextConcatCTM(ctx, transform1);
CGContextDrawPDFPage(ctx, page);


CGPDFPageRef pageAd = CGPDFDocumentGetPage(pdf, index);

CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(pageAd);

CGPDFArrayRef outputArray;
if(!CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)) {
    return;
}

int arrayCount = CGPDFArrayGetCount( outputArray );
if(!arrayCount) {
    //continue;
}

for( int j = 0; j < arrayCount; ++j ) {
    CGPDFObjectRef aDictObj;
    if(!CGPDFArrayGetObject(outputArray, j, &aDictObj)) {
        return;
    }

    CGPDFDictionaryRef annotDict;
    if(!CGPDFObjectGetValue(aDictObj, kCGPDFObjectTypeDictionary, &annotDict)) {
        return;
    }

    CGPDFDictionaryRef aDict;
    if(!CGPDFDictionaryGetDictionary(annotDict, "A", &aDict)) {
        return;
    }

    CGPDFStringRef uriStringRef;
    if(!CGPDFDictionaryGetString(aDict, "URI", &uriStringRef)) {
        return;
    }

    CGPDFArrayRef rectArray;
    if(!CGPDFDictionaryGetArray(annotDict, "Rect", &rectArray)) {
        return;
    }

    int arrayCount = CGPDFArrayGetCount( rectArray );
    CGPDFReal coords[4];
    for( int k = 0; k < arrayCount; ++k ) {
        CGPDFObjectRef rectObj;
        if(!CGPDFArrayGetObject(rectArray, k, &rectObj)) {
            return;
        }

        CGPDFReal coord;
        if(!CGPDFObjectGetValue(rectObj, kCGPDFObjectTypeReal, &coord)) {
            return;
        }

        coords[k] = coord;
    }               

    char *uriString = (char *)CGPDFStringGetBytePtr(uriStringRef);

    NSString *uri = [NSString stringWithCString:uriString encoding:NSUTF8StringEncoding];
    CGRect rect = CGRectMake(coords[0],coords[1],coords[2],coords[3]);
    CGPDFInteger pageRotate = 0;
    CGPDFDictionaryGetInteger( pageDictionary, "Rotate", &pageRotate ); 
    CGRect pageRect = CGRectIntegral( CGPDFPageGetBoxRect( page, kCGPDFMediaBox ));
    if( pageRotate == 90 || pageRotate == 270 ) {
        CGFloat temp = pageRect.size.width;
        pageRect.size.width = pageRect.size.height;
        pageRect.size.height = temp;
    }

    rect.size.width -= rect.origin.x;
    rect.size.height -= rect.origin.y;

    CGAffineTransform trans = CGAffineTransformIdentity;
    trans = CGAffineTransformTranslate(trans, 35, pageRect.size.height+150);
    trans = CGAffineTransformScale(trans, 1.15, -1.15);

    rect = CGRectApplyAffineTransform(rect, trans);

    urlLink = [NSURL URLWithString:uri];
    [urlLink retain];

    //Create a button to get link actions
    button = [[UIButton alloc] initWithFrame:rect];
    [button setBackgroundImage:[UIImage imageNamed:@"link_bg.png"] forState:UIControlStateHighlighted];
    [button addTarget:self action:@selector(openLink:) forControlEvents:UIControlEventTouchUpInside];
    [self.view addSubview:button];
}   
[leavesView reloadData];
}
Candiscandle answered 3/11, 2010 at 22:46 Comment(9)
@user470763: Yeah, adding a button is the most obvious solution :) - Raff
@Brainfeeder The only problems I am really having now are that the rect size only scales for iPhone, not iPad. Also, on full-page links I can't swipe to change page. - Candiscandle
@kmcg: Thank you for your code. I am able to scale the rect sizes on iPad as well; the only thing you need to do is change the values of x and y, which may help you. I also wanted to ask whether you are able to find any words in the PDF file other than URLs. Thanks. - Shuler
Beware that the button created by that piece of code is clear with a white font, so if your PDF is not colored, you won't see it. I haven't been able to put the rect in the right place, though. - Cephalalgia
@kmcg Does this work for internal links too? Do you have any example project? Thanks in advance. - Reprehension
@Reprehension What do you mean by internal links? It works for links within the PDF (e.g. advertisements, URLs, etc.). These links have to be built into the PDF when it is created, though. It doesn't just scan for links written as text. I don't have an example project, sorry. - Candiscandle
@kmcgrady Thanks, I appreciate your reply. I figured it out! - Reprehension
@kmcgrady Did you ever figure out how to translate the rect correctly? - Salify
@lindon I have updated my answer with my final code. I'm 90% sure this worked on both iPhone and iPad, but I don't have the time to test just now. I haven't worked on the project for about 6 months, so I can't remember. Hopefully it helps you, though. When I finished, everything was working. - Candiscandle
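
On the iPhone/iPad scaling issue raised in these comments: rather than hard-coding an offset and scale per device, one option is to derive them from the page and view sizes. The following is a minimal sketch in plain C, not the code used in the answer. `Rect` and `link_rect_in_view` are hypothetical names, and it assumes the page is unrotated and drawn aspect-fitted and centred in the view, which may or may not match what the `aspectFit` helper above does.

```c
/* Hypothetical stand-in for CGRect so the sketch compiles without Core Graphics. */
typedef struct {
    double x, y, width, height;
} Rect;

/* Map a link rect given in PDF page space (bottom-left origin, y up) into a
 * view (top-left origin, y down) that shows the page aspect-fitted and centred.
 * Assumption: no page rotation; `page` is the page box, `view` the view bounds. */
Rect link_rect_in_view(Rect link, Rect page, Rect view) {
    /* Uniform scale that fits the whole page inside the view. */
    double sx = view.width / page.width;
    double sy = view.height / page.height;
    double s = sx < sy ? sx : sy;

    /* Letterbox offsets that centre the scaled page in the view. */
    double off_x = view.x + (view.width - page.width * s) / 2.0;
    double off_y = view.y + (view.height - page.height * s) / 2.0;

    Rect out;
    out.x = off_x + (link.x - page.x) * s;
    /* Flip vertically: distance from the page top to the link's top edge. */
    out.y = off_y + (page.y + page.height - (link.y + link.height)) * s;
    out.width = link.width * s;
    out.height = link.height * s;
    return out;
}
```

With a 600 x 800 page shown in a 300 x 500 view, the scale would be 0.5 and the page would sit 50 points below the top of the view, so a link at (100, 700) measuring 200 x 20 would land at (50, 90) with size 100 x 10. The same function would then cover both screen sizes without device-specific constants.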
U
0

I must be confused, because this all works if I use:

CGRect rect = CGRectMake(coords[0],coords[1],coords[2]-coords[0]+1,coords[3]-coords[1]+1);

Am I misusing something later, perhaps? PDF supplies the corners, and CGRect wants a corner and a size.

Unsettle answered 18/8, 2013 at 19:26 Comment(0)
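
On the "+1" question: PDF coordinates are continuous user-space units rather than pixel indices, so plain subtraction of the corners would normally be expected to give the size, without adding 1. The PDF spec also permits any two diagonally opposite corners, so it may be worth normalizing them first. A minimal sketch in plain C, where `Rect` and `rect_from_corners` are hypothetical names and not Core Graphics API:

```c
/* Hypothetical stand-in for CGRect so the sketch compiles without Core Graphics. */
typedef struct {
    double x, y, width, height;
} Rect;

/* Build an origin-plus-size rect from two diagonally opposite corners,
 * given in any order. The size is the plain difference of the coordinates;
 * no +1 is added because the values are continuous units, not pixel indices. */
Rect rect_from_corners(double x0, double y0, double x1, double y1) {
    Rect r;
    r.x = x0 < x1 ? x0 : x1;
    r.y = y0 < y1 ? y0 : y1;
    r.width = x0 < x1 ? x1 - x0 : x0 - x1;
    r.height = y0 < y1 ? y1 - y0 : y0 - y1;
    return r;
}
```

With this, two links that share an edge, such as [0 0 100 20] and [100 0 200 20], come out exactly adjacent (the first ends at x = 100, where the second begins), whereas adding 1 to each width would make them overlap by one unit.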
