I am parsing a PDF using the acrobat.tlb library.
Hyphenated words are being split across new lines with the hyphens removed.
For example, ABC-123-XXX-987 parses as:
ABC
123
XXX
987
If I parse the text using iTextSharp, it returns the whole string as displayed in the file, which is the behaviour I want. However, I need to highlight these strings (serial numbers) in the PDF, and iTextSharp does not place the highlight in the correct location, hence acrobat.tlb.
I am using the following code, taken from this thread: http://www.vbforums.com/showthread.php?561501-RESOLVED-2003-How-to-highlight-text-in-pdf
    ' filey = "*your full file name including directory here*"
    AcroExchApp = CreateObject("AcroExch.App")
    AcroExchAVDoc = CreateObject("AcroExch.AVDoc")
    ' Open the [strfiley] pdf file
    AcroExchAVDoc.Open(filey, "")
    ' Get the PDDoc associated with the open AVDoc
    AcroExchPDDoc = AcroExchAVDoc.GetPDDoc
    sustext = "accessorizes"
    suktext = "accessorises"
    ' get JavaScript Object
    ' note jso is related to PDDoc of a PDF
    jso = AcroExchPDDoc.GetJSObject
    ' count
    nCount = 0
    nCount1 = 0
    gbStop = False
    bUSCnt = False
    bUKCnt = False
    ' search for the text
    If Not jso Is Nothing Then
        ' total number of pages
        nPages = jso.numpages
        ' Go through pages
        For i = 0 To nPages - 1
            ' check each word in a page
            nWords = jso.getPageNumWords(i)
            For j = 0 To nWords - 1
                ' get a word
                word = Trim(CStr(jso.getPageNthWord(i, j)))
                'If VarType(word) = VariantType.String Then
                If word <> "" Then
                    ' compare the word with what the user wants
                    If Trim(sustext) <> "" Then
                        result = StrComp(word, sustext, vbTextCompare)
                        ' if same
                        If result = 0 Then
                            nCount = nCount + 1
                            If bUSCnt = False Then
                                iUSCnt = iUSCnt + 1
                                bUSCnt = True
                            End If
                        End If
                    End If
                    If suktext <> "" Then
                        result1 = StrComp(word, suktext, vbTextCompare)
                        ' if same
                        If result1 = 0 Then
                            nCount1 = nCount1 + 1
                            If bUKCnt = False Then
                                iUKCnt = iUKCnt + 1
                                bUKCnt = True
                            End If
                        End If
                    End If
                End If
            Next j
        Next i
        jso = Nothing
    End If
The code does the job of highlighting the text, but the For loop that fills the 'word' variable from getPageNthWord splits the hyphenated string into its component parts:
    For i = 0 To nPages - 1
        ' check each word in a page
        nWords = jso.getPageNumWords(i)
        For j = 0 To nWords - 1
            ' get a word
            word = Trim(CStr(jso.getPageNthWord(i, j)))
Does anyone know how to maintain the whole string using acrobat.tlb? My quite extensive searches have drawn a blank.
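One avenue that may be worth testing, offered as a sketch rather than a confirmed fix: the Acrobat JavaScript reference documents an optional third argument to getPageNthWord, bStrip, which defaults to true and removes punctuation and whitespace from the returned word. Calling jso.getPageNthWord(i, j, False) should therefore return pieces such as "ABC-" with the hyphen still attached, and consecutive pieces ending in a hyphen could then be glued back together. The helper name JoinHyphenatedWords and the sample input below are hypothetical, and whether the JSObject bridge returns exactly these strings for a given PDF is an assumption that needs checking against the real file.

```vbnet
Imports System
Imports System.Collections.Generic

Module HyphenJoinSketch

    ' Hypothetical helper: merges consecutive raw words when the preceding piece ends in "-".
    ' "Raw" means words as ASSUMED to come back from jso.getPageNthWord(i, j, False),
    ' i.e. with trailing punctuation and whitespace left in place.
    Function JoinHyphenatedWords(rawWords As IEnumerable(Of String)) As List(Of String)
        Dim joined As New List(Of String)()
        Dim pending As String = ""

        For Each raw As String In rawWords
            If raw Is Nothing Then Continue For

            ' Remove surrounding whitespace only; the hyphen must survive.
            Dim piece As String = raw.Trim()
            If piece = "" Then Continue For

            pending &= piece
            ' A piece that does not end in a hyphen completes the current word.
            If Not piece.EndsWith("-", StringComparison.Ordinal) Then
                joined.Add(pending)
                pending = ""
            End If
        Next

        ' A dangling piece (input ended on a hyphen) is kept rather than dropped.
        If pending <> "" Then joined.Add(pending)
        Return joined
    End Function

    Sub Main()
        ' Made-up sample mirroring the serial number in the question.
        Dim raw As String() = {"Serial ", "ABC-", "123-", "XXX-", "987 ", "found"}
        For Each w As String In JoinHyphenatedWords(raw)
            Console.WriteLine(w)
        Next
    End Sub

End Module
```

With the sample input this prints Serial, ABC-123-XXX-987 and found on separate lines. Two caveats: other trailing punctuation (a full stop after the serial number, say) is also left in place and would need trimming before comparison, and for highlighting the loop would still have to record the word indexes j of each piece, since the highlight is applied per original word rather than per joined string.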