How do I stop parsing an XML document with IVBSAXXMLReader in Delphi?

In order to quickly parse some large XML documents in a Delphi (2007) program, I have implemented the IVBSAXContentHandler interface and use it like this:

FXMLReader := CoSAXXMLReader60.Create;
FXMLReader.contentHandler := Self;
FXMLReader.parseURL(FXmlFile);

This works fine as long as I simply parse the whole file, but I'd like to stop once I have found the content that I am looking for. So my implementation of IVBSAXContentHandler.startElement checks for some condition and, when it is true, should abort further parsing. I tried this:

procedure TContentHandler.startElement(var strNamespaceURI, strLocalName,
  strQName: WideString; const oAttributes: IVBSAXAttributes);
begin
  if SomeCondition then
    SysUtils.Abort;
end;

Unfortunately this raises the rather unhelpful EOleException "Catastrophic failure". (I also tried raising a custom exception with the same result.)

MSDN says the following:

The ErrorHandler interface essentially allows the XMLReader to signal the ContentHandler implementation that it wants to abort processing. Conversely, ContentHandler implementations can indicate to the XMLReader that it wants to abort processing. This can be accomplished by simply raising an application-specific exception. This is especially useful for aborting processing once the implementation finds what it is looking for:

Private Sub IVBSAXContentHandler_characters(ByVal strChars As String)
' I found what I was looking for, abort processing
  Err.Raise vbObjectError + errDone, "startElement", _
        "I got what I want, let's go play!"
End Sub

So, apparently I also need to implement the IVBSAXErrorHandler interface. This interface needs three methods:

procedure TContentHandler.error(const oLocator: IVBSAXLocator;
  var strErrorMessage: WideString; nErrorCode: Integer);
begin

end;

procedure TContentHandler.fatalError(const oLocator: IVBSAXLocator;
  var strErrorMessage: WideString; nErrorCode: Integer);
begin

end;

procedure TContentHandler.ignorableWarning(const oLocator: IVBSAXLocator;
  var strErrorMessage: WideString; nErrorCode: Integer);
begin

end;

The error handler must also be assigned before calling the parseURL method:

FXMLReader := CoSAXXMLReader60.Create;
FXMLReader.contentHandler := Self;
FXMLReader.errorHandler := Self;
FXMLReader.parseURL(FXmlFile);

Unfortunately, that doesn't make any difference: now the fatalError handler gets called with strErrorMessage = 'Catastrophic failure'. With an empty method body, this still results in the above-mentioned unhelpful EOleException "Catastrophic failure".

So, now I am out of ideas:

  • Do I need to implement something special in the errorhandler interface?
  • Do I need to raise a particular exception instead of EAbort?
  • Or am I missing something else?

EDIT:

Based on Ondrej Kelle's answer, here is the solution I finally used:

Declare the following constant:

const
  // idea taken from Delphi 10.1 unit System.Win.ComObj:
  EExceptionRaisedHRESULT = HResult(E_UNEXPECTED or (1 shl 29)); // turn on customer bit

Add two new fields to the TContentHandler class:

FExceptObject: TObject;
FExceptAddr: Pointer;

Add this code to the destructor:

FreeAndNil(FExceptObject);

Add a new method SafeCallException:

function TContentHandler.SafeCallException(ExceptObject: TObject; ExceptAddr: Pointer): HResult;
var
  GUID: TGUID;
  exc: Exception;
begin
  if ExceptObject is Exception then begin
    exc := Exception(ExceptObject);
    // Create a copy of the exception object and store it in the FExceptObject field
    FExceptObject := exc.NewInstance;
    Exception(FExceptObject).Create(exc.Message);
    Exception(FExceptObject).HelpContext := exc.HelpContext;
    // Store the exception address in the FExceptAddr field
    FExceptAddr := ExceptAddr;
    // return a custom HRESULT
    Result := EExceptionRaisedHRESULT;
  end else begin
    ZeroMemory(@GUID, SizeOf(GUID));
    Result := HandleSafeCallException(ExceptObject, ExceptAddr, GUID, '', '');
  end;
end;

Add an exception handler to the calling code:

var
  exc: Exception;
begin
  try
    FXMLReader := CoSAXXMLReader60.Create;
    FXMLReader.contentHandler := Self;
    // we do not need an errorHandler
    FXMLReader.parseURL(FXmlFile);
    FXMLReader := nil;
  except
    on e: EOleException do begin
      // Check for the custom HRESULT
      if e.ErrorCode = EExceptionRaisedHRESULT then begin
        // Check that the exception object is assigned
        if Assigned(FExceptObject) then begin
          exc := Exception(FExceptObject);
          // set the pointer to NIL
          FExceptObject := nil;
          // raise the exception at the given address
          raise exc at FExceptAddr;
        end;
      end;
      // fallback: raise the original exception
      raise;
    end;
  end;
end;

While this works for me, it has a serious flaw: It copies only the Message and the HelpContext property of the original exception. So, if there are more properties/fields, e.g.

EInOutError = class(Exception)
public
  ErrorCode: Integer;
end;

These will not be initialized when the exception is re-raised in the calling code.

The advantage is that you will get the correct exception address in the debugger. Beware that you won't get the correct call stack.

Donny answered 17/8, 2016 at 8:55 Comment(1)
You should change your edit into an actual answer. Posting it in the question itself isn't really appropriate, and it would be much more useful to future readers in the form of a complete answer. (It would also get you some votes, as it's pretty well written as it is; posted as an actual answer and fleshed out slightly would definitely be worth some upvotes.)Satin

Simply calling Abort; is fine. In this case, just override SafeCallException in your IVBSAXContentHandler implementor class:

function TContentHandler.SafeCallException(ExceptObject: TObject; ExceptAddr: Pointer): HRESULT;
begin
  Result := HandleSafeCallException(ExceptObject, ExceptAddr, TGUID.Empty, '', '');
end;

HandleSafeCallException, supplied in ComObj, will translate the EAbort you're raising into the HRESULT value E_ABORT, which SafeCallError will then translate back into EAbort.
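For Delphi versions where HandleSafeCallException does not map EAbort to E_ABORT (the comments below report this for Delphi 2007, where TGUID.Empty is also unavailable), a minimal sketch of the same idea could handle EAbort explicitly and use HandleSafeCallException only as a fallback. This is a fragment meant to be merged into the existing TContentHandler class, not a complete unit. ParseUntilFound is a hypothetical method name, and the caller handles both EAbort and EOleException on the assumption that which one arrives depends on the Delphi version.

```delphi
// Fragment, not a complete unit. Assumed uses clause: Windows (E_ABORT,
// ZeroMemory), SysUtils (EAbort), ComObj (HandleSafeCallException,
// EOleException), plus the imported MSXML type library unit.
// In the class declaration, SafeCallException must be marked "override".

function TContentHandler.SafeCallException(ExceptObject: TObject;
  ExceptAddr: Pointer): HResult;
var
  GUID: TGUID;
begin
  if ExceptObject is EAbort then
    // Map the silent exception to E_ABORT ourselves.
    Result := E_ABORT
  else begin
    // Everything else goes through the RTL's default translation.
    ZeroMemory(@GUID, SizeOf(GUID));
    Result := HandleSafeCallException(ExceptObject, ExceptAddr, GUID, '', '');
  end;
end;

// Hypothetical caller: treats E_ABORT as "stopped on purpose".
procedure TContentHandler.ParseUntilFound;
begin
  FXMLReader := CoSAXXMLReader60.Create;
  try
    FXMLReader.contentHandler := Self;
    try
      FXMLReader.parseURL(FXmlFile);
    except
      on EAbort do
      begin
        // Newer RTLs may translate E_ABORT back into EAbort: nothing to do.
      end;
      on E: EOleException do
        // Older RTLs surface the HRESULT as EOleException instead.
        if E.ErrorCode <> E_ABORT then
          raise;
    end;
  finally
    FXMLReader := nil;
  end;
end;
```

With this in place, startElement can simply call SysUtils.Abort once the condition is met. Any exception other than EAbort still ends up as E_UNEXPECTED ("Catastrophic failure") unless it gets its own branch in SafeCallException.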

Alternatively, you can raise your own exception class, override SafeCallException to translate it into your specific HRESULT value and replace SafeCallErrorProc with your own to translate it back into your Delphi exception which you can then handle on the calling side.

Retiform answered 17/8, 2016 at 10:12 Comment(10)
@DavidHeffernan They do, but only in TComObject which is usually the base class when you're implementing a full COM object. Possibly they could have added the default EAbort handling already at a level between TComObject and TInterfacedObject and still leave the latter independent of COM, I'm not sure.Retiform
Apparently HandleSafeCallException (which is declared in unit ComObj just in case anybody else wondered) does not return E_ABORT if ExceptObject is EAbort (at least not in Delphi 2007), so I had to change that myself. But thanks for pointing me in this direction.Donny
@Donny Sorry, I don't have D2007 at hand at the moment. I was using XE7. Glad you've managed to sort it out, though.Retiform
@OndrejKelle I wasn't complaining, just giving a hint to other people who might come across this answer later. Oh, and TGUID.Empty does not exist in Delphi 2007. Instead we need a TGUID record initialized with all zero, e.g. ZeroMemory(@GUIDRec, SizeOf(GUIDRec))Donny
@Donny Thanks for the hint. I think the best would be to clean up the answer, I'll do that later when I'm back at my computer with D2007 installed.Retiform
Another thing: Even in Delphi 10.1, HandleSafeCallException apparently handles only EAbort and EOleSysError in a way that does not return E_UNEXPECTED, which in turn results in a "Catastrophic failure". So, to answer my original question: EAbort must be used here; any other exception will not work. In addition, the caller must handle EOleException, since that's the class being raised, not EAbort. (This is getting complicated ...)Donny
@Donny You could handle the special cases in your overridden SafeCallException and use HandleSafeCallException only as a fallback.Retiform
@OndrejKelle spoiled by Delphi and the VCL/RTL as I am, I was assuming that there is a mostly automatic way to raise a Delphi exception in the code called by parseURL and getting back exactly this exception. Apparently that's not possible. (Actually it is, but not automatically.)Donny
I just googled for the same problem again and found my own question - I'm definitely getting old. I also grep-ed my code base and found that the JCL unit JclCompression supplies a TJclSaveCallInterfaceObject which seems to implement most of what I need. If I now could remember where in my code I implemented it myself when I asked this question ...Donny
Ouch, it's right there at the end of my questionDonny
