INFO - Sample code
I've set up sample code (SSCCE) for you to help track the problem:
https://github.com/ljader/test-cxf-base64-marshall
The problem
I'm integrating with 3rd party JAX-WS service, so I cannot change the WSDL.
The 3rd party webservice expects Base64 encoded bytes to perform some operation on them - they expect that client sends whole bytes in SOAP message. They don't want to change to MTOM / XOP, so I'm stuck with current requirements.
I decided to use CXF to easily set up sample client, and it worked ok for small files.
But when I try to send BIG data, i.e. 200MB, the CXF/JAXB throws an exception:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.sun.xml.bind.v2.util.ByteArrayOutputStreamEx.readFrom(ByteArrayOutputStreamEx.java:75)
at com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data.get(Base64Data.java:196)
at com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data.writeTo(Base64Data.java:312)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.text(UTF8XmlOutput.java:312)
at com.sun.xml.bind.v2.runtime.XMLSerializer.leafElement(XMLSerializer.java:356)
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$PcdataImpl.writeLeafElement(RuntimeBuiltinLeafInfoImpl.java:191)
at com.sun.xml.bind.v2.runtime.MimeTypedTransducer.writeLeafElement(MimeTypedTransducer.java:96)
at com.sun.xml.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.writeLeafElement(TransducedAccessor.java:254)
at com.sun.xml.bind.v2.runtime.property.SingleElementLeafProperty.serializeBody(SingleElementLeafProperty.java:130)
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(ClassBeanInfoImpl.java:360)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsXsiType(XMLSerializer.java:696)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl$1.serializeBody(ElementBeanInfoImpl.java:155)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl$1.serializeBody(ElementBeanInfoImpl.java:130)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeBody(ElementBeanInfoImpl.java:332)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeRoot(ElementBeanInfoImpl.java:339)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeRoot(ElementBeanInfoImpl.java:75)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsRoot(XMLSerializer.java:494)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:323)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:251)
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
at org.apache.cxf.jaxb.JAXBEncoderDecoder.writeObject(JAXBEncoderDecoder.java:617)
at org.apache.cxf.jaxb.JAXBEncoderDecoder.marshall(JAXBEncoderDecoder.java:241)
at org.apache.cxf.jaxb.io.DataWriterImpl.write(DataWriterImpl.java:237)
at org.apache.cxf.interceptor.AbstractOutDatabindingInterceptor.writeParts(AbstractOutDatabindingInterceptor.java:117)
at org.apache.cxf.wsdl.interceptors.BareOutInterceptor.handleMessage(BareOutInterceptor.java:68)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:514)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:423)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:324)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:277)
at org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:96)
at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:139)
My findings
I've tracked bug, that based on xsd type "base64Binary", the
com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl
decides, that
com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data
should handle marshalling of data from
javax.activation.DataHandler
During marshalling, the WHOLE data from underlying InputStream is trying to be read http://grepcode.com/file/repo1.maven.org/maven2/com.sun.xml.bind/jaxb-impl/2.2.11/com/sun/xml/bind/v2/runtime/unmarshaller/Base64Data.java/#311, which causes OOME exception.
Problem
CXF uses JAXB during marshalling Java objects into SOAP messages - when marshalling InputStream, the WHOLE input stream is read to memory before beeing converted into Base64 binary.
So I want to send ("stream") data from client to server in chunks (since the OutputSteam in marshaller is wrapped direct HttpURLConnection), so my client could can handle sending any amount of data.
Especially when many threads would be using my client, the streaming is IMHO very desirable.
I don't have good JAX-WS/CXF/JAXB knowledge, hence the question.
The only materials which I found and may be usefull are:
Can JAXB parse large XML files in chunks
http://rezarahim.blogspot.com/2010/05/chunking-out-big-xml-with-stax-and-jaxb.html
The questions
Why CXF/JAXB loads whole InputStream into memory - is not the DataHandler purpouse to prevent such implementation?
Do you know any way to change JAXB behaviour to differently marshall InputStream?
Do you know different marshallers, which can handle such big data marshalling?
As a last resort, maybe you have links to some materials, how to create custom marshaller which would stream the data directly to the server?