Who should be responsible for selecting the appropriate derived class?

Asked 2/7, 2012 at 16:5 Answered 2/7, 2012 at 16:10

I recently wrote a class library that includes some objects that model certain types of files. For example, there is an abstract Document class, with derived classes PdfDocument (concrete) and OfficeDocument (abstract, with concrete derived classes such as WordDocument and ExcelDocument), etc.

Currently, clients create a new object by selecting the appropriate derived class and passing it the file's byte array. For example, if I have one byte array for a PDF and one for a Word document, I would do something like:

var wordDocument = new WordDocument(wordDocumentByteArray);
var pdfDocument = new PdfDocument(pdfDocumentByteArray);

Is this an acceptable design, where the client must know which derived class to use? Or would I be better off hiding all but the abstract Document class and using something such as the abstract factory pattern to return the correct derived type? e.g.:

var wordDocument = DocumentFactory.GetDocument(wordDocumentByteArray, "docx");
// pass file extension so we know what the file is

Note that the derived types do not add additional properties/methods to the abstract class, they just implement the abstract methods in different ways.

Reproduction answered 2/7, 2012 at 16:5 Comment(3)

Definitely the 2nd option. Allows for far easier future extensibility and means that people spend less time updating class declarations when new, more appropriate types are added. – Shebat 2/7, 2012 at 16:11

Does the Document class have everything that the end user will ever need to do with a given Document, or will they occasionally (or frequently) need access to functionality that's specific to a more derived type? – Tillo 2/7, 2012 at 16:19

@Tillo Yes, the Document class has one public abstract method. All derived classes consist only of protected and private helper methods (plus the overridden public method) with the sole purpose of implementing the one public method. – Reproduction 2/7, 2012 at 16:26

The second approach is much better than the first one, because it hides the very existence of the Word and PDF document classes from the users of your library. This becomes especially important when you decide to add more document types, e.g. Rtf, Html, and so on: the users would get the benefits of the newly added types without having to recompile their code. In fact, they would not even notice that you have changed anything: if done right, their code will "just work" with documents of a type they never knew existed.

P.S. If you can scan the byte array and figure out the correct type from it, your API can "earn some points for style" by eliminating the second parameter.
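As a rough illustration of scanning the byte array, here is a minimal sketch that checks for well-known leading signatures. The names FileSignature, DetectedKind and Detect are hypothetical and not part of the asker's library. PDF files normally begin with "%PDF-", while .docx and .xlsx files are both ZIP packages that begin with "PK\x03\x04", so the leading bytes alone cannot tell Word from Excel; that would need a look inside the package, or a hint such as the file extension.

```csharp
using System;

// Hypothetical result type: what the leading bytes suggest.
public enum DetectedKind { Unknown, Pdf, ZipContainer }

public static class FileSignature
{
    // "%PDF-" in ASCII.
    private static readonly byte[] PdfMagic = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // "PK\x03\x04": ZIP local file header (.docx and .xlsx are ZIP packages).
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    public static DetectedKind Detect(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (StartsWith(data, PdfMagic)) return DetectedKind.Pdf;
        if (StartsWith(data, ZipMagic)) return DetectedKind.ZipContainer;
        return DetectedKind.Unknown;
    }

    // True when data begins with every byte of prefix, in order.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A factory could call Detect first and fall back to a caller-supplied extension only when the result is ZipContainer or Unknown.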

Polyphemus answered 2/7, 2012 at 16:9 Comment(1)

Thanks. I will definitely look into whether I can figure out the type from the byte array. The only reason I added the extension in my example was that, from some very brief research, it looked like there may not be a surefire way to determine the file type from the binary representation every time. – Reproduction 2/7, 2012 at 16:15

If the derived types don't add any properties/methods and you have the technical ability to determine what type to use for a given byte[], I would not even make the derived classes public... they just increase the surface area of stuff the consumer will have to parse when learning your library. Just have a static factory method like public static Document OpenDocument(byte[] data) in the Document class.
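A minimal sketch of that shape, under some assumptions: Modify() is a hypothetical stand-in for the library's single public abstract method, only the PDF case is shown, and the format check is a simple test for the "%PDF-" header rather than a complete detection scheme.

```csharp
using System;

public abstract class Document
{
    protected byte[] Data { get; }

    protected Document(byte[] data)
    {
        Data = data;
    }

    // Hypothetical stand-in for the one public abstract method.
    public abstract byte[] Modify();

    // Static factory: callers only ever see the abstract Document type.
    public static Document OpenDocument(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (LooksLikePdf(data)) return new PdfDocument(data);
        // Other formats (Word, Excel, ...) would be dispatched here.
        throw new NotSupportedException("Unrecognized document format.");
    }

    // Assumption: a PDF is recognized by its leading "%PDF-" bytes.
    private static bool LooksLikePdf(byte[] data)
    {
        return data.Length >= 5
            && data[0] == 0x25 && data[1] == 0x50 && data[2] == 0x44
            && data[3] == 0x46 && data[4] == 0x2D;
    }
}

// internal: not visible to consumers of the library.
internal sealed class PdfDocument : Document
{
    internal PdfDocument(byte[] data) : base(data) { }

    // Placeholder: a real implementation would apply PDF-specific changes.
    public override byte[] Modify()
    {
        return Data;
    }
}
```

Client code then reduces to var document = Document.OpenDocument(bytes);, and new formats can be added inside the library without changing that call.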

Taper answered 2/7, 2012 at 16:10 Comment(3)

Thanks. I will go with the factory method instead of an abstract factory (which does seem like overkill for what I need to accomplish). – Reproduction 2/7, 2012 at 16:17

"If the derived types don't add any properties/methods" That's a pretty big "if". There are a lot of things that could conceivably be done with a known file type that can't be done with a generic "Document". – Tillo 2/7, 2012 at 16:17

@Tillo I agree but for the purpose of my library, which I didn't state in the question, all it does is make some modifications to the binary data. The functionality is pretty specific so in this case I think the assumption is ok (the only need for the derived classes in the first place is because the changes and their implementation vary by the file type). – Reproduction 2/7, 2012 at 16:21
