Using cTAKES to parse clinical documents

Asked 21/10, 2013 at 20:53 Answered 26/1, 2018 at 19:11

I am trying to figure out how to run the Clinical Document Pipeline from Java. I have a set of clinical documents as plain text. I want to parse these documents and extract a list of records of the form: in document doc_ID, there is CUI with a frequency of freq. I spent several days installing cTAKES and looking for a solution. I narrowed it down to ClinicalPipelineWithUmls.java, which takes a test text and runs SimplePipeline with an AnalysisEngineDescription. Here is part of the code:

String documentText = "Text of document to test goes here, such as the following. No edema, some soreness, denies pain.";
InputStream inStream = InputStreamCollectionReader.convertToByteArrayInputStream(documentText);
CollectionReader collectionReader = InputStreamCollectionReader.getCollectionReader(inStream);
AnalysisEngineDescription pipelineIncludingUmlsDictionaries = AnalysisEngineFactory.createAnalysisEngineDescription(
            "desc/analysis_engine/AggregatePlaintextUMLSProcessor");
AnalysisEngineDescription xWriter = AnalysisEngineFactory.createPrimitiveDescription(
            XWriter.class,
            XWriter.PARAM_OUTPUT_DIRECTORY_NAME,
            AssertionConst.evalOutputDir,
            XWriter.PARAM_XML_SCHEME_NAME,
            XWriter.XMI,
            XWriter.PARAM_FILE_NAMER_CLASS_NAME,
            CtakesFileNamer.class.getName());
SimplePipeline.runPipeline(collectionReader, pipelineIncludingUmlsDictionaries, xWriter);
System.out.println("Done at " + new Date());

The problem is that it cannot find "InputStreamCollectionReader". I searched for it but have had no success so far. Could you please give me a hint or point me in the right direction? Thanks for any help!
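
For reference, the output described above (doc_ID, CUI, freq) can be sketched in plain Java, independent of cTAKES. This is only a minimal sketch of the tallying step: the class CuiFrequency, its methods, and the sample document IDs and CUI values are all hypothetical, and it assumes the CUIs for each document have already been pulled out of the pipeline's annotations by some other code that is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of cTAKES: tallies CUIs per document.
class CuiFrequency {

    // Input: document ID -> CUIs found in that document, one entry per mention.
    // Output: document ID -> (CUI -> number of mentions), in insertion order.
    static Map<String, Map<String, Integer>> countCuis(Map<String, List<String>> cuisByDoc) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : cuisByDoc.entrySet()) {
            Map<String, Integer> freq = new LinkedHashMap<>();
            for (String cui : entry.getValue()) {
                // Start at 1 for a new CUI, otherwise add 1 to the current count.
                freq.merge(cui, 1, Integer::sum);
            }
            result.put(entry.getKey(), freq);
        }
        return result;
    }

    // Flattens the counts into tab-separated "doc_ID, CUI, freq" rows.
    static List<String> toRows(Map<String, Map<String, Integer>> counts) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> doc : counts.entrySet()) {
            for (Map.Entry<String, Integer> cui : doc.getValue().entrySet()) {
                rows.add(doc.getKey() + "\t" + cui.getKey() + "\t" + cui.getValue());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Placeholder input; in practice these lists would come from the pipeline output.
        Map<String, List<String>> cuisByDoc = new LinkedHashMap<>();
        cuisByDoc.put("doc1", Arrays.asList("C0013604", "C0030193", "C0013604"));
        cuisByDoc.put("doc2", Arrays.asList("C0030193"));

        for (String row : toRows(countCuis(cuisByDoc))) {
            System.out.println(row);
        }
    }
}
```

Keeping the counting separate from the pipeline means the same helper works whichever collection reader ends up feeding the documents in.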

Mccarver answered 21/10, 2013 at 20:53 Comment(0)

Is there any particular reason why you want to use InputStreamCollectionReader? Otherwise, there are examples on how to use TextReader here.

Tisbee answered 22/10, 2013 at 6:6 Comment(2)

Thank you for your response, Renaud. Yes, I'm using cTAKES to extract the UMLS CUIs (Concept Unique Identifiers) related to each word. I found this code in the cTAKES documentation. However, "InputStreamCollectionReader" cannot be found. I'm new to the Maven and Eclipse world. Sorry if it is a stupid question! I appreciate any comments and hints. – Mccarver 23/10, 2013 at 16:56

Ok, have you tried to use TextReader instead? It should work for you. – Tisbee 24/10, 2013 at 12:22

We have implemented a REST service for cTAKES that lets us send clinical text as a request and get back the analyzed output as a JSON response.

You can have a look at the cTAKES REST module in the following GitHub repo. I feel this should be the way to go for cTAKES users who are interested in web access.

Liles answered 26/1, 2018 at 19:11 Comment(4)

Does cTAKES have any kind of API documentation of its own? – Discredit 12/8, 2019 at 10:14

@matanster github.com/GoTeamEpsilon/ctakes-rest-service - checkout readme – Liles 12/8, 2019 at 10:23

Was asking about cTAKES itself – Discredit 13/8, 2019 at 0:37

@matanster what kind of documentation, to be precise? – Liles 13/8, 2019 at 9:8
