Using cTAKES to parse clinical documents

2

8

I am trying to figure out how to run the Clinical Document Pipeline from Java. I have a set of clinical documents as plain text. I want to parse these documents and extract a list of (doc_ID, CUI, freq) entries, meaning that document doc_ID contains concept CUI with frequency freq. I spent several days installing cTAKES and looking for a solution. I narrowed it down to ClinicalPipelineWithUmls.java, which takes a test document text and runs SimplePipeline with an AnalysisEngineDescription. Here is part of the code:

String documentText = "Text of document to test goes here, such as the following. No edema, some soreness, denies pain.";
InputStream inStream = InputStreamCollectionReader.convertToByteArrayInputStream(documentText);
CollectionReader collectionReader = InputStreamCollectionReader.getCollectionReader(inStream);
AnalysisEngineDescription pipelineIncludingUmlsDictionaries = AnalysisEngineFactory.createAnalysisEngineDescription(
            "desc/analysis_engine/AggregatePlaintextUMLSProcessor");
AnalysisEngineDescription xWriter = AnalysisEngineFactory.createPrimitiveDescription(
            XWriter.class,
            XWriter.PARAM_OUTPUT_DIRECTORY_NAME,
            AssertionConst.evalOutputDir,
            XWriter.PARAM_XML_SCHEME_NAME,
            XWriter.XMI,
            XWriter.PARAM_FILE_NAMER_CLASS_NAME,
            CtakesFileNamer.class.getName());
SimplePipeline.runPipeline(collectionReader, pipelineIncludingUmlsDictionaries, xWriter);
System.out.println("Done at " + new Date());

The problem is that it cannot find "InputStreamCollectionReader". I searched for it, but with no success so far. Could you please give me a hint or point me in the right direction? Thanks for any help!
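Whichever reader ends up feeding the pipeline, the counting step behind the (doc_ID, CUI, freq) list can be sketched separately from cTAKES. The following is a minimal sketch using only the Java standard library. It assumes the CUIs found in each document have already been collected into a list per document ID; how they are pulled out of the cTAKES annotations is not shown. The class name `CuiFrequencyCounter`, the method `count`, and the sample document IDs and CUI strings are all hypothetical placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of cTAKES: turns per-document CUI lists
// into per-document CUI frequency tables.
final class CuiFrequencyCounter {

    /**
     * @param cuisByDoc document ID mapped to every CUI occurrence found in that
     *                  document (assumed to be collected elsewhere)
     * @return document ID mapped to (CUI mapped to number of occurrences)
     */
    static Map<String, Map<String, Integer>> count(Map<String, List<String>> cuisByDoc) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : cuisByDoc.entrySet()) {
            Map<String, Integer> freq = new TreeMap<>();
            for (String cui : entry.getValue()) {
                // Add 1 to the running count for this CUI, starting at 1 if unseen.
                freq.merge(cui, 1, Integer::sum);
            }
            result.put(entry.getKey(), freq);
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder input; real values would come from the pipeline output.
        Map<String, List<String>> cuisByDoc = new TreeMap<>();
        cuisByDoc.put("doc1", List.of("C0000001", "C0000002", "C0000001"));
        cuisByDoc.put("doc2", List.of("C0000002"));

        // Print one tab-separated line per (doc_ID, CUI, freq) entry.
        count(cuisByDoc).forEach((docId, freq) ->
                freq.forEach((cui, n) -> System.out.println(docId + "\t" + cui + "\t" + n)));
    }
}
```

Using `TreeMap` keeps the output sorted by document ID and then by CUI, which makes the list easier to compare between runs. A `HashMap` would work just as well if ordering does not matter.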

Mccarver answered 21/10, 2013 at 20:53 Comment(0)
3

Is there any particular reason why you want to use InputStreamCollectionReader? Otherwise, there are examples on how to use TextReader here.

Tisbee answered 22/10, 2013 at 6:6 Comment(2)
Thank you for your response, Renaud. Yes, I'm using cTAKES to extract the UMLS CUI (Concept Unique Identifier) related to each word. I found this code in the cTAKES documentation. However, "InputStreamCollectionReader" cannot be found. I'm new to the Maven and Eclipse world. Sorry if it is a stupid question! I appreciate any comments and hints. - Mccarver
OK, have you tried using TextReader instead? It should work for you. - Tisbee
0

We have implemented a REST service for cTAKES that lets us send clinical text as a request and get back the analyzed output as a JSON response.

You can have a look at the cTAKES REST module in the following GitHub repo. I feel this should be the way to go for cTAKES users who are interested in web access.
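As a rough sketch of what calling such a service from Java could look like, the snippet below builds a POST request carrying the clinical text, using the JDK's built-in `java.net.http` client (Java 11+). The endpoint URL, the `text/plain` content type, and the class name `CtakesRestRequest` are assumptions for illustration only; the actual path and parameters should be taken from the service's README.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client helper; not part of cTAKES or of the REST module.
final class CtakesRestRequest {

    /** Builds a POST request that sends the clinical text as the request body. */
    static HttpRequest build(String endpointUrl, String clinicalText) {
        return HttpRequest.newBuilder(URI.create(endpointUrl))
                // ASSUMPTION: the service accepts plain text; check the README.
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(clinicalText, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: CtakesRestRequest <endpoint-url>");
            return;
        }
        // The endpoint URL is supplied by the caller because it depends on the deployment.
        HttpRequest request = build(args[0], "No edema, some soreness, denies pain.");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The body is expected to be JSON; parsing it is left to a JSON library of choice.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Keeping request construction in its own method makes it possible to check the method, URL, and headers without a running server.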

Liles answered 26/1, 2018 at 19:11 Comment(4)
Does cTAKES have any kind of API documentation of its own? - Discredit
@matanster github.com/GoTeamEpsilon/ctakes-rest-service - check out the README. - Liles
I was asking about cTAKES itself. - Discredit
@matanster What kind of documentation, to be precise? - Liles
