How to use cTAKES from the command line?

I wonder how to use Apache cTAKES from the command line.

For example:

  • I have a file note.txt that contains some text like "Patient had elevated blood sugar but tests confirm no diabetes. Patient's father had adult onset diabetes."
  • I want to use the provided analysis engine \apache-ctakes-3.2.2-bin\apache-ctakes-3.2.2\desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextUMLSProcessor.xml

How can I get the analysis engine's output (viz. the annotations) using the command line (i.e. without using graphical user interfaces such as UIMA CAS Visual Debugger or the Collection Processing Engine)? I'd prefer to use the provided JAR files rather than having to compile the code.

The question is fairly simple but I couldn't find the information in cTAKES's README or on Confluence.

Band asked 4/10, 2015 at 23:40

Please try the following steps to use cTAKES CPE from the command line (the key class is "org.apache.uima.examples.cpe.SimpleRunCPE"):

  1. Change directory to $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/

  2. Copy test_plaintext.xml to another file (e.g., "test_plaintext_test.xml").

  3. Edit "test_plaintext_test.xml" to set the input directory: find the "nameValuePair" with name = "InputDirectory" and set its value string to the input directory. The following example sets the input directory to "$CTAKES_HOME/note_input":

    <nameValuePair>
        <name>InputDirectory</name>
        <value>
            <string>note_input</string>
        </value>
    </nameValuePair>
    
  4. Similarly, edit "test_plaintext_test.xml" to set the output directory ("$CTAKES_HOME/result_output" in the following example):

    <nameValuePair>
        <name>OutputDirectory</name>
        <value>
            <string>result_output</string>
        </value>
    </nameValuePair>
    
  5. Save "test_plaintext_test.xml" and change directory to $CTAKES_HOME/bin.

  6. Copy runctakesCPE.sh to another file (e.g., "runctakesCPE_CLI.sh").

  7. Edit "runctakesCPE_CLI.sh" and replace the last line ("java ...") with the following line ("USER" and "PW" should be replaced by your UMLS username and password, and the memory settings -Xms and -Xmx may be adjusted based on the memory available on your machine):

    java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -cp "$CTAKES_HOME/lib/*:$CTAKES_HOME/desc/:$CTAKES_HOME/resources/" -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g org.apache.uima.examples.cpe.SimpleRunCPE $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext_test.xml
    
  8. Save "runctakesCPE_CLI.sh", and then create the input directory ("$CTAKES_HOME/note_input") and the output directory ("$CTAKES_HOME/result_output").

  9. Put your note.txt into the input directory (e.g., "$CTAKES_HOME/note_input/note.txt"), and then run "runctakesCPE_CLI.sh".

  10. cTAKES CPE will start running under command line mode, and the resulting file will be generated in the output directory (e.g., "$CTAKES_HOME/result_output/note.txt.xml").
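The file-editing parts of these steps (1 to 4, 8 and 9) can also be scripted. Below is a minimal bash sketch, assuming the descriptor contains "<name>InputDirectory</name>" and "<name>OutputDirectory</name>", each followed by a "<string>" value as in steps 3 and 4. The names "make_cpe_descriptor" and "prepare_dirs" are hypothetical helpers chosen here, not cTAKES commands, and step 7 (the java line) is still done by hand.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers (not shipped with cTAKES) that script
# steps 1-4 (descriptor copy/edit) and steps 8-9 (directories and input).

# make_cpe_descriptor SRC DST IN_DIR OUT_DIR
# Writes a copy of the CPE descriptor SRC to DST in which the <string> value
# following <name>InputDirectory</name> becomes IN_DIR and the one following
# <name>OutputDirectory</name> becomes OUT_DIR. Assumes forward-slash paths
# without '&' or '\' (awk's sub() treats those specially in the replacement).
make_cpe_descriptor() {
  local src=$1 dst=$2
  # Values are passed through the environment so awk does not
  # escape-process them the way it would with -v.
  IN_DIR=$3 OUT_DIR=$4 awk '
    /<name>InputDirectory<\/name>/  { target = ENVIRON["IN_DIR"] }
    /<name>OutputDirectory<\/name>/ { target = ENVIRON["OUT_DIR"] }
    target != "" && /<string>.*<\/string>/ {
      sub(/<string>.*<\/string>/, "<string>" target "</string>")
      target = ""
    }
    { print }
  ' "$src" > "$dst"
}

# prepare_dirs HOME NOTE
# Creates HOME/note_input and HOME/result_output (the example directory
# names used in the steps) and copies NOTE into the input directory.
prepare_dirs() {
  local home=$1 note=$2
  mkdir -p "$home/note_input" "$home/result_output"
  cp "$note" "$home/note_input/"
}

# Example (assumes CTAKES_HOME is set and note.txt is in the current dir):
#   prepare_dirs "$CTAKES_HOME" note.txt
#   cpe_dir=$CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine
#   make_cpe_descriptor "$cpe_dir/test_plaintext.xml" "$cpe_dir/test_plaintext_test.xml" note_input result_output
```

The awk approach edits only the value that follows each named parameter, so the rest of the copied descriptor stays byte-for-byte the same as the original.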

I actually used your note.txt to run the steps above and here are the first several lines of the generated note.txt.xml:

    <?xml version="1.0" encoding="UTF-8"?><CAS version="2">
        <uima.cas.Sofa _indexed="0" _id="3" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="Patient had elevated blood sugar but tests confirm no diabetes. Patient's father had adult onset diabetes.&#10;"/>
        <org.apache.ctakes.typesystem.type.structured.DocumentID _indexed="1" _id="1" documentID="note.txt"/>
        <uima.tcas.DocumentAnnotation _indexed="1" _id="10" _ref_sofa="3" begin="0" end="107" language="x-unspecified"/>
        <org.apache.ctakes.typesystem.type.textspan.Segment _indexed="1" _id="15" _ref_sofa="3" begin="0" end="107" id="SIMPLE_SEGMENT"/>
        <org.apache.ctakes.typesystem.type.textspan.Sentence _indexed="1" _id="21" _ref_sofa="3" begin="0" end="63" sentenceNumber="0"/>
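To look at the annotations without a GUI, a rough shell sketch like the following lists each annotation's type with its begin/end offsets. It assumes the XCAS layout shown above ("begin" appears directly before "end" within each element) and a grep that supports -o (GNU or BSD grep). "list_spans" and "count_types" are hypothetical helper names, and since this is regex-based rather than a real XML parser, it is only meant for quick inspection.

```shell
#!/usr/bin/env bash
# Sketch: print "type begin end" for every element in a cTAKES XCAS output
# file that carries begin/end offsets. Regex-based, not a real XML parser;
# assumes begin="..." is immediately followed by end="...", as in the
# sample output, and a grep that supports -o.
list_spans() {
  # 1) pull out each tag that has begin/end attributes,
  # 2) keep only the element name and the two offsets.
  grep -o '<[A-Za-z0-9_.]* [^>]*begin="[0-9]*" end="[0-9]*"[^>]*>' "$1" |
    sed 's/^<\([A-Za-z0-9_.]*\)\( .*\)\{0,1\} begin="\([0-9]*\)" end="\([0-9]*\)".*$/\1 \3 \4/'
}

# count_types FILE: number of offset-carrying annotations per type.
count_types() {
  list_spans "$1" | cut -d' ' -f1 | sort | uniq -c
}

# Example:
#   list_spans "$CTAKES_HOME/result_output/note.txt.xml"
#   count_types "$CTAKES_HOME/result_output/note.txt.xml"
```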

Hope this helps :-)

Marietta answered 7/10, 2015 at 3:38

Comments:
Thanks for the steps, but I'm getting this when I run the script: "Error: Could not find or load main class org.apache.uima.tools.cpm.CpmFrame". Note: I'm using Cygwin on Windows 10. - Corking
@MokhtarAshour Could you try this: java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -classpath $CTAKES_HOME/lib/*;$CTAKES_HOME/desc/;$CTAKES_HOME/resources/ -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g org.apache.uima.examples.cpe.SimpleRunCPE $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext_test.xml - Marietta
I tried it; now it gives me "Error: Could not find or load main class org.apache.uima.examples.cpe.SimpleRunCPE". Do you think I should try it on Linux or something? - Corking
@MokhtarAshour It looks like a classpath problem (the class file cannot be found). Could you try the following: java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -classpath lib/*;desc/;resources/ -Dlog4j.configuration=file:config/log4j.xml -Xms2g -Xmx3g org.apache.uima.examples.cpe.SimpleRunCPE desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext.xml - Marietta
Well, I ignored "runctakesCPE_CLI.sh" and edited the "runctakesCPE.bat" file to use the "org.apache.uima.examples.cpe.SimpleRunCPE" class, and it works now. A related question: what output formats can I get from cTAKES (e.g., XML, JSON, ...), and how do I get them? Thanks - Corking
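The "could not find or load main class" errors in this thread point to the classpath, as noted above. Two facts are relevant here: Java on Windows (including a Windows JDK started from Cygwin) separates classpath entries with ";" while Linux and macOS use ":", and in bash an unquoted ";" ends the command. Below is a hedged sketch that picks the separator from "uname -s" and returns a string meant to be quoted. "cp_separator" and "ctakes_classpath" are hypothetical helper names. Under Cygwin, the paths themselves may also need converting with "cygpath -w", which this sketch does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the classpath used in the answers
# (lib/*, desc/, resources/) with the separator the JVM expects.
# Assumption: a Windows JVM is in use whenever uname reports Cygwin/MSYS/MinGW.
cp_separator() {
  case "$(uname -s)" in
    CYGWIN*|MINGW*|MSYS*) printf ';' ;;
    *)                    printf ':' ;;
  esac
}

# ctakes_classpath HOME [SEP]: SEP defaults to cp_separator's choice.
ctakes_classpath() {
  local home=$1 sep=${2:-$(cp_separator)}
  # lib/* is a JVM classpath wildcard, so it must reach java unexpanded;
  # callers should therefore quote the result: -cp "$(ctakes_classpath ...)".
  printf '%s' "$home/lib/*${sep}$home/desc/${sep}$home/resources/"
}

# Example:
#   java -cp "$(ctakes_classpath "$CTAKES_HOME")" ... org.apache.uima.examples.cpe.SimpleRunCPE <descriptor>
```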

    java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -cp "$CTAKES_HOME/lib/*;$CTAKES_HOME/desc/;$CTAKES_HOME/resources/" -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g to_replace $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext_test.xml

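Replace "to_replace" with either of the following classes (the ";" classpath separator above is for Windows; on Linux or macOS use ":"):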

org.apache.ctakes.ytex.tools.RunCPE or org.apache.ctakes.core.cpe.CmdLineCpeRunner
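The command above, with "to_replace" turned into a parameter, can be wrapped in a small bash function. This is a sketch under stated assumptions: "run_ctakes_cpe", "UMLS_USER", "UMLS_PW" and "CTAKES_JAVA" are hypothetical names chosen here. The main class must be one of the three classes mentioned on this page, and each is assumed to take the CPE descriptor path as its only argument, as in the commands on this page.

```shell
#!/usr/bin/env bash
# Sketch of the command above with the "to_replace" slot as a parameter.
# Assumptions (not part of cTAKES): CTAKES_HOME, UMLS_USER and UMLS_PW are
# set by the caller; CTAKES_JAVA (default: java) lets you swap in another
# launcher, e.g. "echo" for a dry run; the classpath separator defaults to
# ':' (pass ';' as the third argument for a Windows JVM).
run_ctakes_cpe() {
  local main_class=$1 descriptor=$2 sep=${3:-:}
  case "$main_class" in
    org.apache.uima.examples.cpe.SimpleRunCPE|\
    org.apache.ctakes.ytex.tools.RunCPE|\
    org.apache.ctakes.core.cpe.CmdLineCpeRunner) ;;
    *) echo "unexpected main class: $main_class" >&2; return 2 ;;
  esac
  # Quoting keeps lib/* intact for the JVM's classpath wildcard and keeps a
  # ';' separator from ending the shell command.
  "${CTAKES_JAVA:-java}" \
    -Dctakes.umlsuser="$UMLS_USER" -Dctakes.umlspw="$UMLS_PW" \
    -cp "$CTAKES_HOME/lib/*${sep}$CTAKES_HOME/desc/${sep}$CTAKES_HOME/resources/" \
    -Dlog4j.configuration="file:$CTAKES_HOME/config/log4j.xml" \
    -Xms2g -Xmx3g \
    "$main_class" "$descriptor"
}
```

For a dry run, CTAKES_JAVA=echo run_ctakes_cpe org.apache.ctakes.ytex.tools.RunCPE "$CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext_test.xml" prints the java arguments instead of starting Java.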

Crackbrained answered 13/6, 2017 at 8:32

Comments:
After following @Tsung-Ting Kuo's steps, I had the error others were having, "Error: Could not find or load main class org.apache.uima.examples.cpe.SimpleRunCPE"; replacing it with org.apache.ctakes.ytex.tools.RunCPE works well! - Syllabus
Was anyone able to run SideEffectSentenceCPE.xml with this approach? - Longley