Highlighting Text in Java
B

3

6

We are developing a plagiarism detection framework. In it, I have to highlight the possibly plagiarized phrases in the document. The document is first preprocessed with stop-word removal, stemming and number removal, so it is difficult to map the preprocessed tokens back to the original text for highlighting. As an example:

Original text: "Extreme programming is one approach of agile software development which emphasizes on frequent releases in short development cycles which are called time boxes. This result in reducing the costs spend for changes, by having multiple short development cycles, rather than one long one. Extreme programming includes pair-wise programming (for code review, unit testing). Also it avoids implementing features which are not included in the current time box, so the schedule creep can be minimized. "

Phrase I want to highlight: Extreme programming includes pair-wise programming

Preprocessed tokens: Extrem program pair-wise program

Is there any way I can highlight the preprocessed tokens in the original document?

Thanks

Bernardina answered 30/6, 2011 at 4:58 Comment(9)
@Bernardina are you not satisfied with #6360517? Then your accept rate is the reason :-) - Claymore
... what was the reason for adding the swing and jtextarea tags? - Retroflexion
@Andreas_D Nuwan uses JTextArea in his program, although he doesn't mention it in this question. See his other questions. - Exum
It's not clear what final result you want to get. Please add the original text with the words you want to highlight formatted as bold. - Exum
@MackerTim: I have highlighted the text (shown in bold). Thanks :) - Bernardina
So you know how to find the part of the text to highlight. The only thing you need is how to highlight it in a Swing JTextArea. Am I right? - Exum
@MockerTim: No. Once the text is preprocessed, all the stop words are removed and the remaining words are reduced to their stems. As a result I get phrases like "Extrem program pair-wise program" instead of the original text "Extreme programming includes pair-wise programming". So I want to highlight, in the original document, the original text that this preprocessed phrase maps to. Hope you understand the problem :) - Bernardina
Sorry, nope! I can't get it. I simply don't understand what this preprocessing is done for. Can you give me some links to the background theory of your task? I really want to understand this. - Exum
This preprocessing is done as part of our plagiarism detection framework. By preprocessing we remove all the unnecessary words so that we can more easily detect paraphrasing plagiarism. - Bernardina
E
4

You would be better off using JTextPane or JEditorPane instead of JTextArea.

A text area is a "plain" text component, which means that although it can display text in any font, all of the text is in the same font.

So JTextArea is not a convenient component for text formatting.

By contrast, with JTextPane or JEditorPane it's quite easy to change the style of (highlight) any part of the loaded text.

See How to Use Editor Panes and Text Panes for details.
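As a rough illustration of the styled-document route, here is a minimal sketch that gives every exact occurrence of a phrase a yellow background. The class name StyledHighlightSketch, the method highlightAll and the choice of yellow are my own assumptions for the example, and it only does exact matching, not the stem-based matching asked about.

```java
import java.awt.Color;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

// Hypothetical helper class, not part of Swing.
final class StyledHighlightSketch {

    // Gives every exact occurrence of phrase a yellow background
    // and returns the number of occurrences found.
    static int highlightAll(StyledDocument doc, String phrase) throws BadLocationException {
        SimpleAttributeSet marked = new SimpleAttributeSet();
        StyleConstants.setBackground(marked, Color.YELLOW);

        String text = doc.getText(0, doc.getLength());
        int count = 0;
        int pos = 0;
        // An empty phrase would match at every position, so skip it.
        while (!phrase.isEmpty() && (pos = text.indexOf(phrase, pos)) >= 0) {
            // replace = false keeps attributes that are already on the range
            doc.setCharacterAttributes(pos, phrase.length(), marked, false);
            pos += phrase.length();
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws BadLocationException {
        StyledDocument doc = new DefaultStyledDocument();
        doc.insertString(0, "rather than one long one. Extreme programming includes "
                + "pair-wise programming (for code review, unit testing).", null);
        int found = highlightAll(doc, "Extreme programming includes pair-wise programming");
        System.out.println("occurrences highlighted: " + found);
    }
}
```

To display the result, the document can be handed to a text pane with new JTextPane(doc).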

Update:

The following code highlights the desired part of your text. It's not exactly what you want: it simply finds the exact phrase in the text. It uses a Highlighter, which works with any JTextComponent, including JTextArea.

But I hope that if you apply your algorithms, you can easily modify it to fit your needs.

import java.lang.reflect.InvocationTargetException;
import javax.swing.*;
import javax.swing.text.*;
import java.awt.*;

public class LineHighlightPainter {

    String revisedText = "Extreme programming is one approach "
            + "of agile software development which emphasizes on frequent"
            + " releases in short development cycles which are called "
            + "time boxes. This result in reducing the costs spend for "
            + "changes, by having multiple short development cycles, "
            + "rather than one long one. Extreme programming includes "
            + "pair-wise programming (for code review, unit testing). "
            + "Also it avoids implementing features which are not included "
            + "in the current time box, so the schedule creep can be minimized. ";
    String token = "Extreme programming includes pair-wise programming";

    public static void main(String args[]) {
        try {
            SwingUtilities.invokeAndWait(new Runnable() {

                public void run() {
                    new LineHighlightPainter().createAndShowGUI();
                }
            });
        } catch (InterruptedException ex) {
            // ignore
        } catch (InvocationTargetException ex) {
            // ignore
        }
    }

    public void createAndShowGUI() {
        JFrame frame = new JFrame("LineHighlightPainter demo");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        JTextArea area = new JTextArea(9, 45);
        area.setLineWrap(true);
        area.setWrapStyleWord(true);
        area.setText(revisedText);

        // Highlighting part of the text in the instance of JTextArea
        // based on token.
        highlight(area, token);

        frame.getContentPane().add(new JScrollPane(area), BorderLayout.CENTER);
        frame.pack();
        frame.setVisible(true);
    }

    // Creates highlights around all occurrences of pattern in textComp
    public void highlight(JTextComponent textComp, String pattern) {
        // First remove all old highlights
        removeHighlights(textComp);

        try {
            Highlighter hilite = textComp.getHighlighter();
            Document doc = textComp.getDocument();
            String text = doc.getText(0, doc.getLength());

            int pos = 0;
            // Search for pattern; an empty pattern would never advance pos, so skip it
            while (!pattern.isEmpty() && (pos = text.indexOf(pattern, pos)) >= 0) {
                // Create highlighter using private painter and apply around pattern
                hilite.addHighlight(pos, pos + pattern.length(), myHighlightPainter);
                pos += pattern.length();
            }

        } catch (BadLocationException e) {
            // Not expected: every offset is derived from the document's own text
        }
    }

    // Removes only our private highlights
    public void removeHighlights(JTextComponent textComp) {
        Highlighter hilite = textComp.getHighlighter();
        Highlighter.Highlight[] hilites = hilite.getHighlights();

        for (int i = 0; i < hilites.length; i++) {
            if (hilites[i].getPainter() instanceof MyHighlightPainter) {
                hilite.removeHighlight(hilites[i]);
            }
        }
    }
    // An instance of the private subclass of the default highlight painter
    Highlighter.HighlightPainter myHighlightPainter = new MyHighlightPainter(Color.red);

    // A private subclass of the default highlight painter
    class MyHighlightPainter
            extends DefaultHighlighter.DefaultHighlightPainter {

        public MyHighlightPainter(Color color) {
            super(color);
        }
    }
}

This example is based on Highlighting Words in a JTextComponent.
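To adapt this to preprocessed tokens, one possible approach is to remember, for every token that survives preprocessing, the character range it came from in the original text, and then look for the preprocessed phrase as a run of consecutive surviving tokens. The sketch below is only an illustration under assumptions: TokenOffsetSketch, preprocess and locate are hypothetical names, tokens are lower-cased, and the stop-word set and the lookup-table "stemmer" in main are toy stand-ins for your real preprocessing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class TokenOffsetSketch {

    // A preprocessed token plus the character range it came from in the original text.
    static final class Token {
        final String stem;
        final int start;
        final int end;

        Token(String stem, int start, int end) {
            this.stem = stem;
            this.start = start;
            this.end = end;
        }
    }

    // Words made of letters/digits, optionally joined by hyphens (e.g. "pair-wise").
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*");

    // Drops stop words and pure numbers, stems the rest, and keeps the original offsets.
    static List<Token> preprocess(String text, Set<String> stopWords, UnaryOperator<String> stemmer) {
        List<Token> kept = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            if (stopWords.contains(word) || word.matches("\\d+")) {
                continue;
            }
            kept.add(new Token(stemmer.apply(word), m.start(), m.end()));
        }
        return kept;
    }

    // Returns {start, end} in the original text for the first run of consecutive
    // kept tokens whose stems equal phrase, or null if there is no such run.
    static int[] locate(List<Token> tokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return null;
        }
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < phrase.size() && match; j++) {
                match = tokens.get(i + j).stem.equals(phrase.get(j));
            }
            if (match) {
                return new int[] {tokens.get(i).start, tokens.get(i + phrase.size() - 1).end};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "rather than one long one. Extreme programming includes "
                + "pair-wise programming (for code review, unit testing).";
        Set<String> stopWords = Set.of("includes");                      // toy stop-word list
        Map<String, String> toyStems = Map.of("extreme", "extrem", "programming", "program");
        List<Token> tokens = preprocess(text, stopWords, w -> toyStems.getOrDefault(w, w));

        int[] span = locate(tokens, Arrays.asList("extrem", "program", "pair-wise", "program"));
        // prints "Extreme programming includes pair-wise programming"
        System.out.println(text.substring(span[0], span[1]));
    }
}
```

The returned range covers the removed stop word as well, because it runs from the start of the first matched token to the end of the last one. It can be passed straight to hilite.addHighlight(span[0], span[1], myHighlightPainter) in the example above.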

Exum answered 30/6, 2011 at 8:1 Comment(0)
R
1

From a technical point of view: you can either choose or develop a markup language and add annotations or tags to the original document, or you can create a second file that records all potential plagiarisms.

With markup, your text could look like this:

[...] rather than one long one. <plag ref="1234">Extreme programming 
includes pair-wise programming</plag> (for code review, unit testing). [...]

(where ref refers to a metadata record that describes the original source)
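A minimal sketch of producing such markup from a character range, assuming the start and end offsets of the suspicious passage are already known. PlagMarkupSketch and wrap are hypothetical names, and the sketch does not escape characters such as < or & that may already occur in the text.

```java
// Hypothetical helper class for illustration only.
final class PlagMarkupSketch {

    // Wraps the characters in [start, end) in a <plag ref="..."> element.
    static String wrap(String text, int start, int end, String ref) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException("range out of bounds: " + start + ".." + end);
        }
        return text.substring(0, start)
                + "<plag ref=\"" + ref + "\">"
                + text.substring(start, end)
                + "</plag>"
                + text.substring(end);
    }

    public static void main(String[] args) {
        String text = "rather than one long one. Extreme programming includes "
                + "pair-wise programming (for code review, unit testing).";
        String phrase = "Extreme programming includes pair-wise programming";
        int start = text.indexOf(phrase);
        System.out.println(wrap(text, start, start + phrase.length(), "1234"));
    }
}
```

When several passages are tagged in the same text, applying the wraps from the last range to the first keeps the earlier offsets valid.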

Retroflexion answered 30/6, 2011 at 5:9 Comment(0)
M
1

You could use java.text.AttributedString to annotate the original document, then apply java.awt.font.TextAttribute values to the ranges that correspond to the relevant preprocessed tokens (which would take effect in the original document).
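A small sketch of that idea, assuming the character range to mark is already known. AttributedHighlightSketch and mark are hypothetical names, and the yellow background is an arbitrary choice.

```java
import java.awt.Color;
import java.awt.font.TextAttribute;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;

// Hypothetical helper class for illustration only.
final class AttributedHighlightSketch {

    // Returns the text with a yellow background attribute on [start, end).
    static AttributedString mark(String text, int start, int end) {
        AttributedString attributed = new AttributedString(text);
        attributed.addAttribute(TextAttribute.BACKGROUND, Color.YELLOW, start, end);
        return attributed;
    }

    public static void main(String[] args) {
        String text = "rather than one long one. Extreme programming includes "
                + "pair-wise programming (for code review, unit testing).";
        String phrase = "Extreme programming includes pair-wise programming";
        int start = text.indexOf(phrase);
        AttributedString marked = mark(text, start, start + phrase.length());

        // Inspect the attribute run that covers the marked phrase.
        AttributedCharacterIterator it = marked.getIterator();
        it.setIndex(start);
        int runStart = it.getRunStart(TextAttribute.BACKGROUND);
        int runLimit = it.getRunLimit(TextAttribute.BACKGROUND);
        System.out.println(text.substring(runStart, runLimit));
    }
}
```

The iterator from marked.getIterator() can be drawn with Graphics2D.drawString(AttributedCharacterIterator, int, int), so this route suits custom painting more than editing in a Swing text component.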

Metachromatism answered 30/6, 2011 at 6:14 Comment(0)
