How to evaluate a search/retrieval engine using trec_eval?

Has anybody here used trec_eval? I need a "trec_eval for dummies".

I'm trying to evaluate a few search engines for my thesis, comparing measures like recall, precision, and ranking quality. I cannot figure out how to use trec_eval to send queries to the search engine and get a result file that can be used with trec_eval.

Pegg answered 25/11, 2010 at 10:03 Comment(2)
Are you still interested in this topic? – Ransell
I have a related question: how do you handle non-binary relevance labels? – Utrillo

Basically, for trec_eval you need a (human-generated) ground truth, which has to be in a specific format:

query-number 0 document-id relevance

Given a collection like 101Categories (see the Wikipedia entry), that would be something like

Q1046   0   PNGImages/dolphin/image_0041.png    0
Q1046   0   PNGImages/airplanes/image_0671.png  128
Q1046   0   PNGImages/crab/image_0048.png   0
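If you want to generate such a qrels file from your own judgments, a minimal Python sketch could look like the following; the query ID, image paths and relevance values in the judgments dict are just the hypothetical ones from the example above, and the output filename groundtruth.qrel is an assumption that matches the command further down.

# Write ground-truth judgments in "query-number 0 document-id relevance" format.
# The contents of `judgments` are hypothetical; fill in your own assessments.
judgments = {
    "Q1046": {
        "PNGImages/dolphin/image_0041.png": 0,
        "PNGImages/airplanes/image_0671.png": 128,
        "PNGImages/crab/image_0048.png": 0,
    },
}

with open("groundtruth.qrel", "w") as qrel_file:
    for query_id, docs in judgments.items():
        for doc_id, relevance in docs.items():
            # The second column is an "iteration" field that trec_eval ignores.
            qrel_file.write(f"{query_id} 0 {doc_id} {relevance}\n")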

The query-number therefore identifies a query (e.g. a picture from a certain category, used to find similar ones). The results from your search engine then have to be transformed to look like

query-number    Q0  document-id rank    score   Exp

or in reality

Q1046   Q0  PNGImages/airplanes/image_0671.png  1   1   srfiletop10
Q1046   Q0  PNGImages/airplanes/image_0489.png  2   0.974935    srfiletop10
Q1046   Q0  PNGImages/airplanes/image_0686.png  3   0.974023    srfiletop10
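How you get from your engine's raw output to these lines depends on the engine, but as a rough sketch, assuming it hands you a score-sorted list of (document-id, score) pairs per query (the results dict, the filename results and the run tag srfiletop10 below are assumptions mirroring the example above):

# Convert ranked retrieval results into the six-column TREC run format.
# `results` is a hypothetical stand-in for your search engine's output.
results = {
    "Q1046": [
        ("PNGImages/airplanes/image_0671.png", 1.0),
        ("PNGImages/airplanes/image_0489.png", 0.974935),
        ("PNGImages/airplanes/image_0686.png", 0.974023),
    ],
}

run_tag = "srfiletop10"  # free-form label identifying this run

with open("results", "w") as run_file:
    for query_id, ranked_docs in results.items():
        for rank, (doc_id, score) in enumerate(ranked_docs, start=1):
            # Columns: query-number, literal "Q0", document-id, rank, score, run tag.
            run_file.write(f"{query_id} Q0 {doc_id} {rank} {score} {run_tag}\n")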

as described here. You might have to adjust the path names for the "document-id". Then you can calculate the standard metrics with trec_eval groundtruth.qrel results. Running trec_eval -h lists the available options and should give you some ideas for choosing the measures needed for your thesis.

trec_eval does not send any queries; you have to run them yourself. It only performs the analysis, given a ground truth and your results.
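If you prefer to drive that analysis from a script rather than the shell, here is a hedged sketch of calling trec_eval via Python's subprocess and collecting the summary values; it assumes the trec_eval binary is on your PATH and that the groundtruth.qrel and results files from the sketches above exist.

import subprocess

# Run trec_eval on the ground truth and the run file written above.
completed = subprocess.run(
    ["trec_eval", "groundtruth.qrel", "results"],
    capture_output=True, text=True, check=True,
)

# trec_eval prints one "measure  query  value" line per metric;
# rows with query "all" are the values aggregated over all queries.
summary = {}
for line in completed.stdout.splitlines():
    parts = line.split()
    if len(parts) == 3 and parts[1] == "all":
        summary[parts[0]] = parts[2]

print(summary.get("map"), summary.get("P_10"))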

Some basic information can be found here and here.

Inflationary answered 17/11, 2011 at 22:35 Comment(7)
Hi @mbx, how did you calculate the numbers under the 'score' column above (it says: 1, 0.974935, 0.974023)? I've read that they represent the degree of similarity between the row's result doc and the relevant doc, but I can't find how one would arrive at those numbers (except for '1', which I assume indicates 100% accuracy). – Lustrum
@NoonTime IIRC the first number is the position in the output (of the top X) and the second is the rating of the answer ("how close does this result get if a perfect match is 1"), so it completely depends on the algorithm you want to measure. – Inflationary
OK, thanks @mbx, but mathematically, how did you get that 0.974935? I know it's derived from {last_position - 1}; are you dividing that by the total number of retrieved results and using that fraction? E.g. if you had 100 results, the second row's (second result's) score would be (100-1)/100, so 0.99? – Lustrum
@NoonTime To give an exact answer I'd have to recover my gitosis setup and look into my scripts for generating the trec_eval input, but it should depend on the data and its rating according to your metric. Consider color values in RGB. If your DB contains black 000, red F00, yellow FF0, green 0F0 and white FFF, and you value each color channel the same (you shouldn't, but for simplicity), searching for the nearest 4 matches of white FFF should give you white FFF 1 1, yellow FF0 2 0.66, red F00 3 0.33, green 0F0 4 0.33. Your algorithm could even swap green and red as they'd have the same distance in this metric. – Inflationary
@Inflationary 10.2452/551-AH Q0 H-810631-57604S3 1 543.528 Exp is what the RetEval command generates for me (it's one of 1000 lines of the output file). When I run trec_eval to compare, it gives me an error with this message: Segmentation fault (core dumped). What can I do to fix this problem? – Octagon
@SaeedZhiany A segfault indicates a bug in trec_eval; it should work without core dumping on any given input, even a wrong or unexpected one. You should have seen a parse error instead. – Inflationary
@Inflationary I built trec_eval with make and make install on Ubuntu 17.10 with gcc 7.2.0. Is that OK? – Octagon
