Evaluation of Indexing Systems

1) How do you distinguish between the efficiency and effectiveness of an IR system?
Effectiveness is the degree to which a given IR system attains its stated objectives, whereas efficiency refers to how economically the system achieves them. To determine the effectiveness of a system we measure the extent to which it retrieves relevant documents while withholding non-relevant ones. To determine its efficiency we measure cost factors such as response time, user effort and cost per search.

2) What are the different levels of evaluation?
An information retrieval system can be evaluated at the following levels:
a) System effectiveness;
b) Cost effectiveness; and
c) Cost-benefit evaluation

3) What are the different criteria used in evaluating an IR system?
Different criteria used in evaluating an IR system are:
a) Recall;
b) Precision;
c) Response time;
d) User effort;
e) Form of presentation; and
f) Collection coverage.

4) How do recall and precision ratios help in measuring the performance of an indexing system?
Recall relates to the ability of the system to retrieve relevant documents. The recall ratio is the proportion of the relevant documents in the collection that are actually retrieved, and it measures the extent to which a given system retrieves relevant documents. It is calculated as:
Recall ratio = (Relevant documents retrieved / Total relevant documents in the collection) × 100
Precision ratio, on the other hand, is the ratio of relevant retrieved documents to total retrieved documents. It measures how precisely an indexing system functions, and is calculated as:
Precision ratio = (Relevant documents retrieved / Total documents retrieved) × 100
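The two ratios above can be sketched in Python; the counts in the example are illustrative, not drawn from any actual test:

```python
def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Recall ratio = relevant retrieved / total relevant in the collection, as a percentage."""
    return 100 * relevant_retrieved / total_relevant

def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Precision ratio = relevant retrieved / total retrieved, as a percentage."""
    return 100 * relevant_retrieved / total_retrieved

# Example: a search retrieves 40 documents, 20 of which are relevant;
# the collection holds 25 relevant documents in total.
print(recall(20, 25))     # 80.0
print(precision(20, 40))  # 50.0
```

The example also illustrates the usual trade-off: a broader search would raise recall but tend to lower precision.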

5) What are the major steps involved in the evaluation of the performance of an IR system?
The following major steps are involved in the evaluation of the performance of an IR system:
a) Defining the scope of evaluation;
b) Designing the evaluation programme;
c) Execution of the evaluation programme;
d) Analysis and interpretation of results; and
e) Modifying the system in the light of the evaluation results.

6) What were the objectives and input elements of the Cranfield 1 project?
The objective of the Cranfield 1 project was to investigate the comparative efficiency of four indexing systems: UDC, Faceted classification, Alphabetical subject heading list and Uniterm indexing system. Input elements for this project consisted of:
a) Four indexing systems as stated above;
b) Three indexers with varied backgrounds;
c) Variable indexing time periods;
d) 100 micro documents on the general field of aeronautics and the specialized
field of high-speed aerodynamics;
e) 18,000 indexed items (100 micro documents × 3 indexers × 4 indexing systems × 5 time periods × 3 runs); and
f) Descriptors.

7) What was the scale used in MEDLARS test for calculating precision ratio?
In the MEDLARS test, the user was asked to characterize each retrieved item in a randomly selected sample output of 25 to 30 items using the following scale: H1 – of major value; H2 – of minor value; W1 – of no value; W2 – of value unknown (because it was in a foreign language). From this assessment, precision ratios were calculated: if L items were in the sample output, the overall precision ratio was 100(H1 + H2)/L and the ‘major value’ precision ratio was 100H1/L.
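The MEDLARS calculation can be sketched as follows; the function name and the counts are illustrative only:

```python
def medlars_precision(h1: int, h2: int, sample_size: int) -> tuple[float, float]:
    """Return (overall precision, 'major value' precision) as percentages,
    following the MEDLARS formulas 100(H1 + H2)/L and 100*H1/L."""
    overall = 100 * (h1 + h2) / sample_size
    major_value = 100 * h1 / sample_size
    return overall, major_value

# Example: in a sample of 25 retrieved items, 10 are of major value (H1)
# and 8 of minor value (H2).
overall, major = medlars_precision(10, 8, 25)
print(overall, major)  # 72.0 40.0
```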

8) What were the methods adopted for automatic content analysis of documents in the SMART retrieval experiment?
A number of methods were adopted for automatic content analysis of documents in the SMART retrieval experiment. Some notable examples include:
a) Word suffix cut-off methods,
b) Thesaurus look-up procedures,
c) Phrase generation methods,
d) Statistical term associations,
e) Hierarchical term expansion and so on.

9) Mention the different information retrieval strategies that were tested in the TREC series of experiments.
A wide range of information retrieval strategies were tested in different TREC experiments, from TREC 1 in 1992 to TREC 12 in 2003. Some notable examples are:
a) Boolean search;
b) Statistical and probabilistic indexing and term weighting strategies;
c) Passage or paragraph retrieval;
d) Combining the results of more than one search;
e) Retrieval based on prior relevance assessments;
f) Natural language-based and statistically based phrase indexing;
g) Query expansion and query reduction;
h) String and concept-based searching;
i) Dictionary-based searching;
j) Question-answering; and
k) Content-based multimedia retrieval.


Clumps : A group of words that tend to ‘adhere’. Various criteria can be used to define the boundaries of a clump; in general, each word is associated with the others in the group above some given threshold. The formation of word classes is likely to be based on more than simple co-occurrence: two words are considered to be related if they co-occur more frequently than one would expect them to co-occur by chance.
Clustering : A group formation mechanism that combines a number of related or associated terms. Four kinds of group may be found in clustering: strings, stars, cliques and clumps. A string occurs when term A is strongly associated with term B, term B with term C, and so on; in practice, strings tend to form loops fairly quickly: A -> B -> C -> D -> E -> A. A star is found when one central term is strongly related to each of several other terms. A clique occurs when every term in a set is strongly related to every other. A clump is a weaker form of clique, in which each term is related to one or more of the others in the clump, but not necessarily to all.
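The distinction between a clique and a clump can be sketched in Python; the term associations below are a hypothetical example, not data from any experiment:

```python
from itertools import combinations

def related(a: str, b: str, pairs: set) -> bool:
    """True if terms a and b are (symmetrically) associated."""
    return frozenset((a, b)) in pairs

def is_clique(terms: set, pairs: set) -> bool:
    """Clique: every term is strongly related to every other term."""
    return all(related(a, b, pairs) for a, b in combinations(terms, 2))

def is_clump(terms: set, pairs: set) -> bool:
    """Clump: each term is related to at least one other term in the group."""
    return all(any(related(a, b, pairs) for b in terms if b != a) for a in terms)

# Hypothetical symmetric association relation between index terms.
pairs = {frozenset(p) for p in [("index", "term"), ("term", "weight"),
                                ("index", "weight"), ("weight", "query")]}

print(is_clique({"index", "term", "weight"}, pairs))            # True
print(is_clique({"index", "term", "weight", "query"}, pairs))   # False: query is not related to index or term
print(is_clump({"index", "term", "weight", "query"}, pairs))    # True: query is related to weight
```

Note that every clique is also a clump, but not vice versa, which is what makes the clump the weaker grouping.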
Cost effectiveness : It is the evaluation in terms of how to satisfy user requirements in the most efficient and economical way. A cost effectiveness evaluation relates measures of effectiveness to measure of cost. It takes into considerations unit cost per relevant citation retrieval, per new relevant citation retrieval and per relevant document retrieval.
Cost-benefit : It is the evaluation of the worth of the system by relating the cost of providing the service to the benefits of having the service available.
Elimination factor : The proportion of non-retrieved items (both relevant and non-relevant) over the total items in the collection.
Evaluation : The act of measuring the performance of the system in terms of its retrieval efficiency (ease of approach, speed and accuracy) for the users, and its internal operating efficiency, cost effectiveness and cost benefit for the managers of the system, in order to ascertain its level of performance or its value.
Exhaustivity : A measure of the extent to which all the distinct topics discussed in a document are covered in indexing.
Failure analysis : A diagnostic procedure that traces recall and precision failures to the principal subsystems of an IR system, such as indexing, searching, the index language and the user-system interface. It consists of a careful examination of the documents involved, how they were indexed, the requests posed to the system, the search strategies used, and the complete assessment forms of the requesters.
Fallout ratio : The proportion of the irrelevant items in the file that are retrieved by a request. Fallout thus represents the fractional recall of irrelevant items, and has been referred to as the conditional probability of ‘false drops’. It is also called ‘discard’.
Generality number : It expresses the number of items relevant to a particular request over the total number of items in the collection. The higher the generality number the greater the density of relevant items to total collection, and the greater this density the easier the search tends to be.
Noise factor : The proportion of retrieved items that are not relevant. This factor is the complement of the precision ratio.
Novelty ratio : The proportion of relevant documents retrieved in a search that are new to the requester, that is, brought to his attention for the first time by the search.
Omission factor : The proportion of non-relevant items retrieved over the total number of non-retrieved items in the collection.
Performance curve : A line representing graphically a series of variable performance points, in terms of recall and precision ratios, produced by search strategies at varying levels from very broad to very specific. This curve expresses the inverse relationship between recall and precision.
Resolution factor : The proportion of total items retrieved over a total number of items in the collection.
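Several of the glossary ratios above can be computed from the same four counts of a retrieval outcome. The sketch below uses illustrative variable names (a, b, c, d) and made-up counts:

```python
def ratios(a: int, b: int, c: int, d: int) -> dict:
    """Retrieval ratios from four counts:
    a = relevant items retrieved,      b = non-relevant items retrieved,
    c = relevant items not retrieved,  d = non-relevant items not retrieved."""
    n = a + b + c + d  # total items in the collection
    return {
        "recall":      a / (a + c),
        "precision":   a / (a + b),
        "fallout":     b / (b + d),   # irrelevant items retrieved / irrelevant items in the file
        "noise":       b / (a + b),   # complement of precision
        "elimination": (c + d) / n,   # non-retrieved items / total collection
        "resolution":  (a + b) / n,   # retrieved items / total collection
        "generality":  (a + c) / n,   # items relevant to the request / total collection
    }

# Example: a 1,000-item collection; 40 items retrieved, 20 of them relevant;
# 5 relevant items missed.
print(ratios(20, 20, 5, 955))
```

Working through the example: recall is 20/25 = 0.8, precision and noise are both 0.5, resolution is 40/1000 = 0.04, and generality is 25/1000 = 0.025, a fairly easy search by the generality criterion above.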
Semantic factoring : The analysis of a word into the fundamental ideas it implies. During the 1950s a team at Case Western Reserve University worked on a system of analysis known as semantic factoring. The objective was to break down every concept into a set of fundamental concepts called semantic factors; because of their fundamental nature, there would be only a limited number of these factors. For example, a word such as ‘urinalysis’ factors into ‘urine’ and ‘analysis’, whereas ‘magneto-hydrodynamics’ factors into ‘fluid flow’, ‘magnetism’ and ‘electrical conductivity’.
Specificity : A measure of the degree of preciseness with which the subject expresses the thought content of the document.
System effectiveness : It is the evaluation of the system performance in terms of degree to which it meets the users’ requirements. It considers the users’ satisfaction and is measured by determining the utility to the users of the documents retrieved in answer to a user query. It takes into consideration cost, time and quality criteria.
Usability : The value of references retrieved in terms of such factors as their reliability, comprehensibility, currency, etc.
User effort : It refers to the intellectual as well as physical effort required from the user in obtaining answers to the search requests. The effort is measured by the amount of time user spends in conducting the search or negotiating his enquiry with the system.

Source: IGNOU Study Material