Information Retrieval Processes and Techniques

1) Mention some synonyms of information retrieval.
Information retrieval is also known as information storage and retrieval, information organisation and retrieval, information processing and retrieval, text retrieval, information representation and retrieval, and information access.
2) How does the original connotation of IR differ from that of modern information retrieval?
Originally, the term information retrieval was used to mean retrieval of bibliographic information from stored document databases. Since such systems dealt primarily with text databases, they were also called text retrieval systems. Modern information retrieval systems deal with multimedia information comprising text, audio, images and video. While many features of conventional text retrieval systems are equally applicable to multimedia information retrieval, the specific nature of audio, image and video information has called for the development of many new tools and techniques for information retrieval. Modern information retrieval systems thus deal with the storage and organisation of, and access to, text as well as multimedia information resources.

3) What is a database? Give examples of three bibliographic databases.
A database is an organised collection of related sets of data that can be accessed by more than one user by simple means and can be searched to reveal those that touch upon a particular need. Examples of bibliographic databases include Medline, ERIC, BIOSIS, LISA and ISA.

4) Discuss two major differences between a DBMS and an IRS.
First, a database management system stores and retrieves discrete data elements that are structured, whereas a typical information retrieval system is designed to deal with unstructured data, e.g., the full texts of documents. Second, in a typical database management search we expect to retrieve discrete data, e.g., the price of an item or the date of birth of an employee, whereas in an information retrieval search we retrieve an entire document, or the part of it that contains the information required by the user.
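The contrast can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module for the DBMS side and a plain keyword scan for the IR side, and the table, field names and sample documents are all invented for the example.

```python
import sqlite3

# DBMS side: structured records; the query returns one discrete data element.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("notebook", 2.50), ("pen", 0.80)])
price = conn.execute("SELECT price FROM items WHERE name = ?",
                     ("pen",)).fetchone()[0]
conn.close()

# IR side: unstructured full texts; the search returns whole documents.
# (Hypothetical sample documents.)
documents = {
    "doc1": "Inverted files speed up retrieval from text databases.",
    "doc2": "A thesaurus links broader, narrower and related terms.",
}

def search_documents(term):
    """Return (id, full text) for every document that contains the term."""
    term = term.lower()
    return [(doc_id, text) for doc_id, text in documents.items()
            if term in text.lower()]

hits = search_documents("thesaurus")
print(price)   # a single value
print(hits)    # one or more complete documents
```

The database query answers with a single value, while the document search hands back complete texts that the user must still read to find the required information.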

5) What is an inverted file? What role does it play in an information retrieval process?
An inverted file is an index file that contains all the index terms, drawn automatically from the document records according to the indexing technique adopted for the purpose. Each index term in the inverted file is associated with a pointer that specifies position(s) in the database where that term appears. Therefore, in an inverted file system, the searcher first consults the index file, which then refers to the position in the main text database where the desired record appears.
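A minimal sketch of this two-step lookup is given below. It assumes the simplest possible indexing technique (every whitespace-separated word becomes an index term) and uses the position of a record in a list as its pointer; the sample records and function names are hypothetical.

```python
from collections import defaultdict

# Main file: a record's position in this list serves as its pointer.
records = [
    "information retrieval systems",
    "database management systems",
    "retrieval of multimedia information",
]

def build_inverted_file(records):
    """Map each index term to the sorted positions of the records containing it."""
    index = defaultdict(set)
    for position, text in enumerate(records):
        for term in text.lower().split():
            index[term].add(position)
    return {term: sorted(positions) for term, positions in index.items()}

def search(index, records, term):
    """Consult the index file first, then fetch the records it points to."""
    return [records[pos] for pos in index.get(term.lower(), [])]

inverted_file = build_inverted_file(records)
print(inverted_file["retrieval"])
print(search(inverted_file, records, "systems"))
```

The search never scans the main file: it looks up the term in the index and follows the stored positions directly to the matching records.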

6) What is the difference between a single key and a multi-key query?
In a single key search, the value of a single search key (say, the name of the author) is used as the retrieval criterion, whereas in a multiple key search a number of search keys (say, the name of the author, subject name, date of publication, and so on) are used in combination. An example of a multi-key search could be: ‘papers written by Salton on information retrieval systems between 1980 and 1990’. For single key searches, the whole file can be maintained in order according to the value of the single search key. In a telephone directory, for example, users search through the names of subscribers and therefore the names of subscribers are arranged in alphabetical order. File access in multi-key searches is complicated by the fact that it is not possible to order the file simultaneously in accordance with the values of the different search keys.
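The difference in file access can be sketched as follows. The records are made up for illustration and are not real publications; the file is kept sorted by author so that a single key search can use binary search, while the multi-key search has to test every record.

```python
from bisect import bisect_left

# Hypothetical records (author, subject, year), kept in order of author.
papers = sorted([
    ("Salton", "information retrieval systems", 1983),
    ("Jones", "information retrieval systems", 1985),
    ("Salton", "automatic indexing", 1988),
    ("Salton", "information retrieval systems", 1975),
])
authors = [paper[0] for paper in papers]

def single_key_search(author):
    """Binary search is possible because the file is ordered on this one key."""
    position = bisect_left(authors, author)
    results = []
    while position < len(papers) and papers[position][0] == author:
        results.append(papers[position])
        position += 1
    return results

def multi_key_search(author, subject, year_from, year_to):
    """The file cannot be ordered on every key at once, so each record is tested."""
    return [p for p in papers
            if p[0] == author and p[1] == subject and year_from <= p[2] <= year_to]

print(single_key_search("Salton"))
print(multi_key_search("Salton", "information retrieval systems", 1980, 1990))
```

In practice a multi-key search would normally be supported by an index on each key rather than by a full scan; the scan is used here only to keep the sketch short.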

7) What is the role of a vocabulary control tool in an information retrieval process?
There are two major objectives of vocabulary control in an information retrieval environment:

  • to promote the consistent representation of subject matter by indexers and searchers, thereby avoiding the dispersion of related materials. 
  • to facilitate the conduct of a comprehensive search on some topic by linking together terms whose meanings are related.
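
The first objective can be illustrated with a very small sketch in which variant terms are mapped to a single preferred term, so that the indexer and the searcher end up using the same heading. The vocabulary entries and the function name are hypothetical.

```python
# Hypothetical controlled vocabulary: variant term -> preferred term.
PREFERRED = {
    "automobiles": "cars",
    "motor cars": "cars",
    "motorcars": "cars",
}

def normalise(term):
    """Return the preferred form of a term, or the term itself if it is not a known variant."""
    term = term.lower().strip()
    return PREFERRED.get(term, term)

# The indexer and the searcher choose different words for the same subject...
indexed_under = normalise("Automobiles")
searched_for = normalise("motor cars")
# ...but both resolve to the same heading, so related material is not dispersed.
print(indexed_under == searched_for)
```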

8) What is the difference between a subject heading list and a thesaurus from the perspectives of information retrieval?
Subject heading lists were initially developed to prepare entries/headings in a subject catalogue that could replicate the classified arrangement of document records. Therefore, they include rather broad subject terms or headings, and they are used primarily to assign subject headings for bibliographic items in a library catalogue or index. Thesauri, on the other hand, have been developed for specific subject fields with a view to bringing together the various representations of a term (synonyms, spelling variants, homonyms, etc.), along with an indication of the place of that term in the universe of knowledge, shown by its broader (super-ordinate), narrower (subordinate), and related (coordinate and collateral) terms. Thesauri are now used for indexing and searching electronic databases.
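A thesaurus entry of this kind can be sketched as a simple data structure. The entry below is invented for illustration, and the labels UF (used for), BT (broader term), NT (narrower term) and RT (related term) are the abbreviations commonly found in thesauri, not something prescribed by this text.

```python
# Hypothetical thesaurus with a single entry.
THESAURUS = {
    "information retrieval": {
        "UF": ["information storage and retrieval"],  # non-preferred synonyms
        "BT": ["information science"],                # broader terms
        "NT": ["text retrieval", "multimedia retrieval"],  # narrower terms
        "RT": ["indexing"],                           # related terms
    },
}

def expand_query(term, relations=("UF", "NT")):
    """Return the term followed by its linked terms for the chosen relationships."""
    entry = THESAURUS.get(term, {})
    expanded = [term]
    for relation in relations:
        expanded.extend(entry.get(relation, []))
    return expanded

print(expand_query("information retrieval"))
print(expand_query("information retrieval", relations=("BT", "RT")))
```

A searcher can use the expanded list to widen a search to synonyms and narrower terms, or move to broader and related terms when too little is found.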

9) What is a user interface?
A user interface is the means by which information is transferred between the user and the computer and vice-versa. It is an important component of an information retrieval system since it connects the users to the organised information resources.

10) What are the two major functions of the user interface in an information retrieval system?
User interfaces basically perform two major functions: (1) they allow users to search or browse an information collection, and (2) they display the results of a search, and often allow users to perform further tasks, like sorting, saving and/or printing the search results, modifying the search query, etc.

11) What is meant by online information retrieval?
By online information retrieval systems we mean those that have been designed to provide access to remote database(s) to a variety of users. Such services appeared about four decades ago and are available mostly on a commercial basis, and there are a number of vendors that handle this sort of service. A prominent example of an online search service provider is Dialog. However, in today’s world many people use the term online information retrieval to mean retrieval of information from remote databases (and such services are not necessarily restricted to those that are provided by vendors like Dialog).

12) How does an OPAC search differ from an online search?
An OPAC search is conducted on a given library’s catalogue. Therefore, the search results are expected to be in the collection of the given library. An online search, on the other hand, does not have any such restrictions, i.e., the search output is not restricted to any library’s collection. The search features in an OPAC environment are much more limited compared to an online search environment.
An OPAC search is conducted on the index file created by selecting data from some selected fields in a library’s catalogue like author, subject heading, keywords, title, ISBN, etc. However, in the case of an online search, the search terms may come from the bibliographic details as well as the abstract or the full text of the documents.

13) What is a web search engine and what role does it play in the context of the web?
Search engines are the most commonly used tools for finding information on the web. There are three main components of a search engine: (1) the spider, i.e., the program that automatically collects information about millions of pages on the web, (2) the index that stores information collected by the spider on the various web pages, and (3) the search engine software and interface with which the users interact to conduct a web search.
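The three components can be sketched in miniature as follows. This is a toy model under stated assumptions: the "web" is a hypothetical in-memory dictionary of three pages rather than real network access, and the search simply returns pages that contain every query word, with no ranking.

```python
from collections import defaultdict

# A hypothetical three-page "web": URL -> (page text, outgoing links).
WEB = {
    "site/home": ("welcome to the library home page", ["site/catalogue", "site/about"]),
    "site/catalogue": ("search the library catalogue online", ["site/home"]),
    "site/about": ("about the library and its staff", []),
}

def spider(start_url):
    """Component 1: follow links from a start page and collect each page's text."""
    seen, queue, pages = set(), [start_url], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text
        queue.extend(links)
    return pages

def build_index(pages):
    """Component 2: record, for each word, the set of URLs where it occurs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.split():
            index[word].add(url)
    return index

def search(index, query):
    """Component 3: return the URLs containing every query word, in sorted order."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

index = build_index(spider("site/home"))
print(search(index, "library catalogue"))
```

Note that the user's query is run against the index built from the spider's output, not against the live pages themselves.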

14) How does the traditional online information retrieval differ from web information retrieval?
Traditional online information retrieval differs from web information retrieval in a number of ways. For example, online information retrieval is proprietary; one has to register and pay for the service, while web information retrieval through web search engines is available to everyone for free. However, online search services guarantee quality information because they follow specific criteria for the selection of input resources, whereas web search engines do not have such quality control, and therefore the search output may not always be of high quality. Web search engines are continuously evolving and they offer many sophisticated retrieval facilities that are not available in online search services.

15) What is meant by intelligent information retrieval?
There are several definitions of an intelligent information retrieval system, and one can choose any one of them:
1) A computer system with inferential capabilities such that it can use prior knowledge to establish a connection between a user’s request and a candidate set of relevant documents.
2) A system that carries out intelligent retrieval. An intelligent retrieval is defined as the use, by a computer system, of the stored knowledge of its world of documents, users, etc. and of information about the user and his/her problem to infer which documents would enable that particular user to resolve or manage his/her problem in a better way.

16) How does intelligent information retrieval differ from a conventional information retrieval?
A conventional information retrieval system is based on matching the search terms with the index terms, so it retrieves only those items that match the search terms. Since user search terms and index terms may differ in a number of ways, conventional information retrieval systems often fail to produce the desired results: they may produce unwanted materials, or may fail to produce the relevant materials.
Intelligent information retrieval systems use intelligent techniques, such as those used in the artificial intelligence and expert systems fields, as well as natural language processing techniques, to overcome the term matching problems of conventional information retrieval systems. Such systems aim to use expert techniques, along the lines of the actions of human experts, in finding information relevant to a user need. As a result they are more resource intensive, they are very expensive to build and maintain, and they often have applications in a limited subject domain.

KEYWORDS
Data Mining : The process of analysing data to identify patterns or relationships. For example, a data mining program might analyse millions of document supply orders to determine trends among top-requesting customers, such as their likelihood to order again, or their likelihood to switch to a different vendor.
Database : An organised collection of related sets of data that can be accessed by more than one user by simple means and can be searched to reveal those that touch upon a particular need.
DLI : Digital Library Initiative.
Expert System : A computer system that embodies knowledge about a specific problem domain and can solve problems from the domain using its knowledge with a degree of expertise that is comparable to that of a human expert.
Intelligent Information Retrieval System : A system that carries out intelligent retrieval, i.e., the use, by a computer system, of the stored knowledge of its world of documents, users, etc. and of information about the user and his/her problem to infer which documents would enable that particular user to resolve or manage his/her problem in a better way.
NDLTD : The Networked Digital Library of Theses and Dissertations (NDLTD) is an international organisation dedicated to the promotion and dissemination of electronic theses and dissertations.
Search Engine : A tool that is used to conduct a search in an information retrieval environment. The term is typically used in the context of web information retrieval, where a web search engine is a program that allows the user to enter search terms (keywords and/or phrases) that are run against a database containing information on the web pages collected automatically by programs called spiders. At the end of a search session, the search engine retrieves web pages from its database that match the search terms entered by the searcher.
User Interface : The contact point or the means by which information is transferred between the user and the computer and vice-versa.

Source: IGNOU Study Material