Comparative Analysis of IDF Methods to Determine Word Relevance in Web Document
Inverse document frequency (IDF) is one of the most useful and
widely used concepts in information retrieval. Combined with
term frequency (TF), it yields TF-IDF, a very effective
term-weighting scheme for determining the weight of terms.
A term with a high TF-IDF value has a strong relationship with
the document it appears in. If such a term appears in a query, the
document is likely to be of interest to the user. Term frequency is
computed as the number of occurrences of a term in a document
whereas IDF can be measured in several ways; one of the
best-known derivations follows from the
Robertson-Sparck Jones relevance weight. Besides this
best-known form, there are various other methods for
computing inverse document frequency, and the choice among them
affects the relevance assigned to a document. In this paper, we
discuss and compare different derivations of inverse
document frequency for measuring the weight of terms.
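As a rough illustration of the weighting scheme described above, the following Python sketch computes TF as a raw occurrence count and compares two IDF variants. It is not taken from the paper: the function names, the toy corpus, and the choice of natural logarithm are assumptions made for this example. The second variant, log((N - n + 0.5) / (n + 0.5)), is the form commonly given for the Robertson-Sparck Jones weight when no relevance information is available.

```python
import math
from collections import Counter


def idf_classic(n_docs, doc_freq):
    """Common textbook IDF: log(N / n)."""
    return math.log(n_docs / doc_freq)


def idf_rsj(n_docs, doc_freq):
    """IDF form associated with the Robertson-Sparck Jones weight
    when no relevance information is available:
    log((N - n + 0.5) / (n + 0.5)). Can be negative for very common terms."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_idf(docs, idf=idf_classic):
    """Return one {term: weight} dict per tokenized document.

    TF is the raw number of occurrences of the term in the document;
    the IDF function is pluggable so that variants can be compared.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append(
            {term: count * idf(n_docs, doc_freq[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical toy corpus, whitespace-tokenized (an assumption for the example).
docs = [
    "web search engine ranking".split(),
    "web document retrieval".split(),
    "term weighting for web retrieval".split(),
]

classic = tf_idf(docs, idf=idf_classic)
rsj = tf_idf(docs, idf=idf_rsj)
```

In this toy corpus, "web" occurs in every document, so the classic variant gives it a weight of zero while the second variant gives it a negative weight; a term that occurs in only one document, such as "search", receives a positive weight under both.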
Keywords: Information Retrieval, Term Frequency, IDF, Vector Space Model.
ABOUT THE AUTHOR
Jitendra Nath Singh
Research Scholar in the Department of Computer Science at Babasaheb Bhimrao Ambedkar University, Lucknow - 226025 (U.P.), India. His research interests are search engines and their performance evaluation, and web technology.