Download e-book for iPad: A Theory of Indexing by Gerard Salton
By Gerard Salton
Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.
Best probability books
Probabilists and fuzzy enthusiasts tend to disagree about which philosophy is best, and they rarely work together. As a result, textbooks usually suggest only one of these methods for problem solving, but not both. This book, with contributions from 15 experts in probability and fuzzy logic, is an exception.
An introduction to probability at the undergraduate level. Chance and randomness are encountered on a daily basis. Authored by a highly qualified professor in the field, Probability: With Applications and R delves into the theories and applications essential to obtaining a thorough understanding of probability.
- Stochastic variational approach to quantum-mechanical few-body problems
- Introduction to Probability Theory
- Probability and Statistics for Engineers and Scientists 3e Solutions
- What are the odds?: chance in everyday life
- Stochastic Optimization Models in Finance
- Probability Measures on Groups X
Extra info for A Theory of Indexing
D. Information value experiments. The experiments dealing with the use of information values are covered separately, because the methodology must necessarily be different in this case from that used earlier. In particular, since the generation of information values depends on a number of user-system interactions involving the processing of user queries against the available document collections, it is necessary to break the query set into two parts: a set of test queries must first be used for the generation and modification of term weights by means of interactive query processing; a new set of queries, not previously used, can then serve for evaluation purposes.
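The two-part query split described in this passage can be sketched as follows. This is only an illustration: the function name `split_queries`, the 50/50 default, and the fixed seed are assumptions made here, not details given in the excerpt.

```python
import random

def split_queries(queries, tuning_fraction=0.5, seed=0):
    """Split a query set into a tuning part and a held-out evaluation part.

    The tuning queries would be used to generate and modify term weights;
    the evaluation queries are never seen during that phase.
    """
    rng = random.Random(seed)          # fixed seed keeps the split repeatable
    shuffled = list(queries)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * tuning_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical query identifiers, for illustration only.
queries = [f"q{i}" for i in range(10)]
tuning, evaluation = split_queries(queries)
```

Keeping the two parts disjoint is the point of the exercise: term weights tuned on one set of queries would otherwise be evaluated on the same queries that shaped them.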
Since such a relatively small deletion percentage does not lead to substantial losses in performance for any collection, and may in fact produce considerable improvements, the ten percent deletion percentage may be productive in all environments. It may be useful, as a final exercise, to determine whether a clear-cut policy is available for choosing among various significance rankings for term deletion purposes. In particular, the discrimination value rankings can be compared with the inverse document frequency rankings previously examined.
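As a rough sketch of term deletion under an inverse document frequency ranking, the code below ranks the vocabulary and drops the lowest-ranked ten percent. The formula log2(n / document frequency) + 1 is an assumed, commonly used form and may differ from the exact definition in the book; the function names and the toy collection are hypothetical.

```python
import math
from collections import Counter

def idf_ranking(docs):
    """Return terms ordered from highest to lowest inverse document frequency."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF form: log2(n / document frequency) + 1.
    idf = {term: math.log2(n / df) + 1 for term, df in doc_freq.items()}
    # Descending IDF; ties broken alphabetically so the order is deterministic.
    return sorted(idf, key=lambda t: (-idf[t], t))

def delete_worst_terms(docs, ranking, fraction=0.10):
    """Remove the lowest-ranked `fraction` of terms from every document."""
    n_drop = int(len(ranking) * fraction)
    dropped = set(ranking[len(ranking) - n_drop:])
    pruned = [[t for t in doc if t not in dropped] for doc in docs]
    return pruned, dropped

# Toy collection: "index" occurs everywhere, every other term occurs once.
docs = [
    ["index", "term", "retrieval"],
    ["index", "document", "weight"],
    ["index", "query", "recall"],
    ["index", "thesaurus", "phrase", "precision"],
]
ranking = idf_ranking(docs)
pruned_docs, dropped = delete_worst_terms(docs, ranking)
```

In this toy collection the one term removed is the one that appears in every document, which is the kind of term an IDF ranking places last.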
Whereas no clear correlation was found to exist between the S/N ratings and the document or collection frequencies of the corresponding terms, a direct relation appears to exist for the discrimination value rankings. As the discrimination values decrease from good to average to poor, the document and collection frequencies of the terms go from average, to low, and finally to quite high. This correspondence is used as a basis for a theory of indexing in the last section of this study. In summary, a study of the frequency distributions of the terms ranked according to a number of different measures of term significance reveals the following characteristics: (a) When the terms are ranked in decreasing order of collection frequency F_k, or document frequency B_k, the best terms are those with universal occurrence characteristics; such terms may help in producing high recall output, but the retrieval results will certainly not be sufficiently precise for most purposes.
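The excerpt does not define the discrimination value, so the sketch below assumes one common formulation: the average pairwise document similarity (the "density" of the collection) with a term removed, minus the density with the term present. Under that assumption a positive value marks a term that spreads documents apart. Cosine similarity, the function names, and the toy weights are all choices made for illustration.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def density(docs):
    """Average similarity over all document pairs."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def discrimination_value(docs, term):
    """Assumed definition: density without the term minus density with it."""
    without = [{t: w for t, w in d.items() if t != term} for d in docs]
    return density(without) - density(docs)

# Toy collection: "index" occurs in every document, the other terms in one each.
docs = [
    {"index": 1, "salton": 2},
    {"index": 1, "query": 2},
    {"index": 1, "thesaurus": 2},
]
dv_index = discrimination_value(docs, "index")    # negative: poor discriminator
dv_salton = discrimination_value(docs, "salton")  # positive in this toy case
```

The term that occurs in every document gets a negative value here, which agrees with the passage's observation that the poorest discriminators have quite high document frequencies.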