-
"In view of current concerns, the problem is not so much that too much is published as that publication has far exceeded our present capacity to make real use of it (...) Professionally, our methods of transmitting and reviewing the results of scientific research are several generations old and, by now, totally inadequate for their purpose ..."
-
-
-
He proposed using words as the indexing units for documents and measuring word overlap as a criterion for retrieval.
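The idea of scoring a document by the words it shares with a query can be sketched in a few lines. This is a minimal illustration, not a reconstruction of the original proposal: the whitespace tokenizer and the function names `tokenize`, `overlap_score`, and `rank_by_overlap` are assumptions made for the example.

```python
def tokenize(text):
    """Lowercase a string and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def overlap_score(query, document):
    """Count the distinct words that the query and the document have in common."""
    return len(set(tokenize(query)) & set(tokenize(document)))


def rank_by_overlap(query, documents):
    """Return (score, document) pairs sorted by decreasing word overlap."""
    scored = [(overlap_score(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A document that shares more distinct words with the query is placed higher in the list; word order and word frequency are ignored in this sketch.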
-
At the Cranfield Institute of Technology and other associated institutions, tests began that marked the beginning of information retrieval as an empirical discipline. These tests strongly influenced the evolution of the discipline. They produced an evaluation methodology that is still used for IR systems today.
-
-
The group (at Harvard and Cornell universities), led by Salton, produced numerous technical reports, establishing ideas and concepts that remain important research areas today. These include the formalization of algorithms for ranking documents with respect to a query, an approach in which documents and queries are viewed as vectors in an n-dimensional space, and, later, measuring the similarity between a document vector and the query vector by the cosine of the angle between the two.
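The cosine measure mentioned here can be illustrated with a short sketch. It assumes that the document and the query have already been turned into equal-length lists of numeric term weights; the function name `cosine_similarity` and the convention of returning 0.0 for an all-zero vector are choices made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # The angle is undefined for a zero vector; this sketch returns 0.0 in that case.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Vectors that point in the same direction score close to 1 regardless of their length, while vectors with no terms in common score 0.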
-
One of the key developments of this period was Luhn's term frequency (TF) weighting (based on the occurrence of words within a document), complemented by the work of Sparck Jones on the occurrence of words across the documents of a collection (the basis of inverse document frequency, IDF). Likewise, Salton synthesized the results of his group’s work on vectors to produce the vector space model.
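How the two kinds of evidence combine can be shown with a small sketch. It uses one common textbook variant, the raw count of a term in a document multiplied by log(N / df), where N is the number of documents and df is the number of documents that contain the term. This variant and the function name `tf_idf` are assumptions for illustration, not necessarily the formulas used by the authors named above.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute one {term: weight} dict per document.

    `documents` is a list of token lists. The weight is the raw term
    count multiplied by log(N / df), one common variant among several.
    """
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term appears.
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))
    weights = []
    for tokens in documents:
        tf = Counter(tokens)
        weights.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return weights
```

A term that occurs in every document gets weight 0 under this variant, however often it is repeated, while a term confined to few documents is weighted up.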
-
-
An alternative means of modeling IR systems involved extending the idea of Maron et al. [86] to use probability theory. Robertson defined the probability ranking principle, which determines how best to rank documents based on probabilistic measures with respect to the defined evaluation measures.
-
New stemming algorithms were created. Stemming, the process of matching words to their lexical variants, had been known since the 1960s, but it improved significantly with the contributions of Porter and other authors, whose algorithms are still used today.
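To give a feel for what a stemmer does, the sketch below strips a handful of common English suffixes. It is a toy under stated assumptions and is not the Porter algorithm, which applies a much richer set of ordered rules; the suffix list `SUFFIXES`, the three-character minimum, and the function name `naive_stem` are hypothetical choices for this example.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative list only, checked in this order


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Leave short words alone so that, for example, "is" is not reduced to "i".
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

With this sketch, "connected", "connecting", and "connects" all map to "connect", so a query containing any one of them can match documents containing the others.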
-
-
-
TREC, an initiative of Voorhees and Harman, is an annual exercise in which numerous international research groups collaborate to build test collections larger than those that existed before. With the large text collections available under TREC, many old techniques were modified and new ones were developed, and continue to be developed, for effective retrieval.
-
-
-
-
-