Relevance ranking in Articles+ is determined by an algorithm that focuses on two scores: dynamic rank and static rank. The scores a citation receives in the areas below are summed to get its final relevance score, which in turn affects its ranking in the results.
Dynamic Rank: search terms are analyzed according to:
- Proximity - If words are closer together, the record is more relevant; an exact phrase match gets a boost.
- Term frequency - the more often a word appears in a record, the more relevant the record.
- Field weighting - Title, Subtitle and Subject are the highest weighted fields. The Author and Abstract fields are weighted lower than these, but higher than other metadata fields. The FullText field is weighted the lowest.
- Known Item Search - records matching a combination of two or more fields (such as title, author, pub title, year) are boosted.
- Document size - smaller documents can be demoted.
- Field Length Normalization - the more exactly a query term matches the content in a field of a document, the higher the ranking.
- Inverse document frequency - If a term appears less often across the whole database, but is in a record, that record is more relevant.
- Term stemming - plurals are matched; however, an exact match of a word is ranked higher than a variation of the word.
- Language Processing - relevance is influenced by the interface language the user selects.
- Full-text matching - The full text is searched along with the other metadata fields. An increase in relevancy is applied to those items for which the search terms appear in the full text within 200 words of each other.
- Stopword processing - common words are ignored, but special consideration is given to queries made up of multiple stopwords, such as "to be or not to be".
- Known item - if a citation or excerpt is pasted into search, that citation and works that cite that reference are boosted.
- Synonym mapping & expansion - for example, theatre is automatically mapped to theater, and heart attack is expanded to search myocardial infarction.
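Three of the dynamic rank factors above (term frequency, inverse document frequency, and field weighting) can be illustrated with a small Python sketch. This is not the Articles+ algorithm, which is not published in detail here; the numeric field weights, the smoothed IDF formula, and the names `FIELD_WEIGHTS`, `idf`, and `dynamic_score` are all assumptions chosen only to show how such factors might combine.

```python
import math
from collections import Counter

# Hypothetical weights: the description above gives only the ordering
# (title/subject > author/abstract > other metadata > full text), not values.
FIELD_WEIGHTS = {"title": 3.0, "subject": 3.0, "author": 2.0,
                 "abstract": 2.0, "fulltext": 0.5}
OTHER_METADATA_WEIGHT = 1.0  # assumed weight for any field not listed above


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def idf(term, records):
    """Smoothed inverse document frequency: rarer terms score higher."""
    containing = sum(
        1 for rec in records
        if any(term in tokenize(value) for value in rec.values())
    )
    return math.log((1 + len(records)) / (1 + containing)) + 1.0


def dynamic_score(query, record, records):
    """Sum of field weight * term frequency * IDF over all query terms."""
    score = 0.0
    for term in tokenize(query):
        term_idf = idf(term, records)
        for field, text in record.items():
            tf = Counter(tokenize(text))[term]
            weight = FIELD_WEIGHTS.get(field, OTHER_METADATA_WEIGHT)
            score += weight * tf * term_idf
    return score


records = [
    {"title": "heart attack prevention", "fulltext": "diet and exercise"},
    {"title": "diet and exercise", "fulltext": "heart attack prevention"},
    {"title": "rare orchid species", "fulltext": "botany field notes"},
]
scores = [dynamic_score("heart attack", rec, records) for rec in records]
```

In this toy collection the first record scores highest for the query "heart attack" because the terms match in the title rather than the full text, and the third record scores zero because it contains neither term.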
Static Rank: characteristics of the citation:
- Content type - some content types, like books and journal articles, are boosted over others, like newspaper articles and book reviews.
- Publication date - generally, items with newer publication dates are boosted over older items.
- Scholarly or Peer Review - scholarly content is boosted over non-scholarly content. As of late 2018, there is a boost based on the SCImago Journal & Country Rank list.
- Citation counts - highly cited items are boosted. Citation counts from Web of Science and other sources are used to boost relevance.
- Content Size - since longer works are not necessarily more relevant, content size is taken into consideration.
- Anonymous author - Anonymous author items are demoted. Anonymous items may include editor's notes, letters to the editor, obituaries, and other non-primary articles in journals.
- Local collections - content in local collections, such as an institutional repository, is boosted (Penn will be adding local content in the future).
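As a rough illustration of how static rank factors might be added to a dynamic rank to produce the final relevance score, the following Python sketch covers a subset of the characteristics listed above (content type, publication date, peer review, citation counts, and anonymous authorship). Every boost value, the `Citation` class, and the functions `static_rank` and `relevance` are hypothetical; the real weights used by Articles+ are not stated in this description.

```python
import math
from dataclasses import dataclass

# Hypothetical boosts: only the relative ordering comes from the list above.
CONTENT_TYPE_BOOST = {
    "book": 2.0,
    "journal article": 2.0,
    "newspaper article": 0.5,
    "book review": 0.5,
}


@dataclass
class Citation:
    title: str
    content_type: str
    year: int
    peer_reviewed: bool
    citation_count: int
    anonymous: bool
    dynamic_rank: float  # assumed to be computed separately from the query


def static_rank(c, current_year):
    """Score characteristics of the citation itself, independent of the query."""
    score = CONTENT_TYPE_BOOST.get(c.content_type, 1.0)
    # Newer items get a larger boost; the 50-year window is an assumption.
    score += max(0.0, 1.0 - (current_year - c.year) / 50)
    if c.peer_reviewed:
        score += 1.0
    # Log scaling keeps very highly cited items from dominating.
    score += 0.25 * math.log1p(c.citation_count)
    if c.anonymous:
        score -= 1.0  # anonymous-author items are demoted
    return score


def relevance(c, current_year=2024):
    """Final relevance is the sum of the dynamic and static scores."""
    return c.dynamic_rank + static_rank(c, current_year)


article = Citation("Cardiac outcomes", "journal article", 2020, True, 100, False, 4.0)
letter = Citation("Letter to the editor", "newspaper article", 2020, False, 0, True, 4.0)
ranked = sorted([letter, article], key=relevance, reverse=True)
```

Both citations here have the same dynamic rank, so the ordering of `ranked` is decided entirely by static rank: the peer-reviewed, well-cited journal article is placed ahead of the anonymous newspaper item.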