This is a demonstration of my implementation of the ranking algorithm
presented in the paper
Your 2 is My 1, Your 3 is My 9: Handling Arbitrary Miscalibrations in Ratings
[https://arxiv.org/abs/1806.05085]
The algorithm is also described in this blog.
The paper's core algorithm uses a probabilistic method to estimate the final ranking
of a set of items evaluated by different reviewers.
It leverages the relative rankings provided by the reviewers
(i.e., if a reviewer scores item A higher than item B, A is ranked higher than B),
as well as the raw scores.
The method works as follows:
It first establishes a baseline ranking based solely on the reviewers' relative preferences.
Then, for items that do not conflict with these relative rankings,
it adjusts the order by flipping pairs of items with probability x.
The probability x grows with the score difference between the two items being considered
for a flip.
That is, an item with a significantly higher score is more likely to be ranked higher.
Mathematically, the probability of flipping is defined as x = (1 + w(score_diff)) / 2,
where score_diff is the absolute difference in scores between the two items,
and w is a strictly increasing function that maps score differences to the range [0,1].
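
The formula can be illustrated with a minimal Python sketch. It is not the code from this repository or from the paper. The choice w = tanh is a hypothetical example of a strictly increasing function into [0,1], and the names w, flip_probability and order_pair are made up for illustration. The sketch also assumes one particular reading of the description above: with probability x the higher-scored item of a pair ends up ranked first.

```python
import math
import random


def w(score_diff: float) -> float:
    """Hypothetical w: strictly increasing, maps [0, inf) into [0, 1)."""
    return math.tanh(score_diff)


def flip_probability(score_a: float, score_b: float) -> float:
    """x = (1 + w(score_diff)) / 2, with score_diff = |score_a - score_b|."""
    score_diff = abs(score_a - score_b)
    return (1 + w(score_diff)) / 2


def order_pair(item_a, score_a, item_b, score_b, rng=random):
    """Assumed reading: with probability x the higher-scored item is ranked first."""
    x = flip_probability(score_a, score_b)
    if score_a >= score_b:
        higher, lower = item_a, item_b
    else:
        higher, lower = item_b, item_a
    # Draw once; keep the score-based order with probability x.
    return (higher, lower) if rng.random() < x else (lower, higher)


print(flip_probability(7, 7))  # equal scores: 0.5, i.e. a fair coin
print(flip_probability(9, 2))  # large gap: close to 1
print(order_pair("A", 9, "B", 2))
```

With equal scores x is exactly 1/2, so the order is decided by a fair coin. As the gap grows, x approaches 1 and the higher-scored item is ranked first almost surely.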