TopKSignal: A convex optimization tool for signal reconstruction from multiple ranked lists

Package installation

For the CRAN version please use:

#install.packages("TopKSignal")
library(TopKSignal)

A development version of our R-package TopKSignal is also available from GitHub. It can be installed using devtools.

# install.packages("devtools")
# library(devtools)
# install_github("pievos101/TopKSignal")
library(TopKSignal)

Introduction

The ranking of items is widely used to rate their relative quality or relevance across multiple assessments (by humans or machines). Beyond classical rank aggregation, it is of special interest to estimate the usually unobservable latent signals that inform a consensus ranking. Under the sole assumption of independent assessments, which may be incomplete, the package offers indirect inference via convex optimization, combined with the classical Bootstrap or the computationally more efficient Poisson Bootstrap. Users can choose between a linear and a quadratic objective function. The input to the optimization problem is based on complete pairwise comparisons of all items with respect to their rank positions (the so-called ‘full approach’). The transitivity property of rank scales can be exploited to substantially reduce the computational burden (the so-called ‘restricted approach’). The order relations are represented by sets of constraints. The method is applicable to both \(n \gg p\) and \(n \ll p\) data problems, and provides results comparable to the machine-learning rank centrality algorithm at lower computational cost. The primary outputs are the signal estimates, their estimated standard errors, and the consolidated rank list; the latter is an alternative to a conventional list of aggregated ranks.

The optimization tool gurobi

Gurobi is a powerful tool for solving optimization problems. It can be used in combination with TopKSignal to efficiently estimate the underlying consensus signals of multiple ranked lists. Instructions for installing Gurobi on your computer system can be found on the official Gurobi webpage; a free license for academic purposes is available. As an alternative to Gurobi, you can use the nloptr package, which is freely available on CRAN.
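The fallback solver can be installed from CRAN in the usual way:

# fallback solver, freely available from CRAN
# install.packages("nloptr")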

Input format

We provide built-in functions to simulate multiple ranked lists based on half-Gaussian distributions. The generate.rank.matrix() function requires the user to specify the number of items (objects), denoted by \(p\), and the number of assessors (rankers), denoted by \(n\).

#install.packages("TopKSignal")
library("TopKSignal")
set.seed(1421)
p = 8
n = 10
input <- generate.rank.matrix(p, n)
rownames(input$R.input) <- c("a","b","c","d","e","f","g","h")

The input data need to be a numerical matrix as shown below. Row names and column names are required.

input$R.input
#>   R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
#> a  4  4  4  5  4  4  3  4  7   3
#> b  8  7  7  8  7  8  5  7  4   8
#> c  1  3  2  2  1  2  2  1  2   4
#> d  2  1  1  1  2  1  1  2  1   1
#> e  6  5  3  3  3  3  6  6  6   2
#> f  3  6  6  7  5  5  7  5  5   5
#> g  5  2  5  4  6  6  4  3  3   6
#> h  7  8  8  6  8  7  8  8  8   7

Each row corresponds to an item and each column to an assessor; each entry is the rank assigned to that item by the respective assessor. Note that TopKSignal does not support tied ranks. Missing rank assignments are allowed.
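For your own data, the input is simply a numeric matrix with item names as row names and assessor names as column names. The following sketch is illustrative only; here we assume that a missing rank assignment is encoded as NA.

# illustrative rank matrix: 3 items assessed by 2 assessors,
# with one missing assignment (assumed to be encoded as NA)
R.manual <- matrix(c(1,  2,
                     2, NA,
                     3,  1),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("x1", "x2", "x3"), c("R1", "R2")))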

The input rank matrix is obtained from the model

\[ X_{i,j} = \theta_i + Z_{i,j}, \qquad i = 1,\ldots,p, \quad j = 1,\ldots,n,\]

where each signal parameter \(\theta_i\) represents a ‘true value’ in the sense of a latent variable (sometimes referred to as a ‘score’ in machine learning) reflecting the signal intensity. Each ranking is determined by a signal plus an assessor-specific random error \(Z_{i,j}\).

The underlying signal is simulated as \(\theta_i \sim \mid\mathcal{N}(0,1)\mid\).

Then the true values of the objects are:

input$theta.true
#> [1] 1.13165900 0.49865423 1.46090430 1.94160472 0.90177821 0.86116828 0.96630207
#> [8] 0.01599649

Accordingly, the consensus ranks of the objects are:

rank(-input$theta.true)
#> [1] 3 7 2 1 5 6 4 8

It is assumed that each value assigned by an assessor to an item is perturbed by noise, reflecting the uncertainty about its true value. This noise is simulated with an assessor-specific standard deviation \(\sigma_j\), where \(\sigma_j \sim \mathcal{U}(0.4,0.6)\).

input$sigmas
#>  [1] 0.5572544 0.5628995 0.5989554 0.4587600 0.4383926 0.5281393 0.5872520
#>  [8] 0.5946055 0.5670205 0.5331913

Then the random errors \(Z_{i,j}\) are drawn from the half-normal distribution \(\mid \mathcal{N}(0,\sigma_j^2)\mid\).

The full noise matrix across all assessor assignments is available from

input$matrixNoise
#>            [,1]       [,2]       [,3]       [,4]       [,5]      [,6]
#> [1,] 0.24016661 0.25927759 0.21989205 0.01146533 0.31825863 0.5114472
#> [2,] 0.14525567 0.03154682 0.41912210 0.26111783 0.35563369 0.1583095
#> [3,] 0.76427470 0.14926301 0.45156718 0.03224348 0.56933560 0.5546087
#> [4,] 0.22059548 1.64370263 0.09772405 0.39986535 0.01129501 0.8440531
#> [5,] 0.01938969 0.03862170 0.60550112 0.41511458 0.65409480 0.7602909
#> [6,] 0.66246317 0.01244451 0.13576505 0.15279015 0.37379026 0.5963584
#> [7,] 0.13536000 0.74211771 0.13796546 0.32731549 0.24160195 0.4155777
#> [8,] 0.86383666 0.37459008 0.01659712 1.12147974 0.19877962 0.6569208
#>            [,7]      [,8]       [,9]      [,10]
#> [1,] 0.37084415 0.4440556 0.04754995 0.61075728
#> [2,] 0.77510053 0.5094751 1.05925282 0.12490883
#> [3,] 0.37632099 0.8444838 0.21873436 0.08608435
#> [4,] 0.10842937 0.3420660 1.37663294 1.38858731
#> [5,] 0.16348476 0.3247210 0.45624367 0.88700532
#> [6,] 0.07077489 0.3942307 0.55090074 0.59823992
#> [7,] 0.49555525 0.6150434 0.63211304 0.36321616
#> [8,] 0.11412096 0.5714455 0.02939719 0.60980830

The final rank matrix is obtained by ranking \(X_{i,j} = \theta_i + Z_{i,j}\) column-wise, i.e. by ranking, for each assessor, the theta values plus their noise terms.
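For the simulated example, this construction can be verified directly from the quantities returned by generate.rank.matrix(), assuming (as in the simulation above) that the rows of matrixNoise follow the same item order as theta.true:

X <- input$theta.true + input$matrixNoise       # latent values plus noise (p x n)
R.check <- apply(X, 2, function(x) rank(-x))    # column-wise ranking, largest value -> rank 1
all(R.check == input$R.input)                   # should be TRUE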

The convex optimization problem

Next, we briefly explain how the convex optimization problem is formulated. First, we define the function \(\pi(i,j)\), which returns the index of the item ranked in position \(i\) by assessor \(j\).

The full approach

The full approach considers the complete set of constraints comprising all order relations produced by the rankers.

Given a fixed ranker \(j\) and starting with the item ranked first, we force this item, with latent parameter \(\theta_{\pi(1,j)}\) plus error \(z_{\pi(1,j),j}\), to be in a greater-than-or-equal relation to the item ranked second, with \(\theta_{\pi(2,j)}\) plus \(z_{\pi(2,j),j}\). Thus, the first constraint for ranker \(j\) is \(\theta_{\pi(1,j)} + z_{\pi(1,j),j} - \theta_{\pi(2,j)} - z_{\pi(2,j),j} \geq b\), where \(b > 0\) is a scaling constant that can be chosen arbitrarily. In the second constraint for the same ranker, we require \(\theta_{\pi(1,j)}\) plus \(z_{\pi(1,j),j}\) to be in a greater-than-or-equal relation to the item ranked third, represented by \(\theta_{\pi(3,j)}\) plus \(z_{\pi(3,j),j}\). In formal notation, \(\theta_{\pi(1,j)} + z_{\pi(1,j),j} - \theta_{\pi(3,j)} - z_{\pi(3,j),j} \geq b\). Analogous constraints are added for all remaining pairs of items, moving from one ranker to the next, as illustrated in the sketch below. This procedure allows the user to infer each underlying latent signal across the rankers via convex optimization. The number of constraints is \(n \times \frac{(p-1)p}{2}\) and the number of variables is \((n \times p)+p\).
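The following sketch is purely illustrative (it is not the code used internally by the package); it enumerates the index pairs of the full constraint set for a single ranker of the simulated example:

r <- input$R.input[, 1]          # ranks assigned by the first assessor
pi_j <- order(r)                 # pi_j[i]: index of the item ranked in position i
pairs <- t(combn(length(pi_j), 2))
# each row (k, l) with k < l encodes the constraint
# theta[pi_j[k]] + z[pi_j[k], 1] - theta[pi_j[l]] - z[pi_j[l], 1] >= b
nrow(pairs)                      # (p-1)*p/2 = 28 constraints for this ranker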

The restricted approach

Because the number of constraints in the full approach grows quadratically with \(p\), we suggest using the restricted approach, which achieves a substantially higher computational efficiency. The restricted approach considers the minimum required set of constraints derived from the full set by applying the transitivity property of rank scales: only consecutively ranked items need to be compared. The number of variables, \((n \times p)+p\), remains the same, but the number of constraints is reduced from \(n \times \frac{(p-1)p}{2}\) to \(n \times (p-1)\), as the comparison below shows.
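As a quick back-of-the-envelope comparison for the simulated example (\(p = 8\), \(n = 10\)):

p <- 8; n <- 10
n * (p - 1) * p / 2   # full approach: 280 constraints
n * (p - 1)           # restricted approach: 70 constraints
(n * p) + p           # number of variables in both approaches: 88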

The minimization term

Once the model constraints are built, two different objective functions can be applied: the first minimizes the sum of the (weighted) noise variables \(z_{i,j}\), and the second minimizes the sum of the (weighted) squared noise variables \(z_{i,j}^2\). In the first case a convex linear optimization problem has to be solved, in the second case a convex quadratic one.
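Schematically, writing the unweighted versions of the two objectives (weighting of the noise variables is the optional refinement mentioned above), the two problems read

\[ \min_{\theta, z} \; \sum_{i=1}^{p} \sum_{j=1}^{n} z_{i,j} \qquad \text{or} \qquad \min_{\theta, z} \; \sum_{i=1}^{p} \sum_{j=1}^{n} z_{i,j}^{2}, \]

subject to the order constraints described above and non-negativity of the noise variables (\(z_{i,j} \geq 0\)), consistent with the half-normal noise model.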

Estimating the latent signals from multiple ranked lists

The main function that estimates the underlying signal is estimateTheta(). Its required parameters are: (i) an input rank matrix as described in the previous section, (ii) the number of Bootstrap samples (we recommend at least 200), and (iii) a value for the scaling parameter \(b>0\), usually set to 0.1. The current implementation is compatible with two solvers, gurobi and nloptr. Furthermore, four estimation procedures are implemented, restrictedQuadratic, restrictedLinear, fullQuadratic and fullLinear, which can be chosen via the type parameter. Two Bootstrap techniques are available, classic.bootstrap and poisson.bootstrap. The number of cores used for the Bootstrap procedure can be set via the nCore parameter.

# Gurobi must be installed for the following call:
# estimatedSignal <- estimateTheta(R.input = input$R.input, num.boot = 50, b = 0.1,
#                                  solver = "gurobi", type = "restrictedQuadratic",
#                                  bootstrap.type = "poisson.bootstrap", nCore = 1)
data(estimatedSignal)

The Bootstrap estimated signals and their standard errors can be obtained from

estimatedSignal$estimation
#>   id signal.estimate         SE
#> 1  a      0.04713784 0.37388405
#> 2  b     -1.00984648 0.14202959
#> 3  c      1.14191863 0.18207133
#> 4  d      1.66214317 0.13578035
#> 5  e     -0.12808726 0.14420550
#> 6  f     -0.45845537 0.15297076
#> 7  g     -0.07762988 0.18298899
#> 8  h     -1.17718065 0.09834366

The estimated signal-based consensus rank is

rank(-estimatedSignal$estimation$signal.estimate)
#> [1] 3 7 2 1 5 6 4 8

For each object (id), the estimated signal value (theta) and its standard error are displayed. The higher the estimated theta value, the lower the associated rank position, i.e. the closer the object is to the top of the consensus rank list.

The Bootstrap procedure also produces an estimate of the noise terms from the constraint matrix, which were minimized by means of convex optimization.

estimatedSignal$estimatedMatrixNoise
#>                R1          R2         R3          R4          R5           R6
#> [1,] 0.1136943085 0.082119554 0.10830642 0.071148321 0.153003627 0.1367300818
#> [2,] 0.0000000000 0.050791067 0.04431641 0.000000000 0.051089492 0.0000000000
#> [3,] 0.2053120584 0.002774818 0.04860142 0.096868929 0.205627969 0.0922747486
#> [4,] 0.0008455522 0.063259163 0.04030975 0.068734842 0.007048935 0.0559346749
#> [5,] 0.0042381509 0.038127235 0.24859919 0.300229662 0.302758066 0.2726090665
#> [6,] 0.3727390647 0.025724595 0.01940830 0.001138546 0.177830618 0.1629507364
#> [7,] 0.0785855126 0.362069603 0.03737784 0.187242959 0.000000000 0.0005081033
#> [8,] 0.1255331763 0.000000000 0.00000000 0.247695890 0.000000000 0.1234763926
#>               R7           R8         R9         R10
#> [1,] 0.252028886 0.1464245504 0.00000000 0.310335198
#> [2,] 0.299990394 0.0559681849 0.42996697 0.000000000
#> [3,] 0.082723025 0.1991909675 0.12002892 0.000000000
#> [4,] 0.055601420 0.0008529412 0.09312905 0.129815858
#> [5,] 0.007752905 0.0000000000 0.04565284 0.446468124
#> [6,] 0.002050601 0.1488350646 0.21496855 0.173715441
#> [7,] 0.165698801 0.2660637351 0.30872744 0.002048362
#> [8,] 0.000000000 0.0000000000 0.00000000 0.124786358

The results of each Bootstrap iteration are also available. Each column represents an item and each row contains the estimates of one Bootstrap sample.

estimatedSignal$allBootstraps
#>                    a          b         c        d           e           f
#> boot1  -0.0405637128 -0.8924687 1.0639163 1.834006 -0.19091497 -0.62902758
#> boot2   0.0200131696 -1.0766301 0.9426111 1.923102 -0.03658319 -0.65929995
#> boot3  -0.5633356692 -0.7666570 1.4617449 1.584822 -0.19410411 -0.38441289
#> boot4   0.5991787880 -1.0774574 1.2186615 1.459573 -0.23913931 -0.52550971
#> boot5  -0.1839442640 -0.9999477 1.2364467 1.700213 -0.15880556 -0.61584612
#> boot6  -0.1655912894 -0.9141464 1.5381918 1.500330 -0.10241875 -0.42436180
#> boot7  -0.4912113757 -0.9145834 1.2583254 1.714780 -0.25539711 -0.14193285
#> boot8   0.5352211000 -1.0820727 1.1814489 1.462837 -0.20342352 -0.29790934
#> boot9  -0.6480786623 -0.8180687 1.2508877 1.753005 -0.22448356 -0.27071006
#> boot10 -0.2378120654 -0.8555969 0.9766968 1.854232 -0.01316303 -0.43437997
#> boot11 -0.4945100528 -0.6943695 1.1130585 1.846357 -0.19089850 -0.59896117
#> boot12 -0.2516499635 -0.8781706 1.0691029 1.795569 -0.13494347 -0.41095246
#> boot13  0.5653801907 -1.2029712 0.7149099 1.703106  0.12329235 -0.31879548
#> boot14 -0.5767885294 -0.8078081 1.3940446 1.632260 -0.18050418 -0.41152380
#> boot15 -0.3152316512 -0.9797566 1.3020158 1.666471  0.08075716 -0.58332813
#> boot16 -0.0501896466 -0.9494664 0.7798534 1.930578  0.03445040 -0.39787955
#> boot17  0.4155824542 -1.0735910 1.0535569 1.671710 -0.19181001 -0.63270048
#> boot18 -0.3691262459 -1.1511876 1.3292750 1.490786 -0.30340711  0.08146349
#> boot19  0.6023274144 -1.1582846 1.0441799 1.492830 -0.27457973 -0.28137746
#> boot20  0.4925392286 -1.0466012 1.2552382 1.501219 -0.27961033 -0.66568511
#> boot21  0.5833134784 -1.1941976 1.0561940 1.580828 -0.19712374 -0.35787745
#> boot22 -0.5226944326 -0.7876260 1.1637445 1.773765 -0.10323324 -0.36816480
#> boot23  0.5268892889 -1.1742179 1.2348759 1.456737 -0.30882434 -0.33850426
#> boot24 -0.3380299898 -0.8686876 1.1797186 1.685483 -0.23031344 -0.59401534
#> boot25  0.3954636486 -1.2583653 0.8580530 1.715136 -0.04442726 -0.48431816
#> boot26  0.1690421564 -0.9847036 1.3792403 1.544061 -0.15908738 -0.60012171
#> boot27 -0.1789912169 -1.2126158 1.1247095 1.749287 -0.01086692 -0.11495608
#> boot28 -0.2395637716 -0.9608158 1.3191600 1.650072 -0.19149383 -0.39756583
#> boot29  0.2659373668 -1.0683974 1.2554387 1.608798 -0.16226235 -0.59046207
#> boot30 -0.1151953814 -0.9584500 0.9962605 1.826362 -0.01676448 -0.42372254
#> boot31 -0.0006894202 -0.8308542 1.1461506 1.734668 -0.19378318 -0.61588598
#> boot32  0.4401013989 -1.1802931 1.1084208 1.586325 -0.03406936 -0.37009587
#> boot33 -0.1072458882 -0.9071689 1.3150342 1.662290 -0.14033327 -0.52375111
#> boot34 -0.0848262235 -0.8801778 1.2823600 1.552677 -0.33687604 -0.56997761
#> boot35  0.3097052796 -1.1046143 1.2616252 1.522737 -0.27673350 -0.51817556
#> boot36  0.2537884889 -1.0797800 1.0935154 1.723129 -0.18340246 -0.62059340
#> boot37 -0.2009073377 -0.8924917 1.0017490 1.833780  0.01074752 -0.47065064
#> boot38  0.5653608241 -1.2578183 1.0211556 1.494481 -0.32869815 -0.34622872
#> boot39 -0.4691341820 -0.8557555 1.2417369 1.732334 -0.08251284 -0.45341745
#> boot40 -0.1474646206 -1.0897647 1.0167867 1.797349  0.24313750 -0.44573686
#> boot41  0.2417107487 -1.0106637 1.1596232 1.622915 -0.17574741 -0.59320556
#> boot42  0.3012683311 -1.0722310 1.3157696 1.480475 -0.27623082 -0.67423093
#> boot43  0.4296901555 -1.1843736 0.9317404 1.662379  0.10165049 -0.40039977
#> boot44  0.2739222349 -0.9662416 1.1998268 1.577360 -0.13946570 -0.55285363
#> boot45  0.3251554953 -1.0189226 1.4488929 1.407273 -0.26241961 -0.42400277
#> boot46  0.3804353426 -1.2145581 1.2105081 1.566908 -0.18691605 -0.49647649
#> boot47  0.4767640650 -1.1818542 0.9032659 1.674545  0.05026223 -0.37623960
#> boot48  0.0905576351 -0.9047187 0.9582965 1.778680  0.02678892 -0.43896489
#> boot49 -0.0797755739 -1.0808706 0.8097688 1.907414  0.31766236 -0.58493521
#> boot50 -0.0299053339 -0.9712589 0.9181435 1.681124 -0.17734027 -0.57410775
#>                    g          h
#> boot1  -0.0162434653 -1.1287035
#> boot2  -0.2419698082 -0.8712435
#> boot3  -0.0151813270 -1.1228760
#> boot4  -0.2170633464 -1.2182434
#> boot5   0.0870943515 -1.0652104
#> boot6  -0.3261285015 -1.1058750
#> boot7  -0.0243674745 -1.1456131
#> boot8  -0.2489422337 -1.3471596
#> boot9   0.0291215474 -1.0716738
#> boot10 -0.0131630295 -1.2768139
#> boot11  0.1217558790 -1.1024321
#> boot12  0.0780319263 -1.2669877
#> boot13 -0.3187954833 -1.2661266
#> boot14  0.0471223856 -1.0968020
#> boot15 -0.1185563729 -1.0523713
#> boot16 -0.0848064754 -1.2625395
#> boot17 -0.0918521820 -1.1508955
#> boot18  0.1204831160 -1.1982865
#> boot19 -0.0774455644 -1.3476499
#> boot20 -0.1180193480 -1.1390801
#> boot21 -0.3054420714 -1.1656949
#> boot22  0.0512964024 -1.2070872
#> boot23 -0.1930578958 -1.2038978
#> boot24  0.3394676080 -1.1736231
#> boot25 -0.0217287719 -1.1598136
#> boot26 -0.2155397752 -1.1328912
#> boot27 -0.3325387564 -1.0240279
#> boot28 -0.0129048534 -1.1668878
#> boot29 -0.2119979387 -1.0970545
#> boot30 -0.0609164250 -1.2475733
#> boot31  0.0133513826 -1.2529570
#> boot32 -0.3700958741 -1.1802931
#> boot33 -0.1319481901 -1.1668767
#> boot34  0.3123308430 -1.2755105
#> boot35  0.0006109565 -1.1951551
#> boot36 -0.1024775068 -1.0841791
#> boot37 -0.0274515496 -1.2547757
#> boot38  0.1095660512 -1.2578183
#> boot39  0.0233063251 -1.1365577
#> boot40 -0.2845422528 -1.0897647
#> boot41  0.0038710882 -1.2485034
#> boot42  0.0659112033 -1.1407312
#> boot43 -0.3101974235 -1.2304897
#> boot44 -0.0761920365 -1.3163558
#> boot45 -0.2697643003 -1.2062121
#> boot46 -0.1914943000 -1.0684061
#> boot47 -0.3174998248 -1.2292433
#> boot48 -0.1401667389 -1.3704725
#> boot49 -0.2859947050 -1.0032692
#> boot50  0.4896709065 -1.3363265
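These per-iteration results can be summarized column-wise; for instance, the column standard deviations should roughly correspond to the reported SE values (we assume here that the SE is computed as the Bootstrap standard deviation):

apply(estimatedSignal$allBootstraps, 2, sd)   # per-item Bootstrap standard deviations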

The execution time of the estimation is also reported.

estimatedSignal$time
#> Time difference of 2.092914 secs

Plots

Finally, a violin plot is available to display, for each object, the distribution of the theta estimates obtained across the Bootstrap samples.

vp <- violinPlot(estimation = estimatedSignal, trueSignal = input$theta.true, title = "Violin plot")
vp