Mathematics and Computing

City trip search, principle of operation and advantages

Goals of the program and of the current text

The goal of the "city trip search" search engine is to show every interested person the advantages and principle of operation of a search engine sorting items by k-Pareto optimality. Therefore the web interface gives the user the opportunity to interact in a very direct way with the server. For real world usage the web interface can be simplified and customised. Nevertheless the interface is simple to use and the main goal of this text is to use the search engine and its interface as an example in order to explain explain the methodology of k-Pareto optimality based sorting.

For data availability reasons the locations in Western Europe available available on the websites geonames.org and wikipedia.org have been loaded into the search engine. Other uses of the search engine might however be more important or profitable from a financial point of view:

Use case

We assume a person uses the search engine for finding a location to spend his holidays.

The upper selection boxes found under the heading "Select a subset of cities" offers the possibility to select a subset of cities. This part of the search engine works in a straightforward way. The number of retained locations is displayed above the results.

The lower selection boxes found under the heading "sort by preferences" offer the user the possibility to sort the obtained subset of locations according to his preferences. The 50 first locations of this preference based sort are finally displayed. Basically this preference based sort works as follows.

The preference based sort

The part "Sort the subset by preferences" of the interface offers the user the opportunity to encode his preferences as a composite preorder relation. He might prefer a location to another one if it is closer to Paris, has a smaller population and contains a castle.

The sub-relation "has a smaller population" might also be replaced by "has a population between 500 and 1000 inhabitants". The meaning of this kind of sub-relation "belongs to an interval" is the following.

Selection of an interval
Sub-relation "belongs to an interval"

There are many ways to sort results while respecting a user's preferences. For example from this point of view lexicographic sorting and sorting by weighted means of differences to the desired values are both satisfactory.

This search engine respects the user's preferences by sorting according to an innovative criterion called k-Pareto optimality. The k-Pareto optimality of a location is the percentage of locations that are preferable to that location. It is the product of the percentages of locations that are preferable for each sub-criterion. Instead of computing products one takes logarithms which add-up. These additive sub-scores are called self-information and are visualised by the length of horizontal blue bars.

Horizontal blue bars indicate scores
Horizontal blue bars: total score and sub-scores

It is insightful to select two or three intervals, to change their lengths and to observe the results and scores.

Advantages of the k-Pareto optimality based sort

Because of its interesting mathematical properties the k-Pareto optimality based sorting has many advantages for a user.

All those properties have been proved in a mathematical text and published at the AISTATS 2022 conference.

The above methodology is best adapted to cases where at least three search criteria need to be taken into account. For example three independent search criteria where each one filters out 9 results out of 10 together filter out approximately 999 results out of 1000. When therefore there are no-or no more results that fit all criteria exactly, k-Pareto optimality based sorting falls back as well to items that fit two criteria very well and one not at all, as to results that fit all three criteria quite well but not very well. In this way k-Pareto optimality enables the user to choose the compromise that seems best to him.

Indeed n numerical criteria make-up an n-dimensional space. From a geometric point of view we select in this space regions delimited by hyperboloid-like surfaces of constant k-Pareto optimality. Weighted means or Euclidean distances delimit planes or spheres and thus tend to only keep results that are an average fit for all criteria.

The above methodology applies well to any decision process that is performed in two steps. The first step is a preselection of a limited number of items out of a large collection according to the above methodology. The second step consists of the final selection of a unique item out of the preselected ones. This step might be performed by an expert via a complex and costly analysis or via an arbitrary person selecting according to his very own tastes.