HiCAL - A System for Efficient High-Recall Retrieval

Create a user and login

Once your docker images are up and running, open your browser to http://localhost:9000/. You should be able to access the web interface for the system. Create a user (or click on practice) and login.

Figure 1: Login page of HiCAL. You can also use the practice account by clicking on the "click to practice" button.

Homepage HiCAL. — Figure 2: After logging in to HiCAL, create or select a topic of search. This will start a session and train the Machine learning model.

Once you login, create a topic of search. Once created, navigate to the CAL or Search components using the side bar.

How to use

HiCAL Web

There are 2 retrieval components, CAL (Figure 4a) and Search (Figure 4b). Documents judged from the Search component are also used to update prime the machine learning model used in the CAL interface.

CAL Interface

In this interface, the machine learning model will select the next most-likely relevant document to show you. The model initially uses the seed query you have entered when creating the topic of search to train the model. After each iteration of reviewing and judging, the model improves by using your previous judgments to further update the model.

Note! It is a good idea to first "prime" the model by judging documents from the search engine. After a few judgments from Search, go to the CAL interface to view the documents the machine learning model has selected.

Keyword Highlighting

You can highlight keywords by entering it in the highlighting search bar. Words or part of words that match your entered keywords will be highlighted.

Judging

You can judge a document by clicking at any of the judging buttons (not relevant, relevant, highly relevant) or using the keyboard shortcuts (s, r, h). To rejudge a document you have previously judged, you can click on the "Latest judgment" buttons on the top right corner, or use the keyboard shortcut "u").

Once you judge a document, the model will retrain and the next most-likely relevant document will be returned. A cache-queue is implemented in the front-end to speed the reviewing process while waiting for the model to complete the training. Once the training is done, the queue is automatically flushed and updated with the new set of documents. This increases the responsiveness of the system and removes any perceptible interface lag.

The model will not return anything once all documents in the corpus are judged.

What features people like?

An experiment with several people using our system showed that participants liked different features.

Take a look at our paper to learn more.

Judging using a paragraph excerpt

Often times, documents contains few paragraphs that can used to determine the relevancy of the document. Instead of reading or searching the whole document for relevant material, assessing the relevancy of a document by reading the most relevant paragraph saves you the time of reading unnecessary content.

By default, the interface will show you what the model has selected as the most-likely relevant paragraph. Reviewing paragraphs instead of full documents reduces effort and review time.

If you believe you need to review the full document to determine its relevancy, you can click on the "view full document" button to view the whole document content. The full document content will appear below the paragraph.

You can also customize the interface (see documentation) to your needs (e.g. only show paragraph excerpts, only show full document content, etc).

Search Interface

The search interface allows you to search for documents using your own queries. By default, the interface will return 10 results from the search engine. You can select the number of documents you want to return using the dropdown button next to the search bar.

Each Search Engine Result Page (SERP) item contains a snippet generated by the search engine and a set of judging buttons. Judged documents have a vertical indicator bar showing the relevancy judgment of the document.

You can click at any SERP item to view or search the full document content.

Any document you judge in the Search interface will be used to update the machine learning model (accessible in the CAL interface).

What interface should I be using?

We've ran an experiment with several people using our system to complete several search tasks. Do people like to use the Search interface? Do they like the CAL interface? Do they like the ability to view the full document along with a paragraph excerpt?

Our participants have indicated that they like to have the full-fledged system (Search enabled, CAL enabled with paragraph excerpt and ability to view the full document). Users want full control of the system!

However, this comes at a cost! Full control causes users to be slow and waste time. Our analysis shows that performance is highest when their interactions are limited to producing relevance judgments on paragraph length excerpts from the CAL interface.

To learn more, click here to read our paper.

Exporting judgments

You can click on the archive button on the side bar to access the list of documents you have judged for your current topic. Click on "export to csv" to get a csv file of your judgments.

Figure 5. A list of the documents you have judged for your current topic is shown in the archive page. You can export your document judgments by clicking on the "export to csv" button.

HiCAL Engine

The CAL Interface is powered by a modified C++ implementation of the AutoTAR algorithm. This tool can be used through the command line interface or HTTP API. Detailed documentation on how to use this tool is available here.

Bugs, issues or feature requests?

Please report here.