Abstract
We developed CHISSL, a human-machine interface that combines unsupervised and semi-supervised machine learning so that a non-expert user can organize large numbers of data instances according to her own mental model. The user interacts with individual examples, dragging and dropping them to move items between groups or double-clicking to create new groups. After each interaction, the algorithm rapidly re-evaluates the distance from every instance to the instances the user has labeled. These distances are used to reclassify the unlabeled data and to choose which instances are recommended to the user for each group she has created. Our main contribution is a technique that incorporates user feedback rapidly, incrementally, and predictably, and that scales easily beyond hundreds of thousands of instances.

Our algorithm is partitioned between a lightweight client and a heavyweight server. The server performs the initial batch processing and builds the representation of the data. It then sends the client a tree representation of the data rather than the full representation of every instance, which greatly reduces memory use and bandwidth. All computation that incorporates user feedback is performed in a web browser, without returning to the server. This reduces the latency of user interactions and the load on the server, which in principle allows many analysts to use the system simultaneously.
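The listing does not say which tree or distance CHISSL uses, so the following is only a minimal sketch of the client/server split under stated assumptions, not the CHISSL algorithm. It assumes the server sends a binary hierarchical-clustering tree in SciPy's linkage-matrix layout (rows of child a, child b, merge distance, size). It also substitutes a simple rule: each unlabeled instance takes the majority group among the labeled instances under its lowest ancestor that contains any label, and that ancestor's merge height serves as its distance. The functions `linkage_to_tree`, `propagate_labels`, and `recommend` are hypothetical. Only the tree (2n - 1 parent pointers and merge heights) reaches the client, not the feature vectors, and each user edit triggers two passes over the tree's nodes.

```python
from collections import Counter
from math import inf


def linkage_to_tree(Z, n_leaves):
    """Hypothetical server step: convert SciPy-style linkage rows
    [a, b, dist, count] into parent pointers and merge heights.
    Row i creates node n_leaves + i, so parents always have larger
    indices than their children."""
    num_nodes = 2 * n_leaves - 1
    parent = [-1] * num_nodes
    height = [0.0] * num_nodes
    for i, (a, b, dist, _count) in enumerate(Z):
        node = n_leaves + i
        parent[int(a)] = node
        parent[int(b)] = node
        height[node] = float(dist)
    return parent, height


def propagate_labels(parent, height, labels):
    """Hypothetical client step: give every leaf a (group, distance) pair.

    labels maps leaf index -> group name supplied by the user.
    Distance is 0.0 for labeled leaves, otherwise the merge height of the
    lowest ancestor holding at least one label; (None, inf) if no labels.
    """
    num_nodes = len(parent)
    n_leaves = (num_nodes + 1) // 2

    # Bottom-up pass: count user labels inside each subtree.
    counts = [Counter() for _ in range(num_nodes)]
    for leaf, group in labels.items():
        counts[leaf][group] += 1
    for node in range(num_nodes):  # children precede parents
        if parent[node] >= 0:
            counts[parent[node]].update(counts[node])

    # Top-down pass: each node takes its own decision if its subtree is
    # labeled, otherwise it inherits its parent's decision.
    best = [(None, inf)] * num_nodes
    for node in reversed(range(num_nodes)):
        if counts[node]:
            # Majority vote; ties broken by group name for determinism.
            group = min(counts[node], key=lambda g: (-counts[node][g], g))
            dist = 0.0 if node < n_leaves else height[node]
            best[node] = (group, dist)
        elif parent[node] >= 0:
            best[node] = best[parent[node]]
    return best[:n_leaves]


def recommend(assignments, labels, k=3):
    """Hypothetical helper: the k closest unlabeled leaves for each group."""
    by_group = {}
    for leaf, (group, dist) in enumerate(assignments):
        if group is not None and leaf not in labels:
            by_group.setdefault(group, []).append((dist, leaf))
    return {g: [leaf for _, leaf in sorted(items)[:k]]
            for g, items in by_group.items()}
```

For example, with four leaves merged as ((0, 1), (2, 3)) and the user labeling leaf 0 "cats" and leaf 3 "dogs", leaf 1 is assigned "cats" at distance 1.0 and leaf 2 "dogs" at distance 1.5. A drag-and-drop would simply update `labels` and call `propagate_labels` again in the browser, without contacting the server.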
Exploratory License
Eligible for exploratory license
Market Sector
Data Sciences