cath-cluster

A simple way to complete-linkage cluster arbitrary data.

[Screenshot: a complete-linkage clustering of randomly generated points, based on the distances between them]
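
To make "complete-linkage" concrete, here is a minimal Python sketch of the general technique on randomly generated points. It is a naive illustration, not cath-cluster's implementation; the function name `complete_linkage` and the `cutoff` parameter are made up for this example.

```python
import itertools
import math
import random


def complete_linkage(points, cutoff):
    """Naively merge clusters while the closest pair of clusters is within `cutoff`.

    Hypothetical helper for illustration only (not part of cath-tools).
    """
    clusters = [[i] for i in range(len(points))]

    def span(a, b):
        # Complete linkage: the distance between two clusters is the
        # LARGEST distance between any member of one and any member of the other
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance
        x, y = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda pair: span(clusters[pair[0]], clusters[pair[1]]),
        )
        if span(clusters[x], clusters[y]) > cutoff:
            break
        # x < y always holds here, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


random.seed(0)
points = [(random.random(), random.random()) for _ in range(20)]
clusters = complete_linkage(points, cutoff=0.3)
print(clusters)
```

Because the largest pairwise distance is used, every pair of points within a resulting cluster is no more than `cutoff` apart, which is the property that distinguishes complete-linkage from single-linkage clustering.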

Features

Fast
Simple

Usage

The current full --help usage information is:

Usage: cath-cluster --link_dirn <dirn> [options] <input_file>

Cluster items based on the links between them.

When <input_file> is -, the links are read from standard input.

The clustering is complete-linkage.

Miscellaneous:
  -h [ --help ]                 Output help message
  -v [ --version ]              Output version information

Input:
  --link_dirn <dirn>            Interpret each link value as <dirn>, one of:
                                   DISTANCE - A higher value means the corresponding two entries are more distant
                                   STRENGTH - A higher value means the corresponding two entries are more connected
  --column_idx <colnum> (=3)    Parse the link values (distances/strengths) from column number <colnum>
                                Must be ≥ 3 because columns 1 and 2 must contain the IDs
  --names-infile <file>         [RECOMMENDED] Read names and sorting scores from file <file> (or '-' for stdin)

Clustering:
  --levels <levels>             Cluster at levels <levels>, which is ordered values separated by commas (eg 35,60,95,100)

Output:
  --clusters-to-file <file>     Write the clustering to file <file> (or '-' for stdout)
  --merges-to-file <file>       Write the ordered list of merges to file <file> (or '-' for stdout)
  --clust-spans-to-file <file>  Write links that form spanning trees for each cluster to file <file> (or '-' for stdout)
  --reps-to-file <file>         Write the list of representatives to file <file> (or '-' for stdout)

Links input format: `id1 id2 other columns afterwards`
...where --column_idx can be used to specify the column that contains the values

Names input format: `id score`
...where score is used to sort such that lower-scored entries appear earlier

Please tell us your cath-tools bugs/suggestions : https://github.com/UCLOrengoGroup/cath-tools/issues/new
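
As a rough sketch of how the two input formats above might be produced, the following Python script writes a links file and a names file for some randomly generated points and then assembles a cath-cluster command line from flags listed in the help text. The IDs, file names, whitespace separator and `--levels` values are all assumptions chosen for illustration; the script only prints the command and does not run it.

```python
import itertools
import math
import os
import random
import shlex
import tempfile

random.seed(1)
# Hypothetical IDs and coordinates, invented purely for illustration
points = {f"pt{i:02d}": (random.random(), random.random()) for i in range(10)}

# Links format: `id1 id2 value`, with the value in column 3
# (the --column_idx default); a space separator is assumed here
link_lines = [
    f"{a} {b} {math.dist(points[a], points[b]):.4f}"
    for a, b in itertools.combinations(sorted(points), 2)
]

# Names format: `id score`; lower-scored entries appear earlier
name_lines = [f"{name} {rank}" for rank, name in enumerate(sorted(points))]

# Write both files into a scratch directory (file names are arbitrary)
work_dir = tempfile.mkdtemp()
links_path = os.path.join(work_dir, "links.txt")
names_path = os.path.join(work_dir, "names.txt")
with open(links_path, "w") as links_file:
    links_file.write("\n".join(link_lines) + "\n")
with open(names_path, "w") as names_file:
    names_file.write("\n".join(name_lines) + "\n")

# Only flags that appear in the --help text are used; the level values
# are illustrative guesses for distances between points in the unit square
command = [
    "cath-cluster",
    "--link_dirn", "DISTANCE",
    "--names-infile", names_path,
    "--levels", "0.2,0.4",
    "--clusters-to-file", "-",
    links_path,
]
print(shlex.join(command))
```

The values are distances, so `--link_dirn DISTANCE` is the matching direction; for similarity-style values such as percentage identities, `STRENGTH` would be the appropriate choice.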

Feedback

Please tell us about your cath-tools bugs/suggestions at https://github.com/UCLOrengoGroup/cath-tools/issues/new