MultiIndex
MultiIndexer< Record > Class Template Reference

#include <indexer.h>

Public Member Functions

 MultiIndexer (const int multiplicity=2)
void BuildMultiIndex (const string &points_filename, const string &metainfo_filename, const int points_count, const vector< Centroids > &coarse_vocabs, const vector< Centroids > &fine_vocabs, const RerankMode &mode, const bool build_coarse_quantization, const string &files_prefix, const string &coarse_quantization_filename="")

Private Member Functions

void PrepareCoarseQuantization (const string &points_filename, const int points_count, const vector< Centroids > &coarse_vocabs)
void GetCoarseQuantizationsForSubset (const string &points_filename, const int start_pid, const int subset_size, const vector< Centroids > &coarse_vocabs, vector< vector< ClusterId > > *transposed_coarse_quantizations)
void SerializeCoarseQuantizations (const vector< vector< ClusterId > > &transposed_coarse_quantizations, const string &filename)
void SerializeMultiIndexFiles ()
void ConvertPointsInCellsCountToCellEdges ()
void FillMultiIndex (const string &points_filename, const int points_count, const vector< Centroids > &coarse_vocabs, const vector< Centroids > &fine_vocabs, const RerankMode &mode)
void FillMultiIndexForSubset (const string &points_filename, const PointId start_pid, const int points_count, const vector< Centroids > &coarse_vocabs, const vector< Centroids > &fine_vocabs, const RerankMode &mode, Multitable< int > *points_written_in_index)
void GetPointCoarseQuantization (const PointId pid, const string &filename, vector< ClusterId > *coarse_quantization)
void FillPointRerankInfo (const Point &point, const PointId pid, const vector< Centroids > &fine_vocabs)
void RestorePointsInCellsCountFromCourseQuantization (const string &points_filename, const int points_count, const vector< Centroids > &coarse_vocabs)
int GetInputCoordSizeof ()
void ReadPoint (ifstream &input, Point *point)
void InitBlasStructures (const vector< Centroids > &coarse_vocabs)

Private Attributes

string files_prefix_
string coarse_quantization_filename_
int multiplicity_
Multitable< int > point_in_cells_count_
MultiIndex< Record > multiindex_
boost::mutex cell_counts_mutex_
vector< float * > coarse_vocabs_matrices_
vector< vector< float > > coarse_centroids_norms_

Detailed Description

template<class Record>
class MultiIndexer< Record >

This is the main class for building a multiindex for a set of points in a multidimensional space. Clustering and vocabulary learning happen outside this class; MultiIndexer receives prepared vocabularies as input.


Constructor & Destructor Documentation

template<class Record >
MultiIndexer< Record >::MultiIndexer ( const int  multiplicity = 2)

This is the simple MultiIndexer constructor

Parameters:
multiplicity  number of parts each input point is divided into

Member Function Documentation

template<class Record >
void MultiIndexer< Record >::BuildMultiIndex ( const string &  points_filename,
const string &  metainfo_filename,
const int  points_count,
const vector< Centroids > &  coarse_vocabs,
const vector< Centroids > &  fine_vocabs,
const RerankMode &  mode,
const bool  build_coarse_quantization,
const string &  files_prefix,
const string &  coarse_quantization_filename = "" 
)

This is the main function of MultiIndexer

Parameters:
points_filename               file with points in .fvecs or .bvecs format
points_count                  number of points to index
coarse_vocabs                 vocabularies for coarse quantization
fine_vocabs                   vocabularies for fine quantization, used for reranking
mode                          determines how rerank info is calculated
build_coarse_quantization     whether the coarse quantization should be built
files_prefix                  prefix shared by all index filenames
coarse_quantization_filename  file with coarse quantization (if it exists)

template<class Record >
void MultiIndexer< Record >::ConvertPointsInCellsCountToCellEdges ( ) [private]

This function converts counts of points in cells to cell edges
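
One common way to turn per-cell point counts into cell edges is an exclusive prefix sum over the flattened table, so that cell i owns the index range [edges[i], edges[i + 1]). The sketch below assumes that reading of "cell edges"; the helper name CountsToCellEdges and the flat std::vector<int> layout are illustrative and are not taken from indexer.h.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: converts per-cell point counts into cell start offsets.
// edges[i] is the position of the first point of cell i in a flat index
// array; edges.back() equals the total number of points.
std::vector<int> CountsToCellEdges(const std::vector<int>& counts) {
  std::vector<int> edges(counts.size() + 1, 0);
  for (std::size_t i = 0; i < counts.size(); ++i) {
    edges[i + 1] = edges[i] + counts[i];
  }
  return edges;
}
```

With edges in this form, an empty cell simply has edges[i] == edges[i + 1], and no per-cell length needs to be stored.
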

template<class Record >
void MultiIndexer< Record >::FillMultiIndex ( const string &  points_filename,
const int  points_count,
const vector< Centroids > &  coarse_vocabs,
const vector< Centroids > &  fine_vocabs,
const RerankMode &  mode 
) [private]

This function fills multiindex data structures.

Parameters:
points_filename  file with points in .fvecs or .bvecs format
points_count     number of points to index
coarse_vocabs    vocabularies for coarse quantization
fine_vocabs      vocabularies for fine quantization, used for reranking
mode             determines how rerank info is calculated

template<class Record >
void MultiIndexer< Record >::FillMultiIndexForSubset ( const string &  points_filename,
const PointId  start_pid,
const int  points_count,
const vector< Centroids > &  coarse_vocabs,
const vector< Centroids > &  fine_vocabs,
const RerankMode &  mode,
Multitable< int > *  points_written_in_index 
) [private]

This function fills multiindex data structures for a subset of points.

Parameters:
points_filename          file with points in .fvecs or .bvecs format
start_pid                identifier of the first point in the subset
points_count             number of points in the subset
coarse_vocabs            vocabularies for coarse quantization
fine_vocabs              vocabularies for fine quantization, used for reranking
mode                     determines how rerank info is calculated
points_written_in_index  auxiliary structure for correct index filling

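
The role of points_written_in_index can be pictured as a per-cell write cursor: a point that falls into cell c is stored at the start offset of c plus the number of points already written to c. The following single-threaded sketch assumes that cells are stored contiguously in one flat array; PlacePoints, cell_of_point, cell_starts, and the flat vectors are hypothetical stand-ins for the Multitable structures of the class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: place point ids into a flat index array grouped by cell.
// cell_of_point[pid] is the cell of point pid; cell_starts[c] is the offset of
// the first slot of cell c (cell_starts has one extra trailing entry).
std::vector<int> PlacePoints(const std::vector<int>& cell_of_point,
                             const std::vector<int>& cell_starts) {
  std::vector<int> index(cell_of_point.size(), -1);
  if (cell_starts.empty()) return index;
  // Per-cell count of slots already used, analogous in spirit to
  // points_written_in_index.
  std::vector<int> written(cell_starts.size() - 1, 0);
  for (std::size_t pid = 0; pid < cell_of_point.size(); ++pid) {
    const int cell = cell_of_point[pid];
    index[cell_starts[cell] + written[cell]] = static_cast<int>(pid);
    ++written[cell];
  }
  return index;
}
```

If several subsets are filled concurrently, the cursor update has to be synchronized; the class keeps cell_counts_mutex_ for a critical section in this stage.
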
template<class Record >
void MultiIndexer< Record >::FillPointRerankInfo ( const Point &  point,
const PointId  pid,
const vector< Centroids > &  fine_vocabs 
) [private]

This function calculates rerank info for point

Parameters:
point        target point
pid          identifier of the target point
fine_vocabs  vocabularies for rerank info calculation

template<class Record >
void MultiIndexer< Record >::GetCoarseQuantizationsForSubset ( const string &  points_filename,
const int  start_pid,
const int  subset_size,
const vector< Centroids > &  coarse_vocabs,
vector< vector< ClusterId > > *  transposed_coarse_quantizations 
) [private]

This function prepares for each point in subset its coarse quantization

Parameters:
points_filename                  file with points in .fvecs or .bvecs format
start_pid                        identifier of the first point in the subset
subset_size                      number of points in the subset
coarse_vocabs                    vocabularies for coarse quantization
transposed_coarse_quantizations  result

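
As an illustration of coarse quantization with a transposed result, the sketch below splits each point into consecutive parts, one per coarse vocabulary, and stores the nearest centroid of every part in result[part][point]. It assumes squared Euclidean distance, contiguous parts, and simple std::vector-based definitions of Point, Centroids, and ClusterId; the real types in indexer.h may differ, and NearestCentroid and QuantizeSubset are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<float>;
using Centroids = std::vector<Point>;  // assumed layout: one vector per centroid
using ClusterId = int;                 // assumed integer id type

// Hypothetical helper: index of the centroid nearest (squared L2) to the
// slice of point that starts at begin and has the centroid's dimension.
ClusterId NearestCentroid(const Point& point, std::size_t begin,
                          const Centroids& vocab) {
  ClusterId best = 0;
  float best_dist = std::numeric_limits<float>::max();
  for (std::size_t c = 0; c < vocab.size(); ++c) {
    float dist = 0.0f;
    for (std::size_t d = 0; d < vocab[c].size(); ++d) {
      const float diff = point[begin + d] - vocab[c][d];
      dist += diff * diff;
    }
    if (dist < best_dist) {
      best_dist = dist;
      best = static_cast<ClusterId>(c);
    }
  }
  return best;
}

// Fills result[part][i] with the coarse cluster of point i in the given part.
// The outer index is the part, which is the "transposed" layout. Each point is
// assumed to be at least as long as the sum of the part dimensions.
std::vector<std::vector<ClusterId>> QuantizeSubset(
    const std::vector<Point>& points,
    const std::vector<Centroids>& coarse_vocabs) {
  std::vector<std::vector<ClusterId>> result(
      coarse_vocabs.size(), std::vector<ClusterId>(points.size()));
  for (std::size_t i = 0; i < points.size(); ++i) {
    std::size_t begin = 0;
    for (std::size_t part = 0; part < coarse_vocabs.size(); ++part) {
      result[part][i] = NearestCentroid(points[i], begin, coarse_vocabs[part]);
      begin += coarse_vocabs[part].empty() ? 0 : coarse_vocabs[part][0].size();
    }
  }
  return result;
}
```
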
template<class Record >
int MultiIndexer< Record >::GetInputCoordSizeof ( ) [private]

This simple function returns size of one coordinate of input point

template<class Record >
void MultiIndexer< Record >::GetPointCoarseQuantization ( const PointId  pid,
const string &  filename,
vector< ClusterId > *  coarse_quantization 
) [private]

This function reads point coarse quantization from file

Parameters:
pid                  identifier of the target point
filename             file with coarse quantizations
coarse_quantization  result

template<class Record >
void MultiIndexer< Record >::InitBlasStructures ( const vector< Centroids > &  coarse_vocabs) [private]

Initialize all structures for BLAS operations

Parameters:
coarse_vocabs  coarse vocabularies

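
Precomputed centroid norms usually go together with the expansion ||x - c||^2 = ||x||^2 - 2<x, c> + ||c||^2: the inner products with all centroids form a single matrix-vector product, which BLAS computes efficiently, and the norms are added afterwards. Whether InitBlasStructures prepares exactly this is an assumption based on the member names coarse_vocabs_matrices_ and coarse_centroids_norms_. The sketch uses plain loops instead of a BLAS call, and SquaredNorms and NearestByExpansion are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Squared L2 norm of every row of a row-major (rows x dim) matrix.
// dim must be positive.
std::vector<float> SquaredNorms(const std::vector<float>& matrix,
                                std::size_t dim) {
  std::vector<float> norms(matrix.size() / dim, 0.0f);
  for (std::size_t r = 0; r < norms.size(); ++r) {
    for (std::size_t d = 0; d < dim; ++d) {
      norms[r] += matrix[r * dim + d] * matrix[r * dim + d];
    }
  }
  return norms;
}

// Nearest centroid to x using the score ||c||^2 - 2<x, c>. The term ||x||^2 is
// the same for all centroids, so it is dropped from the comparison.
std::size_t NearestByExpansion(const std::vector<float>& x,
                               const std::vector<float>& centroids,
                               const std::vector<float>& norms) {
  const std::size_t dim = x.size();
  std::size_t best = 0;
  float best_score = 0.0f;
  for (std::size_t c = 0; c < norms.size(); ++c) {
    float dot = 0.0f;  // with BLAS, all dot products come from one call
    for (std::size_t d = 0; d < dim; ++d) {
      dot += x[d] * centroids[c * dim + d];
    }
    const float score = norms[c] - 2.0f * dot;
    if (c == 0 || score < best_score) {
      best_score = score;
      best = c;
    }
  }
  return best;
}
```

The norms depend only on the vocabulary, so they can be computed once before indexing and reused for every point.
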
template<class Record >
void MultiIndexer< Record >::PrepareCoarseQuantization ( const string &  points_filename,
const int  points_count,
const vector< Centroids > &  coarse_vocabs 
) [private]

This function prepares for each point its coarse quantization

Parameters:
points_filename  file with points in .fvecs or .bvecs format
points_count     number of points to handle
coarse_vocabs    vocabularies for coarse quantization

template<class Record >
void MultiIndexer< Record >::ReadPoint ( ifstream &  input,
Point *  point 
) [private]

This simple function reads one point from input stream

Parameters:
input  input stream
point  result point

template<class Record >
void MultiIndexer< Record >::RestorePointsInCellsCountFromCourseQuantization ( const string &  points_filename,
const int  points_count,
const vector< Centroids > &  coarse_vocabs 
) [private]

This function restores counts of points from coarse quantizations

Parameters:
points_filename  file with points in .fvecs or .bvecs format
points_count     number of points to index
coarse_vocabs    vocabularies for coarse quantization; needed to initialize the counts table correctly

template<class Record >
void MultiIndexer< Record >::SerializeCoarseQuantizations ( const vector< vector< ClusterId > > &  transposed_coarse_quantizations,
const string &  filename 
) [private]

This function serializes prepared coarse quantizations to file

Parameters:
transposed_coarse_quantizations  quantizations to serialize; they are stored transposed for efficient memory usage
filename                         file to serialize to

template<class Record >
void MultiIndexer< Record >::SerializeMultiIndexFiles ( ) [private]

This function saves the index to files. All filenames start with the common files prefix.


Member Data Documentation

template<class Record >
boost::mutex MultiIndexer< Record >::cell_counts_mutex_ [private]

Mutex guarding the critical section in the index-filling stage

template<class Record >
vector<vector<float> > MultiIndexer< Record >::coarse_centroids_norms_ [private]

Structure for BLAS operations (judging by the name, norms of the coarse centroids for each vocabulary)

template<class Record >
string MultiIndexer< Record >::coarse_quantization_filename_ [private]

Filename of file with coarse quantizations

template<class Record >
vector<float*> MultiIndexer< Record >::coarse_vocabs_matrices_ [private]

Structure for BLAS operations (judging by the name, coarse vocabularies stored as raw float matrices)

template<class Record >
string MultiIndexer< Record >::files_prefix_ [private]

All index filenames start with this prefix

template<class Record >
MultiIndex<Record> MultiIndexer< Record >::multiindex_ [private]

Multiindex

template<class Record >
int MultiIndexer< Record >::multiplicity_ [private]

Multiplicity (number of parts the point space is divided into)

template<class Record >
Multitable<int> MultiIndexer< Record >::point_in_cells_count_ [private]

Table with number of points in each cell

