OmniSciDB
a5dc49c757
Driver for running cleanup processes on a table. TableOptimizer provides functions for various cleanup processes that improve performance on a table. Only tables that have been modified using updates or deletes are candidates for cleanup. If the table descriptor corresponds to a sharded table, table optimizer processes each physical shard.
#include <TableOptimizer.h>
Public Member Functions
TableOptimizer (const TableDescriptor *td, Executor *executor, const Catalog_Namespace::Catalog &cat)
void recomputeMetadata () const
Recomputes per-chunk metadata for each fragment in the table. Updates and deletes can cause chunk metadata to become wider than the values in the chunk. Recomputing the metadata narrows the range to fit the chunk, as well as setting or unsetting the nulls flag as appropriate.
void recomputeMetadataUnlocked (const TableUpdateMetadata &table_update_metadata) const
Recomputes column chunk metadata for the given set of fragments. The caller of this method is expected to have already acquired the executor lock.
void vacuumDeletedRows () const
Compacts fragments to remove deleted rows. When a row is deleted, a boolean deleted system column is set to true. Vacuuming removes all deleted rows from a fragment. Note that vacuuming is a checkpointing operation, so data on disk will increase even though the number of rows for the current epoch has decreased.
void vacuumFragmentsAboveMinSelectivity (const TableUpdateMetadata &table_update_metadata) const
Private Member Functions
DeletedColumnStats recomputeDeletedColumnMetadata (const TableDescriptor *td, const std::set< size_t > &fragment_indexes={}) const
void recomputeColumnMetadata (const TableDescriptor *td, const ColumnDescriptor *cd, const std::unordered_map< int, size_t > &tuple_count_map, std::optional< Data_Namespace::MemoryLevel > memory_level, const std::set< size_t > &fragment_indexes) const
std::set< size_t > getFragmentIndexes (const TableDescriptor *td, const std::set< int > &fragment_ids) const
void vacuumFragments (const TableDescriptor *td, const std::set< int > &fragment_ids={}) const
DeletedColumnStats getDeletedColumnStats (const TableDescriptor *td, const std::set< size_t > &fragment_indexes) const
Private Attributes
const TableDescriptor * td_
Executor * executor_
const Catalog_Namespace::Catalog & cat_
Static Private Attributes
static constexpr size_t ROW_SET_SIZE {1000000000}
Definition at line 38 of file TableOptimizer.h.
TableOptimizer::TableOptimizer (const TableDescriptor *td, Executor *executor, const Catalog_Namespace::Catalog &cat)
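DeletedColumnStats TableOptimizer::getDeletedColumnStats (const TableDescriptor *td, const std::set< size_t > &fragment_indexes) const
private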
Definition at line 222 of file TableOptimizer.cpp.
References anonymous_namespace{TableOptimizer.cpp}::build_ra_exe_unit(), cat_, CHECK_EQ, DeletedColumnStats::chunk_stats_per_fragment, ColumnDescriptor::columnId, CPU, executor_, anonymous_namespace{TableOptimizer.cpp}::get_compilation_options(), anonymous_namespace{TableOptimizer.cpp}::get_execution_options(), get_logical_type_info(), get_table_infos(), Catalog_Namespace::Catalog::getDatabaseId(), Catalog_Namespace::Catalog::getDeletedColumn(), TableDescriptor::hasDeletedCol, kCOUNT, LOG, anonymous_namespace{TableOptimizer.cpp}::set_metadata_from_results(), TableDescriptor::tableId, DeletedColumnStats::total_row_count, DeletedColumnStats::visible_row_count_per_fragment, and logger::WARNING.
Referenced by recomputeDeletedColumnMetadata(), and vacuumFragmentsAboveMinSelectivity().
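std::set< size_t > TableOptimizer::getFragmentIndexes (const TableDescriptor *td, const std::set< int > &fragment_ids) const
private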
Definition at line 421 of file TableOptimizer.cpp.
References CHECK, shared::contains(), and TableDescriptor::fragmenter.
Referenced by recomputeMetadataUnlocked(), and vacuumFragmentsAboveMinSelectivity().
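The summary above declares getFragmentIndexes() as taking a set of fragment IDs and returning a set of size_t indexes. The following standalone sketch shows one way such a mapping could work. It is not the OmniSciDB implementation: the function name fragment_indexes_for_ids, and the assumption that an index is simply a fragment's position in the table's fragment list, are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper, not part of OmniSciDB.
// fragment_ids_in_table_order[i] is the ID of the fragment stored at position i.
// Returns the positions of all fragments whose ID appears in wanted_ids.
// IDs in wanted_ids that match no fragment are ignored.
std::set<std::size_t> fragment_indexes_for_ids(
    const std::vector<int>& fragment_ids_in_table_order,
    const std::set<int>& wanted_ids) {
  std::set<std::size_t> indexes;
  for (std::size_t i = 0; i < fragment_ids_in_table_order.size(); ++i) {
    if (wanted_ids.count(fragment_ids_in_table_order[i]) > 0) {
      indexes.insert(i);
    }
  }
  return indexes;
}
```

With fragments stored in the order 7, 3, 9, 4 and a request for IDs 3 and 4, the sketch yields the positions 1 and 3.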
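void TableOptimizer::recomputeColumnMetadata (const TableDescriptor *td, const ColumnDescriptor *cd, const std::unordered_map< int, size_t > &tuple_count_map, std::optional< Data_Namespace::MemoryLevel > memory_level, const std::set< size_t > &fragment_indexes) const
private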
Definition at line 322 of file TableOptimizer.cpp.
References anonymous_namespace{TableOptimizer.cpp}::build_ra_exe_unit(), cat_, CHECK, CHECK_EQ, ColumnDescriptor::columnId, ColumnDescriptor::columnName, ColumnDescriptor::columnType, CPU, executor_, TableDescriptor::fragmenter, anonymous_namespace{TableOptimizer.cpp}::get_compilation_options(), anonymous_namespace{TableOptimizer.cpp}::get_execution_options(), get_logical_type_info(), get_table_infos(), Catalog_Namespace::Catalog::getDatabaseId(), logger::INFO, kCOUNT, kINT, kMAX, kMIN, LOG, anonymous_namespace{TableOptimizer.cpp}::set_metadata_from_results(), TableDescriptor::tableId, and logger::WARNING.
Referenced by recomputeMetadata(), and recomputeMetadataUnlocked().
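DeletedColumnStats TableOptimizer::recomputeDeletedColumnMetadata (const TableDescriptor *td, const std::set< size_t > &fragment_indexes={}) const
private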
Definition at line 206 of file TableOptimizer.cpp.
References cat_, CHECK, TableDescriptor::fragmenter, Catalog_Namespace::Catalog::getDeletedColumn(), getDeletedColumnStats(), TableDescriptor::hasDeletedCol, and report::stats.
Referenced by recomputeMetadata(), and recomputeMetadataUnlocked().
void TableOptimizer::recomputeMetadata () const
Recomputes per-chunk metadata for each fragment in the table. Updates and deletes can cause chunk metadata to become wider than the values in the chunk. Recomputing the metadata narrows the range to fit the chunk, as well as setting or unsetting the nulls flag as appropriate.
Definition at line 134 of file TableOptimizer.cpp.
References cat_, CHECK_GE, Catalog_Namespace::DBMetadata::dbId, DEBUG_TIMER, executor_, Catalog_Namespace::Catalog::getAllColumnMetadataForTable(), Catalog_Namespace::Catalog::getCurrentDB(), Catalog_Namespace::Catalog::getDataMgr(), Catalog_Namespace::Catalog::getPhysicalTablesDescriptors(), lockmgr::TableLockMgrImpl< TableDataLockMgr >::getWriteLockForTable(), logger::INFO, LOG, TableDescriptor::nShards, recomputeColumnMetadata(), recomputeDeletedColumnMetadata(), ROW_SET_SIZE, report::stats, TableDescriptor::tableId, TableDescriptor::tableName, and td_.
Referenced by Parser::OptimizeTableStmt::execute(), and migrations::MigrationMgr::migrateDateInDaysMetadata().
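To illustrate what "narrowing" chunk metadata means for recomputeMetadata(), here is a minimal standalone sketch. It does not use OmniSciDB types: ChunkStats and recompute_chunk_stats are hypothetical names, and nulls are modeled with std::optional rather than the sentinel values a real column encoding would use.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for per-chunk metadata; not an OmniSciDB type.
struct ChunkStats {
  int64_t min;
  int64_t max;
  bool has_nulls;
};

// Rescans the values currently in the chunk and returns stats that fit them
// exactly. Returns std::nullopt when the chunk holds no non-null value,
// because no meaningful min/max range exists in that case.
std::optional<ChunkStats> recompute_chunk_stats(
    const std::vector<std::optional<int64_t>>& values) {
  std::optional<ChunkStats> stats;
  bool has_nulls = false;
  for (const auto& value : values) {
    if (!value) {
      has_nulls = true;
      continue;
    }
    if (!stats) {
      stats = ChunkStats{*value, *value, false};
    } else {
      stats->min = std::min(stats->min, *value);
      stats->max = std::max(stats->max, *value);
    }
  }
  if (stats) {
    stats->has_nulls = has_nulls;
  }
  return stats;
}
```

If a chunk once held the value 1000 and an update later rewrote it to 9, stale metadata would still advertise a maximum of 1000. Rescanning as above brings the range back to the values actually present.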
void TableOptimizer::recomputeMetadataUnlocked (const TableUpdateMetadata &table_update_metadata) const
Recomputes column chunk metadata for the given set of fragments. The caller of this method is expected to have already acquired the executor lock.
Definition at line 178 of file TableOptimizer.cpp.
References cat_, CHECK, TableUpdateMetadata::columns_for_metadata_update, Data_Namespace::CPU_LEVEL, DEBUG_TIMER, getFragmentIndexes(), Catalog_Namespace::Catalog::getMetadataForTable(), recomputeColumnMetadata(), recomputeDeletedColumnMetadata(), and report::stats.
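The "Unlocked" suffix signals a precondition: the caller must already hold the relevant lock. The sketch below shows that general convention with std::mutex. It is only an illustration: MetadataRefresher and its members are hypothetical, and it makes no claim about how TableOptimizer or the executor lock are implemented.

```cpp
#include <mutex>

// Hypothetical class illustrating the locked/unlocked method convention.
class MetadataRefresher {
 public:
  // Acquires the lock itself, then delegates to the unlocked variant.
  int refresh() {
    std::lock_guard<std::mutex> guard(mutex_);
    return refreshUnlocked();
  }

  // Precondition: the caller already holds mutex(). This method does not lock,
  // so it can be called from code that acquired the lock for other work too.
  int refreshUnlocked() { return ++refresh_count_; }

  std::mutex& mutex() { return mutex_; }

 private:
  std::mutex mutex_;
  int refresh_count_{0};
};
```

A caller that already holds the lock for a larger operation calls refreshUnlocked() directly. Calling refresh() from that context would try to lock the same std::mutex twice, which is undefined behavior.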
void TableOptimizer::vacuumDeletedRows () const
Compacts fragments to remove deleted rows. When a row is deleted, a boolean deleted system column is set to true. Vacuuming removes all deleted rows from a fragment. Note that vacuuming is a checkpointing operation, so data on disk will increase even though the number of rows for the current epoch has decreased.
Definition at line 435 of file TableOptimizer.cpp.
References cat_, Catalog_Namespace::Catalog::checkpoint(), File_Namespace::GlobalFileMgr::compactDataFiles(), DEBUG_TIMER, Catalog_Namespace::Catalog::getDatabaseId(), Catalog_Namespace::Catalog::getDataMgr(), Data_Namespace::DataMgr::getGlobalFileMgr(), Catalog_Namespace::Catalog::getPhysicalTablesDescriptors(), Catalog_Namespace::Catalog::getTableEpochs(), lockmgr::TableLockMgrImpl< TableDataLockMgr >::getWriteLockForTable(), Catalog_Namespace::Catalog::removeFragmenterForTable(), Catalog_Namespace::Catalog::setTableEpochsLogExceptions(), TableDescriptor::tableId, td_, and vacuumFragments().
Referenced by Parser::OptimizeTableStmt::execute(), and anonymous_namespace{DdlCommandExecutor.cpp}::vacuum_table_if_required().
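The compaction that vacuumDeletedRows() describes can be pictured with a small in-memory sketch. It is a simplification under stated assumptions: Fragment and vacuum_fragment are hypothetical names, the fragment has a single data column, and checkpointing and on-disk storage are not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory fragment: one data column plus the boolean
// "deleted" system column. Not an OmniSciDB type.
// Assumes values.size() == deleted.size().
struct Fragment {
  std::vector<int64_t> values;
  std::vector<bool> deleted;  // deleted[i] == true marks row i as deleted
};

// Compacts the fragment in place so that only visible rows remain, and
// returns the number of rows that were removed.
std::size_t vacuum_fragment(Fragment& fragment) {
  const std::size_t row_count = fragment.values.size();
  std::size_t write_pos = 0;
  for (std::size_t read_pos = 0; read_pos < row_count; ++read_pos) {
    if (!fragment.deleted[read_pos]) {
      fragment.values[write_pos++] = fragment.values[read_pos];
    }
  }
  fragment.values.resize(write_pos);
  // Every surviving row is visible, so the deleted column is reset.
  fragment.deleted.assign(write_pos, false);
  return row_count - write_pos;
}
```

For a fragment holding 10, 20, 30, 40 with the second and fourth rows flagged as deleted, the sketch leaves 10 and 30 and reports two removed rows.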
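void TableOptimizer::vacuumFragments (const TableDescriptor *td, const std::set< int > &fragment_ids={}) const
private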
Definition at line 498 of file TableOptimizer.cpp.
References cat_, UpdelRoll::catalog, CHECK_EQ, CHUNK_KEY_COLUMN_IDX, CHUNK_KEY_FRAGMENT_IDX, ColumnDescriptor::columnId, shared::contains(), Data_Namespace::CPU_LEVEL, anonymous_namespace{TableOptimizer.cpp}::delete_cpu_chunks(), TableDescriptor::fragmenter, anonymous_namespace{TableOptimizer.cpp}::get_uncached_cpu_chunk_keys(), Chunk_NS::Chunk::getChunk(), Data_Namespace::DataMgr::getChunkMetadataVecForKeyPrefix(), Catalog_Namespace::Catalog::getDatabaseId(), Catalog_Namespace::Catalog::getDataMgr(), Catalog_Namespace::Catalog::getDeletedColumn(), Catalog_Namespace::Catalog::getLogicalTableId(), UpdelRoll::logicalTableId, UpdelRoll::memoryLevel, UpdelRoll::stageUpdate(), UpdelRoll::table_descriptor, and TableDescriptor::tableId.
Referenced by vacuumDeletedRows(), and vacuumFragmentsAboveMinSelectivity().
void TableOptimizer::vacuumFragmentsAboveMinSelectivity (const TableUpdateMetadata &table_update_metadata) const
Vacuums fragments with a deleted rows percentage that exceeds the configured minimum vacuum selectivity threshold.
Definition at line 546 of file TableOptimizer.cpp.
References cat_, Catalog_Namespace::Catalog::checkpoint(), Catalog_Namespace::Catalog::checkpointWithAutoRollback(), DEBUG_TIMER, Data_Namespace::DISK_LEVEL, executor_, TableUpdateMetadata::fragments_with_deleted_rows, g_vacuum_min_selectivity, Catalog_Namespace::Catalog::getDatabaseId(), getDeletedColumnStats(), getFragmentIndexes(), Catalog_Namespace::Catalog::getMetadataForTable(), Catalog_Namespace::Catalog::getTableEpochs(), lockmgr::TableLockMgrImpl< TableDataLockMgr >::getWriteLockForTable(), TableDescriptor::persistenceLevel, shared::printContainer(), ROW_SET_SIZE, Catalog_Namespace::Catalog::setTableEpochsLogExceptions(), TableDescriptor::tableId, td_, vacuumFragments(), DeletedColumnStats::visible_row_count_per_fragment, and VLOG.
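The selection rule for vacuumFragmentsAboveMinSelectivity() can be sketched as a filter over per-fragment row counts. This is an illustration, not the actual implementation: FragmentRowCounts and fragments_to_vacuum are hypothetical names, min_selectivity stands in for the configured threshold (g_vacuum_min_selectivity in the references above), and the strict greater-than comparison is an assumption based on the word "exceeds".

```cpp
#include <cstddef>
#include <map>
#include <set>

// Hypothetical per-fragment counts; not an OmniSciDB type.
// Assumes visible_rows <= total_rows.
struct FragmentRowCounts {
  std::size_t total_rows;
  std::size_t visible_rows;
};

// Returns the IDs of fragments whose fraction of deleted rows is strictly
// greater than min_selectivity. Empty fragments are never selected.
std::set<int> fragments_to_vacuum(
    const std::map<int, FragmentRowCounts>& counts_by_fragment_id,
    double min_selectivity) {
  std::set<int> selected;
  for (const auto& [fragment_id, counts] : counts_by_fragment_id) {
    if (counts.total_rows == 0) {
      continue;
    }
    const std::size_t deleted_rows = counts.total_rows - counts.visible_rows;
    const double deleted_fraction =
        static_cast<double>(deleted_rows) / static_cast<double>(counts.total_rows);
    if (deleted_fraction > min_selectivity) {
      selected.insert(fragment_id);
    }
  }
  return selected;
}
```

Filtering this way avoids rewriting fragments in which only a small share of rows is deleted, where the cost of compaction would likely outweigh the benefit.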
const Catalog_Namespace::Catalog & TableOptimizer::cat_
private
Definition at line 98 of file TableOptimizer.h.
Referenced by getDeletedColumnStats(), recomputeColumnMetadata(), recomputeDeletedColumnMetadata(), recomputeMetadata(), recomputeMetadataUnlocked(), vacuumDeletedRows(), vacuumFragments(), and vacuumFragmentsAboveMinSelectivity().
Executor * TableOptimizer::executor_
private
Definition at line 97 of file TableOptimizer.h.
Referenced by getDeletedColumnStats(), recomputeColumnMetadata(), recomputeMetadata(), and vacuumFragmentsAboveMinSelectivity().
constexpr size_t TableOptimizer::ROW_SET_SIZE {1000000000}
static private
Definition at line 101 of file TableOptimizer.h.
Referenced by recomputeMetadata(), and vacuumFragmentsAboveMinSelectivity().
const TableDescriptor * TableOptimizer::td_
private
Definition at line 96 of file TableOptimizer.h.
Referenced by recomputeMetadata(), vacuumDeletedRows(), and vacuumFragmentsAboveMinSelectivity().