OmniSciDB
a5dc49c757
#include "OneHotEncoder.h"
#include "QueryEngine/TableFunctions/SystemFunctions/os/Shared/TableFunctionsCommon.hpp"
#include "Shared/ThreadInfo.h"
#include <tbb/parallel_for.h>
#include <tbb/parallel_sort.h>
Classes | |
struct | TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap |
A struct that creates a bytemap to map each key to its corresponding one-hot column index. | |
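The key-to-column mapping described above can be sketched as a dense lookup table. This is a minimal illustration, not the actual `KeyToOneHotColBytemap` implementation: the type `KeyToColMapSketch`, its members, and the constant `kInvalidColIdx` (standing in for the documented `INVALID_COL_IDX`) are hypothetical names, and the sketch assumes a non-empty key list whose minimum and maximum are passed in by the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the documented INVALID_COL_IDX constant.
constexpr int16_t kInvalidColIdx{-1};

// Illustrative sketch only: a dense table with one slot per key in
// [min_key, max_key]. Keys that are not among the top-k keys keep the
// invalid marker. Assumes top_k_keys is non-empty and min_k <= max_k.
struct KeyToColMapSketch {
  int32_t min_key;
  int32_t max_key;
  std::vector<int16_t> col_for_key;

  KeyToColMapSketch(const std::vector<int32_t>& top_k_keys, int32_t min_k, int32_t max_k)
      : min_key(min_k)
      , max_key(max_k)
      , col_for_key(static_cast<std::size_t>(max_k - min_k + 1), kInvalidColIdx) {
    // The position of a key in top_k_keys becomes its one-hot column index.
    for (std::size_t i = 0; i < top_k_keys.size(); ++i) {
      col_for_key[top_k_keys[i] - min_key] = static_cast<int16_t>(i);
    }
  }

  // Returns the one-hot column index for key, or kInvalidColIdx if the key
  // is outside the mapped range or is not one of the top-k keys.
  int16_t lookup(int32_t key) const {
    if (key < min_key || key > max_key) {
      return kInvalidColIdx;
    }
    return col_for_key[key - min_key];
  }
};
```

A dense table trades memory proportional to the key range for constant-time lookups, which is presumably why the minimum and maximum keys are computed separately (see `get_min_max_keys`).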
Namespaces | |
TableFunctions_Namespace | |
TableFunctions_Namespace::OneHotEncoder_Namespace | |
Functions | |
NEVER_INLINE HOST std::pair < std::vector< int32_t >, bool > | TableFunctions_Namespace::OneHotEncoder_Namespace::get_top_k_keys (const Column< TextEncodingDict > &text_col, const int32_t top_k, const double min_perc_col_total_per_key) |
Calculates the top k most frequent keys (categories) in the provided column, subject to a minimum percentage of the column total per key. Returns the top k keys together with a boolean indicating whether other keys exist beyond the top k. | |
template<typename F > | |
NEVER_INLINE HOST std::vector < std::vector< F > > | TableFunctions_Namespace::OneHotEncoder_Namespace::allocate_one_hot_cols (const int64_t num_one_hot_cols, const int64_t col_size) |
Allocates memory for the one-hot encoded columns and initializes them to zero. Takes the number of one-hot columns and the column size, and returns a vector of one-hot encoded columns. | |
std::pair< int32_t, int32_t > | TableFunctions_Namespace::OneHotEncoder_Namespace::get_min_max_keys (const std::vector< int32_t > &top_k_keys) |
Finds the minimum and maximum keys in a given vector of keys and returns them as a pair. | |
template<typename F > | |
NEVER_INLINE HOST OneHotEncodedCol< F > | TableFunctions_Namespace::OneHotEncoder_Namespace::one_hot_encode (const Column< TextEncodingDict > &text_col, const TableFunctions_Namespace::OneHotEncoder_Namespace::OneHotEncodingInfo &one_hot_encoding_info) |
One-hot encodes a single column of text-encoded data using the supplied one-hot encoding information. Returns an object containing the one-hot encoded columns and their corresponding categorical features. | |
template<typename F > | |
NEVER_INLINE HOST std::vector < OneHotEncodedCol< F > > | TableFunctions_Namespace::OneHotEncoder_Namespace::one_hot_encode (const ColumnList< TextEncodingDict > &text_cols, const std::vector< TableFunctions_Namespace::OneHotEncoder_Namespace::OneHotEncodingInfo > &one_hot_encoding_infos) |
One-hot encodes multiple columns of text-encoded data in a column list, given a vector of one-hot encoding information with one entry per column. | |
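The functions listed above describe a pipeline: pick the most frequent keys, allocate zeroed output columns, then set one cell per row. The following is a rough sketch of that flow on plain `std::vector` inputs rather than the `Column< TextEncodingDict >` type. Every name here (`top_k_keys_sketch`, `one_hot_encode_sketch`, `min_fraction`, `include_other`) is hypothetical. The sketch assumes the minimum share is a fraction in [0, 1] and that rows with non-selected keys go to an optional trailing "other" column; neither is confirmed by this page. It requires C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: returns up to top_k keys ordered by descending
// frequency, keeping only keys whose share of all rows is at least
// min_fraction (assumed to be a fraction in [0, 1]). The bool reports
// whether any distinct key was left out.
std::pair<std::vector<int32_t>, bool> top_k_keys_sketch(const std::vector<int32_t>& keys,
                                                        int32_t top_k,
                                                        double min_fraction) {
  std::unordered_map<int32_t, int64_t> counts;
  for (const int32_t key : keys) {
    ++counts[key];
  }
  std::vector<std::pair<int32_t, int64_t>> ranked(counts.begin(), counts.end());
  // Sort by count descending; break ties by key so the result is deterministic.
  std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
    return a.second != b.second ? a.second > b.second : a.first < b.first;
  });
  const double total = static_cast<double>(keys.size());
  std::vector<int32_t> top;
  for (const auto& [key, count] : ranked) {
    if (static_cast<int32_t>(top.size()) >= top_k) {
      break;
    }
    if (count / total < min_fraction) {
      break;  // Sorted descending, so every later key also falls short.
    }
    top.push_back(key);
  }
  const bool has_other = top.size() < ranked.size();
  return {top, has_other};
}

// Hypothetical sketch: allocates one zero-initialized column per selected key
// (plus an optional trailing "other" column) and sets a single 1 per row.
// A linear search stands in for the bytemap lookup to keep the sketch short.
template <typename F>
std::vector<std::vector<F>> one_hot_encode_sketch(const std::vector<int32_t>& keys,
                                                  const std::vector<int32_t>& top_keys,
                                                  bool include_other) {
  const std::size_t num_cols = top_keys.size() + (include_other ? 1 : 0);
  std::vector<std::vector<F>> cols(num_cols, std::vector<F>(keys.size(), F(0)));
  for (std::size_t row = 0; row < keys.size(); ++row) {
    const auto it = std::find(top_keys.begin(), top_keys.end(), keys[row]);
    if (it != top_keys.end()) {
      cols[it - top_keys.begin()][row] = F(1);
    } else if (include_other) {
      cols.back()[row] = F(1);
    }
  }
  return cols;
}
```

For example, calling `top_k_keys_sketch` on the keys `{2, 2, 2, 5, 5, 9}` with `top_k = 2` and `min_fraction = 0.2` selects keys 2 and 5 and reports that another key exists; passing that result to `one_hot_encode_sketch<float>` yields three columns, the last of which flags the row holding key 9.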
Variables | |
constexpr int16_t | TableFunctions_Namespace::OneHotEncoder_Namespace::INVALID_COL_IDX {-1} |