OmniSciDB
a5dc49c757
Classes
class InsufficientBufferSizeException
class DelimitedParserException
Functions
size_t find_beginning (const char *buffer, size_t begin, size_t end, const CopyParams &copy_params)
    Finds the closest possible row beginning in the given buffer.
size_t find_end (const char *buffer, size_t size, const import_export::CopyParams &copy_params, unsigned int &num_rows_this_buffer, size_t buffer_first_row_index, bool &in_quote, size_t offset)
size_t get_max_buffer_resize ()
    Gets the maximum size to which thread buffers should be automatically resized.
void set_max_buffer_resize (const size_t max_buffer_resize)
    Sets the maximum size to which thread buffers should be automatically resized. This function is only used for testing.
size_t find_row_end_pos (size_t &alloc_size, std::unique_ptr<char[]> &buffer, size_t &buffer_size, const CopyParams &copy_params, const size_t buffer_first_row_index, unsigned int &num_rows_in_buffer, FILE *file, foreign_storage::FileReader *file_reader = nullptr)
    Finds the closest possible row ending to the end of the given buffer. The buffer is resized as needed, with more content read from the file, until an end of row is found or a configured maximum buffer size is reached.
template<typename T>
const char *get_row (const char *buf, const char *buf_end, const char *entire_buf_end, const import_export::CopyParams &copy_params, const bool *is_array, std::vector<T> &row, std::vector<std::unique_ptr<char[]>> &tmp_buffers, bool &try_single_thread, bool filter_empty_lines)
    Parses the first row in the given buffer and inserts its fields into the given vector.
template const char *get_row (const char *buf, const char *buf_end, const char *entire_buf_end, const import_export::CopyParams &copy_params, const bool *is_array, std::vector<std::string> &row, std::vector<std::unique_ptr<char[]>> &tmp_buffers, bool &try_single_thread, bool filter_empty_lines)
template const char *get_row (const char *buf, const char *buf_end, const char *entire_buf_end, const import_export::CopyParams &copy_params, const bool *is_array, std::vector<std::string_view> &row, std::vector<std::unique_ptr<char[]>> &tmp_buffers, bool &try_single_thread, bool filter_empty_lines)
void parse_string_array (const std::string &s, const import_export::CopyParams &copy_params, std::vector<std::string> &string_vec, bool truncate_values = false)
    Parses the given string array and inserts its elements into the given vector of strings.
void extend_buffer (std::unique_ptr<char[]> &buffer, size_t &buffer_size, size_t &alloc_size, FILE *file, foreign_storage::FileReader *file_reader, size_t max_buffer_resize)
Variables
static size_t max_buffer_resize = max_import_buffer_resize_byte_size
void import_export::delimited_parser::extend_buffer(std::unique_ptr<char[]> &buffer,
    size_t &buffer_size,
    size_t &alloc_size,
    FILE *file,
    foreign_storage::FileReader *file_reader,
    size_t max_buffer_resize)
Extends the given buffer to the lesser of max_buffer_resize and twice the current allocation size, and reads new content from the file into the newly allocated buffer.
Parameters
  buffer - buffer that will be extended
  buffer_size - current buffer size
  alloc_size - current allocation size
  file - handle for the file to be read from (one of file or file_reader must be present)
  file_reader - reader for the file to be read from (one of file or file_reader must be present)
  max_buffer_resize - maximum size that the buffer can be extended to
Definition at line 376 of file DelimitedParserUtils.cpp.
References CHECK, logger::INFO, LOG, and foreign_storage::FileReader::read().
Referenced by find_row_end_pos(), and foreign_storage::RegexFileBufferParser::findRowEndPosition().
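A minimal sketch of the resize step described above, assuming only the FILE* input path; the foreign_storage::FileReader path and the INFO log line are left out. extend_buffer_sketch is a hypothetical name, and this illustrates the documented behavior rather than reproducing the OmniSciDB implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <memory>
#include <utility>

// Sketch only (hypothetical name): grows `buffer` to min(2 * alloc_size,
// max_buffer_resize), keeps the bytes already read, and appends more data
// from `file`. The FileReader path and logging are not modeled.
void extend_buffer_sketch(std::unique_ptr<char[]>& buffer,
                          std::size_t& buffer_size,
                          std::size_t& alloc_size,
                          std::FILE* file,
                          std::size_t max_buffer_resize) {
  alloc_size = std::min(max_buffer_resize, alloc_size * 2);
  // Precondition for this sketch: the existing data fits in the new allocation.
  assert(buffer_size <= alloc_size);
  auto new_buffer = std::make_unique<char[]>(alloc_size);
  std::memcpy(new_buffer.get(), buffer.get(), buffer_size);  // keep old contents
  buffer = std::move(new_buffer);
  // Read into the space that was just added; fread may return fewer bytes at EOF.
  buffer_size += std::fread(buffer.get() + buffer_size, 1, alloc_size - buffer_size, file);
}
```

Doubling the allocation keeps the number of reallocations logarithmic in the size of the longest row, while the max_buffer_resize cap bounds per-thread memory.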
size_t import_export::delimited_parser::find_beginning(const char *buffer,
    size_t begin,
    size_t end,
    const CopyParams &copy_params)
Finds the closest possible row beginning in the given buffer.
Parameters
  buffer - Given buffer which has the rows in CSV format (not owned).
  begin - Start index of the buffer range to search for a row beginning.
  end - End index of the buffer range to search for a row beginning.
  copy_params - Copy params for the table.
Definition at line 67 of file DelimitedParserUtils.cpp.
References import_export::CopyParams::line_delim.
Referenced by import_export::import_thread_delimited(), and foreign_storage::CsvFileBufferParser::parseBuffer().
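The search described above can be sketched as follows, assuming (as a simplification) that the return value is an offset relative to begin and that line delimiters never appear inside quoted fields. CopyParamsSketch and find_beginning_sketch are hypothetical stand-ins, not the real CopyParams or function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for import_export::CopyParams; only line_delim is modeled.
struct CopyParamsSketch {
  char line_delim = '\n';
};

// Sketch: returns the offset, relative to `begin`, of the first byte of the
// first complete row in [begin, end). Quoted line delimiters are not handled.
std::size_t find_beginning_sketch(const char* buffer,
                                  std::size_t begin,
                                  std::size_t end,
                                  const CopyParamsSketch& copy_params) {
  // A slice that starts the buffer, or starts right after a delimiter,
  // already begins on a row boundary.
  if (begin == 0 || buffer[begin - 1] == copy_params.line_delim) {
    return 0;
  }
  // Otherwise skip the partial row that belongs to the previous slice.
  std::size_t i = 0;
  for (; begin + i < end; ++i) {
    if (buffer[begin + i] == copy_params.line_delim) {
      return i + 1;
    }
  }
  return i;  // no delimiter: the whole slice continues an earlier row
}
```

In a multi-threaded import, each thread would call this on its slice so that a row split across two slices is parsed only by the thread that owns its start.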
size_t import_export::delimited_parser::find_end(const char *buffer,
    size_t size,
    const import_export::CopyParams &copy_params,
    unsigned int &num_rows_this_buffer,
    size_t buffer_first_row_index,
    bool &in_quote,
    size_t offset)
Definition at line 85 of file DelimitedParserUtils.cpp.
References import_export::CopyParams::escape, import_export::CopyParams::line_delim, import_export::CopyParams::quote, import_export::CopyParams::quoted, and to_string().
Referenced by find_row_end_pos().
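No description is given for find_end(), but the CopyParams fields it references (escape, line_delim, quote, quoted) suggest a quote-aware scan for the last row ending. The following is a hypothetical sketch of such a scan, not the actual implementation: it returns one past the last line delimiter outside a quoted field (0 if there is none), counts complete rows, starts at offset, and carries the quote state across calls through in_quote. buffer_first_row_index is omitted, and the sketch leaves error reporting for the "no row end" case to the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the CopyParams fields find_end() references.
struct QuoteParamsSketch {
  char line_delim = '\n';
  char quote = '"';
  char escape = '"';
  bool quoted = true;
};

// Sketch: returns one past the last line delimiter that is not inside a
// quoted field, or 0 if there is none. Increments num_rows for each row end
// found and leaves the final quote state in in_quote.
std::size_t find_end_sketch(const char* buffer,
                            std::size_t size,
                            const QuoteParamsSketch& p,
                            unsigned int& num_rows,
                            bool& in_quote,
                            std::size_t offset) {
  std::size_t last_delim_end = 0;
  for (std::size_t i = offset; i < size; ++i) {
    const char c = buffer[i];
    if (p.quoted && in_quote) {
      if (c == p.escape && i + 1 < size && buffer[i + 1] == p.quote) {
        ++i;  // escaped quote stays inside the field
      } else if (c == p.quote) {
        in_quote = false;  // closing quote
      }
    } else if (p.quoted && c == p.quote) {
      in_quote = true;  // opening quote
    } else if (c == p.line_delim) {
      last_delim_end = i + 1;
      ++num_rows;
    }
  }
  return last_delim_end;
}
```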
size_t import_export::delimited_parser::find_row_end_pos(size_t &alloc_size,
    std::unique_ptr<char[]> &buffer,
    size_t &buffer_size,
    const CopyParams &copy_params,
    const size_t buffer_first_row_index,
    unsigned int &num_rows_in_buffer,
    FILE *file,
    foreign_storage::FileReader *file_reader = nullptr)
Finds the closest possible row ending to the end of the given buffer. The buffer is resized as needed, with more content read from the file, until an end of row is found or a configured max buffer limit is reached.
Parameters
  alloc_size - Allocation size of the subsequent buffer; updated if the buffer has to be resized.
  buffer - Given buffer which has the rows in CSV format.
  buffer_size - Size of the buffer.
  copy_params - Copy params for the table.
  buffer_first_row_index - Index of the first row in the buffer.
  num_rows_in_buffer - Number of rows up to the closest possible row ending.
  file - Handle to the CSV file being parsed (optional).
  file_reader - Handle to a FileReader instance; must be valid if file is not.
Definition at line 166 of file DelimitedParserUtils.cpp.
References CHECK, extend_buffer(), find_end(), get_max_buffer_resize(), foreign_storage::FileReader::isScanFinished(), and max_buffer_resize.
Referenced by foreign_storage::CsvFileBufferParser::findRowEndPosition(), and import_export::Importer::importDelimited().
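A sketch of the grow-and-retry loop described above, assuming FILE* input, '\n' as the only row ending, and no quoting. After each resize the search resumes from the previous buffer size, and the sketch throws std::runtime_error once the size cap is reached or the file is exhausted. copy_params, buffer_first_row_index, and num_rows_in_buffer are not modeled, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <utility>

// Sketch (hypothetical name): returns one past the last '\n' in the buffer,
// growing the buffer and reading more of `file` until one is found.
std::size_t find_row_end_pos_sketch(std::size_t& alloc_size,
                                    std::unique_ptr<char[]>& buffer,
                                    std::size_t& buffer_size,
                                    std::FILE* file,
                                    std::size_t max_buffer_resize) {
  std::size_t start_pos = 0;  // resume the search where the last pass stopped
  std::size_t last_end = 0;   // one past the last '\n' seen, 0 if none
  while (true) {
    for (std::size_t i = start_pos; i < buffer_size; ++i) {
      if (buffer[i] == '\n') {
        last_end = i + 1;
      }
    }
    if (last_end != 0) {
      return last_end;
    }
    // No row ending yet: give up at the size cap or when the file is exhausted.
    if (alloc_size >= max_buffer_resize || std::feof(file)) {
      throw std::runtime_error("row does not fit in the maximum buffer size");
    }
    start_pos = buffer_size;
    // Grow to min(2 * alloc_size, max) and read more input, as documented for extend_buffer().
    alloc_size = std::min(max_buffer_resize, alloc_size * 2);
    auto bigger = std::make_unique<char[]>(alloc_size);
    std::memcpy(bigger.get(), buffer.get(), buffer_size);
    buffer = std::move(bigger);
    buffer_size += std::fread(buffer.get() + buffer_size, 1, alloc_size - buffer_size, file);
  }
}
```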
size_t import_export::delimited_parser::get_max_buffer_resize()
Gets the maximum size to which thread buffers should be automatically resized.
Definition at line 158 of file DelimitedParserUtils.cpp.
References max_buffer_resize.
Referenced by find_row_end_pos().
template<typename T>
const char *import_export::delimited_parser::get_row(const char *buf,
    const char *buf_end,
    const char *entire_buf_end,
    const import_export::CopyParams &copy_params,
    const bool *is_array,
    std::vector<T> &row,
    std::vector<std::unique_ptr<char[]>> &tmp_buffers,
    bool &try_single_thread,
    bool filter_empty_lines)
Parses the first row in the given buffer and inserts fields into given vector.
Parameters
  buf - Given buffer which has the rows in CSV format (not owned).
  buf_end - End of the buffer slice assigned to this thread (not owned).
  entire_buf_end - End of the entire buffer (not owned).
  copy_params - Copy params for the table.
  is_array - Array of flags indicating which columns are array types.
  row - Vector to be populated with the parsed fields.
  try_single_thread - Set on a parse error to indicate whether parsing should continue with a single thread.
  filter_empty_lines - Whether to skip empty lines (used when parsing single columns returned by S3 Select, where nulls may be encoded as empty lines).
Definition at line 206 of file DelimitedParserUtils.cpp.
References import_export::CopyParams::array_begin, import_export::CopyParams::array_end, import_export::CopyParams::delimiter, logger::ERROR, import_export::CopyParams::escape, field(), anonymous_namespace{DelimitedParserUtils.cpp}::is_eol(), LOG, import_export::CopyParams::quote, import_export::CopyParams::quoted, anonymous_namespace{DelimitedParserUtils.cpp}::trim_quotes(), import_export::trim_space(), and import_export::CopyParams::trim_spaces.
Referenced by import_export::import_thread_delimited(), parse_string_array(), foreign_storage::CsvFileBufferParser::parseBuffer(), import_export::Detector::split_raw_data(), and foreign_storage::CsvFileBufferParser::validateExpectedColumnCount().
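A simplified sketch of the row tokenizer described above, assuming ',' as the delimiter, '\n' as the row ending, and '"' as both the quote and escape character (so "" inside a quoted field yields a literal quote). RowParamsSketch and get_row_sketch are hypothetical; array columns, trim_spaces, filter_empty_lines, try_single_thread, and the entire_buf_end look-ahead are not modeled. The sketch returns a pointer just past the parsed row, which is presumably what get_row()'s const char * return value indicates.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of CopyParams used by this sketch.
struct RowParamsSketch {
  char delimiter = ',';
  char line_delim = '\n';
  char quote = '"';
  char escape = '"';
  bool quoted = true;
};

// Sketch: splits the first row of [buf, buf_end) into fields, honoring quotes
// and escaped quotes, and returns a pointer just past the row's line delimiter
// (or buf_end if the row is unterminated).
const char* get_row_sketch(const char* buf,
                           const char* buf_end,
                           const RowParamsSketch& params,
                           std::vector<std::string>& row) {
  row.clear();
  std::string field;
  bool in_quote = false;
  for (const char* cur = buf; cur < buf_end; ++cur) {
    const char c = *cur;
    if (in_quote) {
      if (c == params.escape && cur + 1 < buf_end && cur[1] == params.quote) {
        field.push_back(params.quote);  // escaped quote becomes a literal quote
        ++cur;
      } else if (c == params.quote) {
        in_quote = false;  // closing quote
      } else {
        field.push_back(c);  // delimiters inside quotes are kept as data
      }
    } else if (params.quoted && c == params.quote) {
      in_quote = true;
    } else if (c == params.delimiter) {
      row.push_back(field);
      field.clear();
    } else if (c == params.line_delim) {
      row.push_back(field);
      return cur + 1;
    } else {
      field.push_back(c);
    }
  }
  row.push_back(field);  // final row without a trailing line delimiter
  return buf_end;
}
```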
template const char *import_export::delimited_parser::get_row(const char *buf,
    const char *buf_end,
    const char *entire_buf_end,
    const import_export::CopyParams &copy_params,
    const bool *is_array,
    std::vector<std::string> &row,
    std::vector<std::unique_ptr<char[]>> &tmp_buffers,
    bool &try_single_thread,
    bool filter_empty_lines)
template const char *import_export::delimited_parser::get_row(const char *buf,
    const char *buf_end,
    const char *entire_buf_end,
    const import_export::CopyParams &copy_params,
    const bool *is_array,
    std::vector<std::string_view> &row,
    std::vector<std::unique_ptr<char[]>> &tmp_buffers,
    bool &try_single_thread,
    bool filter_empty_lines)
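The two declarations above are explicit instantiation definitions: get_row() is a template defined in DelimitedParserUtils.cpp, so callers in other translation units can only use the std::string and std::string_view instantiations. The sketch below shows the same pattern on a hypothetical split_fields() function. The std::string_view version stores views into the input buffer, so the buffer must outlive row; this is presumably why get_row() also takes tmp_buffers, to hold fields that cannot be represented as plain views (an assumption, not stated on this page).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for get_row(): splits one '\n'-terminated row on `delim`.
// T must be constructible from (const char*, size_t), as both std::string
// (owning copy) and std::string_view (non-owning view) are.
template <typename T>
const char* split_fields(const char* buf, const char* buf_end, char delim, std::vector<T>& row) {
  row.clear();
  const char* field_start = buf;
  for (const char* cur = buf; cur < buf_end; ++cur) {
    if (*cur == delim || *cur == '\n') {
      row.emplace_back(field_start, static_cast<std::size_t>(cur - field_start));
      field_start = cur + 1;
      if (*cur == '\n') {
        return cur + 1;  // start of the next row
      }
    }
  }
  row.emplace_back(field_start, static_cast<std::size_t>(buf_end - field_start));
  return buf_end;
}

// Explicit instantiation definitions, analogous to the two get_row() entries above.
template const char* split_fields<std::string>(const char*, const char*, char,
                                               std::vector<std::string>&);
template const char* split_fields<std::string_view>(const char*, const char*, char,
                                                    std::vector<std::string_view>&);
```

In a single file the explicit instantiations are redundant but legal; in a real project the header would only declare the template, and the .cpp file would hold the body plus these instantiation lines.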
void import_export::delimited_parser::parse_string_array(const std::string &s,
    const import_export::CopyParams &copy_params,
    std::vector<std::string> &string_vec,
    bool truncate_values = false)
Parses given string array and inserts into given vector of strings.
Parameters
  s - Given string array.
  copy_params - Copy params for the table.
  string_vec - Vector to be populated with the parsed elements.
Definition at line 326 of file DelimitedParserUtils.cpp.
References import_export::CopyParams::array_begin, import_export::CopyParams::array_delim, import_export::CopyParams::array_end, import_export::CopyParams::delimiter, get_row(), StringDictionary::MAX_STRLEN, import_export::CopyParams::null_str, and to_string().
Referenced by import_export::TypedImportBuffer::add_value(), import_export::TypedImportBuffer::addDefaultValues(), RowToColumnLoader::convert_string_to_column(), foreign_storage::anonymous_namespace{LogFileBufferParser.cpp}::create_map_from_arrays(), and data_conversion::StringViewToArrayEncoder< ScalarEncoderType >::encodeScalarData().
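A sketch of the array parsing described above, assuming {a,b,c}-style input with array_begin '{', array_end '}', array_delim ',', and \N as null_str. The real function delegates element splitting to get_row(), so it also honors quoting; this sketch splits on array_delim only. max_len is a hypothetical parameter standing in for StringDictionary::MAX_STRLEN, which truncate_values is assumed here to relate to. ArrayParamsSketch and parse_string_array_sketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subset of CopyParams used by this sketch.
struct ArrayParamsSketch {
  char array_begin = '{';
  char array_end = '}';
  char array_delim = ',';
  std::string null_str = "\\N";
};

// Sketch: turns "{a,b,c}" into {"a","b","c"}; quoting inside the array is not handled.
void parse_string_array_sketch(const std::string& s,
                               const ArrayParamsSketch& params,
                               std::vector<std::string>& string_vec,
                               std::size_t max_len,  // hypothetical length limit
                               bool truncate_values = false) {
  string_vec.clear();
  if (s.empty() || s == params.null_str) {
    return;  // NULL or empty input yields an empty vector
  }
  if (s.size() < 2 || s.front() != params.array_begin || s.back() != params.array_end) {
    throw std::runtime_error("Malformed array: " + s);
  }
  const std::string inner = s.substr(1, s.size() - 2);
  if (inner.empty()) {
    return;  // "{}" is an empty array
  }
  std::size_t start = 0;
  while (true) {
    const std::size_t pos = inner.find(params.array_delim, start);
    std::string element =
        inner.substr(start, pos == std::string::npos ? std::string::npos : pos - start);
    if (element.size() > max_len) {
      if (!truncate_values) {
        throw std::runtime_error("Array string too long");
      }
      element.resize(max_len);  // keep only the first max_len characters
    }
    string_vec.push_back(std::move(element));
    if (pos == std::string::npos) {
      break;
    }
    start = pos + 1;
  }
}
```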
void import_export::delimited_parser::set_max_buffer_resize(const size_t max_buffer_resize_param)
Sets the maximum size to which thread buffers should be automatically resized. This function is only used for testing.
Definition at line 162 of file DelimitedParserUtils.cpp.
References max_buffer_resize.
static size_t import_export::delimited_parser::max_buffer_resize = max_import_buffer_resize_byte_size
Definition at line 156 of file DelimitedParserUtils.cpp.
Referenced by find_row_end_pos(), foreign_storage::RegexFileBufferParser::findRowEndPosition(), get_max_buffer_resize(), set_max_buffer_resize(), and foreign_storage::RegexFileBufferParser::setMaxBufferResize().
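max_buffer_resize is a file-scope static read through get_max_buffer_resize() and overridden in tests through set_max_buffer_resize(). A minimal sketch of that pattern follows; the namespace delimited_parser_sketch and the 1 GiB default are hypothetical, since the value of max_import_buffer_resize_byte_size is not given on this page.

```cpp
#include <cassert>
#include <cstddef>

namespace delimited_parser_sketch {
// Hypothetical default; the real initializer is max_import_buffer_resize_byte_size.
constexpr std::size_t kDefaultMaxBufferResize = std::size_t{1} << 30;

// File-scope static: internal linkage, one value shared by every caller in the process.
static std::size_t max_buffer_resize = kDefaultMaxBufferResize;

std::size_t get_max_buffer_resize() {
  return max_buffer_resize;
}

// Test-only override, mirroring set_max_buffer_resize().
void set_max_buffer_resize(std::size_t max_buffer_resize_param) {
  max_buffer_resize = max_buffer_resize_param;
}
}  // namespace delimited_parser_sketch
```

Because the value is process-wide mutable state, a test that lowers it should restore the previous value afterwards so that later tests see the default.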