OmniSciDB  a5dc49c757
create_table.SyntheticTable Class Reference

Public Member Functions

def __init__
 
def createDataAndImportTable
 
def generateColumnsSchema
 
def getCreateTableCommand
 
def getCopyFromCommand
 
def generateData
 
def generateDataParallel
 
def createExpectedTableDetails
 
def doesTableHasExpectedSchemaInDB
 
def doesTableHasExpectedNumEntriesInDB
 
def createTableInDB
 
def importDataIntoTableInDB
 

Public Attributes

 table_name
 
 fragment_size
 
 num_fragments
 
 db_name
 
 db_user
 
 db_password
 
 db_server
 
 db_port
 
 data_dir_path
 
 num_entries
 
 column_list
 
 is_remote_server
 
 data_file_name_base
 

Detailed Description

Definition at line 47 of file create_table.py.

Constructor & Destructor Documentation

def create_table.SyntheticTable.__init__ (   self,
  kwargs 
)
    kwargs:
table_name(str): synthetic table's name in the database
fragment_size(int): fragment size (number of entries per fragment)
num_fragments(int): total number of fragments for the synthetic table
db_user(str): database username
db_password(str): database password
db_port(int): database port
db_name(str): database name
db_server(str): database server name
data_dir_path(str): path to directory that will include the generated data
is_remote_server(bool): if True, this object runs on a different machine from the one hosting the database server.

Definition at line 48 of file create_table.py.

48
49      def __init__(self, **kwargs):
50          """
51          kwargs:
52              table_name(str): synthetic table's name in the database
53              fragment_size(int): fragment size (number of entries per fragment)
54              num_fragments(int): total number of fragments for the synthetic table
55              db_user(str): database username
56              db_password(str): database password
57              db_port(int): database port
58              db_name(str): database name
59              db_server(str): database server name
60              data_dir_path(str): path to directory that will include the generated data
61              is_remote_server(Bool): if True, it indicates that this class is not created on the
62                  same machine that is going to host the server.
63          """
64          self.table_name = kwargs["table_name"]
65          self.fragment_size = kwargs["fragment_size"]
66          self.num_fragments = kwargs["num_fragments"]
67          self.db_name = kwargs["db_name"]
68          self.db_user = kwargs["db_user"]
69          self.db_password = kwargs["db_password"]
70          self.db_server = kwargs["db_server"]
71          self.db_port = kwargs["db_port"]
72          self.data_dir_path = kwargs["data_dir_path"]
73          self.num_entries = self.num_fragments * self.fragment_size
74          self.column_list = self.generateColumnsSchema()
75          self.data_dir_path = kwargs["data_dir_path"]
76          self.is_remote_server = kwargs["is_remote_server"]
77          if not os.path.isdir(self.data_dir_path):
78              os.mkdir(self.data_dir_path)
79          self.data_file_name_base = self.data_dir_path + "/data"
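
The constructor reads each setting with kwargs["..."], so a missing key surfaces as a KeyError. The following is a minimal sketch, not part of create_table.py, of checking a settings dictionary up front. The helper name missing_kwargs and all example values are hypothetical.

```python
# Hypothetical helper, not part of create_table.py: lists the keyword
# arguments that SyntheticTable.__init__ reads but the caller did not supply.
REQUIRED_KWARGS = (
    "table_name",
    "fragment_size",
    "num_fragments",
    "db_name",
    "db_user",
    "db_password",
    "db_server",
    "db_port",
    "data_dir_path",
    "is_remote_server",
)


def missing_kwargs(kwargs):
    """Return the required keys absent from kwargs, in declaration order."""
    return [key for key in REQUIRED_KWARGS if key not in kwargs]


# Illustrative values only; none of them come from the documented source.
example_kwargs = {
    "table_name": "synthetic_test",
    "fragment_size": 1000,
    "num_fragments": 4,
    "db_name": "example_db",
    "db_user": "example_user",
    "db_password": "example_password",
    "db_server": "localhost",
    "db_port": 6274,
    "data_dir_path": "/tmp/synthetic_data",
    "is_remote_server": False,
}

print(missing_kwargs(example_kwargs))          # nothing missing
print(missing_kwargs({"table_name": "only"}))  # every other key is reported
```

A check like this could run before SyntheticTable(**example_kwargs) so that all missing settings are reported at once rather than one KeyError at a time.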

Member Function Documentation

def create_table.SyntheticTable.createDataAndImportTable (   self,
  skip_data_generation = False 
)

Definition at line 80 of file create_table.py.

References create_table.SyntheticTable.createTableInDB(), create_table.SyntheticTable.data_file_name_base, create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), create_table.SyntheticTable.doesTableHasExpectedSchemaInDB(), create_table.SyntheticTable.generateDataParallel(), create_table.SyntheticTable.importDataIntoTableInDB(), create_table.SyntheticTable.is_remote_server, create_table.SyntheticTable.num_entries, and split().

80
81      def createDataAndImportTable(self, skip_data_generation=False):
82          # deciding whether it is required to generate data and import it into the database
83          # or the data already exists there:
84          if (
85              self.doesTableHasExpectedSchemaInDB()
86              and self.doesTableHasExpectedNumEntriesInDB()
87          ):
88              print(
89                  "Data already exists in the database, proceeding to the queries:"
90              )
91          else:
92              if self.is_remote_server:
93                  # at this point, we abort the procedure as the data is
94                  # either not present in the remote server or the schema/number of rows
95                  # does not match those indicated by this class.
96                  raise Exception(
97                      "Proper data does not exist in the remote server."
98                  )
99              else:
100                 # generate random synthetic data
101                 if not skip_data_generation:
102                     # choosing a relatively unique name for the generated csv files
103                     current_time = str(datetime.datetime.now()).split()
104                     self.data_file_name_base += "_" + current_time[0]
105
106                     self.generateDataParallel()
107                     print(
108                         "Synthetic data created: "
109                         + str(self.num_entries)
110                         + " rows"
111                     )
112                 # create a table on the database:
113                 self.createTableInDB()
114                 # import the generated data into the database:
115                 self.importDataIntoTableInDB()
116                 print("Data imported into the database")
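
The control flow of createDataAndImportTable can be summarised as a small decision function. This is a sketch under one assumption: that the opening condition combines the schema check and the row-count check, as the References list for this method suggests. The names choose_action and date_suffix are hypothetical and do not exist in create_table.py.

```python
import datetime


def choose_action(schema_ok, count_ok, is_remote_server):
    """Mirror the three outcomes of createDataAndImportTable (assumed logic)."""
    if schema_ok and count_ok:
        return "reuse"       # table already matches, go straight to queries
    if is_remote_server:
        return "abort"       # cannot generate files on a remote host
    return "regenerate"      # generate CSVs, recreate the table, import


def date_suffix(now):
    """Reproduce str(now).split()[0]: the date part of the timestamp."""
    return str(now).split()[0]


print(choose_action(True, True, False))
print(choose_action(True, False, True))
print(date_suffix(datetime.datetime(2024, 1, 2, 3, 4, 5)))
```

Because the suffix is only the date, two runs on the same day write to the same file name base; the listing's own comment calls the name "relatively unique" for this reason.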

def create_table.SyntheticTable.createExpectedTableDetails (   self)
Creates table details in the same format as expected 
from pymapd's get_table_details  

Definition at line 207 of file create_table.py.

References create_table.SyntheticTable.column_list.

Referenced by create_table.SyntheticTable.doesTableHasExpectedSchemaInDB().

208     def createExpectedTableDetails(self):
209         """
210         Creates table details in the same format as expected
211         from pymapd's get_table_details
212         """
213         return [
214             column.createColumnDetailsString() for column in self.column_list
215         ]


def create_table.SyntheticTable.createTableInDB (   self)

Definition at line 277 of file create_table.py.

References create_table.SyntheticTable.db_name, create_table.SyntheticTable.db_password, create_table.SyntheticTable.db_port, create_table.SyntheticTable.db_server, create_table.SyntheticTable.db_user, create_table.SyntheticTable.getCreateTableCommand(), and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

278     def createTableInDB(self):
279         try:
280             con = pymapd.connect(
281                 user=self.db_user,
282                 password=self.db_password,
283                 host=self.db_server,
284                 port=self.db_port,
285                 dbname=self.db_name,
286             )
287             # drop the current table if exists:
288             con.execute("DROP TABLE IF EXISTS " + self.table_name + ";")
289             # create a new table:
290             con.execute(self.getCreateTableCommand())
291         except:
292             raise Exception("Failure in creating a new table.")


def create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB (   self)
    Verifies whether the existing table in the database has the number of
    entries expected by this class.

Definition at line 253 of file create_table.py.

References create_table.SyntheticTable.db_name, create_table.SyntheticTable.db_password, create_table.SyntheticTable.db_port, create_table.SyntheticTable.db_server, create_table.SyntheticTable.db_user, create_table.SyntheticTable.num_entries, and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

254     def doesTableHasExpectedNumEntriesInDB(self):
255         """
256         Verifies whether the existing table in the database has the expected
257         number of entries in it as in this class.
258         """
259         try:
260             con = pymapd.connect(
261                 user=self.db_user,
262                 password=self.db_password,
263                 host=self.db_server,
264                 port=self.db_port,
265                 dbname=self.db_name,
266             )
267             result = con.execute(
268                 "select count(*) from " + self.table_name + ";"
269             )
270             if list(result)[0][0] == self.num_entries:
271                 return True
272             else:
273                 print("Expected num rows did not match:")
274                 return False
275         except:
276             raise Exception("Pymapd's connection to the server has failed.")


def create_table.SyntheticTable.doesTableHasExpectedSchemaInDB (   self)
    Verifies whether the existing table in the database has the expected
    schema or not. 

Definition at line 216 of file create_table.py.

References create_table.SyntheticTable.createExpectedTableDetails(), create_table.SyntheticTable.db_name, create_table.SyntheticTable.db_password, create_table.SyntheticTable.db_port, create_table.SyntheticTable.db_server, create_table.SyntheticTable.db_user, and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

217     def doesTableHasExpectedSchemaInDB(self):
218         """
219         Verifies whether the existing table in the database has the expected
220         schema or not.
221         """
222         try:
223             con = pymapd.connect(
224                 user=self.db_user,
225                 password=self.db_password,
226                 host=self.db_server,
227                 port=self.db_port,
228                 dbname=self.db_name,
229             )
230         except:
231             raise Exception("Pymapd's connection to the server has failed.")
232         try:
233             table_details = con.get_table_details(self.table_name)
234         except:
235             # table does not exist
236             print("Table does not exist in the database")
237             return False
238
239         if [
240             str(table_detail) for table_detail in table_details
241         ] == self.createExpectedTableDetails():
242             return True
243         else:
244             print("Schema does not match the expected one:")
245             print(
246                 "Observed table details: "
247                 + str([str(table_detail) for table_detail in table_details])
248             )
249             print(
250                 "Expected table details: "
251                 + str(self.createExpectedTableDetails())
252             )
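
The schema check compares two lists of strings and, on mismatch, prints both lists in full. As a possible refinement, the sketch below reports only the positions that differ. The helper schema_mismatches is hypothetical and the example strings are placeholders, not the real output format of pymapd's get_table_details.

```python
from itertools import zip_longest


def schema_mismatches(observed, expected):
    """Return (index, observed, expected) for every position that differs.

    A list that is shorter than the other is padded with None, so missing
    or extra columns are reported too.
    """
    return [
        (idx, obs, exp)
        for idx, (obs, exp) in enumerate(zip_longest(observed, expected))
        if obs != exp
    ]


# Placeholder strings standing in for str(table_detail) values.
observed = ["x10 INT", "y10 BIGINT"]
expected = ["x10 INT", "y10 INT", "z10 INT"]
for idx, obs, exp in schema_mismatches(observed, expected):
    print(f"column {idx}: observed {obs!r}, expected {exp!r}")
```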


def create_table.SyntheticTable.generateColumnsSchema (   self)

Definition at line 117 of file create_table.py.

118     def generateColumnsSchema(self):
119         column_list = []
120         # columns with uniform distribution and step=1
121         column_list.append(Column("x10", "INT", 1, 10))
122         column_list.append(Column("y10", "INT", 1, 10))
123         column_list.append(Column("z10", "INT", 1, 10))
124         column_list.append(Column("x100", "INT", 1, 100))
125         column_list.append(Column("y100", "INT", 1, 100))
126         column_list.append(Column("z100", "INT", 1, 100))
127         column_list.append(Column("x1k", "INT", 1, 1000))
128         column_list.append(Column("x10k", "INT", 1, 10000))
129         column_list.append(Column("x100k", "INT", 1, 100000))
130         column_list.append(Column("x1m", "INT", 1, 1000000))
131         column_list.append(Column("x10m", "INT", 1, 10000000))
132
133         # columns with step != 1
134         # cardinality = 10k, range = 100m
135         column_list.append(Column("x10k_s10k", "BIGINT", 1, 10000, 10000))
136         column_list.append(Column("x100k_s10k", "BIGINT", 1, 100000, 10000))
137         column_list.append(Column("x1m_s10k", "BIGINT", 1, 1000000, 10000))
138         return column_list
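
The Column class is not documented on this page, so its behaviour has to be inferred. The sketch below is a hypothetical stand-in. It assumes the positional arguments are name, SQL type, lower bound, upper bound and an optional step, and that an entry is a uniform draw from the bounds multiplied by the step. That reading is consistent with the "cardinality = 10k, range = 100m" comment but is not confirmed by the source shown here. Only column_name, sql_type and generateEntry are names that appear elsewhere on this page.

```python
import random
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical stand-in for the Column class used by create_table.py."""

    column_name: str
    sql_type: str
    lower: int      # assumed meaning of the third argument
    upper: int      # assumed meaning of the fourth argument
    step: int = 1   # assumed meaning of the optional fifth argument

    def generateEntry(self, rng=random):
        # Assumption: uniform integer in [lower, upper], scaled by step.
        return rng.randint(self.lower, self.upper) * self.step

    def value_range(self):
        """Smallest and largest value generateEntry can return."""
        return self.lower * self.step, self.upper * self.step


col = Column("x10k_s10k", "BIGINT", 1, 10000, 10000)
print(col.value_range())  # 10k distinct values spread over a 100m range
```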
def create_table.SyntheticTable.generateData (   self,
  thread_idx,
  size 
)
    Single-thread random data generation based on the provided schema.
    Data is stored in CSV format.

Definition at line 161 of file create_table.py.

References create_table.SyntheticTable.column_list, create_table.SyntheticTable.data_file_name_base, join(), and heavyai.open().

Referenced by create_table.SyntheticTable.generateDataParallel().

162     def generateData(self, thread_idx, size):
163         """
164         Single-thread random data generation based on the provided schema.
165         Data is stored in CSV format.
166         """
167         file_name = (
168             self.data_file_name_base + "_part" + str(thread_idx) + ".csv"
169         )
170         with open(file_name, "w") as f:
171             for i in range(size):
172                 f.write(
173                     ",".join(
174                         map(
175                             str,
176                             [col.generateEntry() for col in self.column_list],
177                         )
178                     )
179                 )
180                 f.write("\n")
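
A standalone sketch of the same write loop follows. It uses plain callables in place of Column objects so that it runs without the rest of create_table.py. The function name write_csv_part and the generators are hypothetical; the file naming (base, "_part", index, ".csv") follows the listing.

```python
import os
import random
import tempfile


def write_csv_part(file_name_base, part_idx, size, generators):
    """Write `size` comma-separated rows, one value per generator, no header."""
    file_name = f"{file_name_base}_part{part_idx}.csv"
    with open(file_name, "w") as f:
        for _ in range(size):
            f.write(",".join(str(gen()) for gen in generators))
            f.write("\n")
    return file_name


# Example run in a temporary directory with two illustrative generators.
with tempfile.TemporaryDirectory() as tmp_dir:
    rng = random.Random(1)
    generators = [lambda: rng.randint(1, 10), lambda: rng.randint(1, 100)]
    path = write_csv_part(os.path.join(tmp_dir, "data"), 0, 3, generators)
    with open(path) as f:
        print(f.read(), end="")
```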

def create_table.SyntheticTable.generateDataParallel (   self)
    Uses all available CPU threads to generate random data based on the 
    provided schema. Data is stored in CSV format.

Definition at line 181 of file create_table.py.

References create_table.SyntheticTable.generateData(), and create_table.SyntheticTable.num_entries.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

182     def generateDataParallel(self):
183         """
184         Uses all available CPU threads to generate random data based on the
185         provided schema. Data is stored in CSV format.
186         """
187         num_threads = cpu_count()
188         num_entries_per_thread = int(
189             (self.num_entries + num_threads - 1) / num_threads
190         )
191         thread_index = [i for i in range(0, num_threads)]
192
193         # making sure we end up having as many fragments as the user asked for
194         num_balanced_entries = [
195             num_entries_per_thread for _ in range(num_threads)
196         ]
197         if self.num_entries != num_entries_per_thread * num_threads:
198             last_threads_portion = (
199                 self.num_entries - num_entries_per_thread * (num_threads - 1)
200             )
201             num_balanced_entries[-1] = last_threads_portion
202
203         arguments = zip(thread_index, num_balanced_entries)
204
205         with Pool(num_threads) as pool:
206             pool.starmap(self.generateData, arguments)
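
The share computation in generateDataParallel is a ceiling division followed by a correction of the last share. The sketch below isolates that arithmetic in a hypothetical function, balanced_entries, using integer division in place of int(a / b), which gives the same result for ordinary table sizes.

```python
def balanced_entries(num_entries, num_workers):
    """Split num_entries across workers the way the listing does.

    Every worker gets ceil(num_entries / num_workers) rows; if that
    over-counts, the last worker's share is reduced to compensate.
    """
    per_worker = (num_entries + num_workers - 1) // num_workers
    counts = [per_worker] * num_workers
    if num_entries != per_worker * num_workers:
        counts[-1] = num_entries - per_worker * (num_workers - 1)
    return counts


print(balanced_entries(10, 4))  # uneven split, last share is smaller
print(balanced_entries(8, 4))   # even split
print(balanced_entries(5, 4))   # last share goes negative
```

The third call is worth noting: when the entry count is small relative to the worker count, the corrected last share can be negative. In that case range(size) in generateData would produce no rows for the last part while the other parts together exceed the requested total, so the row-count check may fail for such inputs.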


def create_table.SyntheticTable.getCopyFromCommand (   self)

Definition at line 154 of file create_table.py.

References create_table.SyntheticTable.data_file_name_base, and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.importDataIntoTableInDB().

155     def getCopyFromCommand(self):
156         copy_sql = "COPY " + self.table_name + " FROM '"
157         copy_sql += (
158             self.data_file_name_base + "*.csv' WITH (header = 'false');"
159         )
160         return copy_sql
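
For reference, the string this method assembles can be reproduced with a short standalone function. The name copy_from_command and the example path are hypothetical. The wildcard is what lets a single COPY statement pick up every "_part<idx>.csv" file written by generateData.

```python
def copy_from_command(table_name, data_file_name_base):
    """Build the same COPY statement as getCopyFromCommand."""
    return (
        f"COPY {table_name} FROM '{data_file_name_base}*.csv' "
        "WITH (header = 'false');"
    )


print(copy_from_command("synthetic_test", "/tmp/synthetic_data/data_2024-01-02"))
```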


def create_table.SyntheticTable.getCreateTableCommand (   self)

Definition at line 139 of file create_table.py.

References create_table.SyntheticTable.column_list, create_table.SyntheticTable.fragment_size, and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.createTableInDB().

140     def getCreateTableCommand(self):
141         create_sql = "CREATE TABLE " + self.table_name + " ( "
142         for column_idx in range(len(self.column_list)):
143             column = self.column_list[column_idx]
144             create_sql += column.column_name + " " + column.sql_type
145             if column_idx != (len(self.column_list) - 1):
146                 create_sql += ", "
147         create_sql += ")"
148         if self.fragment_size != 32000000:
149             create_sql += (
150                 " WITH (FRAGMENT_SIZE = " + str(self.fragment_size) + ")"
151             )
152         create_sql += ";"
153         return create_sql
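
The index-based loop above exists only to avoid a trailing comma. An equivalent formulation with str.join is sketched below. The function create_table_command and the (name, type) tuples are hypothetical stand-ins for the class method and its Column objects. The constant 32000000 is taken from the listing, which omits the WITH clause at that value.

```python
DEFAULT_FRAGMENT_SIZE = 32000000  # value at which the listing omits the WITH clause


def create_table_command(table_name, columns, fragment_size=DEFAULT_FRAGMENT_SIZE):
    """Build the same CREATE TABLE statement from (name, sql_type) pairs."""
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    sql = f"CREATE TABLE {table_name} ( {column_sql})"
    if fragment_size != DEFAULT_FRAGMENT_SIZE:
        sql += f" WITH (FRAGMENT_SIZE = {fragment_size})"
    return sql + ";"


columns = [("x10", "INT"), ("x10k_s10k", "BIGINT")]
print(create_table_command("synthetic_test", columns))
print(create_table_command("synthetic_test", columns, fragment_size=1000))
```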


def create_table.SyntheticTable.importDataIntoTableInDB (   self)

Definition at line 293 of file create_table.py.

References create_table.SyntheticTable.db_name, create_table.SyntheticTable.db_password, create_table.SyntheticTable.db_port, create_table.SyntheticTable.db_server, create_table.SyntheticTable.db_user, and create_table.SyntheticTable.getCopyFromCommand().

Referenced by create_table.SyntheticTable.createDataAndImportTable().

294     def importDataIntoTableInDB(self):
295         try:
296             con = pymapd.connect(
297                 user=self.db_user,
298                 password=self.db_password,
299                 host=self.db_server,
300                 port=self.db_port,
301                 dbname=self.db_name,
302             )
303             # import generated data:
304             con.execute(self.getCopyFromCommand())
305         except:
306             raise Exception("Failure in importing data into the table")
307


Member Data Documentation

create_table.SyntheticTable.column_list

Definition at line 73 of file create_table.py.

Referenced by create_table.SyntheticTable.createExpectedTableDetails(), create_table.SyntheticTable.generateData(), and create_table.SyntheticTable.getCreateTableCommand().

create_table.SyntheticTable.data_dir_path

Definition at line 71 of file create_table.py.

create_table.SyntheticTable.data_file_name_base

Definition at line 78 of file create_table.py.

Referenced by create_table.SyntheticTable.createDataAndImportTable(), create_table.SyntheticTable.generateData(), and create_table.SyntheticTable.getCopyFromCommand().

create_table.SyntheticTable.db_name

Definition at line 66 of file create_table.py.

Referenced by create_table.SyntheticTable.createTableInDB(), create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), create_table.SyntheticTable.doesTableHasExpectedSchemaInDB(), create_table.SyntheticTable.importDataIntoTableInDB(), heavydb.thrift.ttypes.TDBInfo.read(), heavydb.thrift.ttypes.TQueryInfo.read(), heavydb.thrift.ttypes.TDBInfo.write(), and heavydb.thrift.ttypes.TQueryInfo.write().

create_table.SyntheticTable.db_password

Definition at line 68 of file create_table.py.

Referenced by create_table.SyntheticTable.createTableInDB(), create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), create_table.SyntheticTable.doesTableHasExpectedSchemaInDB(), and create_table.SyntheticTable.importDataIntoTableInDB().

create_table.SyntheticTable.db_port

Definition at line 70 of file create_table.py.

Referenced by create_table.SyntheticTable.createTableInDB(), create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), create_table.SyntheticTable.doesTableHasExpectedSchemaInDB(), and create_table.SyntheticTable.importDataIntoTableInDB().

create_table.SyntheticTable.db_server

Definition at line 69 of file create_table.py.

Referenced by create_table.SyntheticTable.createTableInDB(), create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), create_table.SyntheticTable.doesTableHasExpectedSchemaInDB(), and create_table.SyntheticTable.importDataIntoTableInDB().

create_table.SyntheticTable.db_user

Definition at line 67 of file create_table.py.

Referenced by create_table.SyntheticTable.createTableInDB(), create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), create_table.SyntheticTable.doesTableHasExpectedSchemaInDB(), and create_table.SyntheticTable.importDataIntoTableInDB().

create_table.SyntheticTable.fragment_size

Definition at line 64 of file create_table.py.

Referenced by create_table.SyntheticTable.getCreateTableCommand(), heavydb.thrift.ttypes.TTableDetails.read(), and heavydb.thrift.ttypes.TTableDetails.write().

create_table.SyntheticTable.is_remote_server

Definition at line 75 of file create_table.py.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

create_table.SyntheticTable.num_entries

Definition at line 72 of file create_table.py.

Referenced by create_table.SyntheticTable.createDataAndImportTable(), create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), and create_table.SyntheticTable.generateDataParallel().

create_table.SyntheticTable.num_fragments

Definition at line 65 of file create_table.py.

create_table.SyntheticTable.table_name

Definition at line 63 of file create_table.py.

Referenced by create_table.SyntheticTable.createTableInDB(), create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), create_table.SyntheticTable.doesTableHasExpectedSchemaInDB(), create_table.SyntheticTable.getCopyFromCommand(), create_table.SyntheticTable.getCreateTableCommand(), heavydb.thrift.ttypes.TTableMeta.read(), heavydb.thrift.Heavy.get_table_details_args.read(), heavydb.thrift.Heavy.get_table_details_for_database_args.read(), heavydb.thrift.Heavy.get_internal_table_details_args.read(), heavydb.thrift.Heavy.get_internal_table_details_for_database_args.read(), heavydb.thrift.Heavy.set_table_epoch_by_name_args.read(), heavydb.thrift.Heavy.get_table_epoch_by_name_args.read(), heavydb.thrift.Heavy.load_table_binary_args.read(), heavydb.thrift.Heavy.load_table_binary_columnar_args.read(), heavydb.thrift.Heavy.load_table_binary_arrow_args.read(), heavydb.thrift.Heavy.load_table_args.read(), heavydb.thrift.Heavy.create_table_args.read(), heavydb.thrift.Heavy.import_table_args.read(), heavydb.thrift.Heavy.import_geo_table_args.read(), heavydb.thrift.ttypes.TTableMeta.write(), heavydb.thrift.Heavy.get_table_details_args.write(), heavydb.thrift.Heavy.get_table_details_for_database_args.write(), heavydb.thrift.Heavy.get_internal_table_details_args.write(), heavydb.thrift.Heavy.get_internal_table_details_for_database_args.write(), heavydb.thrift.Heavy.set_table_epoch_by_name_args.write(), heavydb.thrift.Heavy.get_table_epoch_by_name_args.write(), heavydb.thrift.Heavy.load_table_binary_args.write(), heavydb.thrift.Heavy.load_table_binary_columnar_args.write(), heavydb.thrift.Heavy.load_table_binary_arrow_args.write(), heavydb.thrift.Heavy.load_table_args.write(), heavydb.thrift.Heavy.create_table_args.write(), heavydb.thrift.Heavy.import_table_args.write(), and heavydb.thrift.Heavy.import_geo_table_args.write().


The documentation for this class was generated from the following file: