OmniSciDB  a5dc49c757
generate_TableFunctionsFactory_init Namespace Reference

Functions

def line_is_incomplete
 
def find_signatures
 
def format_function_args
 
def build_template_function_call
 
def build_preflight_function
 
def must_emit_preflight_function
 
def format_annotations
 
def is_template_function
 
def uses_manager
 
def is_cpu_function
 
def is_gpu_function
 
def parse_annotations
 
def call_methods
 

Variables

string separator = '$=>$'
 
list input_files = [os.path.join(os.path.dirname(__file__), 'test_udtf_signatures.hpp')]
 
tuple cpu_output_header = os.path.splitext(output_filename)
 
tuple gpu_output_header = os.path.splitext(output_filename)
 
list add_stmts = []
 
list cpu_template_functions = []
 
list gpu_template_functions = []
 
list cpu_address_expressions = []
 
list gpu_address_expressions = []
 
list cond_fns = []
 
list canonical_input_files = [input_file[input_file.find("/QueryEngine/") + 1:] for input_file in input_files]
 
list header_file = ['#include "' + canonical_input_file + '"' for canonical_input_file in canonical_input_files]
 
tuple dirname = os.path.dirname(output_filename)
 
tuple add_tf_generated_files
 
tuple cpu_generated_files
 
tuple gpu_generated_files
 
string content
 

Detailed Description

Given a list of input files, scan for lines containing UDTF
specification statements in the following form:

  UDTF: function_name(<arguments>) -> <output column types> (, <template type specifications>)?

where <arguments> is a comma-separated list of argument types. The
argument type specifications are:

- scalar types:
    Int8, Int16, Int32, Int64, Float, Double, Bool, TextEncodingDict, etc
- column types:
    ColumnInt8, ColumnInt16, ColumnInt32, ColumnInt64, ColumnFloat, ColumnDouble, ColumnBool, etc
- column list types:
    ColumnListInt8, ColumnListInt16, ColumnListInt32, ColumnListInt64, ColumnListFloat, ColumnListDouble, ColumnListBool, etc
- cursor type:
    Cursor<t0, t1, ...>
  where t0, t1 are column or column list types
- output buffer size parameter type:
    RowMultiplier<i>, ConstantParameter<i>, Constant<i>, TableFunctionSpecifiedParameter<i>
  where i is a literal integer.

The output column types are given as a comma-separated list of column types (see above).

In addition, the following equivalents are supported:

  Column<T> == ColumnT
  ColumnList<T> == ColumnListT
  Cursor<T, V, ...> == Cursor<ColumnT, ColumnV, ...>
  int8 == int8_t == Int8, etc
  float == Float, double == Double, bool == Bool
  T == ColumnT for output column types
  RowMultiplier == RowMultiplier<i> where i is the one-based position of the sizer argument
  when no sizer argument is provided, Constant<1> is assumed
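
The scalar and column equivalences listed above amount to a normalization step. The following is a minimal sketch of that idea only; `SCALAR_ALIASES` and `normalize_type` are hypothetical names that do not appear in the script, and the Cursor, output-column, and sizer rules are not covered:

```python
import re

# Hypothetical alias table built from the equivalences listed above.
SCALAR_ALIASES = {'float': 'Float', 'double': 'Double', 'bool': 'Bool'}
for bits in (8, 16, 32, 64):
    SCALAR_ALIASES['int%d' % bits] = 'Int%d' % bits
    SCALAR_ALIASES['int%d_t' % bits] = 'Int%d' % bits


def normalize_type(spec):
    """Map e.g. 'Column<int32_t>' to 'ColumnInt32' and 'float' to 'Float'."""
    spec = spec.strip()
    m = re.fullmatch(r'(ColumnList|Column)<\s*(\w+)\s*>', spec)
    if m:
        # Column<T> == ColumnT, ColumnList<T> == ColumnListT
        return m.group(1) + SCALAR_ALIASES.get(m.group(2), m.group(2))
    return SCALAR_ALIASES.get(spec, spec)
```

Names that are already in canonical form, such as `ColumnInt8`, pass through unchanged.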

Argument types can be annotated using the `|` (bar) symbol after an
argument type specification. An annotation is specified by a label and
a value separated by the `=` (equals) symbol. Multiple annotations can be
specified by using the `|` (bar) symbol as the annotation separator.
Supported annotation labels are:

- name: to specify argument name
- input_id: to specify the dict id mapping for output TextEncodingDict columns.
- default: to specify a default value for an argument (scalar only)

If an argument type is followed by an identifier, the identifier is
mapped to a name annotation. For example, the following argument type
specifications are equivalent:

  Int8 a
  Int8 | name=a
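
The equivalence of the two forms can be illustrated with a small standalone sketch. `split_argument_spec` is a hypothetical helper, not the parser the script uses, and it assumes a simple type name without spaces (so it would not handle `Cursor<...>` arguments):

```python
def split_argument_spec(spec):
    """Split 'Type [name] | label=value | ...' into (type_name, annotations)."""
    head, *rest = [part.strip() for part in spec.split('|')]
    tokens = head.split()
    type_name = tokens[0]
    annotations = []
    # 'Int8 a' is shorthand for 'Int8 | name=a'
    if len(tokens) == 2:
        annotations.append(('name', tokens[1]))
    for item in rest:
        label, value = item.split('=', 1)
        annotations.append((label.strip(), value.strip()))
    return type_name, annotations
```

Under these assumptions, `split_argument_spec('Int8 a')` and `split_argument_spec('Int8 | name=a')` return the same pair.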

The template type specification is a comma-separated list of template
type assignments, where each value is a list of argument type names.
For instance:

  T = [Int8, Int16, Int32, Float], V = [Float, Double]
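
As a rough illustration of how such assignments can turn one templated signature into several concrete ones, the sketch below takes the Cartesian product of the value lists. This is an assumption about the expansion semantics; the script delegates the real work to `transformers.TemplateTransformer`, which operates on a parsed AST rather than on strings. `expand_templates` is a hypothetical name.

```python
import itertools


def expand_templates(signature, templates):
    """Return one concrete signature string per combination of template values."""
    names = list(templates)
    expanded = []
    for combo in itertools.product(*(templates[n] for n in names)):
        line = signature
        for name, typ in zip(names, combo):
            # Textual substitution of '<T>' by '<Int8>' and so on (illustrative only).
            line = line.replace('<%s>' % name, '<%s>' % typ)
        expanded.append(line)
    return expanded


concrete = expand_templates('foo(Column<T>, Column<V>) -> Column<T>',
                            {'T': ['Int8', 'Int16'], 'V': ['Float', 'Double']})
```

Here `concrete` holds four signatures, the first being `foo(Column<Int8>, Column<Float>) -> Column<Int8>`.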

Function Documentation

def generate_TableFunctionsFactory_init.build_preflight_function (   fn_name,
  sizer,
  input_types,
  output_types,
  uses_manager 
)

Definition at line 205 of file generate_TableFunctionsFactory_init.py.

References format_function_args().

Referenced by parse_annotations().

def build_preflight_function(fn_name, sizer, input_types, output_types, uses_manager):

    def format_error_msg(err_msg, uses_manager):
        if uses_manager:
            return " return mgr.error_message(%s);\n" % (err_msg,)
        else:
            return " return table_function_error(%s);\n" % (err_msg,)

    cpp_args, _ = format_function_args(input_types,
                                       output_types,
                                       uses_manager,
                                       use_generic_arg_name=False,
                                       emit_output_args=False)

    if uses_manager:
        fn = "EXTENSION_NOINLINE int32_t\n"
        fn += "%s(%s) {\n" % (fn_name.lower() + "__preflight", cpp_args)
    else:
        fn = "EXTENSION_NOINLINE int32_t\n"
        fn += "%s(%s) {\n" % (fn_name.lower() + "__preflight", cpp_args)

    for typ in input_types:
        if isinstance(typ, declbracket.Declaration):
            ann = typ.annotations
            for key, value in ann:
                if key == 'require':
                    err_msg = '"Constraint `%s` is not satisfied."' % (value[1:-1])

                    fn += " if (!(%s)) {\n" % (value[1:-1].replace('\\', ''),)
                    fn += format_error_msg(err_msg, uses_manager)
                    fn += " }\n"

    if sizer.is_arg_sizer():
        precomputed_nrows = str(sizer.args[0])
        if '"' in precomputed_nrows:
            precomputed_nrows = precomputed_nrows[1:-1]
        # check to see if the precomputed number of rows > 0
        err_msg = '"Output size expression `%s` evaluated in a negative value."' % (precomputed_nrows)
        fn += " auto _output_size = %s;\n" % (precomputed_nrows)
        fn += " if (_output_size < 0) {\n"
        fn += format_error_msg(err_msg, uses_manager)
        fn += " }\n"
        fn += " return _output_size;\n"
    else:
        fn += " return 0;\n"
    fn += "}\n\n"

    return fn
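
To make the `require` handling in the listing easier to follow, here is a reduced sketch that emits the C++ check for a single annotation value. `emit_require_check` is a hypothetical helper, and the indentation inside the generated C++ is illustrative rather than taken from the script:

```python
def emit_require_check(value, uses_manager):
    """Build the C++ guard for one `require` annotation value (a quoted expression)."""
    expr = value[1:-1]  # drop the surrounding double quotes
    err_msg = '"Constraint `%s` is not satisfied."' % expr
    # Functions taking a TableFunctionManager report errors through it.
    error_call = 'mgr.error_message' if uses_manager else 'table_function_error'
    return ("  if (!(%s)) {\n"
            "    return %s(%s);\n"
            "  }\n") % (expr.replace('\\', ''), error_call, err_msg)


snippet = emit_require_check('"x > 0"', uses_manager=True)
```

The resulting `snippet` tests `!(x > 0)` and returns `mgr.error_message(...)` with the constraint text when the condition fails.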


def generate_TableFunctionsFactory_init.build_template_function_call (   caller,
  called,
  input_types,
  output_types,
  uses_manager 
)

Definition at line 191 of file generate_TableFunctionsFactory_init.py.

References format_function_args().

Referenced by parse_annotations().

def build_template_function_call(caller, called, input_types, output_types, uses_manager):
    cpp_args, name_args = format_function_args(input_types,
                                               output_types,
                                               uses_manager,
                                               use_generic_arg_name=True,
                                               emit_output_args=True)

    template = ("EXTENSION_NOINLINE int32_t\n"
                "%s(%s) {\n"
                " return %s(%s);\n"
                "}\n") % (caller, cpp_args, called, name_args)
    return template
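
The shape of the generated wrapper can be seen with a simplified sketch that takes the already formatted argument strings directly. `build_wrapper` is a hypothetical stand-in, and the C++ parameter types in the usage lines are assumptions chosen for illustration:

```python
def build_wrapper(caller, called, cpp_args, name_args):
    """Emit a non-template C++ function that forwards to the templated one."""
    return ("EXTENSION_NOINLINE int32_t\n"
            "%s(%s) {\n"
            "  return %s(%s);\n"
            "}\n") % (caller, cpp_args, called, name_args)


wrapper = build_wrapper('foo__template_0', 'foo__template',
                        'const Column<int32_t>& input0, Column<int32_t>& output0',
                        'input0, output0')
```

The wrapper gives each template instantiation a distinct, non-template symbol that forwards its arguments to the templated implementation.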


def generate_TableFunctionsFactory_init.call_methods (   add_stmts)

Definition at line 483 of file generate_TableFunctionsFactory_init.py.

def call_methods(add_stmts):
    n_add_funcs = linker.GenerateAddTableFunctionsFiles.get_num_generated_files()
    return [ 'table_functions::add_table_functions_%d();' % (i) for i in range(n_add_funcs+1) ]
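
A standalone sketch of the list this function builds, with the file count passed in explicitly instead of being read from `linker` (the `call_statements` name is hypothetical). Note that in the listing the `add_stmts` parameter is not used, and that `range(n_add_funcs + 1)` yields one more call than the reported count:

```python
def call_statements(n_add_funcs):
    """Return the C++ call statements for indices 0..n_add_funcs inclusive."""
    return ['table_functions::add_table_functions_%d();' % i
            for i in range(n_add_funcs + 1)]


calls = call_statements(2)
```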
def generate_TableFunctionsFactory_init.find_signatures (   input_file)
Returns a list of parsed UDTF signatures.

Definition at line 84 of file generate_TableFunctionsFactory_init.py.

References join(), line_is_incomplete(), heavyai.open(), split(), and run_benchmark_import.type.

Referenced by parse_annotations().

def find_signatures(input_file):
    """Returns a list of parsed UDTF signatures."""
    signatures = []

    last_line = None
    for line in open(input_file).readlines():
        line = line.strip()
        if last_line is not None:
            line = last_line + ' ' + line
            last_line = None
        if not line.startswith('UDTF:'):
            continue
        if line_is_incomplete(line):
            last_line = line
            continue
        last_line = None
        line = line[5:].lstrip()
        i = line.find('(')
        j = line.find(')')
        if i == -1 or j == -1:
            sys.stderr.write('Invalid UDTF specification: `%s`. Skipping.\n' % (line))
            continue

        expected_result = None
        if separator in line:
            line, expected_result = line.split(separator, 1)
            expected_result = expected_result.strip().split(separator)
            expected_result = list(map(lambda s: s.strip(), expected_result))

        ast = parser.Parser(line).parse()

        if expected_result is not None:
            # Treat warnings as errors so that one can test TransformeWarnings
            warnings.filterwarnings("error")

            # Template transformer expands templates into multiple lines
            try:
                result = transformers.Pipeline(
                    transformers.TemplateTransformer,
                    transformers.AmbiguousSignatureCheckTransformer,
                    transformers.FieldAnnotationTransformer,
                    transformers.TextEncodingDictTransformer,
                    transformers.DefaultValueAnnotationTransformer,
                    transformers.SupportedAnnotationsTransformer,
                    transformers.RangeAnnotationTransformer,
                    transformers.CursorAnnotationTransformer,
                    transformers.FixRowMultiplierPosArgTransformer,
                    transformers.RenameNodesTransformer,
                    transformers.AstPrinter)(ast)
            except (transformers.TransformerException, transformers.TransformerWarning) as msg:
                result = ['%s: %s' % (type(msg).__name__, msg)]
            assert len(result) == len(expected_result), "\n\tresult: %s \n!= \n\texpected: %s" % (
                '\n\t\t '.join(result),
                '\n\t\t '.join(expected_result)
            )
            assert set(result) == set(expected_result), "\n\tresult: %s != \n\texpected: %s" % (
                '\n\t\t '.join(result),
                '\n\t\t '.join(expected_result),
            )

        else:
            signature = transformers.Pipeline(
                transformers.TemplateTransformer,
                transformers.AmbiguousSignatureCheckTransformer,
                transformers.FieldAnnotationTransformer,
                transformers.TextEncodingDictTransformer,
                transformers.DefaultValueAnnotationTransformer,
                transformers.SupportedAnnotationsTransformer,
                transformers.RangeAnnotationTransformer,
                transformers.CursorAnnotationTransformer,
                transformers.FixRowMultiplierPosArgTransformer,
                transformers.RenameNodesTransformer,
                transformers.DeclBracketTransformer)(ast)

            signatures.extend(signature)

    return signatures
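
The handling of the `$=>$` separator in the listing (a signature followed by one or more expected results, used for testing) can be isolated as follows. `split_expected` is a hypothetical helper that mirrors the three lines of the listing dealing with `separator`; the sample signature is made up:

```python
separator = '$=>$'


def split_expected(line):
    """Split a UDTF line into the signature part and a list of expected results."""
    expected = None
    if separator in line:
        line, expected = line.split(separator, 1)
        expected = [s.strip() for s in expected.strip().split(separator)]
    return line, expected


sig_part, expected = split_expected('foo(Int32) -> Int32 $=>$ first $=>$ second')
plain_part, no_expected = split_expected('foo(Int32) -> Int32')
```

When no separator is present the expected result stays `None`, which is the case in which the listing collects signatures instead of comparing printed output.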

def generate_TableFunctionsFactory_init.format_annotations (   annotations_)

Definition at line 265 of file generate_TableFunctionsFactory_init.py.

References join().

Referenced by parse_annotations().

def format_annotations(annotations_):
    def fmt(k, v):
        # type(v) is not always 'str'
        if k == 'require' or k == 'default' and v[0] == "\"":
            return v[1:-1]
        return v

    s = "std::vector<std::map<std::string, std::string>>{"
    s += ', '.join(('{' + ', '.join('{"%s", "%s"}' % (k, fmt(k, v)) for k, v in a) + '}') for a in annotations_)
    s += "}"
    return s
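
A short usage sketch of the function, reproduced here so the block runs on its own. The sample annotations are made up for illustration; each inner list holds the `(label, value)` pairs of one argument:

```python
def format_annotations(annotations_):
    def fmt(k, v):
        # Quoted `require` values and quoted `default` values lose their quotes.
        if k == 'require' or k == 'default' and v[0] == "\"":
            return v[1:-1]
        return v

    s = "std::vector<std::map<std::string, std::string>>{"
    s += ', '.join(('{' + ', '.join('{"%s", "%s"}' % (k, fmt(k, v)) for k, v in a) + '}')
                   for a in annotations_)
    s += "}"
    return s


formatted = format_annotations([[('name', 'a')], [('require', '"a > 0"')]])
```

Each argument becomes one C++ map initializer inside the outer vector initializer.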

def generate_TableFunctionsFactory_init.format_function_args (   input_types,
  output_types,
  uses_manager,
  use_generic_arg_name,
  emit_output_args 
)

Definition at line 163 of file generate_TableFunctionsFactory_init.py.

References join().

Referenced by build_preflight_function(), and build_template_function_call().

def format_function_args(input_types, output_types, uses_manager, use_generic_arg_name, emit_output_args):
    cpp_args = []
    name_args = []

    if uses_manager:
        cpp_args.append('TableFunctionManager& mgr')
        name_args.append('mgr')

    for idx, typ in enumerate(input_types):
        cpp_arg, name = typ.format_cpp_type(idx,
                                            use_generic_arg_name=use_generic_arg_name,
                                            is_input=True)
        cpp_args.append(cpp_arg)
        name_args.append(name)

    if emit_output_args:
        for idx, typ in enumerate(output_types):
            cpp_arg, name = typ.format_cpp_type(idx,
                                                use_generic_arg_name=use_generic_arg_name,
                                                is_input=False)
            cpp_args.append(cpp_arg)
            name_args.append(name)

    cpp_args = ', '.join(cpp_args)
    name_args = ', '.join(name_args)
    return cpp_args, name_args
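
The function relies on each type object providing `format_cpp_type`. The sketch below exercises a condensed copy of it with a stub type; `StubType` and the strings it returns are assumptions made for illustration and do not reflect the real `declbracket` classes:

```python
class StubType:
    """Hypothetical stand-in for a declbracket type object."""

    def __init__(self, cpp_type, name):
        self.cpp_type = cpp_type
        self.name = name

    def format_cpp_type(self, idx, use_generic_arg_name, is_input):
        if use_generic_arg_name:
            name = ('input%d' if is_input else 'output%d') % idx
        else:
            name = self.name
        return '%s %s' % (self.cpp_type, name), name


def format_function_args(input_types, output_types, uses_manager,
                         use_generic_arg_name, emit_output_args):
    cpp_args, name_args = [], []
    if uses_manager:
        cpp_args.append('TableFunctionManager& mgr')
        name_args.append('mgr')
    groups = [(input_types, True)]
    if emit_output_args:
        groups.append((output_types, False))
    for types, is_input in groups:
        for idx, typ in enumerate(types):
            cpp_arg, name = typ.format_cpp_type(
                idx, use_generic_arg_name=use_generic_arg_name, is_input=is_input)
            cpp_args.append(cpp_arg)
            name_args.append(name)
    return ', '.join(cpp_args), ', '.join(name_args)


decl, call = format_function_args([StubType('int32_t', 'x')], [StubType('int32_t', 'y')],
                                  uses_manager=True, use_generic_arg_name=True,
                                  emit_output_args=True)
```

The first return value is suitable for a C++ parameter list and the second for the matching call expression.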

def generate_TableFunctionsFactory_init.is_cpu_function (   sig)

Definition at line 287 of file generate_TableFunctionsFactory_init.py.

References uses_manager().

Referenced by parse_annotations().

def is_cpu_function(sig):
    # Any function that does not have _gpu_ suffix is a cpu function.
    i = sig.name.rfind('_gpu_')
    if i >= 0 and '__' in sig.name[:i + 1]:
        if uses_manager(sig):
            raise ValueError('Table function {} with gpu execution target cannot have TableFunctionManager argument'.format(sig.name))
        return False
    return True


def generate_TableFunctionsFactory_init.is_gpu_function (   sig)

Definition at line 297 of file generate_TableFunctionsFactory_init.py.

References uses_manager().

Referenced by parse_annotations().

def is_gpu_function(sig):
    # A function with TableFunctionManager argument is a cpu-only function
    if uses_manager(sig):
        return False
    # Any function that does not have _cpu_ suffix is a gpu function.
    i = sig.name.rfind('_cpu_')
    return not (i >= 0 and '__' in sig.name[:i + 1])
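
The naming rules of `is_cpu_function` and `is_gpu_function` can be tried out with stub signatures. The three functions below are copied from the listings; `make_sig` and the function names passed to it are hypothetical:

```python
from types import SimpleNamespace


def uses_manager(sig):
    return sig.inputs and sig.inputs[0].name == 'TableFunctionManager'


def is_cpu_function(sig):
    i = sig.name.rfind('_gpu_')
    if i >= 0 and '__' in sig.name[:i + 1]:
        if uses_manager(sig):
            raise ValueError('Table function {} with gpu execution target cannot have '
                             'TableFunctionManager argument'.format(sig.name))
        return False
    return True


def is_gpu_function(sig):
    if uses_manager(sig):
        return False
    i = sig.name.rfind('_cpu_')
    return not (i >= 0 and '__' in sig.name[:i + 1])


def make_sig(name, with_manager=False):
    """Build a minimal object with the two attributes the checks read."""
    inputs = [SimpleNamespace(name='TableFunctionManager')] if with_manager else []
    return SimpleNamespace(name=name, inputs=inputs)


targets = {name: (is_cpu_function(make_sig(name)), bool(is_gpu_function(make_sig(name))))
           for name in ('my_udtf', 'my_udtf__cpu_1', 'my_udtf__gpu_1')}
```

A name without either marker targets both devices. A `_cpu_` or `_gpu_` marker only counts when a double underscore precedes it, and a function that takes a `TableFunctionManager` is treated as CPU-only.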


def generate_TableFunctionsFactory_init.is_template_function (   sig)

Definition at line 278 of file generate_TableFunctionsFactory_init.py.

Referenced by parse_annotations().

def is_template_function(sig):
    i = sig.name.rfind('_template')
    return i >= 0 and '__' in sig.name[:i + 1]
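
A quick check of which names this test accepts, using a stub object with only a `name` attribute (the sample names are made up):

```python
from types import SimpleNamespace


def is_template_function(sig):
    i = sig.name.rfind('_template')
    return i >= 0 and '__' in sig.name[:i + 1]


results = [is_template_function(SimpleNamespace(name=n))
           for n in ('foo__template', 'foo_template', 'foo')]
```

Only the first name qualifies, because `_template` must be preceded by a second underscore, so that the name contains `__template`.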


def generate_TableFunctionsFactory_init.line_is_incomplete (   line)

Definition at line 77 of file generate_TableFunctionsFactory_init.py.

Referenced by find_signatures().

def line_is_incomplete(line):
    # TODO: try to parse the line to be certain about completeness.
    # `$=>$' is used to separate the UDTF signature and the expected result
    return line.endswith(',') or line.endswith('->') or line.endswith(separator) or line.endswith('|')


# fmt: off
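
The role of this check is to let a UDTF specification span several source lines. The sketch below isolates the continuation logic that `find_signatures` builds on top of it; `join_continuations` is a hypothetical helper and the sample lines are made up:

```python
separator = '$=>$'


def line_is_incomplete(line):
    return (line.endswith(',') or line.endswith('->')
            or line.endswith(separator) or line.endswith('|'))


def join_continuations(lines):
    """Merge UDTF specifications that are split across several lines."""
    joined, pending = [], None
    for line in lines:
        line = line.strip()
        if pending is not None:
            line = pending + ' ' + line
            pending = None
        if not line.startswith('UDTF:'):
            continue
        if line_is_incomplete(line):
            pending = line  # wait for the next line before deciding
            continue
        joined.append(line)
    return joined


merged = join_continuations(['UDTF: foo(ColumnInt32,',
                             '          ColumnInt64) ->',
                             '  ColumnInt32',
                             'int32_t unrelated_line;'])
```

The check is purely textual: a line counts as incomplete only if it ends with `,`, `->`, `|`, or the separator, as the TODO comment in the listing acknowledges.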


def generate_TableFunctionsFactory_init.must_emit_preflight_function (   sig,
  sizer 
)

Definition at line 255 of file generate_TableFunctionsFactory_init.py.

Referenced by parse_annotations().

def must_emit_preflight_function(sig, sizer):
    if sizer.is_arg_sizer():
        return True
    for arg_annotations in sig.input_annotations:
        d = dict(arg_annotations)
        if 'require' in d.keys():
            return True
    return False
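
The decision can be exercised with stub objects. `StubSizer` and the sample annotations are hypothetical; only the `is_arg_sizer()` method and the `input_annotations` attribute used in the listing are modeled:

```python
from types import SimpleNamespace


class StubSizer:
    """Hypothetical sizer exposing only is_arg_sizer()."""

    def __init__(self, arg_sizer):
        self._arg_sizer = arg_sizer

    def is_arg_sizer(self):
        return self._arg_sizer


def must_emit_preflight_function(sig, sizer):
    if sizer.is_arg_sizer():
        return True
    for arg_annotations in sig.input_annotations:
        if 'require' in dict(arg_annotations):
            return True
    return False


plain = SimpleNamespace(input_annotations=[[('name', 'a')]])
guarded = SimpleNamespace(input_annotations=[[('name', 'a')], [('require', '"a > 0"')]])
decisions = (must_emit_preflight_function(plain, StubSizer(False)),
             must_emit_preflight_function(guarded, StubSizer(False)),
             must_emit_preflight_function(plain, StubSizer(True)))
```

A preflight function is needed either when the output size is computed from an argument expression or when at least one argument carries a `require` annotation.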


def generate_TableFunctionsFactory_init.parse_annotations (   input_files)

Definition at line 306 of file generate_TableFunctionsFactory_init.py.

References build_preflight_function(), build_template_function_call(), find_signatures(), format_annotations(), is_cpu_function(), is_gpu_function(), is_template_function(), join(), and must_emit_preflight_function().

def parse_annotations(input_files):

    counter = 0

    add_stmts = []
    cpu_template_functions = []
    gpu_template_functions = []
    cpu_function_address_expressions = []
    gpu_function_address_expressions = []
    cond_fns = []

    for input_file in input_files:
        for sig in find_signatures(input_file):

            # Compute sql_types, input_types, and sizer
            sql_types_ = []
            input_types_ = []
            input_annotations = []

            sizer = None
            if sig.sizer is not None:
                expr = sig.sizer.value
                sizer = declbracket.Bracket('kPreFlightParameter', (expr,))

            uses_manager = False
            for i, (t, annot) in enumerate(zip(sig.inputs, sig.input_annotations)):
                if t.is_output_buffer_sizer():
                    if t.is_user_specified():
                        sql_types_.append(declbracket.Bracket.parse('int32').normalize(kind='input'))
                        input_types_.append(sql_types_[-1])
                        input_annotations.append(annot)
                    assert sizer is None # exactly one sizer argument is allowed
                    assert len(t.args) == 1, t
                    sizer = t
                elif t.name == 'Cursor':
                    for t_ in t.args:
                        input_types_.append(t_)
                    input_annotations.append(annot)
                    sql_types_.append(declbracket.Bracket('Cursor', args=()))
                elif t.name == 'TableFunctionManager':
                    if i != 0:
                        raise ValueError('{} must appear as a first argument of {}, but found it at position {}.'.format(t, sig.name, i))
                    uses_manager = True
                else:
                    input_types_.append(t)
                    input_annotations.append(annot)
                    if t.is_column_any():
                        # XXX: let Bracket handle mapping of column to cursor(column)
                        sql_types_.append(declbracket.Bracket('Cursor', args=()))
                    else:
                        sql_types_.append(t)

            if sizer is None:
                name = 'kTableFunctionSpecifiedParameter'
                idx = 1 # this sizer is not actually materialized in the UDTF
                sizer = declbracket.Bracket(name, (idx,))

            assert sizer is not None
            ns_output_types = tuple([a.apply_namespace(ns='ExtArgumentType') for a in sig.outputs])
            ns_input_types = tuple([t.apply_namespace(ns='ExtArgumentType') for t in input_types_])
            ns_sql_types = tuple([t.apply_namespace(ns='ExtArgumentType') for t in sql_types_])

            sig.function_annotations.append(('uses_manager', str(uses_manager).lower()))

            input_types = 'std::vector<ExtArgumentType>{%s}' % (', '.join(map(util.tostring, ns_input_types)))
            output_types = 'std::vector<ExtArgumentType>{%s}' % (', '.join(map(util.tostring, ns_output_types)))
            sql_types = 'std::vector<ExtArgumentType>{%s}' % (', '.join(map(util.tostring, ns_sql_types)))
            annotations = format_annotations(input_annotations + sig.output_annotations + [sig.function_annotations])

            # Notice that input_types and sig.input_types, (and
            # similarly, input_annotations and sig.input_annotations)
            # have different lengths when the sizer argument is
            # Constant or TableFunctionSpecifiedParameter. That is,
            # input_types contains all the user-specified arguments
            # while sig.input_types contains all arguments of the
            # implementation of an UDTF.

            if must_emit_preflight_function(sig, sizer):
                fn_name = '%s_%s' % (sig.name, str(counter)) if is_template_function(sig) else sig.name
                check_fn = build_preflight_function(fn_name, sizer, input_types_, sig.outputs, uses_manager)
                cond_fns.append(check_fn)

            if is_template_function(sig):
                name = sig.name + '_' + str(counter)
                counter += 1
                t = build_template_function_call(name, sig.name, input_types_, sig.outputs, uses_manager)
                address_expression = ('avoid_opt_address(reinterpret_cast<void*>(%s))' % name)
                if is_cpu_function(sig):
                    cpu_template_functions.append(t)
                    cpu_function_address_expressions.append(address_expression)
                if is_gpu_function(sig):
                    gpu_template_functions.append(t)
                    gpu_function_address_expressions.append(address_expression)
                add = ('TableFunctionsFactory::add("%s", %s, %s, %s, %s, %s, /*is_runtime:*/false);'
                       % (name, sizer.format_sizer(), input_types, output_types, sql_types, annotations))
                add_stmts.append(add)

            else:
                add = ('TableFunctionsFactory::add("%s", %s, %s, %s, %s, %s, /*is_runtime:*/false);'
                       % (sig.name, sizer.format_sizer(), input_types, output_types, sql_types, annotations))
                add_stmts.append(add)
                address_expression = ('avoid_opt_address(reinterpret_cast<void*>(%s))' % sig.name)

                if is_cpu_function(sig):
                    cpu_function_address_expressions.append(address_expression)
                if is_gpu_function(sig):
                    gpu_function_address_expressions.append(address_expression)

    return add_stmts, cpu_template_functions, gpu_template_functions, cpu_function_address_expressions, gpu_function_address_expressions, cond_fns
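
One detail of the listing that is easy to miss is the shared `counter`: every template signature receives a numbered wrapper name, while other signatures keep their own name. The sketch below reduces that bookkeeping to plain strings; `registered_names` is a hypothetical helper and works on names rather than on signature objects:

```python
def is_template_name(name):
    i = name.rfind('_template')
    return i >= 0 and '__' in name[:i + 1]


def registered_names(sig_names):
    """Return (registered name, address expression) pairs in input order."""
    counter = 0
    out = []
    for sig_name in sig_names:
        if is_template_name(sig_name):
            name = sig_name + '_' + str(counter)  # numbered wrapper name
            counter += 1
        else:
            name = sig_name
        out.append((name, 'avoid_opt_address(reinterpret_cast<void*>(%s))' % name))
    return out


pairs = registered_names(['foo__template', 'bar', 'foo__template'])
```

The counter is global across all input files, so two expansions of the same template never collide on a wrapper name.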

def generate_TableFunctionsFactory_init.uses_manager (   sig)

Definition at line 283 of file generate_TableFunctionsFactory_init.py.

Referenced by table_functions::TableFunctionsFactory.add(), is_cpu_function(), is_gpu_function(), and com.mapd.parser.server.ExtensionFunctionSignatureParser.toSignature().

def uses_manager(sig):
    return sig.inputs and sig.inputs[0].name == 'TableFunctionManager'


Variable Documentation

list generate_TableFunctionsFactory_init.add_stmts = []

Definition at line 434 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.add_tf_generated_files
Initial value:
= linker.GenerateAddTableFunctionsFiles(dirname, stmts,
                                        header_file)

Definition at line 467 of file generate_TableFunctionsFactory_init.py.

list generate_TableFunctionsFactory_init.canonical_input_files = [input_file[input_file.find("/QueryEngine/") + 1:] for input_file in input_files]

Definition at line 441 of file generate_TableFunctionsFactory_init.py.
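
The slicing expression keeps everything from `QueryEngine/` onward, so the generated `#include` lines are relative to the source root. A small sketch with made-up paths (both sample paths are hypothetical):

```python
input_files = ['/home/user/src/QueryEngine/TableFunctions/Foo.hpp',  # hypothetical path
               'Standalone.hpp']                                      # no /QueryEngine/ part

canonical_input_files = [input_file[input_file.find("/QueryEngine/") + 1:]
                         for input_file in input_files]
header_file = ['#include "' + canonical_input_file + '"'
               for canonical_input_file in canonical_input_files]
```

When `/QueryEngine/` does not occur in a path, `find` returns -1, the slice starts at index 0, and the path is used unchanged.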

list generate_TableFunctionsFactory_init.cond_fns = []

Definition at line 439 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.content

Definition at line 488 of file generate_TableFunctionsFactory_init.py.

list generate_TableFunctionsFactory_init.cpu_address_expressions = []

Definition at line 437 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.cpu_generated_files
Initial value:
= linker.GenerateTemplateFiles(dirname, cpu_fns,
                               header_file, 'cpu')

Definition at line 473 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.cpu_output_header = os.path.splitext(output_filename)

Definition at line 430 of file generate_TableFunctionsFactory_init.py.

list generate_TableFunctionsFactory_init.cpu_template_functions = []

Definition at line 435 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.dirname = os.path.dirname(output_filename)

Definition at line 444 of file generate_TableFunctionsFactory_init.py.

list generate_TableFunctionsFactory_init.gpu_address_expressions = []

Definition at line 438 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.gpu_generated_files
Initial value:
= linker.GenerateTemplateFiles(dirname, gpu_fns,
                               header_file, 'gpu')

Definition at line 478 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.gpu_output_header = os.path.splitext(output_filename)

Definition at line 431 of file generate_TableFunctionsFactory_init.py.

list generate_TableFunctionsFactory_init.gpu_template_functions = []

Definition at line 436 of file generate_TableFunctionsFactory_init.py.

list generate_TableFunctionsFactory_init.header_file = ['#include "' + canonical_input_file + '"' for canonical_input_file in canonical_input_files]

Definition at line 442 of file generate_TableFunctionsFactory_init.py.

list generate_TableFunctionsFactory_init.input_files = [os.path.join(os.path.dirname(__file__), 'test_udtf_signatures.hpp')]

Definition at line 421 of file generate_TableFunctionsFactory_init.py.

string generate_TableFunctionsFactory_init.separator = '$=>$'

Definition at line 75 of file generate_TableFunctionsFactory_init.py.

Referenced by foreign_storage::anonymous_namespace{AbstractFileStorageDataWrapper.cpp}.append_file_path(), ai.heavy.jdbc.HeavyAIEscapeFunctions.appendCall(), foreign_storage::AbstractFileStorageDataWrapper.getFullFilePath(), import_export.import_thread_shapefile(), and pop_n_rows_from_merged_heaps_gpu().