SQL-based ingestion known issues
This page describes SQL-based batch ingestion using the druid-multi-stage-query
extension, new in Druid 24.0. Refer to the ingestion methods table to determine which
ingestion method is right for you.
Multi-stage query task runtime
Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.
Worker task stage outputs are stored in the working directory given by druid.indexer.task.baseDir. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an UnknownError with a message including "No space left on device".
SELECT Statement
GROUPING SETS are not implemented. Queries that use this feature return a QueryNotSupported error.
INSERT and REPLACE Statements
INSERT and REPLACE statements with column lists, like INSERT INTO tbl (a, b, c) SELECT ..., are not implemented.
INSERT ... SELECT and REPLACE ... SELECT insert columns from the SELECT statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position.
INSERT and REPLACE do not support all options available in ingestion specs, including the createBitmapIndex and multiValueHandling dimension properties, and the indexSpec tuningConfig property.
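Because column lists are not available and columns are matched by name, one way to control the mapping is to alias every SELECT expression to its target column name. The following is a minimal Python sketch that generates such a statement. The helper name build_insert, the table names, and the column names are hypothetical, and PARTITIONED BY DAY is only an illustrative default.

```python
def build_insert(table, source, columns, partitioned_by="DAY"):
    """Build an INSERT ... SELECT statement as a string.

    `columns` maps each target column name to the source SQL expression.
    Every expression is aliased to its target name, because columns are
    matched by name rather than by position.
    """
    select_list = ",\n  ".join(
        f'{expr} AS "{name}"' for name, expr in columns.items()
    )
    return (
        f'INSERT INTO "{table}"\n'
        f"SELECT\n  {select_list}\n"
        f'FROM "{source}"\n'
        f"PARTITIONED BY {partitioned_by}"
    )


# Hypothetical usage: the order of the SELECT list does not affect which
# target column each value lands in; only the aliases do.
sql = build_insert(
    "tbl",
    "src",
    {"__time": 'TIME_PARSE("ts")', "a": '"x"', "b": '"y"'},
)
print(sql)
```

Reordering the entries in the dictionary changes the order of the SELECT list but not the resulting column assignment, which is the practical consequence of name-based matching.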
EXTERN Function
The schemaless dimensions feature is not available. All columns and their types must be specified explicitly using the signature parameter of the EXTERN function.
EXTERN with input sources that match large numbers of files may exhaust available memory on the controller task.
EXTERN refers to external files. Use FROM to access druid input sources.
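Since every column and type must be listed in the signature, it can help to generate the EXTERN call from an explicit column list. The sketch below assumes the three-argument form of EXTERN (input source, input format, signature, each passed as a JSON string). The helper name build_extern, the URI, and the column names are hypothetical.

```python
import json


def build_extern(input_source, input_format, columns):
    """Return a TABLE(EXTERN(...)) clause with an explicit signature.

    `columns` is a list of (name, type) pairs. Every column must be
    listed, because schemaless dimension discovery is not available.
    """
    signature = [{"name": name, "type": col_type} for name, col_type in columns]
    args = [
        json.dumps(input_source),
        json.dumps(input_format),
        json.dumps(signature),
    ]
    # Double any single quotes so each JSON document is a valid SQL string literal.
    quoted = ",\n    ".join("'" + arg.replace("'", "''") + "'" for arg in args)
    return f"TABLE(\n  EXTERN(\n    {quoted}\n  )\n)"


# Hypothetical usage with a placeholder URI and three explicitly typed columns.
clause = build_extern(
    {"type": "http", "uris": ["https://example.com/data.json.gz"]},
    {"type": "json"},
    [("timestamp", "string"), ("page", "string"), ("added", "long")],
)
print(clause)
```

Keeping the input source to a small, explicit set of URIs also limits how many files the controller task has to track.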
WINDOW Function
- A window cannot contain more than 100,000 elements.
- To avoid leafOperators in the MSQ engine, window functions have an extra scan stage after the window stage for cases where the native engine has a non-empty leafOperator.
Automatic compaction
The following known issues and limitations affect automatic compaction with the MSQ task engine:
- The metricSpec field is only supported for certain aggregators. For more information, see Supported aggregators.
- Only dynamic and range-based partitioning are supported.
- Set rollup to true if and only if metricSpec is neither empty nor null.
- You can only partition on string dimensions. However, multi-valued string dimensions are not supported.
- The maxTotalRows config is not supported in DynamicPartitionsSpec. Use maxRowsPerSegment instead.
- Segments can only be sorted on __time as the first column.
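Some of the compaction rules above can be checked mechanically before a config is submitted. The following Python sketch does so for a simplified, flat dictionary. That layout is hypothetical and is not the actual compaction config schema; only the names metricSpec, rollup, maxTotalRows, and maxRowsPerSegment come from the list above. The partitioning type strings "dynamic" and "range", and the default of "dynamic" when no type is given, are assumptions.

```python
def check_msq_compaction(config):
    """Return a list of problems found in a simplified compaction config.

    The flat dict layout is hypothetical and only mirrors the rules in
    the list above; it is not the real compaction config schema.
    """
    problems = []

    # An empty list and None both count as "no metricSpec".
    has_metrics = bool(config.get("metricSpec"))
    rollup = bool(config.get("rollup", False))
    if rollup != has_metrics:
        problems.append("rollup must be true if and only if metricSpec is non-empty")

    partitions = config.get("partitionsSpec") or {}
    # Assumption: a missing type is treated as dynamic partitioning.
    partition_type = partitions.get("type", "dynamic")
    if partition_type not in ("dynamic", "range"):
        problems.append("only dynamic and range partitioning are supported")
    if partition_type == "dynamic" and "maxTotalRows" in partitions:
        problems.append("maxTotalRows is not supported; use maxRowsPerSegment")

    return problems


# Hypothetical usage: rollup is enabled without any metrics, so one problem is reported.
print(check_msq_compaction({"rollup": True, "metricSpec": []}))
```

The check covers only the rollup, partitioning type, and maxTotalRows rules. Aggregator support, dimension types, and sort order would need information that this simplified dictionary does not carry.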