| The purpose of the **s3 select** engine is to create an efficient pipe between clients and storage nodes (the engine should be as close as possible to the storage).
| It enables selection of a restricted subset of (structured) data stored in an S3 object using an SQL-like syntax.
| It also enables higher-level analytic applications (such as Spark-SQL) to use this feature to improve their latency and throughput.
| For example, consider an s3-object of several GB (a CSV file) from which the user needs to extract a single column, filtered by another column, as in the following query:
|``select customer-id from s3Object where age>30 and age<65;``
| Currently, the whole s3-object must be retrieved from the OSD via RGW before the data is filtered and extracted.
| By "pushing down" the query into the OSD, it's possible to save a lot of network bandwidth and CPU time (serialization / deserialization).
| **The bigger the object, and the more selective the query, the better the performance**.
Basic workflow
--------------
| An s3-select query is sent to RGW via `AWS CLI <https://docs.aws.amazon.com/cli/latest/reference/s3api/select-object-content.html>`_.
| It passes through the authentication and permission checks like any other incoming message (POST).
| **RGWSelectObj_ObjStore_S3::send_response_data** is the “entry point”; it handles each fetched chunk according to the input object-key.
| **send_response_data** first handles the input: it extracts the query and the other CLI parameters.
| For each newly fetched chunk (~4 MB), RGW executes the s3-select query on it.
| The current implementation supports CSV objects, and since chunk boundaries randomly “cut” CSV rows in the middle, those broken lines (the first or last in a chunk) are skipped while processing the query.
| Those “broken” lines are stored and later merged with the next broken line (belonging to the next chunk), and finally processed.
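| A minimal Python sketch of that carry-and-merge logic (illustrative only; the actual implementation is C++ inside RGW):

.. code-block:: python

    def process_chunks(chunks, process_row):
        """Feed only complete CSV rows to the query; carry partial rows over."""
        carry = b""
        for chunk in chunks:
            data = carry + chunk
            # The element after the last row delimiter is an incomplete row
            # (or empty); keep it and merge it with the next chunk.
            *rows, carry = data.split(b"\n")
            for row in rows:
                process_row(row)
        if carry:  # flush a final row that had no trailing delimiter
            process_row(carry)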
| For each processed chunk, an output message is formatted according to the `AWS specification <https://docs.aws.amazon.com/AmazonS3/latest/API/archive-RESTObjectSELECTContent.html#archive-RESTObjectSELECTContent-responses>`_ and sent back to the client.
| RGW supports the following response: ``{:event-type,records} {:content-type,application/octet-stream} {:message-type,event}``.
| For aggregation queries, the last chunk must be identified as the end of input; at that point the s3-select engine runs its end-of-process logic and produces the aggregated result.
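| On the client side, the end of the stream is what signals that an aggregate result is final. Below is a hedged boto3-style sketch of consuming such an event stream (event keys follow the AWS SelectObjectContent response):

.. code-block:: python

    def read_select_stream(payload):
        """Collect Records payloads from a SelectObjectContent event stream."""
        records = []
        for event in payload:          # boto3 yields one dict per event
            if 'Records' in event:     # {:event-type,records} carries data
                records.append(event['Records']['Payload'])
            elif 'End' in event:       # stream finished; aggregates are final
                break
        return b''.join(records)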
Basic functionalities
~~~~~~~~~~~~~~~~~~~~~
| **S3select** has a definite set of functionalities that should be implemented (if we wish to stay compliant with AWS); currently only a portion of them is implemented.
| The implemented software architecture supports basic arithmetic expressions, logical and compare expressions, including nested function calls and casting operators, which alone give the user reasonable flexibility.
| Any error that occurs while processing the input query, i.e. during the parsing phase or the execution phase, is returned to the client as an error response message.
| Fatal severity (attached to the exception) ends query execution immediately; errors of other severities are counted, and upon reaching 100 of them query execution ends with an error message.
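| A small sketch of that error policy (illustrative; the names and the counting mechanism are assumptions, not the engine's actual C++ interfaces):

.. code-block:: python

    MAX_ERRORS = 100  # threshold described above

    def on_error(state, severity, message):
        """Abort on fatal errors; otherwise count and abort at the limit."""
        if severity == 'fatal':
            raise RuntimeError(message)  # ends query execution immediately
        state['errors'] += 1
        if state['errors'] >= MAX_ERRORS:
            raise RuntimeError('too many errors: ' + message)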
| Currently only part of the `AWS select command <https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-select.html>`_ is implemented; the table below describes what is currently supported.
| NULL is a legitimate value in ceph-s3select, as in other DB systems; i.e., the system needs to handle the case where a value is NULL.
| The definition of NULL in our context is missing/unknown; in that sense **NULL can not produce a value in ANY arithmetic operation** (a + NULL will produce a NULL value).
| The same holds for arithmetic comparison: **any comparison to NULL is NULL**, i.e. unknown.
| Below is a truth table covering the NULL use-cases.
| The `timestamp functionalities <https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-date.html>`_ are partially implemented.
| The casting operator ``timestamp( string )`` converts a string to the timestamp basic type.
| Currently it can convert the following pattern: ``yyyy:mm:dd hh:mi:dd``.
| ``extract( date-part , timestamp )`` : returns an integer extracted from the input timestamp according to date-part.
| Supported date-parts: year, month, week, day.
| ``dateadd( date-part , integer , timestamp )`` : returns a timestamp, the result of adding the given number of date-part units to the input timestamp.
| Supported date-parts: year, month, day.
| ``datediff( date-part , timestamp , timestamp )`` : returns an integer, the calculated difference between the two timestamps according to date-part.
| Supported date-parts: year, month, day, hours.
| ``utcnow()`` : returns the timestamp of the current time.
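| A few illustrative queries exercising these functions; treat them as sketches rather than authoritative syntax, and note that the positional column reference ``_1`` and the sample value are assumptions:

.. code-block:: python

    # _1 is assumed to hold a string such as "2007:09:01 23:30:10".
    queries = [
        "select timestamp(_1) from s3Object;",
        "select extract(year, timestamp(_1)) from s3Object;",
        "select dateadd(month, 2, timestamp(_1)) from s3Object;",
        "select datediff(day, timestamp(_1), utcnow()) from s3Object;",
    ]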
Aggregation functions
~~~~~~~~~~~~~~~~~~~~~
| ``count()`` : returns an integer equal to the number of rows matching the condition (if one exists).
| ``sum(expression)`` : returns the sum of the expression over all rows matching the condition (if one exists).
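| Illustrative aggregation queries (the column name, the ``int()`` cast, and the ``count(*)`` argument form are assumptions; per the workflow above, the aggregated result is produced only after the last chunk is processed):

.. code-block:: python

    queries = [
        "select count(*) from s3Object where age > 30 and age < 65;",
        "select sum(int(age)) from s3Object;",
    ]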
| The **alias** programming construct is an essential part of the s3-select language; it enables much better programming, especially with objects containing many columns, or in the case of complex queries.
| Upon parsing a statement containing an alias construct, the engine replaces the alias with a reference to the correct projection column; at query execution time the reference is evaluated like any other expression.
| There is a risk that a self (or cyclic) reference may occur, causing a stack overflow (endless loop); to address that concern, upon evaluating an alias it is validated against cyclic references.
| An alias also maintains a result cache, meaning that when the same alias is used more than once, the same expression is not evaluated again (it would return the same result); instead, the result is taken from the cache.
| Of course, for each new row the cache is invalidated.
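| An illustrative query using aliases (the ``int()`` cast and the positional columns are assumptions); ``a`` is evaluated once per row, and its cached result is reused when ``b`` and the third projection are evaluated:

.. code-block:: python

    query = "select int(_1) as a, a + int(_2) as b, a + b from s3Object;"
    # A query such as "select a + 1 as a from s3Object;" would be rejected
    # by the cyclic-reference validation described above.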
Sending Query to RGW
--------------------
| Any HTTP client can send an s3-select request to RGW; it must be compliant with the `AWS Request syntax <https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#API_SelectObjectContent_RequestSyntax>`_.
| Sending an s3-select request to RGW using the AWS CLI should follow the `AWS command reference <https://docs.aws.amazon.com/cli/latest/reference/s3api/select-object-content.html>`_.
| **Input serialization** (implemented) lets the user define the CSV format; the default values are {\\n} for the row delimiter, {,} for the field delimiter, {"} for the quote character, and {\\} for the escape character.
| It also handles the **csv-header-info**, i.e. the first row of the input object, containing the schema.
| **Output serialization** is currently not implemented; the same holds for **compression-type**.
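| A minimal boto3 sketch of such a request, assuming a Ceph RGW endpoint at ``http://localhost:8000`` and a hypothetical bucket, object, and credentials; the input-serialization values shown are the defaults described above:

.. code-block:: python

    import boto3

    s3 = boto3.client('s3',
                      endpoint_url='http://localhost:8000',
                      aws_access_key_id='ACCESS_KEY',
                      aws_secret_access_key='SECRET_KEY')

    r = s3.select_object_content(
        Bucket='mybucket',                     # hypothetical bucket
        Key='customers.csv',                   # hypothetical CSV object
        ExpressionType='SQL',
        Expression="select customer-id from s3Object where age>30 and age<65;",
        InputSerialization={'CSV': {'RecordDelimiter': '\n',
                                    'FieldDelimiter': ',',
                                    'QuoteCharacter': '"',
                                    'QuoteEscapeCharacter': '\\'}},
        OutputSerialization={'CSV': {}})

    # Records events carry the query output as raw bytes.
    for event in r['Payload']:
        if 'Records' in event:
            print(event['Records']['Payload'].decode(), end='')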
| The s3-select engine contains a CSV parser, which parses s3-objects according to the following rules (a sketch of these rules follows the list):
| - Each row ends with the row delimiter.
| - The field separator separates adjacent columns; successive field separators define a NULL column.
| - The quote character overrides the field separator, meaning a field separator between quotes is treated as an ordinary character.
| - The escape character disables the interpretation of any special character, except the row delimiter.
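| A simplified Python model of those parsing rules (illustrative only; the engine's actual parser is C++ and more complete):

.. code-block:: python

    def parse_csv_row(line, sep=',', quote='"', esc='\\'):
        """Split one row into fields per the rules above; '' becomes NULL."""
        fields, cur, in_quotes, escaped = [], [], False, False
        for ch in line:
            if escaped:                    # escape disables special meaning
                cur.append(ch)
                escaped = False
            elif ch == esc:
                escaped = True
            elif ch == quote:
                in_quotes = not in_quotes  # quotes override the separator
            elif ch == sep and not in_quotes:
                fields.append(''.join(cur))
                cur = []
            else:
                cur.append(ch)
        fields.append(''.join(cur))
        # Successive separators yield empty fields, i.e. NULL columns.
        return [f if f else None for f in fields]

    # parse_csv_row('a,"b,c",,d')  ->  ['a', 'b,c', None, 'd']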