Lenses SQL Engine¶
Lenses SQL Engine, trusted by world class customers, provides an enterprise-grade streaming integration to query and process real-time data streams over Apache Kafka. It makes building reliable, scalable and fault tolerant data pipelines easy while providing a full visual topology of your flows. You can run both bound and continous queries to get real-time insights on your data. The engine is supported by a large ecosystem of integration points like a JDBC driver, Python and Golang clients, rest endpoints and command line interfaces.
Querying Kafka for data with SQL is as simple as querying a classic database:
SELECT * FROM payments WHERE card_type='VISA' AND amount > 10000 LIMIT 10
Here is how the SQL studio page looks.
Why SQL for Streaming & Apache Kafka
Real-time data processing has been a hot topic for years now, and there are a few frameworks providing SQL support when it comes to tackling real time streaming analytics. But none of them offered a native integration with Kafka.
Our goal is to open the framework to more users who can make use of their existing SQL skills and not have to worry about the complexity Apache Kafka brings. When using Apache Kafka, people need a way to easily visualize data and for example, to search for specific messages. Enterprize grade streaming requires more than just data visualization, it requires joining, filtering and aggregation of streams of data to provide real time insights and reports. To achieve that natively with Apache Kafka requires developer skillsets (Java, Scala, Kotlin, or any Java Virtual Machine based language) and quite a steep learning curve to master the Kafka Streams API.
SQL, as a declarative and expressive language, enables building sophisticated real-time analytics easily, allowing enterprizes to utilize employees existing skills and gets your projects into production faster.
In addition, Lenses goes a step further and takes care of deployment and monitoring of real-time stream processing with SQL, while meeting enterprize requirements for security, auditing and data governance. Integration with Kubernetes, the de facto container orchestration framework, provides the scalability and fault tolerance mission critical applications require.
Lenses SQL gives you all the necessary building blocks to operate real time data in production, giving leverage to your team to focus on the business requirements rather than the platform infrastructure and learning new technology. It simplifies real-time analytics by encapsulating the underlying complexity of the data management operations, and provides significant cost saving and much faster time to implementation, since majority of the streaming use cases can be expressed via SQL.
The engine for any data format and any structure
A Kafka record is made of a key-value pair, header and metadata (offset, topic, timestamp). When stored in a topic it can take various format: AVRO, JSON, Protobuf, any other custom format. The SQL engine has been written to handle out of the box AVRO, JSON, XML, CSV, STRING, INT, LONG and via extensions it can handle Google’s Protobuf or any format one may use.
Being agnostic to the data structure a user can address nested fields and arrays and provides out of the box functions to support string, date or number manipulation.
The query below is agnostic of the payload type. It works the same if the Kafka record is stored as AVRO, JSON, XML or even Google’s Protobuf.
SELECT speed , longitude AS long , latitude AS lat , current_timestamp as ts FROM car_IoT WHERE speed > 0
Auto-discovery of topics record format
Lenses platform does its best to understand what the record format and its data structures (for both key and value) on each topic in you Kafka cluster. It can work out if a topic contains AVRO, JSON, XML, INT or Binary. Distinguishing between String, Bytes, Int, Long is not possible without incurring a risk of falsely setting the payload. For these topics and the ones containing Google’s Protobuf or custom format, user input is required. Lenses user interface makes it easy to set the payloads - an operation required once. When there is no data yet on a topic Lenses does not have enough context to determine the payload type, hence it is required for the user to set both key and value payload.
The SQL engine for Apache Kafka which enterprises need
Using the SQL in an enterprise environment needs to adhere to access control, data access, compliance and regulation rules. This spans beyond the scope of securing access to the process executing the code. In a corporate environment people have access to some topics and are restricted to others. Lenses and its engine makes sure a user can only access a topic if the access control rules allow it.
Traceability is another requirement. It is paramount to know who has accessed the data and when. Therefore, all queries made are tracked and audited. Any action (provisioning, pausing or deleting) involving SQL streaming is also recorded and tracks the user, the action itself and the time.
Lenses platform empowers you with all the necessary protections, fine grained access and security, as well as data governance capabilities your organization needs.
Monitoring is built in
It’s important to be able to monitor any executing SQL be it bound or continous, and it’s meant to give your team leverage in ensuring they meet their SLAs.
The SQL engine tracks queries. At the moment only the Lenses-Cli will show you the current running queries (with information on how much data it has returned, how many records, for how long it run) and an admin has the power to terminate them. Future releases will bring this functionality in the friendly web interface.
SQL streaming is a first class citizen. It recieves special treatment and gets out of the box real time performance metrics, its own topology graph (to see how it manipulates the data) and each SQL processor (Kafka Streams application) will be rendered in the powerful topology graph.
What you get with Lenses SQL
Lenses SQL enables data enrichment, to enhance, refine or improve raw data, to make it correct and insightful to work with or even visualise. It supports all the Kafka Streams API joins ( stream-stream, stream-table, table-table)
INSERT INTO enriched_data SELECT STREAM o.orderNumber , o.status , o.flags , od.productCode FROM order_details AS od INNER JOIN orders AS o ON o.orderNumber = od.orderNumber GROUP BY sliding(2,s);
Streaming Ingestion with SQL capabilities
Once you have analyzed, enriched, or otherwise modified data, you will often want to write this data out into a storage system - you “sink” the data. As part of Lenses SQL, we enhance the Kafka Connectors configuration with an idiomatic SQL syntax to instruct the target systems with special options, simply expressed with SQL. All our Stream Reactor connectors for Kafka support SQL statement configurations.
- AVRO and JSON support
- Multiple Kafka topic to target mapping (for sinks)
- Multiple sources to target Kafka topic (for sources)
- Field selection, extraction, and filtering
- Auto creation & auto evolution for the target storage
- Error handling policies
Joins, filters and aggreates, all fully supported allow for realtime streaming analytics to be performed simply:
- Anomaly and Threat Detection
- Predictive Analytics
- Sentiment Analysis
INSERT INTO payments_fraud WITH tableCards AS ( SELECT * FROM credit_cards ) SELECT STREAM p.currency , sum(p.amount) as total , count(*) usage FROM payments AS p LEFT JOIN tableCards AS c ON p._key = c._key WHERE c.blocked is true GROUP BY tumble(1,m), p.currency
By continuously filtering and transforming data, Lenses ensures that data is ready for scoring.
- Continuously train models with real time data
- Integrate with tools data scientists love ie. Jupyter notebooks
- Visualise data and topologies
Data Scientists spend most of their time cleaning and preparing their data before start looking for insights. Lenses SQL makes it easy to spin up continuously running queries, over massive amount of data to easily transform into an easy to work with format.
- Perform pre-analysis to rectify, fill, and make use of records affected by any lack of data
- Build streaming SQL queries (sums, averages, anonymize functions, etc.)
- Preview live data as it flows through so that you can see filters and transformations continuously throughout the pipeline
- Python native library, Jupyter integration