Data Engine Python Library
Overview
The Datatailr Data Engine Python library provides a high-level interface for connecting to, querying, and interacting with the Datatailr Data Engine. It is designed to make it easy for Python developers and data scientists to execute SQL queries, fetch results, and convert data into popular Python data structures such as pandas DataFrames, Polars DataFrames, and Arrow Tables.
The library abstracts away the complexities of authentication, connection management, and HTTP session handling, allowing users to focus on data analysis and manipulation.
Key Features
- Seamless Connection: Automatically connects to the Datatailr Data Engine using environment and user context.
- SQL Execution: Run SQL queries directly from Python.
- Flexible Result Fetching: Fetch results row-by-row or all at once.
- DataFrame Conversion: Convert query results to pandas, Polars, or Arrow for further analysis.
- Ibis Integration: Optionally connect using Ibis for advanced analytics and interoperability.
- Secure and Transparent: Handles authentication and session management under the hood.
Typical Use Cases
- Data Exploration: Quickly run ad-hoc queries and analyze results in Jupyter notebooks or scripts (see the sketch after this list).
- ETL Pipelines: Integrate with data pipelines to extract, transform, and load data using SQL.
- Reporting and Analytics: Fetch data for dashboards, reports, or machine learning workflows.
- Interoperability: Use with pandas, Polars, or Arrow for seamless integration with the Python data ecosystem.
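As a concrete illustration of the Data Exploration case, the sketch below runs an ad-hoc aggregation and inspects the result in pandas. It assumes a sales_data table with region and amount columns purely for illustration; the DataEngine calls mirror the examples later on this page.
from dt.data_engine import DataEngine

engine = DataEngine()
# Run an ad-hoc aggregation and pull the result into pandas for inspection
query = "SELECT region, SUM(amount) AS total_amount FROM sales_data GROUP BY region"
df = engine.execute(query).to_pandas()
print(df.sort_values("total_amount", ascending=False))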
How It Works
The library provides a DataEngine class that manages the connection to the Datatailr Data Engine. When instantiated, it:
- Authenticates the user and sets up a secure HTTP session.
- Connects to the Data Engine using the appropriate host and port.
- Exposes methods to execute SQL queries and fetch results.
- Provides conversion utilities to transform results into popular data formats.
Example Workflow
from dt.data_engine import DataEngine
# Initialize the Data Engine client
engine = DataEngine()
# Execute a SQL query and convert the result to a pandas DataFrame
df = engine.execute("SELECT * FROM sales_data LIMIT 10").to_pandas()
print(df)
# Or fetch the results as a Polars DataFrame
pl_df = engine.execute("SELECT * FROM sales_data LIMIT 10").to_polars()
# Or as a PyArrow Table
arrow_table = engine.execute("SELECT * FROM sales_data LIMIT 10").to_arrow()
Advanced Usage
- Ibis Connection: For advanced analytics, you can obtain an Ibis connection:
  ibis_conn = engine.ibis_connection(catalog="my_catalog", schema="public")
  table = ibis_conn.table("sales_data")
  result = table.filter(table.amount > 1000).execute()
- Custom Query Execution: Use fetch_one() or fetch_all() for fine-grained control over result fetching, as shown in the sketch below.
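For instance, a minimal sketch of both fetching styles (the same pattern appears in the full example at the end of this page):
from dt.data_engine import DataEngine

engine = DataEngine()

# Row-by-row: fetch_one() returns the next row, or None when the result set is exhausted
response = engine.execute("SHOW CATALOGS")
row = response.fetch_one()
while row is not None:
    print(row)
    row = response.fetch_one()

# All at once: fetch_all() returns the rows as a Python list
rows = engine.execute("SHOW CATALOGS").fetch_all()
print(rows)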
When to Use the Data Engine Library
- When you need to interact with the Datatailr Data Engine from Python.
- When you want to leverage SQL for data analysis but need results in Python-native formats.
- When you want to integrate Datatailr data with pandas, Polars, or Arrow-based workflows.
Summary
The Datatailr Data Engine Python library bridges the gap between Datatailr’s powerful data backend and the Python data science ecosystem. It simplifies data access, accelerates analytics, and enables seamless integration with modern data tools.
For more details, see the API Reference or explore the source code.
Full Example
The example below runs the same query and consumes the results in each supported way: as a Python iterator, a Python list, a pandas DataFrame, a Polars DataFrame, and a PyArrow Table.
from dt.data_engine import DataEngine

de = DataEngine()

# Python iterator: fetch_one() returns the next row, or None when the result set is exhausted
response = de.execute("SHOW CATALOGS")
print(response.description())    # Response description
print(response.column_names())   # Response column names
catalog = response.fetch_one()
while catalog is not None:
    print(catalog)
    catalog = response.fetch_one()

# Python list
catalogs = de.execute("SHOW CATALOGS").fetch_all()
print(catalogs)

# Pandas
catalogs = de.execute("SHOW CATALOGS").to_pandas()
print(catalogs)

# Polars
catalogs = de.execute("SHOW CATALOGS").to_polars()
print(catalogs)

# Arrow
catalogs = de.execute("SHOW CATALOGS").to_arrow()
print(catalogs)