Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Database

This section will explore how to work with SQL databases in Rust. It relies on the optional SQL section in the Data chapter of the setup, where a PostgreSQL server was set up and the Census data was loaded.

Direct queries

You can direct query the data using the appropriate crate: PostgreSQL, MySql, Sqlite, MSSQL, Oracle. Other databases should also be available through these crates, such as Mariadb (MySql), ClickHouse (MySql), Redshift (PostgreSQL), Azure SQL Database (MSSql). Run this code using cargo run -r --example 2_4_1_postgresql.

use postgres::{Client, NoTls};

// Connect to postgresql
let mut client = Client::connect("host=localhost user=postgres", NoTls).unwrap();

// Query the database, returns a vector of rows
let data = client
    .query("select count(*) as count from census", &[])
    .unwrap();

// Get the first data point from the first row
let data_point: i64 = data[0].get(0);

println!("{data_point}")

note

Note that these SQL libraries generally return row-oriented data, where Polars (using the Arrow memory model) uses column-oriented data. This makes moving data between SQL and Arrow complex.

Using this method, each type of databases will have their own special connection code and return their own specific data. It's useful for queries with simple outputs, but hard to work with large outputs. This is where the ConnectorX library comes in.

SQL to Polars

Using ConnectorX, you can move data from SQL servers to Polars with get_arrow().polars(). It will return a DataFrame (that can be converted to the latest version of Polars with df-interchange as explained in the concepts section of the setup). Run this code using cargo run -r --example 2_4_2_sql_to_polars.

use connectorx::prelude::*;
use std::convert::TryFrom;

// Connect to PostgreSQL through the ConnectorX
let source_conn =
    SourceConn::try_from("postgresql://postgres:postgres@localhost:5432").unwrap();

// Prepare query (london, aged 15 years and under)
let query = &[CXQuery::from(
    "SELECT * FROM census WHERE region = 'E12000007' and age_group = 1",
)];

// ConnectorX query PostgreSQL and return Polars object
let df = get_arrow(&source_conn, None, query, None)
    .unwrap()
    .polars()
    .unwrap();

// Print table
println!("{df}");

This example will move the "London" (region = 'E12000007') 0 to 15 years old (age_group = 1) data from the SQL server into memory as a DataFrame. Further manipulations or analysis can be done on this data using Polars. Loading the entirety of the server data into memory is generally not desired. As in this example, it is possible to pre-filter or to summarize the data using an SQL query. As a more complex example, you can collect the average income, by region, using the following query: "SELECT region, avg(income)::float FROM census group by region". This can then be converted into a DataFrame with .polars().

// Prepare query
let query = &[CXQuery::from(
    "SELECT region, avg(income)::float FROM census group by region",
)];

// ConnectorX query PostgreSQL and return Polars object
let df = get_arrow(&source_conn, None, query, None)
    .unwrap()
    .polars()
    .unwrap();

// Print table
println!("{df}");

This will return a DataFrame.

shape: (10, 2)
┌───────────┬──────────────┐
│ region    ┆ avg          │
│ ---       ┆ ---          │
│ str       ┆ f64          │
╞═══════════╪══════════════╡
│ E12000001 ┆ 55032.641723 │
│ E12000002 ┆ 54990.207499 │
│ E12000003 ┆ 55114.538138 │
│ E12000004 ┆ 54806.553317 │
│ E12000005 ┆ 55269.042885 │
│ E12000006 ┆ 55091.448925 │
│ E12000007 ┆ 54888.739957 │
│ E12000008 ┆ 55153.381614 │
│ E12000009 ┆ 55028.588204 │
│ W92000004 ┆ 55042.71009  │
└───────────┴──────────────┘