Excel

In the data analysis world, Excel is still the universal aggregate statistics exchange format and quick analysis tool. It is much simpler to send a fellow researcher an Excel file with some summary statistics or a few hundred rows of data than it is to send pretty much any other data format. With the following code, you can export any data from Polars to Excel, format it as needed and even add plots.

Run this code using cargo run -r --example 5_1_1_excel.

Setup

First, lets create some summary statistics to throw into the excel file. We will create a table of mean income by region and economic activity type, in a long format (e.g. 3 columns: “region”, “econ” and “income”) and wide format (e.g. “region” as row and “econ” as column). Both df_long and df_wide are brought into memory as DataFrame.

use polars::prelude::pivot::pivot_stable;
use polars::prelude::*;
use polars_excel_writer::PolarsExcelWriter;
use rust_xlsxwriter::{Chart, ChartLegendPosition, ChartType, Workbook};

// Connect to LazyFrame
let args = ScanArgsParquet::default();
let lf = LazyFrame::scan_parquet(PlPath::from_str("./data/large/partitioned"), args).unwrap();

// Filter
let lf: LazyFrame = lf
    .filter(col("keep_type").eq(lit(1))) // Usual resident
    .filter(col("income").is_not_null());

// Mean income by region and economic activity type
let df_long = lf
    .clone()
    .group_by([col("region"), col("econ")])
    .agg([col("income").mean().round(2, RoundMode::HalfAwayFromZero)])
    .sort(["region", "econ"], Default::default())
    .with_column(col("region").replace_strict(
        lit(Series::from_iter(vec![
            "E12000001",
            "E12000002",
            "E12000003",
            "E12000004",
            "E12000005",
            "E12000006",
            "E12000007",
            "E12000008",
            "E12000009",
            "W92000004",
        ])),
        lit(Series::from_iter(vec![
            "North East",
            "North West",
            "Yorkshire and The Humber",
            "East Midlands",
            "West Midlands",
            "East of England",
            "London",
            "South East",
            "South West",
            "Wales",
        ])),
        None,
        Some(DataType::String),
    ))
    .with_column(col("econ").replace_strict(
        lit(Series::from_iter(vec![1, 2, 3, 4])),
        lit(Series::from_iter(vec![
            "Employee",
            "Self-employed",
            "Unemployed",
            "Full-time student",
        ])),
        None,
        Some(DataType::String),
    ))
    .collect()
    .unwrap();

// Pivot by economic activity type
let df_wide = pivot_stable(
    &df_long,
    ["econ"],
    Some(["region"]),
    Some(["income"]),
    false,
    None,
    None,
)
.unwrap();

println!("Long:\n{df_long}");

println!("Wide:\n{df_wide}");

Long:

shape: (40, 3)
┌────────────┬───────────────────┬──────────┐
│ region     ┆ econ              ┆ income   │
│ ---        ┆ ---               ┆ ---      │
│ str        ┆ str               ┆ f64      │
╞════════════╪═══════════════════╪══════════╡
│ North East ┆ Employee          ┆ 55257.6  │
│ North East ┆ Self-employed     ┆ 54819.89 │
│ North East ┆ Unemployed        ┆ 55263.29 │
│ North East ┆ Full-time student ┆ 54537.93 │
│ North West ┆ Employee          ┆ 54967.04 │
│ …          ┆ …                 ┆ …        │
│ South West ┆ Full-time student ┆ 55387.28 │
│ Wales      ┆ Employee          ┆ 55091.85 │
│ Wales      ┆ Self-employed     ┆ 55160.3  │
│ Wales      ┆ Unemployed        ┆ 53905.93 │
│ Wales      ┆ Full-time student ┆ 55820.75 │
└────────────┴───────────────────┴──────────┘

Wide:

shape: (10, 5)
┌──────────────────────────┬──────────┬───────────────┬────────────┬───────────────────┐
│ region                   ┆ Employee ┆ Self-employed ┆ Unemployed ┆ Full-time student │
│ ---                      ┆ ---      ┆ ---           ┆ ---        ┆ ---               │
│ str                      ┆ f64      ┆ f64           ┆ f64        ┆ f64               │
╞══════════════════════════╪══════════╪═══════════════╪════════════╪═══════════════════╡
│ North East               ┆ 55257.6  ┆ 54819.89      ┆ 55263.29   ┆ 54537.93          │
│ North West               ┆ 54967.04 ┆ 55386.59      ┆ 55798.25   ┆ 54996.17          │
│ Yorkshire and The Humber ┆ 54990.37 ┆ 54984.78      ┆ 54432.41   ┆ 55206.45          │
│ East Midlands            ┆ 54583.73 ┆ 54791.05      ┆ 55793.84   ┆ 55471.23          │
│ West Midlands            ┆ 55306.15 ┆ 55257.88      ┆ 56108.3    ┆ 54742.85          │
│ East of England          ┆ 54797.87 ┆ 55034.25      ┆ 56679.84   ┆ 56950.45          │
│ London                   ┆ 54789.13 ┆ 54922.37      ┆ 55346.13   ┆ 55049.89          │
│ South East               ┆ 55151.04 ┆ 55352.0       ┆ 54001.1    ┆ 54065.4           │
│ South West               ┆ 55179.63 ┆ 54966.94      ┆ 53915.38   ┆ 55387.28          │
│ Wales                    ┆ 55091.85 ┆ 55160.3       ┆ 53905.93   ┆ 55820.75          │
└──────────────────────────┴──────────┴───────────────┴────────────┴───────────────────┘

Excel

To write this data to Excel, we can use the polars_excel_writer crate to write Polars data from a DataFrame into Excel. This crate uses the rust_xlsxwriter crate for this, and we can use the other options in the rust_xlsxwriter crate to do anything you can do in Excel.

First, lets create an Excel workbook and write our df_long to the “long” worksheet. Note that nothing has been saved yet, but the screenshots are taken as if it had been saved. The workbook is currently in-memory and will be written at the end of this section.

// Create a new Excel writer.
let mut excel_writer = PolarsExcelWriter::new();

// Create workbook
let mut workbook = Workbook::new();

// Write long table to "long" worksheet
let ws_long = workbook.add_worksheet().set_name("long").unwrap();
excel_writer
    .write_dataframe_to_worksheet(&df_long, ws_long, 0, 0)
    .unwrap();

Long Excel data

Next, we can add a second worksheet, called “wide” with the wide data from df_wide:

// Write wide table to "wide" worksheet
let ws_wide = workbook.add_worksheet().set_name("wide").unwrap();
excel_writer
    .write_dataframe_to_worksheet(&df_wide, ws_wide, 0, 0)
    .unwrap();

Wide Excel data

Now that we have the data into excel, we can use rust_xlsxwriter to manipulate the worksheet and add anything. Here, we add a bar chart based on the data from df_wide, found in the “wide” worksheet:

// Add a chart sheet for the "wide" data

// Get some info to limit the size and shape of the graph (e.g. rows, min/max values)
let row_num = df_wide.shape().0;
let col_num = df_wide.shape().1;
let min_val: f64 = df_long
    .column("income")
    .unwrap()
    .as_series()
    .unwrap()
    .min()
    .unwrap()
    .unwrap();
let max_val: f64 = df_long
    .column("income")
    .unwrap()
    .as_series()
    .unwrap()
    .max()
    .unwrap()
    .unwrap();

// Iterate throw rows to create multiple Bars
let mut chart = Chart::new(ChartType::Bar);
for i in 1..col_num {
    chart
        .add_series()
        .set_name(("wide", 0, i as u16)) // Name of region found in row 1, column i
        .set_categories(("wide", 1, 0, row_num as u32, 0)) // Category (econ) found in column 1, row 1-year_count
        .set_values(("wide", 1, i as u16, row_num as u32, i as u16)); // FR, FC, LR, LC
}
chart.x_axis().set_name("Economic Activity Type");
chart
    .y_axis()
    .set_name("Income")
    .set_min((min_val - 1.0).round())
    .set_max((max_val + 1.0).round());
chart.legend().set_position(ChartLegendPosition::Bottom);
ws_wide.insert_chart(1, 6, &chart).unwrap();

Excel graph based on wide data

Lastly, we can save all of this to the ./data/output/income.xlsx folder:

// Save the file to disk.
workbook.save("./data/output/income.xlsx").unwrap();

Keyboard shortcuts

Data Analysis in Rust

Excel

Setup

Excel