Example 03
intermediate
DataFrame
Statistics
Plotting
EDA

Data Analysis & Visualization

This example demonstrates a full exploratory data analysis (EDA) workflow on a 20-record employee dataset. You build a DataFrame, compute descriptive statistics with deepbox/stats, group employees by department with groupBy().agg(), filter for high earners, and compute a correlation matrix across salary, experience, and age. The example writes four SVG visualizations: a salary-vs-experience scatter plot, a salary histogram, a bar chart of average salary by department, and a correlation heatmap.
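
The histogram step (`ax2.hist(salaryTensor, 8, ...)` in the source) sorts salaries into equal-width bins before drawing bars. The sketch below shows that binning in plain TypeScript, assuming NumPy-style bins that span the data's minimum to maximum with the last bin closed on the right. `binCounts` is a hypothetical helper for illustration, not a deepbox function, and deepbox's own `hist` may place its bin edges differently.

```typescript
// Hypothetical helper (not part of deepbox): count values into `bins`
// equal-width bins spanning [min, max]. Assumes NumPy-style edges where
// the last bin also includes the maximum value.
function binCounts(values: number[], bins: number): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const width = (max - min) / bins;
  const counts: number[] = new Array(bins).fill(0);
  for (const v of values) {
    // Clamp so that v === max lands in the last bin instead of overflowing
    const idx = width === 0 ? 0 : Math.min(Math.floor((v - min) / width), bins - 1);
    counts[idx] += 1;
  }
  return counts;
}

// Salary column from the example dataset
const salaries = [
  95000, 65000, 105000, 55000, 98000, 72000, 68000, 110000, 58000, 70000, 102000, 71000, 67000,
  115000, 60000, 69000, 108000, 73000, 112000, 66000,
];

console.log(binCounts(salaries, 8).join(", ")); // 3, 5, 4, 0, 0, 2, 2, 4
```

With 8 bins over $55k to $115k each bin is $7.5k wide; the two empty middle bins ($77.5k to $92.5k) separate the lower-paid cluster from the higher-paid engineers.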

Deepbox Modules Used

  • deepbox/dataframe
  • deepbox/ndarray
  • deepbox/stats
  • deepbox/plot

What You Will Learn

  • Build a DataFrame from an object of column arrays and inspect it with .shape, .columns, and .head()
  • Compute descriptive statistics (mean, std) on columns converted to tensors
  • Group rows by a column and aggregate with .groupBy().agg()
  • Filter rows with arbitrary predicate functions
  • Compute correlation matrices and generate SVG plots server-side
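
To make the group-and-aggregate and filter steps concrete, here is a plain TypeScript sketch of what `groupBy("department").agg({ salary: "mean" })` and a row predicate compute over column arrays. `groupMean` is a hypothetical helper for illustration only, not part of deepbox; it reports groups in first-appearance order, which is an assumption that happens to match the department order in the console output.

```typescript
// Column-oriented copy of the example's name, department, and salary columns
const data = {
  name: ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Henry", "Ivy", "Jack",
    "Kate", "Leo", "Mia", "Noah", "Olivia", "Paul", "Quinn", "Rachel", "Sam", "Tina"],
  department: [
    "Engineering", "Sales", "Engineering", "HR", "Engineering",
    "Sales", "Marketing", "Engineering", "HR", "Sales",
    "Engineering", "Marketing", "Sales", "Engineering", "HR",
    "Sales", "Engineering", "Marketing", "Engineering", "Sales",
  ],
  salary: [
    95000, 65000, 105000, 55000, 98000, 72000, 68000, 110000, 58000, 70000, 102000, 71000, 67000,
    115000, 60000, 69000, 108000, 73000, 112000, 66000,
  ],
};

// Hypothetical helper: mean of `values` per distinct key, in first-appearance order
// (a Map iterates in insertion order).
function groupMean(keys: string[], values: number[]): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  keys.forEach((key, i) => {
    const entry = sums.get(key) ?? { total: 0, count: 0 };
    entry.total += values[i];
    entry.count += 1;
    sums.set(key, entry);
  });
  const means = new Map<string, number>();
  for (const [key, { total, count }] of sums) means.set(key, total / count);
  return means;
}

const deptMeans = groupMean(data.department, data.salary);
for (const [dept, avg] of deptMeans) {
  console.log(`${dept}: ${avg.toFixed(2)}`);
}
// Engineering: 105625.00
// Sales: 68166.67
// HR: 57666.67
// Marketing: 70666.67

// Filtering: keep the rows whose salary satisfies an arbitrary predicate
const highEarners = data.name.filter((_, i) => data.salary[i] > 100000);
console.log(highEarners.join(", ")); // Charlie, Henry, Kate, Noah, Quinn, Sam
```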

Source Code

03-data-analysis/index.ts
```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { DataFrame } from "deepbox/dataframe";
import { tensor } from "deepbox/ndarray";
import { Figure } from "deepbox/plot";
import { corrcoef, mean, std } from "deepbox/stats";

// Runtime guards that narrow the untyped values returned by DataFrame accessors
const expectNumber = (value: unknown): number => {
  if (typeof value !== "number") {
    throw new Error("Expected number");
  }
  return value;
};

const expectNumberArray = (value: unknown): number[] => {
  if (!Array.isArray(value) || value.some((v) => typeof v !== "number")) {
    throw new Error("Expected number[]");
  }
  return value;
};

console.log("=".repeat(60));
console.log("Example 1: Data Analysis & Visualization");
console.log("=".repeat(60));

mkdirSync("docs/examples/03-data-analysis/output", { recursive: true });

// Create a DataFrame with employee information
const employeeData = new DataFrame({
  name: [
    "Alice",
    "Bob",
    "Charlie",
    "David",
    "Eve",
    "Frank",
    "Grace",
    "Henry",
    "Ivy",
    "Jack",
    "Kate",
    "Leo",
    "Mia",
    "Noah",
    "Olivia",
    "Paul",
    "Quinn",
    "Rachel",
    "Sam",
    "Tina",
  ],
  department: [
    "Engineering",
    "Sales",
    "Engineering",
    "HR",
    "Engineering",
    "Sales",
    "Marketing",
    "Engineering",
    "HR",
    "Sales",
    "Engineering",
    "Marketing",
    "Sales",
    "Engineering",
    "HR",
    "Sales",
    "Engineering",
    "Marketing",
    "Engineering",
    "Sales",
  ],
  salary: [
    95000, 65000, 105000, 55000, 98000, 72000, 68000, 110000, 58000, 70000, 102000, 71000, 67000,
    115000, 60000, 69000, 108000, 73000, 112000, 66000,
  ],
  experience: [5, 3, 8, 2, 6, 4, 3, 10, 2, 5, 7, 4, 3, 12, 3, 4, 9, 5, 11, 3],
  age: [28, 25, 32, 24, 30, 27, 26, 35, 24, 29, 31, 28, 26, 38, 27, 28, 34, 30, 36, 26],
});

// Display dataset overview
console.log("\n📊 Dataset Overview");
console.log("-".repeat(60));
console.log(`Total Employees: ${employeeData.shape[0]}`);
console.log(`Columns: ${employeeData.columns.join(", ")}`);

// Show first few rows
console.log("\n📋 First 5 Rows:");
console.log(employeeData.head(5).toString());

// Calculate descriptive statistics
console.log("\n📈 Statistical Summary");
console.log("-".repeat(60));

// Extract columns as arrays for analysis
const salaries = expectNumberArray(employeeData.get("salary").toArray());
const experiences = expectNumberArray(employeeData.get("experience").toArray());
const ages = expectNumberArray(employeeData.get("age").toArray());

// Convert arrays to tensors for statistical operations
const salaryTensor = tensor(salaries);
const expTensor = tensor(experiences);

// Calculate salary statistics
const salaryMean = Number(mean(salaryTensor).data[0]);
const salarySd = Number(std(salaryTensor).data[0]);

console.log(`Salary Statistics:`);
console.log(`  Mean: $${salaryMean.toFixed(2)}`);
console.log(`  Std Dev: $${salarySd.toFixed(2)}`);
console.log(`  Min: $${Math.min(...salaries)}`);
console.log(`  Max: $${Math.max(...salaries)}`);

// Calculate experience statistics
const expMean = Number(mean(expTensor).data[0]);
const expSd = Number(std(expTensor).data[0]);

console.log(`\nExperience Statistics:`);
console.log(`  Mean: ${expMean.toFixed(1)} years`);
console.log(`  Std Dev: ${expSd.toFixed(1)} years`);

// Group by department and calculate averages
console.log("\n🏢 Department Analysis");
console.log("-".repeat(60));

// GroupBy operation to aggregate by department
const deptGroups = employeeData.groupBy("department");
const deptStats = deptGroups.agg({
  salary: "mean",
  experience: "mean",
});

console.log("Average Salary by Department:");
console.log(deptStats.toString());

// Filter data based on conditions
console.log("\n🔍 Filtering Examples");
console.log("-".repeat(60));

// Find employees earning over $100k
const highEarners = employeeData.filter((row) => expectNumber(row.salary) > 100000);
console.log(`High Earners (>$100k): ${highEarners.shape[0]} employees`);
console.log(highEarners.select(["name", "department", "salary"]).toString());

// Filter by department
const engineeringDept = employeeData.filter((row) => row.department === "Engineering");
console.log(`\nEngineering Department: ${engineeringDept.shape[0]} employees`);

// Analyze correlations between variables
console.log("\n📊 Correlation Analysis");
console.log("-".repeat(60));

// Build an observations x variables matrix (20 x 3): one row per employee and one
// column per variable. corrcoef appears to treat columns as variables: stacking the
// three arrays as rows (3 x 20) printed an employee-by-employee matrix instead.
const dataMatrix = tensor(
  salaries.map((salary, i) => [salary, expectNumber(experiences[i]), expectNumber(ages[i])]),
);
const correlationMatrix = corrcoef(dataMatrix);

console.log("Correlation Matrix (Salary, Experience, Age):");
console.log(correlationMatrix.toString());

// Generate visualizations
console.log("\n🎨 Creating Visualizations");
console.log("-".repeat(60));

// 1. Scatter plot showing relationship between experience and salary
console.log("1. Scatter Plot: Salary vs Experience");
const fig1 = new Figure();
const ax1 = fig1.addAxes();
ax1.scatter(expTensor, salaryTensor, { color: "#1f77b4", size: 8 });
ax1.setTitle("Salary vs Experience");
ax1.setXLabel("Years of Experience");
ax1.setYLabel("Salary ($)");
const svg1 = fig1.renderSVG();
writeFileSync("docs/examples/03-data-analysis/output/salary-vs-experience.svg", svg1.svg);
console.log("   ✓ Saved: output/salary-vs-experience.svg");

// 2. Histogram showing salary distribution
console.log("2. Histogram: Salary Distribution");
const fig2 = new Figure();
const ax2 = fig2.addAxes();
ax2.hist(salaryTensor, 8, { color: "#2ca02c" });
ax2.setTitle("Salary Distribution");
ax2.setXLabel("Salary ($)");
ax2.setYLabel("Frequency");
const svg2 = fig2.renderSVG();
writeFileSync("docs/examples/03-data-analysis/output/salary-distribution.svg", svg2.svg);
console.log("   ✓ Saved: output/salary-distribution.svg");

// 3. Bar chart comparing departments
console.log("3. Bar Chart: Average Salary by Department");
// Calculate average salary for each department
const depts = ["Engineering", "Sales", "Marketing", "HR"];
const avgSalaries = depts.map((dept) => {
  const deptData = employeeData.filter((row) => row.department === dept);
  const deptSalaries = expectNumberArray(deptData.get("salary").toArray());
  return Number(mean(tensor(deptSalaries)).data[0]);
});

const fig3 = new Figure();
const ax3 = fig3.addAxes();
// Bars sit at x = 0..3 in the order of `depts` (Engineering, Sales, Marketing, HR)
ax3.bar(tensor([0, 1, 2, 3]), tensor(avgSalaries), {
  color: "#ff7f0e",
  edgecolor: "#000000",
});
ax3.setTitle("Average Salary by Department");
ax3.setXLabel("Department");
ax3.setYLabel("Average Salary ($)");
const svg3 = fig3.renderSVG();
writeFileSync("docs/examples/03-data-analysis/output/dept-salaries.svg", svg3.svg);
console.log("   ✓ Saved: output/dept-salaries.svg");

// 4. Heatmap visualizing correlations
console.log("4. Heatmap: Correlation Matrix");
const fig4 = new Figure();
const ax4 = fig4.addAxes();
ax4.heatmap(correlationMatrix, { vmin: -1, vmax: 1 });
ax4.setTitle("Correlation Matrix");
const svg4 = fig4.renderSVG();
writeFileSync("docs/examples/03-data-analysis/output/correlation-heatmap.svg", svg4.svg);
console.log("   ✓ Saved: output/correlation-heatmap.svg");

// Summary of findings
console.log("\n💡 Key Insights");
console.log("-".repeat(60));
console.log("• Engineering has the highest average salary");
console.log("• Strong positive correlation between experience and salary");
console.log("• Age shows moderate correlation with both salary and experience");
console.log("• Salary distribution shows clustering around $70k and $105k");

console.log("\n✅ Analysis Complete!");
console.log("=".repeat(60));
```
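
The recorded Console Output prints a correlation matrix far larger than 3 x 3 with every entry near 1.000, which looks like an employee-by-employee matrix rather than the intended variable-by-variable one. As an independent cross-check, the sketch below computes the 3 x 3 Pearson matrix for salary, experience, and age directly from the arrays in plain TypeScript, treating each input row as one variable. `pearson` and `corrMatrix` are hypothetical helpers, not deepbox APIs. On this dataset all three pairs come out strongly positive (about 0.94 for salary and experience, 0.91 for salary and age, and 0.99 for experience and age), so age tracks the other two more closely than the "moderate" wording in the Key Insights suggests.

```typescript
// Hypothetical helper: Pearson correlation of two equal-length samples.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((a, b) => a + b, 0) / n;
  const my = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  // The 1/n normalization cancels between numerator and denominator
  return sxy / Math.sqrt(sxx * syy);
}

// Hypothetical helper: correlation matrix where each row of `vars` is one variable.
function corrMatrix(vars: number[][]): number[][] {
  return vars.map((a) => vars.map((b) => pearson(a, b)));
}

const salaries = [
  95000, 65000, 105000, 55000, 98000, 72000, 68000, 110000, 58000, 70000, 102000, 71000, 67000,
  115000, 60000, 69000, 108000, 73000, 112000, 66000,
];
const experiences = [5, 3, 8, 2, 6, 4, 3, 10, 2, 5, 7, 4, 3, 12, 3, 4, 9, 5, 11, 3];
const ages = [28, 25, 32, 24, 30, 27, 26, 35, 24, 29, 31, 28, 26, 38, 27, 28, 34, 30, 36, 26];

const r = corrMatrix([salaries, experiences, ages]);
for (const row of r) console.log(row.map((v) => v.toFixed(3)).join("  "));
// 1.000  0.944  0.915
// 0.944  1.000  0.988
// 0.915  0.988  1.000
```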

Console Output

$ npx tsx 03-data-analysis/index.ts
============================================================
Example 1: Data Analysis & Visualization
============================================================

📊 Dataset Overview
------------------------------------------------------------
Total Employees: 20
Columns: name, department, salary, experience, age

📋 First 5 Rows:
      name   department  salary  experience  age
0    Alice  Engineering   95000           5   28
1      Bob        Sales   65000           3   25
2  Charlie  Engineering  105000           8   32
3    David           HR   55000           2   24
4      Eve  Engineering   98000           6   30

📈 Statistical Summary
------------------------------------------------------------
Salary Statistics:
  Mean: $81950.00
  Std Dev: $20215.03
  Min: $55000
  Max: $115000

Experience Statistics:
  Mean: 5.5 years
  Std Dev: 3.0 years

🏢 Department Analysis
------------------------------------------------------------
Average Salary by Department:
    department              salary          experience
0  Engineering              105625                 8.5
1        Sales   68166.66666666667  3.6666666666666665
2           HR  57666.666666666664  2.3333333333333335
3    Marketing   70666.66666666667                   4

🔍 Filtering Examples
------------------------------------------------------------
High Earners (>$100k): 6 employees
       name   department  salary
 2  Charlie  Engineering  105000
 7    Henry  Engineering  110000
10     Kate  Engineering  102000
13     Noah  Engineering  115000
16    Quinn  Engineering  108000
18      Sam  Engineering  112000

Engineering Department: 8 employees

📊 Correlation Analysis
------------------------------------------------------------
Correlation Matrix (Salary, Experience, Age):
tensor([[1, 1.000, 1.000, ..., 1.000, 1.000, 1.000]
       [1.000, 1, 1.000, ..., 1.000, 1.000, 1.000]
       [1.000, 1.000, 1, ..., 1.000, 1.000, 1.000]
       ...
       [1.000, 1.000, 1.000, ..., 1, 1.000, 1.000]
       [1.000, 1.000, 1.000, ..., 1.000, 1, 1.000]
       [1.000, 1.000, 1.000, ..., 1.000, 1.000, 1]], dtype=float64)

🎨 Creating Visualizations
------------------------------------------------------------
1. Scatter Plot: Salary vs Experience
   ✓ Saved: output/salary-vs-experience.svg
2. Histogram: Salary Distribution
   ✓ Saved: output/salary-distribution.svg
3. Bar Chart: Average Salary by Department
   ✓ Saved: output/dept-salaries.svg
4. Heatmap: Correlation Matrix
   ✓ Saved: output/correlation-heatmap.svg

💡 Key Insights
------------------------------------------------------------
• Engineering has the highest average salary
• Strong positive correlation between experience and salary
• Age shows moderate correlation with both salary and experience
• Salary distribution shows clustering around $70k and $105k

✅ Analysis Complete!
============================================================
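
The reported salary Std Dev of $20215.03 matches the population formula (divide by n) rather than the sample formula (divide by n - 1), so `std` appears to default to the population version. That is inferred from the numbers above, not from deepbox documentation. A plain TypeScript check, using a hypothetical `stdDev` helper whose `ddof` parameter mirrors NumPy's naming:

```typescript
// Hypothetical helper: standard deviation with `ddof` delta degrees of freedom
// (0 = population, 1 = sample).
function stdDev(values: number[], ddof = 0): number {
  const n = values.length;
  const mu = values.reduce((a, b) => a + b, 0) / n;
  const sumSq = values.reduce((acc, v) => acc + (v - mu) ** 2, 0);
  return Math.sqrt(sumSq / (n - ddof));
}

const salaries = [
  95000, 65000, 105000, 55000, 98000, 72000, 68000, 110000, 58000, 70000, 102000, 71000, 67000,
  115000, 60000, 69000, 108000, 73000, 112000, 66000,
];

console.log(stdDev(salaries).toFixed(2)); // 20215.03 (population, matches the output)
console.log(stdDev(salaries, 1).toFixed(2)); // 20740.18 (sample)
```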