03
DataFrame
Statistics
Plotting
EDA
Data Analysis & Visualization
This example demonstrates a full exploratory data analysis (EDA) workflow on an employee dataset with 20 records. You create a DataFrame, compute descriptive statistics using deepbox/stats, group employees by department with groupBy().agg(), filter for high earners, and compute a correlation matrix. The example produces four SVG visualizations: scatter plot, histogram, bar chart, and heatmap.
Deepbox Modules Used
deepbox/dataframe, deepbox/ndarray, deepbox/stats, deepbox/plot
What You Will Learn
- Build DataFrames from plain objects and inspect with .shape, .columns, .head()
- Compute descriptive stats (mean, std) on extracted tensor columns
- Group rows by a column and aggregate with .groupBy().agg()
- Filter rows with arbitrary predicate functions
- Compute correlation matrices and generate SVG plots server-side
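To make the grouping and filtering items concrete, here is a minimal plain-TypeScript sketch of what a group-by followed by a mean aggregation computes. It does not use deepbox: `meanBy` and the `Row` type are hypothetical names introduced for illustration, and the rows are a five-record subset of the example's dataset.

```typescript
// Plain-TypeScript sketch of "group by a column, then take the mean".
// This is NOT the deepbox implementation; meanBy and Row are illustrative names.
type Row = { name: string; department: string; salary: number };

// Five rows copied from the example dataset.
const rows: Row[] = [
  { name: "Alice", department: "Engineering", salary: 95000 },
  { name: "Bob", department: "Sales", salary: 65000 },
  { name: "Charlie", department: "Engineering", salary: 105000 },
  { name: "David", department: "HR", salary: 55000 },
  { name: "Frank", department: "Sales", salary: 72000 },
];

// Bucket items by key, accumulate a running total and count per bucket,
// then divide to get each bucket's mean.
function meanBy<T>(
  items: T[],
  key: (item: T) => string,
  value: (item: T) => number,
): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const item of items) {
    const k = key(item);
    const acc = sums.get(k) ?? { total: 0, count: 0 };
    acc.total += value(item);
    acc.count += 1;
    sums.set(k, acc);
  }
  const means = new Map<string, number>();
  sums.forEach((acc, k) => means.set(k, acc.total / acc.count));
  return means;
}

const meanSalaryByDept = meanBy(rows, (r) => r.department, (r) => r.salary);

// Predicate filtering is the same idea as Array.prototype.filter.
const highEarnerNames = rows.filter((r) => r.salary > 100000).map((r) => r.name);

console.log(meanSalaryByDept);
console.log(highEarnerNames);
```

Because a `Map` iterates in insertion order, the groups in this sketch come out in order of first appearance. The department table in the console output further down lists its groups in that same order (Engineering, Sales, HR, Marketing), though whether deepbox guarantees it is not stated here.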
Source Code
03-data-analysis/index.ts
import { mkdirSync, writeFileSync } from "node:fs";
import { DataFrame } from "deepbox/dataframe";
import { tensor } from "deepbox/ndarray";
import { Figure } from "deepbox/plot";
import { corrcoef, mean, std } from "deepbox/stats";

const expectNumber = (value: unknown): number => {
  if (typeof value !== "number") {
    throw new Error("Expected number");
  }
  return value;
};

const expectNumberArray = (value: unknown): number[] => {
  if (!Array.isArray(value) || value.some((v) => typeof v !== "number")) {
    throw new Error("Expected number[]");
  }
  return value;
};

console.log("=".repeat(60));
console.log("Example 1: Data Analysis & Visualization");
console.log("=".repeat(60));

mkdirSync("docs/examples/03-data-analysis/output", { recursive: true });

// Create a DataFrame with employee information
const employeeData = new DataFrame({
  name: [
    "Alice",
    "Bob",
    "Charlie",
    "David",
    "Eve",
    "Frank",
    "Grace",
    "Henry",
    "Ivy",
    "Jack",
    "Kate",
    "Leo",
    "Mia",
    "Noah",
    "Olivia",
    "Paul",
    "Quinn",
    "Rachel",
    "Sam",
    "Tina",
  ],
  department: [
    "Engineering",
    "Sales",
    "Engineering",
    "HR",
    "Engineering",
    "Sales",
    "Marketing",
    "Engineering",
    "HR",
    "Sales",
    "Engineering",
    "Marketing",
    "Sales",
    "Engineering",
    "HR",
    "Sales",
    "Engineering",
    "Marketing",
    "Engineering",
    "Sales",
  ],
  salary: [
    95000, 65000, 105000, 55000, 98000, 72000, 68000, 110000, 58000, 70000, 102000, 71000, 67000,
    115000, 60000, 69000, 108000, 73000, 112000, 66000,
  ],
  experience: [5, 3, 8, 2, 6, 4, 3, 10, 2, 5, 7, 4, 3, 12, 3, 4, 9, 5, 11, 3],
  age: [28, 25, 32, 24, 30, 27, 26, 35, 24, 29, 31, 28, 26, 38, 27, 28, 34, 30, 36, 26],
});

// Display dataset overview
console.log("\n📊 Dataset Overview");
console.log("-".repeat(60));
console.log(`Total Employees: ${employeeData.shape[0]}`);
console.log(`Columns: ${employeeData.columns.join(", ")}`);

// Show first few rows
console.log("\n📋 First 5 Rows:");
console.log(employeeData.head(5).toString());

// Calculate descriptive statistics
console.log("\n📈 Statistical Summary");
console.log("-".repeat(60));

// Extract columns as arrays for analysis
const salaries = expectNumberArray(employeeData.get("salary").toArray());
const experiences = expectNumberArray(employeeData.get("experience").toArray());
const ages = expectNumberArray(employeeData.get("age").toArray());

// Convert arrays to tensors for statistical operations
const salaryTensor = tensor(salaries);
const expTensor = tensor(experiences);

// Calculate salary statistics
const salaryMean = Number(mean(salaryTensor).data[0]);
const salarySd = Number(std(salaryTensor).data[0]);

console.log(`Salary Statistics:`);
console.log(` Mean: $${salaryMean.toFixed(2)}`);
console.log(` Std Dev: $${salarySd.toFixed(2)}`);
console.log(` Min: $${Math.min(...salaries)}`);
console.log(` Max: $${Math.max(...salaries)}`);

// Calculate experience statistics
const expMean = Number(mean(expTensor).data[0]);
const expSd = Number(std(expTensor).data[0]);

console.log(`\nExperience Statistics:`);
console.log(` Mean: ${expMean.toFixed(1)} years`);
console.log(` Std Dev: ${expSd.toFixed(1)} years`);

// Group by department and calculate averages
console.log("\n🏢 Department Analysis");
console.log("-".repeat(60));

// GroupBy operation to aggregate by department
const deptGroups = employeeData.groupBy("department");
const deptStats = deptGroups.agg({
  salary: "mean",
  experience: "mean",
});

console.log("Average Salary by Department:");
console.log(deptStats.toString());

// Filter data based on conditions
console.log("\n🔍 Filtering Examples");
console.log("-".repeat(60));

// Find employees earning over $100k
const highEarners = employeeData.filter((row) => expectNumber(row.salary) > 100000);
console.log(`High Earners (>$100k): ${highEarners.shape[0]} employees`);
console.log(highEarners.select(["name", "department", "salary"]).toString());

// Filter by department
const engineeringDept = employeeData.filter((row) => row.department === "Engineering");
console.log(`\nEngineering Department: ${engineeringDept.shape[0]} employees`);

// Analyze correlations between variables
console.log("\n📊 Correlation Analysis");
console.log("-".repeat(60));

// Create matrix for correlation analysis
const dataMatrix = tensor([salaries, experiences, ages]);
const correlationMatrix = corrcoef(dataMatrix);

console.log("Correlation Matrix (Salary, Experience, Age):");
console.log(correlationMatrix.toString());

// Generate visualizations
console.log("\n🎨 Creating Visualizations");
console.log("-".repeat(60));

// 1. Scatter plot showing relationship between experience and salary
console.log("1. Scatter Plot: Salary vs Experience");
const fig1 = new Figure();
const ax1 = fig1.addAxes();
ax1.scatter(expTensor, salaryTensor, { color: "#1f77b4", size: 8 });
ax1.setTitle("Salary vs Experience");
ax1.setXLabel("Years of Experience");
ax1.setYLabel("Salary ($)");
const svg1 = fig1.renderSVG();
writeFileSync("docs/examples/03-data-analysis/output/salary-vs-experience.svg", svg1.svg);
console.log(" ✓ Saved: output/salary-vs-experience.svg");

// 2. Histogram showing salary distribution
console.log("2. Histogram: Salary Distribution");
const fig2 = new Figure();
const ax2 = fig2.addAxes();
ax2.hist(salaryTensor, 8, { color: "#2ca02c" });
ax2.setTitle("Salary Distribution");
ax2.setXLabel("Salary ($)");
ax2.setYLabel("Frequency");
const svg2 = fig2.renderSVG();
writeFileSync("docs/examples/03-data-analysis/output/salary-distribution.svg", svg2.svg);
console.log(" ✓ Saved: output/salary-distribution.svg");

// 3. Bar chart comparing departments
console.log("3. Bar Chart: Average Salary by Department");
// Calculate average salary for each department
const depts = ["Engineering", "Sales", "Marketing", "HR"];
const avgSalaries = depts.map((dept) => {
  const deptData = employeeData.filter((row) => row.department === dept);
  const deptSalaries = expectNumberArray(deptData.get("salary").toArray());
  return Number(mean(tensor(deptSalaries)).data[0]);
});

const fig3 = new Figure();
const ax3 = fig3.addAxes();
ax3.bar(tensor([0, 1, 2, 3]), tensor(avgSalaries), {
  color: "#ff7f0e",
  edgecolor: "#000000",
});
ax3.setTitle("Average Salary by Department");
ax3.setXLabel("Department");
ax3.setYLabel("Average Salary ($)");
const svg3 = fig3.renderSVG();
writeFileSync("docs/examples/03-data-analysis/output/dept-salaries.svg", svg3.svg);
console.log(" ✓ Saved: output/dept-salaries.svg");

// 4. Heatmap visualizing correlations
console.log("4. Heatmap: Correlation Matrix");
const fig4 = new Figure();
const ax4 = fig4.addAxes();
ax4.heatmap(correlationMatrix, { vmin: -1, vmax: 1 });
ax4.setTitle("Correlation Matrix");
const svg4 = fig4.renderSVG();
writeFileSync("docs/examples/03-data-analysis/output/correlation-heatmap.svg", svg4.svg);
console.log(" ✓ Saved: output/correlation-heatmap.svg");

// Summary of findings
console.log("\n💡 Key Insights");
console.log("-".repeat(60));
console.log("• Engineering has the highest average salary");
console.log("• Strong positive correlation between experience and salary");
console.log("• Age shows moderate correlation with both salary and experience");
console.log("• Salary distribution shows clustering around $70k and $105k");

console.log("\n✅ Analysis Complete!");
console.log("=".repeat(60));

Console Output
$ npx tsx 03-data-analysis/index.ts
============================================================
Example 1: Data Analysis & Visualization
============================================================
📊 Dataset Overview
------------------------------------------------------------
Total Employees: 20
Columns: name, department, salary, experience, age
📋 First 5 Rows:
name department salary experience age
0 Alice Engineering 95000 5 28
1 Bob Sales 65000 3 25
2 Charlie Engineering 105000 8 32
3 David HR 55000 2 24
4 Eve Engineering 98000 6 30
📈 Statistical Summary
------------------------------------------------------------
Salary Statistics:
Mean: $81950.00
Std Dev: $20215.03
Min: $55000
Max: $115000
Experience Statistics:
Mean: 5.5 years
Std Dev: 3.0 years
🏢 Department Analysis
------------------------------------------------------------
Average Salary by Department:
department salary experience
0 Engineering 105625 8.5
1 Sales 68166.66666666667 3.6666666666666665
2 HR 57666.666666666664 2.3333333333333335
3 Marketing 70666.66666666667 4
🔍 Filtering Examples
------------------------------------------------------------
High Earners (>$100k): 6 employees
name department salary
2 Charlie Engineering 105000
7 Henry Engineering 110000
10 Kate Engineering 102000
13 Noah Engineering 115000
16 Quinn Engineering 108000
18 Sam Engineering 112000
Engineering Department: 8 employees
📊 Correlation Analysis
------------------------------------------------------------
Correlation Matrix (Salary, Experience, Age):
tensor([[1, 1.000, 1.000, ..., 1.000, 1.000, 1.000]
[1.000, 1, 1.000, ..., 1.000, 1.000, 1.000]
[1.000, 1.000, 1, ..., 1.000, 1.000, 1.000]
...
[1.000, 1.000, 1.000, ..., 1, 1.000, 1.000]
[1.000, 1.000, 1.000, ..., 1.000, 1, 1.000]
[1.000, 1.000, 1.000, ..., 1.000, 1.000, 1]], dtype=float64)
🎨 Creating Visualizations
------------------------------------------------------------
1. Scatter Plot: Salary vs Experience
✓ Saved: output/salary-vs-experience.svg
2. Histogram: Salary Distribution
✓ Saved: output/salary-distribution.svg
3. Bar Chart: Average Salary by Department
✓ Saved: output/dept-salaries.svg
4. Heatmap: Correlation Matrix
✓ Saved: output/correlation-heatmap.svg
💡 Key Insights
------------------------------------------------------------
• Engineering has the highest average salary
• Strong positive correlation between experience and salary
• Age shows moderate correlation with both salary and experience
• Salary distribution shows clustering around $70k and $105k
✅ Analysis Complete!
============================================================
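The printed figures can be cross-checked without deepbox. The sketch below recomputes the salary mean and standard deviation and builds a 3×3 Pearson correlation matrix for salary, experience, and age in plain TypeScript. The helper names (`average`, `populationStd`, `pearson`) are made up for this illustration. Two observations follow from the output above, both inferences rather than documented behavior. First, the printed Std Dev of $20215.03 matches the population formula (divide by n, not n − 1). Second, the printed correlation matrix has more than three columns, which suggests `corrcoef` treated the columns of the 3×20 input as the variables, so the input may need transposing to get the 3×3 matrix the label describes.

```typescript
// Plain-TypeScript cross-check of the example's statistics (no deepbox calls).
// Data copied from the example's DataFrame.
const salary = [
  95000, 65000, 105000, 55000, 98000, 72000, 68000, 110000, 58000, 70000,
  102000, 71000, 67000, 115000, 60000, 69000, 108000, 73000, 112000, 66000,
];
const experience = [5, 3, 8, 2, 6, 4, 3, 10, 2, 5, 7, 4, 3, 12, 3, 4, 9, 5, 11, 3];
const age = [28, 25, 32, 24, 30, 27, 26, 35, 24, 29, 31, 28, 26, 38, 27, 28, 34, 30, 36, 26];

// Arithmetic mean.
const average = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Population standard deviation: mean squared deviation, divided by n.
const populationStd = (xs: number[]): number => {
  const m = average(xs);
  return Math.sqrt(average(xs.map((x) => (x - m) * (x - m))));
};

// Pearson correlation coefficient between two equal-length series.
const pearson = (xs: number[], ys: number[]): number => {
  const mx = average(xs);
  const my = average(ys);
  let cov = 0;
  let vx = 0;
  let vy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return cov / Math.sqrt(vx * vy);
};

// One row per variable; entry [i][j] correlates variable i with variable j.
const variables = [salary, experience, age];
const corr = variables.map((a) => variables.map((b) => pearson(a, b)));

console.log(average(salary).toFixed(2)); // prints "81950.00"
console.log(populationStd(salary).toFixed(2)); // prints "20215.03"
console.log(corr.map((row) => row.map((v) => v.toFixed(3)).join("  ")).join("\n"));
```

The resulting matrix is symmetric with ones on the diagonal, and the salary/experience entry comes out above 0.9, in line with the "strong positive correlation" insight printed by the example.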