deepbox/dataframe

DataFrame

A tabular data structure with labeled columns. Each column stores homogeneous data (numbers, strings, or booleans), and all columns share the same row index. DataFrame supports chainable operations: selection, filtering, sorting, grouping, aggregation, joining, CSV I/O, and conversion to Tensors for numerical computing. DataFrames are immutable. Operations never modify a DataFrame in place, and transformations return a new DataFrame.
new DataFrame
new DataFrame(data: Record<string, Array<number | string | boolean>>)

Create a DataFrame from a column-oriented object. Each key becomes a column name, and each value is the array holding that column's data. All arrays must have the same length. Column order follows the object's key insertion order.

Parameters:
data: Record<string, Array<number | string | boolean>> - Column name → values mapping. All arrays must be the same length.
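The constructor's documented rules are: equal-length arrays, key order as column order, and a default 0, 1, 2, ... index. The small standalone sketch below illustrates them. `inspectColumns` and `FrameLayout` are hypothetical names invented for illustration. This is not deepbox's implementation, and the error deepbox actually raises on a length mismatch is not documented here.

```typescript
// Hypothetical helper (not part of deepbox) that applies the documented
// constructor rules to a column-oriented object.
type Cell = number | string | boolean;

interface FrameLayout {
  columns: string[];
  shape: [number, number];
  index: number[];
}

function inspectColumns(data: Record<string, Cell[]>): FrameLayout {
  // Column order = the object's own key order.
  const columns = Object.keys(data);
  const nRows = columns.length > 0 ? data[columns[0]].length : 0;
  for (const name of columns) {
    if (data[name].length !== nRows) {
      throw new Error(`Column "${name}" has ${data[name].length} values; expected ${nRows}`);
    }
  }
  // Default row labels 0, 1, 2, ... as documented for .index.
  const index = Array.from({ length: nRows }, (_, i) => i);
  return { columns, shape: [nRows, columns.length], index };
}

const layout = inspectColumns({
  name: ["Alice", "Bob", "Charlie"],
  age: [25, 30, 35],
});
console.log(layout.shape);   // → [3, 2]
console.log(layout.columns); // → ["name", "age"]
```

One JavaScript detail is worth knowing. Object key order lists integer-like keys (such as "2020") first, in ascending numeric order, before other string keys. Column names of that form therefore presumably will not keep the order in which they were written.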
DataFrame.fromCsvString
DataFrame.fromCsvString(csvString: string, opts?: { delimiter?: string; header?: boolean }): DataFrame

Parse a CSV string into a DataFrame. Numeric columns are detected automatically and converted from strings to numbers. If header is false, columns are named 'col_0', 'col_1', and so on.
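The sketch below shows one thing "auto-detect numeric columns" can mean. It assumes a column is treated as numeric only when every value parses as a finite number. `parseCsvSketch` is a hypothetical name. The sketch also assumes a default delimiter of "," and a default of header: true, and it does not handle quoted fields. It is therefore far simpler than a real CSV parser and is not deepbox's code.

```typescript
// Hypothetical sketch of CSV parsing with numeric-column detection.
// Assumptions (not taken from deepbox): no quoted fields, "," as the default
// delimiter, header defaults to true, and a column is numeric only when every
// value parses as a finite number.
type Cell = number | string;

function parseCsvSketch(
  text: string,
  opts: { delimiter?: string; header?: boolean } = {},
): Record<string, Cell[]> {
  const delimiter = opts.delimiter ?? ",";
  const header = opts.header ?? true;
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const rows = lines.map((line) => line.split(delimiter));
  const width = rows.length > 0 ? rows[0].length : 0;
  // Either consume the first row as names or generate col_0, col_1, ...
  const names = header
    ? rows.shift() ?? []
    : Array.from({ length: width }, (_, i) => `col_${i}`);

  const out: Record<string, Cell[]> = {};
  names.forEach((name, c) => {
    const raw = rows.map((r) => r[c] ?? "");
    const numeric =
      raw.length > 0 && raw.every((v) => v.trim() !== "" && Number.isFinite(Number(v)));
    out[name] = numeric ? raw.map(Number) : raw;
  });
  return out;
}

const parsed = parseCsvSketch("name,age\nAlice,25\nBob,30");
// → { name: ["Alice", "Bob"], age: [25, 30] }
const noHeader = parseCsvSketch("1;x\n2;y", { delimiter: ";", header: false });
// → { col_0: [1, 2], col_1: ["x", "y"] }
```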

Properties

  • .columns: string[] — Ordered array of column names
  • .shape: [number, number] — [nRows, nColumns] tuple
  • .index: (string | number)[] — Row labels (defaults to 0, 1, 2, ...)
Selection & filtering

  • .select(columns: string[]) - Select columns by name → new DataFrame with only those columns. Example: df.select(['name', 'salary'])
  • .drop(columns: string[]) - Remove columns by name → new DataFrame without those columns. Example: df.drop(['id'])
  • .filter(fn: (row) => boolean) - Keep only the rows for which the predicate returns true. Example: df.filter(r => r.age > 25)
  • .head(n?: number) - First n rows (default 5). Example: df.head(10)
  • .tail(n?: number) - Last n rows (default 5). Example: df.tail(3)
  • .iloc(index: number) - Single row by integer position → row object. Example: df.iloc(0)
  • .loc(row: string | number) - Single row by index label → row object. Example: df.loc('row1')
  • .slice(start: number, end: number) - Rows from position start up to, but not including, end: [start, end). Example: df.slice(10, 20)
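The distinction between positional access (iloc, head, tail, slice) and label access (loc) can be modeled on plain arrays. In the sketch below, the row data, the labels "a" to "d", and all function names are hypothetical. It also throws on an out-of-range position or unknown label, which is an assumption, since the page does not say what deepbox does in that case.

```typescript
// Hypothetical sketch (not deepbox code): positional vs. label-based row access,
// modeled on plain arrays to mirror head/tail/slice/iloc/loc as documented.
type Row = Record<string, number | string>;

const rows: Row[] = [
  { name: "Alice", age: 25 },
  { name: "Bob", age: 30 },
  { name: "Charlie", age: 35 },
  { name: "David", age: 28 },
];
// Custom row labels, as if the frame had a non-default index.
const index: (string | number)[] = ["a", "b", "c", "d"];

const head = (n = 5): Row[] => rows.slice(0, n);
const tail = (n = 5): Row[] => rows.slice(Math.max(rows.length - n, 0));
// Half-open range: includes start, excludes end.
const sliceRows = (start: number, end: number): Row[] => rows.slice(start, end);

// iloc: look up by integer position.
function iloc(position: number): Row {
  const row = rows[position];
  if (row === undefined) throw new Error(`Position ${position} out of range`);
  return row;
}

// loc: find where the label sits in the index, then read that row.
function loc(label: string | number): Row {
  const position = index.indexOf(label);
  if (position === -1) throw new Error(`Label ${String(label)} not in index`);
  return rows[position];
}

console.log(sliceRows(1, 3).map((r) => r.name)); // → ["Bob", "Charlie"]
console.log(iloc(2).name, loc("b").name);        // prints: Charlie Bob
```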
Sorting & sampling

  • .sort(by: string | string[], ascending?: boolean) - Sort by one or more columns; ascending defaults to true. Example: df.sort('salary', false)
  • .sample(n: number, random_state?: number) - Random sample of n rows without replacement. Example: df.sample(50)
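Both methods in this group can be sketched over an array of row objects. This is an illustration under stated assumptions, not deepbox's code. `sortRows`, `sampleRows`, and `compareCells` are hypothetical names. The sketch assumes that a single `ascending` flag applies to every sort key, as the signature suggests, and that `random_state` acts as a seed. The mulberry32 generator is a stand-in chosen for brevity, so seeded results will not match deepbox's.

```typescript
// Hypothetical sketch (not deepbox code): multi-key sort with one ascending
// flag, and sampling without replacement from an optionally seeded generator.
type Cell = number | string;
type Row = Record<string, Cell>;

function compareCells(x: Cell, y: Cell): number {
  if (typeof x === "number" && typeof y === "number") return x - y;
  const sx = String(x);
  const sy = String(y);
  return sx < sy ? -1 : sx > sy ? 1 : 0; // plain code-unit order for strings
}

function sortRows(rows: Row[], by: string | string[], ascending = true): Row[] {
  const keys = Array.isArray(by) ? by : [by];
  const sign = ascending ? 1 : -1;
  // Sort a copy so the input stays untouched, matching the immutable API.
  return [...rows].sort((a, b) => {
    for (const key of keys) {
      const c = compareCells(a[key], b[key]);
      if (c !== 0) return sign * c; // first differing key decides
    }
    return 0; // full ties keep input order (Array.prototype.sort is stable)
  });
}

// mulberry32: a small 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Partial Fisher-Yates shuffle: the first n slots become a uniform sample.
function sampleRows(rows: Row[], n: number, seed?: number): Row[] {
  if (n > rows.length) throw new Error("n exceeds the number of rows");
  const rand = seed === undefined ? Math.random : mulberry32(seed);
  const pool = [...rows];
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rand() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

const people: Row[] = [
  { name: "Alice", salary: 50000, department: "IT" },
  { name: "Bob", salary: 60000, department: "HR" },
  { name: "Charlie", salary: 75000, department: "IT" },
  { name: "David", salary: 55000, department: "HR" },
];
console.log(sortRows(people, "salary", false).map((r) => r.name).join(", "));
// → Charlie, Bob, David, Alice
console.log(sortRows(people, ["department", "salary"]).map((r) => r.name).join(", "));
// → David, Bob, Alice, Charlie
```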
Transformation & cleaning

  • .rename(mapper: Record<string, string> | ((name: string) => string), axis?: 0 | 1) - Rename columns (axis=1, the default) or index labels (axis=0). Example: df.rename({ name: 'full_name' })
  • .apply(fn: (series: Series) => Series, axis?: 0 | 1) - Apply a function to each column (axis=0) or to each row (axis=1). Example: df.apply(s => s.map(x => Number(x) * 2))
  • .fillna(value: unknown) - Replace null, undefined, and NaN values with value. Example: df.fillna(0)
  • .dropna() - Remove every row that contains a null, undefined, or NaN value. Example: df.dropna()
  • .replace(toReplace: unknown | unknown[], value: unknown) - Replace all occurrences of toReplace with value across all columns. Example: df.replace('NA', 'unknown')
  • .clip(lower?: number, upper?: number) - Clip numeric values to the range [lower, upper]. Example: df.clip(0, 100)
  • .isnull() - Boolean DataFrame that is true where values are null, undefined, or NaN. Example: df.isnull()
  • .duplicated(subset?: string[], keep?: 'first' | 'last' | false) - Boolean Series marking duplicate rows, optionally comparing only the subset columns. Example: df.duplicated(['name'])
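The sketch below covers the missing-value rules listed above (null, undefined, or NaN) and the `keep` option of duplicated(). For `keep`, it assumes pandas-style meanings. 'first' marks every repeat except the first occurrence, 'last' marks every repeat except the last, and false marks all members of a repeated group. The page does not spell these out, so treat them as an assumption. The function names mirror the method names but are standalone hypothetical helpers, not deepbox code.

```typescript
// Hypothetical sketch (not deepbox code) of the documented missing-value rules
// and of duplicated(), assuming pandas-style meanings for keep.
type Cell = number | string | boolean | null | undefined;
type Row = Record<string, Cell>;

// "Missing" as documented: null, undefined, or NaN.
const isMissing = (v: Cell): boolean =>
  v === null || v === undefined || (typeof v === "number" && Number.isNaN(v));

const fillna = (rows: Row[], value: Cell): Row[] =>
  rows.map((r) => {
    const copy: Row = {};
    for (const k of Object.keys(r)) copy[k] = isMissing(r[k]) ? value : r[k];
    return copy;
  });

const dropna = (rows: Row[]): Row[] =>
  rows.filter((r) => !Object.keys(r).some((k) => isMissing(r[k])));

function duplicated(
  rows: Row[],
  subset?: string[],
  keep: "first" | "last" | false = "first",
): boolean[] {
  // Compare rows via a string key built from the chosen columns
  // (JSON.stringify turns undefined and NaN into null, so they compare equal).
  const keys = rows.map((r) => JSON.stringify((subset ?? Object.keys(r)).map((c) => r[c])));
  if (keep === false) {
    // Mark every member of any group that occurs more than once.
    const counts = new Map<string, number>();
    for (const k of keys) counts.set(k, (counts.get(k) ?? 0) + 1);
    return keys.map((k) => (counts.get(k) ?? 0) > 1);
  }
  // 'first': scan forward; 'last': scan backward. Repeat sightings are duplicates.
  const order = rows.map((_, i) => i);
  if (keep === "last") order.reverse();
  const seen = new Set<string>();
  const marks: boolean[] = rows.map(() => false);
  for (const i of order) {
    if (seen.has(keys[i])) marks[i] = true;
    else seen.add(keys[i]);
  }
  return marks;
}

const staff: Row[] = [
  { name: "Alice", age: 25 },
  { name: "Bob", age: null },
  { name: "Alice", age: 40 },
];
console.log(duplicated(staff, ["name"]));         // → [false, false, true]
console.log(duplicated(staff, ["name"], "last")); // → [true, false, false]
console.log(duplicated(staff, ["name"], false));  // → [true, false, true]
console.log(dropna(staff).length);                // → 2
```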
Statistics

  • .describe() - Summary statistics (count, mean, std, min, 25%, 50%, 75%, max) for every numeric column → DataFrame. Example: df.describe()
  • .quantile(q: number) - The q-th quantile of each numeric column → Series. Example: df.quantile(0.5)
  • .cov() - Pairwise covariance matrix of the numeric columns → DataFrame. Example: df.cov()
  • .corr() - Pairwise Pearson correlation matrix of the numeric columns → DataFrame. Example: df.corr()
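The formulas behind cov(), corr(), and quantile() can be sketched for a single pair of columns. Two conventions here are assumptions borrowed from pandas, because the page does not state deepbox's choices. The first is sample covariance with an n - 1 denominator. The second is linear interpolation between the two nearest sorted values for quantiles, which would also govern the 25%/50%/75% rows of describe(). The helper names are hypothetical.

```typescript
// Hypothetical sketch of the statistics behind cov(), corr(), and quantile().
// Assumptions: sample covariance (n - 1 denominator) and linear-interpolation
// quantiles, as in pandas; deepbox's exact conventions are not documented here.
const mean = (xs: number[]): number => xs.reduce((s, v) => s + v, 0) / xs.length;

function covariance(xs: number[], ys: number[]): number {
  const mx = mean(xs);
  const my = mean(ys);
  let acc = 0;
  for (let i = 0; i < xs.length; i++) acc += (xs[i] - mx) * (ys[i] - my);
  return acc / (xs.length - 1);
}

// Pearson r = cov(x, y) / (sd(x) * sd(y)); the n - 1 factors cancel out.
function pearson(xs: number[], ys: number[]): number {
  return covariance(xs, ys) / Math.sqrt(covariance(xs, xs) * covariance(ys, ys));
}

// Linear interpolation between the two sorted values around position q * (n - 1).
function quantile(xs: number[], q: number): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const pos = q * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
}

const age = [25, 30, 35, 28, 22];
const salary = [50000, 60000, 75000, 55000, 48000];
console.log(quantile(age, 0.5));       // → 28 (the median)
console.log(pearson(age, salary) > 0); // → true
```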
GroupBy

  • .groupBy(col: string) - Group rows by the distinct values of a column → GroupedDataFrame. Example: df.groupBy('department')
  • .groupBy(col).agg(spec) - Aggregate each group, where spec maps column names to 'mean' | 'sum' | 'count' | 'min' | 'max' | 'std' → DataFrame. Example: df.groupBy('dept').agg({ salary: 'mean' })
  • .groupBy(col).apply(fn) - Apply a function to each group's DataFrame and combine the results → DataFrame. Example: df.groupBy('dept').apply(g => g.head(1))
  • .groupBy(col).sum() / .mean() / .min() / .max() / .std() / .count() - Shorthand aggregations over the grouped data → DataFrame. Example: df.groupBy('department').mean()
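groupBy(col).agg(spec) amounts to a split-apply-combine pass. The sketch below works over row objects and uses the same data as the DataFrame example on this page. Two behaviors are assumptions, not documented deepbox behavior: groups appear in first-seen order, and 'std' is the sample standard deviation. `groupAgg` and `aggregators` are hypothetical names.

```typescript
// Hypothetical sketch of groupBy(col).agg(spec) over row objects (not deepbox code).
// Assumptions: groups come out in first-seen order and 'std' is the sample
// standard deviation (n - 1), which is NaN for a single-row group.
type Row = Record<string, number | string>;
type AggName = "mean" | "sum" | "count" | "min" | "max" | "std";

const aggregators: Record<AggName, (xs: number[]) => number> = {
  sum: (xs) => xs.reduce((s, v) => s + v, 0),
  mean: (xs) => xs.reduce((s, v) => s + v, 0) / xs.length,
  count: (xs) => xs.length,
  min: (xs) => Math.min(...xs),
  max: (xs) => Math.max(...xs),
  std: (xs) => {
    const m = xs.reduce((s, v) => s + v, 0) / xs.length;
    const ss = xs.reduce((s, v) => s + (v - m) ** 2, 0);
    return Math.sqrt(ss / (xs.length - 1));
  },
};

function groupAgg(rows: Row[], by: string, spec: Record<string, AggName>): Row[] {
  // Split: a Map keeps insertion order, so groups appear in first-seen order.
  const groups = new Map<number | string, Row[]>();
  for (const r of rows) {
    const bucket = groups.get(r[by]);
    if (bucket) bucket.push(r);
    else groups.set(r[by], [r]);
  }
  // Apply + combine: one output row per group, one column per spec entry.
  const out: Row[] = [];
  groups.forEach((members, key) => {
    const result: Row = { [by]: key };
    for (const [col, agg] of Object.entries(spec)) {
      result[col] = aggregators[agg](members.map((m) => Number(m[col])));
    }
    out.push(result);
  });
  return out;
}

const employees: Row[] = [
  { name: "Alice", age: 25, salary: 50000, department: "IT" },
  { name: "Bob", age: 30, salary: 60000, department: "HR" },
  { name: "Charlie", age: 35, salary: 75000, department: "IT" },
  { name: "David", age: 28, salary: 55000, department: "HR" },
  { name: "Eve", age: 22, salary: 48000, department: "IT" },
];
const summary = groupAgg(employees, "department", { salary: "mean", age: "max" });
// → [{ department: "IT", salary: 57666.66..., age: 35 },
//    { department: "HR", salary: 57500, age: 30 }]
```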
Combining

  • .merge(other, opts) - SQL-style join, with opts: { on: string, how: 'inner' | 'left' | 'right' | 'outer' }. Example: df.merge(other, { on: 'id', how: 'left' })
  • .concat(other: DataFrame) - Stack two DataFrames vertically; both must have the same columns. Example: df.concat(df2)
  • .join(other, on: string) - Join on a shared column (shorthand for an inner merge). Example: df.join(lookup, 'dept_id')
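The four `how` modes of merge() can be illustrated with a hash join. The join indexes one side by key and then probes it with the other. The sketch rests on several assumptions, not deepbox's documented behavior. Non-key column names do not collide, so no suffixes are added. Unmatched cells become null. Output row order follows the left side, with unmatched right rows appended last. join(other, on) corresponds to how = 'inner' here. `mergeRows` is a hypothetical name.

```typescript
// Hypothetical hash-join sketch of merge(other, { on, how }); not deepbox code.
// Assumptions: non-key column names do not collide (no suffixing), unmatched
// cells become null, and row order is left-driven with unmatched right rows last.
type Cell = number | string | null;
type Row = Record<string, Cell>;
type How = "inner" | "left" | "right" | "outer";

function mergeRows(left: Row[], right: Row[], on: string, how: How): Row[] {
  const leftCols = left.length > 0 ? Object.keys(left[0]) : [];
  const rightCols = right.length > 0 ? Object.keys(right[0]) : [];
  // Nulls for one side's non-key columns, used when a match is missing.
  const nulls = (cols: string[]): Row => {
    const r: Row = {};
    for (const c of cols) if (c !== on) r[c] = null;
    return r;
  };

  // Build phase: index the right side by join key.
  const byKey = new Map<Cell, Row[]>();
  for (const r of right) {
    const bucket = byKey.get(r[on]);
    if (bucket) bucket.push(r);
    else byKey.set(r[on], [r]);
  }

  // Probe phase: one output row per matching (left, right) pair.
  const out: Row[] = [];
  const matchedRight = new Set<Row>();
  for (const l of left) {
    const matches = byKey.get(l[on]);
    if (matches) {
      for (const r of matches) {
        out.push({ ...l, ...r });
        matchedRight.add(r);
      }
    } else if (how === "left" || how === "outer") {
      out.push({ ...l, ...nulls(rightCols) });
    }
  }
  // Right and outer joins also keep right rows that never matched.
  if (how === "right" || how === "outer") {
    for (const r of right) {
      if (!matchedRight.has(r)) out.push({ ...nulls(leftCols), ...r });
    }
  }
  return out;
}

const employees: Row[] = [
  { id: 1, name: "Alice" },
  { id: 2, name: "Bob" },
  { id: 3, name: "Eve" },
];
const salaries: Row[] = [
  { id: 1, salary: 50000 },
  { id: 2, salary: 60000 },
  { id: 4, salary: 70000 },
];
for (const how of ["inner", "left", "right", "outer"] as How[]) {
  console.log(how, mergeRows(employees, salaries, "id", how).length);
}
// → inner 2, left 3, right 3, outer 4
```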
I/O & conversion

  • .toCsvString(opts?) - Export as a CSV string with a header row → string. Example: fs.writeFileSync('out.csv', df.toCsvString())
  • .toCsv(path: string, opts?) - Write the DataFrame to a CSV file (async, Node.js). Example: await df.toCsv('output.csv')
  • .toArray() - Convert to an array of row objects → Array<Record<string, any>>. Example: df.toArray().forEach(row => console.log(row))
  • .toTensor() - Convert all numeric columns to a 2D Tensor of shape [nRows, nCols] → Tensor. Example: df.select(['age', 'salary']).toTensor()
  • .toString() - Pretty-printed table string for console output. Example: console.log(df.toString())
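Two of these conversions involve details the page leaves open: how CSV fields are quoted, and how a table is laid out as a 2D tensor. The sketch below assumes RFC 4180-style quoting with "\n" line separators, and a row-major Float64Array for the tensor. Neither is confirmed for deepbox, and `toCsvSketch`, `csvField`, and `toTensorSketch` are hypothetical helpers, not the library's API.

```typescript
// Hypothetical sketches (not deepbox code) of CSV export and tensor conversion.
type Cell = number | string | boolean;

// RFC 4180-style quoting: fields containing the delimiter, a double quote, or a
// line break are wrapped in double quotes, with embedded quotes doubled.
function csvField(value: Cell, delimiter = ","): string {
  const s = String(value);
  const needsQuotes =
    s.includes(delimiter) || s.includes('"') || s.includes("\n") || s.includes("\r");
  return needsQuotes ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsvSketch(columns: string[], rows: Cell[][], delimiter = ","): string {
  const table: Cell[][] = [columns, ...rows]; // header row first
  return table.map((r) => r.map((v) => csvField(v, delimiter)).join(delimiter)).join("\n");
}

// Row-major layout: element (i, j) is stored at data[i * nCols + j].
function toTensorSketch(columns: Record<string, number[]>): {
  shape: [number, number];
  data: Float64Array;
} {
  const names = Object.keys(columns);
  const nCols = names.length;
  const nRows = nCols > 0 ? columns[names[0]].length : 0;
  const data = new Float64Array(nRows * nCols);
  names.forEach((name, j) => {
    columns[name].forEach((v, i) => {
      data[i * nCols + j] = v;
    });
  });
  return { shape: [nRows, nCols], data };
}

const csv = toCsvSketch(["name", "note"], [
  ["Alice", "likes, commas"],
  ["Bob", 'says "hi"'],
]);
console.log(csv);
// name,note
// Alice,"likes, commas"
// Bob,"says ""hi"""

const t = toTensorSketch({ age: [25, 30, 35], salary: [50000, 60000, 75000] });
console.log(t.shape, Array.from(t.data.slice(0, 4))); // → [3, 2] [25, 50000, 30, 60000]
```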
Example (dataframe.ts):

```typescript
import { DataFrame } from "deepbox/dataframe";

const df = new DataFrame({
  name: ["Alice", "Bob", "Charlie", "David", "Eve"],
  age: [25, 30, 35, 28, 22],
  salary: [50000, 60000, 75000, 55000, 48000],
  department: ["IT", "HR", "IT", "HR", "IT"],
});

console.log(df.shape);   // [5, 4]
console.log(df.columns); // ['name', 'age', 'salary', 'department']

// ── Selection & filtering ──
const payroll = df.select(["name", "salary"]);   // 2-column DataFrame
const over25 = df.filter((row) => row.age > 25); // 3 rows: Bob, Charlie, David
const firstThree = df.head(3);                   // First 3 rows

// ── Sorting ──
const bySalaryDesc = df.sort("salary", false);              // Highest salary first
const byDeptThenSalary = df.sort(["department", "salary"]); // Multi-column sort

// ── GroupBy aggregation ──
const byDept = df.groupBy("department").agg({
  salary: "mean",
  age: "max",
});
console.log(byDept.toString());

// ── Statistics ──
console.log(df.describe().toString()); // count, mean, std, min, 25%, 50%, 75%, max
const correlations = df.corr();        // Correlation matrix of numeric columns

// ── I/O ──
const csv = df.toCsvString();
const restored = DataFrame.fromCsvString(csv);
const t = df.select(["age", "salary"]).toTensor(); // shape: [5, 2]
```