How to make tables/spreadsheets (e.g. pandas DataFrame) observable, use triggers or change events?
Asked Answered
C

2

2

Is there a way to make pandas Dataframes observable?

Lets say I have some Dataframes A, B, C that have the same index and C is calculated by

C = A + B

enter image description here

If I change a cell value in A (e.g. 2 => 4), I would like to automatically update the corresponding cell in C (e.g. 22 => 24).

=> How can I listen to changes in Dataframe A?

Is there some observable Dataframe, triggers or events I could use for that? If that is not possible with pandas Dataframes, is there some other observable table structure I could use?

Or would I need to implement my own spreadsheets?

Additional information:

The index of my tables might consist of multiple columns. There might be multiple value columns, too.

It would be great if I could use that data structure together with RxPy, so that I can create reactive pipes for table calculations.

My calculation pipes/chains might contain hundreds of tables. That's why I don't want to use Excel.

If it is hard to do such things in Python, I am also open for solutions based on JavaScript.

Related stuff

Here is some information I found so far. PyQt tables might be a way to go, because they support change events. However, I am looking for something with "less overhead" for my calculations.

Censurable answered 1/7, 2020 at 6:52 Comment(0)
C
0

Not a solution for Python, but I found some promising JavaScript libraries:

Based on that libraries I am experimenting with pipes for observable tables here:

https://github.com/stefaneidelloth/rxtable

import * as d3 from 'd3';
import { range, Observable } from 'rxjs';

import DatabaseFactory from './database/databaseFactory';
import KeyContext from './table/keyContext';
import DataContext from './table/dataContext';
import { add } from './operator/operators';

run();

async function run() {
  const databaseFactory = new DatabaseFactory();
  const database = await databaseFactory.create('project');

  const keyContext = new KeyContext(['scenario_id','country_id']);
  const dataContext = new DataContext(['y2020','y2030','y2040']);

  const inputTable = await database.createTable('input', keyContext, dataContext);

  const inputRow = {
    scenario_id: 0,
    country_id: 0,
    y2020: 1,
    y2030: 2,
    y2040: 3,
  };

  let newValue = 4;
  async function buttonClicked() {
    inputRow.y2040 = newValue;
    await inputTable.update(inputRow);
    newValue++;
  }

  d3.select('#root')
    .append('button')
    .text('Click me')
    .on('click', () => buttonClicked());

  // doc for Observable API:
  // https://rxjs.dev/api/index/class/Observable

  // doc for pipe-able operators:
  // https://rxjs.dev/guide/operators
  // https://rxjs.dev/api/operators
  inputTable.pipe(
    add('sum', 66)
  )
  .subscribe({
    initialized(table) {
      console.log(`table: ${table.name}`);
    },
    rowAdded(newRow) {
      console.log(`rowAdded`);
    },
    rowChanged(oldRow, newRow) {
      console.log(`rowChanged`);
    },
    error(error) { console.error(`error: ${error}`); },
    complete() { console.log('done'); },
  });

  await inputTable.push(inputRow);

  await inputTable.push({
    scenario_id: 1,
    country_id: 1,
    y2020: 11,
    y2030: 12,
    y2040: 13,
  });

}

I am still interested in solutions for Python, too. If you know one, please let me know. For those who prefer JavaScript over Python, it might be interesting that someone already started to port pandas to JavaScript: https://stratodem.github.io/pandas.js-docs/#introduction.

Censurable answered 7/7, 2020 at 6:12 Comment(0)
A
0

Two years later, I still can't find any reactive/observable tools for DataFrames that fits use cases such as these. So your question prompted me to grab this specific bull by the horns. Below is a small SimpleCell class, where every cell consists of (1) a recalc function and (2) a number of source cells for that recalc function.

# see https://github.com/kleynjan/cellopype/blob/main/src/cellopype/helpers.py for deep_eq function

class SimpleCell:
    def __init__(
        self,
        sources: list = None,
        recalc: callable = None
    ):
        self.sources = sources or []  # list of cells
        self.recalc_handler = recalc or (lambda *args: None)
        self.value = None
        self._previous = None
        self.sinks = []
        for cell in self.sources:
            cell.sinks.append(self)
        self.recalc()

    def recalc(self):
        self._previous = self.value
        self.value = self.recalc_handler(*[src.value for src in self.sources])
        if not deep_eq(self.value, self._previous):
            for s in self.sinks:
                s.recalc()

That allows us to build a small reactive network of cells mirroring your question:

dfA = pd.DataFrame([1, 2, 3], columns=["value"])
dfB = pd.DataFrame([10, 20, 30], columns=["value"])
cell_a = SimpleCell(recalc=lambda: dfA.copy())
cell_b = SimpleCell(recalc=lambda: dfB.copy())

cell_a and cell_b are 'recalculated' immediately upon initialization. In practice you might put a read_excel or read_csv in the recalc function for source cells such as these.

cell_c is defined as their sum:

cell_c = MyCell(recalc=pd.DataFrame.add, sources=[cell_a, cell_b])

print(cell_c.value)
    value
0   11
1   22
2   33

Changing cell_a will trigger updates across the pipeline:

dfA.loc[0,'value']=100
cell_a.recalc()

print(cell_c.value)
    value
0   110
1    22
2    33

In case you're interested, I've developed this concept into a separate module called cellopype, with a Cell class that offers lazy execution and subscription and a Pype class to handle a collection of cells. Much more info, source and examples at github.com/kleynjan/cellopype

Alain answered 13/11, 2022 at 16:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.