One issue with using LLMs on spreadsheets is the large number of tokens needed to encode them. To address this, Vole developed SheetCompressor, an "innovative encoding framework that compresses spreadsheets effectively for LLMs."
This framework improves performance in spreadsheet table detection tasks by 25.6 per cent in GPT-4's in-context learning setting.
The framework consists of three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation.
Structural-anchor-based compression helps the LLM understand the spreadsheet better by removing distant, homogeneous rows and columns to create a condensed version. Inverse index translation deals with spreadsheets that have many empty cells and repetitive values, optimising token usage while preserving data integrity. Data-format-aware aggregation groups adjacent cells that share the same number format or data type.
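The inverse index idea can be illustrated with a short Python sketch. This is not Vole's code: the function names `col_letter` and `invert_index` and the sample grid are assumptions made for illustration. Instead of listing every cell in order, it maps each distinct value to the cells that hold it and drops empty cells altogether.

```python
from collections import defaultdict


def col_letter(idx):
    """Convert a 0-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    letters = ""
    idx += 1
    while idx > 0:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def invert_index(grid):
    """Map each distinct non-empty value to the addresses of the cells holding it."""
    index = defaultdict(list)
    for r, row in enumerate(grid, start=1):
        for c, value in enumerate(row):
            if value is None or value == "":
                continue  # empty cells are not encoded at all
            index[value].append(f"{col_letter(c)}{r}")
    return dict(index)


# Hypothetical sheet: a repeated value and plenty of empty cells.
grid = [
    ["Region", "Status", ""],
    ["North", "Open", ""],
    ["South", "Open", ""],
    ["", "", ""],
]
inverted = invert_index(grid)
print(inverted)
```

Here the repeated "Open" is stored once alongside two addresses, and the seven empty cells cost nothing. This sketch lists individual addresses only; merging neighbouring addresses into ranges would shrink the encoding further.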
Vole found that SheetCompressor reduces token usage for spreadsheet encoding by 96 per cent.
It claimed that SpreadsheetLLM also shows "exceptional performance in spreadsheet table detection," a foundational task for understanding spreadsheets.
SpreadsheetLLM builds on the Chain of Thought methodology with "Chain of Spreadsheet" (CoS), which decomposes spreadsheet reasoning into a table detection-match-reasoning pipeline.
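One way to picture such a pipeline is the Python sketch below. It is an illustration under stated assumptions, not the implementation from the pre-print: `chain_of_spreadsheet`, `ask_llm`, `read_range`, the prompt wording, and the canned stand-in model are all hypothetical. The first call finds the table that matches the question; the second reasons over that table alone.

```python
from typing import Callable


def chain_of_spreadsheet(
    question: str,
    compressed_sheet: str,
    read_range: Callable[[str], str],
    ask_llm: Callable[[str], str],
) -> str:
    """Two-step sketch: locate the relevant table, then answer from it."""
    # Step 1 (detection and match): ask which cell range holds the table
    # needed for this question, using the compressed encoding of the sheet.
    table_range = ask_llm(
        f"Spreadsheet:\n{compressed_sheet}\n\nQuestion: {question}\n"
        "Reply with only the cell range of the table needed to answer."
    ).strip()

    # Step 2 (reasoning): answer using only the cells of that table.
    table_text = read_range(table_range)
    return ask_llm(
        f"Table {table_range}:\n{table_text}\n\nQuestion: {question}\nAnswer:"
    ).strip()


# Stand-in for a real model so the sketch runs offline (hypothetical replies).
def canned_llm(prompt: str) -> str:
    return "A1:B3" if "Reply with only" in prompt else "Open"


answer = chain_of_spreadsheet(
    question="What is the status of the North region?",
    compressed_sheet="Region: A1 | Status: B1 | North: A2 | Open: B2,B3 | South: A3",
    read_range=lambda rng: "Region,Status\nNorth,Open\nSouth,Open",
    ask_llm=canned_llm,
)
print(answer)
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM API; a real client would replace `canned_llm`.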
This approach aims to make spreadsheet interactions more intelligent and efficient. The model's details are available in a pre-print paper.
One advantage of all this is that it could significantly enhance user interactions with spreadsheets.
Vole believes this innovation will pave the way for more efficient data management and analysis.