How to speed up autosizing columns in apache POI?
Asked Answered
S

5

33

I use the following code in order to autosize columns in my spreadsheet:

for (int i = 0; i < columns.size(); i++) {
   sheet.autoSizeColumn(i, true);
   sheet.setColumnWidth(i, sheet.getColumnWidth(i) + 600);
}

The problem is it takes more than 10 minutes to autosize each column in case of large spreadsheets with more than 3000 rows. It goes very fast for small documents though. Is there anything which could help autosizing to work faster?

Skipbomb answered 24/9, 2013 at 13:37 Comment(4)
sheet.autoSizeColumn(i, false) might be a little fasterFylfot
It's slow because it has to calculate the width of that cell in every row of your document. Any chance you could change your logic so you write the first few rows, size based on just those, then add the rest of the rows in later?Tiedeman
@Fylfot absolutely agree, according the source code of the poi library, it iterates through all merged regions every time it calculates the width of every single cellSkipbomb
@Tiedeman unfortunately it was not possible in this case, because large cells could come in any row later onSkipbomb
S
52

Solution which worked for me:

It was possible to avoid merged regions, so I could iterate through the other cells and finally autosize to the largest cell like this:

int width = ((int)(maxNumCharacters * 1.14388)) * 256;
sheet.setColumnWidth(i, width);

where 1.14388 is a max character width of the "Serif" font and 256 font units.

Performance of autosizing improved from 10 minutes to 6 seconds.

Skipbomb answered 25/9, 2013 at 14:9 Comment(8)
how can you get "maxNumCharacters" ?Deckard
@ToKra it's up to you. I simply calculate number of characters in each cell of column and find the largest, than apply it in formulaSkipbomb
@antken, how to calculate the "max character width" to another fonts?Guttle
From apache API: This process can be relatively slow on large sheets, so this should normally only be called once per column, at the end of your processing.Illuminance
May I ask why 256?Condonation
Because the width is set in units of 1/256th of character width. see this: poi.apache.org/apidocs/dev/org/apache/poi/hssf/usermodel/…Marketplace
sheet.setColumnWidth(i, Math.min(width, 255 * 256));Appertain
@Skipbomb +1 long ago, it works great in most cases. It does fail slightly when a bold font is used. Any chance you could explain how to calculate the 1.14388 constant, where does it comes from? I do realize this is a 10 years old post :( Thank you!Loire
G
2

The autoSizeColumn function itself works not perfect and some columns width not exactly fit the data inside. So, I found some solution that works for me.

  1. To avoid crazy calculations let give that to autoSizeColumn() function:
   sheet.autoSizeColumn(<columnIndex>);
  1. Now, our column autosized by library but we wont to add a little bit more to the current column width to make table looks fine:
   // get autosized column width
   int currentColumnWidth = sheet.getColumnWidth(<columnIndex>);

   // add custom value to the current width and apply it to column
   sheet.setColumnWidth(<columnIndex>, (currentColumnWidth + 2500));
  1. The full function could looks like:
   public void autoSizeColumns(Workbook workbook) {
        int numberOfSheets = workbook.getNumberOfSheets();
        for (int i = 0; i < numberOfSheets; i++) {
            Sheet sheet = workbook.getSheetAt(i);
            if (sheet.getPhysicalNumberOfRows() > 0) {
                Row row = sheet.getRow(sheet.getFirstRowNum());
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    int columnIndex = cell.getColumnIndex();
                    sheet.autoSizeColumn(columnIndex);
                    int currentColumnWidth = sheet.getColumnWidth(columnIndex);
                    sheet.setColumnWidth(columnIndex, (currentColumnWidth + 2500));
                }
            }
        }
    }

P.S. Thanks Ondrej Kvasnovsky for the function https://mcmap.net/q/159201/-apache-poi-excel-how-to-configure-columns-to-be-expanded

Goldstone answered 19/3, 2020 at 7:59 Comment(0)
E
0

The autosizeColumn() function very slow and unneficient. Even authors of apache POI mentioned in docs, that:

This process can be relatively slow on large sheets, ...

Calculating and setting the cell's width manually is way faster - in my case I reduced the time from ~25,000ms to ~1-5ms.

This is how to achieve it (I was basing on Vladimir Shcherbukhin's answer:

Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet();
final int[] maxNumCharactersInColumns = new int[headers.length]; // maximum number of characters in columns. Necessary to calculate the cell width in most efficient way. sheet.autoSizeColumn(...) is very slow.

Row headersRow = sheet.createRow(0);
CellStyle headerStyle = createHeadersStyle(workbook); // createHeadersStyle() is my own function. Create headers style if you want

for (int i = 0; i < headers.length; i++) { // create headers
        Cell headerCell = headersRow.createCell(i, CELL_TYPE_STRING);
        headerCell.setCellValue(headers[i]);
        headerCell.setCellStyle(headerStyle);

        int length = headers[i].length();
        if (maxNumCharactersInColumns[i] < length) { // adjust the columns width
            maxNumCharactersInColumns[i] = length + 2; // you can add +2 if you have filtering enabled on your headers
        }
}

int rowIndex = 1;
    for (List<Object> rowValues : rows) {
        Row row = sheet.createRow(rowIndex);

        int columnIndex = 0;
        for (Object value : rowValues) {
            Cell cell = createRowCell(row, value, columnIndex); // createRowCell() is my own function.

            int length;
            if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
                String cellValue = cell.getStringCellValue();

                // this is quite important part. In some excel spreadsheet you can have a values with line-breaks. It'll be cool to handle that scenario :)
                String[] arr = cellValue.split("\n"); // if cell contains complex value with line breaks, calculate only the longest line
                length = Arrays.stream(arr).map(String::length).max(Integer::compareTo).get();
            } else {
                length = value != null ? value.toString().length() : 0;
            }

            if (maxNumCharactersInColumns[columnIndex] < length) { // if the current cell value is the longest one, save it to an array
                maxNumCharactersInColumns[columnIndex] = length;
            }

            columnIndex++;
        }
        rowIndex++;
    }

    for (int i = 0; i < headers.length; i++) {
        int width = (int) (maxNumCharactersInColumns[i] * 1.45f) * 256; // 1.45f <- you can change this value
        sheet.setColumnWidth(i, Math.min(width, MAX_CELL_WIDTH)); // <- set calculated cell width
    }

    sheet.setAutoFilter(new CellRangeAddress(0, 0, 0, headers.length - 1));

    ByteArrayOutputStream output = new ByteArrayOutputStream();
    workbook.write(output);
    workbook.close();
Ephebe answered 1/9, 2022 at 14:46 Comment(0)
B
0

Unfortunately I don't have enough reputations yet to add comments in answers. So here some annotations:

  • When using Row row = sheet.getRow(sheet.getFirstRowNum()); be shure, this row contains at least a value in the last column. Otherwise the cellIterator will end too early, i.e. if a subsequent row has a value in this column, this column will not be autosized. This problem is bypassed if rowcontains the headers (names of the columns). Or explicit use a known header row, e.g.
    int indexOfHeaderRow = ...;
    ...
    Row row = sheet.getRow(indexOfHeaderRow);
  • Jakub Słowikowski
    sheet.setColumnWidth(i, Math.min(width, MAX_CELL_WIDTH)); // <- set calculated cellwidth
    I'm not shure about this line because there is no information about content of MAX_CELL_WIDTH - perhaps overall maximum? So I used instead:
    sheet.setColumnWidth(i, Math.max(width, 2048));
    2048 seams to be the default width? This value prevents extremely narrow widths for empty columns.
Bryon answered 5/1, 2023 at 10:35 Comment(0)
H
0

The biggest problem of XSSFSheet I encountered is that it doesn't have a funcion to return all values of specific column, you have to iterate all by row, so when you specify column, it actually iterates over all rows and columns. That probably where the slowness comes from I made my own implementation where it iterates over all columns and rows only onece, and then change the columns specified in listOfColumns, I also added maxWidth for cases where you work with long texts, so you don't end up with crazy widths. It would probably be good idea to add implementation for font width as Zz'Rot have in his implementation

private static void autoSizeColumns(XSSFSheet sheet, Integer maxWidth, List<Integer> listOfColumns) {
    HashMap<Integer,Integer> sizesArray = new HashMap<>();
    for (Row row : sheet) {
        for (Cell cell : row) {
            int column = cell.getColumnIndex();
            int oldMaxSize = Optional.ofNullable(sizesArray.get(column)).orElse(0);
            int newMaxSize = switch (cell.getCellType()) {
                case STRING,FORMULA ->
                        Math.max(oldMaxSize, cell.getStringCellValue().length());
                case NUMERIC ->
                        Math.max(oldMaxSize, Double.toString(cell.getNumericCellValue()).length());
                case BOOLEAN ->
                        Math.max(oldMaxSize, Boolean.toString(cell.getBooleanCellValue()).length());
                case ERROR ->
                        Math.max(oldMaxSize, Byte.toString(cell.getErrorCellValue()).length());
                default ->
                        oldMaxSize;
            };
            if(oldMaxSize != newMaxSize)
                sizesArray.put(column, newMaxSize);
        }
    }
    if(listOfColumns == null){
        for (int i = 0; i < sizesArray.size(); i++) {
            int width = sizesArray.get(i) * 256;
            if (maxWidth != null && width > maxWidth) {
                width = maxWidth;
            }
            sheet.setColumnWidth(i, width);
        }
        return;
    }
    for (Integer column : listOfColumns) {
        Integer size = sizesArray.get(column);
        int width = sizesArray.get(size) * 256;
        if (maxWidth != null && width > maxWidth) {
            width = maxWidth;
        }
        sheet.setColumnWidth(column, width);
    }
}
Harper answered 26/11, 2023 at 22:19 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.