Missing worksheets and page size issue when excel (.xlsx) convert to pdf (.pdf) using open office
Asked Answered
G

4

10

I have created an application using JodConverter and Open-Office for converting an excel(.xlsx) to PDF, The application works fine but i am facing two problems

  1. The pages of output PDF is in the form of A4 size, since because of that certain worksheet content have been sliced off. since i want each worksheet of the excel as complete as in one page what ever the size.

  2. The no of worksheets were missing, say if my excel has 8 worksheet i am getting only two or three within the PDF output

Even if we tried to convert to pdf directly from open-office, its giving the above similar issues

Excel File - ss1.xlsx

Output PDF - work.pdf

can anyone please tell me some solution for this

My code is as given below

public class MyConverter {

    public static void main(String[] args) throws ConnectException {
        File inputFile = new File("C:/Users/Work/Desktop/ss1.xlsx");
        File outputFile = new File("C:/Users/Work/Desktop/work.pdf");

        // connect to an OpenOffice.org instance running on port 8100
        OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
        connection.connect();

        // convert
        DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
        converter.convert(inputFile, outputFile);

        // close the connection
        connection.disconnect();
    }
Gleich answered 17/6, 2014 at 18:6 Comment(20)
Do you have a sample spreadsheet (the C:/Users/Work/Desktop/ss1.xlsx) ?Slavism
Use a service like dropboxSlavism
@Nirk here is the file dropbox.com/s/jauhsype3f204in/ss1.xlsxGleich
Excel 2011 and 2013 both report that the worksheet has 4 pages. Printing to pdf in both applications shows 21 pages, just like libreoffice (not sure if open office does the same). Can you confirm?Slavism
@Nirk when I used open office to convert to pdf I got two pages of A4sizes ......and certain areas were slicedGleich
@Nirk is there any solution for thisGleich
I can't reproduce on OSX. What is the exact version of OpenOffice that you are using?Slavism
@Nirk is that means there is no solution for thisGleich
What is the page size if you open the file from openoffice directly? Is it A4 or letter?Slavism
i am able to view completedly. why did you asked thatGleich
I should have clarified: go to format ... page, then in the Page tab there is a dropdown called "Format". What is selected in that line? Is it "Letter" or "A4"Slavism
Can you post the generated PDF?Slavism
@Nirk this is the pdf generated dropbox.com/s/jv1jiul0mefcn11/document.pdfGleich
@Nirk Have you seen my pdfGleich
@Nirk you are only one for giving any suggestions and help regarding this question....thanks a lot...Gleich
@Nirk have you seen my pdf...Gleich
Is "fit to page" set (in the page settings)?Slavism
How to do that, even if i export it as PDF from open office manually i am getting only two pages, which is sliced.....The pdf file which is been exported from the Export pdf option in open office tool is this dropbox.com/s/rsk4c9k4yoh7qbg/ss1s.pdfGleich
even when we export directly from open office the same similar issue is facingGleich
@Nirk any suggestion regarding with this....i have provided a 100 bountyGleich
B
2

I used the (free) PrimoPDF printer driver to create the PDF directly from within Excel. The large number of pages (20+) is due to the fact that the 'fit to page' print option is missing in one of the worksheets, the 3rd if i remember well. After correcting this, the command to print all worksheets still results in 2 PDF files to be generated. PrimoPDF asks twice for a file name while it should ask only one. I assume your program just generates the PDF corresponding to the first part since normally only one PDF should be generated. I have no explanation for the 2-part printing. It might be due to some print setting in one of the worksheets that forces the printing to be performed in 'two batches'. For instance, a different resolution value may prevent printing in one batch. Conclusion: the workaround is to print with PrimoPDF and concatenate the 2 PDF files using one of the freely available programs on the web. For a durable solution you'll have to verify the print settings of all worksheets in detail and make sure they are equal.

Busy answered 28/7, 2014 at 12:22 Comment(2)
I'm afraid the code shown in the question can't be modified to solve the problem. The problem lies in the xlsx file. I checked the page settings and found that in the last sheet (Sample Test 3) the Print Quality is left blank whereas all other sheets specify 600 DPI.Busy
(continued comment) So I changed the last sheet to 600 DPI and lo and behold, PrimoPDF generates just one PDF containing all sheets. I'm quite sure this change will also solve the problem with the program.Busy
B
2

I'm afraid my previous answer was not clear enough. So here is the essence:

  • All your worksheets except the 3rd have a resolution setting of 600. In the 3rd sheet the resolution is left blank
  • Change the resolution of the 3rd sheet to 600
  • The PDF file will now be generated normally, containing all sheets.

Apparently Excel can only produce PDFs where the page resolutions of all worksheets are the same. If it encounters a sheet with different (of blank) resolution it just stops producing output without giving any warning. IMHO this is a bug in Excel. Fortunately the workaround is easy.

Hope this clarifies my previous answer.

Busy answered 23/8, 2014 at 10:18 Comment(17)
so do i need to use PrimoPDF printer driverGleich
No, your program should work fine once you've set the resolution ('print quality' in page setup dialog) to 600 in the 3rd worksheet (tab page) of your Excel file. The point is that all worksheets must have the same resolution setting, otherwise Excel will not be able to produce a single PDF file.Busy
why the output PDF is in the form of A4 sizeGleich
The page setup setttings are not the same for all worksheets. Here are the important settings i found on the Excel file ss1.xlsx I just downloaded on Dropbox:Busy
(cont'd from previous comment) -- Sample Test 0: scale=fit to page, paper size Letter -- Sample Test 1: scale=fit to page, paper size Letter -- Sample Test 2: scale 100%, paper size A4 -- Sample Test 3: scale 100%, paper size A4 As you see the scaling and paper size are not the same for all sheets. Make all 4 page setups equal, e.g. scaling = fit to page and paper size = A4. I think this will produce an ok pdf file containing all sheets and no data clipped off.Busy
It's a quick manual job to make sure all page settings are equal. Of course automating is useful if you expect to receive a lot of files like this one.Busy
since the excel files are directly imported to the application for converting it into pdf, so we can't do those manual settings within the excels, rather it should be done programmeticallyGleich
Hey check my answer out, you can change all kinds of settings by following the Reference Models of Excel msdn.microsoft.com/en-us/library/office/…. Also by using Jacob it makes it easier, was the generated PDF in my answer what you were trying to achieve?Porbeagle
Afraid it would not be very useful on Linux or Mac, it uses activeX Components to connect to .dll files, but it does work very well on windows.Porbeagle
@user3884212 i changed the scaling to fit to page and paper size =A4, but still the open-office is generating PDF with two pagesGleich
@Alex Man I left .PrintQuality = 300 which corresponds to a blank entry. With this setting, PrimoPDF generated 2 files as well as your program. Apparently the value must be a validBusy
@Alex Man (cont'd) quality must be a valid DPI value. I changed .PrintQuality to 300 and now i'm getting a single PDF file! I think the rationale is as follows: in PDF, page settings are document global. This means that a document that contains 2 different resolutions must be generated as 2 files (= as 2 print jobs). BTW PDF does support multiple page sizes -> may be different if my reasoning ok! Hope the DPI fix will help, cross fingers...Busy
@alex Man The excel file is at dropbox Some remarks: --1-- The main macro that performs page setup for all worksheets is PageSetup_AllSheets --2-- You don't need primoPDF. I ran the macro with default printer set to my Canon --3-- Not all printers support explicit dpi values (such as 600) as printQuality. For instance my Canon won't. There is a second method to specify print quality using negative numbers. Values are: -4=high -3=medium -2=low. I used -3 (medium) in my macro.Busy
The Excel file i placed on Dropbox was not the correct one. Here is the UPDATE Unfortunately in the PDF output from PrimoPDF, the 3rd page is chopped off at the right, probably in the same way as reported by @PalinDrome555. A way to create multiline text that is not aligned with rows and avoids wrapping/merging is to create the text in a separate range and then using paste as image to the target location. For details see this Excel tipBusy
@Alex Man Please can you clarify to which xls setting you are referring? (all changes done using Excel, didn't use Word)Busy
What I meant is that whether you done those print settings ie 300 dpi and the page size to A4 using microsoft excel....since when I open the particular xls within openoffice I have got a message saying that the file contains macros.....also you have changed xlsx to xlsGleich
@Alex Man The print settings are indeed done using VBA code (=Excel macros). Unfortunately, the OpenOffice implementation of VBA is far from compatible with Excel. The change from .xlsx to .xls is because I used an older Excel version (Excel 2003). It's compatible except for some new features. Excel versions 2010 and newer produce .xlsx (xls extended) files by default. IMO MS Excel is still the best solution for automation (macros).Busy
P
1

You can use Microsoft Excel to do any .xlsx conversions to PDF, I am currently developing an application that uses Jacob, (Java Com Bridge) a thin wrapper for connecting to the Object Models of Microsoft Office Programs, afaik it doesn't support open-office, but it does a good job at converting your .xlsx file to a PDF file. It requires a bit of setting up. Link to Jacob

When looking into the problem I found in Excel -> Page Setup, if you change Fit to 1 pages wide and 1 pages tall, it squeezes each work sheet to fit on each page in PDF. Another problem I ran into was the wrap text property, be careful when you use it, as it can cause layout issues.

I made a small implementation of this, tested on Excel 2010 and Jacob 1.18

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.ComFailException;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public class ExcelApplication {

    private final String APP_NAME = "Excel.Application";

    private final ActiveXComponent excelApplication;

    private Dispatch workbooks;//all active workbooks opened

    private Dispatch activeWorkbook;//active workbook

    private Dispatch activeWorksheets;//all worksheets in active workbook

    public ExcelApplication() {
        excelApplication = new ActiveXComponent(APP_NAME);
    }

    public void openExcelFileInvisible(String fileName) {
        //Opens Excel in the background
        String fileUrl;
        if (excelApplication != null) {
            excelApplication.setProperty("Visible", new Variant(false));//sets excel invisible            
            //file url relative to this class
            //or you can just give an absolute path
            fileUrl = getClass().getResource(fileName).toExternalForm();
            //get workbooks
            workbooks = Dispatch.call(excelApplication, "Workbooks").getDispatch();
            if (activeWorkbook == null) {
                try {
                    activeWorkbook = Dispatch.call(workbooks, "Open", fileUrl).getDispatch();
                } catch (ComFailException comFailEx) {
                    //error opening the Excel Document
                }
            }
        }
    }

    public void closeActiveWorkbookAndSave() {
        try {
            //close and save change's to active workbook
            //this only closes the workbook, not Excel
            Dispatch.call(activeWorkbook, "Close", new Variant(true));
            //if you want to exit the Excel App.            
            //excelApplication.invoke("Quit", new Variant[0]);            
        } catch (ComFailException cfe) {
            //problem closing the workbook
        }
    }

    public void convert_XLSX_TO_PDF(String pdfFileName) {
        if (activeWorkbook != null) {
            String workbookName = Dispatch.call(activeWorkbook, "Name").getString();
            activeWorksheets = Dispatch.call(activeWorkbook, "Worksheets").getDispatch();
            int workSheetCount = Dispatch.call(activeWorksheets, "Count").getInt();
            System.out.println("Workbook Name =" + workbookName);
            System.out.println("Total Worksheets In Active Document = " + workSheetCount);
            System.out.println("Converting to PDF....");
            try {                
                Dispatch currentWorksheet;
                String currentWorksheetName;
                //worksheets not zero based, starts at one
                for (int i = 1; i < workSheetCount+1; i++) {
                    //get each active work sheet and set up the page setup settings
                    currentWorksheet = Dispatch.call(activeWorksheets, "Item", new Variant(i)).getDispatch();
                    currentWorksheetName = Dispatch.call(currentWorksheet, "Name").getString();
                    System.out.println("Setting up page setup for Workbook Sheet ("+ i + ".) - " + currentWorksheetName);
                    //Get page setup for each worksheet
                    Dispatch pageSetup = Dispatch.get(currentWorksheet, "PageSetup").getDispatch();
                    /**** Zoom must be set to false for FitToPagesWide and FitToPagesTall
                       to take control of scaling
                    */
                    Dispatch.put(pageSetup, "Zoom", new Variant(false));                    
                    //Fit content on each worksheet to fit in a single page                                        
                    Dispatch.put(pageSetup, "FitToPagesWide", new Variant(1));
                    Dispatch.put(pageSetup, "FitToPagesTall", new Variant(1));                    
                    //set print area to not chop off content
                    Dispatch.put(pageSetup, "PrintArea", new Variant(false));                    
                    //set left margin small
                    Dispatch.put(pageSetup, "LeftMargin", new Variant(0));                                     
                }
                //[3rd param] = 0 specifies PDF document, 1 is XPS format
                //[4th param] = 0 specifies high quality, 1 is low quality
                //[5th param] = true to keep document properties, false to ommit
                //[6th param] = true to keep print areas set, false does not keep print areas set 
                Dispatch.call(activeWorkbook, "ExportAsFixedFormat", new Variant(0), new Variant(pdfFileName), new Variant(0), new Variant(false), new Variant(true));
                System.out.println("Export to PDF has been successful.");
                //close and save
                closeActiveWorkbookAndSave();
            } catch (ComFailException comFailEx) {
                //Export Failed
                System.out.println("Export to PDF has failed");
            }
        }
    }

}

public class TestExcel {

    public static void main(String[] args) {
        // TODO code application logic here
        ExcelApplication e = new ExcelApplication();
        e.openExcelFileInvisible("ss1.xlsx");
        //full path accepted here or if not it will be exported to current directory
        e.convert_XLSX_TO_PDF("covertedXLSXFile.pdf");
    }

}

Here is the PDF File generated from the above code. Notice the 3rd page, the content is a bit chopped off, when you remove the wrap text property and merged cells, it generates fine. Converted XLSX

Porbeagle answered 1/8, 2014 at 21:12 Comment(0)
B
0

This is the Excel VBA code to set up the same page parameters for all worksheets. Sorry i'm not familiar with Openoffice programming, assume that the API is similar:

Sub PageSetup_AllSheets()
    Dim ws As Worksheet
    For Each ws In ActiveWorkbook.Worksheets
        ws.Activate
        Setup_Page
    Next
End Sub

Sub Setup_Page()
'
' Setup_Page Macro
' Macro recorded 27/08/2014 by Paul
'
    With ActiveSheet.PageSetup
        .PrintTitleRows = ""
        .PrintTitleColumns = ""
        .LeftHeader = ""
        .CenterHeader = ""
        .RightHeader = ""
        .LeftFooter = ""
        .CenterFooter = ""
        .RightFooter = ""
        .LeftMargin = Application.InchesToPoints(0.7)
        .RightMargin = Application.InchesToPoints(0.7)
        .TopMargin = Application.InchesToPoints(0.75)
        .BottomMargin = Application.InchesToPoints(0.75)
        .HeaderMargin = Application.InchesToPoints(0.3)
        .FooterMargin = Application.InchesToPoints(0.3)
        .PrintHeadings = False
        .PrintGridlines = False
        .PrintComments = xlPrintNoComments
        .PrintQuality = -3
        .CenterHorizontally = False
        .CenterVertically = False
        .Orientation = xlPortrait
        .Draft = False
        .PaperSize = xlPaperA4
        .FirstPageNumber = xlAutomatic
        .Order = xlDownThenOver
        .BlackAndWhite = False
        .Zoom = False
        .FitToPagesWide = 1
        .FitToPagesTall = 1
        .PrintErrors = xlPrintErrorsDisplayed
    End With
End Sub
Busy answered 24/8, 2014 at 14:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.