Main Tutorials

PDFBox – How to read PDF file in Java

This article shows you how to use Apache PDFBox to read a PDF file in Java.

1. Get PDFBox

pom.xml

<dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.6</version>
</dependency>

2. Print PDF file

Example to extract all text from a PDF file.

ReadPdf.java

package com.mkyong;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import java.io.File;
import java.io.IOException;

public class ReadPdf {

    public static void main(String[] args) throws IOException {

        try (PDDocument document = PDDocument.load(new File("/path-to/abc.pdf"))) {

            document.getClass();

            if (!document.isEncrypted()) {
			
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);

                PDFTextStripper tStripper = new PDFTextStripper();

                String pdfFileInText = tStripper.getText(document);
                //System.out.println("Text:" + st);

				// split by whitespace
                String lines[] = pdfFileInText.split("\\r?\\n");
                for (String line : lines) {
                    System.out.println(line);
                }

            }

        }

    }
}
Note
Please refer to this pdfbox svn for more examples

References

  1. Apache PDFBox
  2. iText – Read and Write PDF in Java

About Author

author image
Founder of Mkyong.com, love Java and open source stuff. Follow him on Twitter. If you like my tutorials, consider make a donation to these charities.

Comments

Subscribe
Notify of
13 Comments
Most Voted
Newest Oldest
Inline Feedbacks
View all comments
Ankita Tapadia
6 years ago

Hi Mykong. I am getting following error on the same problem statement. Could you please guide me on a resolution?

org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script1.groovy: 10: unable to resolve class org.apache.pdfbox.text.PDFTextStripper
@ line 10, column 3.
import org.apache.pdfbox.text.PDFTextStripper;
^

1 error

at org.webharvest.runtime.scripting.GroovyScriptEngine.eval(GroovyScriptEngine.java:138) ~[workfusion-webharvest-core.jar:na]
at org.webharvest.runtime.processors.ScriptProcessor.execute(ScriptProcessor.java:74) ~[workfusion-webharvest-core.jar:na]
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127) ~[workfusion-webharvest-core.jar:na]
at org.webharvest.runtime.Scraper.execute(Scraper.java:169) ~[workfusion-webharvest-core.jar:na]
at org.webharvest.runtime.Scraper.execute(Scraper.java:182) ~[workfusion-webharvest-core.jar:na]
at com.workfusion.studio.rpa.wf.webharvest.launch.StudioWebHarvestTaskExecutor.execute(StudioWebHarvestTaskExecutor.java:108) ~[com.workfusion.studio.wf_8.4.0.jar:na]
at com.workfusion.studio.rpa.wf.webharvest.launch.SingleThreadWebHarvestProcess.processTaskInputs(SingleThreadWebHarvestProcess.java:75) ~[com.workfusion.studio.wf_8.4.0.jar:na]
at com.workfusion.studio.rpa.wf.webharvest.launch.SingleThreadWebHarvestProcess.start(SingleThreadWebHarvestProcess.java:44) ~[com.workfusion.studio.wf_8.4.0.jar:na]
at com.workfusion.studio.rpa.wf.webharvest.launch.WebHarvestMainLauncher.launch(WebHarvestMainLauncher.java:83) ~[com.workfusion.studio.wf_8.4.0.jar:na]
at com.workfusion.studio.rpa.wf.webharvest.launch.WebHarvestMainLauncher.main(WebHarvestMainLauncher.java:141) ~[com.workfusion.studio.wf_8.4.0.jar:na]
Caused by: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:

sankar
2 years ago

i am unable to import the load method plz help me on this

Manmaya
4 years ago

Hi,
Thanks for posting this, is there any way to determine the font name and its size in the particular line of text.

goyal
2 years ago

Getting these exceptions :

at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:89)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:515)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:144)
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:394)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:322)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:269)
at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:233)
at PdfToConsole.main(PdfToConsole.java:35)
Caused by: java.lang.ClassNotFoundException: org.apache.fontbox.FontBoxFont
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

Rituja
4 years ago

I can’t read colum wise pdf document. It read row wise data. For reading the column wise data what we have to do

SHANKAR
4 years ago

hi how to read non editable image pdf iwithout installing software in java

Mohamad Basuki
4 years ago

Hi Bro, how read multipage pdf, thanks 🙂

Sid
4 years ago

Hi Mykong, I have to covert PDF file to HTML and for this I need a java code to fetch formatting of the PDF as well along with the text. For example tables, images, forms etc. Please guide me.
Thanks.

Anonymous
4 years ago
Reply to  Sid

https://www.baeldung.com/pdf-conversions-java

refer this one it might be useful for you.

kumar
4 years ago

Thanks for the help

srinivas
6 years ago

how can get the font style for each line in pdf using pdfbox

Smarty
6 years ago

yes really its worty

Shafqat Shafi
5 years ago

Thanks a lot for such a neat and informative article.