Extract images from PDF fails for embedded JPEGs

java.lang.NoClassDefFoundError: com/sun/image/codec/jpeg/JPEGCodec
	at org.jpedal.io.ObjectStore.saveStoredJPEGImage(Unknown Source)
	at org.jpedal.io.ObjectStore.saveStoredImage(Unknown Source)
	at org.jpedal.parser.PdfStreamDecoder.processImage(Unknown Source)
	at org.jpedal.parser.PdfStreamDecoder.processImageXObject(Unknown Source)
	at org.jpedal.parser.PdfStreamDecoder.DO(Unknown Source)
	at org.jpedal.parser.PdfStreamDecoder.processToken(Unknown Source)
	at org.jpedal.parser.PdfStreamDecoder.decodeStreamIntoObjects(Unknown Source)
	at org.jpedal.parser.PdfStreamDecoder.decodePageContent(Unknown Source)
	at org.jpedal.PdfDecoder.decodePage(Unknown Source)
	at sc.fiji.io.Extract_Images_From_PDF.run(Extract_Images_From_PDF.java:39)
	at ij.IJ.runUserPlugIn(IJ.java:217)
	at ij.IJ.runPlugIn(IJ.java:181)
	at ij.Executer.runCommand(Executer.java:137)
	at ij.Executer.run(Executer.java:66)
	at java.lang.Thread.run(Thread.java:745)

Dear @albertcardona,

has the problem been solved since you first asked? I tried to run the command just now and it has worked as expected for me on Java 8 with the Java-8 update site enabled.

The JPEGCodec has been removed in Java 7 and it seems the plugin has been updated since then – couldn’t find a reference to it in Extract_Images_From_PDF.java (otherwise it wouldn’t be working for me). Are you by chance running Fiji on Java 8 with the Java-8 site disabled?

Best,
Stefan


Hi @stelfrich,

The Java-8 site is enabled. I’ve not modified this Fiji instance in any way other than by updating it today. The error persists. According to the stack trace, the issue is clearly in the JPedal library, so JPedal needs patching to cope with the lack of JPEGCodec in Java 8.
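A minimal sketch of what such a patch could look like, assuming the failing code only needs to encode an in-memory image as JPEG. It uses the standard javax.imageio API in place of com.sun.image.codec.jpeg.JPEGCodec. The class and method names (PortableJpegWriter, encodeJpeg) are hypothetical and not taken from JPedal, whose actual code may need more than this.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper: not part of JPedal or Fiji.
public class PortableJpegWriter {

    /** Encodes the image as JPEG using only the standard javax.imageio API. */
    public static byte[] encodeJpeg(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false if no suitable writer was found.
            if (!ImageIO.write(image, "jpg", out)) {
                throw new IOException("No JPEG writer available for this image type");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A small blank RGB image is enough to exercise the encoder.
        BufferedImage image = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        byte[] jpeg = encodeJpeg(image);
        System.out.println("Encoded " + jpeg.length + " bytes");
    }
}
```

Since javax.imageio is part of the Java SE API, this does not depend on which vendor's JVM is running.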

The PDF throwing the errors:
http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0031314&type=printable

The figures of the paper don’t open, only the small icons with the PLoS ONE logo do.

Albert

You are right @albertcardona, my mistake!

I have tried to replace JPedal with Apache PDFBox on a new branch: https://github.com/stelfrich/IO/tree/replace-jpedal. This works fine for me with the PDF that you have provided. However, to completely remove the outdated dependency on JPedal, we’ll have to invest some time to refactor PDF_Viewer as well.


@stelfrich, @albertcardona: I can successfully extract all images from the PDF you provided on an up-to-date, unmodified Fiji installation (i.e., just subscribed to the IJ, Fiji and Java 8 update sites) on macOS 10.12.3 / Java 1.8.0_66 (the version bundled with Fiji). I’ve pasted the output of org.scijava.plugins.commands.debug.SystemInformation here.
I extract images from PDFs all the time, would hate to see it broken, let me know if I can help.

@tferr: I would swear that I had the same issue as @albertcardona, but I can’t reproduce it now (not even on a Windows 10 machine with an up-to-date Fiji).

@albertcardona: Which OS are you running? Could you provide the System Information as well (Plugins > Debug > System Information)?

Alright, I was able to reproduce the error. Imports from com.sun.* and com.oracle.* are discouraged but are functional (in most cases) if you are running an Oracle JRE / JDK. Since JPedal has such imports, I guess @albertcardona is running Fiji with an OpenJDK?
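To check which case applies on a given machine, something like the following could be run with the same Java that Fiji uses. It is only a sketch: the class name JpegCodecCheck and its helper method are made up for illustration; the system properties and Class.forName are standard Java.

```java
// Hypothetical diagnostic: not part of Fiji or JPedal.
public class JpegCodecCheck {

    /** Returns true if the named class can be loaded on this JVM (without initializing it). */
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, JpegCodecCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Vendor and VM name usually reveal whether this is an Oracle or an OpenJDK runtime.
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("JPEGCodec available: "
                + isClassAvailable("com.sun.image.codec.jpeg.JPEGCodec"));
    }
}
```

If the last line prints false, any library that references JPEGCodec directly will fail with the NoClassDefFoundError shown at the top of this thread.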

A short-term solution is to use the Oracle JRE that comes with Fiji. We could, however, think about replacing JPedal with e.g. PDFBox to support OpenJDK.
