Paperclip uploads for office files (docx,pptx) are being downloaded as zip files?
P

6

13

I'm using the following for file uploading: Rails 3.2, Paperclip (3.0.4), aws-sdk (1.5.2) & jQuery-File-Upload

The problem is that Office files (such as pptx) are being downloaded as zip files rather than pptx files. Here is what I see in the logs:

Started POST
Processing by AttachmentsController#create as JS
  Parameters: {"files"=>[#<ActionDispatch::Http::UploadedFile:0x007fa1d5bee960 @original_filename="test1.pptx", @content_type="application/vnd.openxmlformats-officedocument.presentationml.presentation", @headers="Content-Disposition: form-data; name=\"files[]\"; filename=\"test1.pptx\"\r\nContent-Type: application/vnd.openxmlformats-officedocument.presentationml.presentation\r\n", @tempfile=#<File:/var/folders/rm/89l_3yt93g31p22738hqydmr0000gn/T/RackMultipart20120529-10443-1ljhigq>>]}
.....


SQL (1.4ms)  INSERT INTO "attachments" ("attachment_content_type", "attachment_file_name", "attachment_file_size", "attachment_file_title", "attachment_updated_at", "created_at", "deleted", "room_id", "pinned", "updated_at", "user_id") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) RETURNING "id"  [["attachment_content_type", "application/zip"], ["attachment_file_name", "test1_1338339249.pptx"], ["attachment_file_size", 150329], ["attachment_file_title", "test1.pptx"], ["attachment_updated_at", Wed, 30 May 2012 00:54:09 UTC +00:00], ["created_at", Wed, 30 May 2012 00:54:09 UTC +00:00], ["deleted", false], ["room_id", 20], ["pinned", false], ["updated_at", Wed, 30 May 2012 00:54:09 UTC +00:00], ["user_id", 1]]
[paperclip] Saving attachments.
[paperclip] saving /development/private/rooms/20/user_uploaded_files/test1_1338339249.pptx
Command :: file -b --mime '/var/folders/rm/89l_3yt93g31p22738hqydmr0000gn/T/RackMultipart20120529-10443-1ljhigq20120529-10443-1lr2yg2'
[AWS S3 200 1.16513 0 retries] put_object(:acl=>:private,:bucket_name=>"cdn-assets-site-com",:content_type=>"application/zip",:data=>#<Paperclip::FileAdapter:0x007fa1d2540170 @target=#<File:/var/folders/rm/89l_3yt93g31p22738hqydmr0000gn/T/RackMultipart20120529-10443-1ljhigq>, @tempfile=#<File:/var/folders/rm/89l_3yt93g31p22738hqydmr0000gn/T/RackMultipart20120529-10443-1ljhigq20120529-10443-1lr2yg2>>,:key=>"development/private/rooms/20/user_uploaded_files/test1_1338339249.pptx") 

Notice how the file comes in as pptx but is uploaded to AWS S3 with the content type application/zip?

Ponder answered 30/5, 2012 at 1:2 Comment(0)
N
12

It turns out, as Marc B first hinted, that all Office documents whose extensions end in x are indeed zipped XML files. Anything that relies on the standard MIME types will assume that they are zipped files.

To get around this, you have to register the Office mimetypes with your server. So, for your .pptx files, you put

Mime::Type.register "application/vnd.openxmlformats-officedocument.presentationml.presentation", :pptx

in your config/initializers/mime_types.rb file.

Alternatively, if you have to support all of the Office 2007 files, you can use the Rack::Mime::MIME_TYPES.merge!() method, which is seen in action in this Stack Overflow answer.

Noblesse answered 1/6, 2012 at 2:4 Comment(8)
Thanks, but adding the line to config/initializers/mime_types.rb has no effect, even after a restart. Why would that be? - Ponder
Also, adding Rack::Mime::MIME_TYPES.merge!({ ".pptx" => "application/presentation" }) had no effect, no change. Ideas? - Ponder
The funny thing here is that in S3 I see the file with the correct extension, PPTX. It's when I go to get the file that it turns into a ZIP. Perhaps that is the issue? This is what I use to get the file for the S3 download in my attachment model: attachment.s3_object(style).url_for(:read, :secure => true, :expires => expires_in).to_s - Ponder
Also, I'm noticing that in the db the paperclip field for content_type is showing "application/zip" for the pptx file. Why? - Ponder
@AnApprentice: I've only tested this with Ubuntu 12.04, Ruby 1.9.3, and Rails 3.2.4, so I can't comment on any other platforms. If you're using a server other than WEBrick, you'll need to register the mimetypes with that server properly - this link provides some hints on what to do with different server platforms. - Noblesse
This is happening locally with WEBrick running the same versions of ruby and rails... - Ponder
Also interesting: in rails c it is working correctly, but for some reason it is not being inserted correctly in the DB... > MIME::Types.type_for("test1.pptx").to_s => "[application/vnd.openxmlformats-officedocument.presentationml.presentation]" - Ponder
@Ponder So how did you get it working? I also applied the mime registration, and it went from returning nil on content_type to returning application/zip .. restarted, etc. Any ideas what might be missing? - Thalia
L
17

Seems like you don't have MIME types registered.

Office files whose extensions end in x (Office 2007+) are indeed zipped XML files. Anything that relies on the standard MIME types will treat them as zipped files.
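To make the "zipped XML" point concrete, here is a minimal sketch (not Paperclip code) that checks for the four-byte signature every ZIP local file header starts with. The helper name zip_container? and the stand-in bytes are assumptions for illustration; with a real file you would read the bytes with File.binread.

```ruby
# Every ZIP local file header begins with these four bytes.
ZIP_SIGNATURE = "PK\x03\x04".b

# Hypothetical helper: true when the data starts like a ZIP archive.
def zip_container?(bytes)
  bytes.b.start_with?(ZIP_SIGNATURE)
end

# Stand-in bytes for illustration; a real .pptx would be read with File.binread(path).
fake_pptx = "PK\x03\x04" + ("\x00" * 26) + "[Content_Types].xml"

puts zip_container?(fake_pptx)    # true
puts zip_container?("plain text") # false
```

Any detector that looks only at these leading bytes cannot tell a .pptx from an ordinary .zip, which is consistent with the application/zip value in the question's log.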

MIME types for Office 2007+ files:

+------+-------------------------------------------------------------------------+
| File |                             MIME type                                   |
+------+-------------------------------------------------------------------------+
|.docx |application/vnd.openxmlformats-officedocument.wordprocessingml.document  |
+------+-------------------------------------------------------------------------+
|.xlsx |application/vnd.openxmlformats-officedocument.spreadsheetml.sheet        |
+------+-------------------------------------------------------------------------+
|.pptx |application/vnd.openxmlformats-officedocument.presentationml.presentation|
+------+-------------------------------------------------------------------------+

In your config/initializers/mime_types.rb file, register each type you need, as in the example below:

Mime::Type.register "application/vnd.openxmlformats-officedocument.presentationml.presentation", :pptx

Ironically IE can have difficulty recognising the new MS Office files while other browsers recognise them fine.

In order to get IE working with these files you need to add the mime types to the server config. In Rails this is done in config/initializers/mime_types.rb

Mime::Type.register "application/vnd.openxmlformats-officedocument.wordprocessingml.document", :docx
Mime::Type.register "application/vnd.openxmlformats-officedocument.presentationml.presentation", :pptx
Mime::Type.register "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", :xlsx

If your app is proxied through Apache and Apache serves your static assets, you'll also have to configure Apache with the new MIME types (and restart it), as per http://bignosebird.com/apache/a1.shtml

The MIME types file is usually located at /etc/mime.types, but try locate mime.types if you're not sure.

You may also refer to the Paperclip adapters.

See also: "Description of the default settings for the MimeMap property and for the ScriptMaps property in IIS", "Office 2007 MIME types for Apache", "Uploading docx files with Paperclip and Rails", and "Dynamic Word (.docx) Documents in Rails".

Lole answered 2/6, 2012 at 4:55 Comment(1)
You should remove the IIS Manager references from this answer, as they have no bearing on Rails. - Primatology
L
4

The 'x' versions of the Office formats ARE zip files - zipped xml. As such, anything that determines file extensions based on mime types will always see them as zip files.

Literate answered 30/5, 2012 at 1:4 Comment(2)
Thanks Marc, so how/where do you handle making sure they are downloaded with the correct file extension (pptx vs zip)? Thanks - Ponder
Don't know anything about paperclip/ruby, but you've got the original filename in your snippet above - I'd suggest storing that with the rest of the stuff going into the database, and using it for the download later on. - Literate
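The suggestion to rely on the original filename could be sketched roughly as follows. This is plain Ruby, not a Paperclip API: the OFFICE_TYPES table and the corrected_content_type helper are hypothetical names, and where you would call it (for example, before saving the attachment record) is an assumption that depends on your app.

```ruby
# Extension-to-MIME-type table for the Office 2007+ formats discussed in this thread.
OFFICE_TYPES = {
  ".docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  ".xlsx" => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  ".pptx" => "application/vnd.openxmlformats-officedocument.presentationml.presentation"
}.freeze

# Hypothetical helper: if content sniffing reported a generic ZIP, prefer the
# type implied by the original filename's extension; otherwise keep what was detected.
def corrected_content_type(original_filename, detected_type)
  return detected_type unless detected_type == "application/zip"

  OFFICE_TYPES.fetch(File.extname(original_filename).downcase, detected_type)
end

puts corrected_content_type("test1.pptx", "application/zip")
puts corrected_content_type("archive.zip", "application/zip")
```

The override only applies when the detected type is application/zip, so genuine .zip uploads and unrelated types pass through unchanged.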
K
4

As of 2019 the accepted solution does not work.

Referring to the solution found here, I used the following code to make 'em download as .docx.

[
  ['application/vnd.openxmlformats-officedocument.wordprocessingml.document', [[0, "PK\x03\x04", [[30, '_rels/.rels', [[0..5000, 'word/']]]]]]],
  ['application/vnd.openxmlformats-officedocument.wordprocessingml.document', [[0, "PK\x03\x04", [[30, 'word/']]]]]
].each do |magic|
  MimeMagic.add(magic[0], magic: magic[1])
end

Note that simply requiring 'mimemagic/overlay' in initializers/mimemagic.rb didn't work either; I had to add the MIME types manually.

To tackle files that were uploaded before the fix, I simply re-uploaded them to S3.

Kluge answered 12/9, 2019 at 13:56 Comment(3)
This is 100% the only solution that worked for me, in 2019, latest version of paperclip. - Tabloid
This is exactly what I needed, thanks! How did you end up determining what to add here? Running file -D -b --mime <my file> showed the following at the end: 28: >> 30 regex,=\[Content_Types\]\.xml|_rels/\.rels|docProps,""] 1 == 0 = 0 [try softmagic 1] application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary I was having trouble trying to figure out what to add to MimeMagic.add. - Cheerly
So I ran into another document that this did not work for, though file says it's application/vnd.openxmlformats-officedocument.wordprocessingml.document. Looking at it further, I noticed that the first file that this worked for had an index('word/') of exactly 30, like you have here, but this other one has it at 940. Should the 2nd item be [[0..5000, 'word/']] as well vs. [[30, 'word/']]? - Cheerly
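On the offset question: a ZIP local file header is 30 bytes long, so the name of the first entry in the archive starts at byte 30. A rule with a fixed offset of 30 therefore only matches when the first entry's name begins with word/. The sketch below is a simplified stand-in for how rules shaped like [offset, value, children] might be evaluated; it is not MimeMagic's own code, and rule_match? and the sample byte strings are made up for illustration.

```ruby
# Simplified, hypothetical matcher for rules shaped like [offset, value, children].
# A fixed offset must match exactly there; a Range searches a window of bytes.
def rule_match?(bytes, rules)
  rules.any? do |offset, value, children|
    hit =
      if offset.is_a?(Range)
        window = bytes.byteslice(offset.begin, offset.end - offset.begin + value.bytesize)
        !window.nil? && window.include?(value)
      else
        bytes.byteslice(offset, value.bytesize) == value
      end
    hit && (children.nil? || rule_match?(bytes, children))
  end
end

# 30-byte ZIP local file header: signature plus 26 bytes of (zeroed) fields.
ZIP_HEADER = "PK\x03\x04" + ("\x00" * 26)

# Stand-in archives: "word/" as the first entry name vs. somewhere later.
word_first = ZIP_HEADER + "word/document.xml"
word_later = ZIP_HEADER + "[Content_Types].xml" + ("\x00" * 900) + "word/document.xml"

fixed_rule = [[0, "PK\x03\x04", [[30, "word/"]]]]
range_rule = [[0, "PK\x03\x04", [[0..5000, "word/"]]]]

puts rule_match?(word_first, fixed_rule) # true
puts rule_match?(word_later, fixed_rule) # false
puts rule_match?(word_later, range_rule) # true
```

Under these assumptions, a range such as 0..5000 is the more forgiving choice, at the cost of also matching any ZIP that happens to contain the bytes word/ in that window.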
L
3

The Command :: file -b --mime '/var/folders ... part of your log means that Paperclip is failing to detect the mime type via MIME::Types.type_for and is falling back on the file command.

Relevant code here: https://github.com/thoughtbot/paperclip/blob/5bf0619fe79ffbcaf8f0d8a7aca88b5685aec4b3/lib/paperclip/io_adapters/file_adapter.rb#L16

and here: https://github.com/thoughtbot/paperclip/blob/5bf0619fe79ffbcaf8f0d8a7aca88b5685aec4b3/lib/paperclip/io_adapters/file_adapter.rb#L71

The file command is run on the extension-less temporary file and figures it's a ZIP file, since, as others have pointed out, it really is.
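A small illustration of why the extension cannot help at that point: the temporary file's name carries no extension, while the original filename does. The extension_hint helper is hypothetical and only wraps File.extname; the paths are the ones shown in the question's log.

```ruby
# Hypothetical helper: the extension a name-based lookup would have to work with,
# or nil when the name has none.
def extension_hint(path)
  ext = File.extname(path)
  ext.empty? ? nil : ext
end

original_name = "test1.pptx"
temp_path = "/var/folders/rm/89l_3yt93g31p22738hqydmr0000gn/T/RackMultipart20120529-10443-1ljhigq"

p extension_hint(original_name) # ".pptx"
p extension_hint(temp_path)     # nil
```

With no extension to go on, a content-based check is all that remains, and the content of a .pptx is a ZIP archive.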

The fact that MIME::Types.type_for("test1.pptx") works correctly for you in console seems to indicate that either original_filename is weird in that part of the code or MIME::Types.type_for is behaving differently inside paperclip than in your console.

Can you instrument the relevant part of the gem (via debugger or throwing some prints in your local copy) to see what it's seeing? Also, can you provide some more details on how you're converting the parameters your controller gets into attachment objects?

Leiker answered 3/6, 2012 at 21:27 Comment(0)
R
2

For those who find this is still not working, newer versions of Paperclip have the mimemagic gem dependency in the Paperclip::ContentTypeDetector. You'll want to register the mime types with that.

Rutilant answered 5/8, 2016 at 19:12 Comment(0)
