Small file upload to S3 hangs with 100% CPU usage using Paperclip
I have a directory of <20MB pdf files (each pdf represents an ad) on an AWS EC2 large instance. I'm trying to upload each pdf file to S3 using Ruby and dm-paperclip.

Most files upload successfully but some seem to take hours with the CPU hanging at 100%. I've located the line of code that causes the issue by printing debug statements in the relevant section.

 # Takes an array of pdf file paths and uploads each to S3 using dm-paperclip
 def save_pdfs(pdf_files)
   pdf_files.each do |path|
     pdf = File.open(path)
     ad = Ad.new
     ad.pdf.assign(pdf) # <= Last debug statement is printed before this line
     begin
       ad.save
     rescue => e
       # log error
     ensure
       pdf.close
     end
   end
 end

To help troubleshoot the issue I attached strace to the process while it was stuck at 100%. The result was hundreds of thousands of lines like this:

 ...
 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
 ... 500K lines

Followed by a few thousand:

 ...
 brk(0x1224d0000)                        = 0x1224d0000
 brk(0x1224f3000)                        = 0x1224f3000
 brk(0x122514000)                        = 0x122514000
 ...

During an upload that doesn't hang, strace looks like this:

 ...
 ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
 fstat(12, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
 fcntl(12, F_GETFL)                      = 0x2 (flags O_RDWR)
 write(12, "%PDF-1.3\n%\342\343\317\323\n8 0 obj\n<</Filter"..., 4096) = 4096
 ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
 write(12, "S\34\367\23~\277u\272,h\204_\35\215\35\341\347\324\310\307u\370#\364\315\t~^\352\272\26\374"..., 4096) = 4096
 ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
 write(12, "\216%\267\2454`\350\177\4\36\315\211\7B\217g\33\217!e\347\207\256\264\245vy\377\304\256\307\375"..., 4096) = 4096
 ...

The pdf files that cause this issue seem random. They are all valid pdf files, and they are all relatively small. They range from ~100KB to ~50MB.

Is the strace with the seemingly excessive stat system calls related to my issue?

Pareira answered 15/8, 2013 at 1:14 Comment(8)
Your ensure block is not being executed when an exception occurs unless the exception is raised by ad.save. In this case, ad.pdf.assign(pdf) might be raising an exception, and the file would not be closed. That may have happened several hundred times before the file that's taking 100% CPU usage, leaving you with references to hundreds of files. If you wrap everything in a block and pass it to File.open, then you can be sure the file will always be closed correctly. Depending on how many files you are dealing with, that may improve performance significantly. - Oversold
Maybe related: serverfault.com/a/562148 - Frenulum
For all download/upload CPU-hang or out-of-memory issues, I strongly recommend setting the <attachment>_file_size parameter (in HTTP: the Content-Length header). - Loudspeaker
Paperclip is written in Ruby. Ruby (MRI) is written in C. Ruby calls C functions to obtain timezone information; that is code you cannot see from Ruby, and it may be causing a memory leak. Rails has custom classes and methods for dealing with timezones. The Paperclip app should use those methods instead of the built-in ones to prevent this. - Slocum
api.rubyonrails.org/v5.0/classes/ActiveSupport/… - Slocum
edgeguides.rubyonrails.org/… - Slocum
The C function current_zone() is huge because the developers have added so many fixes over the years for all of the different platforms out there, so the function often fails because of its complexity. If you want to know whether the exception is coming from the timezone code, write a simple C app like the one in this link and tell us what exception is thrown. If not, then move on to the next theory. howardhinnant.github.io/date/tz.html - Slocum
Or check it the simple way, if one of these methods works on your OS: #4704330 - Slocum
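
The first comment's suggestion (pass a block to File.open so the handle is always closed) could look roughly like the sketch below. The helper name each_open_pdf is made up for illustration and is not part of dm-paperclip; the Ad calls from the question appear only in a comment, since that model is not available here.

```ruby
# Hypothetical helper (not part of dm-paperclip): opens each path with the
# block form of File.open, so the handle is closed even if the block raises.
# Returns a hash of path => exception for the files whose block failed.
def each_open_pdf(paths)
  failures = {}
  paths.each do |path|
    begin
      File.open(path, "rb") do |io|
        # In the question's code this would be:
        #   ad = Ad.new; ad.pdf.assign(io); ad.save
        yield io
      end
    rescue StandardError => e
      failures[path] = e # record the error and continue with the next file
    end
  end
  failures
end
```

Because both the assign and the save would run inside the block, an exception from either one still closes the file, and one bad pdf does not abort the whole batch.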
1

To start, dm-paperclip is no longer maintained, so you should consider alternatives such as CarrierWave or Active Storage (if you're using Rails) to handle file uploads.

Based on the information you provided, it seems that the excessive stat system calls could be related to your issue. The high CPU usage might be a result of your Ruby process repeatedly querying the file system for the /etc/localtime file.
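
One thing that is cheap to try, assuming a Linux system with glibc: when the TZ environment variable is unset, glibc may re-check /etc/localtime on local-time conversions, which shows up in strace as repeated stat calls. Setting TZ explicitly before any time handling is a commonly reported way to avoid that. This is only a sketch under that assumption, and it may not be the cause of the hang itself.

```ruby
# Assumption: Linux with glibc. Only set TZ if the environment has not
# already provided a non-empty value, so an existing setting is respected.
ENV["TZ"] = ":/etc/localtime" if ENV["TZ"].to_s.empty?
```

If the stat lines disappear from strace but the CPU still sits at 100%, the timezone lookups were a symptom of something else calling time functions in a tight loop.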

You could try to profile your Ruby code using a profiling tool like ruby-prof to determine where the performance bottleneck is occurring. This might help you pinpoint the issue more precisely.
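
Before reaching for ruby-prof, a rough first pass is to time each step yourself. The sketch below is not ruby-prof; it uses only Ruby's built-in Process.clock_gettime, and the helper name timed is made up for illustration. Comparing wall-clock time with CPU time shows whether a step is busy computing or just waiting on the network.

```ruby
# Hypothetical helper (not part of dm-paperclip or ruby-prof): runs the block
# and stores wall-clock and CPU seconds for it under `label` in `timings`.
def timed(label, timings)
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  yield
ensure
  # Recorded in ensure so a step that raises still gets a timing entry.
  timings[label] = {
    wall: Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall,
    cpu:  Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu
  }
end

# Stand-in workload; in the question's code the steps would be, for example,
#   timed(:assign, timings) { ad.pdf.assign(pdf) }
#   timed(:save, timings)   { ad.save }
timings = {}
timed(:busy_loop, timings) { 200_000.times { |i| i * i } }
timings.each do |label, t|
  puts format("%-10s wall=%.4fs cpu=%.4fs", label, t[:wall], t[:cpu])
end
```

A step whose CPU time is close to its wall time is burning cycles in the process itself, which would match the 100% CPU symptom and is the one worth profiling in detail.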

Partee answered 6/4, 2023 at 16:32 Comment(0)
-1

This appears to be a problem with the original file permissions on the failed files, which are sent along with each file from the originating computer. All pdfs (the files to be saved to the server) should have their permissions set to 0644, either assigned in the script or, if the script reuses the original permissions, set at pickup before sending from the client. Basically, the server OS and configuration are rejecting the file because its permissions are not 0644 on write to disk.

Lloyd answered 26/9, 2022 at 4:23 Comment(0)
