Can one specify a file content-type to download using Wget?

I want to use wget to download files linked from the main page of a website, but I only want to download text/html files. Is it possible to limit wget to text/html files based on the mime content type?

Stavro answered 17/7, 2011 at 5:6

I don't think this has been implemented yet; it is still on their bug list:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=21148

You might have to do everything by file extension.
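
As a rough sketch of the extension-based approach: wget's `-A`/`--accept` option takes a comma-separated list of file name suffixes, so something like the following should keep only `.html` and `.htm` files linked from the start page. The function name `download_html_links` and the suffix list are assumptions for illustration, and pages served as text/html under URLs without such a suffix would be missed.

```shell
# Hypothetical helper: fetch only .html/.htm files linked from a start page.
download_html_links() {
  # -r    follow links recursively
  # -l 1  but only one level deep (links found on the start page)
  # -A    accept list: keep files whose names end in these suffixes
  wget -r -l 1 -A 'html,htm' "$1"
}
```

It could be called as `download_html_links https://example.com/`, where the URL is a placeholder.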

Nicknack answered 17/7, 2011 at 5:18

Wget2 has this feature.

--filter-mime-type    Specify a list of mime types to be saved or ignored

### `--filter-mime-type=list`

Specify a comma-separated list of MIME types that will be downloaded. Elements of the list may contain wildcards.
If a MIME type starts with the character '!', it won't be downloaded; this is useful when downloading
everything with a few exceptions. For example, to download everything except images:

  wget2 -r https://<site>/<document> --filter-mime-type=*,\!image/*

It is also useful for downloading files that are compatible with an application on your system. For instance,
to download every file that is compatible with LibreOffice Writer from a website using recursive mode:

  wget2 -r https://<site>/<document> --filter-mime-type=$(sed -r '/^MimeType=/!d;s/^MimeType=//;s/;/,/g' /usr/share/applications/libreoffice-writer.desktop)
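
Applied to the original question, a minimal sketch might restrict a recursive download to text/html. The helper name `download_by_mime` is hypothetical, and the `-l 1` depth limit assumes only links from the start page are wanted. Quoting the MIME list at the call site keeps the shell from expanding `*` or `!`.

```shell
# Hypothetical helper: recursive Wget2 download limited to the given MIME types.
download_by_mime() {
  # $1 = start URL
  # $2 = comma-separated MIME list, e.g. 'text/html'
  wget2 -r -l 1 --filter-mime-type="$2" "$1"
}
```

For example, `download_by_mime 'https://<site>/<document>' 'text/html'` would ask Wget2 to save only responses whose MIME type is text/html.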

As of this writing (November 2018), Wget2 has not been released, but it will be soon. Debian unstable already ships an alpha version.

Look at https://gitlab.com/gnuwget/wget2 for more info. You can post questions/comments directly to [email protected].

Newsy answered 14/11, 2018 at 8:39

Add the header to the options:

wget --header 'Content-type: text/html'
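
Note that `--header` adds a header to the request wget sends; it does not filter on the Content-Type the server returns. To act on the response type with classic wget, one possible workaround is to inspect the headers first with `--spider -S` and download only on a match. The function names below are hypothetical, and the sketch assumes the server answers the header check with an accurate Content-Type.

```shell
# Print the MIME type from the last Content-Type header on stdin,
# lowercased and without parameters such as "; charset=UTF-8".
# Taking the last one means the final response wins after redirects.
mime_from_headers() {
  awk -F': *' 'tolower($1) ~ /^[ \t]*content-type$/ { v = $2 }
       END { sub(/;.*/, "", v); gsub(/[ \t\r]/, "", v); print tolower(v) }'
}

# Hypothetical helper: download $1 only if the server reports MIME type $2.
fetch_if_mime() {
  url=$1
  want=$2
  # --spider checks the URL without saving it;
  # -S prints the server's response headers (on stderr, hence 2>&1).
  got=$(wget --spider -S "$url" 2>&1 | mime_from_headers)
  if [ "$got" = "$want" ]; then
    wget "$url"
  fi
}
```

A call such as `fetch_if_mime https://example.com/page text/html` (placeholder URL) costs one extra request per URL, so it suits short link lists better than large recursive crawls.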
Frodina answered 20/4, 2022 at 23:20
