I've got an HTML file, and I'd like to grab all the links in it and save them into another file using Vim.
I know that the regex would be something like:
:g/href="\v([a-z_/]+)"/
but I don't know where to go from here.
Jeff Meatball Yang was almost there.
As Sasha wrote, if you use w it writes the full original file to the outfile.
To write only the matched line, you have to add '.' before 'w':
:g/href="\v([a-z_/]+)"/ .w >> outfile
Note that the outfile needs to exist.
.w! – Pettiford
Clear register x:
qxq
Search for the regex (whatever it is) and append each match to register x:
:g/regex/call setreg('X', matchstr(getline('.'), 'regex') . "\n")
Open a new tab:
:tabnew outfile
Put register x:
"xp
Write the file:
:w
The challenge here lies in extracting all of the links when there may be several on one line; otherwise you'd be able to simply do:
" Extract all lines with href=
:g/href="[^"]\+"/.w >> list_of_links.txt
" Open the new file
:e list_of_links.txt
" Extract the bit inside the quotation marks
:%s/.*href="\([^"]\+\)".*/\1/
The simplest approach would probably be to do this:
" Save as a new file name
:saveas list_of_links.txt
" Get rid of any lines without href=
:g!/href="\([^"]\+\)"/d
" Break up the lines wherever there is a 'href='
:%s/href=/\rhref=/g
" Tidy up by removing everything but the bit we want
:%s/^.*href="\([^"]\+\)".*$/\1/
Alternatively (following a similar theme),
:g/href="[^"]\+"/.w >> list_of_links.txt
:e list_of_links.txt
:%s/href=/\rhref=/g
:%s/^.*href="\([^"]\+\)".*$/\1/
(see :help saveas, :help :vglobal, :help :s)
However, if you really wanted to do it in a more direct way, you could do something like this:
" Initialise register 'h'
:let @h = ""
" For each line containing href=..., get the line, and carry out a global search
" and replace that extracts just the URLs and a double quote (as a delimiter)
:g/href="[^"]\+"/let @h .= substitute(getline('.'), '.\{-}href="\([^"]\+\)".\{-}\ze\(href=\|$\)', '\1"', 'g')
" Create a new file
:new
" Paste the contents of register h (entered in normal mode)
"hp
" Replace all double quotes with new-lines
:s/"/\r/g
" Save
:w
Finally, you could do it in a function with a for loop, but I'll leave that for someone else to write!
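For what it's worth, here is one rough sketch of what such a function might look like. The name ExtractLinks is made up for illustration, it assumes a Vim recent enough to have matchstrpos() (Vim 8 or later), and it only handles double-quoted href attributes, like the patterns above.

```vim
" Hypothetical helper: return every href="..." value found in a list of lines.
function! ExtractLinks(lines) abort
  let links = []
  for line in a:lines
    let start = 0
    while 1
      " matchstrpos() returns [match, start, end]; start is -1 when no match is left.
      " \zs makes the match begin just after the opening quote.
      let [hit, from, to] = matchstrpos(line, 'href="\zs[^"]\+', start)
      if from < 0
        break
      endif
      call add(links, hit)
      " Continue searching this line after the link just found.
      let start = to
    endwhile
  endfor
  return links
endfunction
```

With the HTML file open, something like :call writefile(ExtractLinks(getline(1, '$')), 'list_of_links.txt') should then write one link per line, including several links from the same line.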
Put your cursor in the first row/column and try this:
:redir > output.txt|while search('href="', "We")|exe 'normal yi"'|echo @"|endwhile|redir END
W = don't wrap the search past the end of the file. e = move to the end of the match. See :h search(). – Chemosmosis
Have you tried this?
:g/href="\v([a-z_/]+)"/w >> outfile