Reading data from URL

Is there a reasonably easy way to read data from a URL? I tried the most obvious version, but it does not work:

readcsv("https://dl.dropboxusercontent.com/u/.../testdata.csv")

I did not find any usable reference. Any help?

Mccrea answered 15/6, 2014 at 13:18 Comment(0)

If you want to read a CSV from a URL, you can use the Requests package as @waTeim shows and then read the data through an IOBuffer. See example below.

Or, as @Colin T Bowers comments, you could use the currently (December 2017) more actively maintained HTTP.jl package like this:

julia> using HTTP

julia> res = HTTP.get("https://www.ferc.gov/docs-filing/eqr/q2-2013/soft-tools/sample-csv/transaction.txt");

julia> mycsv = readcsv(res.body);

julia> for (colnum, myheader) in enumerate(mycsv[1,:])
           println(colnum, '\t', myheader)
       end
1   transaction_unique_identifier
2   seller_company_name
3   customer_company_name
4   customer_duns_number
5   tariff_reference
6   contract_service_agreement
7   trans_id
8   transaction_begin_date
9   transaction_end_date
10  time_zone
11  point_of_delivery_control_area
12  specific location
13  class_name
14  term_name
15  increment_name
16  increment_peaking_name
17  product_name
18  transaction_quantity
19  price
20  units
21  total_transmission_charge
22  transaction_charge

Using the Requests.jl package:

julia> using Requests

julia> res = get("https://www.ferc.gov/docs-filing/eqr/q2-2013/soft-tools/sample-csv/transaction.txt");

julia> mycsv = readcsv(IOBuffer(res.data));

julia> for (colnum, myheader) in enumerate(mycsv[1,:])
         println(colnum, '\t', myheader)
       end
1   transaction_unique_identifier
2   seller_company_name
3   customer_company_name
4   customer_duns_number
5   tariff_reference
6   contract_service_agreement
7   trans_id
8   transaction_begin_date
9   transaction_end_date
10  time_zone
11  point_of_delivery_control_area
12  specific location
13  class_name
14  term_name
15  increment_name
16  increment_peaking_name
17  product_name
18  transaction_quantity
19  price
20  units
21  total_transmission_charge
22  transaction_charge
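
A note for newer Julia versions: readcsv was removed in Julia 1.0, and readdlm from the DelimitedFiles standard library with ',' as the delimiter is the usual replacement. The sketch below assumes the response body is available as a vector of bytes (which I believe is what HTTP.jl exposes as res.body by default, but check your version); parse_csv_bytes is a hypothetical helper name and the sample payload is made up for illustration.

```julia
using DelimitedFiles: readdlm  # stdlib; readdlm with ',' stands in for the removed readcsv

# Parse a CSV payload that is already in memory as raw bytes
# (for example, the body of an HTTP response).
# `parse_csv_bytes` is a hypothetical helper, not part of any package.
function parse_csv_bytes(bytes::AbstractVector{UInt8}; delim::Char=',')
    return readdlm(IOBuffer(bytes), delim)
end

# Offline demonstration with a hand-written payload (illustrative data only)
sample = Vector{UInt8}("name,price\nwidget,2.5\n")
table = parse_csv_bytes(sample)

# List the column headers from the first row, as in the examples above
for (colnum, header) in enumerate(table[1, :])
    println(colnum, '\t', header)
end
```

With HTTP.jl the call would then look something like parse_csv_bytes(res.body), under the assumption stated above.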
Hisakohisbe answered 16/6, 2014 at 2:55 Comment(1)
2017 update: HTTP.jl appears to be more actively maintained these days. - Tide

If you want to read the data into a DataFrame, this also works in Julia:

using CSV   

dataset = CSV.read(download("https://mywebsite.edu/ml/machine-learning-databases/my.data"))
Van answered 16/10, 2018 at 0:1 Comment(2)
This is the most straightforward answer; it gives a DataFrame output, seamlessly. Thanks. - Debacle
Note that download now lives in Downloads (Downloads.download) instead of Base. - Sedgemoor

The Requests package seems to work pretty well. There are others (see the entire package list) but Requests is actively maintained.

Obtaining it

julia> Pkg.add("Requests")

julia> using Requests

Using it

You can use one of the exported functions that correspond to the HTTP verbs (get, post, etc.), each of which returns a Response:

julia> res = get("http://julialang.org")
Response(200 OK, 21 Headers, 20913 Bytes in Body)

julia> typeof(res)
Response (constructor with 8 methods)

And then, for example, you can print the data using @printf

julia> @printf("%s",res.data);
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
...
Nickell answered 15/6, 2014 at 14:57 Comment(1)
Alright, this is rather cumbersome. Shouldn't the question be converted into a feature request? Or is there some benefit in keeping the infrastructure that complicated? Also, couldn't your reply and the one from @Hisakohisbe be merged? They complement each other very well. - Mccrea

If the URL points directly to a CSV file, something like this should work:

A = readdlm(download(url),';')
Bundelkhand answered 11/3, 2017 at 13:13 Comment(1)
It is good to know that, as the documentation says: "Note that this function relies on the availability of external tools such as curl, wget or fetch to download the file and is provided for convenience." - Proline
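
On Julia 1.6 and later, where download is provided by the Downloads standard library, the one-liner in this answer might be spelled out roughly as follows. This is only a sketch under stated assumptions: read_delimited_url and its fetch keyword are hypothetical names introduced here (fetch exists only so the function can be tried without network access), and the ';' delimiter is carried over from the answer.

```julia
import Downloads               # stdlib since Julia 1.6
using DelimitedFiles: readdlm  # stdlib

# Download `url` to a temporary file and parse it as delimited text.
# `fetch` is a hypothetical hook (not part of any library) that maps a URL
# to a local file path; by default it is Downloads.download.
function read_delimited_url(url::AbstractString;
                            delim::Char=';',
                            fetch=Downloads.download)
    path = fetch(url)
    # header=true makes readdlm return (data, header) instead of a single matrix
    data, header = readdlm(path, delim; header=true)
    return data, vec(header)
end
```

Usage would be along the lines of data, header = read_delimited_url(url), where url is whatever address you are reading from. Splitting off the header row keeps the data matrix numeric when every remaining cell parses as a number.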

A very easy solution, similar to mike gold's post, though as of 2023 you need to specify a sink argument:

using CSV, DataFrames

my_table = CSV.read(download(some_url), DataFrame)
Deoxyribonuclease answered 12/4, 2023 at 13:41 Comment(1)
Or, keeping CSV and DataFrames "separate": CSV.download(some_url) |> CSV.File |> DataFrame - Harm

Nowadays you can also use UrlDownload.jl, which is pure Julia, takes care of the download details, processes data in memory, and can also handle compressed files.

Usage is straightforward:

using UrlDownload

A = urldownload("https://data.ok.gov/sites/default/files/unspsc%20codes_3.csv")
Bradleybradly answered 26/6, 2020 at 7:1 Comment(0)