Does GitHub rate-limit access to public "raw" files? [closed]

Does GitHub have public access restrictions?

Example file:

https://raw.githubusercontent.com/vuejs/vue/dev/package.json

What will happen if a million users download this file?

By answered 7/3, 2021 at 22:45 Comment(6)
Limits apply to certain types of requests, but retrieving a public file from GitHub is not one of them. Here's GitHub's info on API rate limits: docs.github.com/en/developers/apps/rate-limits-for-github-apps - Lanugo
Thank you for the reply. Did you mean that all users will download the file successfully? - By
Yes, and "users" here means any person. Public files do not require a GitHub account. - Lanugo
Even if there are currently no restrictions, I'm absolutely sure that if you published a direct link that was downloaded a million times a day, GitHub would react rather soon and impose hard limits. - Dufour
Related: #60347186 - Floatable
Discussion with GitHub employees: github.com/github/docs/issues/8031 - Floatable
F
5

This is from a GitHub employee in regard to "raw" file access:

I spoke with our engineering team and learnt that there's a limit of 5000 requests per hour per IP address. Additionally, due to internal routing and caching, that 5000 figure isn't going to be exact. We may accept more but it's sometimes possible that we'll accept less too.

As was pointed out to me, if you're at risk of hitting this limit, then you're probably doing something wrong and there's a better way to obtain or even store the file.

After more than a year of waiting, they still haven't confirmed whether this is accurate or updated the docs, so I'm guessing that routing requests through the GitHub API with a token might be more reliable.

Ref: https://github.com/littlebizzy/slickstack/issues/180

Ref: https://github.com/github/docs/issues/8031
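As a rough illustration of the "route it through the API with a token" idea, here is a minimal Python sketch that asks the REST contents endpoint for a file's raw bytes instead of hitting raw.githubusercontent.com. The function names, the `token` parameter, and the choice of the `application/vnd.github.raw+json` media type are my own assumptions for the example; check them against the current GitHub REST docs before relying on this.

```python
import urllib.request
from urllib.parse import quote


def build_contents_request(owner, repo, path, ref, token=None):
    """Build a request for a file's raw content via the REST contents endpoint.

    All names here are illustrative; only the endpoint shape
    /repos/{owner}/{repo}/contents/{path}?ref=... comes from GitHub's API.
    """
    url = (
        f"https://api.github.com/repos/{quote(owner)}/{quote(repo)}"
        f"/contents/{quote(path)}?ref={quote(ref, safe='')}"
    )
    # Ask for the file body itself rather than the JSON metadata wrapper.
    headers = {"Accept": "application/vnd.github.raw+json"}
    if token:
        # Authenticated requests are counted against the token's quota,
        # not against the (possibly shared) IP address.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def fetch_file(owner, repo, path, ref, token=None, timeout=10):
    """Download the file and return its bytes (performs a network call)."""
    request = build_contents_request(owner, repo, path, ref, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For the file from the question, the call would look like `fetch_file("vuejs", "vue", "package.json", "dev", token=...)`, with the token read from an environment variable rather than hard-coded.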

Floatable answered 30/12, 2022 at 10:34 Comment(3)
Based off their response - the limit of 5000 requests per hour, per IP address, is accurate. I haven't found anything in their docs either though, which may just be a matter of not wanting to advertise their limits for unauthenticated, public-facing domains. - Kollwitz
@Kollwitz That was my conclusion too, on the Issue linked above for my SlickStack project, which has been using wget on raw files. I think their security team is preventing them from officially confirming this limit... but I also question how accurate this figure is because our project has noticed wget failures on a regular basis despite sending fewer than 5000 requests per hour. - Floatable
yeah they said that it's possible that an IP address could be rate limited before hitting 5000 requests "due to internal routing and caching" - which sounds weird to me, but without knowing much about how their connection tracking and rate limiting is actually implemented, it's really hard to say anything about it except "okay" haha :'c - Kollwitz
A
1

There are limitations. I am getting "wget: server returned error: HTTP/1.1 403 rate limit exceeded" when accessing https://raw.githubusercontent.com/securego/gosec/master/install.sh from a shared CI/CD environment.

I am trying to get around that with a token and an Authorization: header, but the documentation is lacking or unclear in the case of static content.

For API access, the documentation seems to match reality.
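One stopgap for the 403 above is to retry with exponential backoff. This is only a sketch under my own assumptions: the function name and parameters are hypothetical, treating 403 and 429 as "rate limited" is a guess based on the error message quoted above, and on a shared CI runner the other jobs behind the same IP can still exhaust the budget no matter how patiently one job waits.

```python
import time
import urllib.error
import urllib.request

# Assumption: these are the statuses worth retrying as rate-limit responses.
RATE_LIMIT_STATUSES = {403, 429}


def fetch_with_backoff(url, opener=urllib.request.urlopen, retries=4,
                       base_delay=1.0, sleep=time.sleep):
    """Fetch url, waiting base_delay * 2**attempt seconds after a rate-limit error.

    `opener` and `sleep` are injectable so the logic can be exercised
    without touching the network or actually waiting.
    """
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on unrelated errors, or once the retry budget is spent.
            if err.code not in RATE_LIMIT_STATUSES or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

Caching the downloaded script inside the CI image, or vendoring it into the repository, avoids the request entirely and is probably the more robust fix.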

Alcot answered 27/2, 2023 at 11:12 Comment(0)
L
0

GitHub's definitions for "public" code access are very vague online, so I hope this helps anyone who was as confused as I was!

GitHub confuses "public" with "open source". The first is a permission-based access designation and "git" workflow strategy on GitHub, the latter a licensing issue and a broader code access paradigm. But they mix the two together to create a new workflow on their website for how code gets shared using source control git. That confused me.

In general, a "public" repository on GitHub means close to the same thing as "open source" in terms of access and use: any public GitHub repo can be viewed, downloaded, forked, etc. But anything beyond that, starting with "write" access to the owner's original code base, requires the "owner" of the repo to add that person as a "collaborator". I interpret that to mean unlimited and unrestricted access to copy, download, and view your code by any person, machine, process, etc.

However, the sample open source licenses (like GNU 3.0, etc.) they recommend you create or use for your projects might legally limit some use of your code. But they are not going to help you enforce or limit that. Once your code is online there is no script or lawyer or enforcing entity that can stop any of that. That is why it's called "open source". I have used the GNU "free beer" license for distribution of my personal code before and like it, though I've never seen a need to enforce it as far as limiting much. The main thing it would help with is making sure you remain the copyright owner of the code in the USA and in a few other countries, and stopping big corporate entities from taking your code and claiming copyright, limiting free use, etc.

HOW GITHUB DEFINES "public"

Note: The following applies to individual GitHub accounts, not organization or enterprise accounts, which have much more granular control over GitHub code projects and repositories.

When you go public on GitHub, meaning you turn your repo to "public" access, you are allowing some form of "open source" or "free" use of the code. In the "git" world this could be many different things as far as both access and use. But in the GitHub world it implies full rights for people or machines to have "read" access by default when your repo is "public". What does that really mean as far as access and use? Well it means:

  1. Anyone or any machine can view the code (they call it "visible") or code files online for free, including manually copying the code in a web browser. That means unlimited views and use of your code.

  2. Anyone or any machine can "download" the code via their code download link. In the GitHub world that means a zip or other compacted wrapper of all the code files into a format you can download in one file. That means unlimited downloads of your code.

  3. Anyone or any machine can "fork" (not "clone") the code. In the GitHub world that means GitHub copies the code and sticks that copy into your GitHub online web account, if you have one. This copy is a "fork" to them, though traditionally that's not what "forked software" means. With this copy a user can then download a "clone" of the forked code to their local machine, start modifying it, and push changes to the forked copy on GitHub. They cannot change your original code base with those changes unless you set them up as a "collaborator". But forking does include sharing that copy with the world as well, which increases views and downloads of your code base to even more people you cannot track! So "public" means all the public clones, mirrors, or forks can be downloaded and shared as well.

BTW, "forking" the code in the GitHub world means copying the code with all the commit and git source history to their GitHub account so that later they can submit their changes back to the original repository code base with a pull request, which you then decide whether to merge.

This confused me at first, as I thought a "public" repository at GitHub meant anyone can "clone" the original repo to their local box only, which would allow anyone to use a local copy of the GitHub remote repo and pull code updates. In that model they could never do push or pull request updates without additional permissions, which makes sense, but also could never share copies of your code online (unless they explicitly created a new repo at GitHub from your code base).

But that is not what "public" means to them. They want people to directly fork or copy projects into the public site and modify code on their platform using forks. That is the workflow GitHub encourages on "public" projects on their site. This allows any user or machine to make a full copy of everything and do whatever they like to that copy, including sharing and distributing it to others. This is why "public access" does open up your code to lots of crazy things including copies of your code spreading quickly across GitHub with no way to know how many people have truly used it in projects or even care to contribute back to your original.

Personally, at all the companies I have worked at that use Git, I have never seen that type of model for distribution of repositories. We always cloned a master in a development environment and built branches remotely and locally from there. It feels like this was not thought through as it opens up distribution of your code into millions of versions of forks most people never asked for, cannot sync, and will forget about over time.

Leaf answered 21/7, 2022 at 15:17 Comment(0)
P
-1

After some research I found out that GitHub has certain rate limits. The figures below are the ones reported by https://api.github.com/rate_limit, which describes the REST API; the limits for raw.githubusercontent.com itself are not documented and may differ.

  • If you're sending the request without any authorization token as a header, the rate limit is set to 60 requests per hour per IP address as of right now. You can verify it yourself by sending a request to https://api.github.com/rate_limit:
    curl -H "Accept: application/vnd.github+json" https://api.github.com/rate_limit
  • However, if you're sending an authentication token as a header in your request, then the rate limit is set to 5000 requests per hour regardless of the IP address. You can verify that yourself as well.

For personal access tokens

curl -H "Accept: application/vnd.github+json" -H "Authorization: Bearer YOUR_TOKEN" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/rate_limit

For OAuth tokens

curl -H "Accept: application/vnd.github+json" -H "Authorization: token YOUR_TOKEN" https://api.github.com/rate_limit

Replace YOUR_TOKEN with your OAuth token or personal access token.
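The JSON that https://api.github.com/rate_limit returns can also be read from a script. A small Python sketch follows; the function name is made up for the example, and it only looks at the `resources.core` entry, which is the one covering ordinary REST API requests.

```python
from datetime import datetime, timezone


def summarize_core_limit(payload):
    """Pull the REST API ("core") quota out of a parsed /rate_limit response.

    `payload` is the already-decoded JSON body, e.g. the result of
    json.loads() on the output of the curl commands above.
    """
    core = payload["resources"]["core"]
    # "reset" is a Unix timestamp (UTC seconds) for when the window restarts.
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return {
        "limit": core["limit"],
        "remaining": core["remaining"],
        "reset_at": reset_at.isoformat(),
    }
```

A script could check `remaining` before a batch of requests and wait until `reset_at` when it reaches zero, rather than finding out through a 403.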

Pyrrho answered 27/5, 2023 at 5:6 Comment(0)
