Does GitHub have public access restrictions?
Example file:
https://raw.githubusercontent.com/vuejs/vue/dev/package.json
What will happen if a million users download this file?
Does GitHub have public access restrictions?
Example file:
https://raw.githubusercontent.com/vuejs/vue/dev/package.json
What will happen if a million users download this file?
This is from a GitHub employee in regard to "raw" file access:
I spoke with our engineering team and learnt that there's a limit of 5000 requests per hour per IP address. Additionally, due to internal routing and caching, that 5000 figure isn't going to be exact. We may accept more but it's sometimes possible that we'll accept less too.
As was pointed out to me, if you're at risk of hitting this limit, then you're probably doing something wrong and there's a better way to obtain or even store the file.
After 1+ year of waiting, they still haven't confirmed if this is accurate or updated Docs, so I'm guessing routing requests via the GitHub API and using tokens might be more reliable.
wget
on raw files. I think their security team is preventing them from officially confirming this limit... but I also question how accurate this figure is because our project has noticed wget
failures on a regular basis despite sending fewer than 5000 requests per hour. –
Floatable There are limitations. I am hitting getting the "wget: server returned error: HTTP/1.1 403 rate limit exceeded" when accessing https://raw.githubusercontent.com/securego/gosec/master/install.sh from a shared CI/CD environment.
Trying to get over that with a token and Authorization: header, but the documentation is lacking or unclear. In the case of static content.
For the API access, the documentation matches the reality it seems.
GitHib definitions for "public" code access are very vague online so hope this helps anyone who was as confused as I was!
GitHub confuses "public" with "open source". The first is a permission-based access designation and "git" workflow strategy on GitHub, the latter a licensing issue and a broader code access paradigm. But they mix the two together to create a new workflow on their website for how code gets shared using source control git. That confused me.
In general, GitHub "public" repositories means close to the same thing as "open source" in terms of access and use. In general it means any public GitHub repo can be viewed, downloaded, forked, etc. But anything beyond that starting with "write" access on the owners original code base requires the "owner" of the repo to add that person as a "collaborator". I interpret that to mean unlimited and unrestricted access to copy, download, and view your code by any known person, machines, process., etc.!
However, the sample open source licenses (like GNU 3.0, etc.) they recommend you create or use for your projects might legally limit some use of your code. By they are not going to help you enforce or limit that. Once your code is online there is no script or lawyer or enforcing entity that can stop any of that. That is why its called "open source". I have used the GNU "free beer" license for distribution of my personal code before and like it though Ive never seen a need to enforce it as far as limiting much. The main thing it would help with is making sure you remain copyright owner on the code in the USA and in a few other countries....AND....stop big corporate entities from taking your code and claiming copyright, limiting free use, etc.
HOW GITHUB DEFINES "public"
Note: The following applies to GiHub individuals, not organizations or enterprise accounts which have much more granular control over GitHub code projects and repositories.
When you go public on GitHub, meaning you turn your repo to "public" access, you are allowing some form of "open source" or "free" use of the code. In the "git" world this could be many different things as far as both access and use. But in the GitHub world it implies full rights for people or machines to have "read" access by default when your repo is "public". What does that really mean as far as access and use? Well it means:
Anyone or any machine can view the code (they call it "visible") or code files online for free, including manually copy the code in a web browser. That means unlimited views and use of your code.
Anyone or any machine can "download" the code via their code download link. In the GitHub world that means a zip or other compacted wrapper of all the code files into a format you can download in one file. That means unlimited downloads of your code.
Anyone or any machine can "fork" (not "clone") the code. In the GitHub world that means GitHub copies the code and sticks that copy into your GitHub online web account, if you have one. This copy is a "fork" to them, though traditionally that's not what "forked software" means. With this copy a user can then download a "clone" of the forked code to their local machine and start modifying it and push changes to the GitHub forked copy. They cannot do anything with those changes as far as changing your original code base without you setting them up as a "collaborator". But it does includes sharing that with the world as well, which increases views and downloads of your code base to even more people you cannot track! So "public" means all the public clones, mirrors, or forks can be downloaded and shared as well.
BTW...."forking" the code in the GitHub world means copying the code with all the commit and git source history to their GitHub account so later - with more permissions granted by you - they can submit your code back to the original repository code base with a pull request for changes.
This confused me at first, as I thought a "public" repository at GitHub meant anyone can "clone" the original repo to their local box only, which would allow anyone to use a local copy of the GitHub remote repo and pull code updates. In that model they could never do push or pull request updates without additional permissions, which makes sense, but also could never share copies of your code online (unless they explicitly created a new repo at GitHub from your code base).
But that is not what "public" means to them. They want people to directly fork or copy projects into the public site and modify code on their platform using forks. That is the workflow GitHub encourages on "public" projects on their site. This allows any user or machine to make a full copy of everything and do whatever they like to that copy, including sharing and distributing it to others. This is why "public access" does open up your code to lots of crazy things including copies of your code spreading quickly across GitHub with no way to know how many people have truly used it in projects or even care to contribute back to your original.
Personally, at all the companies I have worked at that use Git, I have never seen that type of model for distribution of repositories. We always cloned a master in a development environment and built branches remotely and locally from there. It feels like this was not thought through as it opens up distribution of your code into millions of versions of forks most people never asked for, cannot sync, and will forget about over time.
After some research I found out that GitHub has certain rate limits for accessing raw.githubusercontent.com
curl -H "Accept: application/vnd.github+json" https://api.github.com/rate_limit
For personal access tokens
curl -H "Accept: application/vnd.github+json" -H "Authorization: Bearer YOUR_TOKEN" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/rate_limit
For OAuth tokens
curl -H "Accept: application/vnd.github+json" -H "Authorization: token YOUR_TOKEN" https://api.github.com/rate_limit
Replace YOUR_TOKEN with your OAuth or Personal authorization token.
© 2022 - 2024 — McMap. All rights reserved.