Managing documents using Git

I am working on a website where I will be able to create projects and upload data to each of my projects. The data will mostly be spreadsheets, images, PDFs, etc. Ideally, I would like a VCS-style setup (preferably Git) where each time I update a particular document, I can just commit that document to a repo. Any ideas on how I could go about implementing this would be helpful.

Dignitary answered 11/1, 2011 at 8:20 Comment(0)

You can call git in a subshell after each upload.
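A minimal sketch of that idea, assuming the uploaded file has already been saved inside an existing Git working tree and that Git's user.name/user.email are configured. The function name `commit_upload` and its parameters are hypothetical, not part of any web framework.

```python
import subprocess
from pathlib import Path


def commit_upload(repo_dir: str, saved_path: str, user: str) -> bool:
    """Stage and commit one uploaded file. Returns False if its content did not change."""
    repo = Path(repo_dir).resolve()
    # Git wants the path relative to the repository root.
    rel = str(Path(saved_path).resolve().relative_to(repo))
    # Arguments are passed as a list (no shell), so odd upload names cannot inject commands.
    subprocess.run(["git", "add", "--", rel], cwd=repo, check=True)
    # 'git diff --cached --quiet' exits 0 when nothing is staged (e.g. identical re-upload).
    if subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo).returncode == 0:
        return False
    # The web-app user goes into the message; the Git author stays the server account.
    subprocess.run(
        ["git", "commit", "-q", "-m", f"Upload {rel} by {user}"],
        cwd=repo,
        check=True,
    )
    return True
```

Concurrent uploads would need to be serialized (for example with a lock around this call), since two processes staging and committing in the same working tree at once can interfere with each other.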

But I don't think any VCS is a good solution for document versioning, especially in a web application. Office-like documents are mostly binary data, and every VCS sucks (no exceptions) when it comes to binary data: you will not be able to do any meaningful diff. Metadata management is not suited for this either. The commit author is usually bound to a particular account (and you will probably be using a single system account for Git), and nothing is stored beyond the file content and its mode bits, so you will have to store authorship, permissions for web application users, and any other metadata somewhere nearby yourself. Also note that several users can commit data at the same time, so you will end up with branches in your version history. And once you have a huge dataset (which, with binary office files, can happen sooner than you think), you will not be able to partition such a repository.

IMO, using a VCS here gives you very little gain and introduces additional problems.

I'd advise keeping the metadata (file name, revisions, additional stuff) in a database and keeping the file revisions on disk. Keep each file, with its revisions, in a separate, unique directory. One tip here: don't use the file names that come from the upload. Use a hash function to calculate a unique name based on the content and metadata.
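A rough sketch of that layout in Python, using the standard-library sqlite3 module for the metadata and SHA-256 for on-disk names. The table schema, the `store_revision` function, and naming each document's directory after a hash of its id are hypothetical choices for illustration, not a prescribed design.

```python
import hashlib
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS revisions (
    doc_id      TEXT    NOT NULL,  -- stable id of the logical document
    revision    INTEGER NOT NULL,  -- 1, 2, 3, ... per document
    orig_name   TEXT    NOT NULL,  -- uploaded name, kept as metadata only
    author      TEXT    NOT NULL,  -- web-app user, not a system account
    stored_name TEXT    NOT NULL,  -- hash-based file name on disk
    PRIMARY KEY (doc_id, revision)
)
"""


def store_revision(db: sqlite3.Connection, root: str, doc_id: str,
                   orig_name: str, author: str, data: bytes) -> int:
    """Write a new revision to disk, record its metadata, and return its revision number."""
    db.execute(SCHEMA)
    # Next revision number for this document (1 if there is none yet).
    (last,) = db.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM revisions WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    revision = last + 1
    # Hash content plus metadata; the uploaded file name never reaches the filesystem.
    h = hashlib.sha256(data)
    h.update(f"{doc_id}:{revision}:{author}".encode())
    stored_name = h.hexdigest()
    # One directory per document holds all of its revisions.
    doc_dir = Path(root) / hashlib.sha256(doc_id.encode()).hexdigest()
    doc_dir.mkdir(parents=True, exist_ok=True)
    (doc_dir / stored_name).write_bytes(data)
    # The PRIMARY KEY raises IntegrityError if two uploads race for the same revision.
    with db:  # commits the INSERT as one transaction
        db.execute(
            "INSERT INTO revisions VALUES (?, ?, ?, ?, ?)",
            (doc_id, revision, orig_name, author, stored_name),
        )
    return revision
```

Because the author is an ordinary column, each revision records the web-app user directly, and extra metadata (permissions, tags) can be added as further columns or tables.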

Malraux answered 11/1, 2011 at 10:33 Comment(4)
I agree with the above, but in case the OP wants to pursue the VCS route, there are Git wrappers for various languages. - Flutist
I agree with the "don't use a VCS" tip. +1 - Inter
@Malraux - Are you aware of any tool/software that helps maintain and manage documents (with VCS features such as diffs, versions, etc.)? - Cnemis
@AndyDufresne Any decent document management system should have such features, for example OpenKM (openkm.com/en/overview/features.html), Alfresco (alfresco.com/products/document-management), and many others. - Malraux

As a branch off of Cezio's answer, if you would really like to use a VCS for version control, consider LaTeX. Since it is essentially source code that is compiled into a document (usually PDF via pdflatex), it's a reasonable candidate for version control.
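To illustrate why plain-text sources suit version control: a line-based diff between two revisions of a LaTeX file is small and readable, which is what a VCS shows you. The snippet uses Python's standard `difflib.unified_diff` as a stand-in for `git diff`; the file name `report.tex` and its contents are made up.

```python
import difflib


def tex_diff(old: str, new: str, name: str = "report.tex") -> str:
    """Return a line-by-line unified diff of two LaTeX sources, similar to 'git diff'."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


old = "\\section{Results}\nRevenue grew by 10\\% in Q1.\n"
new = "\\section{Results}\nRevenue grew by 12\\% in Q1.\n"
print(tex_diff(old, new))
```

Only the changed sentence shows up as a removed and an added line; the same edit made in a binary office file would give no comparable line-level view.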

Northwards answered 3/11, 2012 at 5:43 Comment(1)
As an afterthought, apologies for resurrecting this guy. - Northwards

There isn't a universal "commit on save" feature (at least not one integrated with all the editors associated with the document types you mention).

The easiest way would be a background job that commits ('git add -A && git commit -m "xxx"' in the case of Git) every 5 minutes, for instance.
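A minimal sketch of such a background job, assuming the documents live in an existing Git working tree with user.name/user.email configured. The names `snapshot`, `run_forever`, and `INTERVAL_SECONDS` are hypothetical; a cron entry that runs the same two Git commands would work just as well.

```python
import subprocess
import time

INTERVAL_SECONDS = 5 * 60  # "every 5 minutes"


def snapshot(repo_dir: str, message: str = "Automatic snapshot") -> bool:
    """Equivalent of 'git add -A && git commit -m ...'; returns True if a commit was made."""
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    # 'git diff --cached --quiet' exits 0 when nothing is staged; plain
    # 'git commit' would exit non-zero in that case, so skip the commit.
    if subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir).returncode == 0:
        return False
    subprocess.run(["git", "commit", "-q", "-m", message], cwd=repo_dir, check=True)
    return True


def run_forever(repo_dir: str) -> None:
    """Commit whatever changed, then sleep until the next interval."""
    while True:
        snapshot(repo_dir, time.strftime("Automatic snapshot %Y-%m-%d %H:%M:%S"))
        time.sleep(INTERVAL_SECONDS)
```

The timestamped message is the simplest "reasonable" commit message; anything smarter (listing changed files, for example) would be an addition on top of this sketch.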

Actually, Mark Longair comments:

flashbake is designed to be run from cron to do what you describe in the second paragraph with some kind of reasonable commit message.
I'm not sure that that's what the original poster is after, though.

From the original project's description:

  • Automated backup is nice unless you have files for which you want to view an incremental history.
  • Source control is great for that history but most tools expect the author to manually commit their changes along the way.
  • => A seamless source control solution combines the convenience of automated back up with the power of source version control.
Inter answered 11/1, 2011 at 8:39 Comment(3)
Actually, if files are added via upload, the server side can automatically add and commit them after each upload. - Malraux
I've never wanted such a thing, so I haven't tried it myself, but flashbake (github.com/commandline/flashbake/wiki) is designed to be run from cron to do what you describe in the second paragraph, with some kind of reasonable commit message. I'm not sure that that's what the original poster is after, though. - Hexapartite
@Mark: Interesting, thank you. I have included your comment (and some additional information) in my answer. - Inter
