I have two CSV files: File A, with multiple columns, and File B, with one column. E.g.:
File A:
chr1 100000 100022 A C GeneX
chr2 200000 200033 X GeneY
chr3 300000 300055 G A GeneZ
File B:
GeneY
GeneZ
I would want my output to be:
chr2 200000 200033 X GeneY
chr3 300000 300055 G A GeneZ
I have tried using `grep` (which crashes) and others.
I am certain there must be a very simple answer to this that I just can't see!
`grep` crashes? How big are the files that you're working with? You said that you got an 'out of memory' error when you tried `grep -f FileB FileA`. Your best bet in that case is probably to split `FileB` into sections small enough to be processed without `grep` crashing. The obvious disadvantage of this is that you will end up with rows in the result set that are out of order compared with the original `FileA`. If two words from `FileB` can appear in a single line, then you could also end up with repeats. – Reagan
Does `sed` work any better? What about Perl? If neither `sed` nor `grep` nor Perl works, then you may be able to find a better way to encode the information and write your own processing. But that's something of a last resort, depending on a lot of factors not yet described in the question. – Reagan
`grep` installed? It will be quicker and simpler than most of the alternatives. (I just tried doing `grep -f FileA` with a file containing 1500 generated lines such as `GZX6274256PQA` (a seven-digit random number sandwiched between two constant strings) and it started up without a problem on my Mac, using BSD `grep` rather than GNU.) – Reagan
`grep -f FileA` with a similar file (new set of random numbers, different sandwiching letters) without problems. That's got 16 GiB main memory; I don't know if you're memory constrained -- the memory pressure on my machine is non-existent (11 GiB used, so 5 GiB available) -- see Activity Monitor / Memory tab. Have you rebooted since you ran into trouble? (I hate suggesting that, but it can help surprisingly/depressingly often.) – Reagan
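For what it's worth, the "write your own processing" route mentioned above can be quite short. Here is a minimal sketch in Python, not a tested solution for the real data. It assumes that the gene name is always the last whitespace-separated field of each row in File A and that it must match a File B entry exactly; the question does not confirm either. The helper names `load_keys` and `filter_rows` are made up for illustration, and the sample data is copied from the question.

```python
import io


def load_keys(lines):
    """Collect the non-empty, stripped lines of File B into a set."""
    return {line.strip() for line in lines if line.strip()}


def filter_rows(rows, wanted, column=-1):
    """Yield rows whose whitespace-separated field at `column` is in `wanted`.

    Assumption: the gene name sits in the last field (column=-1).
    """
    for row in rows:
        fields = row.split()
        if fields and fields[column] in wanted:
            yield row


# In-memory stand-ins for the two files, using the sample from the question.
file_a = io.StringIO(
    "chr1 100000 100022 A C GeneX\n"
    "chr2 200000 200033 X GeneY\n"
    "chr3 300000 300055 G A GeneZ\n"
)
file_b = io.StringIO("GeneY\nGeneZ\n")

wanted = load_keys(file_b)
result = list(filter_rows(file_a, wanted))
print("".join(result), end="")
```

With real files, the two `io.StringIO` objects could be replaced by handles from `open("FileA")` and `open("FileB")`. Only File B's entries are held in memory, File A is read one row at a time, the output keeps File A's order, and each row is emitted at most once, which should sidestep the ordering and repeat problems of splitting `FileB`.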