Sampling Large Data Files

6

5

I currently work as a Data Warehouse programmer and as such have to put numerous flat files through an ETL process. Of course, prior to loading a file I have to be aware of its content. The problem is that the majority of the files are larger than 1 GB and I cannot open them using my dear old friend "notepad". Kidding. I usually use VIM or Notepad++, but it still takes a while to open the file. Could I perform a "partial" read of the file using VIM or some other editor?

P.S. I know that I could write a 10-line script to "data sample" the file, but it would be simpler to convince team members to use a feature of an editor than a script that I wrote.

Thank you for any insight you might have.
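For reference, the kind of short "data sample" script mentioned in the P.S. could look roughly like the sketch below. It is not the asker's script; it is a minimal illustration that assumes a line-oriented text file in UTF-8, and the function name `sample_lines` and its parameters are hypothetical. It uses reservoir sampling, so the file is read once from start to end and only `k` lines are ever held in memory.

```python
import random


def sample_lines(path, k, seed=None):
    """Return up to k lines chosen uniformly at random from a text file.

    Reservoir sampling (Algorithm R): one pass over the file, and only
    k lines are kept in memory regardless of the file size.
    """
    rng = random.Random(seed)  # a seed makes the sample repeatable
    reservoir = []
    # errors="replace" is an assumption: it keeps the scan going if the
    # file contains bytes that are not valid UTF-8.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for i, line in enumerate(f):
            if i < k:
                # Fill the reservoir with the first k lines.
                reservoir.append(line)
            else:
                # Replace a random slot with probability k / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = line
    return reservoir
```

Calling `sample_lines("file.csv", 20)` would return 20 random rows. Note that the whole file is still scanned once, so this is slower than looking only at the first lines, but the sample is spread across the entire file.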

Plasm answered 1/4, 2010 at 18:46 Comment(0)
3

If you want to stick with using vim, you could have a look at the LargeFile script.

Alternatively, I've always found that UltraEdit opens large files extremely quickly.

Clamant answered 1/4, 2010 at 18:52 Comment(0)
3

You said you have VIM, which makes me wonder whether you have a Unix environment as well.

If so, you can use the Unix utility head to display the first lines of the raw input on your screen. Like this:

EDIT: (thanks Honk)

terminal$> head -n 15 file.csv

(Where that 15 indicates you want to see 15 lines only).
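Where no Unix tools are available (for example on a plain Windows machine without Cygwin), the same "first lines only" behaviour can be approximated with a few lines of Python. This is only a sketch: the function name `head` and the UTF-8 encoding are assumptions, not something from the thread. Because the file object is read lazily, only the requested lines are pulled from disk, so it stays fast on multi-gigabyte files.

```python
from itertools import islice


def head(path, n=15):
    """Return the first n lines of a text file without reading the rest."""
    # Iterating over a file object yields one line at a time; islice
    # stops after n lines, so the remainder of the file is never read.
    # errors="replace" is an assumption to tolerate non-UTF-8 bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return list(islice(f, n))


# Example usage (file.csv is the placeholder name used above):
# for line in head("file.csv", 15):
#     print(line, end="")
```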

Nahshunn answered 1/4, 2010 at 18:53 Comment(6)
Not sure if top is special in mainframe Unixes, but on Linux you would pipe into head -n 15. - Apuleius
Or you would even avoid the unnecessary cat with head -n 15 file.csv. This should be orders of magnitude faster, too. - Apuleius
Thanks, but I am just a big fan of UNIX; our environment is built on the MS stack. - Plasm
@Plasm - being a fan of UNIX but on Microsoft, you might like CYGWIN! This is an off-topic suggestion, though. : ) - Nahshunn
I definitely appreciate this suggestion and actually have it installed :). - Plasm
@Plasm +1 for being a CYGWIN fan as well n_n - Nahshunn
2

Pretty sure there are loads of similar questions, but hey, Textpad is a good choice for this.

Mischa answered 1/4, 2010 at 18:54 Comment(2)
Verified & Confirmed. Textpad opened a 1.3 GB file flawlessly in 6 seconds for me (although saving it took much, much longer). - Nahshunn
TextPad ended up being waaay too slow when tasked with opening the file, taking quite a bit longer than Notepad++. - Plasm
2

Use the head command.

Mccomas answered 1/4, 2010 at 19:3 Comment(0)
1

Use 'less' on Solaris; on Windows, use the same through Cygwin. On mainframes this problem doesn't appear: the ISPF editor handles it pretty well.

Garibold answered 1/4, 2010 at 19:24 Comment(1)
CYGWIN also handles less, and top - Nahshunn
0

UltraEdit claims to handle files over 4GB...

Nathan answered 1/4, 2010 at 18:57 Comment(0)
