large-data-volumes Questions

11

There's a 1 gigabyte string of arbitrary data, which you can assume to be equivalent to something like: 1_gb_string=os.urandom(1*gigabyte). We will be searching this string, 1_gb_string, for an infi...
Guyguyana asked 17/11, 2009 at 17:11
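The excerpt is cut off, but for running repeated substring searches against a blob this size, one common trick is to memory-map the file so the operating system pages data in on demand instead of holding the whole gigabyte in Python memory. A minimal sketch (the file name and query strings are made up):

```python
import mmap

# Memory-map the 1 GB blob; the OS pages it in lazily,
# so the whole string never has to live in RAM at once.
with open("1_gb_string.bin", "rb") as f:
    blob = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    for query in (b"needle1", b"needle2"):
        pos = blob.find(query)  # byte offset of first match, or -1
        print(query, pos)
```

For a very large number of queries, building an index such as a suffix array amortizes far better than repeated linear scans.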

9

Solved

How can I plot a very large data set in R? I'd like to use a boxplot, violin plot, or similar. The data cannot all fit in memory. Can I incrementally read in and calculate the summaries nee...
Humbuggery asked 2/12, 2010 at 23:24
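The question asks about R, but the incremental idea is language-neutral: a boxplot only needs the five-number summary, and a fixed-size reservoir sample approximates those quantiles without ever holding the full data set in memory. A Python sketch (file name and sample size are arbitrary):

```python
import random
import statistics

def reservoir_sample(lines, k=100_000):
    # Classic reservoir sampling: every value seen so far has an
    # equal chance of ending up in the k-element sample.
    sample = []
    for i, line in enumerate(lines):
        x = float(line)
        if i < k:
            sample.append(x)
        else:
            j = random.randrange(i + 1)
            if j < k:
                sample[j] = x
    return sample

with open("huge_column.txt") as f:
    s = reservoir_sample(f)
# Approximate five-number summary for the boxplot:
print(min(s), statistics.quantiles(s, n=4), max(s))
```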

10

Solved

What if you have so many entries in a table that 2^32 is not enough for your auto_increment ID within a given period (day, week, month, ...)? What if the largest datatype MySQL provides is not eno...
Acalia asked 31/3, 2009 at 20:43
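The usual first answer is to move to BIGINT UNSIGNED, whose 64-bit range is effectively inexhaustible for row IDs. Back-of-the-envelope arithmetic (the insert rate below is a deliberately extreme example, not from the question):

```python
ids_32 = 2**32                 # ~4.3e9 possible unsigned 32-bit IDs
ids_64 = 2**64                 # ~1.8e19 possible unsigned 64-bit IDs
rows_per_day = 100_000_000     # hypothetical very heavy insert rate

print(ids_32 / rows_per_day)          # ~43 days until a 32-bit ID wraps
print(ids_64 / rows_per_day / 365)    # ~5e8 years until a 64-bit ID wraps
```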

8

Solved

I have come across an interview question: "If you were designing a web crawler, how would you avoid getting into infinite loops?" and I am trying to answer it. How does it all begin from the begin...
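The standard answer is to keep a set of already-visited URLs and normalize each URL before checking it, so trivially different spellings of the same page don't re-enter the queue. A minimal sketch of that bookkeeping (normalization rules vary by crawler):

```python
from urllib.parse import urldefrag, urlparse, urlunparse

def normalize(url):
    # Drop the #fragment and lower-case scheme/host so
    # equivalent URLs hash to the same key.
    url, _ = urldefrag(url)
    p = urlparse(url)
    return urlunparse((p.scheme.lower(), p.netloc.lower(),
                       p.path or "/", p.params, p.query, ""))

seen = set()
queue = ["http://example.com/"]
while queue:
    url = normalize(queue.pop())
    if url in seen:
        continue  # already crawled: this check breaks the loop
    seen.add(url)
    # ... fetch the page, extract links, append them to queue ...
```

A per-URL depth limit adds a second safety net against dynamically generated link chains that produce endless fresh URLs.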

8

Solved

My application potentially has a huge number of arguments passed in, and I want to avoid the memory hit of duplicating the arguments into a filtered list. I would like to filter them in place, but I ...
Zsigmondy asked 8/6, 2009 at 5:14
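If the container is a Python list, the read/write two-pointer compaction below filters it in place: O(n) time, O(1) extra memory, no duplicated list. (The predicate and sample data are made up.)

```python
def filter_in_place(items, keep):
    # Compact surviving elements toward the front, then truncate.
    write = 0
    for read in range(len(items)):
        if keep(items[read]):
            items[write] = items[read]
            write += 1
    del items[write:]

args = ["--verbose", "", "--out", "", "file.txt"]
filter_in_place(args, lambda a: a != "")
print(args)  # ['--verbose', '--out', 'file.txt']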

12

Solved

I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following w...
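In Hibernate the usual tools are ScrollableResults and a bounded fetch size, so rows stream through the session instead of accumulating in it. The same batched-cursor pattern, sketched language-neutrally in Python (table and file names invented):

```python
import sqlite3

conn = sqlite3.connect("big.db")
cur = conn.execute("SELECT id, payload FROM big_table")
with open("dump.txt", "w") as out:
    while True:
        batch = cur.fetchmany(10_000)   # only one batch in memory at a time
        if not batch:
            break
        for row in batch:
            out.write("%s\t%s\n" % row)
```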

2

Solved

I know how to create and mount a data volume container to multiple other containers using --volumes-from, but I do have a few questions regarding its usage and limitations: Situation: I am lookin...
Miki asked 9/6, 2015 at 21:48

4

Solved

Let's make it immediately clear: this is not a question about a memory leak! I have a page which allows the user to enter some data, and some JavaScript to handle this data and produce a result. The Java...
Heer asked 8/1, 2010 at 12:55

7

Solved

How would you tackle the following storage and retrieval problem? Roughly 2,000,000 rows will be added each day (365 days/year), with the following information per row: id (unique row identifier)...
Tuberculin asked 20/3, 2009 at 10:32
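At roughly 2,000,000 rows/day that is about 730 million rows/year, which is why answers to this kind of question usually reach for time-based partitioning: each day lives in its own partition (or file), and expiring old data is a cheap drop. A toy sketch of day-partitioned storage (the schema is invented):

```python
import sqlite3
from datetime import date

def partition_for(day: date):
    # One small database file per day; dropping a day's data
    # is just deleting a file, and each partition stays small.
    conn = sqlite3.connect(f"rows_{day.isoformat()}.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rows (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn

conn = partition_for(date.today())
conn.execute("INSERT INTO rows (payload) VALUES (?)", ("example",))
conn.commit()
```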

5

Solved

I'm currently running an awk script to process a large (8.1GB) access-log file, and it's taking forever to finish. In 20 minutes it has written 14MB of the (1000 ± 500)MB I expect it to write, and I wo...
Caylor asked 22/1, 2010 at 4:28
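Without seeing the script it's hard to say where the time goes (unanchored regexes and locale-aware matching are frequent awk culprits). For comparison, a single-pass streaming aggregation over a large log looks like this in Python; memory stays flat because only per-key counts are kept:

```python
from collections import Counter

hits = Counter()
with open("access.log") as log:
    for line in log:                 # streams line by line; never reads the whole file
        fields = line.split()
        if fields:
            hits[fields[0]] += 1     # e.g. requests per client IP
print(hits.most_common(10))
```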

4

Solved

What's the best way of displaying page navigation for many, many pages? (Initially this was posted as a how-to tip with my answer included in the question. I've now split my answer off into the "a...
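The common pattern is a sliding window: always show the first and last few pages plus a window around the current page, with ellipses in the gaps. A sketch of that computation (parameter names are mine):

```python
def page_window(current, total, edge=2, around=2):
    # Keep the first/last `edge` pages and `around` pages on
    # each side of the current one; mark the gaps.
    keep = (set(range(1, edge + 1))
            | set(range(total - edge + 1, total + 1))
            | set(range(current - around, current + around + 1)))
    out, prev = [], 0
    for p in sorted(x for x in keep if 1 <= x <= total):
        if p - prev > 1:
            out.append("...")  # gap marker
        out.append(p)
        prev = p
    return out

print(page_window(10, 100))  # [1, 2, '...', 8, 9, 10, 11, 12, '...', 99, 100]
```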

2

Solved

I was looking around for jQuery grid recommendations and came across this question/answers: https://stackoverflow.com/questions/159025/jquery-grid-recommendations In looking through the many jQuer...
Hobbyhorse asked 12/9, 2010 at 17:45

4

We're designing a large-scale web scraping/parsing project. Basically, the script needs to go through a list of web pages, extract the contents of a particular tag, and store it in a database...
Axenic asked 29/6, 2010 at 17:50
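A minimal version of that pipeline, using only the standard library (URL list, target tag, and schema are all placeholders); a real project would add politeness delays, retries, and a proper parser such as lxml or BeautifulSoup:

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    # Collects the text inside <title>...</title>.
    def __init__(self):
        super().__init__()
        self.in_title, self.title = False, ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

db = sqlite3.connect("scrape.db")
db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
for url in ["http://example.com/"]:
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    p = TitleGrabber()
    p.feed(html)
    db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, p.title))
db.commit()
```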

3

Solved

I am trying to create a Python script which will take an address as input and will spit out its latitude and longitude, or latitudes and longitudes in case of multiple matches, quite like Nominatim...
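Nominatim itself exposes a public search endpoint that returns all matches as JSON, so a thin wrapper covers the multiple-match case; note that its usage policy requires an identifying User-Agent and a low request rate. A sketch:

```python
import json
import urllib.parse
import urllib.request

def geocode(address):
    url = ("https://nominatim.openstreetmap.org/search?"
           + urllib.parse.urlencode({"q": address, "format": "json"}))
    req = urllib.request.Request(url, headers={"User-Agent": "geocode-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)
    # One (lat, lon) pair per match; several pairs for ambiguous addresses.
    return [(r["lat"], r["lon"]) for r in results]

print(geocode("10 Downing Street, London"))
```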

4

Solved

I just took my first baby step today into real scientific computing when I was shown a data set where the smallest file is 48000 fields by 1600 rows (haplotypes for several people, for chromo...
Westfalen asked 10/6, 2010 at 6:34
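At 48,000 fields by 1,600 rows the smallest file may still fit in RAM, but the streaming habit scales to its larger siblings. Assuming a whitespace-delimited text matrix (a guess about the format), per-column summaries can be accumulated one row at a time:

```python
from collections import Counter

col_counts = None  # one Counter of observed values per column
with open("haplotypes.txt") as f:
    for line in f:
        fields = line.split()
        if col_counts is None:
            col_counts = [Counter() for _ in fields]
        for c, value in zip(col_counts, fields):
            c[value] += 1   # e.g. allele frequencies per site

print(col_counts[0].most_common(3))
```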

7

Solved

I was wondering if InnoDB would be the best way to format the table? The table contains one field, the primary key, and will get 816k rows a day (est.). This will get very large very quickly! I...
Krak asked 13/12, 2008 at 16:18

1

I have been an Apache Solr user for about a year. I used Solr for simple search tools, but now I want to use Solr with 5TB of data. I assume the 5TB of data will grow to 7TB when Solr indexes it, according to the filter tha...
Posthaste asked 12/1, 2012 at 14:34

5

I have a case where I need to transfer large amounts of serialized object graphs (via NetDataContractSerializer) using WCF using wsHttp. I'm using message security and would like to continue to do ...

5

I have large datasets with millions of records in XML format. These datasets are full data dumps of a database up to a certain point in time. Between two dumps new entries might have been added and...
Admonitory asked 6/9, 2011 at 17:35
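With xml.etree.ElementTree.iterparse each dump can be streamed record by record, keeping only a key-to-content-hash map in memory; comparing two such maps yields the added, removed, and changed entries. A sketch (the record tag and key attribute are assumptions about the schema):

```python
import hashlib
import xml.etree.ElementTree as ET

def index_dump(path, record_tag="record", key_attr="id"):
    index = {}
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == record_tag:
            index[elem.get(key_attr)] = hashlib.sha1(ET.tostring(elem)).hexdigest()
            elem.clear()  # free the subtree so memory stays bounded
    return index

old, new = index_dump("dump_old.xml"), index_dump("dump_new.xml")
added = new.keys() - old.keys()
removed = old.keys() - new.keys()
changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}
```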

12

Solved

I have a process that's going to initially generate 3-4 million PDF files, and continue at the rate of 80K/day. They'll be pretty small (50K) each, but what I'm worried about is how to manage the t...
Psychomotor asked 10/8, 2009 at 21:50
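The classic answer is to fan the files out over a hashed directory tree so no single directory holds more than a few hundred entries. With two levels of two hex characters there are 65,536 leaf directories, so 4 million files average about 60 per directory. A sketch (root path is a placeholder):

```python
import hashlib
import os

def shard_path(root, filename, levels=2, width=2):
    # Derive the directory from a hash of the name so files
    # spread evenly across the tree.
    digest = hashlib.sha1(filename.encode()).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    directory = os.path.join(root, *parts)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, filename)

print(shard_path("/pdfstore", "invoice-000123.pdf"))
# e.g. /pdfstore/3a/7f/invoice-000123.pdf
```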

1

I'm just wondering if anyone out there knows of a Java implementation of singular value decomposition (SVD) for large sparse matrices? I need this implementation for latent semantic analysis (LSA)....
Virg asked 25/7, 2011 at 17:28

1

The correlation matrix is so large (50000 by 50000) that working with it directly is inefficient for calculating what I want. What I want to do is break it down into groups and treat each as separate correlation matrice...
Lava asked 16/6, 2011 at 18:51
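If the underlying data matrix is available, any block of the correlation matrix can be computed directly from the standardized columns, so the full 50000 by 50000 result never has to exist at once. A NumPy sketch with toy dimensions:

```python
import numpy as np

def corr_block(Z, cols_a, cols_b):
    # Z: observations x variables, columns already standardized.
    # Returns the correlation sub-matrix between the two column groups.
    return Z[:, cols_a].T @ Z[:, cols_b] / (Z.shape[0] - 1)

X = np.random.rand(500, 1000)              # toy stand-in for the real data
Z = (X - X.mean(0)) / X.std(0, ddof=1)     # standardize once
block = corr_block(Z, np.arange(0, 100), np.arange(100, 200))
print(block.shape)                         # (100, 100)
```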

5

Solved

I'm writing a very computationally intense procedure for a mobile device and I'm limited to 32-bit CPUs. In essence, I'm performing dot products of huge sets of data (>12k signed 16-bit integers). ...
Autophyte asked 10/6, 2011 at 14:45
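The core worry is accumulator overflow, and the bound is easy to check: with n products of signed 16-bit values, the magnitude of the sum is at most n * 2^15 * 2^15, which blows past 32 bits long before n reaches 12k, while 64 bits has room to spare:

```python
n = 12_000
worst = n * (2**15) * (2**15)   # |x_i|, |y_i| <= 2**15 for int16
print(worst)                    # 12_884_901_888_000, about 1.3e13
print(worst > 2**31 - 1)        # True: a 32-bit accumulator can overflow
print(worst < 2**63 - 1)        # True: a 64-bit accumulator cannot
```

On a 32-bit CPU this typically means a 64-bit (long long) accumulator, or widening partial sums before combining them.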

1

Solved

Summary: I have a large and rapidly changing dataset which I wish to bind to a UI (DataGrid with grouping). The changes are on two levels: items are frequently added or removed from the collection...

4

Solved

Context: We have a homegrown filesystem-backed caching library. We currently have performance problems with one installation due to a large number of entries (e.g., up to 100,000). The problem: we stor...
Heartbreak asked 5/12, 2010 at 1:39
