large-data-volumes Questions
11
There's a 1 gigabyte string of arbitrary data which you can assume to be equivalent to something like:
import os
gigabyte = 2 ** 30
one_gb_string = os.urandom(1 * gigabyte)
We will be searching this string, one_gb_string, for an infi...
Guyguyana asked 17/11, 2009 at 17:11
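
One way to attack the question above, assuming the goal is to locate every occurrence of many short substrings in the 1 GB buffer: Python's bytes.find already scans at C speed, so a plain loop over it is a reasonable baseline (the needle and the smaller buffer below are illustrative).

import os

haystack = os.urandom(64 * 1024)   # small stand-in for the 1 GB string

def find_all(haystack, needle):
    # Yield every offset at which needle occurs, overlapping matches included.
    start = 0
    while True:
        idx = haystack.find(needle, start)
        if idx == -1:
            return
        yield idx
        start = idx + 1

print(sum(1 for _ in find_all(haystack, b"\x00\x01")))
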
9
Solved
How can I plot a very large data set in R?
I'd like to use a boxplot, or violin plot, or similar. The data cannot all fit in memory. Can I incrementally read in and calculate the summaries nee...
Humbuggery asked 2/12, 2010 at 23:24
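
For the boxplot question above, note that a boxplot needs only a handful of summary statistics, and those can be approximated from a fixed-bin histogram accumulated one chunk at a time. A sketch in Python (the same chunked idea carries over to R; the file name and value range are assumptions):

import numpy as np

BINS, LO, HI = 10_000, 0.0, 1.0                # assumed range of the data
edges = np.linspace(LO, HI, BINS + 1)
counts = np.zeros(BINS, dtype=np.int64)

with open("huge.txt") as fh:                   # hypothetical file, one value per line
    while True:
        lines = fh.readlines(1 << 20)          # ~1 MB of lines at a time
        if not lines:
            break
        vals = np.fromiter((float(s) for s in lines), dtype=float)
        counts += np.histogram(vals, bins=edges)[0]

def approx_quantile(q):
    # Read the quantile off the cumulative histogram.
    cum = np.cumsum(counts)
    return edges[min(np.searchsorted(cum, q * cum[-1]), BINS - 1)]

print([approx_quantile(q) for q in (0.25, 0.5, 0.75)])
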
10
Solved
What if you have so many entries in a table that 2^32 is not enough for your auto_increment ID within a given period (day, week, month, ...)?
What if the largest datatype MySQL provides is not eno...
Acalia asked 31/3, 2009 at 20:43
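
The usual escape hatch for the question above is BIGINT UNSIGNED, whose 2^64 range is effectively inexhaustible; a quick back-of-the-envelope check:

ids_available = 2 ** 64
inserts_per_year = 1_000_000 * 60 * 60 * 24 * 365   # a very aggressive 1M inserts/s
print(ids_available / inserts_per_year)             # ~584,942 years to exhaust
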
8
Solved
I have come across an interview question "If you were designing a web crawler, how would you avoid getting into infinite loops?" and I am trying to answer it.
How does it all begin from the begin...
Churr asked 29/4, 2011 at 16:37
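
The core of the answer is to normalize every URL and keep a set of pages already seen; a minimal breadth-first sketch (fetch and extract_links are hypothetical helpers the caller supplies):

from collections import deque
from urllib.parse import urldefrag, urljoin

def crawl(seed, fetch, extract_links, limit=10_000):
    # Breadth-first crawl that never revisits a normalized URL.
    seen = {urldefrag(seed)[0]}
    queue = deque(seen)
    while queue and len(seen) < limit:
        url = queue.popleft()
        for href in extract_links(fetch(url)):
            link = urldefrag(urljoin(url, href))[0]   # absolutize, drop #fragment
            if link not in seen:                      # this check breaks the loop
                seen.add(link)
                queue.append(link)
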
8
Solved
My application potentially has a huge number of arguments passed in, and I want to avoid the memory hit of duplicating the arguments into a filtered list. I would like to filter them in place but I ...
Zsigmondy asked 8/6, 2009 at 5:14
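
One way to filter without allocating a second list is to compact the surviving elements toward the front and then truncate the tail; a sketch:

def filter_in_place(items, keep):
    # Remove items failing keep() without building a second list.
    write = 0
    for read in range(len(items)):
        if keep(items[read]):
            items[write] = items[read]
            write += 1
    del items[write:]   # truncate the leftover tail in place

args = list(range(10))
filter_in_place(args, lambda x: x % 2 == 0)
print(args)             # [0, 2, 4, 6, 8]
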
12
Solved
I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following w...
Sofko asked 13/5, 2010 at 11:21
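
The question is Hibernate-specific, but the underlying pattern is a server-side cursor plus bounded fetches, so the client never holds all 90 million rows at once. In Python's DB-API the same pattern looks like this (driver choice, connection details, and table are placeholders):

import pymysql   # assumed driver; any DB-API module with a streaming cursor works

conn = pymysql.connect(host="localhost", user="user", password="...",
                       database="mydb",
                       cursorclass=pymysql.cursors.SSCursor)   # unbuffered cursor
with conn.cursor() as cur, open("dump.txt", "w") as out:
    cur.execute("SELECT id, payload FROM big_table")
    while True:
        rows = cur.fetchmany(1000)     # bounded batches, constant memory
        if not rows:
            break
        for row_id, payload in rows:
            out.write(f"{row_id}\t{payload}\n")
conn.close()
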
2
Solved
I know how to create and mount a data volume container to multiple other containers using --volumes-from, but I do have a few questions regarding its usage and limitations:
Situation: I am lookin...
Miki asked 9/6, 2015 at 21:48
4
Solved
Let's make it immediately clear: this is not a question about a memory leak!
I have a page which allows the user to enter some data, and JavaScript to handle this data and produce a result.
The Java...
Heer asked 8/1, 2010 at 12:55
7
Solved
How would you tackle the following storage and retrieval problem?
Roughly 2,000,000 rows will be added each day (365 days/year) with the following information per row:
id (unique row identifier)...
Tuberculin asked 20/3, 2009 at 10:32
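
At roughly two million rows a day, the standard move is to partition by day so that writes and date-range reads touch a single small partition; a sketch of that idea with one SQLite file per day (entirely an assumption about the eventual design):

import sqlite3
from datetime import date
from pathlib import Path

def db_for_day(d):
    # One database file per day: each partition stays small and is trivially droppable.
    conn = sqlite3.connect(Path(f"rows-{d.isoformat()}.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS rows (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn

conn = db_for_day(date.today())
conn.execute("INSERT INTO rows (payload) VALUES (?)", ("example",))
conn.commit()
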
5
Solved
I'm currently running an awk script to process a large (8.1 GB) access-log file, and it's taking forever to finish. In 20 minutes, it wrote 14 MB of the (1000 ± 500) MB I expect it to write, and I wo...
Caylor asked 22/1, 2010 at 4:28
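
awk already streams, so a slowdown like the one above usually comes from per-record work (field splitting, regexes) rather than I/O; for comparison, an equivalent streaming pass in Python keeps memory flat regardless of the log's size. The field position is an assumption:

from collections import Counter

hits = Counter()
with open("access.log") as fh:       # read line by line, never all at once
    for line in fh:
        fields = line.split()
        if len(fields) > 6:
            hits[fields[6]] += 1     # assumed: request path is the 7th field

for path, n in hits.most_common(20):
    print(n, path)
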
4
Solved
What's the best way of displaying page navigation for many, many pages?
(Initially this was posted as a how-to tip with my answer included in the question. I've now split my answer off into the "a...
Ori asked 20/10, 2011 at 12:12
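
A common answer is a sliding window of pages around the current one, plus the first and last pages, with ellipses marking the gaps; a sketch:

def page_window(current, total, radius=2):
    # Return the page numbers to display, with None marking an ellipsis.
    pages = sorted({1, total, *range(max(1, current - radius),
                                     min(total, current + radius) + 1)})
    out = []
    prev = None
    for page in pages:
        if prev is not None and page - prev > 1:
            out.append(None)         # gap -> render as "..."
        out.append(page)
        prev = page
    return out

print(page_window(57, 400))          # [1, None, 55, 56, 57, 58, 59, None, 400]
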
2
Solved
I was looking around for jQuery grid recommendations and came across this question/answers:
https://stackoverflow.com/questions/159025/jquery-grid-recommendations
In looking through the many jQuer...
Hobbyhorse asked 12/9, 2010 at 17:45
4
We're designing a large scale web scraping/parsing project. Basically, the script needs to go through a list of web pages, extract the contents of a particular tag, and store it in a database...
Axenic asked 29/6, 2010 at 17:50
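
A minimal single-threaded sketch of the extract-and-store loop described above (the URL list, target tag, and schema are all assumptions; a real project would add rate limiting, retries, and parallelism):

import sqlite3
import requests
from bs4 import BeautifulSoup

conn = sqlite3.connect("scrape.db")
conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content TEXT)")

for url in ["https://example.com/a", "https://example.com/b"]:   # placeholder list
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    tag = BeautifulSoup(resp.text, "html.parser").find("h1")     # assumed target tag
    if tag is not None:
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)",
                     (url, tag.get_text(strip=True)))
conn.commit()
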
3
Solved
I am trying to create a Python script which will take an address as input and will spit out its latitude and longitude, or latitudes and longitudes in case of multiple matches, quite like Nominatim...
Selhorst asked 12/4, 2012 at 12:56
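
Nominatim itself exposes a public HTTP search API, so one sketch is simply to query it and collect every match (the usage policy requires a descriptive User-Agent and a low request rate):

import requests

def geocode(address):
    # Return a list of (lat, lon, display_name) matches from Nominatim.
    resp = requests.get("https://nominatim.openstreetmap.org/search",
                        params={"q": address, "format": "json"},
                        headers={"User-Agent": "my-geocoder-script"},   # required
                        timeout=10)
    resp.raise_for_status()
    return [(hit["lat"], hit["lon"], hit["display_name"]) for hit in resp.json()]

for lat, lon, name in geocode("Springfield"):
    print(lat, lon, name)
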
4
Solved
I just took my first baby step into real scientific computing today when I was shown a data set where the smallest file is 48000 fields by 1600 rows (haplotypes for several people, for chromo...
Westfalen asked 10/6, 2010 at 6:34
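
If the files are, or can be converted to, flat binary, numpy.memmap lets you slice a 48,000 x 1,600 matrix without ever loading it whole; the shape, dtype, and file names here are assumptions:

import numpy as np

rows, cols = 48_000, 1_600                     # assumed dimensions

# Hypothetical one-time conversion from text to binary:
# np.loadtxt("haplotypes.txt").astype(np.int8).tofile("haplotypes.bin")

data = np.memmap("haplotypes.bin", dtype=np.int8, mode="r", shape=(rows, cols))
print(data[:, :100].mean(axis=0)[:5])          # touches only the pages it needs
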
7
Solved
I was wondering if InnoDB would be the best storage engine for this table? The table contains one field, the primary key, and the table will get 816k rows a day (est.). This will get very large very quickly! I...
Krak asked 13/12, 2008 at 16:18
1
I have been an Apache Solr user for about a year. I used Solr for simple search tools, but now I want to use Solr with 5TB of data. I assume that the 5TB of data will become 7TB when Solr indexes it, according to the filter tha...
Posthaste asked 12/1, 2012 at 14:34
5
I have a case where I need to transfer large amounts of serialized object graphs (via NetDataContractSerializer) over WCF using wsHttp. I'm using message security and would like to continue to do ...
Dissentient asked 10/6, 2010 at 16:39
5
I have large datasets with millions of records in XML format. These datasets are full data dumps of a database up to a certain point in time.
Between two dumps new entries might have been added and...
Admonitory asked 6/9, 2011 at 17:35
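
One memory-safe way to diff two such dumps is a streaming pass over each file with iterparse, mapping record ids to content hashes and then comparing the maps; the tag and attribute names are assumptions:

import hashlib
import xml.etree.ElementTree as ET

def index_dump(path, record_tag="record"):
    # Stream the dump, mapping each record id to a hash of its serialized content.
    ids = {}
    for _, elem in ET.iterparse(path):
        if elem.tag == record_tag:
            ids[elem.get("id")] = hashlib.md5(ET.tostring(elem)).hexdigest()
            elem.clear()          # free each subtree as soon as it is hashed
    return ids

old, new = index_dump("dump1.xml"), index_dump("dump2.xml")
added   = new.keys() - old.keys()
removed = old.keys() - new.keys()
changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}
print(len(added), len(removed), len(changed))
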
12
Solved
I have a process that's going to initially generate 3-4 million PDF files, and continue at the rate of 80K/day. They'll be pretty small (50K) each, but what I'm worried about is how to manage the t...
Psychomotor asked 10/8, 2009 at 21:50
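
The usual trick for millions of small files is to fan them out across a fixed two-level directory tree derived from a hash of the name, so no single directory ever holds more than a few thousand entries; a sketch:

import hashlib
from pathlib import Path

ROOT = Path("pdf-store")                        # hypothetical storage root

def shard_path(filename):
    # Map a file name to ROOT/ab/cd/filename via the first hash bytes.
    digest = hashlib.sha1(filename.encode()).hexdigest()
    return ROOT / digest[:2] / digest[2:4] / filename

p = shard_path("invoice-000123.pdf")
p.parent.mkdir(parents=True, exist_ok=True)
print(p)    # e.g. pdf-store/3f/9a/invoice-000123.pdf

# 256 * 256 = 65,536 leaf directories; 4 million files is only ~60 per directory.
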
1
I'm just wondering if anyone out there knows of a Java implementation of singular value decomposition (SVD) for large sparse matrices? I need this implementation for latent semantic analysis (LSA)....
Virg asked 25/7, 2011 at 17:28
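
The question asks for Java, but for reference, this is the shape of a truncated SVD of a large sparse matrix in SciPy; LSA needs exactly this kind of k-largest-triplets decomposition (the matrix and k are placeholders):

import scipy.sparse as sp
from scipy.sparse.linalg import svds

A = sp.random(100_000, 5_000, density=1e-4, format="csr")   # stand-in term-document matrix
U, s, Vt = svds(A, k=100)            # only the 100 largest singular triplets are computed
print(U.shape, s.shape, Vt.shape)    # (100000, 100) (100,) (100, 5000)
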
1
The correlation matrix is so large (50,000 × 50,000) that calculating what I want is not efficient. What I want to do is break it down into groups and treat each as a separate correlation matrice...
Lava asked 16/6, 2011 at 18:51
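
If the groups are known up front, the full 50,000 × 50,000 matrix never needs to exist; each group's correlation block can be computed independently. A numpy sketch (the data size and groups below are placeholders):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5_000))         # observations x variables; small stand-in

groups = [range(0, 100), range(100, 250)]   # placeholder column groups

for cols in groups:
    # Correlations within one group only: one block of the full matrix.
    C = np.corrcoef(X[:, list(cols)], rowvar=False)
    print(C.shape)                          # (100, 100) then (150, 150)
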
5
Solved
I'm writing a very computationally intense procedure for a mobile device and I'm limited to 32-bit CPUs. In essence, I'm performing dot products of huge sets of data (>12k signed 16-bit integers). ...
Autophyte asked 10/6, 2011 at 14:45
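
The central hazard in the question above is the accumulator: the product of two 16-bit values needs 32 bits, and summing more than 12k of them needs more still. A numpy illustration of widening before accumulating:

import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(-32768, 32767, size=12_000, dtype=np.int16)
b = rng.integers(-32768, 32767, size=12_000, dtype=np.int16)

naive = np.dot(a, b)                                    # accumulates in int16 and wraps
safe  = np.dot(a.astype(np.int64), b.astype(np.int64))  # 64-bit accumulator
print(naive, safe)
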
1
Solved
Custom ObservableCollection<T> or BindingList<T> with support for periodic notifications
Summary
I have a large and rapidly changing dataset which I wish to bind to a UI (DataGrid with grouping). The changes are on two levels:
Items are frequently added or removed from the collection...
Balky asked 15/3, 2011 at 10:08
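
Independent of the WPF specifics, the pattern being asked for is to coalesce many changes and surface them as one periodic notification; a language-agnostic sketch (in .NET the flush would raise a CollectionChanged/Reset event instead of invoking a callback):

import threading

class BatchingNotifier:
    # Collect changes cheaply and deliver them in one batch per interval.
    def __init__(self, on_flush, interval=0.25):
        self._pending, self._lock = [], threading.Lock()
        self._on_flush, self._interval = on_flush, interval
        self._tick()

    def notify(self, change):
        with self._lock:
            self._pending.append(change)    # no UI work on the hot path

    def _tick(self):
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._on_flush(batch)           # one UI update covers many changes
        t = threading.Timer(self._interval, self._tick)
        t.daemon = True
        t.start()
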
4
Solved
Context
We have a homegrown filesystem-backed caching library. We currently have performance problems with one installation due to a large number of entries (e.g. up to 100,000). The problem: we stor...
Heartbreak asked 5/12, 2010 at 1:39