bigdata Questions
3
Solved
How to rename a TABLE in BigQuery using StandardSQL or LegacySQL in order to partition a table which was not previously partitioned?
I'm trying with Standard SQL, but it gives the following error:
RE...
Transonic asked 17/5, 2018 at 6:29
2
Solved
This may be something simple, but I have searched a lot and can't find how to fix it.
I am using RStudio 2.15.1 on a server because we use big datasets and the server has more RAM to deal with them.
I...
3
Solved
I use KMeans and silhouette_score from sklearn in Python to calculate my clusters, but with more than 10,000 samples and more than 1,000 clusters, calculating silhouette_score is very slow.
Is there a faster me...
Emptyhanded asked 27/12, 2016 at 10:33
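One common speed-up, offered here as a hedged sketch: the exact silhouette needs every pairwise distance, so score a random subsample of points instead. scikit-learn's silhouette_score already supports this through its sample_size and random_state parameters; the pure-Python function below (silhouette_sample is a hypothetical name) only illustrates the idea, assuming points are coordinate tuples and Euclidean distance.

```python
import math
import random


def silhouette_sample(points, labels, sample_size=None, seed=0):
    """Approximate the mean silhouette coefficient on a random subsample.

    points: sequence of coordinate tuples; labels: cluster label per point.
    """
    indices = list(range(len(points)))
    if sample_size is not None and sample_size < len(points):
        # Draw the subsample reproducibly; only these points are scored.
        indices = random.Random(seed).sample(indices, sample_size)
    pts = [points[i] for i in indices]
    labs = [labels[i] for i in indices]

    # Group the sampled points by cluster label.
    clusters = {}
    for p, lab in zip(pts, labs):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters in the sample")

    scores = []
    for p, lab in zip(pts, labs):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        # a: mean distance to the other members of p's own cluster
        # (p itself contributes a distance of 0, hence len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: lowest mean distance from p to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

With scikit-learn the equivalent call would be silhouette_score(X, labels, sample_size=5000, random_state=0); the sample size is an arbitrary example, and a smaller sample trades accuracy for speed.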
7
I am looking for sample code that can convert .h5 files to CSV or TSV.
I have to read .h5 files, and the output should be CSV or TSV.
Sample code would be much appreciated. Please help, as I have been stuck on i...
4
Is there any way to use a framework for enabling dependency injection in a Spark application?
Is it possible to use Guice, for instance?
If so, is there any documentation, or sampl...
Tonsillotomy asked 19/11, 2017 at 20:1
0
I thought Hive lineage was not available, but after some research I found that it can be enabled. Some of the things I found while searching were about enabling its lineage via either Cloudera Manager...
Odele asked 2/5, 2022 at 10:8
1
I want to query my MongoDB database to find the non-matching documents between two collections.
Here is my structure:
CollectionA:
_id, name, firstname, website_account_key, email, status
CollectionB:
_id, webs...
Ellette asked 16/3, 2015 at 11:46
3
I am using Parquet.Net to read Parquet files, but the only option for reading from the Parquet file is:
//get the first group
Parquet.ParquetRowGroupReader rowGroup = myParquet.OpenRowGroupReader(0);
...
Pachysandra asked 21/7, 2020 at 1:3
2
Solved
When I execute run-example SparkPi, for example, it works perfectly, but
when I run spark-shell, it throws these exceptions:
WARNING: An illegal reflective access operation has occurred
WARNING: Il...
Batch asked 11/12, 2021 at 17:37
4
Solved
What I'm looking to do is create a report that shows how many sales a company had each week.
We have a time field called created that looks like this:
2016-04-06 20:58:06 UTC
This fi...
Champollion asked 2/5, 2016 at 17:17
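A minimal sketch of the weekly grouping, assuming the created values arrive as strings in exactly the format shown and that weeks start on Monday (both are assumptions). In SQL engines such as BigQuery the same idea is usually expressed by truncating the timestamp to the week, for example with TIMESTAMP_TRUNC(created, WEEK), and grouping on the result; note that BigQuery's WEEK starts on Sunday unless WEEK(MONDAY) is given. weekly_counts is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime, timedelta


def weekly_counts(created_values):
    """Count rows per week, keyed by the Monday that starts each week (UTC)."""
    counts = Counter()
    for value in created_values:
        # Format from the question, e.g. "2016-04-06 20:58:06 UTC".
        ts = datetime.strptime(value, "%Y-%m-%d %H:%M:%S UTC")
        # weekday() is 0 for Monday, so this steps back to that week's Monday.
        week_start = (ts - timedelta(days=ts.weekday())).date()
        counts[week_start] += 1
    return dict(sorted(counts.items()))
```

Choosing a different week start only changes the timedelta offset; the grouping itself stays the same.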
0
3
I am trying to query my DynamoDB table for feed_guid and status_id = 1, but it returns a "Query key condition not supported" error.
My table schema and query are below.
$result = $dynamodbClient->...
Strand asked 5/8, 2015 at 11:13
3
Solved
I now have JSON data as follows:
{"Id":11,"data":[{"package":"com.browser1","activetime":60000},{"package":"com.browser6","activetime":1205000},{"package":"com.browser7","activetime":1205000}]}
{"Id":...
Mystery asked 6/3, 2018 at 14:32
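The question is cut off, so its exact goal is unknown. A common first step with records like these is to flatten the nested data array into one row per package (often called exploding the array). A minimal sketch, assuming one JSON object per line as shown; flatten_records is a hypothetical name.

```python
import json


def flatten_records(lines):
    """Explode each {"Id": ..., "data": [...]} line into (Id, package, activetime) rows."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for entry in record.get("data", []):
            rows.append((record["Id"], entry["package"], entry["activetime"]))
    return rows
```

Once the rows are flat, per-package aggregates such as total activetime become a simple group-by.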
3
Is it true that emails can be deduplicated using only some of their headers, since according to the RFC their Message-ID should be unique?
Is there any way to calculate the chance of 1 single email bee...
Zerk asked 3/4, 2014 at 15:6
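RFC 5322 says every message SHOULD carry a Message-ID that is unique, but the value is generated by the sending software, so it can be missing or occasionally reused. A minimal sketch of deduplication under that caveat: key on Message-ID when present, otherwise fall back to a hash of a few other headers. The fallback header set (From, To, Date, Subject) is an assumption, not something the RFC prescribes, and the function names are hypothetical.

```python
import hashlib
from email import message_from_string


def dedup_key(raw_message):
    """Return a duplicate-detection key: the Message-ID if present, else a header hash."""
    msg = message_from_string(raw_message)
    message_id = str(msg.get("Message-ID", "")).strip()
    if message_id:
        return message_id
    # Fallback (assumed choice): hash headers that usually identify a message.
    parts = [str(msg.get(name, "")) for name in ("From", "To", "Date", "Subject")]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8", errors="replace")).hexdigest()
    return "sha256:" + digest


def deduplicate(raw_messages):
    """Keep the first message seen for each key, preserving input order."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = dedup_key(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```

Because the key ignores the body, two messages with the same Message-ID are treated as duplicates even if their content differs.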
2
Solved
I want to create a new database 'demo' in Neo4j, but I run into a bug:
I searched but couldn't find an answer. Can you help me? Thanks, all!
2
Problem: We need a big data method for calculating distances between points. We outline what we'd like to do below with a five-observation dataframe. However, this particular method is infeasible a...
Cuffs asked 17/12, 2021 at 16:31
4
I have 6k records to update in Elasticsearch, and I have to use PHP.
I searched the documentation and found Bulk Indexing, but it does not keep the previous data.
I have this structure:
...
Brimful asked 11/12, 2017 at 19:6
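The bulk index action replaces the whole stored document, which would explain the lost data. A hedged sketch of the alternative: the bulk update action with a partial doc merges the given fields into the existing document. The function below only builds the newline-delimited body for the _bulk endpoint; the index name and fields in the usage are hypothetical, and _type is omitted, which matches Elasticsearch 7 and later (older versions may also expect a _type).

```python
import json


def bulk_update_body(index, docs):
    """Build an NDJSON body for the Elasticsearch _bulk API using partial updates.

    docs: mapping of document id -> dict of fields to change.
    """
    lines = []
    for doc_id, fields in docs.items():
        # Action line: "update" merges into the existing document instead of replacing it.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Partial document: only these fields change; other stored fields are kept.
        lines.append(json.dumps({"doc": fields}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

For example, bulk_update_body("products", {"1": {"price": 10}}) yields one action line and one doc line. The official PHP client's bulk() method should accept the same action/doc pairs as consecutive entries in its body array.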
5
Solved
I read other similar threads and searched Google to find a better way but couldn't find any workable solution.
I have a large table in BigQuery (assume 20 million rows are inserted per day). I...
Chromato asked 6/3, 2019 at 23:5
1
Solved
I have a dataset of video frames from 1,000 real videos and 1,000 deepfake videos. After the preprocessing phase, each video is converted to 300 frames; in other words, I have a dataset w...
Ymir asked 10/11, 2021 at 14:15
3
Solved
I'm learning Cassandra through its documentation. Right now I'm reading about batches and static fields.
In their example at the end of the page, they somehow managed to make balance have two different values...
3
First, I want to quickly give some background. What I eventually want to achieve is to train a fully connected neural network for a multi-class classification problem using the TensorFlow framework.
Th...
Educatee asked 11/10, 2017 at 3:36
2
Solved
I just tried using IncrementalPCA from sklearn.decomposition, but it threw a MemoryError, just like PCA and RandomizedPCA did before. My problem is that the matrix I am trying to load is too bi...
Gunas asked 15/7, 2015 at 11:0
1
I have a collection named "allvoice" which has the following structure:
{
"_id" : ObjectId("612599bb1cff80e6fc5cbf38"),
"subscriber_id" : "e3365edb9c7...
Noellanoelle asked 10/9, 2021 at 4:36
1
While adapting Java's KafkaIOIT to work with a large dataset, I encountered a problem. I want to push 100M records through a Kafka topic, verify data correctness and at the same time check t...
Portion asked 12/9, 2019 at 7:26
2
Can we update a DynamoDB item using only a global secondary index?
$response = $dynamodbClient->updateItem(array(
'TableName' => 'feed',
'Key' => array(
'feed_guid' => array('S' => ...
Titre asked 11/8, 2015 at 13:30
© 2022 - 2024 — McMap. All rights reserved.