partitioning Questions
6
I have a Parquet dataset in S3 partitioned by date (dt), with the oldest dates stored in AWS Glacier to save money. For instance, we have...
s3://my-bucket/my-dataset/dt=2017-07-01/ [in glacier]...
Reneta asked 21/8, 2017 at 13:12
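A minimal sketch related to the question above, assuming the goal is to name only the `dt=` partitions inside a chosen date range so that older, archived partitions are never listed. The helper name `partition_paths` and the date range are hypothetical; only the bucket path layout comes from the question.

```python
from datetime import date, timedelta


def partition_paths(base, start, end):
    """Build one dt=YYYY-MM-DD partition path per day from start to end, inclusive."""
    days = (end - start).days
    # An end date earlier than the start date yields an empty list.
    return [
        f"{base}/dt={(start + timedelta(days=offset)).isoformat()}/"
        for offset in range(days + 1)
    ]


# Hypothetical range: the three days starting at the partition shown in the question.
paths = partition_paths("s3://my-bucket/my-dataset", date(2017, 7, 1), date(2017, 7, 3))
for path in paths:
    print(path)
```

A list like this could then be handed to whatever reader is in use, instead of pointing the reader at the dataset root.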
2
I have two datasets stored as parquet files with schemas as below:
Dataset 1:
| id | col1 | col2 |
| 1 | v1 | v3 |
| 2 | v2 | v4 |
Dataset 2:
| id | col3 | col4 |
| 1 | v5 | v7 |
| 2 | v6 | v8 |
I want to join the two dat...
Broch asked 9/4 at 13:0
2
Solved
There are several similar-yet-different concepts in Spark-land surrounding how work gets farmed out to different nodes and executed concurrently. Specifically, there is:
The Spark Driver node (sp...
Goatsbeard asked 8/9, 2016 at 0:57
18
Solved
I have an ArrayList, which I want to divide into smaller List objects of n size, and perform an operation on each.
My current method of doing this is implemented with ArrayList objects in Java. Any...
Byers asked 28/4, 2011 at 20:49
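A minimal sketch of the idea in the question above: splitting a list into consecutive sublists of at most n elements. It is written in Python rather than the Java the question mentions, and the function name `chunks` is hypothetical.

```python
def chunks(items, n):
    """Split items into consecutive sublists of length n; the last one may be shorter."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Step through the list n elements at a time and slice out each piece.
    return [items[i:i + n] for i in range(0, len(items), n)]


for piece in chunks([1, 2, 3, 4, 5], 2):
    # Perform the per-chunk operation here; printing stands in for it.
    print(piece)
```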
7
I have a DigitalOcean Ubuntu server. Two days ago, every page on my website started returning error 500.
Over SSH I get this error message: "No space left on device".
For more info please check this SSH scr...
Balefire asked 18/2, 2018 at 13:49
4
Solved
I use a partitioned table with a large amount of data. According to MySQL docs, it is on the ToDo list that:
Queries involving aggregate functions such as SUM() and COUNT() can
easily be parall...
Melainemelamed asked 28/7, 2011 at 15:4
9
Solved
I have a set of distinct values. I am looking for a way to generate all partitions of this set, i.e. all possible ways of dividing the set into subsets.
For instance, the set {1, 2, 3} has the fol...
Pyro asked 11/12, 2013 at 21:19
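One way to enumerate all partitions of a set, as asked above, is a short recursive generator. This is a sketch, not the accepted answer from the original thread; the name `set_partitions` is hypothetical, and the input is assumed to contain distinct values.

```python
def set_partitions(items):
    """Yield every partition of items as a list of blocks (each block is a list)."""
    items = list(items)
    if not items:
        # The empty set has exactly one partition: no blocks at all.
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Option 1: the first element forms a block of its own.
        yield [[first]] + partition
        # Option 2: the first element joins one of the existing blocks.
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


for partition in set_partitions([1, 2, 3]):
    print(partition)
```

The number of results grows as the Bell numbers (5 partitions for 3 elements, 15 for 4), so this is only practical for small sets.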
4
Solved
I have a need for a particular form of 'set' partitioning that is escaping me, as it's not quite partitioning. Or rather, it's the subset of all partitions for a particular list that maintain the origi...
Kickback asked 23/8, 2014 at 5:48
3
Solved
I have access to a database and need to know the partition scheme definitions in it, i.e. I need to know the partition scheme name, which partition function it uses, what file group...
Atwood asked 22/7, 2014 at 3:36
3
Solved
I'm using PostgreSQL 12, which has a partitioned table. This table has old partitions that need to be deleted. I've seen code where the old partitions are first detached and only then...
Anthology asked 15/4, 2022 at 8:38
5
For QA purposes I need to be able to partition a drive via a bash script up to 30 or more partitions for both RHEL and SLES.
I have attempted to do this in BASH with fdisk via a "here document...
Rhinoscopy asked 27/8, 2012 at 21:52
4
Solved
I am trying to write out a large partitioned dataset to disk with Spark and the partitionBy algorithm is struggling with both of the approaches I've tried.
The partitions are heavily skewed - some ...
Liliuokalani asked 28/10, 2018 at 23:52
2
Solved
I have the following dataframe (df_parquet):
DataFrame[id: bigint, date: timestamp, consumption: decimal(38,18)]
I intend to get sorted lists of dates and consumptions using collect_list, just a...
Mcdevitt asked 29/7, 2019 at 14:22
4
I imagine partitioning a table by date (in particular for logs) is widely used, but I cannot find a good answer to my problem.
I want to partition a table by week (the nu...
Tsarina asked 17/4, 2013 at 0:20
4
Solved
I want to upload large CSV files to a MongoDB cloud database from a Node.js server using Express, Mongoose and Multer's GridFS storage engine, but when the file upload starts, my database becom...
Fessler asked 10/5, 2022 at 12:41
2
Is there a simple (i.e. non-hacky) and race-condition-free way to create a partitioned sequence in PostgreSQL? Example:
Using a normal sequence in Issue:
| Project_ID | Issue |
| 1 | 1 |
| 1 | 2 |...
Digenesis asked 28/8, 2010 at 15:26
5
I have a Weka model stored in S3 that is around 400MB in size.
Now, I have a set of records on which I want to run the model and perform prediction.
To perform prediction, what I have tried...
Justen asked 13/10, 2016 at 8:20
5
I am trying to create dynamic partitions in Hive using the following code.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
create external table if not exist...
Colpin asked 15/4, 2015 at 11:17
2
I am using Spark 2.3 and have written a DataFrame to create a partitioned Hive table using the DataFrameWriter class in PySpark.
newdf.coalesce(1).write.format('orc').partitionBy('veh_country'...
Equimolecular asked 19/11, 2018 at 10:47
2
Solved
I would like to know the best way to load a specific partition of a Delta table.
Does option 2 load the whole table before filtering?
Option 1:
df = spark.read.format("delta").option...
Communication asked 12/7, 2021 at 8:37
3
We have a Spring Boot project that uses Spring-JPA for data access. We have a couple of tables where we create/update rows once (or a few times, all within minutes). We don't update rows that are o...
Roughshod asked 31/5, 2016 at 18:56
3
Solved
I am trying to run this function in PostgreSQL:
CREATE OR REPLACE FUNCTION create_partition_and_insert()
RETURNS trigger AS
$BODY$
DECLARE
partition VARCHAR(25);
_date text;
BEGIN
EXECUTE 'SELECT ...
Herculaneum asked 13/10, 2015 at 15:32
2
Solved
(Note: updated with adopted answer below.)
For a PostgreSQL 8.1 (or later) partitioned table, how does one define an UPDATE trigger and procedure to "move" a record from one partition to ...
Sino asked 25/11, 2009 at 16:35
2
Solved
A list of elements is given. I want to have all the possibilities to divide this list into any number of partitions so that each partition has at least x elements. The order of the partitions in th...
Hersch asked 22/2, 2022 at 9:12
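A brute-force sketch for the question above: enumerate every partition of the list, then keep only those whose blocks all have at least x elements. It assumes the list elements are distinct and treats each partition as an unordered collection of blocks; the name `partitions_min_size` is hypothetical, and this is not the accepted answer from the original thread.

```python
def partitions_min_size(items, min_size):
    """Yield partitions of items in which every block has at least min_size elements."""

    def all_partitions(seq):
        # Recursively place the first element alone or into an existing block.
        if not seq:
            yield []
            return
        first, rest = seq[0], seq[1:]
        for partition in all_partitions(rest):
            yield [[first]] + partition
            for i, block in enumerate(partition):
                yield partition[:i] + [[first] + block] + partition[i + 1:]

    for partition in all_partitions(list(items)):
        # Filter step: discard any partition containing a block that is too small.
        if all(len(block) >= min_size for block in partition):
            yield partition


for partition in partitions_min_size([1, 2, 3, 4], 2):
    print(partition)
```

Generating everything and filtering afterwards is simple but wasteful; for longer lists, pruning undersized blocks during the recursion would be the natural refinement.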
8
Solved
I partitioned my 300MB table and am trying to run a SELECT query against partition p0 with this command:
SELECT * FROM employees PARTITION (p0);
But I am getting the following error:
ERROR 1064 (42000): ...
Lichtenfeld asked 1/1, 2013 at 16:53
© 2022 - 2024 — McMap. All rights reserved.