Partitioning Questions

6

I have a dataset in Parquet in S3, partitioned by date (dt), with the oldest dates stored in AWS Glacier to save some money. For instance, we have... s3://my-bucket/my-dataset/dt=2017-07-01/ [in glacier]...
Reneta asked 21/8, 2017 at 13:12

2

I have two datasets stored as Parquet files with the schemas below:
Dataset 1:
id | col1 | col2
1 | v1 | v3
2 | v2 | v4
Dataset 2:
id | col3 | col4
1 | v5 | v7
2 | v6 | v8
I want to join the two dat...
Broch asked 9/4 at 13:0

2

Solved

There are several similar-yet-different concepts in Spark-land surrounding how work gets farmed out to different nodes and executed concurrently. Specifically, there are: The Spark Driver node (sp...

18

Solved

I have an ArrayList that I want to divide into smaller List objects of size n, and then perform an operation on each. My current method of doing this is implemented with ArrayList objects in Java. Any...
Byers asked 28/4, 2011 at 20:49
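A minimal sketch of the fixed-size split, written in Python rather than Java to keep all examples in one language; `chunk` is a hypothetical helper name. In Java the same loop would take views with `List.subList(i, Math.min(i + n, list.size()))`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], n: int) -> List[List[T]]:
    """Split items into consecutive sublists of length n (the last may be shorter)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Step through the sequence n elements at a time and copy each slice.
    return [list(items[i:i + n]) for i in range(0, len(items), n)]


print(chunk([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```

Each chunk is a copy here; the Java `subList` variant returns views backed by the original list, so structural changes to the original invalidate them.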

7

I have a DigitalOcean Ubuntu server. Two days ago, every page on my website started returning error 500, and over SSH I got the error message "No space left on device". For more info, please check this SSH scr...
Balefire asked 18/2, 2018 at 13:49

4

Solved

I use a partitioned table with a large amount of data. According to the MySQL docs, it is on the to-do list that: Queries involving aggregate functions such as SUM() and COUNT() can easily be parall...
Melainemelamed asked 28/7, 2011 at 15:4

9

Solved

I have a set of distinct values. I am looking for a way to generate all partitions of this set, i.e. all possible ways of dividing the set into subsets. For instance, the set {1, 2, 3} has the fol...
Pyro asked 11/12, 2013 at 21:19
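One common recursive approach, sketched in Python: each partition of the remaining elements is extended by placing the first element either into one of its existing blocks or into a new block of its own. The number of results grows as the Bell numbers (5 for three elements, 15 for four). `set_partitions` is a hypothetical name.

```python
from typing import Iterator, List, Sequence


def set_partitions(elements: Sequence) -> Iterator[List[list]]:
    """Yield every partition of `elements` as a list of blocks (lists)."""
    if len(elements) == 0:
        yield []  # the empty set has exactly one partition: no blocks
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        # Insert `first` into each existing block in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give `first` a block of its own.
        yield [[first]] + smaller


for p in set_partitions([1, 2, 3]):
    print(p)
```

For {1, 2, 3} this prints the five partitions [[1, 2, 3]], [[1], [2, 3]], [[1, 2], [3]], [[2], [1, 3]] and [[1], [2], [3]]. Because it is a generator, large inputs can be consumed lazily instead of materializing every partition at once.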

4

Solved

I need a particular form of 'set' partitioning that is escaping me, as it's not quite partitioning. Rather, it's the subset of all partitions of a particular list that maintain the origi...
Kickback asked 23/8, 2014 at 5:48
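The excerpt is cut off, so this sketch assumes "maintain the original order" means every block is a contiguous run of the list. Under that assumption, a partition is fully determined by which of the n - 1 gaps between neighbouring elements are cut, giving 2^(n-1) results. `ordered_partitions` is a hypothetical name.

```python
from itertools import combinations
from typing import Iterator, List, Sequence


def ordered_partitions(items: Sequence) -> Iterator[List[list]]:
    """Yield every split of `items` into contiguous, order-preserving blocks."""
    n = len(items)
    if n == 0:
        yield []
        return
    # Choose k cut positions among the gaps 1..n-1, for every k.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            # Consecutive bounds delimit one block each.
            yield [list(items[a:b]) for a, b in zip(bounds, bounds[1:])]


for p in ordered_partitions([1, 2, 3]):
    print(p)
```

For [1, 2, 3] this yields [[1, 2, 3]], [[1], [2, 3]], [[1, 2], [3]] and [[1], [2], [3]], i.e. the set partitions of the list that keep elements in their original sequence.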

3

Solved

I have access to a database and I need to know the partition scheme definitions in the database, i.e. the partition scheme name, which partition function it is using, what file group...

3

Solved

I'm using PostgreSQL 12 and have a partitioned table. This table has old partitions that need to be deleted. I've seen code where the old partitions are first detached and only then...
Anthology asked 15/4, 2022 at 8:38

5

For QA purposes, I need to be able to partition a drive into 30 or more partitions via a Bash script on both RHEL and SLES. I have attempted to do this in Bash with fdisk via a "here document...
Rhinoscopy asked 27/8, 2012 at 21:52

4

Solved

I am trying to write out a large partitioned dataset to disk with Spark and the partitionBy algorithm is struggling with both of the approaches I've tried. The partitions are heavily skewed - some ...
Liliuokalani asked 28/10, 2018 at 23:52

2

Solved

I have the following dataframe (df_parquet): DataFrame[id: bigint, date: timestamp, consumption: decimal(38,18)] I intend to get sorted lists of dates and consumptions using collect_list, just a...
Mcdevitt asked 29/7, 2019 at 14:22

4

I imagine partitioning a table by date (in particular for logs) is widely done, but I am not able to find a good answer to my problem. I want to create a table partitioned by week (the nu...
Tsarina asked 17/4, 2013 at 0:20

4

Solved

I want to upload large CSV files to a MongoDB cloud database from a Node.js server using Express, Mongoose, and Multer's GridFS storage engine, but when the file upload starts, my database becom...
Fessler asked 10/5, 2022 at 12:41

2

Is there a simple (i.e. non-hacky) and race-condition-free way to create a partitioned sequence in PostgreSQL? Example: using a normal sequence in Issue:
| Project_ID | Issue |
| 1 | 1 |
| 1 | 2 |...
Digenesis asked 28/8, 2010 at 15:26

5

I have a Weka model of around 400 MB stored in S3. Now I have a set of records on which I want to run the model and perform prediction. For performing prediction, what I have tried...
Justen asked 13/10, 2016 at 8:20

5

I am trying to create dynamic partitions in Hive using the following code:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
create external table if not exist...
Colpin asked 15/4, 2015 at 11:17

2

I am using Spark 2.3 and have written a DataFrame to create a Hive partitioned table using a DataFrameWriter method in PySpark:
newdf.coalesce(1).write.format('orc').partitionBy('veh_country'...
Equimolecular asked 19/11, 2018 at 10:47

2

Solved

I would like to know the best way to load a specific partition of a Delta table. Does option 2 load the whole table before filtering? Option 1:
df = spark.read.format("delta").option...
Communication asked 12/7, 2021 at 8:37

3

We have a Spring Boot project that uses Spring-JPA for data access. We have a couple of tables where we create/update rows once (or a few times, all within minutes). We don't update rows that are o...

3

Solved

I am trying to run this function in PostgreSQL:
CREATE OR REPLACE FUNCTION create_partition_and_insert() RETURNS trigger AS
$BODY$
DECLARE
  partition VARCHAR(25);
  _date text;
BEGIN
  EXECUTE 'SELECT ...
Herculaneum asked 13/10, 2015 at 15:32

2

Solved

(Note: updated with adopted answer below.) For a PostgreSQL 8.1 (or later) partitioned table, how does one define an UPDATE trigger and procedure to "move" a record from one partition to ...
Sino asked 25/11, 2009 at 16:35

2

Solved

Given a list of elements, I want all the possible ways to divide this list into any number of partitions such that each partition has at least x elements. The order of the partitions in th...
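The excerpt is truncated, so this sketch assumes "partition" means a set partition in which the order of the blocks does not matter. Rather than generating every partition and filtering, the first remaining element picks at least x - 1 companions for its block and the function recurses on what is left; a branch that leaves fewer than x elements yields nothing. `partitions_min_size` is a hypothetical name.

```python
from itertools import combinations
from typing import Iterator, List, Sequence


def partitions_min_size(items: Sequence, x: int) -> Iterator[List[list]]:
    """Yield every partition of `items` whose blocks all have at least x elements."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # The block containing `first` takes k companions from `rest`, k >= x - 1.
    for k in range(max(x - 1, 0), len(rest) + 1):
        for chosen in combinations(range(len(rest)), k):
            block = [first] + [rest[i] for i in chosen]
            remaining = [rest[i] for i in range(len(rest)) if i not in chosen]
            # If `remaining` is non-empty but shorter than x, this yields nothing.
            for tail in partitions_min_size(remaining, x):
                yield [block] + tail


for p in partitions_min_size([1, 2, 3, 4], 2):
    print(p)
```

Anchoring each block on its smallest remaining element guarantees that no partition is produced twice; with x = 1 the function enumerates all set partitions.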

8

Solved

I partitioned my 300 MB table and am trying to run a SELECT query on partition p0 with this command:
SELECT * FROM employees PARTITION (p0);
But I am getting the following error: ERROR 1064 (42000): ...
Lichtenfeld asked 1/1, 2013 at 16:53

© 2022 - 2024 — McMap. All rights reserved.