dask Questions
3
In the below, I want to capture "dask_client_log_msg" and other task-logs in one file and "dask_worker_log_msg" and other client-logs in a separate file. As obviously client wil...
Odontalgia asked 1/2, 2018 at 10:35
5
Solved
I have a dask dataframe created from a csv file, and len(daskdf) returns 18000, but when I run ddSample = daskdf.sample(2000) I get the error
ValueError: Cannot take a larger sample than population when...
3
Solved
I would like to see a progress bar in a Jupyter notebook while running a compute task with Dask. I'm counting all values of the id column from a large csv file (4GB+). Any ideas?
import dask.datafr...
Rhino asked 28/2, 2018 at 22:33
7
Performing .shape is giving me the following error.
AttributeError: 'DataFrame' object has no attribute 'shape'
How should I get the shape instead?
3
Solved
How do I add a new DataArray to an existing Dataset without overwriting the whole thing? The new DataArray shares some coordinates with the existing one, but also has new ones. In my current implem...
Leibniz asked 21/9, 2019 at 17:27
5
I am having issues accessing data inside a dictionary.
Sys: Macbook 2012
Python: Python 3.5.1 :: Continuum Analytics, Inc.
I am working with a dask.dataframe created from a csv.
Edit Ques...
Sophist asked 26/8, 2016 at 15:25
3
Solved
I use the following to create a local cluster from a Jupyter notebook :
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(n_workers=24)
c = Client(cluster)
Is it possible...
Tirpitz asked 7/2, 2020 at 14:48
3
I have a parquet file with 10 row groups:
In [30]: print(pyarrow.parquet.ParquetFile("/tmp/test2.parquet").num_row_groups)
10
But when I load it using Dask Dataframe, it is read into a single pa...
2
Solved
I am getting the error stated in the question title when trying to import dask.dataframe interface, even though import dask works.
My current version of dask is 2022.7.0. What might be the problem?...
Blomquist asked 24/5, 2023 at 11:09
3
I have recently begun looking at Dask for big data.
I have a question on efficiently applying operations in parallel.
Say I have some sales data like this:
customerKey productKey transactionKey ...
Azine asked 28/3, 2018 at 11:02
6
Solved
After some searching I failed to find a thorough comparison of fastparquet and pyarrow.
I found this blog post (a basic comparison of speeds).
and a github discussion that claims that files crea...
Electrolyte asked 16/7, 2018 at 12:00
2
I want to use Dask on Databricks. It should be possible (I cannot see why not). If I import it, one of two things happens, either I get an ImportError but when I install distributed to solve this D...
Desorb asked 4/6, 2019 at 12:53
3
Hello. All the examples that I have come across for using dask so far involve multiple csv files in a folder being read with dask's read_csv call.
If I am provided an xlsx file with multiple tab...
Kiruna asked 20/6, 2017 at 13:47
1
I'm fairly new to xarray and I'm currently trying to leverage it to subset some NetCDFs. I'm running this on a shared server and would like to know how best to limit the processing power used by xa...
Rickey asked 17/9, 2018 at 20:03
2
I am loading my pre-trained keras model and then trying to parallelize a large amount of input data using dask. Unfortunately, I'm running into some issues with this relating to how I'm creating my...
Festival asked 20/5, 2020 at 23:49
1
Solved
I was trying to implement a conjugate gradient algorithm using Dask (for didactic purposes) when I realized that the performance was far worse than that of a simple numpy implementation.
After a few exper...
4
Is there an equivalent package in R to Python's dask? Specifically for running Machine Learning algorithms on larger-than-memory data sets on a single machine.
Link to Python's Dask page:
https://...
2
I've successfully brought in one table using dask read_sql_table from an Oracle database. However, when I try to bring in another table I get this error KeyError: 'Only a column name can be used for...
3
Solved
How do I call unique on a dask DataFrame ?
I get the following error if I try to call it the same way as for a regular pandas dataframe:
In [27]: len(np.unique(ddf[['col1','col2']].values))
Attr...
2
Solved
[mapr@impetus-i0057 latest_code_deepak]$ dask-worker 172.26.32.37:8786
distributed.nanny - INFO - Start Nanny at: 'tcp://172.26.32.36:50930'
distributed.diskutils - WARNING - Found stale lock file ...
Abott asked 7/2, 2018 at 6:32
4
Solved
Action
Reading two csv (data.csv and label.csv) to a single dataframe.
df = dd.read_csv(data_files, delimiter=' ', header=None, names=['x', 'y', 'z', 'intensity', 'r', 'g', 'b'])
df_label = dd.rea...
6
Solved
I installed Dask using pip like this:
pip install dask
and when I try to do import dask.dataframe as dd I get the following error message:
>>> import dask.dataframe as dd
Traceback (mo...
Reedreedbird asked 3/1, 2017 at 22:38
3
The following code converts any kind of timestamp in a dataframe into a given format.
pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')
How can I do this with "DASK"...
4
Solved
Using dask distributed, I try to submit a function that is located in another file named worker.py.
In the workers I get the following error:
No module named 'worker'
However I'm unable to figure...
1
I am writing a df to a Parquet file using Dask:
df.to_parquet(file, compression='snappy', write_metadata_file=False,\
engine='pyarrow', index=None)
I need to present the contents of the file in a...
Submit asked 7/7, 2022 at 2:21