data-mining Questions

6

Solved

UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI's DBSCAN implementation to do my clustering rather than...

6

Solved

I have a data table ("norm") containing normalized values, all numeric as far as I can see, of the following form: When I execute k <- kmeans(norm,center=3) I am receiving t...
Businesswoman asked 7/4, 2016 at 7:40

5

Solved

I have the following problem, made abstract to bring out the key issues. I have 10 points, each of which is some distance from the others. I want to find the center of the cluster, i.e. t...
Fini asked 10/8, 2009 at 8:52

7

Solved

So let's say I have an array like this: [1,1,2,3,10,11,13,67,71] Is there a convenient way to partition the array into something like this? [[1,1,2,3],[10,11,13],[67,71]] I looked through sim...
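One way to read the partition asked for above is "start a new group whenever the jump to the next value is large". The sketch below takes that reading; the helper name `split_on_gaps` and the threshold `max_gap=5` are illustrative assumptions, not part of the question, and a different threshold gives different groups.

```python
def split_on_gaps(values, max_gap):
    """Group numbers so that neighbours within a group differ by at most max_gap.

    max_gap is a hypothetical threshold chosen by the caller.
    """
    groups = []
    for v in sorted(values):
        # Extend the current group if v is close enough to its last element,
        # otherwise open a new group.
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


data = [1, 1, 2, 3, 10, 11, 13, 67, 71]
print(split_on_gaps(data, max_gap=5))
```

With this threshold the sample array splits into the three groups shown in the question. For data without an obvious gap size, a 1D clustering method would be the more general choice.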

4

Solved

When we have a high degree linear polynomial that is used to fit a set of points in a linear regression setup, to prevent overfitting, we use regularization, and we include a lambda parameter in th...
Entomophagous asked 29/8, 2012 at 16:4

3

Solved

I want to calculate the similarity between two lists of words. For example: ['email','user','this','email','address','customer'] is similar to this list: ['email','mail','address','netmail'] I ...
Barkentine asked 14/3, 2019 at 12:34

2

Solved

I am mining data from Twitter. I take the create_at value from Twitter and save it to an Excel file, then send that file to Google Sheets, but the upload fails. It gives this error: response = servi...
Cosby asked 7/4, 2020 at 6:51

5

Solved

In my program, I'm taking k=2 for the k-means algorithm, i.e. I want only 2 clusters. I have implemented it in a very simple and straightforward way, but I'm still unable to understand why my program is getting ...
Eclosion asked 14/1, 2014 at 10:23

3

I am working on a data mining project and I would like to gather historical weather data. I am able to get historical data through the web interface that they provide at http://www.ncdc.noaa.gov/cd...
Mendacious asked 14/11, 2013 at 13:19

2

I am running a Python program, but I do not have a GPU. What can I do to make Python use the CPU instead of the GPU? $ python extract_feature.py --data mnist --net checkpoint_4.pth.tar --features pretrained ...
Novick asked 25/1, 2021 at 10:15

9

Solved

I am using DBSCAN to cluster some data using Scikit-Learn (Python 2.7): from sklearn.cluster import DBSCAN dbscan = DBSCAN(random_state=0) dbscan.fit(X) However, I found that there was no built-...

2

Solved

By my understanding of DBSCAN, it's possible for you to specify an epsilon of, say, 100 meters and — because DBSCAN takes into account density-reachability and not direct density-reachability when ...
Copp asked 31/8, 2013 at 10:29

2

Solved

How to rename columns with multiple levels after pandas pivot operation? Here's some code to generate test data: import pandas as pd df = pd.DataFrame({ 'c0': ['A','A','B','C'], 'c01': ['A','A1...
Wilbourn asked 7/2, 2017 at 20:9

4

Solved

I heard about clustering to group similar data. I want to know how it works in the specific case of strings. I have a table with more than 100,000 different words. I want to identify the same wo...
Zashin asked 19/11, 2011 at 18:48

3

So I have this table:
Trans_ID  Name  Fuzzy_Value  Total_Item
100       I1    0.33333333   3
100       I2    0.33333333   3
100       I5    0.33333333   3
200       I2    0.5          2
200       I5    0.5          2
300       I2    0.5          2
300       I3    0.5          2
400       I1    0.33333333   ...
Adapter asked 3/12, 2010 at 20:45

3

Solved

I need to cluster a simple univariate data set into a preset number of clusters. Technically it would be closer to binning or sorting the data since it is only 1D, but my boss is calling it cluster...

3

I am using LDAModel from pyspark to get topics from a corpus. My goal is to find the topics associated with each document. For that purpose I tried to set topicDistributionCol as per the docs. Since I am new t...

5

Solved

I am trying to create my own simple feature selection algorithm. The data set that I am going to work with is here (very famous data set). Can someone give me a pointer on how to do so? I am p...
Middleweight asked 7/3, 2011 at 17:10

3

Solved

I use KMeans and the silhouette_score from sklearn in Python to evaluate my clusters, but with more than 10,000 samples and more than 1,000 clusters, calculating the silhouette_score is very slow. Is there a faster me...
Emptyhanded asked 27/12, 2016 at 10:33

7

Solved

My understanding was that PCA can be performed only on continuous features. But while trying to understand the difference between one-hot encoding and label encoding, I came across a post in the...
Overexert asked 24/11, 2016 at 22:11

2

Solved

I need to find naturally occurring classes of nouns based on their distribution with different prepositions (like agentive, instrumental, time, place etc.). I tried using k-means clustering but of l...
Mafia asked 24/2, 2013 at 9:29

6

Can anyone recommend a decision tree classifier implementation, in either Python or Java, that can be used incrementally? All the implementations I've found require you to provide all the features...
Lester asked 13/7, 2010 at 13:54

6

Solved

When we calculate the F-Measure considering both Precision and Recall, we take the harmonic mean of the two measures instead of a simple arithmetic mean. What is the intuitive reason behind takin...
Kursh asked 14/10, 2014 at 8:22
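A small numeric comparison may help frame the question above. The sketch below uses the standard F1 definition (harmonic mean of precision and recall); the precision and recall values are made up for illustration and do not come from the question.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision, recall):
    """Plain average, shown only for comparison."""
    return (precision + recall) / 2


# Hypothetical classifier: everything it flags is correct (precision 1.0),
# but it finds only 1% of the true positives (recall 0.01).
p, r = 1.0, 0.01
print("arithmetic mean:", arithmetic_mean(p, r))
print("harmonic mean (F1):", f1_score(p, r))
```

The arithmetic mean stays above 0.5 for this lopsided classifier, while the harmonic mean stays close to the smaller of the two values, so a high F1 requires both precision and recall to be high.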

7

Solved

I have a corpus of text which contains some strings. Some of these strings are English words and some are random, such as VmsVKmGMY6eQE4eMI. There is no limit on the number of characters in each stri...
Wringer asked 11/2, 2014 at 23:21

2

I'm new to Weka and I'm confused by the tool. I have a data set about fruit prices and related attributes. I'm trying to predict a specific fruit's price using the data set. Since I'm new to Weka...
Extensity asked 17/11, 2012 at 17:11
