data-mining - McMap

6

Solved

UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI's DBSCAN implimentation to do my clustering rather than...

python scikit-learn cluster-analysis data-mining dbscan

Derick asked 5/5, 2013 at 5:4

6

Solved

Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1) [closed]

I have a data table ("norm") containing numeric - at least to what I can see - normalized values of the following form: When I am executing k <- kmeans(norm,center=3) I am receving t...

r machine-learning cluster-analysis data-mining k-means

Businesswoman asked 7/4, 2016 at 7:40

5

Solved

Finding the center of a cluster

I have the following problem - made abstract to bring out the key issues. I have 10 points each which is some distance from the other. I want to be able to find the center of the cluster i.e. t...

algorithm cluster-analysis data-mining

Fini asked 10/8, 2009 at 8:52

7

Solved

1D Number Array Clustering

So let's say I have an array like this: [1,1,2,3,10,11,13,67,71] Is there a convenient way to partition the array into something like this? [[1,1,2,3],[10,11,13],[67,71]] I looked through sim...

arrays cluster-analysis data-mining dimension partition-problem

Maidy asked 16/7, 2012 at 22:25

4

Solved

How to calculate the regularization parameter in linear regression

When we have a high degree linear polynomial that is used to fit a set of points in a linear regression setup, to prevent overfitting, we use regularization, and we include a lambda parameter in th...

machine-learning data-mining regression

Entomophagous asked 29/8, 2012 at 16:4

3

Solved

I want to calculate the similarity between two list of words, for example : ['email','user','this','email','address','customer'] is similar to this list: ['email','mail','address','netmail'] I ...

python data-mining text-mining similarity

Barkentine asked 14/3, 2019 at 12:34

2

Solved

How to fix Python at Object of type datetime is not JSON serializable error

I use data mining by twitter. So I get value create_at from twitter to save in file excel after that send file excel to google sheet but It can't to send it. It have error this : response = servi...

python json google-sheets twitter data-mining

Cosby asked 7/4, 2020 at 6:51

5

Solved

Implementation of k-means clustering algorithm

In my program, i'm taking k=2 for k-mean algorithm i.e i want only 2 clusters. I have implemented in a very simple and straightforward way, still i'm unable to understand why my program is getting ...

java algorithm data-mining cluster-analysis k-means

Eclosion asked 14/1, 2014 at 10:23

3

Historical weather data from NOAA

I am working on a data mining project and I would like to gather historical weather data. I am able to get historical data through the web interface that they provide at http://www.ncdc.noaa.gov/cd...

web-scraping data-mining weather-api

Mendacious asked 14/11, 2013 at 13:19

2

please use torch.load with map_location=torch.device('cpu')

I am running Python program, but I do not have a GPU, what can I do to make Python use CPU instead of GPU? $ python extract_feature.py --data mnist --net checkpoint_4.pth.tar --features pretrained ...

python deep-learning data-mining

Novick asked 25/1, 2021 at 10:15

9

Solved

scikit-learn: Predicting new points with DBSCAN

I am using DBSCAN to cluster some data using Scikit-Learn (Python 2.7): from sklearn.cluster import DBSCAN dbscan = DBSCAN(random_state=0) dbscan.fit(X) However, I found that there was no built-...

machine-learning scikit-learn cluster-analysis data-mining dbscan

Marry asked 7/1, 2015 at 15:27

2

Solved

dbscan - setting limit on maximum cluster span

By my understanding of DBSCAN, it's possible for you to specify an epsilon of, say, 100 meters and — because DBSCAN takes into account density-reachability and not direct density-reachability when ...

python algorithm cluster-analysis data-mining dbscan

Copp asked 31/8, 2013 at 10:29

2

Solved

pandas pivot table rename columns

How to rename columns with multiple levels after pandas pivot operation? Here's some code to generate test data: import pandas as pd df = pd.DataFrame({ 'c0': ['A','A','B','C'], 'c01': ['A','A1...

python pandas pivot pivot-table data-mining

Wilbourn asked 7/2, 2017 at 20:9

4

Solved

How does clustering (especially String clustering) work?

I heard about clustering to group similar data. I want to know how it works in the specific case for String. I have a table with more than different 100,000 words. I want to identify the same wo...

string cluster-analysis data-mining

Zashin asked 19/11, 2011 at 18:48

3

Data Mining Operation using SQL Query (Fuzzy Apriori Algorithm) - Coding it using SQL

So I have this Table: Trans_ID Name Fuzzy_Value Total_Item 100 I1 0.33333333 3 100 I2 0.33333333 3 100 I5 0.33333333 3 200 I2 0.5 2 200 I5 0.5 2 300 I2 0.5 2 300 I3 0.5 2 400 I1 0.33333333 ...

sql data-mining apriori

Adapter asked 3/12, 2010 at 20:45

3

Solved

How would one use Kernel Density Estimation as a 1D clustering method in scikit learn?

I need to cluster a simple univariate data set into a preset number of clusters. Technically it would be closer to binning or sorting the data since it is only 1D, but my boss is calling it cluster...

machine-learning scikit-learn cluster-analysis data-mining kernel-density

Telmatelo asked 29/1, 2016 at 21:35

3

How to get topic associated with each document using pyspark(2.1.0) LdA?

I am using LDAModel of pyspark to get topics from corpus. My goal is to find topics associated with each document. For that purpose I tried to set topicDistributionCol as per Docs. Since I am new t...

pyspark data-mining lda topic-modeling data-processing

Nils asked 31/1, 2017 at 13:9

5

Solved

Simplest feature selection algorithm

I am trying to create my own and simple feature selection algorithm. The data set that I am going to work with is here (very famous data set). Can someone give me a pointer on how to do so? I am p...

algorithm machine-learning data-mining semantic-analysis

Middleweight asked 7/3, 2011 at 17:10

3

Solved

sklearn Clustering: Fastest way to determine optimal number of cluster on large data sets

I use KMeans and the silhouette_score from sklearn in python to calculate my cluster, but on >10.000 samples with >1000 cluster calculating the silhouette_score is very slow. Is there a faster me...

python scikit-learn cluster-analysis data-mining bigdata

Emptyhanded asked 27/12, 2016 at 10:33

7

Solved

PCA For categorical features?

In my understanding, I thought PCA can be performed only for continuous features. But while trying to understand the difference between onehot encoding and label encoding came through a post in the...

python machine-learning scikit-learn data-mining

Overexert asked 24/11, 2016 at 22:11

2

Solved

Estimating/Choosing optimal Hyperparameters for DBSCAN

I need to find naturally occurring classes of nouns based on their distribution with different preposition (like agentive, instrumental, time, place etc.). I tried using k-means clustering but of l...

data-mining cluster-analysis dbscan

Mafia asked 24/2, 2013 at 9:29

6

Interactive Decision Tree Classifier

Can anyone recommend a decision tree classifier implementation, in either Python or Java, that can be used incrementally? All the implementations I've found require you to provide all the features...

machine-learning data-mining decision-tree

Lester asked 13/7, 2010 at 13:54

6

Solved

Why is the F-Measure a harmonic mean and not an arithmetic mean of the Precision and Recall measures?

When we calculate the F-Measure considering both Precision and Recall, we take the harmonic mean of the two measures instead of a simple arithmetic mean. What is the intuitive reason behind takin...

machine-learning classification data-mining

Kursh asked 14/10, 2014 at 8:22

7

Solved

How whether a string is randomly generated or plausibly an English word?

I have a corpus of text which contains some strings. In these strings, some are English words, some are random such as VmsVKmGMY6eQE4eMI, there are no limit on the number of characters in each stri...

java text data-mining text-mining

Wringer asked 11/2, 2014 at 23:21

2

How to use Weka for predicting results

I'm new to Weka and I'm confused with the tool. I have a data set about fruit prices and related attributes. I'm trying to predict the specific fruit price using the data set. Since I'm new to Weka...

dataset data-mining classification weka prediction

Extensity asked 17/11, 2012 at 17:11

data-mining Questions

Recommended topics

Hot tags