How to cluster syllable types with Python?

This is my second question on Stack Overflow. I don't have much experience with Python, but I had excellent results with my first question and was able to implement the code from the answer, so I will try again with this new problem:

I am trying to classify the syllable types in a canary song, in order to use each type as a template to find and classify large sets of data with similar behavior. I use the envelope of the song. My data is a sampled array of time and amplitude (a plot of the data is posted at http://ceciliajarne.web.unq.edu.ar/envelope-problem/ ). I tried to use the singular value decomposition algorithm from NumPy:

from numpy import linalg
U, s, V = linalg.svd(A)  # SVD decomposition of A (NumPy returns V already transposed)

I'm not sure how to build a meaningful matrix A from the time series in order to follow this approach. How should I cut the time series to obtain a matrix to analyze?
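One possible way to build A is sketched below. It rests on assumptions that are not in the question: syllables are taken to be separated by stretches where the envelope drops below a threshold, and every segment is resampled to a common length so that the segments can be stacked as rows. The helpers `segment_syllables` and `build_matrix`, the threshold of 0.1, the length of 50 points, and the synthetic envelope are all hypothetical.

```python
import numpy as np

def segment_syllables(envelope, threshold):
    """Return (start, stop) index pairs of runs where envelope > threshold."""
    above = np.asarray(envelope) > threshold
    # Pad with False so that runs touching either end are still detected.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate between run starts and (exclusive) run stops.
    return list(zip(edges[0::2], edges[1::2]))

def build_matrix(envelope, segments, n_points=50):
    """Resample every segment to n_points samples and stack them as rows."""
    rows = []
    new_x = np.linspace(0.0, 1.0, n_points)
    for start, stop in segments:
        piece = np.asarray(envelope[start:stop], dtype=float)
        old_x = np.linspace(0.0, 1.0, len(piece))
        rows.append(np.interp(new_x, old_x, piece))
    return np.vstack(rows)

def bump(n):
    """Synthetic stand-in for one syllable: half a sine period, n samples long."""
    return np.sin(np.linspace(0.0, np.pi, n))

# Made-up envelope: short and long bumps separated by silence.
silence = np.zeros(20)
envelope = np.concatenate([silence, bump(40), silence, bump(80),
                           silence, bump(40), silence, bump(80), silence])

segments = segment_syllables(envelope, threshold=0.1)
durations = [stop - start for start, stop in segments]

A = build_matrix(envelope, segments)      # one row per syllable
A_centered = A - A.mean(axis=0)           # remove the mean shape before the SVD
U, s, Vh = np.linalg.svd(A_centered, full_matrices=False)

print(A.shape, durations)
```

Resampling to a common length discards the duration of each syllable, which the list further down names as a distinguishing property, so it may be worth keeping `durations` as a separate feature.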

I thought of a possible second approach: hierarchical clustering. It may be a better solution, but I don't know what clustering criterion to use. What I know is that:

  • There are around 10 different syllable types.
  • The distance between the minima and the relative maxima differs for each type.
  • The length of each syllable also differs.
  • Similar syllables have similar frequency behavior.

What information can I feed to the scipy.cluster.hierarchy functions? I want to group the common syllable types into clusters.
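A minimal sketch of how such features could be passed to `scipy.cluster.hierarchy`, assuming each syllable has already been cut out and summarised by two numbers taken from the list above (its length and the distance between a minimum and the following maximum). The feature values are made up for illustration, and Ward linkage and the cut into 3 clusters are only one possible choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical features, one row per syllable:
# [length in samples, samples between a minimum and the next maximum]
features = np.array([
    [36.0, 18.0],
    [38.0, 19.0],
    [74.0, 37.0],
    [76.0, 36.0],
    [120.0, 20.0],
    [118.0, 22.0],
])

# Standardise the columns so that no single feature dominates the distance.
scaled = (features - features.mean(axis=0)) / features.std(axis=0)

# Agglomerative clustering with Ward linkage on Euclidean distances.
Z = linkage(scaled, method="ward")

# Cut the tree into at most 3 clusters (the real data would need about 10).
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

If the number of types is not known in advance, `fcluster` also accepts `criterion="distance"`, which cuts the tree at a chosen height instead of at a fixed number of clusters.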

I was inspired by: Unsupervised clustering with unknown number of clusters

But now I don't know how to implement a first test... Any idea would be very useful; this is my first time working with patterns and time series.

Vesicant answered 28/10, 2015 at 14:12 Comment(5)
SVD is typically used for dimensionality reduction rather than finding clusters per se, although reducing the dimensionality of your data can be a useful preprocessing step for many clustering methods. It would be helpful if you could tell us something about the format of your data. Mllly
Thank you very much. My data is a 1D array representing the sampled amplitude of the sound envelope. There is a link with one example of what it looks like when you plot the data. Vesicant
Presumably you have multiple of those 1D arrays. Do you consider each one of those to be a single syllable, or does each array consist of a sequence of syllables? Mllly
I have to cut out the single syllables. There are around 15 different types, but I have to extract them from the sound file, and they have different sizes, so I don't know how to choose a criterion to chop them. Vesicant
I found a way to compare the syllables using the correlation function from SciPy. Vesicant
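
The comparison mentioned in the last comment might look like the sketch below. The comment does not say which function was used, so `scipy.signal.correlate` is an assumption here, and `similarity` is a hypothetical helper. It returns the peak of the normalised cross-correlation, which is at most 1 and reaches 1 when two envelopes have the same shape up to an amplitude factor. The three test signals are synthetic.

```python
import numpy as np
from scipy.signal import correlate

def similarity(a, b):
    """Peak of the normalised cross-correlation of two envelopes (at most 1.0)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove the mean so that a constant offset does not inflate the score.
    a = a - a.mean()
    b = b - b.mean()
    xcorr = correlate(a, b, mode="full")  # correlation at every relative lag
    # Note: a perfectly flat signal has zero norm and would divide by zero.
    return xcorr.max() / (np.linalg.norm(a) * np.linalg.norm(b))

x = np.linspace(0.0, np.pi, 50)
template = np.sin(x)                   # a single smooth bump
same_type = 0.8 * np.sin(x)            # same shape, lower amplitude
other_type = np.abs(np.sin(3.0 * x))   # three bumps in the same time span

sim_same = similarity(template, same_type)
sim_other = similarity(template, other_type)
print(sim_same, sim_other)
```

One minus this score could serve as a pairwise distance between syllables, for example as input to a hierarchical clustering routine, although whether that separates the real syllable types well is untested here.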
