Grouping a list of integers with nearest values

Go To StackoverFlow.com

5

I have a list:

d = [23,67,110,25,69,24,102,109]

how can i group nearest values with dynamic gap, and create a tuple like this, what is the fastest method? :

[(23,24,25),(67,69),(102,109,110)]
2012-04-04 18:05
by pylover
k-means clustering - Joel Cornett 2012-04-04 18:12
how do you define "nearest values"? In my opinion, 102 isn't close to 109 at all, and belongs in its own group. Do you have an objective way of determining grouping - Kevin 2012-04-04 18:13
I agree with Kevin. It's all very arbritrary--which is fine--you just have to define more specifically how you would like to split the numbers and also how you would NOT like them to be split - Joel Cornett 2012-04-04 18:14
the problem is here , i doesn't have an objective way, just i need to computer determines it, it can be a fuzzy logi - pylover 2012-04-04 18:15
@pylover Computers can only do exactly what you tell them to, if you can't describe the logic, you can't expect the computer to invent it for you - Andrew Clark 2012-04-04 18:17
thanks Joel, k-means seems to resolve my problem, this is a great idea & entry point to best practice. i have some experience in data clustering, but i was forgotten them completely.sorry for my eng - pylover 2012-04-04 18:19
I second Kevin, too. You'll have to consider your actual application. Also k-means still requires you to pick k. How will you do that? There's a rich literature on this subject, so if you want a simple answer, then you'll have to do something application specific - cape1232 2012-04-04 18:24


13

Like

d = [23,67,110,25,69,24,102,109]

d.sort()

diff = [y - x for x, y in zip(*[iter(d)] * 2)]
avg = sum(diff) / len(diff)

m = [[d[0]]]

for x in d[1:]:
    if x - m[-1][0] < avg:
        m[-1].append(x)
    else:
        m.append([x])


print m
## [[23, 24, 25], [67, 69], [102, 109, 110]]

Fist we calculate an average difference between sequential elements and then group together elements whose difference is less than average.

2012-04-04 18:18
by georg
thanks, this work - pylover 2012-04-04 18:24
@thg435: +1. This is really clever. However, if d = [1,2,4,5] then m becomes [[1], [2], [4], [5]] instead of [[1, 2], [4, 5]]. I think this can be fixed by changing diff to diff = [data[i+1]-data[i] for i in range(len(data)-1)] and the condition to x - m[-1][-1] < avg - unutbu 2012-04-04 19:32
can you do java or pseudocode for this - alexey polusov 2018-12-25 17:45
Ads