how to optimally count elements in a python list

Not enough credits to upvote
0
Not enough credits to downvote
By: Speedbird (SysAdmin), Created 8 years ago, Updated 7 years ago.
Hello,

This is almost the same question than here, except that I am asking about the most efficient solution for a sorted result.

I have a list (about 10 integers randomly between 0 and 12), for example:

the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]

I want to create a function that returns a list of tuples (item, counts) ordered by the first element, for example

output = [(4, 3), (5, 4), (6, 1), (7, 2)]

So far I have used:

def dupli(the_list):
return [(item, the_list.count(item)) for item in sorted(set(the_list))]


But I call this function almost a millon time and I need to make it as fast as I (python) can. Therefore my question: How to make this function less time comsuming? (what about memory?)

I have played around a bit, but nothing obvious came up:

from timeit import Timer as T
number=10000
setup = "the_list=[5, 7, 6, 5, 5, 4, 4, 7, 5, 4]"

stmt = "[(item, the_list.count(item)) for item in sorted(set(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[230]: 0.058799982070922852

stmt = "L = []; \nfor item in sorted(set(the_list)): \n L.append((item, the_list.count(item)))"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[233]: 0.065041065216064453

stmt = "[(item, the_list.count(item)) for item in set(sorted(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[236]: 0.098351955413818359
Tags

1 Responses

Not enough credits to upvote
6
Not enough credits to downvote
Accepted Answer
By: josh.bohde (Member), 8 years ago
Python 2.6

In [1]: from collections import defaultdict

In [3]: def dupli(the_list):
...: counts = defaultdict(int)
...: for item in the_list:
...: counts[item] += 1
...: return counts.items()
...:

In [4]: dupli([5, 7, 6, 5, 5, 4, 4, 7, 5, 4])
Out[4]: [(4, 3), (5, 4), (6, 1), (7, 2)]

In [14]: %timeit [(item, the_list.count(item)) for item in sorted(set(the_list))]
100000 loops, best of 3: 3.26 us per loop

In [15]: %timeit dupli(the_list)
100000 loops, best of 3: 3.14 us per loop


If you have 2.7, then the builtin Counter is probably the fastest,

>>> from collections import Counter
>>> Counter([5, 7, 6, 5, 5, 4, 4, 7, 5, 4]).items()
[(4, 3), (5, 4), (6, 1), (7, 2)]


To be honest though, optimizing for only 10 elements will be difficult. It would probably be more efficient to find a way to not run the code a million times.

You may post an answer by signing in or registering for an account Here

QA-Stack.com

Welcome to QA-Stack.com - QA-Stack is a Q&A open source web application written in the best programming language in the galaxy: Python and using the best web framework in the solar system: web2py, be sure to check forum.qa-stack.com for discussions about this site. Have a great stay.

QA-Stack source code is available thanks to the cool folks at bitbucket.org, browse the source, or follow updates to this projects by visiting the qa-stack page at https://bitbucket.org/speedbird/qastack.

Documentation

Documentation for QA-Stack can be found at the QA-Stack Google Docs Page.

Bugs

Submit your bugs in i-track, another web application made by the same author (we eat our own dog food): View currently submitted bugs on i-track.

Popular Tags