I wrote a thing
-
I wrote a utility in Go to get unique strings, like the uniq utility. The upside of mine is that you don't need to sort the input, it's faster, and it's cross-platform. It can read from stdin or from a file.
https://gitlab.com/hooksie1/goniq
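For anyone wondering how deduplicating without a sort can work: a common approach is to keep a set of every line seen so far and print a line only the first time it shows up. The sketch below illustrates that general technique only. It is an assumption, not goniq's actual source, and the names seenSet and firstTime are made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// seenSet remembers every line encountered so far.
// (Hypothetical helper type for this sketch, not taken from goniq.)
type seenSet map[string]struct{}

// firstTime reports whether line has not been seen before,
// and records it so later calls with the same line return false.
func (s seenSet) firstTime(line string) bool {
	if _, ok := s[line]; ok {
		return false
	}
	s[line] = struct{}{}
	return true
}

func main() {
	seen := make(seenSet)
	in := bufio.NewScanner(os.Stdin)
	out := bufio.NewWriter(os.Stdout)

	// Stream the input: each line is printed the first time it appears.
	for in.Scan() {
		if line := in.Text(); seen.firstTime(line) {
			fmt.Fprintln(out, line)
		}
	}
	out.Flush()

	if err := in.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
}
```

Unlike sort | uniq, this keeps lines in first-seen order rather than sorted order, and memory use grows with the number of distinct lines, since each one has to stay in the set.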
Here's a timed run with 2,799,264 words. It's a list of 466,544 words repeated 6 times.
time sort allwords.txt | uniq
sort allwords.txt 6.58s user 0.21s system 127% cpu 5.348 total
uniq 2.88s user 0.79s system 68% cpu 5.347 total

time goniq allwords.txt
goniq allwords.txt 1.96s user 0.81s system 114% cpu 2.428 total
But even with a sorted list it's still faster:
uniq allwordssorted.txt 2.90s user 0.73s system 99% cpu 3.651 total
goniq allwordssorted.txt 1.66s user 0.74s system 120% cpu 1.986 total
-
@stacksofplates Very interesting, thanks!