Monday, 8 July 2019

linux - Counting occurrences in first column of a file


We have this file:


1 2 
1 3
1 2
3 3
52 1
52 300

and 1000 more lines.


I want to count the number of times each value occurs in the first column.


1 3
3 1
52 2

This means we saw 1 three times.


How can I do that, in Perl, AWK or Bash?



Answer



If the input is sorted, you can use uniq:
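A minimal sketch of such a pipeline, assuming the columns are separated by single spaces. The helper name count_sorted and the inline sample data are illustrative only:

```shell
# Hypothetical helper: count first-column values in input that is already sorted.
# Assumes the columns are separated by single spaces.
count_sorted() {
  # cut keeps only the first field; uniq -c collapses each run of
  # identical lines and prefixes it with the number of occurrences.
  cut -d' ' -f1 | uniq -c
}

# Sample data from the question, fed on standard input
printf '1 2\n1 3\n1 2\n3 3\n52 1\n52 300\n' | count_sorted
```

uniq only merges adjacent duplicates, which is why the first column has to be sorted (or at least grouped) beforehand.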



If not, sort it first:
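One way to write this, again assuming single-space separators. The name count_unsorted and the shuffled sample input are made up for illustration:

```shell
# Hypothetical helper: count first-column values in unsorted input.
# Assumes the columns are separated by single spaces.
count_unsorted() {
  # sort -n orders the extracted values numerically so that equal
  # values become adjacent, which is what uniq -c needs.
  cut -d' ' -f1 | sort -n | uniq -c
}

# The question's sample rows in shuffled order
printf '52 1\n1 2\n3 3\n1 3\n52 300\n1 2\n' | count_unsorted
```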



Output:


3 1
1 3
2 52

The columns are swapped compared to your requirement; pipe the result through awk '{ print $2, $1 }' to swap them back:


1 3 
3 1
52 2
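Putting the steps above together, the whole pipeline might look like the following sketch. The function name count_first_column is hypothetical, and single-space separators are assumed:

```shell
# Hypothetical helper: print "value count" for the first column of the input.
# Assumes the columns are separated by single spaces.
count_first_column() {
  cut -d' ' -f1 |              # keep only the first column
    sort -n |                  # bring equal values together
    uniq -c |                  # count each run: "count value"
    awk '{ print $2, $1 }'     # swap to "value count"
}

# Sample data from the question
printf '1 2\n1 3\n1 2\n3 3\n52 1\n52 300\n' | count_first_column
```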

There's also the awk idiom, which does not require sorted input:


awk '{h[$1]++}; END { for(k in h) print k, h[k] }'

Output:


1 3
52 2
3 1

Because the output comes from a hash, its order is not guaranteed; pipe it to sort -n if you need it sorted:


awk '{h[$1]++} END { for(k in h) print k, h[k] }' | sort -n

If you're using GNU awk, you can do the sorting from within awk:


awk '{h[$1]++} END { n = asorti(h, d, "@ind_num_asc"); for(i=1; i<=n; i++) print d[i], h[d[i]] }'

In the last two cases the output is:


1 3
3 1
52 2
