Finding the number of unique items in a column
Often, to check the contents of a tab-delimited file, we want to know how many unique things there are in a particular column. Below are instructions for checking this on the command line and with the Python package pandas.
If you want to check the number of unique things on the command line or in a shell script
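How many unique things are in column 1 of a file? In the commands below, replace input_file with your file name and 1 with the column number you want (cut counts columns from 1).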
# outputs a count of the unique things in a column
cut -f 1 input_file | sort | uniq | wc -l
# outputs each of the unique things and how many of each there are
cut -f 1 input_file | sort | uniq -c
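If pandas is not available, the same cut | sort | uniq -c idea can be sketched in plain Python with the standard-library csv module and collections.Counter. This is a minimal sketch, assuming a tab-delimited file with no header line; the helper name count_column is hypothetical, chosen only for illustration. Note that its column index is 0-based, while cut -f is 1-based.

```python
import csv
import io
from collections import Counter


def count_column(handle, column=0):
    """Count how often each value appears in one column of tab-delimited text.

    Hypothetical helper, roughly equivalent to `cut -f 1 file | sort | uniq -c`.
    `column` is 0-based, unlike cut's 1-based -f.
    """
    reader = csv.reader(handle, delimiter="\t")
    # csv.reader yields an empty list for a blank line; skip those
    return Counter(row[column] for row in reader if row)


# Small in-memory example standing in for a real file
data = io.StringIO("chr1\t10\t20\nchr2\t5\t9\nchr1\t30\t40\n")
counts = count_column(data)

print(len(counts))  # number of unique values, like `... | uniq | wc -l` → 2
for value, n in sorted(counts.items()):
    print(n, value)  # count and value, like `... | uniq -c` → "2 chr1", "1 chr2"
```

For a real file you would pass an open handle instead, e.g. `with open("input_file", newline="") as f: counts = count_column(f)`. Counter is used here because it does the counting in one pass, so no separate sort step is needed before counting.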
If you want to check the number of unique things with the Python package pandas
# My file's first line is the header chr, start, stop, name, score. I want to know how many unique chromosomes there are, or how many lines have each of the chromosomes.
# First open Python by typing python (if you are on fiji, you must also run module load python/2.7.3/pandas)
# outputs a count of the unique things in a column
import pandas
df = pandas.read_csv("allmu.bed", sep="\t")
print(df["chr"].nunique())
# outputs each of the unique things and how many of each there are
import pandas
df = pandas.read_csv("allmu.bed", sep="\t")
print(df["chr"].value_counts())
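To try both calls without a file, you can hand read_csv an in-memory string through io.StringIO. The table below is made up for illustration and stands in for allmu.bed. It also shows that value_counts() returns one entry per unique value, so its length matches nunique() (both skip missing values by default).

```python
import io

import pandas

# Made-up BED-like data with a header line, standing in for allmu.bed
text = (
    "chr\tstart\tstop\tname\tscore\n"
    "chr1\t100\t200\ta\t5\n"
    "chr2\t150\t250\tb\t3\n"
    "chr1\t300\t400\tc\t8\n"
)
df = pandas.read_csv(io.StringIO(text), sep="\t")

n_unique = df["chr"].nunique()     # number of distinct chromosomes
counts = df["chr"].value_counts()  # lines per chromosome, most common first
print(n_unique)
print(counts)

# value_counts() has one row per unique value, so the two answers agree
assert len(counts) == n_unique
```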
# If your file has no header line, supply the column names yourself with names=
import pandas
df = pandas.read_csv("allmu.bed", names=["chr", "start", "stop", "name", "score"], sep="\t")
print(df["chr"].value_counts().sort_index())  # lines per chromosome, sorted by chromosome name
print(df["chr"].nunique())
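One pitfall with names=: passing it makes pandas treat the file as having no header, so if the file does start with a chr start stop name score line, that line is read as data and "chr" is counted as a chromosome. Below is a minimal sketch on made-up in-memory data showing the difference, and showing header=0 together with names=, which pandas documents as the way to replace an existing header with your own column names.

```python
import io

import pandas

cols = ["chr", "start", "stop", "name", "score"]
# Made-up data whose first line is a header
text = "chr\tstart\tstop\tname\tscore\nchr1\t1\t2\ta\t0\nchr2\t3\t4\tb\t0\n"

# names= alone: the header line is kept as an ordinary data row
wrong = pandas.read_csv(io.StringIO(text), sep="\t", names=cols)
# header=0 plus names=: the first line is treated as a header and replaced by cols
right = pandas.read_csv(io.StringIO(text), sep="\t", names=cols, header=0)

print(wrong["chr"].nunique())  # → 3 ("chr", "chr1", "chr2")
print(right["chr"].nunique())  # → 2
```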