Removing Duplicates from Large Datasets