This might just take the form of a regex-style find and delete. But, for example, "the the" should always collapse to a single "the", whereas "that that" is sometimes legitimate English that I want to keep (e.g., "I think that that is a problem"). So I would want to review each repeated "that" but always remove duplicate "the"s, "and"s, etc.
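To make the idea concrete, here is a rough Python sketch of what I have in mind. The word list `ALWAYS_COLLAPSE` and the function name `dedupe` are made up for illustration, and the pattern only catches repeats separated by whitespace (not "the, the" with punctuation in between), so treat it as a starting point rather than a finished tool.

```python
import re

# Hypothetical list of words whose repeats are never intentional; adjust to taste.
ALWAYS_COLLAPSE = {"the", "and", "a", "an", "of"}

# A whole word followed one or more times by whitespace plus the same word.
# The backreference \1 is matched case-insensitively because of re.IGNORECASE.
DOUBLED = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)

def dedupe(text):
    """Collapse repeats of 'safe' words; report every other repeat for review.

    Returns (cleaned_text, review), where review is a list of
    (offset_in_original_text, matched_text) tuples.
    """
    review = []

    def fix(match):
        word = match.group(1)
        if word.lower() in ALWAYS_COLLAPSE:
            return word  # keep only the first occurrence
        # Anything else (e.g. "that that") is left untouched and flagged.
        review.append((match.start(), match.group(0)))
        return match.group(0)

    return DOUBLED.sub(fix, text), review

cleaned, review = dedupe("I think that that is a problem with the the cat and and dog.")
print(cleaned)  # I think that that is a problem with the cat and dog.
print(review)   # [(8, 'that that')]
```

Flagging everything outside the list, rather than keeping a second "review" list, means an unexpected repeat such as "had had" also shows up for manual review instead of being silently changed or ignored.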