How to remove duplicates in sql


SqlServer, SqlServer Useful functions

Introduction


Certain times we notice duplicate in names/ AccountNames. Here is how to remove duplicates using CTE. Also there is a further query on how I reset the accountCodes after removing duplicates.

 

Description


1. Identify Duplicates Using CTE

WITH DuplicateAccounts AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY AccountName ORDER BY AccountId) AS rn
    FROM Accounts
)
SELECT * FROM DuplicateAccounts WHERE rn > 1;
Then Remove using 
WITH DuplicateAccounts AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY AccountName ORDER BY AccountId) AS rn
    FROM Accounts
)
DELETE FROM DuplicateAccounts
WHERE rn > 1;

----Resetting AccountCodes after removal of duplicates.

-- View before update

WITH OrderedAccounts AS (
    SELECT AccountId, AccountName,
           ROW_NUMBER() OVER (ORDER BY AccountName) + 1000 AS NewCode
    FROM Accounts
    WHERE AccountNature = 'Ledger'
)
SELECT * FROM OrderedAccounts;
 
-- update ---
 
WITH OrderedAccounts AS (
    SELECT AccountId,
           ROW_NUMBER() OVER (ORDER BY AccountName) + 1000 AS NewCode
    FROM Accounts
    WHERE AccountNature = 'Ledger'
)
UPDATE a
SET AccountId = CAST(o.NewCode AS VARCHAR(20))
FROM Accounts a
JOIN OrderedAccounts o ON a.AccountId = o.AccountId;