Deleting Duplicate Data in PostgreSQL · PostgreSQL Manual

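While helping a developer import table data today, I found duplicate rows that needed to be removed. The usual approach is to pick one row from each set of duplicates and delete the rest, using some value that uniquely identifies each row. In Oracle this is commonly done with rowid. PG has a comparable system column, ctid, which identifies a row's physical location in the table and can be used the same way. If the table was created with oid (possible only before PostgreSQL 12, which removed WITH OIDS), that works too when the data volume is small. And if the table has a unique sequence value, the job is easier still. Below is a test that deletes duplicates using ctid.

Test data

~~~
postgres=# create table test(id int,name varchar);
CREATE TABLE
postgres=# insert into test values (1,'kenyon');
INSERT 0 1
postgres=# insert into test values (1,'kenyon');
INSERT 0 1
postgres=# insert into test values (1,'kenyon');
INSERT 0 1
postgres=# insert into test values (2,'kenyon_test');
INSERT 0 1
postgres=# insert into test values (2,'kenyon_test');
INSERT 0 1
postgres=# insert into test values (3,'test');
INSERT 0 1
postgres=# insert into test values (5,'test');
INSERT 0 1
postgres=# insert into test values (5,'jackson');
INSERT 0 1
postgres=# select ctid,* from test;
 ctid  | id |    name
-------+----+-------------
 (0,1) |  1 | kenyon
 (0,2) |  1 | kenyon
 (0,3) |  1 | kenyon
 (0,4) |  2 | kenyon_test
 (0,5) |  2 | kenyon_test
 (0,6) |  3 | test
 (0,7) |  5 | test
 (0,8) |  5 | jackson
(8 rows)
~~~

Query the rows to keep, using min(ctid) or max(ctid) as the criterion

~~~
postgres=# select ctid,* from test where ctid in (select min(ctid) from test group by id);
 ctid  | id |    name
-------+----+-------------
 (0,1) |  1 | kenyon
 (0,4) |  2 | kenyon_test
 (0,6) |  3 | test
 (0,7) |  5 | test
(4 rows)
~~~

Delete the duplicates and check the final result

~~~
postgres=# delete from test where ctid not in (select min(ctid) from test group by id);
DELETE 4
postgres=# select ctid,* from test;
 ctid  | id |    name
-------+----+-------------
 (0,1) |  1 | kenyon
 (0,4) |  2 | kenyon_test
 (0,6) |  3 | test
 (0,7) |  5 | test
(4 rows)
~~~

Note that the subquery groups by id only, so rows sharing an id count as duplicates even when name differs. That is why (5,'jackson') was deleted along with the exact copies. To remove only fully identical rows, group by id, name instead.

If the table already has a unique sequence primary key, you can substitute that column for ctid in the statements above and delete directly.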
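
The closing remark about a unique sequence key can be sketched roughly as follows. This is only an illustration under stated assumptions: the table `test_seq` and its serial column `seq` are hypothetical names that do not appear in the original test. Here the grouping is on `(id, name)`, so only fully identical rows are treated as duplicates.

~~~sql
-- Hypothetical table: seq is a unique, automatically generated key
-- (assumed for illustration; the original test table has no such column).
create table test_seq(seq serial primary key, id int, name varchar);

insert into test_seq(id, name) values
    (1, 'kenyon'), (1, 'kenyon'),
    (2, 'kenyon_test'), (2, 'kenyon_test'),
    (5, 'test'), (5, 'jackson');

-- Keep the lowest seq in each (id, name) group and delete the rest.
delete from test_seq
where seq not in (select min(seq) from test_seq group by id, name);

-- One row remains per distinct (id, name) pair.
select * from test_seq order by seq;
~~~

Unlike ctid, which can change when a row is updated or the table is rewritten, a key column like this stays stable, so the same statement can safely be rerun later.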