196.删除重复的电子邮箱
题目描述
表: Person
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| email | varchar |
+-------------+---------+
id 是该表的主键列(具有唯一值的列)。
该表的每一行包含一封电子邮件。电子邮件将不包含大写字母。
编写解决方案 删除 所有重复的电子邮件,只保留一个具有最小 id
的唯一电子邮件。
(对于 SQL 用户,请注意你应该编写一个 DELETE
语句而不是 SELECT
语句。)
(对于 Pandas 用户,请注意你应该直接修改 Person
表。)
运行脚本后,显示的答案是 Person
表。驱动程序将首先编译并运行您的代码片段,然后再显示 Person
表。Person
表的最终顺序 无关紧要 。
返回结果格式如下示例所示。
示例 1:
输入:
Person 表:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
输出:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
解释: john@example.com重复两次。我们保留最小的Id = 1。
SQL Schema
sql
DROP TABLE IF EXISTS Person;
Create table If Not Exists Person
(
Id int,
Email varchar(255)
);
insert into Person (id, email)
values ('1', 'john@example.com');
insert into Person (id, email)
values ('2', 'bob@example.com');
insert into Person (id, email)
values ('3', 'john@example.com');
答案
解法一:分组+连表+非等值筛选
sql
DELETE p
FROM person p
INNER JOIN
(SELECT MIN(id) AS `min_id`, email
FROM person
GROUP BY email
HAVING COUNT(email) > 1) t ON p.email = t.email AND p.id != t.min_id;
解法二:笛卡尔积,非等值筛选
sql
DELETE p1
FROM person p1
INNER JOIN person p2 ON p1.email = p2.email AND p1.id > p2.id;
解法三:子查询+排名窗口函数
sql
DELETE p
FROM person p
WHERE id IN (SELECT id
FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS `rn`
FROM person) t
WHERE t.rn > 1);