Skip to content

196.删除重复的电子邮箱

Static BadgeStatic BadgeStatic Badge

题目描述

表: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id 是该表的主键列(具有唯一值的列)。
该表的每一行包含一封电子邮件。电子邮件将不包含大写字母。

编写解决方案 删除 所有重复的电子邮件,只保留一个具有最小 id 的唯一电子邮件。

(对于 SQL 用户,请注意你应该编写一个 DELETE 语句而不是 SELECT 语句。)

(对于 Pandas 用户,请注意你应该直接修改 Person 表。)

运行脚本后,显示的答案是 Person 表。驱动程序将首先编译并运行您的代码片段,然后再显示 Person 表。Person 表的最终顺序 无关紧要

返回结果格式如下示例所示。

示例 1:

输入: 
Person 表:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
输出: 
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+
解释: john@example.com重复两次。我们保留最小的Id = 1。

SQL Schema

sql
DROP TABLE IF EXISTS Person;
Create table If Not Exists Person
(
    Id    int,
    Email varchar(255)
);
insert into Person (id, email)
values ('1', 'john@example.com');
insert into Person (id, email)
values ('2', 'bob@example.com');
insert into Person (id, email)
values ('3', 'john@example.com');

答案

解法一:分组+连表+非等值筛选

sql
DELETE p
FROM person p
         INNER JOIN
     (SELECT MIN(id) AS `min_id`, email
      FROM person
      GROUP BY email
      HAVING COUNT(email) > 1) t ON p.email = t.email AND p.id != t.min_id;

解法二:笛卡尔积,非等值筛选

sql
DELETE p1
FROM person p1
         INNER JOIN person p2 ON p1.email = p2.email AND p1.id > p2.id;

解法三:子查询+排名窗口函数

sql
DELETE p
FROM person p
WHERE id IN (SELECT id
             FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS `rn`
                   FROM person) t
             WHERE t.rn > 1);