
[Deep Learning Paper Review + Code] PointCutMix: Regularization Strategy for Point Cloud Classification (Neurocomputing 2022)

๋ณต๋งŒ 2022. 9. 14. 17:24

This paper applies CutMix augmentation to point cloud data. It finds a one-to-one correspondence between two point clouds and, based on that correspondence, proposes two ways of mixing the two samples.

 

Paper: https://arxiv.org/pdf/2101.01461.pdf

Code: https://github.com/cuge1995/PointCutMix

 

 

 

Introduction

Mixed sample data augmentation (MSDA) has been widely used for image data; representative examples are MixUp (Zhang et al., 2018) and CutMix (Yun et al., 2019).

 

Yun, Sangdoo, et al. "CutMix: Regularization strategy to train strong classifiers with localizable features." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

 

This paper proposes PointCutMix, which performs CutMix on point cloud data.

 

 

Related Work

  • Commonly used augmentations for point cloud data include random rotation, jittering, and scaling (Qi et al., 2017a;b).
  • Beyond these, methods such as PointAugment (Li et al., 2020) and PointMixUp (Chen et al., 2020) have been proposed.
    • PointAugment jointly trains an augmenter and the classifier network via adversarial training, which makes training difficult.
    • PointMixUp, as its name suggests, performs MixUp augmentation on point cloud data, which can destroy local features.

 

 

Method

1. Optimal assignment of point clouds

 

To perform MSDA, a one-to-one correspondence must be established between the minimal units of the two samples.

 

The minimal unit is a pixel for images and a single point for point clouds.

 

์ด๋ฏธ์ง€์˜ ๊ฒฝ์šฐ resize, crop ๋“ฑ์„ ์ด์šฉํ•˜๋ฉด ๋‘ sample ๊ฐ„์˜ ํ”ฝ์…€ ๊ฐ„ ์ผ๋Œ€์ผ ๋Œ€์‘์„ ๊ฐ„๋‹จํžˆ ์ •์˜ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ํฌ์ธํŠธํด๋ผ์šฐ๋“œ๋Š” ํ•ด๋‹น ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†๋‹ค.

 

Following PointMixUp (Chen et al., 2020) and MSN (Liu et al., 2020a), this paper defines the optimal assignment $\phi^*$ for the Earth Mover's Distance (EMD) as follows.

 

$\phi^*=\arg\min_{\phi\in\Phi}\sum_i||x_{1,i}-x_{2,\phi(i)}||_2$

 

This amounts to finding the one-to-one correspondence that minimizes the total distance between matched points.

 

Using $\phi^*$, the EMD is defined as follows.

 

$EMD=\frac{1}{N}\sum_i||x_{1,i}-x_{2,\phi^*(i)}||_2$
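The two definitions above can be checked with a tiny CPU reference. The sketch below (function name is illustrative) finds $\phi^*$ by brute-force search over all permutations, which is only feasible for very small point sets; the official code instead uses an auction-based solver on the GPU.

```python
from itertools import permutations
import numpy as np

def emd_assignment(x1, x2):
    """x1, x2: (N, 3) arrays with small N. Returns (phi, emd) where
    phi[i] is the index in x2 matched to x1[i] and emd is the mean
    distance between matched points."""
    n = len(x1)
    # pairwise Euclidean distances, shape (N, N)
    cost = np.linalg.norm(x1[:, None, :] - x2[None, :, :], axis=-1)
    best_phi, best_total = None, np.inf
    for phi in permutations(range(n)):  # all one-to-one matchings
        total = cost[np.arange(n), phi].sum()
        if total < best_total:
            best_phi, best_total = np.array(phi), total
    return best_phi, best_total / n  # EMD = (1/N) * minimal total distance
```

For example, if `x2` is just a shuffled copy of `x1`, the optimal assignment recovers the shuffle and the EMD is zero.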

 

2. Mixing algorithm

 

The goal of PointCutMix is, given two different point cloud samples $(x_1, y_1)$ and $(x_2, y_2)$, to generate a new sample $(\tilde x, \tilde y)$. Here $x$ denotes a point cloud and $y$ a classification label.

 

  • $\tilde x = B\cdot x_1 + (I_N-B)\cdot \tilde x_2$
  • $\tilde y = \lambda y_1 + (1-\lambda)y_2$

 

Here $B$ is a diagonal matrix with diagonal entries $b_1, b_2, \dots, b_N$, where each $b_i$ is a 0/1 value deciding whether the $i$-th point is sampled, and $\tilde x_2$ denotes $x_2$ reordered by the optimal assignment $\phi^*$.
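Since $B$ is diagonal with 0/1 entries, multiplying by $B$ is equivalent to a boolean mask over points. A minimal sketch of the mixing rule (function name illustrative; the second argument is assumed to be $x_2$ already reordered by $\phi^*$):

```python
import numpy as np

def mix_point_clouds(x1, x2_aligned, keep_idx):
    """x1, x2_aligned: (N, 3) arrays with corresponding indices.
    keep_idx: indices i with b_i = 1 (points kept from x1).
    Returns (x_mixed, lam) where lam is the fraction taken from x1."""
    b = np.zeros(len(x1), dtype=bool)
    b[keep_idx] = True
    x_mixed = np.where(b[:, None], x1, x2_aligned)  # B*x1 + (I-B)*x2
    lam = b.mean()  # weight of y1 in the mixed label
    return x_mixed, lam
```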

 

There are two ways to choose which points to sample:

 

1. PointCutMix-R: randomly sample $n$ points from $x_1$.

2. PointCutMix-K: choose one central point and select the $n-1$ points around it via nearest-neighbor search.
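The PointCutMix-K sampling step can be sketched as below (function name and seeding are illustrative, not from the paper): pick one random central point, then take it together with its $n-1$ nearest neighbors so the replaced region forms a contiguous local patch.

```python
import numpy as np

def knn_patch_indices(x, n, seed=None):
    """x: (N, 3) point cloud. Returns n indices: a random center plus
    its n-1 nearest neighbors by Euclidean distance."""
    rng = np.random.default_rng(seed)
    center = x[rng.integers(len(x))]
    d = np.linalg.norm(x - center, axis=1)  # the center itself has d = 0
    return np.argsort(d)[:n]
```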

 

 

$\lambda$ is the sampling ratio and is drawn from a $Beta(\beta, \beta)$ distribution. The larger $\beta$ is, the narrower the distribution, which means a similar sampling ratio is used in every trial.
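This narrowing effect is easy to check numerically: $Beta(\beta, \beta)$ has mean $0.5$ and variance $1/(4(2\beta+1))$, so larger $\beta$ concentrates $\lambda$ around $0.5$. A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
stds = []
for beta in (0.2, 1.0, 10.0):
    lam = rng.beta(beta, beta, size=100_000)  # sampling ratios
    stds.append(lam.std())
    print(f"beta={beta:>4}: std of lambda = {lam.std():.3f}")
```

The printed standard deviations decrease monotonically as $\beta$ grows.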

 

 

Results for different values of the sampling ratio $\lambda$ are shown below.

 

 

 

Result

Comparison on different $\rho$ and $\beta$ values

 

The authors experimented with different values of $\rho$, the probability of applying PointCutMix augmentation, and $\beta$, which controls the spread of the sampling ratio.

 

For $\rho$, performance always improved over $\rho=0$ (no augmentation) and was largely insensitive to the exact value, except for PointNet, whose performance dropped sharply as $\rho$ increased. Since PointNet's architecture cannot capture local features, the authors concluded that PointCutMix is ineffective for PointNet and excluded it from subsequent experiments.

 

 

For $\beta$ as well, the performance differences across values are very small, although RS-CNN clearly prefers very small $\beta$ values. The smaller $\beta$ is, the more $\lambda$ concentrates near 0 or 1, so most mixed samples are dominated by one of the two point clouds.

 

 

Classification Result

 

The authors set $\rho=1.0$ and $\beta=1.0$.

 

 

Compared with the baseline, PointMixUp, PointAugment, and others, it achieved the highest performance in almost every setting.

 

 

Code (PyTorch Implementation)

The code below is a lightly modified version of the official implementation published on GitHub.

 

1. Optimal assignment of point clouds

 

This is the function that computes the one-to-one correspondence between two batches of point clouds. It also supports gradient backpropagation, so it can be optimized jointly with the rest of the network.

# code from https://github.com/cuge1995/PointCutMix/blob/main/emd/emd_module.py
import torch
import torch.nn as nn
from torch.autograd import Function
import emd  # compiled CUDA extension shipped with the official repository

class emdFunction(Function):
    @staticmethod
    def forward(ctx, xyz1, xyz2, eps, iters):

        batchsize, n, _ = xyz1.size()
        _, m, _ = xyz2.size()

        assert(n == m)
        assert(xyz1.size()[0] == xyz2.size()[0])
        assert(n % 1024 == 0)
        assert(batchsize <= 512)

        xyz1 = xyz1.contiguous().float().cuda()
        xyz2 = xyz2.contiguous().float().cuda()
        # working buffers for the auction-based assignment solver
        dist = torch.zeros(batchsize, n, device='cuda').contiguous()
        assignment = torch.zeros(batchsize, n, device='cuda', dtype=torch.int32).contiguous() - 1
        assignment_inv = torch.zeros(batchsize, m, device='cuda', dtype=torch.int32).contiguous() - 1
        price = torch.zeros(batchsize, m, device='cuda').contiguous()
        bid = torch.zeros(batchsize, n, device='cuda', dtype=torch.int32).contiguous()
        bid_increments = torch.zeros(batchsize, n, device='cuda').contiguous()
        max_increments = torch.zeros(batchsize, m, device='cuda').contiguous()
        unass_idx = torch.zeros(batchsize * n, device='cuda', dtype=torch.int32).contiguous()
        max_idx = torch.zeros(batchsize * m, device='cuda', dtype=torch.int32).contiguous()
        unass_cnt = torch.zeros(512, dtype=torch.int32, device='cuda').contiguous()
        unass_cnt_sum = torch.zeros(512, dtype=torch.int32, device='cuda').contiguous()
        cnt_tmp = torch.zeros(512, dtype=torch.int32, device='cuda').contiguous()

        emd.forward(xyz1, xyz2, dist, assignment, price, assignment_inv, bid, bid_increments, max_increments, unass_idx, unass_cnt, unass_cnt_sum, cnt_tmp, max_idx, eps, iters)

        ctx.save_for_backward(xyz1, xyz2, assignment)
        return dist, assignment

    @staticmethod
    def backward(ctx, graddist, gradidx):
        xyz1, xyz2, assignment = ctx.saved_tensors
        graddist = graddist.contiguous()

        gradxyz1 = torch.zeros(xyz1.size(), device='cuda').contiguous()
        gradxyz2 = torch.zeros(xyz2.size(), device='cuda').contiguous()

        emd.backward(xyz1, xyz2, gradxyz1, graddist, assignment)
        return gradxyz1, gradxyz2, None, None

class emdModule(nn.Module):
    def __init__(self):
        super(emdModule, self).__init__()
	
    def forward(self, input1, input2, eps, iters):
        return emdFunction.apply(input1, input2, eps, iters)

 

 

 

2. Mixing algorithm (PointCutMix-R)

 

This is the code for PointCutMix-R. It randomly shuffles the order of the batch, uses the emdModule defined above to compute the one-to-one correspondence with the original data, and then mixes each pair of samples at a given ratio.

# code modified from https://github.com/cuge1995/PointCutMix/blob/main/train_pointcutmix_r.py#L252
import numpy as np
import torch

for points, label in train_loader:
    target = label
    r = np.random.rand(1)
    if r < cutmix_prob: #apply PointCutMix with probability rho
        lam = np.random.beta(beta, beta) #sampling ratio

        B = points.size(0)
        rand_index = torch.randperm(B) #shuffled index

        target_a = target
        target_b = target[rand_index] #shuffled label

        point_a = points #[B, num_points, num_features]
        point_b = points[rand_index] #shuffled points
        point_c = points[rand_index]

        remd = emdModule() #optimal assignment of point clouds
        _, ind = remd(point_a, point_b, 0.005, 300) #assignment function [B, num_points]

        for ass in range(B):
            point_c[ass] = point_c[ass][ind[ass].long(), :] #reorder points so index i of point_c corresponds to point_a[ass][i]

        int_lam = max(int(num_points * lam), 1) #number of points to sample
        gamma = np.random.choice(num_points, int_lam, replace=False, p=None) #points to sample (Random sampling)

        #cutmix
        for i2 in range(B):
            points[i2, gamma] = point_c[i2, gamma]
            
        #adjust lambda to exactly match point ratio
        lam = int_lam * 1.0 / num_points
        
        #prediction and loss: lam is the fraction of points taken from the shuffled batch
        pred = model(points)
        loss = loss_f(pred, target_a.long()) * (1. - lam) + loss_f(pred, target_b.long()) * lam
        
    else:
        pred = model(points)
        loss = loss_f(pred, target.long())
๋ฐ˜์‘ํ˜•