PCT: Point Cloud Transformer paper review

https://arxiv.org/pdf/2012.09688.pdf

Computational Visual Media, 2020

The input embedding is built with farthest point sampling and nearest neighbor search (the same as Point Transformer).

Following the success of the Transformer in NLP, this paper applies it to point clouds. To overcome the fundamental differences from NLP data, the authors make the following adjustments.

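1. Point cloud data has no fixed order. -> The raw positional encoding and the input embedding are merged into a single coordinate-based input embedding. Since every point has unique coordinates, this is enough to give each point a distinguishable feature.

2. An offset-attention module, a modified version of self-attention, is introduced.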

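Offset-attention replaces the plain self-attention output with the offset between the input of the self-attention module and the attention feature. (Explained in Sec 4.)

The advantages of offset-attention are as follows.

First, because it works with relative offsets rather than absolute coordinates, it is robust to rigid transformations.

Second, the Laplacian matrix is known to be very effective in the graph domain. The authors treat the point cloud as a graph and use the attention map as its adjacency matrix.

Moreover, each row of the attention map is normalized to sum to 1, so the degree matrix can be taken to be the identity matrix, and the whole operation can be regarded approximately as a Laplacian process. (Detailed in Sec 3.3.)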

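3. Every word carries meaning, but a single point does not, so applying plain self-attention to a point cloud to extract only global features is meaningless(?) or at least inefficient. Doing so leaves the model weak on local information, and local information is very important for point clouds. The authors therefore add a neighbor embedding that extracts features from local groups of points.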

Model architecture

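The input is a point cloud P ∈ R^(N×d): N points, each with a d-dimensional feature.

AT^i is the i-th attention layer; its output has the same dimension as its input. W_o is the weight of a linear layer, and F_o is the feature obtained by concatenating all of the F_i and multiplying by that weight. The global feature F_g is extracted with two pooling operations, average pooling and max pooling.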
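The encoder flow described above (stacked attention layers, concatenation, a linear map W_o, then max and average pooling) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: `attention_stub` is a hypothetical stand-in for AT^i that only preserves the feature shape, and the use of four layers, the dimensions, and concatenating the two pooled vectors into F_g are assumptions made for the example.

```python
import numpy as np

def attention_stub(feats, weight):
    # Hypothetical stand-in for one attention layer AT^i.
    # The only property used here: output shape equals input shape (N, d_e).
    return np.tanh(feats @ weight)

def encode(f_e, layer_weights, w_o):
    """Return point-wise features F_o and a global feature F_g."""
    outputs = []
    f = f_e
    for w in layer_weights:
        f = attention_stub(f, w)          # F_i = AT^i(F_{i-1})
        outputs.append(f)
    f_cat = np.concatenate(outputs, axis=1)   # (N, L * d_e)
    f_o = f_cat @ w_o                         # (N, d_o)
    # Global feature: max pooling and average pooling over the points,
    # concatenated into one vector (assumed layout for this sketch).
    f_g = np.concatenate([f_o.max(axis=0), f_o.mean(axis=0)])
    return f_o, f_g

rng = np.random.default_rng(0)
n_points, d_e, d_o = 16, 8, 32
f_e = rng.normal(size=(n_points, d_e))
layer_weights = [rng.normal(size=(d_e, d_e)) * 0.1 for _ in range(4)]
w_o = rng.normal(size=(4 * d_e, d_o)) * 0.1

f_o, f_g = encode(f_e, layer_weights, w_o)
print(f_o.shape, f_g.shape)  # (16, 32) (64,)
```

Pooling over the point axis makes F_g independent of the order of the input points, which is why it can represent the whole cloud.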

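Starting with classification: the global feature above is passed through two LBRD blocks (Linear, BatchNorm, ReLU, Dropout) with a dropout rate of 0.5 (usually 0.2~0.5), followed by a final Linear layer that outputs a classification score for each class.

Segmentation has to split an object into parts (e.g. table top, table legs), so every point needs its own label. First, F_g and F_o are concatenated, so that both global and local information are available for each point. Next, the one-hot object category vector is encoded as a 64-dimensional feature and concatenated with the global feature (a technique used in PointNet++). The rest is similar to classification, except that dropout is applied only in the first LBR. Finally, the network outputs point-wise segmentation scores (N x N_s) for the input, and each point's label is the one with the maximal score.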
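To make the tensor shapes in the segmentation branch concrete, here is a rough NumPy sketch. It is an illustration under assumptions, not the paper's code: `w_cat` and `w_head` are hypothetical random weights, a single linear map stands in for the LBR decoder, and the numbers of categories and parts are arbitrary example values.

```python
import numpy as np

def build_segmentation_input(f_o, f_g, category_id, num_categories, w_cat):
    """Concatenate per-point features with the global and category features."""
    one_hot = np.zeros(num_categories)
    one_hot[category_id] = 1.0
    cat_feat = one_hot @ w_cat                       # 64-dim category feature
    global_feat = np.concatenate([f_g, cat_feat])    # global + category
    # Repeat the same global vector for every point, then append it
    # to each point's own feature.
    tiled = np.tile(global_feat, (f_o.shape[0], 1))
    return np.concatenate([f_o, tiled], axis=1)

def predict_part_labels(scores):
    # Each point takes the part label with the maximal score.
    return scores.argmax(axis=1)

rng = np.random.default_rng(0)
n_points, d_o, d_g = 10, 32, 64
num_categories, num_parts = 16, 50                   # example values only

f_o = rng.normal(size=(n_points, d_o))
f_g = rng.normal(size=(d_g,))
w_cat = rng.normal(size=(num_categories, 64))        # hypothetical encoder weight

seg_in = build_segmentation_input(f_o, f_g, 3, num_categories, w_cat)
w_head = rng.normal(size=(seg_in.shape[1], num_parts))  # stand-in for the decoder
scores = seg_in @ w_head                             # (N, N_s)
labels = predict_part_labels(scores)
print(seg_in.shape, scores.shape, labels.shape)
```

Every point sees the same global and category features, so the per-point part of the vector is the only thing that distinguishes one point's prediction from another's.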

Now let's look at the offset-attention used in this pipeline.

Architecture of offset-attention

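GCNs (graph convolutional networks) showed the benefit of replacing the adjacency matrix E with the Laplacian matrix L = D - E, where D is the diagonal degree matrix. The authors apply a similar idea to point clouds.

The offset-attention layer computes the offset as the element-wise difference between the SA (self-attention) features and the input features.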

This offset, instead of the SA feature itself, is what gets fed into the LBR network.

As in the flow shown in Fig 8, F_in - F_sa is analogous to a Laplacian operation.
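The Laplacian analogy can be checked numerically with a small sketch. The code below is a simplified NumPy illustration, not the paper's implementation: it uses a plain row-wise softmax so that each row of the attention map sums to 1 (the paper's exact normalization may differ), `lbr_stub` is a hypothetical stand-in that applies only a ReLU, and the value projection is set to the identity in the demo so that F_in - F_sa equals (I - A) F_in exactly.

```python
import numpy as np

def softmax_rows(x):
    # Row-wise softmax: every row of the result sums to 1.
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def lbr_stub(x):
    # Hypothetical stand-in for Linear + BatchNorm + ReLU (ReLU only).
    return np.maximum(x, 0.0)

def offset_attention(f_in, w_q, w_k, w_v, lbr):
    """Return (F_out, offset, attention map) for one offset-attention layer."""
    q, k, v = f_in @ w_q, f_in @ w_k, f_in @ w_v
    attn = softmax_rows(q @ k.T)      # (N, N) attention map A
    f_sa = attn @ v                   # self-attention feature F_sa
    offset = f_in - f_sa              # the offset that replaces F_sa
    f_out = lbr(offset) + f_in        # LBR on the offset, plus residual
    return f_out, offset, attn

rng = np.random.default_rng(0)
n_points, d, d_a = 6, 4, 2
f_in = rng.normal(size=(n_points, d))
w_q = rng.normal(size=(d, d_a))
w_k = rng.normal(size=(d, d_a))
w_v = np.eye(d)                       # identity, to expose the Laplacian form

f_out, offset, attn = offset_attention(f_in, w_q, w_k, w_v, lbr_stub)

# With rows of A summing to 1, the degree matrix is I and L = I - A.
laplacian = np.eye(n_points) - attn
print(np.allclose(offset, laplacian @ f_in))
print(np.allclose(laplacian.sum(axis=1), 0.0))
```

With a learned value projection the equality only holds approximately, which matches the "analogous to" wording above.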

Neighbor Embedding architecture

*SG (sampling and grouping)

The basic PCT model can already extract global features effectively, but local neighborhood information, which is very important, is ignored to some extent. The authors therefore borrow ideas from PointNet++ and DGCNN and design a sampling and grouping scheme. In Fig 4, the neighbor embedding module consists of two LBR layers and two SG layers. The two cascaded SG layers gradually enlarge the receptive field during feature aggregation. The aggregation uses k-NN search based on Euclidean distance. (In detail: P is first downsampled to P_s with FPS. Then, for each sampled point in P_s, its k nearest neighbors in P are found.)
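A rough sketch of one sampling and grouping step may help here. The NumPy code below is a minimal illustration under assumptions, not the authors' code: it uses a simple O(N * S) farthest point sampling loop and a brute-force k-NN, and it groups features as (neighbor minus center, center), which is an assumed layout for this example. The LBR layers and pooling that follow in the real module are omitted.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already picked."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=int)
    selected[0] = start
    min_dist = np.full(n, np.inf)     # squared distance to nearest selected point
    for i in range(1, n_samples):
        last = points[selected[i - 1]]
        dist = np.sum((points - last) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        selected[i] = int(np.argmax(min_dist))
    return selected

def knn_indices(queries, points, k):
    """Brute-force k nearest neighbors in `points` for each query (Euclidean)."""
    d2 = np.sum((queries[:, None, :] - points[None, :, :]) ** 2, axis=2)
    return np.argsort(d2, axis=1)[:, :k]

def sample_and_group(points, feats, n_samples, k):
    idx = farthest_point_sampling(points, n_samples)   # P -> P_s
    nbr = knn_indices(points[idx], points, k)          # neighbors of P_s in P
    center = feats[idx][:, None, :]                    # (S, 1, d)
    diff = feats[nbr] - center                         # (S, k, d)
    grouped = np.concatenate([diff, np.repeat(center, k, axis=1)], axis=2)
    return idx, nbr, grouped                           # grouped: (S, k, 2d)

rng = np.random.default_rng(0)
points = rng.normal(size=(32, 3))
feats = rng.normal(size=(32, 8))
idx, nbr, grouped = sample_and_group(points, feats, n_samples=8, k=4)
print(idx.shape, nbr.shape, grouped.shape)

# Tiny 1-D check: starting from 0, the farthest point is 10, then 2.
line = np.array([[0.0], [1.0], [2.0], [10.0]])
print(farthest_point_sampling(line, 3))  # [0 3 2]
```

Stacking two such steps means each point in the second stage summarizes neighbors that already summarize their own neighbors, which is how the receptive field grows.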

Experiments

NPCT: neither offset-attention nor neighbor embedding applied

SPCT: offset-attention applied

PCT: both offset-attention and neighbor embedding applied

Classification on the ModelNet40 dataset

Semantic segmentation on the S3DIS dataset

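Discussion

Most of the Point Transformer papers published so far apply farthest point sampling and grouping over k-NN neighbors. Perhaps because of the influence of PointNet++, many of them do not stray far from that foundation.

(Work in progress)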


Hook0