This post summarizes the paper A Neural Corpus Indexer for Document Retrieval, presented at NeurIPS 2022.

Introduction

Document retrieval and ranking are key stages of a web search engine. This paper proposes an end-to-end, deep-learning-based method for document retrieval and reports a large performance improvement over prior work.

Document retrieval methods are generally divided into term-based and semantic-based approaches. Term-based methods, of which TF-IDF is a typical example, struggle to extract the semantic information of a document, and retrieval can fail when similar documents use different words. Semantic-based methods are typified by ANN (Approximate Nearest Neighbor) search over representations of the search query and the documents, but it is hard to capture all of a document's semantics in a single vector, and the query and the documents must be represented in the same space.

The authors therefore propose several refined techniques that address these shortcomings and achieve high performance.

  1. Semantic identifier: uses hierarchical k-means to build an identifier (docid) that captures the semantics of each document.
  2. Query generation: generates queries that represent each document well and uses them for model training.
  3. Prefix-aware weight-adaptive decoder: adjusts the decoder weights according to the hierarchy level.
  4. Consistency-based regularization loss: prevents over-fitting during training.

Neural Corpus Indexer

Figure taken from Wang et al.

The Neural Corpus Indexer (NCI) is a sequence-to-sequence model. It takes a search query as input and outputs a document identifier (docid). The model is therefore trained on a large number of <query, docid> pairs.

Document Representation with Semantic Identifiers

First, every document must be assigned a docid. The authors want similar documents to have nearby docids, and they use hierarchical clustering to achieve this.

To begin, every document is vectorized with a BERT-based encoder, and the hierarchical k-means algorithm is applied to these document vectors. With $r_i \in [0, k)$ and a routing path $l = \{r_0, r_1, ..., r_m\}$, every document can then be represented in a tree structure that starts from the root $r_0$. For example, docid=012 and docid=013 are documents that belong to the same cluster at levels 0 and 1. With $c$ denoting the number of documents in a single cluster, the authors report using $k=30, c=30$ in all experiments.
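The clustering step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: random vectors stand in for the BERT embeddings, `kmeans` and `assign_docids` are hypothetical helper names, and it assumes that a cluster is split again only while it holds more than `c` documents, with documents inside a leaf simply numbered in order.

```python
import numpy as np

def kmeans(x, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    k = min(k, len(x))
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def assign_docids(x, doc_indices, k, c, rng, prefix=(), out=None):
    """Recursively cluster documents and record one routing path per document."""
    out = {} if out is None else out
    labels = kmeans(x[doc_indices], k, rng) if len(doc_indices) > c else None
    if labels is None or len(np.unique(labels)) < 2:
        # Leaf (or a split that failed to separate anything):
        # number the documents inside the cluster.
        for pos, idx in enumerate(doc_indices):
            out[int(idx)] = prefix + (pos,)
        return out
    for j in np.unique(labels):
        members = doc_indices[labels == j]
        assign_docids(x, members, k, c, rng, prefix + (int(j),), out)
    return out

rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 8))   # stand-in for BERT document vectors
docids = assign_docids(vectors, np.arange(200), k=3, c=5, rng=rng)
print(docids[0], docids[1])           # routing paths of the first two documents
```

Documents that share a long prefix ended up in the same clusters at the upper levels, which is the property the semantic identifier is meant to provide.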

Query Generation

To find the correct document identifier from a search query alone, we have to consider how the query vector can pick up on document semantics and produce the identifier.

This requires feeding document semantics into the model from training time onward, so the authors propose a query generation step that takes document content as input and produces multiple queries. Two methods are used here, DocT5Query and Document As Query, and the generated queries are used in the training loss (in both the cross-entropy loss and the consistency-based loss).
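The post does not spell out how Document As Query works, so the following is only one plausible reading, written as a sketch: the opening terms of a document plus a few randomly chosen windows of consecutive terms are each treated as a query for that document. The function name `document_as_queries` and the `window` and `num_random` values are assumptions made for illustration.

```python
import random

def document_as_queries(text, window=64, num_random=10, seed=0):
    """Build pseudo-queries from a document's own text (assumed scheme)."""
    terms = text.split()
    rng = random.Random(seed)
    # The opening window of the document is always used.
    queries = [" ".join(terms[:window])]
    last_start = max(len(terms) - window, 0)
    for _ in range(num_random):
        # A random run of consecutive terms from anywhere in the document.
        start = rng.randint(0, last_start)
        queries.append(" ".join(terms[start:start + window]))
    return queries

document = " ".join(f"w{i}" for i in range(300))   # toy 300-term document
queries = document_as_queries(document)
training_pairs = [(q, "012") for q in queries]     # "012" is a toy docid
print(len(training_pairs))
```

Each generated query is paired with the docid of its source document, giving additional <query, docid> training pairs alongside the real search queries.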

Prefix-Aware Weight-Adaptive Decoder

Figure taken from Wang et al.

The process of predicting the docid for a given input query is expressed by the following equation.

$$p(l \mid x, \theta)=\prod_{i=1}^m p\left(r_i \mid x, r_1, r_2, \ldots, r_{i-1}, \theta_i\right)$$

A token means different things depending on the tree level and on the prefix: $5_2$ and $5_3$ in $3_1 5_2 5_3$ are different, and so are $1_1 1_2 5_3$ and $2_1 4_2 5_3$. To make the model aware of this, an identifier such as $3_1 5_2 5_3$ is first represented in the form (1,3)(2,5)(3,5).
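The (level, token) representation is small enough to show directly. The helper name `to_level_token_pairs` below is hypothetical, and levels are assumed to be counted from 1 as in the example above.

```python
def to_level_token_pairs(docid):
    """Turn a routing path such as [3, 5, 5] into (level, token) pairs."""
    return [(level, token) for level, token in enumerate(docid, start=1)]

pairs = to_level_token_pairs([3, 5, 5])
print(pairs)
# The token 5 at level 2 and the token 5 at level 3 are now distinct symbols.
print(pairs[1] != pairs[2])
```

The pairing only separates tokens by level. Telling apart the same (level, token) pair under different prefixes is left to the prefix-dependent decoder weight.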

Next, so that the decoder can recognize different prefixes, a prefix-dependent weight $W_{ada}$ is constructed and multiplied with the token embedding, and a softmax over the result yields the docid token at each tree level.

$$W_{ada}^i=\text{AdaptiveDecoder}\left(e ; r_1, r_2, \ldots, r_{i-1}\right) W_i$$

Training and Inference

During training, the consistency-based regularization and the cross-entropy loss are applied both to the search queries and to the document queries produced by query generation.

$$\mathcal{L}_{\text{reg}}=-\log \frac{\exp \left(\operatorname{sim}\left(\mathbf{z}_{i, 1}, \mathbf{z}_{i, 2}\right) / \tau\right)}{\sum_{k=1, k \neq 2}^{2 Q} \exp \left(\operatorname{sim}\left(\mathbf{z}_{i, 1}, \mathbf{z}_{i, k}\right) / \tau\right)}$$

$$\mathcal{L}(\theta)=\sum_{(q, d) \in \mathcal{D}}\left(\log p(d \mid E(q), \theta)+\alpha \mathcal{L}_{\text{reg}}\right)$$
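To make the regularization term concrete, here is a minimal NumPy sketch that evaluates it literally as written above for a single query. It assumes that sim is cosine similarity and that the rows of `z` are ordered so that row 0 is $\mathbf{z}_{i,1}$ and row 1 is $\mathbf{z}_{i,2}$. The function name `consistency_loss` is hypothetical.

```python
import numpy as np

def consistency_loss(z, tau):
    """Regularization term for one query; z has shape (2Q, d)."""
    # Cosine similarity is assumed for sim(., .).
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sims = z @ z[0]                  # sim(z_{i,1}, z_{i,k}) for every k
    numerator = np.exp(sims[1] / tau)
    keep = np.ones(len(z), dtype=bool)
    keep[1] = False                  # the sum skips k = 2 (1-indexed)
    denominator = np.exp(sims[keep] / tau).sum()
    return -np.log(numerator / denominator)

# Toy batch with Q = 2: two identical views plus two orthogonal vectors.
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss = consistency_loss(z, tau=1.0)
print(loss)
```

In this toy batch the similarities to row 0 are 1, 1, 0, 0, so the value equals $\log(1 + 2/e)$.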

At inference time, the encoder network first produces the query embedding, and the decoder network then performs beam search. An explanation of beam search can be found here, and detailed pseudocode is in Appendix B3 of the paper.
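As a rough illustration of decoding over the docid tree, the sketch below runs beam search with a hand-written probability table standing in for the decoder. It assumes fixed-depth docids and discards any candidate that is not a prefix of a real docid. `beam_search`, `TOY_PROBS`, and the toy docids are all made up for this example and are not taken from the paper's pseudocode.

```python
import math

def beam_search(step_probs, valid_docids, depth, beam_size):
    """Keep the beam_size best valid prefixes at every tree level."""
    valid_prefixes = {d[:i] for d in valid_docids for i in range(1, depth + 1)}
    beams = [((), 0.0)]              # (prefix, cumulative log-probability)
    for _ in range(depth):
        candidates = []
        for prefix, score in beams:
            for token, p in step_probs(prefix).items():
                extended = prefix + (token,)
                if extended in valid_prefixes:
                    candidates.append((extended, score + math.log(p)))
        candidates.sort(key=lambda cand: cand[1], reverse=True)
        beams = candidates[:beam_size]
    return beams

# Toy per-level distributions p(r_i | x, r_1, ..., r_{i-1}) for one query.
TOY_PROBS = {
    (): {0: 0.6, 1: 0.4},
    (0,): {0: 0.5, 1: 0.5},
    (1,): {0: 0.9, 1: 0.1},
}
valid = [(0, 0), (0, 1), (1, 0)]

def toy_decoder(prefix):
    return TOY_PROBS[prefix]

results = beam_search(toy_decoder, valid, depth=2, beam_size=2)
best_docid, best_score = results[0]
greedy_docid = beam_search(toy_decoder, valid, depth=2, beam_size=1)[0][0]
print(best_docid, math.exp(best_score), greedy_docid)
```

With a beam of 2 the search returns (1, 0), whose probability 0.4 × 0.9 = 0.36 is the product of the per-level terms. A beam of 1 commits to token 0 at the first level and ends at a docid with probability 0.3.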

Experiments

The experiments use the Natural Questions and TriviaQA datasets, which consist of 320k and 78k query-document pairs, respectively. The metrics are Recall@N, MRR (Mean Reciprocal Rank), and R-precision, all of which measure how well documents are retrieved for a given query.
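Two of these metrics are easy to state in code. The sketch below assumes exactly one relevant document per query, which is a simplification chosen for the example. The function names and the toy rankings are hypothetical.

```python
def recall_at_n(rankings, gold, n):
    """Fraction of queries whose relevant document appears in the top n."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:n])
    return hits / len(gold)

def mean_reciprocal_rank(rankings, gold):
    """Average of 1 / rank of the relevant document (0 if it is missing)."""
    total = 0.0
    for ranked, g in zip(rankings, gold):
        if g in ranked:
            total += 1.0 / (ranked.index(g) + 1)
    return total / len(gold)

# Three toy queries; document "a" is the relevant one for each.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
gold = ["a", "a", "a"]
print(recall_at_n(rankings, gold, 1))
print(recall_at_n(rankings, gold, 2))
print(mean_reciprocal_rank(rankings, gold))
```

Here the relevant document sits at ranks 1, 2, and 3, so Recall@1 is 1/3, Recall@2 is 2/3, and MRR is (1 + 1/2 + 1/3) / 3 = 11/18.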

Figure taken from Wang et al.

Conclusion

NCI delivers a large improvement in performance, but several limitations remain. First, at real web scale, as opposed to the scale of open datasets, the number of documents is far larger, so greater model capacity is needed. Second, real-time use requires fast inference. Finally, adding new documents to the system is cumbersome: every time documents are added, the semantic identifier of each document has to be reassigned through hierarchical clustering.