Here I collect newly learned bits of knowledge that feel too small to deserve a post of their own. Rather than recording what I study every day, I update this page irregularly, whenever something comes up. This post accumulates notes related to my AI/ML tech stack. Entries are ordered so that the most recently written ones sit at the bottom.

๐Ÿงฉ ML library

2021.04.25

I read the description of the tf.map_fn function in the TensorFlow documentation. It maps the elements of the tensor(s) elems, unpacked along dimension 0, through fn.

tf.map_fn(fn, elems, dtype=None, parallel_iterations=None, back_prop=True,
          swap_memory=False, infer_shape=True, name=None)

MAML์„ ๊ตฌํ˜„ ํ•  ๋•Œ meta-batch์— ๋Œ€ํ•œ cross entropy๋ฅผ ๋ณ‘๋ ฌ์ ์œผ๋กœ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด์„œ ์•„๋ž˜์™€ ๊ฐ™์€ ์ฝ”๋“œ๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ xs์˜ shape์€ [meta-batch size, nway*kshot, 84*84*3] ์ž…๋‹ˆ๋‹ค.

cent, acc = tf.map_fn(lambda inputs: self.get_loss_single(inputs, weights),
                      elems=(xs, ys, xq, yq),
                      dtype=(tf.float32, tf.float32),
                      parallel_iterations=self.metabatch)
๐Ÿงฉ ML library

2021.04.27

๋ชจ๋ธ ๊ทธ๋ž˜ํ”„๋ฅผ ๋นŒ๋“œํ•˜๋Š” ํ•จ์ˆ˜์—์„œ for loop๋ฅผ ๋งŽ์ด ์‚ฌ์šฉํ•˜๋ฉด ์ด๊ฒŒ ๊ทธ๋Œ€๋กœ ๋ชจ๋ธ training ๋‹จ๊ณ„์—์„œ๋„ ๋งค๋ฒˆ for loop๊ฐ€ ์ ์šฉ๋˜์–ด ๋ชจ๋ธ์˜ ํ•™์Šต์ด ๋Š๋ ค์ง€๊ฒ ๊ตฌ๋‚˜๋ผ๊ณ  ์ƒ๊ฐํ–ˆ์—ˆ๋Š”๋ฐ ๊ณฐ๊ณฐํžˆ ์ƒ๊ฐํ•ด๋ณด๋‹ˆ๊นŒ ์•„๋‹ˆ๋”๋ผ๊ตฌ์š”.

No matter how many times a for loop runs during the build step, once the graph nodes have been connected, only the structure of the built graph matters; the build-time for loops become irrelevant. I had casually held this misconception for quite a while, so I'm recording it here. It does make me curious in which cases map_fn actually has an advantage 🧐

๐Ÿงฉ ML library

2021.05.02

While writing code against TensorFlow 1.15, I learned that softmax_cross_entropy_with_logits supports second-order gradient computation on the loss, but sparse_softmax_cross_entropy_with_logits does not. Since the only difference between the two is whether the labels are given in one-hot form, this result seemed strange; digging around, I found that a related issue had been filed on the tensorflow repository.

์š”์•ฝํ•˜์ž๋ฉด ์ผ๋ถ€ indexing ์ž‘์—…์— ๋Œ€ํ•œ ๋„ํ•จ์ˆ˜ ๊ณ„์‚ฐ์ด ์•„์ง ์ œ๋Œ€๋กœ ๊ตฌํ˜„๋˜์ง€ ์•Š์•˜๊ฑฐ๋‚˜, ๋ช‡ ๊ฐ€์ง€ operation์— ๋Œ€ํ•ด์„œ 2์ฐจ ๋ฏธ๋ถ„ ๊ณ„์‚ฐ์ด ๊ฐœ๋ฐœ์ž๋“ค๋„ ์•„์ง ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ•œ ์˜ค๋ฅ˜๋ฅผ ๊ฐ€์ง„๋‹ค๊ณ  ๋งํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค(๊ตฌ์ฒด์ ์ธ ์›์ธ์€ ๋ชจ๋ฅด๊ฒ ์Šต๋‹ˆ๋‹ค). 0.2 ๋ฒ„์ „์—์„œ 1.15 ๊นŒ์ง€ ๊ฐœ๋ฐœ์ด ์ง„ํ–‰๋˜๋ฉด์„œ๋„ TensorFlow ํŒ€์ด ์ง€์†์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ•˜๊ณ  ์žˆ๋Š” ๋ฌธ์ œ์ ์ด ์žˆ๋‹ค๋Š” ๊ฒƒ์ด ์‹ ๊ธฐํ–ˆ์Šต๋‹ˆ๋‹ค.

๐Ÿค– ML & DL

2021.05.10

Through the video PR-317: MLP-Mixer: An all-MLP Architecture for Vision, I learned that CNNs and MLPs are not that different. In the video, Jinwon Lee explains that CNN weights differ from fully-connected weights in two ways: weight sharing and local connectivity. Just looking at the visualizations makes it this easy to understand, and I wondered why I had never realized it before; I understood that merely adding some (in fact, an enormous number of) weights to a CNN turns it into a structure completely identical to a fully-connected layer.

๐Ÿงฉ ML library

2021.05.11

When using the tf.contrib.layers.batch_norm function, be careful with the is_training argument. With batch normalization, the source of the statistics used as mean and variance differs between training and test time, so if is_training is set incorrectly, the experiment can be flawed even though the accuracy comes out high.

is_training์ด True์ธ ๊ฒฝ์šฐ์—๋Š” movingmean ํ…์„œ์™€ movingvariance ํ…์„œ์— statistics of the moments(๋ฏธ๋‹ˆ ๋ฐฐ์น˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ)์„ exponential moving average ์‹์— ๋”ฐ๋ผ ์ถ•์ ํ•ฉ๋‹ˆ๋‹ค. BN ๊ณ„์‚ฐ์—๋Š” ๋ฏธ๋‹ˆ๋ฐฐ์น˜์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. is_training์ด False์ธ ๊ฒฝ์šฐ์—๋Š” ๊ทธ๋™์•ˆ ์ถ•์ ํ•˜์˜€๋˜ movingmean ํ…์„œ์™€ movingvariance ํ…์„œ ๊ฐ’์„ ๊ฐ€์ ธ์™€ BN ๊ณ„์‚ฐ์— ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

In a few-shot learning setting, setting is_training to True for both the support set and the query set amounts to a transductive setting: it means using information from the query distribution, not only the support set, to predict the query. In few-shot learning, the transductive setting usually shows around a 3% performance gain over the non-transductive one, so the argument should be set to match your experimental setup.

Instance-based normalization schemes such as tf.contrib.layers.group_norm do not use running statistics over mini-batches, so no is_training parameter exists for them.

๐Ÿค– ML & DL

2021.05.14

In physics, a moment expresses how a physical quantity is situated in space through the product of that quantity and a distance; force, torque, and angular momentum are examples. For the moment of mass, the zeroth moment is the total mass, the first moment is the center of mass, and the second moment is the moment of inertia.

์ˆ˜ํ•™์—์„œ๋Š” ํ•จ์ˆ˜์˜ ํŠน์ง•์„ ๋‚˜ํƒ€๋‚ด๊ธฐ์œ„ํ•ด moment๋ผ๋Š” ์›Œ๋”ฉ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ํ•จ์ˆ˜๊ฐ€ ํ™•๋ฅ ๋ถ„ํฌ ํ˜•ํƒœ์ธ ๊ฒฝ์šฐ first moment๋Š” ํ™•๋ฅ  ๋ถ„ํฌ์˜ ๊ธฐ๋Œ“๊ฐ’์„ ์˜๋ฏธํ•˜๋ฉฐ, ์ด๋ฅผ moments about zero๋ผ๊ณ ๋„ ๋งํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ second central moment๋กœ๋Š” variance, third standardized moment๋Š” skewness(๋น„๋Œ€์นญ๋„), fourth standardized moment๋Š” kurtosis(์ฒจ๋„, ๋พฐ์กฑํ•œ ์ •๋„) ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿงฉ ML library

2021.09.20

Referring to the official PyTorch documentation, I'm summarizing the most basic torch Tensor operations.

  • squeeze: removes dimensions of size 1. Without an argument, every dimension of size 1 is removed.
  • unsqueeze: inserts a dimension of size 1 at the given position.
  • view: changes the shape of a tensor.
๐Ÿค– ML & DL

2021.11.13

์œ„ํ‚คํ”ผ๋””์•„์˜ Signed Distance Function(SDF)4์— ๋Œ€ํ•œ ์„ค๋ช…์„ ์ฝ์—ˆ์Šต๋‹ˆ๋‹ค. ๋จผ์ €, SDF๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ •์˜๋ฉ๋‹ˆ๋‹ค.

  • If $\Omega$ is a subset of a metric space and $\partial\Omega$ is the boundary of $\Omega$, the signed distance function $f$ is defined by
$$f(x) = \begin{cases} d(x, \partial \Omega) & \text{if } x \in \Omega \\ -d(x, \partial \Omega) & \text{if } x \in \Omega^c \end{cases}$$

The SDF is a function expressing the distance to some boundary. If a point $x$ lies inside the boundary, the function value is positive; as the point moves ever closer to the boundary the value approaches 0, and when it lies on the boundary the value is 0. Conversely, when $x$ lies outside the boundary, the function value is negative.

์œ„์—์„œ๋Š” SDF ํ•จ์ˆ˜์˜ ์‹์— ๋Œ€ํ•ด์„œ boundary ์•ˆ ์ชฝ์ธ ๊ฒฝ์šฐ์— ์–‘์ˆ˜๋ผ๊ณ  ํ‘œ๊ธฐํ•˜์˜€์ง€๋งŒ boundary ์•ˆ ์ชฝ์„ ์Œ์ˆ˜๋กœ ๋‘์–ด ๋ฐ˜๋Œ€๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ๋„ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ์‚ฌ์ง„์€ DeepSDF5๋ผ๋Š” ๋…ผ๋ฌธ์—์„œ ๊ฐ€์ ธ์˜จ SDF์˜ ์˜ˆ์‹œ์ด๋ฉฐ ํ•ด๋‹น ๋…ผ๋ฌธ์—์„œ๋Š” boundary ์•ˆ ์ชฝ์„ ์Œ์ˆ˜๋กœ ๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

(figure: SDF example from the DeepSDF paper)

๊ณผ๊ฑฐ์˜ surface ์ถ”์ •์ด๋‚˜ 3D reconstruction ๊ฐ™์€ task์—์„œ๋Š” ์ฃผ๋กœ voxel, point, mesh๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ์ ‘๊ทผํ–ˆ๋‹ค๋ฉด, ์ตœ๊ทผ์—๋Š” SDF ์‚ฌ์šฉํ•˜๋ ค๋Š” ์‹œ๋„๊ฐ€ ๋Š˜์–ด๋‚˜๊ณ  ์žˆ๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ํŠนํžˆ Implicit Neural Representation ์—ฐ๊ตฌ์™€ SDF๋ฅผ ๊ฒฐํ•ฉํ•œ ์—ฐ๊ตฌ ๊ฒฐ๊ณผ๋“ค์ด ํฅ๋ฏธ๋กœ์›Œ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.

Implicit Neural Representation is research on representing an image or 3D data not as a matrix of pixels or voxels, but as a single function that outputs (r, g, b) values when given (x, y) (one function corresponds to one datum, so one training input would presumably correspond to one pixel value). Because the data is represented as a continuous function, super-resolution comes naturally as an advantage; recently, much research combines this approach with SDFs to make the final output very smooth.

๐Ÿค– ML & DL

2021.12.02

Until now I had unthinkingly assumed that even in a continuous distribution, a specific probability exists at a single point. For example, I wrongly believed that for $\mathcal{N}(0, 1)$, the probability of observing the point $x = 1$ exists as some particular value.

์ด ๊ณณ6์„ ์ฐธ๊ณ ํ•˜๋‹ˆ continuous probability function์€ continuous interval์˜ ๋ฌดํ•œ points์— ๋Œ€ํ•ด ์ •์˜๋˜๊ธฐ ๋•Œ๋ฌธ์— single point์˜ ํ™•๋ฅ ์€ ์–ธ์ œ๋‚˜ 0์ด๋ฉฐ, ๋”ฐ๋ผ์„œ continuous probability function์—์„œ ํ™•๋ฅ ์€ ํŠน์ • interval์— ๋Œ€ํ•ด์„œ ์ธก์ •ํ•˜๊ณ  single point์— ๋Œ€ํ•ด์„  ์ธก์ •ํ•˜์ง€ ์•Š๋Š”๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

In hindsight it's simple, but I had never thought it through carefully, which is why it confused me. It additionally made me wonder how zeros can gather into one, and whether I should relearn mathematics from the foundations right away — but time is limited and there is much to do, so I came back around to the conclusion of taking the long view and studying slowly 🥲

๐Ÿงฉ ML library

2021.12.08

PyTorch์— ํŠน์ • weight๋งŒ freezeํ•˜๋Š” ๊ธฐ๋Šฅ์ด ๊ตฌํ˜„๋˜์–ด ์žˆ๋Š”์ง€ ์‚ดํŽด๋ณด์•˜์Šต๋‹ˆ๋‹ค.

For freezing at the layer level I had implemented it with requires_grad=False, but I had never seen a feature for freezing only certain weights within a layer, so I searched and ended up reading the linked discussion. The author explains that there are two workarounds:

  • Before calling .step(), assign grad=0 to the weights you want to freeze. However, for optimizers that use momentum or weight decay, .step() modifies a weight even when its grad is 0, so this may not behave as intended
  • Copy the weights you want to freeze in advance, call .step() to update the weights, then overwrite the updated weights with the saved copies
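
A sketch of the second workaround on a toy parameter vector (the setup — one 4-element parameter, Adam, the mask — is my own, not from the linked discussion):

```python
import torch

# Freeze the first two entries of a 4-element weight while Adam updates the rest.
torch.manual_seed(0)
w = torch.nn.Parameter(torch.ones(4))
opt = torch.optim.Adam([w], lr=0.1)
freeze_mask = torch.tensor([True, True, False, False])

loss = (w ** 2).sum()
opt.zero_grad()
loss.backward()

frozen = w.detach()[freeze_mask].clone()  # save before the update
opt.step()                                # updates every entry
with torch.no_grad():
    w[freeze_mask] = frozen               # restore the frozen entries

print(w)  # the first two entries are still 1.0
```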
๐Ÿค– ML & DL

2022.01.15

๋งํฌ7๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ triplet loss ๊ด€๋ จ ์šฉ์–ด๋ฅผ ์ˆ™์ง€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

  • Easy triplets: $d(a, p) + \text{margin} < d(a, n)$
  • Hard triplets: $d(a, n) < d(a, p)$
  • Semi-hard triplets: $d(a, p) < d(a, n) < d(a, p) + \text{margin}$
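
A tiny helper of my own that classifies a triplet by the three conditions above, given the anchor-positive and anchor-negative distances:

```python
def triplet_type(d_ap, d_an, margin=0.2):
    # d_ap = d(a, p), d_an = d(a, n); boundary-equality cases are ignored
    if d_ap + margin < d_an:
        return "easy"
    if d_an < d_ap:
        return "hard"
    return "semi-hard"

print(triplet_type(0.5, 1.0))  # easy: 0.5 + 0.2 < 1.0
print(triplet_type(0.5, 0.4))  # hard: 0.4 < 0.5
print(triplet_type(0.5, 0.6))  # semi-hard: 0.5 < 0.6 < 0.7
```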
๐Ÿงฉ ML library

2022.02.28

Recording the things best considered first when fixing random seeds.

import random
import numpy as np
import torch

random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)           # CPU RNG
torch.cuda.manual_seed_all(args.seed)  # RNG on every GPU
๐Ÿค– ML & DL

2022.04.10

While doing research, I picked up the empirical tip that residual connections are useful for training stability. Beyond using residual connections in the model architecture as in ResNet, I found that when you want to change some value cautiously, a structure with a residual connection shows comparatively better performance.

For example, when updating embedding vectors with a GNN, using the form $V_{t+1} = V_t + G(V_t)$ is better than $V_{t+1} = G(V_t)$; and among my current experiments there is one trying to estimate the mean of a distribution well from a few shots, where the form $\hat\mu = \text{mean of few-shot} + f_\theta(\text{few-shot})$ also gave better results than $\hat\mu = f_\theta(\text{few-shot})$.

์•„๋ฌด๋ž˜๋„ ์ผ๋ฐ˜์ ์œผ๋กœ parameter๊ฐ€ 0์— ๊ฐ€๊นŒ์šด ๊ฐ€์šฐ์‹œ์•ˆ์œผ๋กœ ์ดˆ๊ธฐํ™”๋˜๊ธฐ ๋•Œ๋ฌธ์—, residual connection์„ ์‚ฌ์šฉํ•œ ๊ฒฝ์šฐ์— ์ดˆ๊ธฐ loss๊ฐ€ ๋” ์ž‘์•„์ ธ ๋น„๊ต์  ํ•™์Šต์ด ์•ˆ์ •์ ์ธ ๊ฒƒ์ด ์•„๋‹๊นŒ ์‹ถ์Šต๋‹ˆ๋‹ค. (์ •๋ง๋กœ ๊ทธ๋Ÿฐ ๊ฒƒ์ธ์ง€ ์ฐพ์•„๋ณด๊ณ  ๋‚ด์šฉ ์ถ”๊ฐ€ํ•˜๊ธฐ)

๐Ÿค– ML & DL

2022.05.16

Notes on the Moore-Penrose inverse (pseudoinverse).

  • When solving a linear system of the form $A\mathrm{x} = \mathrm{b}$ where $A$ is an $n \times m$ non-square matrix, two situations can arise:
  • Underdetermined (n < m): wide $A$. Infinitely many solutions for a given $\mathrm{b}$ in general
  • Overdetermined (n > m): tall $A$. No solution for a given $\mathrm{b}$ in general
  • Applying the singular value decomposition to $A$ allows the following derivation:
$$A \mathrm{x} = \mathrm{b} \\ U \Sigma V^\top \mathrm{x} = \mathrm{b} \\ V \Sigma^+ U^\top U \Sigma V^\top \mathrm{x} = V \Sigma^+ U^\top \mathrm{b} \\ \tilde{\mathrm{x}} = V \Sigma^+ U^\top \mathrm{b} := A^+ \mathrm{b}$$
  • Here $A^+ = V \Sigma^+ U^\top$ is called the pseudoinverse of $A$
  • When $\Sigma = \mathrm{diag}_{n,m}(\lambda_1, \cdots, \lambda_{\min\{n,m\}})$, $\Sigma^+ = \mathrm{diag}_{m,n}(\lambda_1^+, \cdots, \lambda^+_{\min\{n,m\}})$ where $\lambda^+ = \begin{cases} \lambda^{-1}, & \lambda \neq 0 \\ 0, & \lambda = 0 \end{cases}$

Using the Moore-Penrose inverse, many parts of linear algebra can be stated and proved easily:

  1. In the underdetermined case (infinitely many solutions), $A^+ \mathrm{b}$ is the solution minimizing the Euclidean norm $\| \tilde{\mathrm{x}} \|_2$
  2. In the overdetermined case, $\| A \tilde{\mathrm{x}} - \mathrm{b} \|_2 = \| A A^+ \mathrm{b} - \mathrm{b} \|_2$ is the least-squares optimum
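
The construction above can be checked with NumPy on a toy overdetermined system (my own example): $A^+$ built from the SVD matches np.linalg.pinv, and $A^+\mathrm{b}$ matches the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # tall matrix: n > m, full column rank here
b = rng.normal(size=5)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T  # Sigma^+ inverts the nonzero singular values

x_tilde = A_pinv @ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
print(np.allclose(x_tilde, x_lstsq))           # True: the least-squares solution
```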
๐Ÿค– ML & DL

2022.05.27

A linear combination whose coefficients are nonnegative and sum to 1 is called a convex combination

Convex set์˜ ์ •์˜์™€ ์—ฐ๊ด€์ง€์–ด ๋ณด๋ฉด, ์–ด๋–ค ์ง‘ํ•ฉ C์— ์†ํ•˜๋Š” ์ž„์˜์˜ ์ ๋“ค์˜ convex combination์ด C์— ์†ํ•˜๋ฉด ๊ทธ ์ง‘ํ•ฉ์€ convex set์ด๋ผ๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ convex set C์— ์†ํ•˜๋Š” ์ ๋“ค์˜ convex combination์€ ํ•ญ์ƒ C์— ์†ํ•จ.

๐Ÿค– ML & DL

2022.05.28

I summarized various data augmentation methods in this post.

๐Ÿค– ML & DL

2022.06.29

Summarizing the mathematical definitions of upper bound, lower bound, supremum, and infimum, referring to this source.

  • Upper bound: given some real number $\beta$, if every element $x$ of $E$ satisfies $x \le \beta$, then $\beta$ is called an upper bound of $E$, and $E$ is said to be bounded above. (Lower bounds are defined in the same way.)
  • Supremum, least upper bound: for $\alpha = \sup E$, $\alpha$ must be an upper bound of $E$, and no $\gamma$ with $\gamma < \alpha$ may be an upper bound of $E$. That is, the least of the upper bounds is the supremum
  • Infimum, greatest lower bound: for $\alpha = \inf E$, $\alpha$ must be a lower bound of $E$, and no $\beta$ with $\beta > \alpha$ may be a lower bound of $E$. That is, the greatest of the lower bounds is the infimum
๐Ÿค– ML & DL

2022.10.06

Recording notes from the AI workshop held on October 6. First, content related to Federated Learning.

  1. Federated Learning (FL)

    • How can a model be trained in a situation where clients' data cannot be uploaded to a central server?
    • The most common approach: each client uploads its locally updated 'model' to the server, which averages them and distributes the result back to the clients (FedAvg)
    • But this approach degrades very badly in non-IID (heterogeneous) settings: the background of PFL research
  2. Personalized Federated Learning (PFL): client-specific weights are introduced
  3. PFL via Meta-learning: conceived from the observation that the PFL concept and the meta-learning (MAML) concept are very similar

Content related to imitation learning.

  1. Reinforcement Learning (RL)

    • Purpose: find an optimal policy $\pi^*$ that maximizes $V$
    • Requires domain knowledge for real-world applications
    • Taking drones as an example: real drones break very easily, so Sim2Real learning must be considered, and since drone physics carries many perturbations, robust learning must be considered as well
  2. Imitation Learning (IL)

    • Methods include behavior cloning (BC), inverse RL (IRL), and IRL + RL
    • BC requires large amounts of data and is vulnerable to compounding error, so in these respects IRL has the advantage
  3. Generative Adversarial Imitation Learning (GAIL)

    • Expert actions are provided as real data and policy actions as fake data, training the policy to imitate the expert's policy
    • Limitation: real-environment danger and environment perturbations are not modeled well, hence the need for domain-adaptive IL
  4. Simulation-based Learning: Domain Adaptive IL

    • Extract information from the simulation (source) environment so that it helps the policy in the target environment; the information-extraction step is crucial
๐Ÿค– ML & DL

2022.10.06

๋ ˆ๋”ง์„ ์ฝ๋‹ค๊ฐ€ "ํ•™์Šต์ด ๋„ˆ๋ฌด ์˜ค๋ž˜๊ฑธ๋ฆฌ๋Š” ๊ฒฝ์šฐ์—” ํ•˜์ดํผํŒŒ๋ฆฌ๋ฏธํ„ฐ ํŠœ๋‹์„ ์–ด๋–ป๊ฒŒ ํ•ด์•ผํ•˜๋Š”๊ฐ€?"์— ๋Œ€ํ•œ ๊ธ€์ด ์žˆ์–ด, ๊ธ€์— ๋‹ฌ๋ฆฐ ์ฝ”๋ฉ˜ํŠธ์™€ ๊ฐœ์ธ์ ์ธ ์ƒ๊ฐ๋“ค์„ ๊ธฐ๋กํ•ฉ๋‹ˆ๋‹ค.

  • ๋ชจ๋ธ ์Šค์ผ€์ผ์„ ์ค„์ธ ์ƒํƒœ๋กœ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹์„ ์ง„ํ–‰ํ•˜๊ฑฐ๋‚˜, ๋ฐ์ดํ„ฐ์…‹์„ ์ผ๋ถ€๋งŒ ์‚ฌ์šฉํ•œ ํ•™์Šต์„ ํ†ตํ•ด ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹์„ ์ง„ํ–‰
  • e.g., ResNet152๋ผ๊ณ  ํ•œ๋‹ค๋ฉด ResNet18 ๊ฐ™์ด ์ž‘์€ ๋ชจ๋ธ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜, ImageNet์ด๋ผ๊ณ  ํ•œ๋‹ค๋ฉด 100๊ฐœ class๋งŒ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šต ์ˆ˜ํ–‰
  • ์ด ๋ฐฉ๋ฒ•์€ ๋‹น์—ฐํžˆ sub-optimal์ด๊ธด ํ•˜๊ฒ ์ง€๋งŒ ํ•™์Šต์ด ๋„ˆ๋ฌด ์˜ค๋ž˜๊ฑธ๋ฆฌ๋Š” ๊ฒฝ์šฐ์— ์ถฉ๋ถ„ํžˆ ํ™œ์šฉํ•ด ๋ณผ๋งŒ ํ•œ ๋ฐฉ๋ฒ•์ด๋ผ๊ณ  ์ƒ๊ฐํ–ˆ์Œ
  • ์‚ฌ์‹ค ์ œ์ผ ์ข‹์€ ๊ฒƒ์€ GPU ์ž์›์„ ๋ณ‘๋ ฌ๋กœ ์ถฉ๋ถ„ํžˆ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ์—”์ง€๋‹ˆ์–ด๋ง์„ ๊ฑฐ์นœ ํ›„์— ํ•™์Šตํ•˜๋Š” ๊ฒƒ. ์™œ๋ƒ๋ฉด big model๊ณผ small model ์‚ฌ์ด์— ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ์— ๋”ฐ๋ฅธ ๋ชจ๋ธ์˜ ๋™์ž‘์— ๋ถ„๋ช…ํžˆ ์ฐจ์ด๊ฐ€ ์กด์žฌํ•  ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์—, ์›๋ž˜ ์Šค์ผ€์ผ๋Œ€๋กœ ์‹คํ—˜ํ•˜๋Š”๊ฒŒ ์ œ์ผ ์ข‹์Œ
๐Ÿค– ML & DL

2022.10.14

ML ๋ถ„์•ผ์—์„œ์˜ "Grokking"์ด๋ผ๋Š” ๋‹จ์–ด์˜ ์˜๋ฏธ๋ฅผ ๊ธฐ๋กํ•ฉ๋‹ˆ๋‹ค.

  • The phenomenon where an overparameterized neural network overfits a small training dataset and then, after a very long time (many optimization steps), suddenly achieves good generalization performance (falling validation loss) at some point
  • Named in OpenAI's paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
๐Ÿค– ML & DL

2022.10.21

  • The stability-plasticity dilemma: a model must keep changing to acquire new knowledge, while at the same time not forgetting its existing knowledge
  • "Learning in a parallel and distributed system requires plasticity for the integration of new knowledge but also stability in order to prevent the forgetting of previous knowledge."
๐Ÿค– ML & DL

2022.12.03

Noisy label์ด๋ž€ ๋ฌด์—‡์„ ์˜๋ฏธํ•˜๋Š”๊ฐ€?

  • ๋ฐ์ดํ„ฐ ์…‹ ๋‚ด์— ๋ฐ์ดํ„ฐ์˜ labeling์ด ์ž˜ ๋ชป ๋˜์–ด์žˆ๋Š” ๊ฒฝ์šฐ๋ฅผ noisy label ํ˜น์€ labeling noise๋ผ๊ณ  ํ•จ. Large scale dataset์— ๋Œ€ํ•ด์„œ๋Š” label์„ ํ™•์ธํ•˜๋Š” ๊ณผ์ •์ด ํž˜๋“ค๋‹ค ๋ณด๋‹ˆ๊นŒ(๋ˆ๊ณผ ์‹œ๊ฐ„์ด ๋งŽ์ด ์†Œ์š”), ์ด๋Ÿฌํ•œ noisy label์ด ์ถฉ๋ถ„ํžˆ ์กด์žฌํ•  ์ˆ˜ ์žˆ์Œ
  • ์ข…์ข… semi-supervised learning ๋ถ„์•ผ์—์„œ๋„ ์‚ฌ์šฉ๋˜๋Š”๋ฐ, ์ด ๋•Œ๋Š” pseudo label ๊ธฐ๋ฐ˜์˜ self-training model์ด unlabeled dataset์— ์ž˜๋ชป pseudo labeling ํ•œ ๊ฒƒ์„ noisy label์ด๋ผ๊ณ  ๋ถ€๋ฅด๋Š” ๋“ฏ ํ•จ

Ad-hoc์ด๋ž€ ๋ฌด์—‡์„ ์˜๋ฏธํ•˜๋Š”๊ฐ€?

  • ์ผ๋ฐ˜์ ์œผ๋กœ๋Š” '์˜ค๋กœ์ง€ ํŠน์ • ํ•˜๋‚˜์˜ ๋ชฉ์ ๋งŒ์„ ์œ„ํ•ด ๊ณ ์•ˆ๋œ ๋ฐฉ๋ฒ•' ์ •๋„๋กœ ํ•ด์„ํ•ด๋ณผ ์ˆ˜ ์žˆ์Œ
๐Ÿค– ML & DL

2023.01.01

Anomaly-detection terminology, compiled with some help from ChatGPT.

  • Assuming the target (positive) class is 'dog', situations that can arise with new data:

    1. A dog, but of a new breed never seen before
    2. An entirely new class altogether, such as cat data
    3. Dog data that is damaged or contaminated
  • Novelty detection: term mainly used for discovering unseen data points, or discovering new trends and tendencies
  • Outlier detection: term mainly used for discovering data points that differ greatly from the existing data, or discovering contaminated or damaged data that should be removed
  • Anomaly detection: a relatively broad term covering both the novelty-detection and outlier-detection cases
  • However, the three terms above are mixed up very often, so they should be interpreted flexibly according to the paper and the situation
๐Ÿค– ML & DL

2023.01.11

Summarizing terminology related to object detection. First, the problem settings:

  • Localization: single object; set a bounding box for where the object is located in the image
  • Object detection: multiple objects; set bounding boxes for where the objects are located in the image, and also assign each its class
  • Segmentation: multiple objects; assign class information 'per pixel' for where the objects are located in the image
  • 2-stage approach: first propose locations likely to contain an object (region proposal, localization), then extract features based on those locations and assign classes
  • 1-stage approach: perform localization and classification at once; lower accuracy than 2-stage but faster
  • Region proposal methods

    1. Sliding window: slide a window and check whether an object exists inside it
    2. Selective search: measure the similarity of adjacent regions and merge them into progressively larger regions
  • NMS (non-maximum suppression): when several bounding boxes of the same class overlap, keep only the best-scoring one and suppress the rest
  • RoI = Region of Interest = region proposal
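
The NMS idea above can be sketched in a few lines of NumPy (a minimal toy implementation of my own, with hand-made boxes): keep the highest-scoring box and drop any remaining box that overlaps it beyond an IoU threshold.

```python
import numpy as np

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_thr=0.5):
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = np.array([j for j in rest if iou(boxes[i], boxes[j]) <= iou_thr])
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 (IoU 0.81) and is suppressed
```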

A brief summary of 2-stage detectors:

  • R-CNN: finds about 2000 region proposals via selective search, feeds every cropped image through a CNN to extract a feature vector, and finally sets the bounding box with a regressor and classifies with an SVM
  • Fast R-CNN: also finds about 2000 region proposals via selective search, but runs the CNN once over the whole image and extracts per-region features (RoI pooling) instead of cropping
  • Faster R-CNN: where its predecessors used CPU-based selective search, this algorithm proposes the GPU-based Region Proposal Network (RPN) for a speedup; otherwise identical to Fast R-CNN

A brief summary of 1-stage detectors:

  • YOLO: splits the image into an NxN grid and generates a prediction tensor
  • SSD: does not re-extract pixels or features to adjust the bounding boxes
๐Ÿค– ML & DL

2023.01.14

A brief summary of Bayesian inference.

  • Bayesian inference: a method of inferring the posterior probability of a target from its prior probability and additional information
  • Generally our goal is to compute $p(x^* \mid X)$: that is, based on the given data $X$, we should be able to make a correct prediction for test data $x^*$
  • It can be computed as $p(x^* \mid X) = \int p(x^* \mid \theta)\, p(\theta \mid X)\, d\theta$, where by Bayes' rule $p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$
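
A tiny discrete Bayes-rule example (the coins and numbers are my own, not from the note): two coins, one fair and one biased, with a uniform prior; observe heads once and update.

```python
# p(theta): uniform prior over the two hypotheses
priors = {"fair": 0.5, "biased": 0.5}
# p(X | theta): probability of observing heads under each hypothesis
likelihood_heads = {"fair": 0.5, "biased": 0.9}

evidence = sum(priors[t] * likelihood_heads[t] for t in priors)  # p(X)
posterior = {t: priors[t] * likelihood_heads[t] / evidence for t in priors}
print(posterior)  # {'fair': 0.357..., 'biased': 0.642...}

# Posterior predictive: p(x* | X) = sum_theta p(x* | theta) p(theta | X)
p_next_heads = sum(posterior[t] * likelihood_heads[t] for t in priors)
print(p_next_heads)
```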
๐Ÿค– ML & DL

2023.02.22

CLIP์— ๋Œ€ํ•ด ๊ฐ„๋‹จํžˆ ์ •๋ฆฌํ•ฉ๋‹ˆ๋‹ค

  • Natural language supervision: training an image model using a dataset in which images and texts form pairs
  • Contrastive pre-training: for a batch of images and their corresponding texts (sentences), extract image and text embeddings and train the model so that the similarity between matching pairs becomes high
  • For the target dataset, embed every class label, giving the text 'a photo of a {class label}' as input (prompt engineering!)
  • Finally, find the target-dataset 'a photo of a {class label}' embedding whose similarity with the test image's embedding is highest
๐Ÿค– ML & DL

2023.03.24

  • Domain generalization: train on the source domain, then generalize directly to the target domain
  • Domain adaptation: some labels also exist in the target domain, so retraining is possible
  • Style-based generalization: treats the Gram matrix, Maximum Mean Discrepancy (MMD), mean/variance, etc. as style
  • In general, CNNs are said to behave like high-pass filters that capture texture well (passing mainly high frequencies), while Transformers behave like low-pass filters that capture contours well. Accordingly, when adversarially attacking a CNN, painting a different texture onto an image degrades its prediction performance
๐Ÿค– ML & DL

2023.04.03

A brief note on Stable Diffusion

  • For text-to-image it uses a text encoder (CLIP's text encoder) and an image generator
  • Image generator: composed of an image information creator (UNet + scheduler) and an image decoder (the autoencoder's decoder)

    • Image information creator: latent space to latent space; performs the diffusion process
    • Image decoder: latent space to image space
  • Text conditioning: attention layers are added between the ResNet blocks inside the UNet, and token embeddings are given to each attention layer as input for conditioning
๐Ÿค– ML & DL

2023.04.08

Random thoughts on AI tech.

  • ์ตœ๊ทผ์— ๋‚˜์˜จ ๋…ผ๋ฌธ์ธ Segment Anything๊ณผ PIX2STRUCT๋ฅผ ์ฝ์œผ๋ฉฐ ๋“  (์ด์ „๋ถ€ํ„ฐ ์ž์ฃผ ํ–ˆ์ง€๋งŒ ๋” ๊ฐ•ํ•ด์ง„) ์ƒ๊ฐ์€, 'ํ•™์Šต์„ ์œ„ํ•œ task๋ฅผ ์–ด๋–ป๊ฒŒ ์ •์˜ํ•˜๋Š”์ง€', ๊ทธ๋ฆฌ๊ณ  '์ˆ˜๋งŽ์€ ์–‘์˜ training ๋ฐ์ดํ„ฐ๋ฅผ ์–ด๋–ป๊ฒŒ ๋ชจ์•„์•ผํ•˜๋Š”์ง€' ๊ณ ๋ฏผํ•˜๋Š” ๊ฒƒ์ด powerfulํ•œ ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ์ œ์ผ ์ค‘์š”ํ•œ ๊ธฐ๋ฐ˜์ด ๋  ๊ฒƒ์ด๋ผ๋Š” ๊ฒƒ
  • ๊ด€๋ จํ•˜์—ฌ Video PreTraining (VPT)๋„ ์ด๋Ÿฐ ์ƒ๊ฐ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์—ฐ๊ตฌ๋˜์—ˆ์Œ
๐Ÿงฉ ML library

2023.05.05

Lightning์—์„œ Distributed Data Parallel ์‚ฌ์šฉํ•  ๋•Œ ์ฐธ๊ณ ํ•  ์ ์— ๋Œ€ํ•ด ๊ธฐ๋กํ•ฉ๋‹ˆ๋‹ค.

  • ์ฐธ๊ณ  ๋งํฌ: https://github.com/Lightning-AI/lightning/discussions/6501#discussioncomment-553152
  • sync_dist=True ์˜ต์…˜์„ ์‚ฌ์šฉํ•˜๋ฉด ๋ชจ๋“  process์— ๋Œ€ํ•ด sync ๋งž์ถค. ๊ธฐ๋ณธ ์˜ต์…˜์€ reduced mean
  • ๋‹ค๋งŒ, torchmetrics๊ณผ ๊ด€๋ จํ•ด์„œ๋Š” own sync code๊ฐ€ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— self.log(...)์˜ sync_dist, sync_dist_op, sync_dist_group, reduce_fx, tbptt_reduce_fx flags๊ฐ€ metric logging์—๋Š” ์ „ํ˜€ ์˜ํ–ฅ์„ ์ฃผ์ง€ ์•Š์Œ
  • Metric sync๋Š” metric.compute() ํ•จ์ˆ˜ ํ˜ธ์ถœ์‹œ ๋™์ž‘ํ•จ
๐Ÿค– ML & DL

2023.05.05

Notes on Reinforcement Learning from Human Feedback (RLHF)

  • ์˜์ƒ ๋งํฌ: https://www.youtube.com/watch?v=2MBJOuVq380
  • ๋…ผ๋ฌธ ๋งํฌ: https://arxiv.org/pdf/2203.02155.pdf
  • RL์„ ์ด์šฉํ•˜์—ฌ human feedback์œผ๋กœ๋ถ€ํ„ฐ model์„ ํ•™์Šต์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•. ๋‹ค๋งŒ 2~3 ๋‹จ๊ณ„๋ฅผ ํ†ตํ•ด ์‹ค์ œ๋กœ ์™œ ํ•™์Šต์ด ๋˜๋Š”์ง€์— ๋Œ€ํ•ด ์ œ๋Œ€๋กœ ์ดํ•ดํ•˜์ง€ ๋ชปํ•ด์„œ ๋‹ค์‹œ ๊ณต๋ถ€ํ•  ํ•„์š” ์žˆ์Œ.
  • Pretraining a language model (LM)
  • Gathering data and training a reward model
  • Fine-tuning the LM with reinforcement learning
๐Ÿค– ML & DL

2023.05.05

VQ-VAE์— ๋Œ€ํ•ด ๊ธฐ๋กํ•ฉ๋‹ˆ๋‹ค.

  • AutoEncoder: a structure for extracting the latent variable $z$ well
  • VAE: the distribution of the $z$ encoding is given as a prior
  • VQ-VAE

    • Same structure as an autoencoder, except that the embedding in the codebook (K embeddings) closest to $z$ is fetched and used as the decoder input; because it goes through the codebook, this is vector quantization (see this blog post for an explanation of the codebook)
    • The posterior and prior are categorical distributions
    • One question: I'm curious whether K equals the number of image samples
    • Forward pass: as described above, fetch the most similar embedding from the codebook and feed it forward to the decoder
    • Backward pass: the decoder backpropagates as usual, but the codebook-selection step cannot be backpropagated through because of the argmin, so the decoder's gradient is copied straight to the encoder output
    • Loss: (reconstruction error for the encoder-decoder) + (l2 loss pulling the codebook embedding toward the encoder output) + (l2 loss pulling the encoder output toward the codebook embedding)
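
The quantization step described above can be sketched in NumPy (a toy forward pass of my own, with made-up sizes): for each encoder output, pick the nearest codebook embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # K=16 embeddings of dimension 4
z_e = rng.normal(size=(3, 4))        # encoder outputs for 3 positions

# pairwise squared distances, then argmin over the codebook axis
d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
idx = d.argmin(axis=1)
z_q = codebook[idx]  # quantized vectors fed to the decoder

# At train time the gradient copy would be done as z_e + stop_gradient(z_q - z_e).
print(idx.shape, z_q.shape)
```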
๐Ÿค– ML & DL

2023.05.12

Meta์—์„œ 5์›” 9์ผ์— ๋ฐœํ‘œํ•œ ImageBind์— ๋Œ€ํ•ด์„œ ๊ธฐ๋กํ•ฉ๋‹ˆ๋‹ค.

  • A model trained on 6 modalities (image/video, text, thermal, depth, audio, IMU) surpasses the performance of single-modality specialist models
  • Moreover, it can be extended not only to this but to multi-modality research transferring from one modality to another, for example generating images from audio
  • Enables cross-modal retrieval, embedding-space arithmetic, audio-to-image generation, and more
  • It is an aggregation of Meta's recent open-source AI tools, including DINO v2 and SAM
  • For the four additional modalities (audio, depth, thermal, and IMU readings), ImageBind uses naturally paired self-supervised data; i.e., ImageBind shows that pairing images or video with the other modalities is enough to combine all 6 modalities
๐Ÿค– ML & DL

2023.05.15

Comparing ViT and CNN: How Do Vision Transformers Work?

  • ViT, i.e. Multi-head Self-Attention (MSA), is shape(structure)-biased = a low-pass filter
  • ResNet, i.e. ConvNet, is texture-biased = a high-pass filter

Comparing CL ViT and MIM ViT: What Do Self-Supervised Vision Transformers Learn?

  • CL: self-attentions collapse into homogeneity / utilizes the low-frequency signals / plays a crucial role in the later layers
  • MIM: utilizes high-frequency signals / focuses on the early layers
๐Ÿค– ML & DL

2023.05.20

  • Thoughts on hyper-parameter tuning: the most convenient workflow seems to be writing a shell script that runs several experiment options according to a pre-defined rule, then pulling up only the desired options as a table in wandb runs
๐Ÿค– ML & DL

2023.05.20

ChatGPT's answer to the question of what criteria determine whether an AI is serviceable. Recorded because it seems well worth thinking through.

  1. Define requirements: Clearly identify the specific tasks or problems the AI model needs to address. Determine the desired input-output behavior, performance metrics, scalability, and any other relevant criteria.
  2. Training and validation data: The data should cover various scenarios that the AI model will encounter in real-world usage.
  3. Model selection: Consider factors like the model's architecture, complexity, size, computational requirements, and availability of resources.
  4. Model evaluation: Common metrics include accuracy, precision, recall, F1 score, or domain-specific metrics relevant to the task.
  5. Testing and validation: Deploy the AI model in a controlled or limited production environment. Validate its performance against real-world data or simulated scenarios, including edge cases and corner cases.
  6. Iterative improvement: Continuously monitor and evaluate the AI model's performance in a live or simulated environment. Collect user feedback and address any issues or limitations through iterative updates, such as fine-tuning, retraining, or architecture modifications.
  7. Ethical considerations: Evaluate the AI model's compliance with ethical guidelines, privacy requirements, and legal regulations.
  8. Scalability and resource requirements: Assess the AI model's scalability and resource demands, such as computing power, memory, or network bandwidth.
  9. Robustness and reliability: Test the AI model's robustness by subjecting it to adversarial attacks, noisy or incomplete data, or other challenging conditions. Assess its reliability by measuring its performance over an extended period, considering factors like model drift or degradation.
  10. Cost considerations: Evaluate the total cost of deploying and maintaining the AI model, including infrastructure, licensing, data storage, and ongoing support. Consider the model's value proposition and its impact on productivity, efficiency, or revenue generation.
๐Ÿค– ML & DL

A brief summary of DINO and DINO v2.

  • Properties of self-supervised ViT: it captures scene-layout boundaries well, and a k-NN classifier built on its features alone already performs well
  • However, good k-NN classifier performance was found to require a momentum encoder, multi-crop augmentation, and small patches
  • DINO: adopts the momentum-encoder-based BYOL approach, with a slightly different loss and the same teacher-student structure
  • DINO v2: at the image level, distinguishes different images; at the patch level, distinguishes different patches within the same image. Beyond that, it contributes a large amount of high-quality curated data and a fast, efficient training recipe
๐Ÿงฉ ML library

2023.08.12

  • Apache Arrow: high serialization/deserialization overhead is a common problem when handling data. Apache Arrow enables zero-copy reads with no serialization step, which is possible because it operates directly on the serialized data itself rather than on deserialized objects, as is usually done.

    • Main purpose: Language-independent open standards and libraries to accelerate and simplify in-memory computing
  • Huggingface datasets w. Arrow: as mentioned above, Arrow makes processing and moving large amounts of data fast (the Arrow format supports zero-copy reads, eliminating the serialization overhead), which is why Huggingface datasets is built on Arrow. The format is also column-oriented, so operations such as querying and slicing are fast.
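Arrow itself is out of scope here, but the zero-copy idea can be illustrated with numpy memory mapping (not Arrow's API): the on-disk layout matches the in-memory layout, so "loading" is just mapping the file rather than running a deserialization pass:

```python
import os
import tempfile
import numpy as np

# Write a column of data whose on-disk layout equals its in-memory layout.
path = os.path.join(tempfile.mkdtemp(), "column.npy")
np.save(path, np.arange(1_000_000, dtype=np.int64))

# Memory-mapped "read": no copy or parse of the payload happens up front;
# pages are faulted in lazily on access. Arrow's zero-copy reads rely on
# the same principle for its columnar format.
col = np.load(path, mmap_mode="r")
assert isinstance(col, np.memmap)
assert col[123] == 123
```

This is also why slicing a Huggingface dataset of many gigabytes is cheap: the data stays on disk and only the accessed ranges are touched.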
๐Ÿค– ML & DL

2024.08.05


Attention

  • Attention: The scaled dot-product attention mechanism is given by the equation below. It computes the similarity between the query and each key, and applies that similarity as a weight on the value mapped to each key. For self-attention, (1) the input passes through the Wq, Wk, Wv matrices to become query, key, and value embeddings, and (2) attention is performed among those query, key, and value embeddings.
Attentionโก(Q,K,V)=softmaxโก(QKTdk)V\operatorname{Attention}(Q, K, V)=\operatorname{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V


  • Multi-Head Attention (MHA): using several attention heads rather than one helps the model capture contextual information in different subspaces of the input and better understand complex patterns. Each attention operation needs one query, one key, and one value head, so H parallel operations require H query, key, and value heads each.
  • Multi-Query Attention (MQA): a variant that keeps only a single key head and a single value head, shared by all query heads.
  • Grouped-Query Attention (GQA): divides the H query heads into G groups for the attention computation. GQA-G has G groups of key and value heads, so GQA-H is identical to MHA and GQA-1 is identical to MQA. To convert an MHA checkpoint into a GQA checkpoint, the existing heads belonging to each group are mean-pooled into new key and value heads. GQA is nearly as fast as MQA while approaching MHA quality.
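The mean-pooling checkpoint conversion described above can be sketched as follows; the (H, d_model, d_head) layout is an assumption for illustration, not any particular library's checkpoint format:

```python
import numpy as np

def mha_to_gqa(k_heads, v_heads, n_groups):
    """Convert H per-head K/V projections into G grouped ones by mean pooling.

    k_heads, v_heads: arrays of shape (H, d_model, d_head).
    Heads are split into G contiguous groups; each group's K/V projections
    are averaged into a single head shared by that group's queries.
    """
    H = k_heads.shape[0]
    assert H % n_groups == 0, "head count must be divisible by group count"
    group = lambda a: a.reshape(n_groups, H // n_groups, *a.shape[1:]).mean(axis=1)
    return group(k_heads), group(v_heads)

k = np.random.rand(8, 64, 16)   # H=8 key heads
v = np.random.rand(8, 64, 16)
gk, gv = mha_to_gqa(k, v, n_groups=2)
assert gk.shape == (2, 64, 16)  # GQA-2: two shared key heads
```

With `n_groups=8` this reduces to the original MHA heads (each group is a single head), and with `n_groups=1` it produces the single shared K/V head of MQA.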

Pre-Training

  • Mixture of Experts (MoE): several expert subnetworks are trained so that each specializes in a different aspect of the data. During inference, only a subset of these experts is activated, reducing the computational burden while maintaining high performance.
  • Mixture of Depths (MoD): an approach that dynamically adjusts the model's depth, i.e. how much of the network each token passes through, during training and inference.
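A toy sketch of the sparse top-k routing behind MoE; the linear "experts" and all shapes here are illustrative, not any particular implementation:

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Sparse MoE: route each token to its top-k experts by gate score.

    x: (tokens, d); experts: list of (d, d) toy linear experts;
    gate_w: (d, n_experts) gating projection.
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of top-k experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(-1, keepdims=True)) # softmax over selected only
    w /= w.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                  # only k experts run per token
        for j, wj in zip(topk[t], w[t]):
            out[t] += wj * (x[t] @ experts[j])
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
experts = [rng.normal(size=(8, 8)) for _ in range(4)]
out = moe_forward(x, experts, rng.normal(size=(8, 4)), k=2)
assert out.shape == (4, 8)
```

The compute saving comes from the inner loop: per token, only `k` of the `n_experts` matrices are applied, while total parameter count still scales with `n_experts`.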

Instruction Tuning

  • Multi-Turn Instructions: training the model to understand and generate responses across multiple dialogue turns. This tuning method improves the model's ability to maintain context and consistency over extended interactions, making it useful for applications such as chatbots.
  • Instruction Following: the process of training a model to understand and execute given instructions. This is important for improving the model's ability to follow complex instructions accurately, and is especially useful in applications that demand precise and reliable task completion.

Alignment

  • Reinforcement Learning from Human Feedback (RLHF)

    1. Initial Training of the Language Model (Pre-training): the LLM is first pre-trained on a large amount of text data. In this stage the model learns the statistical patterns of language and acquires broad text generation and comprehension abilities.
    2. Supervised Fine-tuning: after pre-training, the model is fine-tuned for specific tasks, usually on human-labeled datasets, so that it performs better at things like answering questions in a particular format or writing in a particular style.
    3. Collecting Human Feedback: once the model reaches a reasonable level of performance, human feedback is collected on its generated text, typically as assessments of quality, accuracy, and relevance. This data is used to train a reward model.
    4. Training the Reward Model: a reward model is trained on the collected human feedback. It assigns a score to a given text, evaluating its quality and how well it matches the user's intent.
    5. Reinforcement Learning (RL) Fine-tuning: the trained reward model is used to fine-tune the LLM with reinforcement learning, most commonly Proximal Policy Optimization (PPO), by repeating the following:
       • Policy Generation: generate text with the current LLM.
       • Reward Evaluation: score the generated text with the reward model to compute the reward.
       • Policy Update: update the LLM's parameters to maximize the reward, using PPO to keep the policy optimization stable.
    6. Iterative Improvement: the model keeps improving through reinforcement learning. If needed, additional human feedback is collected to update the reward model, which then feeds back into RL fine-tuning, improving the model iteratively.
  • Direct Preference Optimization (DPO): fine-tunes the policy directly on human preference pairs, without training a separate reward model or running an RL loop. The reward is reparameterized in terms of the policy and a frozen reference model, reducing alignment to a simple classification-style loss on chosen vs. rejected responses.
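As a rough numpy sketch of the DPO objective (per-sequence log-probabilities are assumed to be precomputed; all names are illustrative):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    logp_w, logp_l       : log-prob of the chosen (w) / rejected (l) response
                           under the policy being trained
    ref_logp_w, ref_logp_l: the same under the frozen reference model
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# If the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss is small.
assert dpo_loss(-5.0, -9.0, -6.0, -8.0) < dpo_loss(-9.0, -5.0, -8.0, -6.0)
```

The `beta` term plays the role of the KL penalty in RLHF: a larger `beta` penalizes the policy for drifting far from the reference model.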

Decoding Strategies

  • Greedy Search: a simple decoding strategy in which the model selects the highest-probability token at each step. It is fast and intuitive, but because it never considers future possibilities it can miss better overall sequences.
  • Beam Search: a more sophisticated decoding strategy that maintains several candidate sequences (beams) at each step. Exploring multiple paths simultaneously makes it more likely than greedy search to find a near-optimal sequence, at a higher computational cost.
  • Top-k Sampling: a stochastic decoding strategy in which the model samples the next token from the k most probable candidates. It introduces diversity and reduces repetitive or deterministic output, improving the naturalness and variety of the generated text.
  • Top-p Sampling: top-p (nucleus) sampling selects the next token from the smallest candidate set whose cumulative probability exceeds the threshold p. This dynamically adjusts the size of the sampling pool, balancing diversity and coherence in the generated text.
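Top-k and top-p filtering of a next-token distribution can be sketched together; this is a toy numpy version, not any specific library's implementation:

```python
import numpy as np

def top_k_top_p_filter(probs, k=0, p=1.0):
    """Zero out tokens outside top-k and/or the top-p nucleus, then renormalize."""
    probs = probs.copy()
    order = np.argsort(probs)[::-1]              # token ids, most probable first
    if k > 0:
        probs[order[k:]] = 0.0                   # keep only the k best tokens
    if p < 1.0:
        cum = np.cumsum(probs[order])
        # smallest prefix whose cumulative probability reaches p
        cutoff = np.searchsorted(cum, p) + 1
        probs[order[cutoff:]] = 0.0
    return probs / probs.sum()

vocab_probs = np.array([0.5, 0.25, 0.15, 0.06, 0.04])
assert np.allclose(top_k_top_p_filter(vocab_probs, k=2), [2/3, 1/3, 0, 0, 0])
# nucleus p=0.8: {0.5, 0.25} only reaches 0.75, so the third token is included
filtered = top_k_top_p_filter(vocab_probs, p=0.8)
assert filtered[3] == 0 and filtered[2] > 0
```

Sampling then draws from the filtered distribution, e.g. `np.random.default_rng().choice(len(filtered), p=filtered)`.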

Efficient Tuning

  • Low-Rank Adaptation (LoRA): freezes all pretrained model weights, then adds trainable rank-decomposition matrices for the downstream task, enabling efficient fine-tuning.
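A minimal numpy sketch of the LoRA forward pass, using the common initialization (A small random, B zero) so that fine-tuning starts exactly at the pretrained weights:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA: y = x W + x (A B) * (alpha / r), with W frozen.

    W: (d_in, d_out) pretrained weight (frozen during fine-tuning).
    A: (d_in, r), B: (r, d_out) trainable low-rank factors, r << d_in.
    """
    r = A.shape[1]
    return x @ W + (x @ A @ B) * (alpha / r)

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4
W = rng.normal(size=(d_in, d_out))          # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01       # A: small random init...
B = np.zeros((r, d_out))                    # ...B: zero, so AB = 0 at start
x = rng.normal(size=(2, d_in))
assert np.allclose(lora_forward(x, W, A, B), x @ W)  # identical to W at init
```

Only A and B (here 2 * 64 * 4 parameters) are trained instead of the full 64 * 64 weight, which is where the efficiency comes from.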
