์ž๋ฐ”์Šคํฌ๋ฆฝํŠธ๋ฅผ ํ™œ์„ฑํ™” ํ•ด์ฃผ์„ธ์š”

DLI-01,02

 ·  โ˜• 5 min read  ·  โœ๏ธ brinst · ๐Ÿ‘€... ์กฐํšŒ์ˆ˜

DLI

ํšŒ์‚ฌ์—์„œ nvidia DLI๋ฅผ ์ˆ˜๋ฃŒํ• ์ˆ˜ ์žˆ๋Š” ๊ธฐํšŒ๋ฅผ ์ค˜์„œ ๊ด€๋ จ ๊ต์œก์„ ๋ฐ›๊ฒŒ ๋˜์—ˆ๋‹ค.
ํ•ด๋‹น ๊ต์œก์„ ๋ฐ›๊ณ  ๋‚ด์šฉ์„ ์ •๋ฆฌํ•˜๋Š” ๊ธ€์ด๋‹ค.

๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ

1. ๋ชจ๋ธ์— ๋Œ€ํ•œ ์ด๋ฏธ์ง€ ์ž…๋ ฅ์„ ๋‹จ์ˆœํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ํ‰ํ‰ํ•˜๊ฒŒ ๋งŒ๋“ ๋‹ค.

(28x28) โ‡’ (784)๋กœ 1์ฐจ์›์œผ๋กœ ๋‚ด๋ฆผ
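This flattening step can be sketched with NumPy's reshape (a minimal illustration; the zero array here is just a stand-in for real image data):

```python
import numpy as np

# Two dummy 28x28 grayscale images standing in for real data
images = np.zeros((2, 28, 28))

# Flatten each 28x28 image into a single 784-dimensional vector
flat = images.reshape(2, 784)
print(flat.shape)  # (2, 784)
```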

2. ๋ชจ๋ธ์— ๋Œ€ํ•ด ์ด๋ฏธ์ง€ ์ž…๋ ฅ ๊ฐ’์„ ๋” ์‰ฝ๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ์ •๊ทœํ™”ํ•œ๋‹ค.

์ •์ˆ˜ ๊ฐ’์„ 0๊ณผ 1์‚ฌ์ด์˜ ๋ถ€๋™ ์†Œ์ˆ˜์  ๊ฐ’์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ์„ ์ •๊ทœํ™”๋ผ๊ณ  ํ•œ๋‹ค.
๊ฐ€์žฅ ์‰ฌ์šด ๋ฐฉ๋ฒ•์€ ๋ชจ๋“  ํ”ฝ์…€ ๊ฐ’์„ ๊ฐ€์žฅ ํฐ ๊ฐ’์œผ๋กœ ๋‚˜๋ˆ„๋Š” ๊ฒƒ์ด๋‹ค. โ‡’ x_train = x_train / x_train.max()

3. ๋ชจ๋ธ์— ๋Œ€ํ•ด ๋ ˆ์ด๋ธ” ๊ฐ’์„ ๋” ์‰ฝ๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ ˆ์ด๋ธ” ๊ฐ’์„ ๋ถ„๋ฅ˜ํ•œ๋‹ค.

label

y ๊ฐ’์— ๋Œ€ํ•˜์—ฌ ์—ฌ๋Ÿฌ๊ฐœ ๊ฐ’์ค‘ ๋‹ต์ด ์กด์žฌํ•  ๋•Œ ์ด๊ฒƒ์„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ 0๊ณผ 1๋กœ ๋ณ€๊ฒฝํ•ด์ค˜์•ผํ•œ๋‹ค.

```python
from tensorflow import keras

num_categories = 3
y_train = keras.utils.to_categorical(y_train, num_categories)

# e.g. labels [0, 2, 1, 2] become these one-hot vectors:
values = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1]
]
```
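What `to_categorical` does can be reproduced in plain NumPy, which makes the mapping explicit (the input labels [0, 2, 1, 2] are assumed so that the result matches the vectors shown above):

```python
import numpy as np

# One-hot encode labels by indexing into an identity matrix
y_train = np.array([0, 2, 1, 2])
num_categories = 3

one_hot = np.eye(num_categories)[y_train]
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```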

๋ชจ๋ธ ์ƒ์„ฑ ๋ฐ ํ•™์Šต

tf.keras.layers.Dense

1. ๋ชจ๋ธ ์ธ์Šคํ„ด์Šคํ™”

```python
from tensorflow.keras.models import Sequential

model = Sequential()
```

2. Create the input layer

This is the step where layers are stacked onto the model.

units specifies the number of neurons in the layer. → the size of the output

activation : the activation function

As the graph below shows, relu outputs 0 for inputs less than 0 and grows linearly for inputs greater than 0.

(relu graph)

input_shape : specifies the shape of the input.

```python
from tensorflow.keras.layers import Dense

model.add(Dense(units=512, activation='relu', input_shape=(784,)))
```
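The relu behavior described above can be checked directly with a small sketch, independent of the model:

```python
import numpy as np

def relu(x):
    # Inputs below 0 become 0; inputs above 0 pass through unchanged
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.0, 3.0])))  # [0. 0. 0. 1. 3.]
```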

3. Create the hidden layer

Adding a hidden layer provides more parameters, so we can expect more accurate learning.

```python
model.add(Dense(units=512, activation='relu'))
```

4. Create the output layer

Looking at the y values, there are 10 possible outputs in total, so set units to 10 in the output layer.

Also, unlike the previous layers, the activation function used here is softmax rather than relu.

softmax normalizes its inputs into output values between 0 and 1, and the outputs always sum to exactly 1.

The difference from relu: relu maps values below 0 to 0 and grows linearly above that, whereas softmax produces values like 0.1, 0.5, 0.4.

In other words, the largest of these values can be picked out as the result.

```python
model.add(Dense(units=10, activation='softmax'))
```
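The softmax property described above, every output in (0, 1) and the outputs summing to 1, can be verified with a small NumPy sketch:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then normalize by the sum
    e = np.exp(z - z.max())
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 0.5]))
print(p)        # every value lies between 0 and 1
print(p.sum())  # the outputs sum to 1
```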

5. Model summary

This is how to inspect the layers stacked onto the model above.

```python
model.summary()
```

6. ๋ชจ๋ธ ์ปดํŒŒ์ผํ•˜๊ธฐ

๋ฐ์ดํ„ฐ๋กœ ๋ชจ๋ธ์„ trainingํ•˜๊ธฐ ์ „ ๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„๋กœ, ๋ชจ๋ธ์„ ์ปดํŒŒ์ผ ํ•˜๋Š” ๋‹จ๊ณ„์ด๋‹ค.

์—ฌ๊ธฐ์„œ๋Š” ๋ชจ๋ธ์ด ํ›ˆ๋ จ ์ค‘ ์–ผ๋งˆ๋‚˜ ์ž˜ ์ˆ˜ํ–‰๋˜๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋ชจ๋ธ์— ์‚ฌ์šฉํ•  loss function์„ ์ง€์ •ํ•œ๋‹ค.

[ML101] #3. Loss Function

loss function์ด๋ž€ ๊ฐ„๋‹จํ•˜๊ฒŒ ์–˜๊ธฐํ•˜๋ฉด, ๋ฐ์ดํ„ฐ๋ฅผ ํ† ๋Œ€๋กœ ์‚ฐ์ถœํ•œ ๋ชจ๋ธ์˜ ์˜ˆ์ธก ๊ฐ’๊ณผ ์‹ค์ œ ๊ฐ’๊ณผ์˜ ์ฐจ์ด๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ์ง€ํ‘œ์ด๋‹ค.

loss function์˜ ํ•จ์ˆ˜๊ฐ’์ด ์ตœ์†Œํ™” ๋˜๋„๋ก ํ•˜๋Š” weight์™€ bias๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๋‹ค. โ‡’ ํ•™์Šต

๋˜ํ•œ ๋ชจ๋ธ์ด ํ›ˆ๋ จํ•˜๋Š” ๋™์•ˆ ์ •ํ™•์„ฑ์„ ์ถ”์ ํ•˜๊ณ ์ž ํ•œ๋‹ค๊ณ  ๋ช…์‹œํ•œ๋‹ค.

```python
model.compile(loss='categorical_crossentropy', metrics=['accuracy'])
```
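For intuition, categorical cross-entropy for a single one-hot label can be written out by hand (the numbers here are illustrative, not real model output):

```python
import numpy as np

y_true = np.array([0, 1, 0])        # one-hot label: the answer is class 1
y_pred = np.array([0.1, 0.8, 0.1])  # softmax output of a hypothetical model

# Categorical cross-entropy: -sum(y_true * log(y_pred))
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # -log(0.8) ≈ 0.223, small because the prediction matches the label
```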

7. ๋ชจ๋ธ ํ›ˆ๋ จํ•˜๊ธฐ

์ด์ œ ํ›ˆ๋ จ ๋ฐ ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์™€ ๋ชจ๋ธ์„ ์ค€๋น„ํ–ˆ์œผ๋‹ˆ, ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋กœ ๋ชจ๋ธ์„ ํ›ˆ๋ คํ•˜๊ณ  ๊ฒ€์ฆ๋ฐ์ดํ„ฐ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฒ€์ฆํ•˜๋ฉด ๋œ๋‹ค.

1
2
3
4
5
history = model.fit(
    x_train, y_train, epochs=5, verbose=1, validation_data=(x_valid, y_valid)
)

  • epochs ⇒ the number of times training runs over the entire dataset. If epochs is too small, underfitting is likely; if it is too large, overfitting is likely.

  • verbose ⇒ an argument that reports progress during training.

    (0 = silent, 1 = progress bar, 2 = one line per epoch)

  • validation_data ⇒ the validation data

  • model.get_weights() returns the model's weight W (slope) and bias (intercept) values.

verbose๋ฅผ ์‚ฌ์šฉํ–ˆ๊ธฐ์— ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋กœ๊ทธ๊ฐ€ ์ถœ๋ ฅ๋˜๋Š” ๊ฒƒ์„ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ๋‹ค.

```
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1933 - accuracy: 0.9426 - val_loss: 0.1238 - val_accuracy: 0.9661
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1017 - accuracy: 0.9736 - val_loss: 0.1087 - val_accuracy: 0.9745
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0803 - accuracy: 0.9801 - val_loss: 0.1226 - val_accuracy: 0.9764
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0692 - accuracy: 0.9830 - val_loss: 0.1370 - val_accuracy: 0.9776
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0640 - accuracy: 0.9866 - val_loss: 0.1534 - val_accuracy: 0.9753
```

compile ์‹œ์— merics=[‘accuracy’] ์ถ”์  ์˜ต์…˜์„ ๋„ฃ์—ˆ๊ธฐ์— ๋‹ค์Œ๊ณผ ๊ฐ™์ด accuracy๊ฐ€ ์ถœ๋ ฅ๋˜๋Š” ๊ฒƒ์„ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ๋‹ค.

val_๋Š” ๊ฒ€์ฆ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์ถœ๋ ฅ ๊ฐ’์ด๋‹ค.

Test

optimizer ์‚ฌ์šฉ

Even with a small number of epochs, the loss comes out low: about 15.

๋งŽ์ด ์‚ฌ์šฉํ•˜๋Š” optimizer๋Š” Adm, RMSProp์ด๋ผ๊ณ  ํ•œ๋‹ค. ํ•˜์ง€๋งŒ ์•„๋ž˜์˜ˆ์ œ์—์„œ๋Š” ๋‹ค๋ฅธ optimizer๋ฅผ ์‚ฌ์šฉํ•ด๋„ ๋กœ์Šค์œจ์ด ๋†’์Œ. (๊ทธ๋ž˜๋„ ์•ˆ์“ฐ๋Š” ๊ฒƒ๋ณด๋‹ค๋Š” ํšจ์œจ์ด ํ›จ์”ฌ ์ž˜๋‚˜์˜จ๋‹ค. Adam๋Š” 3500์ •๋„์— ์ตœ์ € ๋กœ์Šค๋ฅผ ์ฐ์Œ)

์ฆ‰, ํ•ด๋‹น ์˜ˆ์ œ์—์„œ๋Š” SGD๊ฐ€ ์ตœ์ ํ™”๋˜์–ด์žˆ๋Š” ๊ฒƒ ๊ฐ™์Œ.

SADAdam๊ณผ RMSProp๋Š” ์–ด๋Š์ •๋„ ๋ณต์žกํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ์ž˜ ๋งž๋Š” ์˜ตํ‹ฐ๋งˆ์ด์ €์ธ๋ฐ ํ•˜๋‹จ์˜ ์˜ˆ์ œ๊ฐ€ ๋„ˆ๋ฌด ๊ฐ„๋‹จํ•ด์„œ ํšจ์œจ์ด ์ž˜ ์•ˆ๋‚˜์˜ฌ์ˆ˜๋„ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

m = -2  # -2 to start, change me please
b = 40  # 40 to start, change me please

# Sample data
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([10, 20, 25, 30, 40, 45, 40, 50, 60, 55])

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=1, input_shape=(1,)))
# a learning rate of 0.01 is a common choice for fine-grained learning
sgd = tf.keras.optimizers.SGD(learning_rate=0.01)
model.compile(loss='mse', optimizer=sgd)
model.summary()
history = model.fit(x, y, epochs=500, verbose=1)

# get_weights() returns [W, b]: the slope and the intercept
m = model.get_weights()[0][0]
b = model.get_weights()[1][0]
y_hat = x * m + b
plt.plot(x, model.predict(x), label='prediction')
plt.plot(x, y, '.')
plt.plot(x, y_hat, '-')
plt.show()

print("Loss:", np.sum((y - y_hat) ** 2) / len(x))
```

optimizer ์„ค์ •์—†์ด ๊ตฌํ•˜๋Š” ๋ฐฉ๋ฒ•

epochs์„ 10000์„ ์คฌ์Œ์—๋„ sgd๋ฅผ ์‚ฌ์šฉํ• ๋•Œ๋ณด๋‹ค loss๊ฐ€ ๋†’์Œ(20~25)

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Sample data
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([10, 20, 25, 30, 40, 45, 40, 50, 60, 55])

# Instead of Sequential + an explicit SGD optimizer, build the same
# one-neuron model with the functional API
X = tf.keras.layers.Input(shape=[1])
Y = tf.keras.layers.Dense(1)(X)
model = tf.keras.models.Model(X, Y)
model.compile(loss='mse')  # no optimizer given: Keras falls back to its default
model.summary()
history = model.fit(x, y, epochs=10000, verbose=1)

m = model.get_weights()[0][0]
b = model.get_weights()[1][0]
print(m, b)
y_hat = x * m + b
plt.plot(x, model.predict(x), label='prediction')
plt.plot(x, y, '.')
plt.plot(x, y_hat, '-')
plt.show()

print("Loss:", np.sum((y - y_hat) ** 2) / len(x))
```

polyfit์„ ์‚ฌ์šฉํ•ด์„œ ๊ตฌํ•˜๋Š” ๋ฐฉ๋ฒ•

๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ• loss : 14

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([10, 20, 25, 30, 40, 45, 40, 50, 60, 55])

# Fit a degree-1 polynomial (a straight line): returns the slope and intercept
m, b = np.polyfit(x, y, 1)
y_hat = x * m + b
plt.plot(x, y, '.')
plt.plot(x, y_hat, '-')
plt.show()

print("Loss:", np.sum((y - y_hat) ** 2) / len(x))
```
๊ณต์œ ํ•˜๊ธฐ

brinst
๊ธ€์“ด์ด
brinst
Backend Developer

๋ชฉ์ฐจ