๐—›๐—ฎ๐—ฑ๐—ฎ๐—ฎ ๐—ฟ๐—ฎ๐—ฏ๐˜๐—ถ๐—ฑ ๐—ถ๐—ป๐—ฎ๐—ฎ ๐—ธ๐—ฎ ๐—ณ๐—ถ๐—ถ๐—ฐ๐—ป๐—ฎ๐—ฎ๐˜๐—ถ๐—ฑ ๐—ฎ๐˜ƒ๐—ฒ๐—ฟ๐—ฎ๐—ด๐—ฒ ๐—”๐—œ ๐˜‚๐˜€๐—ฒ๐—ฟ ๐—ฎ๐—บ๐—ฎ ๐—พ๐—ผ๐—ณ ๐—ฐ๐—ฎ๐—ฎ๐—ฑ๐—ถ ๐—ฒ๐—ต ๐—ผ๐—ผ ๐—–๐—ต๐—ฎ๐˜๐—š๐—ฃ๐—ง ๐—ฐ๐—ฎ๐—ฎ๐—ฑ๐—ถ ๐˜„๐—ฎ๐˜‚ ๐˜„๐—ฒ๐—ฒ๐˜†๐—ฑ๐—ถ๐—ถ๐˜†๐—ผ, ๐—พ๐—ผ๐—ฟ๐—ฎ๐—ฎ๐—น๐—ธ๐—ฎ๐—ฎ๐—ป ๐—ฎ๐—พ๐—ฟ๐—ถ๐˜€๐—ผ.


LLM explainer

@abdinajibmohamed12

ChatGPT and similar AIs are called Large Language Models (LLMs).

๐—ฆ๐—ถ ๐—ณ๐˜‚๐—ฑ๐˜‚๐—ฑ ๐˜€๐—ถ๐—ฑ๐—ฒ๐—ฒ ๐˜‚๐˜€๐—ต๐—ฎ๐—พ๐—ฒ๐—ฒ๐˜†๐—ฎ๐—ฎ๐—ป?

All an LLM does is this: when you write some text to it, it predicts which word would normally come next after the words you wrote.

Example: suppose you write to the AI:

“caasimada somalia magaceed” (“the name of Somalia's capital”)

The AI has a large amount of data it was trained on, most of it downloaded from the internet. Based on what it learned from that training data, it builds a list of candidate words that could come next.
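A rough way to picture "a list of candidate words learned from training data" is a toy bigram counter: for each word, count which words followed it in some text. This is only a sketch under simplifying assumptions: the four-sentence corpus below is made up for illustration, and real LLMs learn these patterns with a neural network over tokens rather than by counting word pairs.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data", made up for illustration only.
corpus = [
    "caasimada somaliya waa mogadishu",
    "caasimada somaliya waa mogadishu",
    "caasimada somaliya waa xamar",
    "caasimada somaliya mogadishu",  # less natural phrasing, so it appears less often
]

def count_next_words(sentences):
    """Map each word to a Counter of the words that followed it."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        # zip pairs each word with the word right after it
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

follows = count_next_words(corpus)
print(follows["somaliya"].most_common())  # [('waa', 3), ('mogadishu', 1)]
print(follows["waa"].most_common())       # [('mogadishu', 2), ('xamar', 1)]
```

The counts play the role of the candidate list: after "somaliya", "waa" was seen most often, so it ranks first.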

For example, inside the model it looks something like this (this happens internally; you never see it):

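  • Caasimada somaliya ___ (“The capital of Somalia ___”)
    • waa (“is”) [80%]
    • Mogadishu [75%]
    • xamar (the local name for Mogadishu) [65%]
    • kismaayo (Kismayo, another Somali city) [50%]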

Each of these words comes with a probability: the model's estimate of how likely that word is to come next. In the example above, “waa” was the word that most often followed “caasimada somalia” in the data the model was trained on, because people rarely write “caasimada Somalia Mogadishu”; that isn't natural Somali. The model goes with the statistically most likely option, so it picks “waa”:
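Picking the single most likely candidate, as described above, is often called greedy decoding. One detail worth knowing: the illustrative percentages in the list add up to more than 100%, whereas a real model's probabilities over all candidates sum to 100% (it typically converts raw scores into probabilities with a softmax). The sketch below assumes the list's numbers are raw scores, rescales them by simple division so they sum to 1, and picks the largest.

```python
# Scores copied from the illustrative list above (not real model output).
candidates = {"waa": 80, "Mogadishu": 75, "xamar": 65, "kismaayo": 50}

def normalize(scores):
    """Rescale scores so they sum to 1, like a probability distribution."""
    total = sum(scores.values())
    return {word: s / total for word, s in scores.items()}

def pick_greedy(probs):
    """Greedy decoding: return the word with the highest probability."""
    return max(probs, key=probs.get)

probs = normalize(candidates)
# Print candidates from most to least likely.
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.0%}")
print(pick_greedy(probs))  # waa
```

Rescaling does not change which word ranks first, so the choice is still “waa”.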

Caasimada somaliya waa ___

The result is then fed back into the AI, and it predicts the next word again:

  • Caasimada somaliya waa ___
    • Mogadishu [90%]
    • xamar [60%]
    • kismaayo [50%]

So it answers:

Caasimada somaliya waa Mogadishu. (“The capital of Somalia is Mogadishu.”)

You don't need to be an ML engineer to understand this. Whenever you hear “LLM”, keep this in mind: all it does is predict the word that comes next. It does not read and understand a complete sentence the way a human does.

๐—ง๐—ผ๐—ธ๐—ฒ๐—ป๐˜€

Above, I kept using the term “word” while explaining LLMs, but the correct term is “token”.

Here is what a token is: when an LLM makes its predictions, it breaks the text into small pieces called tokens. A token is sometimes longer than a word and sometimes shorter.
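To see how a token can be shorter than a word, here is a toy tokenizer that greedily splits text into the longest pieces it finds in a small vocabulary. The vocabulary is invented for illustration. ChatGPT's real tokenizer uses byte-pair encoding (BPE) with a vocabulary learned from data, so its actual splits will differ.

```python
# Hypothetical, hand-picked vocabulary (a real tokenizer learns its vocabulary).
VOCAB = {"caasi", "mada", "somali", "ya", "waa", " "}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match: at each position take the longest piece in vocab."""
    tokens = []
    i = 0
    longest = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Unknown characters fall back to single-character tokens.
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens

print(tokenize("caasimada somaliya waa"))
# ['caasi', 'mada', ' ', 'somali', 'ya', ' ', 'waa']
```

Three words become seven tokens here; joining the tokens back together always reproduces the original text.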

If you want to see how ChatGPT splits text into tokens, take a look at:

There you can see how any piece of text gets broken into tokens.