Module:data consistency check
This module checks the validity and internal consistency of the language, language family and script data used on Wiktionary: the modules in Category:Language data modules, as well as Module:scripts/data.
Output
Module:etymology languages/data
- Bashkardi language (bsg-bas) has a canonical name that is not unique; it is also used by the code bsg.
- Rudbari language (rdb-rud) has a canonical name that is not unique; it is also used by the code rdb.
- Chali language (tks-cal) has a canonical name that is not unique; it is also used by the code tgf.
- Middle Iranian family (ira-mid) has no child families or languages.
- Old Iranian family (ira-old) has no child families or languages.
- Blissymbols script (Blis) is not used by any language and has no characters listed for auto-detection.
- Cypro-Minoan script (Cpmn) is not used by any language.
- Hieratic script (Egyh) is not used by any language and has no characters listed for auto-detection.
- Elymaic script (Elym) is not used by any language.
- Hiragana script (Hira) is not used by any language.
- Nyiakeng Puachue Hmong script (Hmnp) is not used by any language.
- Kana script (Hrkt) is not used by any language.
- Image-rendered script (Imag) is not used by any language and has no characters listed for auto-detection.
- International Phonetic Alphabet script (Ipach) is not used by any language and has no characters listed for auto-detection.
- Kpelle script (Kpel) is not used by any language and has no characters listed for auto-detection.
- Loma script (Loma) is not used by any language and has no characters listed for auto-detection.
- Moon script (Moon) is not used by any language and has no characters listed for auto-detection.
- Morse code (Morse) is not used by any language and has no characters listed for auto-detection.
- Musical notation script (Music) is not used by any language.
- Nag Mundari script (Nagm) is not used by any language.
- unspecified script (None) is not used by any language and has no characters listed for auto-detection.
- Rongorongo script (Roro) is not used by any language and has no characters listed for auto-detection.
- Rumi numerals script (Rumin) is not used by any language.
- flag semaphore (Semap) is not used by any language and has no characters listed for auto-detection.
- Visible Speech script (Visp) is not used by any language and has no characters listed for auto-detection.
- Vithkuqi script (Vith) is not used by any language.
- Woleai script (Wole) is not used by any language and has no characters listed for auto-detection.
- Yezidi script (Yezi) is not used by any language.
- mathematical notation script (Zmth) is not used by any language.
- symbolic script (Zsym) is not used by any language.
- undetermined script (Zyyy) is not used by any language and has no characters listed for auto-detection.
- uncoded script (Zzzz) is not used by any language and has no characters listed for auto-detection.
- The data key sort_by_scraping for Japanese script (Jpan) is invalid.
Checks performed
For multiple data modules:
- Codes for languages, families and etymology-only languages must be unique and must not clash with one another.
- Canonical names for languages, families and etymology-only languages must not be found in the list of other names.
- Each name in the list of other names must appear only once.
- otherNames, if present, must be an array.
- Wikidata item IDs must be either a positive integer or a string starting with Q and followed by decimal digits.
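The Wikidata ID rule and the other-names rules above could be expressed roughly as follows. This is a minimal plain-Lua sketch, not code from this module. The helper names is_valid_wikidata_id and check_other_names are hypothetical, and the sketch returns booleans or lists of messages instead of calling the module's discrepancy() function.

```lua
-- Hypothetical helpers illustrating two cross-module checks.
-- Plain Lua (5.1+); no Scribunto mw.* functions are used.

-- A Wikidata item ID is accepted if it is a positive integer
-- or a string consisting of "Q" followed by decimal digits.
local function is_valid_wikidata_id(id)
	if type(id) == "number" then
		return id > 0 and id < math.huge and id == math.floor(id)
	elseif type(id) == "string" then
		return id:find("^Q%d+$") ~= nil
	end
	return false
end

-- Checks a list of other names against the canonical name.
-- Reports every occurrence of the canonical name, and each other
-- name that appears more than once (each duplicate reported once).
local function check_other_names(canonical_name, names)
	local problems, counts = {}, {}
	for _, name in ipairs(names) do
		if name == canonical_name then
			problems[#problems + 1] = "canonical name repeated: " .. name
		elseif counts[name] == 1 then
			problems[#problems + 1] = "duplicate name: " .. name
		end
		counts[name] = (counts[name] or 0) + 1
	end
	return problems
end
```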
The following must be true of the data used by Module:languages:
- Each code must be defined in the correct submodule, according to whether it is a two-letter, three-letter or exceptional code.
- The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
- If field 2 is not nil, it must be a valid Wikidata item ID.
- If field 3 or family is given and not nil, it must be a valid family code.
- If field 4 or scripts is given and not nil, it must be an array (a comma-separated string is also accepted), and each string in the array must be a valid script code.
- If ancestors is given, it must be an array (a comma-separated string is also accepted), and each string in the array must be a valid language or etymology-language code.
- If family is given, it must be a valid family code.
- If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
- If entry_name is given, it must be a table containing two arrays (from and to), a string (remove_diacritics), or both.
- If sort_key is given, it may be either a string, or a table containing two arrays (from and to) or a string (remove_diacritics).
- If entry_name or sort_key is given, the from array must be longer than or the same length as the to array.
- If standardChars is given as a string, it must form a valid Lua string pattern when placed between square brackets with ^ before it ([^...]). If it is given as a table, each key must be 1 or one of the language's script codes.
- If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
- If link_tr is present, it must be true or a string.
- There are no data keys besides: 1, 2, 3, 4, display_text, generate_forms, entry_name, sort_key, otherNames, aliases, varieties, ietf_subtag, type, ancestors, wikimedia_codes, wikipedia_article, standardChars, translit, override_translit, link_tr, dotted_dotless_i.
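The following minimal plain-Lua sketch shows the array rules for the table form of entry_name and sort_key. It is modelled on the module's find_gap and check_entry_name_or_sortkey functions but is not a copy of them. check_replacements is a hypothetical name, and messages are returned as a list instead of being reported through discrepancy().

```lua
-- Hypothetical sketch of the array checks for entry_name / sort_key tables.
-- Assumes keys are integers or non-numbers (fractional keys are not handled).

-- Returns the first missing index in the array part of t, or nil if the
-- numeric keys form an unbroken sequence 1..n. Non-numeric keys are ignored.
local function find_gap(t)
	local count = 0
	for k in pairs(t) do
		if type(k) == "number" then
			count = count + 1
		end
	end
	for i = 1, count do
		if t[i] == nil then
			return i
		end
	end
	return nil
end

-- Validates entry_name or sort_key and returns a list of problem
-- descriptions (empty if nothing is wrong).
local function check_replacements(field, repl)
	local problems = {}
	local function report(msg)
		problems[#problems + 1] = field .. ": " .. msg
	end
	-- The string form (a module name) is not validated here.
	if type(repl) == "string" then
		return problems
	elseif type(repl) ~= "table" then
		report("must be a table")
		return problems
	end
	if repl.remove_diacritics ~= nil and type(repl.remove_diacritics) ~= "string" then
		report("remove_diacritics must be a string")
	end
	if (repl.from == nil) ~= (repl.to == nil) then
		report("from and to must be both defined or both undefined")
		return problems
	end
	if repl.from == nil then
		return problems
	end
	local arrays_ok = true
	for _, key in ipairs({ "from", "to" }) do
		local arr = repl[key]
		if type(arr) ~= "table" then
			report(key .. " must be an array")
			arrays_ok = false
		else
			local gap = find_gap(arr)
			if gap then
				report(key .. " has a gap at index " .. gap)
				arrays_ok = false
			end
		end
	end
	-- Only compare lengths once both arrays are gap-free, so that the
	-- length operator # is well defined for them.
	if arrays_ok and #repl.to > #repl.from then
		report("to must not be longer than from")
	end
	return problems
end
```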
Checks not performed:
- If translit is present, it must be the name of a module, and this module must contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
- If sort_key is a string, it must be the name of a module, and this module must contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
- If entry_name or sort_key is a table and contains a remove_diacritics field, the value of that field must be a string that forms a valid Lua pattern when placed inside negated set brackets ([^...]).
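The remove_diacritics condition is not checked here. The technique the module uses for standardChars and characters patterns would still apply: call the pattern matcher inside pcall and see whether it raises an error. The hypothetical sketch below uses the standard byte-based string.find. The module itself uses a Unicode-aware matcher, so results for multibyte ranges may differ.

```lua
-- Hypothetical sketch: does `body` form a valid Lua pattern when wrapped
-- in a character set, "[...]" or (negated) "[^...]"?
local function is_valid_set_body(body, negated)
	if type(body) ~= "string" then
		return false
	end
	local pattern = (negated and "[^" or "[") .. body .. "]"
	-- string.find raises an error on a malformed pattern; pcall turns
	-- that error into a false status instead of aborting the caller.
	local ok = pcall(string.find, "", pattern)
	return ok
end
```

A reversed range such as z-a is not a Lua error (it simply matches nothing), which is why the module's validate_pattern checks ranges separately.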
These are not checked here because module errors will quickly crop up in entries if these conditions are not met. Module:utilities tries to generate a sort key for any category pertaining to the language in question, and the transliteration module is called whenever a term is transliterated.
Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no others.
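This two-way consistency requirement can be sketched as a comparison of the data against both lookup maps. The sketch is a hypothetical, simplified version of what check_code_to_name_and_name_to_code_maps does. It ignores alias codes, which the module resolves through Module:languages/data, and the name compare_maps is illustrative.

```lua
-- Hypothetical sketch: checks that a code->name map and a name->code map
-- agree exactly with a data table of the form { [code] = { canonical_name, ... } }.
local function compare_maps(data, code_to_name, name_to_code)
	local problems = {}
	local function report(fmt, ...)
		problems[#problems + 1] = string.format(fmt, ...)
	end
	-- Every data entry must be present in both maps.
	for code, entry in pairs(data) do
		local name = entry[1]
		if code_to_name[code] ~= name then
			report("code_to_name[%s] should be %s", code, tostring(name))
		end
		if name == nil or name_to_code[name] ~= code then
			report("name_to_code[%s] should be %s", tostring(name), code)
		end
	end
	-- Neither map may contain entries that are absent from the data.
	for code in pairs(code_to_name) do
		if data[code] == nil then
			report("extraneous code %s", code)
		end
	end
	for name, code in pairs(name_to_code) do
		if data[code] == nil or data[code][1] ~= name then
			report("extraneous name %s", name)
		end
	end
	return problems
end
```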
The following must be true of the data used by Module:etymology languages:
- canonicalName must be given.
- parent must be a valid language, family or etymology-only language code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology-language code. The etymology language must also be listed as the ancestor of a regular language.
- There are no data keys other than the recognised ones.
Codes in the Module:families data must:
- Have a canonicalName, which must not be the same as the canonical name of another family.
- If family is given, it must be a valid family code.
- Have at least one language or subfamily belonging to it.
- Have no data keys other than the recognised ones.
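The "at least one language or subfamily" rule amounts to collecting every family code referenced in field 3 of the language and family data, then reporting the families that were never referenced. In this module, field 3 of a language is its family and field 3 of a family is its parent family. The module also supports a set of exempt families (allowed_empty_families). The sketch below assumes that layout; its function name and sample codes are hypothetical.

```lua
-- Hypothetical sketch of the "every family has a member" check.
-- languages: { [code] = { canonical_name, wikidata_item, family_code, ... } }
-- families:  { [code] = { canonical_name, wikidata_item, parent_family_code } }
-- allowed_empty: optional set of family codes exempt from the check.
local function find_empty_families(languages, families, allowed_empty)
	allowed_empty = allowed_empty or {}
	local nonempty = {}
	-- A family is non-empty if some language or subfamily names it in field 3.
	for _, data in pairs(languages) do
		if data[3] then
			nonempty[data[3]] = true
		end
	end
	for _, data in pairs(families) do
		if data[3] then
			nonempty[data[3]] = true
		end
	end
	local empty = {}
	for code in pairs(families) do
		if not nonempty[code] and not allowed_empty[code] then
			empty[#empty + 1] = code
		end
	end
	table.sort(empty) -- pairs() order is unspecified; sort for stable output
	return empty
end
```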
Codes in the Module:scripts data must:
- Have a canonicalName.
- Have at least one language that lists it as one of its scripts.
- Have a characters pattern for script auto-detection, which must form a valid Lua string pattern when placed between square brackets ([...]). (It should match all the characters in the script, but this cannot be checked.)
- Have no data keys other than the recognised ones.
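The script requirements can be sketched in a similar way. The sketch collects the script codes listed in field 4 of each language (as an array or a comma-separated string) and then checks each script entry. The messages imitate the wording of the output listed at the top of this page. The function check_scripts and the sample data are hypothetical, and the real module's message logic may differ.

```lua
-- Hypothetical sketch of the script checks.
-- scripts:   { [code] = { canonical_name, characters = "<pattern>", ... } }
-- languages: { [code] = { canonical_name, wikidata_item, family, script_codes } }
local function check_scripts(scripts, languages)
	local used = {}
	for _, data in pairs(languages) do
		local sc = data[4]
		if type(sc) == "string" then
			-- Comma-separated form, e.g. "Latn, Cyrl".
			for part in sc:gmatch("[^,%s]+") do
				used[part] = true
			end
		elseif type(sc) == "table" then
			for _, code in ipairs(sc) do
				used[code] = true
			end
		end
	end
	local messages = {}
	for code, data in pairs(scripts) do
		local problems = {}
		if not used[code] then
			problems[#problems + 1] = "is not used by any language"
		end
		if data.characters == nil then
			problems[#problems + 1] = "has no characters listed for auto-detection"
		end
		if #problems > 0 then
			messages[#messages + 1] = string.format("%s (%s) %s.",
				data[1], code, table.concat(problems, " and "))
		end
	end
	table.sort(messages) -- pairs() order is unspecified; sort for stable output
	return messages
end
```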
-- TODO:
-- ietf_subtag field used with a 2/3-letter language/family code except qaa-qtz, or a 4-letter script code.
-- Check against files containing up-to-date ISO data, to cross-check validity.
local m_languages = require("Module:languages")
local m_language_data = require("Module:languages/data/all")
local m_language_codes = require("Module:languages/code to canonical name")
local m_language_canonical_names = require("Module:languages/canonical names")
local m_etym_language_data = require("Module:etymology languages/data")
local m_etym_language_codes = require("Module:etymology languages/code to canonical name")
local m_etym_language_canonical_names = require("Module:etymology languages/canonical names")
local m_family_data = require("Module:families/data")
local m_family_codes = require("Module:families/code to canonical name")
local m_family_canonical_names = require("Module:families/canonical names")
local m_scripts = require("Module:scripts")
local m_script_data = require("Module:scripts/data")
local m_links = require("Module:links")
local m_script_utils = require("Module:script utilities")
local m_str_utils = require("Module:string utilities")
local m_table = require("Module:table")
local Array = require("Module:array")
local codepoint = m_str_utils.codepoint
local concat = table.concat
local dump = mw.dumpObject
local gcodepoint = m_str_utils.gcodepoint
local get_lang = m_languages.getByCode
local insert = table.insert
local list_to_text = mw.text.listToText
local new_title = mw.title.new
local split = m_str_utils.split
local ugmatch = m_str_utils.gmatch
local umatch = m_str_utils.match
local export = {}
local messages
local function discrepancy(modname, ...)
local ok, result = pcall(function(...) messages[modname]:insert(string.format(...)) end, ...)
if not ok then
mw.log(result, ...)
end
end
local all_codes = {}
local language_names = {}
local etym_language_names = {}
local family_names = {}
local script_names = {}
local nonempty_families = {}
local allowed_empty_families = {tbq = true}
local nonempty_scripts = {}
do
local function link_lang(name)
if name:find("[Ll]anguage$") then
return "[[:Category:" .. name .. "|" .. name .. "]]"
else
return "[[:Category:" .. name .. " language|" .. name .. " language]]"
end
end
local function link_etym_lang(name)
if name:find("[Ll]anguage$") then
return name
else
return name .. " language"
end
end
local function link_family(name)
if name:match("[Ll]anguages$") or name:match("[Ll]ects$") then
return "[[:Category:" .. name .. "|" .. name .. " family]]"
else
return "[[:Category:" .. name .. " languages|" .. name .. " family]]"
end
end
function export.link(data)
if not data[1] then
return "???"
end
local data_type = data.type or ""
return data_type:match("etymology%-only") and link_etym_lang(data[1]) or
data_type:match("family") and link_family(data[1]) or
link_lang(data[1])
end
end
local link = export.link
local function link_script(name)
if not name then
return "???"
elseif name:find("[Cc]ode$") or name:find("[Ss]emaphore$") then
return "[[:Category:" .. name:gsub("^%l", string.upper) .. "|" .. name .. "]]"
else
return "[[:Category:" .. name .. " script|" .. name .. " script]]"
end
end
local function invalid_keys_message(modname, code, data, invalid_keys, is_script)
local plural = #invalid_keys ~= 1
discrepancy(modname, "The data key%s %s for %s (<code>%s</code>) %s invalid.",
plural and "s" or "",
invalid_keys
:map(
function(key)
return "<code>" .. key .. "</code>"
end)
:concat(", "),
is_script and link_script(data[1]) or link(data),
code,
plural and "are" or "is")
end
local function check_data_keys(valid_keys, is_script)
valid_keys = Array(valid_keys):to_set()
return function (modname, code, data)
local invalid_keys
for k in pairs(data) do
if not valid_keys[k] then
invalid_keys = invalid_keys or Array()
invalid_keys:insert(k)
end
end
if invalid_keys then
invalid_keys_message(modname, code, data, invalid_keys, is_script)
end
end
end
-- Modification of isArray in [[Module:table]].
-- This assumes all keys are either integers or non-numbers.
-- If there are fractional numbers, the results might be incorrect.
-- For instance, find_gap{"a", "b", [0.5] = true} evaluates to 3, but there
-- isn't a gap at 3 in the sense of there being an integer key greater than 3.
local function find_gap(t, can_contain_non_number_keys)
local i = 0
for k in pairs(t) do
if not (can_contain_non_number_keys and type(k) ~= "number") then
i = i + 1
if t[i] == nil then
return i
end
end
end
end
local function check_true_or_string_or_nil(modname, code, data, field_name)
local field = data[field_name]
if not (field == nil or field == true or type(field) == "string") then
discrepancy(modname,
"%s (<code>%s</code>) has an <code>%s</code> value that is not <code>nil</code>, <code>true</code> or a string: <code>%s</code>",
link(data), code, field_name,
dump(data[field_name])
)
end
end
local function check_array(modname, code, canonical_name, data, array_name, subarray_name, can_contain_non_number_keys)
local subtable = data
if subarray_name then
subtable = assert(data[subarray_name], subarray_name)
end
local array_type = type(subtable[array_name])
if array_type == "table" then
local gap = find_gap(subtable[array_name], can_contain_non_number_keys)
if gap then
discrepancy(modname, "The %s array in %sthe data table for %s (<code>%s</code>) has a gap at index %d.",
array_name,
subarray_name and "the " .. subarray_name .. " field in " or "",
canonical_name,
code, gap)
else
return true
end
else
discrepancy(modname, "The %s field in %sthe data table for %s (<code>%s</code>) should be an array (table) but is %s.",
array_name,
subarray_name and "the " .. subarray_name .. " field in " or "",
canonical_name,
code,
array_type == "nil" and "nil" or "a " .. array_type)
end
end
local function check_no_alias_codes(modname, mod_data)
local lookup, discrepancies = {}, {}
for k, v in pairs(mod_data) do
local check = lookup[v]
if check then
discrepancies[check] = discrepancies[check] or {"<code>" .. check .. "</code>"}
insert(discrepancies[check], "<code>" .. k .. "</code>")
else
lookup[v] = k
end
end
for _, v in pairs(discrepancies) do
discrepancy(modname, "The codes " .. list_to_text(v, ", ", " and ") .. " are currently alias codes. Only one code should be used in the data.")
end
end
local function check_wikidata_item(modname, code, data, key)
local data_item = data[key]
if data_item == nil then
return
elseif type(data_item) == "number" then
if not require "Module:table".isPositiveInteger(data_item) then
discrepancy(modname, "%g, the Wikidata item id for %s (<code>%s</code>), is not a positive integer or a string in the correct format.",
data_item, data[1], code)
end
elseif type(data_item) == "string" then
if not data_item:find "^Q%d+$" then
discrepancy(modname, "%s, the Wikidata item id for %s (<code>%s</code>), is not a string in the correct format or a positive integer.",
data_item, data[1], code)
end
end
end
local function check_other_names_or_aliases(modname, code, canonical_name, data, data_key, allow_nested)
local array = data[data_key]
if not array then
return
end
check_array(modname, code, canonical_name, data, data_key, nil, true)
local names = {}
local function check_other_name(other_name)
if other_name == canonical_name then
discrepancy(modname,
"%s, the canonical name for <code>%s</code>, is repeated in the table of <code>%s</code>.",
canonical_name, code, data_key)
end
if names[other_name] then
discrepancy(modname,
"The name %s is found twice or more in the list of <code>%s</code> for %s (<code>%s</code>).",
other_name, data_key, canonical_name, code)
end
names[other_name] = true
end
for _, other_name in ipairs(array) do
if type(other_name) == "table" then
if not allow_nested then
discrepancy(modname,
"A nested table is found in the list of <code>%s</code> for %s (<code>%s</code>), but isn't allowed.",
data_key, canonical_name, code)
else
for _, on in ipairs(other_name) do
check_other_name(on)
end
end
else
check_other_name(other_name)
end
end
end
local function check_other_names_aliases_varieties(modname, code, canonical_name, data)
if data.otherNames then
check_other_names_or_aliases(modname, code, canonical_name, data, "otherNames")
end
if data.aliases then
check_other_names_or_aliases(modname, code, canonical_name, data, "aliases")
end
if data.varieties then
check_other_names_or_aliases(modname, code, canonical_name, data, "varieties", true)
end
end
local function validate_pattern(pattern, modname, code, data, standardChars)
if type(pattern) ~= "string" then
discrepancy(modname, "\"%s\", the %spattern for %s (<code>%s</code>), is not a string.",
tostring(pattern), standardChars and "standard character " or "", data[1], code)
return
end
local ranges
for lower, higher in ugmatch(pattern, "(.)%-%%?(.)") do
if codepoint(lower) >= codepoint(higher) then
ranges = ranges or Array()
insert(ranges, { lower, higher })
end
end
if ranges and ranges[1] then
local plural = #ranges ~= 1 and "s" or ""
discrepancy(modname, "%s (<code>%s</code>) specifies an invalid pattern " ..
"for %scharacter detection: <code>\"%s\"</code>. The first codepoint%s " ..
"in the range%s %s %s must be less than the second.",
link(data), code, standardChars and "standard " or "", pattern, plural, plural,
ranges
:map(
function(range)
return range[1] .. "-" .. range[2] .. (" (U+%X, U+%X)")
:format(codepoint(range[1]), codepoint(range[2]))
end)
:concat(", "),
#ranges ~= 1 and "are" or "is")
end
if not pcall(umatch, "", "[" .. pattern .. "]") then
discrepancy(modname, "%s (<code>%s</code>) specifies an invalid pattern for " ..
(standardChars and "standard " or "") .. "character detection: <code>\"%s\"</code>",
link(data), code, pattern)
end
end
local remove_exceptions_addition = 0xF0000
local maximum_code_point = 0x10FFFF
local remove_exceptions_maximum_code_point = maximum_code_point - remove_exceptions_addition
local function check_entry_name_or_sortkey(modname, code, data, replacements_name)
local canonical_name = data[1]
local replacements = data[replacements_name]
if type(replacements) == "string" then
if not (replacements_name == "sort_key" or replacements_name == "entry_name") then
discrepancy(modname, "The %s field in the data table for %s (<code>%s</code>) must be a table.",
replacements_name, canonical_name, code)
end
return
end
if (replacements.from ~= nil) ~= (replacements.to ~= nil) then
discrepancy(modname,
"The <code>from</code> and <code>to</code> arrays in the <code>%s</code> table for %s (<code>%s</code>) are not both defined or both undefined.",
replacements_name, canonical_name, code)
elseif replacements.from then
for _, key in ipairs { "from", "to" } do
check_array(modname, code, canonical_name, data, key, replacements_name)
end
end
if replacements.remove_diacritics and type(replacements.remove_diacritics) ~= "string" then
discrepancy(modname,
"The <code>remove_diacritics</code> field in the <code>%s</code> table for %s (<code>%s</code>) table must be a string.",
replacements_name, canonical_name, code)
end
if replacements.remove_exceptions then
if check_array(modname, code, canonical_name, data, "remove_exceptions", replacements_name) then
for sequence_i, sequence in ipairs(replacements.remove_exceptions) do
local code_point_i = 0
for code_point in gcodepoint(sequence) do
code_point_i = code_point_i + 1
if code_point > remove_exceptions_maximum_code_point then
discrepancy(modname,
"Code point #%d (0x%04X) in field #%d of the <code>remove_exceptions</code> array for %s (<code>%s</code>) is over U+%04X.",
code_point_i, code_point, sequence_i, canonical_name, code, remove_exceptions_maximum_code_point)
end
end
end
end
end
if replacements.from and replacements.to
and m_table.length(replacements.to) > m_table.length(replacements.from) then
discrepancy(modname,
"The <code>to</code> array in the <code>%s</code> table for %s (<code>%s</code>) must be shorter than or the same length as the <code>from</code> array.",
replacements_name, canonical_name, code)
end
end
do
local function has_ancestor(lang, code)
for _, anc in ipairs(lang:getAncestors()) do
if code == anc:getCode() or has_ancestor(anc, code) then
return true
end
end
end
local function get_default_ancestors(lang)
if lang:hasType("etymology-only") then
local parent = lang:getParent()
if not has_ancestor(parent, lang:getCode()) then
return parent:getAncestorCodes()
end
end
local fam_code, def_anc = lang:getFamilyCode()
while fam_code and fam_code ~= "qfa-not" do
local fam = m_family_data[fam_code]
def_anc = fam.protoLanguage or
m_language_data[fam_code .. "-pro"] and fam_code .. "-pro" or
m_etym_language_data[fam_code .. "-pro"] and fam_code .. "-pro"
if def_anc and def_anc ~= lang:getCode() then
return {def_anc}
end
fam_code = fam[3]
end
end
local function iterate_ancestor(code, data, modname, anc_code, lang)
local anc = get_lang(anc_code, nil, true)
if not anc then
discrepancy(modname,
"%s (<code>%s</code>) lists the invalid language code <code>%s</code> as its ancestor.",
link(data), code, anc_code)
return
end
local anc_fam = anc:getFamily()
if not anc_fam then
discrepancy(modname,
"%s has no family.",
anc_code)
return
end
local anc_fam_code = anc_fam:getCode()
local def_ancs = get_default_ancestors(lang)
if def_ancs then
for _, def_anc in ipairs(def_ancs) do
def_anc = get_lang(def_anc, nil, true)
if def_anc and (
anc_code == def_anc:getCode() or
has_ancestor(def_anc, anc_code) or
def_anc:hasParent(anc_code) and not has_ancestor(anc, def_anc:getCode())
) then
discrepancy(modname,
"%s (<code>%s</code>) has the %s (<code>%s</code>) listed in its ancestor field, which is redundant, since it is determined to be ancestral automatically.",
link(data), code,
link(anc:getRawData()), anc_code)
end
end
end
if not lang:inFamily(anc_fam_code) then
discrepancy(modname,
"%s (<code>%s</code>) has %s (<code>%s</code>) set as an ancestor, but is not in the %s (<code>%s</code>).",
link(data), code,
link(anc:getRawData()), anc_code,
link(anc_fam:getRawData()), anc_fam_code)
end
local fam, proto = lang
repeat
fam = fam:getFamily()
proto = fam and fam:getProtoLanguage()
until proto or not fam or fam:getCode() == "qfa-not"
if proto and not (
proto:getCode() == anc:getCode() or
proto:hasAncestor(anc:getCode()) or
anc:hasAncestor(proto:getCode())
) then
local fam = lang:getFamily()
discrepancy(modname,
"%s (<code>%s</code>) is in the %s (<code>%s</code>) and has %s (<code>%s</code>) set as an ancestor, but it is not possible to form an ancestral chain between them.",
link(data), code,
link(fam:getRawData()), fam:getCode(),
link(anc:getRawData()), anc_code)
end
end
function export.check_ancestors(code, data, modname)
local ancestors = data.ancestors
if not ancestors then
return
elseif type(ancestors) == "string" then
ancestors = split(ancestors, "%s*,%s*", true)
end
local lang = get_lang(code, nil, true)
for _, anc in ipairs(ancestors) do
iterate_ancestor(code, data, modname, anc, lang)
end
end
end
local function check_code_to_name_and_name_to_code_maps(
source_module_type,
source_module_description,
code_to_module_map, name_to_code_map,
code_to_name_modname, code_to_name_module,
name_to_code_modname, name_to_code_module)
local aliases = require("Module:languages/data").aliases
local function check_code_and_name(modname, code, canonical_name)
-- Check the code is in code_to_module_map and that it didn't originate from the wrong data module.
local check_mod = code_to_module_map[code] or code_to_module_map[aliases[code]]
if not (check_mod and check_mod:match("^" .. source_module_type .. "/data")) then
if not name_to_code_map[canonical_name] then
discrepancy(modname,
"The code <code>%s</code> and the canonical name %s should be removed; they are not found in %s.",
code, canonical_name, source_module_description)
else
discrepancy(modname,
"<code>%s</code>, the code for the canonical name %s, is wrong; it should be <code>%s</code>.",
code, canonical_name, name_to_code_map[canonical_name])
end
elseif not name_to_code_map[canonical_name] then
local data_table = require("Module:" .. code_to_module_map[code])[code]
discrepancy(modname,
"%s, the canonical name for the code <code>%s</code>, is wrong; it should be %s.",
canonical_name, code, data_table[1])
end
end
for code, canonical_name in pairs(code_to_name_module) do
check_code_and_name(code_to_name_modname, code, canonical_name)
end
for canonical_name, code in pairs(name_to_code_module) do
check_code_and_name(name_to_code_modname, code, canonical_name)
end
end
local function check_extraneous_extra_data(
data_modname, data_module, extra_data_modname, extra_data_module)
for code, _ in pairs(extra_data_module) do
if not data_module[code] then
discrepancy(extra_data_modname,
"Language code <code>%s</code> is not found in [[Module:%s]], and should be removed from [[Module:%s]].",
code, data_modname, extra_data_modname
)
end
end
end
-- Just trying to not have a module error when someone puts a script code
-- in the position of a language code.
local function show_family_code(code)
if type(code) == "string" then
return "<code>" .. code .. "</code>"
else
return require("Module:debug").highlight_dump(code)
end
end
local function check_languages()
local check_language_data_keys = check_data_keys{
1, 2, 3, 4, -- canonical name, wikidata item, family, scripts
"display_text", "generate_forms", "entry_name", "sort_key",
"otherNames", "aliases", "varieties", "ietf_subtag",
"type", "ancestors",
"wikimedia_codes", "wikipedia_article", "standardChars",
"translit", "override_translit", "link_tr",
"dotted_dotless_i"
}
local function check_language(modname, code, data, mainData, extraData)
local canonical_name, lang_type = data[1], data.type
check_language_data_keys(modname, code, data)
if all_codes[code] then
discrepancy(modname, "Code <code>%s</code> is not unique; it is also defined in [[Module:%s]].", code, all_codes[code])
else
if not m_language_codes[code] then
discrepancy("languages/code to canonical name", "The code <code>%s</code> (%s) is missing.", code, canonical_name)
end
all_codes[code] = modname
end
if code:sub(-4) == "-pro" then
local fam_code = code:sub(1, -5)
local fam = get_lang(fam_code, nil, true, true)
if not fam then
discrepancy(modname,
"%s (<code>%s</code>) has a proto-language code associated with the invalid code <code>%s</code>.",
link(data), code, fam_code)
elseif not fam:hasType("family") then
discrepancy(modname,
"%s (<code>%s</code>) has a proto-language code associated with %s (<code>%s</code>), which is not a family.",
link(data), code, fam:getCanonicalName(), fam_code)
else
local expected_name = "Proto-" .. fam:getCanonicalName()
if canonical_name ~= expected_name then
discrepancy(modname,
"%s (<code>%s</code>) does not have the expected name \"%s\", even though it is the proto-language of the %s (<code>%s</code>).",
link(data), code, expected_name, fam:getCategoryName(), fam_code)
end
end
end
if not canonical_name then
discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
elseif language_names[canonical_name] then
discrepancy(modname,
"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(data), code, language_names[canonical_name])
else
if not m_language_canonical_names[canonical_name] then
discrepancy("languages/canonical names", "The canonical name %s (<code>%s</code>) is missing.", canonical_name, code)
end
language_names[canonical_name] = code
end
check_wikidata_item(modname, code, data, 2)
if extraData then
check_other_names_aliases_varieties(modname, code, canonical_name, extraData)
end
if lang_type and not (lang_type == "regular" or lang_type == "reconstructed" or lang_type == "appendix-constructed") then
discrepancy(modname, "%s (<code>%s</code>) is of an invalid type <code>%s</code>.", link(data), code, data.type)
end
if mainData.aliases then
discrepancy(modname, "%s (<code>%s</code>) has the <code>aliases</code> key. This must be moved to [[Module:" .. modname .. "/extra]].", link(data), code)
end
if mainData.varieties then
discrepancy(modname, "%s (<code>%s</code>) has the <code>varieties</code> key. This must be moved to [[Module:" .. modname .. "/extra]].", link(data), code)
end
if mainData.otherNames then
discrepancy(modname, "%s (<code>%s</code>) has the <code>otherNames</code> key. This must be moved to [[Module:" .. modname .. "/extra]].", link(data), code)
end
if not extraData then
discrepancy(modname .. "/extra", "%s (<code>%s</code>) has data in [[Module:" .. modname .. "]], but does not have corresponding data in [[Module:" .. modname .. "/extra]].", link(data), code)
--elseif extraData.otherNames then
-- discrepancy(modname .. "/extra", "%s (<code>%s</code>) has <code>otherNames</code> key, but these should be changed to either <code>aliases</code> or <code>varieties</code>.", link(data), code)
end
local sc = data[4]
if sc then
if type(sc) == "string" then
sc = split(sc, "%s*,%s*", true)
end
if type(sc) == "table" then
if not sc[1] then
discrepancy(modname, "%s (<code>%s</code>) has no scripts listed.", link(data), code)
else
for _, sccode in ipairs(sc) do
local cur_sc = m_script_data[sccode]
if not (cur_sc or sccode == "All" or sccode == "Hants") then
discrepancy(modname,
"%s (<code>%s</code>) lists the invalid script code <code>%s</code>.",
link(data), code, sccode)
-- elseif not cur_sc.characters then
-- discrepancy(modname,
-- "%s (<code>%s</code>) lists a script without characters <code>%s</code> (%s).",
-- link(data), code, sccode, cur_sc[1])
end
nonempty_scripts[sccode] = true
end
end
else
discrepancy(modname,
"The %s field for %s (<code>%s</code>) must be a table or string.",
4, link(data), code)
end
end
if data.ancestors then
export.check_ancestors(code, data, modname)
end
if data[3] then
local family = data[3]
if not m_family_data[family] then
discrepancy(modname,
"%s (<code>%s</code>) has the invalid family code %s.",
link(data), code, show_family_code(family))
end
nonempty_families[family] = true
end
if data.sort_key then
check_entry_name_or_sortkey(modname, code, data, "sort_key")
end
if data.entry_name then
check_entry_name_or_sortkey(modname, code, data, "entry_name")
end
if data.display then
check_entry_name_or_sortkey(modname, code, data, "display")
end
if data.standardChars then
if type(data.standardChars) == "table" then
local sccodes = {}
for _, sccode in ipairs(type(sc) == "table" and sc or {}) do
sccodes[sccode] = true
end
for sccode in pairs(data.standardChars) do
if not (sccodes[sccode] or sccode == 1) then
discrepancy(modname, "The field %s in the standardChars table for %s (<code>%s</code>) does not match any script for that language.",
sccode, link(data), code)
end
end
elseif data.standardChars and type(data.standardChars) ~= "string" then
discrepancy(modname, "The standardChars field in the data table for %s (<code>%s</code>) must be a string or table.",
link(data), code)
end
end
check_true_or_string_or_nil(modname, code, data, "override_translit")
check_true_or_string_or_nil(modname, code, data, "link_tr")
if data.override_translit and not data.translit then
discrepancy(modname,
"%s (<code>%s</code>) has <code>override_translit</code> set, but no transliteration module.",
link(data), code)
end
end
local function check_module(modname, test)
local mod_data = mw.loadData("Module:" .. modname)
local extra_modname = modname .. "/extra"
local extra_mod_data = mw.loadData("Module:" .. extra_modname)
for code, data in pairs(mod_data) do
test(modname, code, data)
check_language(modname, code, data, mod_data[code], extra_mod_data[code])
end
check_no_alias_codes(modname, mod_data)
check_no_alias_codes(extra_modname, extra_mod_data)
check_extraneous_extra_data(modname, mod_data, extra_modname, extra_mod_data)
end
-- Check two-letter codes
check_module(
"languages/data/2",
function(modname, code, data)
if not code:find("^[a-z][a-z]$") then
discrepancy(modname, "%s (<code>%s</code>) does not have a two-letter code.", link(data), code)
end
end
)
-- Check three-letter codes
for i = 0x61, 0x7A do -- a to z
local letter = string.char(i)
check_module(
"languages/data/3/" .. letter,
function(modname, code, data)
if not code:find("^" .. letter .. "[a-z][a-z]$") then
discrepancy(modname,
"%s (<code>%s</code>) does not have a three-letter code starting with \"<code>%s</code>\".",
link(data), code, letter)
end
end
)
end
-- Check exceptional codes
check_module(
"languages/data/exceptional",
function(modname, code, data)
if code:find("^[a-z][a-z][a-z]?$") then
discrepancy(modname, "%s (<code>%s</code>) has a two- or three-letter code.", link(data), code)
end
end
)
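--[=[
The three checks above enforce one rule: the shape of a code decides which
data submodule it belongs in. A sketch of that mapping, assuming (as the
patterns above do) that only lowercase ASCII letters count as letters. The
function name expected_submodule is hypothetical.

```lua
-- Hypothetical helper: the data submodule a language code is expected to
-- live in, mirroring the two-letter, three-letter and exceptional checks.
local function expected_submodule(code)
	if code:find("^[a-z][a-z]$") then
		return "languages/data/2"
	elseif code:find("^[a-z][a-z][a-z]$") then
		-- three-letter codes are split into submodules by their first letter
		return "languages/data/3/" .. code:sub(1, 1)
	end
	-- everything else, e.g. codes containing a hyphen, is "exceptional"
	return "languages/data/exceptional"
end
```

For example, "en" maps to languages/data/2, "ang" to languages/data/3/a, and
"ira-mid" to languages/data/exceptional.
]=]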
-- These checks must be done while all_codes only contains language codes:
-- that is, after language data modules have been processed, but before
-- etymology languages, families, and scripts have.
check_code_to_name_and_name_to_code_maps(
"languages",
"a submodule of [[Module:languages]]",
all_codes, language_names,
"languages/code to canonical name", m_language_codes,
"languages/canonical names", m_language_canonical_names
)
-- Check [[Template:langname-lite]]
local frame = mw.getCurrentFrame()
local content = new_title("Template:langname-lite"):getContent()
content = content:gsub("%<%!%-%-.-%-%-%>", "") -- remove comments
local match = ugmatch(content, "\n\t*|#*([^\n]+)=([^\n]*)")
while true do
local code, name = match()
if not code then return "OK" end
if code:len() > 1 and code ~= "default" then
for _, code in pairs(split(code, "|", true)) do
local lang = get_lang(code, nil, true, true)
-- Work on a per-code copy: several codes can share one template line, and the
-- raw template text must be processed afresh for each of them.
local this_name = name
if this_name:match("etymcode") then
-- getFullName() needs a language object; a missing code is reported below.
if lang then
local nonEtym_name = frame:preprocess(this_name)
local nonEtym_real_name = lang:getFullName()
if nonEtym_name ~= nonEtym_real_name then
discrepancy("Template:langname-lite", "Code: <code>" .. code .. "</code>. Saw name: " .. nonEtym_name .. ". Expected name: " .. nonEtym_real_name .. ".")
end
end
-- Extra parentheses keep only the first return value of gsub.
this_name = frame:preprocess((this_name:gsub("{{{allow etym|}}}", "1")))
elseif this_name:match("familycode") then
this_name = this_name:match("familycode|(.-)|") or this_name
end
if not lang then
discrepancy("Template:langname-lite", "Code: <code>" .. code .. "</code>. Saw name: " .. this_name .. ". Language not present in data.")
else
local real_name = lang:getCanonicalName()
if this_name ~= real_name then
discrepancy("Template:langname-lite", "Code: <code>" .. code .. "</code>. Saw name: " .. this_name .. ". Expected name: " .. real_name .. ".")
end
end
end
end
end
end
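--[=[
The langname-lite check above strips HTML comments from the template source
and then reads one "|codes=name" branch per line. A sketch of just the
parsing step, using the same two patterns with plain string.gsub/gmatch
instead of ugmatch and split. The function name parse_switch and the sample
template text are hypothetical.

```lua
-- Hypothetical sketch: extract (code, name) pairs from a {{#switch:}}-style
-- template body, using the same comment and line patterns as the check above.
local function parse_switch(content)
	content = content:gsub("%<%!%-%-.-%-%-%>", "") -- remove <!-- comments -->
	local found = {}
	for codes, name in content:gmatch("\n\t*|#*([^\n]+)=([^\n]*)") do
		-- skip single-character keys and the #default branch
		if codes:len() > 1 and codes ~= "default" then
			-- several codes can share one branch, as in |ang|enm=...
			for code in codes:gmatch("[^|]+") do
				table.insert(found, { code = code, name = name })
			end
		end
	end
	return found
end

local sample = "{{#switch:{{{1}}}\n|en=English\n<!-- comment -->\n|ang|enm=Old English\n|#default=\n}}"
local entries = parse_switch(sample)
```

With this sample, entries holds en/English, ang/Old English and
enm/Old English; the #default branch is skipped.
]=]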
local function check_etym_languages()
local modname = "etymology languages/data"
local check_etymology_language_data_keys = check_data_keys{
1, 2, 3, 4, 5, -- canonical name, wikidata item, family, scripts, parent
"display_text", "generate_forms", "entry_name", "sort_key",
"otherNames", "aliases", "varieties", "ietf_subtag",
"type", "main_code", "ancestors",
"wikimedia_codes", "wikipedia_article", "standardChars",
"translit", "override_translit", "link_tr",
"dotted_dotless_i"
}
for code, data in pairs(m_etym_language_data) do
local canonical_name, parent = data[1], data[5]
check_etymology_language_data_keys(modname, code, data)
if all_codes[code] then
discrepancy(modname, "Code <code>%s</code> is not unique; it is also defined in [[Module:%s]].", code, all_codes[code])
else
if not m_etym_language_codes[code] then
discrepancy("etymology languages/code to canonical name", "The code <code>%s</code> (%s) is missing.", code, canonical_name)
end
all_codes[code] = modname
end
if not canonical_name then
discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
elseif language_names[canonical_name] then
local m_canonical_lang = m_languages.getByCanonicalName(canonical_name, nil, true)
if not m_canonical_lang then
discrepancy(modname, "%s (<code>%s</code>) has a canonical name that cannot be looked up.",
link(data), code)
elseif data.main_code ~= m_canonical_lang:getCode() then
discrepancy(modname,
"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(data), code, language_names[canonical_name])
end
else
if not m_etym_language_canonical_names[canonical_name] then
discrepancy("etymology languages/canonical names", "The canonical name %s (<code>%s</code>) is missing.", canonical_name, code)
end
etym_language_names[canonical_name] = code
end
check_other_names_aliases_varieties(modname, code, canonical_name, data)
if parent then
if type(parent) ~= "string" then
discrepancy(modname,
"Etymology-only %s (<code>%s</code>) has a parent language or family code that is %s rather than a string.",
link(data), code, "a " .. type(parent))
elseif not (m_language_data[parent] or m_family_data[parent] or m_etym_language_data[parent]) then
discrepancy(modname,
"Etymology-only %s (<code>%s</code>) has invalid parent language or family code <code>%s</code>.",
link(data), code, parent)
end
nonempty_families[parent] = true
else
discrepancy(modname,
"Etymology-only %s (<code>%s</code>) has no parent language or family code.",
link(data), code)
end
if data.ancestors then
export.check_ancestors(code, data, modname)
end
if data[3] then
local family = data[3]
if not m_family_data[family] then
discrepancy(modname,
"%s (<code>%s</code>) has the invalid family code %s.",
link(data), code, show_family_code(family))
end
nonempty_families[family] = true
end
check_wikidata_item(modname, code, data, 2)
end
local checked = {}
for code, data in pairs(m_etym_language_data) do
local stack = {}
while data do
if checked[data] then
break
end
if stack[data] then
discrepancy(modname, "%s (<code>%s</code>) has a cyclic parental relationship to %s (<code>%s</code>).",
link(data), code,
link(m_etym_language_data[data[5]]), data[5]
)
break
end
stack[data] = true
code, data = data[5], data[5] and m_etym_language_data[data[5]]
end
for data in pairs(stack) do
checked[data] = true
end
end
check_no_alias_codes(modname, m_etym_language_data)
check_code_to_name_and_name_to_code_maps(
"etymology languages",
"[[Module:etymology languages/data]]",
all_codes, etym_language_names,
"etymology languages/code to canonical name", m_etym_language_codes,
"etymology languages/canonical names", m_etym_language_canonical_names)
end
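--[=[
The cycle check above walks each parent chain, remembers the entries on the
current walk, and skips anything an earlier walk already examined. A sketch
of the same technique keyed on codes rather than data tables, assuming a
plain map from code to parent code. The function name find_cycles is
hypothetical.

```lua
-- Hypothetical helper: walk each code's parent chain and report the code at
-- which a walk runs into itself. Codes visited by an earlier walk are
-- skipped, so each cycle is reported once.
local function find_cycles(parent_of)
	local checked, cyclic = {}, {}
	for start in pairs(parent_of) do
		local on_path = {}
		local code = start
		while code and not checked[code] do
			if on_path[code] then
				cyclic[code] = true -- this walk reached a code it already visited
				break
			end
			on_path[code] = true
			code = parent_of[code]
		end
		-- everything visited on this walk has now been examined
		for visited in pairs(on_path) do
			checked[visited] = true
		end
	end
	return cyclic
end

local cyclic = find_cycles({ a = "b", b = "a", c = "d" })
```

Here exactly one of "a" or "b" is flagged (which one depends on pairs()
order), and "c" is not, because its chain ends at "d".
]=]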
local function check_families()
local modname = "families/data"
local check_family_data_keys = check_data_keys{
1, 2, 3, -- canonical name, wikidata item, (parent) family
"type", "ietf_subtag",
"protoLanguage", "otherNames", "aliases", "varieties",
}
for code, data in pairs(m_family_data) do
check_family_data_keys(modname, code, data)
local canonical_name, family, protolang = data[1], data[3], data.protoLanguage
if all_codes[code] then
discrepancy(modname, "Code <code>%s</code> is not unique; it is also defined in [[Module:%s]].", code, all_codes[code])
else
if not m_family_codes[code] then
discrepancy("families/code to canonical name", "The code <code>%s</code> (%s) is missing.", code, canonical_name)
end
all_codes[code] = modname
end
if not canonical_name then
discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
elseif family_names[canonical_name] then
discrepancy(modname,
"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(data), code, family_names[canonical_name])
else
if not m_family_canonical_names[canonical_name] then
discrepancy("families/canonical names", "The canonical name %s (<code>%s</code>) is missing.", canonical_name, code)
end
family_names[canonical_name] = code
end
if data[2] and type(data[2]) ~= "number" then
discrepancy(modname, "%s (<code>%s</code>) has a wikidata item value that is not a number or <code>nil</code>: %s", link(data), code, dump(data[2]))
end
check_other_names_aliases_varieties(modname, code, canonical_name, data)
if family then
if family == code and code ~= "qfa-not" then
discrepancy(modname,
"%s (<code>%s</code>) has itself as its family.",
link(data), code)
elseif not m_family_data[family] then
discrepancy(modname,
"%s (<code>%s</code>) has the invalid parent family code %s.",
link(data), code, show_family_code(family))
end
nonempty_families[family] = true
end
if protolang then
local protolang_obj = get_lang(protolang, nil, true)
if not protolang_obj then
discrepancy(modname,
"%s (<code>%s</code>) has the invalid proto-language code <code>%s</code>.",
canonical_name, code, protolang)
elseif protolang == code .. "-pro" then
discrepancy(modname,
"%s (<code>%s</code>) has %s (<code>%s</code>) listed as its proto-language, which is redundant, since it is determined to be the proto-language automatically.",
canonical_name, code,
protolang_obj:getCanonicalName(), protolang)
elseif protolang:sub(-4) == "-pro" then
discrepancy(modname,
"%s (<code>%s</code>) has %s (<code>%s</code>) listed as its proto-language, which is supposed to be the proto-language for the family <code>%s</code>.",
canonical_name, code,
protolang_obj:getCanonicalName(), protolang, protolang:sub(1, -5))
end
end
check_wikidata_item(modname, code, data, 2)
end
for code, data in pairs(m_family_data) do
if not (nonempty_families[code] or allowed_empty_families[code]) then
discrepancy(modname, "%s (<code>%s</code>) has no child families or languages.", link(data), code)
end
end
local checked = { ["qfa-not"] = true }
for code, data in pairs(m_family_data) do
local stack = {}
while data do
if checked[code] then
break
end
if stack[code] then
discrepancy(modname, "%s (<code>%s</code>) has a cyclic parental relationship to %s (<code>%s</code>).",
link(data), code,
link(m_family_data[data[3]]), data[3]
)
break
end
stack[code] = true
code, data = data[3], m_family_data[data[3]]
end
for code in pairs(stack) do
checked[code] = true
end
end
check_no_alias_codes(modname, m_family_data)
check_code_to_name_and_name_to_code_maps(
"families",
"[[Module:families/data]]",
all_codes, family_names,
"families/code to canonical name", m_family_codes,
"families/canonical names", m_family_canonical_names)
end
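--[=[
The protoLanguage checks above follow a naming convention: a family's own
proto-language is "<family>-pro", so listing it explicitly is redundant, and
any other "-pro" code belongs to a different family. A sketch of that
classification; the function name and return labels are hypothetical, and
the codes in the examples are only illustrative.

```lua
-- Hypothetical helper mirroring the protoLanguage checks above.
-- Returns "redundant" for "<family>-pro", "other-family" plus the owning
-- family code for any other "-pro" code, and "ok" otherwise.
local function classify_protolang(family_code, protolang)
	if protolang == family_code .. "-pro" then
		return "redundant"
	elseif protolang:sub(-4) == "-pro" then
		return "other-family", protolang:sub(1, -5)
	end
	return "ok"
end
```

For example, classify_protolang("gem", "gem-pro") gives "redundant", and
classify_protolang("gmw", "gem-pro") gives "other-family" and "gem".
]=]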
local function check_scripts()
local modname = "scripts/data"
local check_script_data_keys = check_data_keys({
1, 2, -- canonical name, writing systems
"canonicalName", "otherNames", "aliases", "varieties", "parent", "ietf_subtag",
"wikipedia_article", "ranges", "characters", "spaces", "capitalized", "translit", "direction",
"character_category", "normalizationFixes"
}, true)
local m_script_codes = require("Module:scripts/code to canonical name")
local m_script_canonical_names = require("Module:scripts/by name")
-- Just to satisfy requirements of check_code_to_name_and_name_to_code_maps.
local script_code_to_module_map = {}
for code, data in pairs(m_script_data) do
local canonical_name = data[1]
if not m_script_codes[code] and #code == 4 then
discrepancy("scripts/code to canonical name", "<code>%s</code> (%s) is missing", code, canonical_name)
end
check_script_data_keys(modname, code, data)
if not canonical_name then
discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
elseif script_names[canonical_name] then
--[=[
discrepancy(modname,
"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link_script(data.names[1]), code, script_names[data.names[1]])
--]=]
else
if not m_script_canonical_names[canonical_name] and #code == 4 then
discrepancy("scripts/by name", "%s (<code>%s</code>) is missing", canonical_name, code)
end
script_names[canonical_name] = code
end
check_other_names_aliases_varieties(modname, code, canonical_name, data)
if not nonempty_scripts[code] then
discrepancy(modname,
"%s (<code>%s</code>) is not used by any language%s.",
link_script(canonical_name), code, data.characters and ""
or " and has no characters listed for auto-detection")
--[[
elseif not data.characters then
discrepancy(modname, "%s (<code>%s</code>) has no characters listed for auto-detection.", link_script(canonical_name), code)
--]]
end
if data.characters then
validate_pattern(data.characters, modname, code, data, false)
end
script_code_to_module_map[code] = modname
end
check_no_alias_codes(modname, m_script_data)
check_code_to_name_and_name_to_code_maps(
"scripts",
"a submodule of [[Module:scripts]]",
script_code_to_module_map, script_names,
"scripts/code to canonical name", m_script_codes,
"scripts/by name", m_script_canonical_names)
end
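--[=[
The "not used by any language" messages above depend on nonempty_scripts,
which the language checks fill in. A sketch that derives the same messages
directly, assuming each language record carries a hypothetical list field
scripts; in this module that information comes from the language data
instead. The function name unused_script_messages is hypothetical.

```lua
-- Hypothetical sketch: build "not used by any language" messages, adding the
-- auto-detection note when a script has no characters field.
local function unused_script_messages(script_data, languages)
	local used = {}
	for _, lang in pairs(languages) do
		for _, sc in ipairs(lang.scripts or {}) do
			used[sc] = true
		end
	end
	local out = {}
	for code, data in pairs(script_data) do
		if not used[code] then
			table.insert(out, ("%s (<code>%s</code>) is not used by any language%s."):format(
				data[1], code,
				data.characters and "" or " and has no characters listed for auto-detection"))
		end
	end
	table.sort(out) -- pairs() order is unspecified
	return out
end

local messages_out = unused_script_messages(
	{ Latn = { "Latin", characters = "a-z" }, Blis = { "Blissymbols" }, Hira = { "Hiragana", characters = "x" } },
	{ en = { scripts = { "Latn" } } })
```

Blis gets the longer message because it has no characters field; Hira gets
the short one.
]=]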
-- FIXME: this is quite messy.
local function check_wikidata_languages()
local data = mw.text.jsonDecode(new_title("Module:languages/data/wikidata.json"):getContent())
local seen = {{}, {}, {}, [5] = {}}
for _, item in ipairs(data) do
local id = item.id
for k, v in pairs(item) do
if k ~= "id" then
local _seen = seen[k]
for i, code in ipairs(v) do
local _code = code[1]
local _type = type(_seen[_code])
if _type == "table" then
insert(_seen[_code], id)
elseif _type == "string" then
_seen[_code] = {_seen[_code], id}
else
_seen[_code] = id
end
end
end
end
end
for k, v in pairs(seen) do
for code, ids in pairs(v) do
if type(ids) == "table" then
local t = {}
for i, id in ipairs(ids) do
t[i] = ("<code>[[d:%s|%s]]</code>"):format(id, id)
end
discrepancy("languages/data/wikidata.json", "<code>%s</code> is set as an ISO 639-%d code on multiple items: %s.",
code, k, list_to_text(t))
end
end
end
end
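--[=[
check_wikidata_languages above records, for each ISO 639 part, which
Wikidata items claim each code, and reports codes claimed by several items.
A sketch of the same idea on an already-decoded Lua table, assuming the
shape the loop above reads: items with an id plus numeric ISO 639 part keys,
each holding a list whose entries carry the code at index 1. The function
name and the item IDs are placeholders, not real data.

```lua
-- Hypothetical sketch: find ISO 639 codes claimed by more than one item.
local function find_duplicate_iso_codes(items)
	local seen = {} -- seen[part][code] = list of item ids
	for _, item in ipairs(items) do
		for part, entries in pairs(item) do
			if part ~= "id" then
				seen[part] = seen[part] or {}
				for _, entry in ipairs(entries) do
					local code = entry[1]
					local ids = seen[part][code] or {}
					table.insert(ids, item.id)
					seen[part][code] = ids
				end
			end
		end
	end
	local duplicates = {}
	for part, by_code in pairs(seen) do
		for code, ids in pairs(by_code) do
			if #ids > 1 then
				table.insert(duplicates, ("ISO 639-%d code %s is on %s"):format(part, code, table.concat(ids, ", ")))
			end
		end
	end
	table.sort(duplicates) -- pairs() order is unspecified
	return duplicates
end

local duplicates = find_duplicate_iso_codes({
	{ id = "Q1", [1] = { { "en" } }, [3] = { { "eng" } } },
	{ id = "Q2", [3] = { { "eng" } } },
})
```

Always storing a list of IDs, rather than the string-or-table values used
above, keeps the bookkeeping to a single case.
]=]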
local function check_labels()
local check_label_data_keys = check_data_keys{
"display", "Wikipedia", "glossary",
"plain_categories", "topical_categories", "pos_categories", "regional_categories", "sense_categories",
"omit_preComma", "omit_postComma", "omit_preSpace",
"deprecated", "track"
}
local function check_label(modname, code, data)
local _type = type(data)
if _type == "table" then
check_label_data_keys(modname, code, data)
elseif _type ~= "string" then
discrepancy(modname,
"The data for label <code>%s</code> is a %s; only tables and strings are allowed.",
code, _type)
end
end
for _, module in ipairs{"", "/regional", "/topical"} do
local modname = "Module:labels/data" .. module
module = require(modname)
for label, data in pairs(module) do
check_label(modname, label, data)
end
end
for code in pairs(m_language_codes) do
local modname = "Module:labels/data/lang/" .. code
local ok, module = pcall(require, modname)
if ok then
for label, data in pairs(module) do
check_label(modname, label, data)
end
end
end
end
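--[=[
check_label above accepts label data that is either a string or a table, and
validates table keys through check_data_keys. A sketch of that rule as a
standalone function. ALLOWED_KEYS copies the key list given to
check_data_keys above; the real check_data_keys may report problems
differently, and the function name label_problems is hypothetical.

```lua
-- Hypothetical sketch of the per-label validation: strings are accepted,
-- tables may only use known keys, and any other type is rejected.
local ALLOWED_KEYS = {}
for _, key in ipairs({
	"display", "Wikipedia", "glossary",
	"plain_categories", "topical_categories", "pos_categories", "regional_categories", "sense_categories",
	"omit_preComma", "omit_postComma", "omit_preSpace",
	"deprecated", "track",
}) do
	ALLOWED_KEYS[key] = true
end

local function label_problems(label, data)
	local problems = {}
	local data_type = type(data)
	if data_type == "table" then
		for key in pairs(data) do
			if not ALLOWED_KEYS[key] then
				table.insert(problems, ("label %s has unknown key %s"):format(label, tostring(key)))
			end
		end
		table.sort(problems)
	elseif data_type ~= "string" then
		table.insert(problems, ("label %s is a %s; only tables and strings are allowed"):format(label, data_type))
	end
	return problems
end
```

A misspelt key such as dispaly is reported, as is a label whose data is a
number.
]=]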
local function check_zh_trad_simp()
local m_ts = require("Module:zh/data/ts")
local m_st = require("Module:zh/data/st")
local ruby = require("Module:ja-ruby").ruby_auto
local lang = get_lang("zh")
local Hant = m_scripts.getByCode("Hant")
local Hans = m_scripts.getByCode("Hans")
local data = {[0] = m_st, m_ts}
local mod = {[0] = "st", "ts"}
local var = {[0] = "Simp.", "Trad."}
local sc = {[0] = Hans, Hant}
local function find_stable_loop(chars, other, j)
local display = ruby({["markup"] = "[" .. other .. "](" .. var[(j+1)%2] .. ")"})
display = m_links.language_link{term = other, alt = display, lang = lang, sc = sc[(j+1)%2], tr = "-"}
insert(chars, display)
if data[(j+1)%2][other] == other then
insert(chars, other)
return chars, 1
elseif not data[(j+1)%2][other] then
insert(chars, "not found")
return chars, 2
elseif data[j%2][data[(j+1)%2][other]] ~= other then
return find_stable_loop(chars, data[(j+1)%2][other], j + 1)
else
local display = ruby({["markup"] = "[" .. data[(j+1)%2][other] .. "](" .. var[j%2] .. ")"})
display = m_links.language_link{term = data[(j+1)%2][other], alt = display, lang = lang, sc = sc[j%2], tr = "-"}
insert(chars, display .. " (")
display = ruby({["markup"] = "[" .. data[j%2][data[(j+1)%2][other]] .. "](" .. var[(j+1)%2] .. ")"})
display = m_links.language_link{term = data[j%2][data[(j+1)%2][other]], alt = display, lang = lang, sc = sc[(j+1)%2], tr = "-"}
insert(chars, display .. " etc.)")
return chars, 3
end
return chars
end
for i = 0, 1, 1 do
for char, other in pairs(data[i]) do
if data[(i+1)%2][other] ~= char then
local chars, issue = {}
local display = ruby({["markup"] = "[" .. char .. "](" .. var[i] .. ")"})
display = m_links.language_link{term = char, alt = display, lang = lang, sc = sc[i], tr = "-"}
insert(chars, display)
chars, issue = find_stable_loop(chars, other, i)
if issue == 1 or issue == 2 then
local sc_this, mod_this, j = {}
if chars[#chars-1]:match(var[(i+1)%2]) then
j = 1
else
j = 0
end
mod_this = mod[(i+j)%2]
sc_this = {[0] = sc[(i+j)%2], sc[(i+j+1)%2]}
for k, char in ipairs(chars) do
chars[k] = m_script_utils.tag_text(char, lang, sc_this[k%2], "term")
end
if issue == 1 then
discrepancy("zh/data/" .. mod_this, "character references itself: " .. concat(chars, " → "))
elseif issue == 2 then
discrepancy("zh/data/" .. mod_this, "missing character: " .. concat(chars, " → "))
end
elseif issue == 3 then
for j, char in ipairs(chars) do
chars[j] = m_script_utils.tag_text(char, lang, sc[(i+j)%2], "term")
end
discrepancy("zh/data/" .. mod[i], "possible mismatched character: " .. concat(chars, " → "))
end
end
end
end
end
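--[=[
check_zh_trad_simp above relies on a round-trip property: a character mapped
through one table and back through the other should return to itself. A
simplified sketch of that property alone; unlike check_zh_trad_simp it does
not follow chains, detect self-references, or render links. The function
name and the toy tables are hypothetical (发 really is the simplified form
of both 發 and 髮, which is why one of them cannot round-trip).

```lua
-- Hypothetical, simplified round-trip check between two mappings.
local function round_trip_problems(forward, backward)
	local problems = {}
	for char, other in pairs(forward) do
		local back = backward[other]
		if back == nil then
			problems[char] = "missing" -- no reverse entry at all
		elseif back ~= char then
			problems[char] = "mismatch" -- reverse entry points elsewhere
		end
	end
	return problems
end

-- Toy data: traditional -> simplified, and simplified -> traditional.
local ts = { ["體"] = "体", ["發"] = "发", ["髮"] = "发", ["龜"] = "龟" }
local st = { ["体"] = "體", ["发"] = "髮" }
local problems = round_trip_problems(ts, st)
```

Here 發 is a mismatch (发 maps back to 髮), 龜 is missing (no entry for 龟),
and 體 and 髮 round-trip cleanly.
]=]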
local function check_serialization(modname)
local serializers = {
["Hani-sortkey/data/serialized"] = "Hani-sortkey/serializer",
}
if not serializers[modname] then
return nil
end
local serializer = serializers[modname]
local current_data = require("Module:" .. serializer).main(true)
local stored_data = require("Module:" .. modname)
if current_data ~= stored_data then
discrepancy(modname, "<strong><u>Important!</u> Serialized data is out of sync. Use [[Module:" .. serializer .. "]] to update it. If you have made any changes to the underlying data, the serialized data <u>must</u> be updated before these changes take effect.</strong>")
end
end
-- Warning: cannot be called twice in the same module invocation because
-- some module-global variables are not reset between calls.
function export.do_checks(modules)
messages = setmetatable({}, {
__index = function (self, k)
local val = Array()
self[k] = val
return val
end
})
if modules["zh/data/ts"] or modules["zh/data/st"] then
check_zh_trad_simp()
end
check_languages()
check_etym_languages()
-- families and scripts must be checked AFTER languages; languages checks fill out
-- the nonempty_families and nonempty_scripts tables, used for testing if a family/script
-- is ever used in the data
check_families()
check_scripts()
check_wikidata_languages()
if modules["labels/data"] then
check_labels()
end
for module in pairs(modules) do
check_serialization(module)
end
setmetatable(messages, nil)
local function find_code(message)
return string.match(message, "<code>([^<]+)</code>")
end
find_code = require("Module:fun").memoize(find_code)
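local function comp(message1, message2)
-- Order by the first <code> value (messages without one sort first), then by
-- full text. Comparing the pair this way keeps the ordering consistent, as
-- table.sort requires; mixing code and text comparisons does not.
local code1, code2 = find_code(message1) or "", find_code(message2) or ""
if code1 ~= code2 then
return code1 < code2
end
return message1 < message2
end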
for _, msglist in pairs(messages) do
msglist:sort(comp)
end
local ret = messages
messages = nil
return ret
end
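--[=[
do_checks above uses two techniques: a messages table whose __index
metamethod creates an empty list the first time a module name is looked up,
and sorting each list by the first <code>...</code> value in a message. A
sketch of both in plain Lua 5.1, using ordinary tables instead of Array. The
names new_message_store and compare_messages are hypothetical.

```lua
-- Hypothetical sketch of an auto-vivifying message store plus a comparator.
local function new_message_store()
	return setmetatable({}, {
		__index = function(self, modname)
			local list = {}
			rawset(self, modname, list) -- later lookups find the list directly
			return list
		end,
	})
end

local function find_code(message)
	return message:match("<code>([^<]+)</code>")
end

-- Sort key is (first code or "", full message), compared in that order,
-- which is a strict weak ordering as table.sort requires.
local function compare_messages(m1, m2)
	local c1, c2 = find_code(m1) or "", find_code(m2) or ""
	if c1 ~= c2 then
		return c1 < c2
	end
	return m1 < m2
end

local store = new_message_store()
table.insert(store["families/data"], "Old Iranian (<code>ira-old</code>) has no child families or languages.")
table.insert(store["families/data"], "Middle Iranian (<code>ira-mid</code>) has no child families or languages.")
table.sort(store["families/data"], compare_messages)
```

After sorting, the ira-mid message comes first. Module names that were never
looked up have no entry, so iterating the store with pairs() visits only
modules that actually have messages.
]=]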
function export.format_message(modname, msglist)
local header
if modname:match("^Module:") or modname:match("^Template:") then
header = "===[[" .. modname .. "]]==="
else
header = "===[[Module:" .. modname .. "]]==="
end
return header
.. msglist
:map(
function(msg)
return "\n* " .. msg
end)
:concat()
end
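--[=[
format_message above builds a level-3 heading linking the module (adding the
Module: prefix unless the name already starts with Module: or Template:)
followed by one bullet per message. A sketch that should produce the same
string with a plain table and table.concat instead of Array:map/concat. The
function name format_section is hypothetical.

```lua
-- Hypothetical equivalent of format_message using only the standard library.
local function format_section(modname, msglist)
	local title = modname
	if not (title:match("^Module:") or title:match("^Template:")) then
		title = "Module:" .. title
	end
	local parts = { "===[[" .. title .. "]]===" }
	for _, msg in ipairs(msglist) do
		table.insert(parts, "\n* " .. msg)
	end
	return table.concat(parts)
end
```

For example, format_section("families/data", { "a", "b" }) yields a heading
for Module:families/data followed by the bullets "* a" and "* b".
]=]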
function export.check_modules(args)
local modules = {}
for _, arg in ipairs(args) do
modules[arg] = true
end
local ret = Array()
local messages = export.do_checks(modules)
for _, module in ipairs(args) do
local msglist = messages[module]
if msglist then
ret:insert(export.format_message(module, msglist))
end
end
return ret:concat("\n")
end
function export.check_modules_t(frame)
local args = m_table.shallowcopy(frame.args)
return export.check_modules(args)
end
function export.perform(frame)
local messages = export.do_checks({})
-- Format the messages
local ret = Array()
for modname, msglist in m_table.sortedPairs(messages) do
ret:insert(export.format_message(modname, msglist))
end
-- Are there any messages?
if #ret == 0 then
return "<b class=\"success\">Glory to Arstotzka.</b>"
else
ret:insert(1, "<b class=\"warning\">Discrepancies detected:</b>")
return ret:concat("\n")
end
end
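--[=[
perform above decides between the success banner and the warning banner
based on whether any formatted sections were produced. A sketch of that
decision on a plain list of section strings; the function name summarize
is hypothetical, and the banner text is copied from perform.

```lua
-- Hypothetical sketch: success only when no section was produced.
local function summarize(sections)
	if #sections == 0 then
		return '<b class="success">Glory to Arstotzka.</b>'
	end
	return '<b class="warning">Discrepancies detected:</b>\n' .. table.concat(sections, "\n")
end
```

Testing the length of the collected list directly avoids depending on any
counter variable kept elsewhere.
]=]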
return export