Korea Public Data

non-profit

https://github.com/yeongseon/kpubdata

yeongseon

AI & ML interests

open-data, public-data, korean-nlp, tabular-data, datasets

kpubdata — Korean Public Data for Everyone

Making Korean government open data accessible worldwide with a single line of code.

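from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (cached locally after the first call).
ds = load_dataset("kpubdata/seoul-apartment-trades")
# Convert the "train" split to a pandas DataFrame.
df = ds["train"].to_pandas()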

Mission

Korean public data (data.go.kr) is valuable but hard to access: API authentication is complex, responses come as XML, documentation is Korean-only, and the data is not offered in standard formats such as Parquet or HuggingFace Datasets.

We bridge the gap: raw public data, cleaned and published as HuggingFace Datasets. No feature engineering, no opinions, just well-documented government data ready to use.

Principles

Source fidelity: Original Korean text values preserved as-is. English column names for accessibility.
Schema honesty: What is declared in the config is exactly what you get. No phantom columns, no all-null surprises.
Global-first documentation: Dataset cards in English with Korean domain context explained for international users.
No feature engineering: We publish clean raw data. Users add derived features (geocoding, distances, etc.) themselves — just like Kaggle.
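
As a rough illustration of the schema honesty principle, the sketch below checks a batch of records against a declared column list and reports phantom, undeclared, and all-null columns. It is not the project's actual validator: the function name `check_schema`, the sample rows, and the column names are all hypothetical.

```python
from typing import Any


def check_schema(records: list[dict[str, Any]], declared: list[str]) -> list[str]:
    """Return human-readable schema problems; an empty list means the data matches."""
    problems: list[str] = []
    declared_set = set(declared)

    # Collect every column name that actually appears in the records.
    seen: set[str] = set()
    for row in records:
        seen.update(row.keys())

    # Phantom columns: declared in the config but absent from the data.
    for col in declared:
        if col not in seen:
            problems.append(f"phantom column: {col}")

    # Undeclared columns: present in the data but missing from the config.
    for col in sorted(seen - declared_set):
        problems.append(f"undeclared column: {col}")

    # All-null columns: declared and present, but never populated.
    for col in declared:
        if col in seen and all(row.get(col) is None for row in records):
            problems.append(f"all-null column: {col}")

    return problems


# Hypothetical rows; the column names are illustrative, not the real schema.
rows = [
    {"district": "강남구", "price": 150000, "floor": None},
    {"district": "마포구", "price": 98000, "floor": None},
]
problems = check_schema(rows, ["district", "price", "floor", "area"])
print(problems)
```

A publishing step could refuse to upload whenever the returned list is non-empty, so the declared config and the published data cannot drift apart.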

Available Datasets

Dataset	Records	Period	Source	Description
seoul-apartment-trades	~234k	2020–2024	MOLIT via data.go.kr	Apartment sale transactions in Seoul, all 25 districts

More datasets are planned, including air quality, weather, and transit.

How It Works

[data.go.kr API] → [kpubdata SDK] → [kpubdata-builder pipeline] → [HuggingFace Dataset]

kpubdata — Python SDK that handles API auth, pagination, and response parsing for Korean public data portals
kpubdata-builder — Pipeline that fetches, transforms, validates, and publishes datasets to HuggingFace
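
The response-parsing and transform steps can be sketched with the standard library alone. The example below is only an assumption about what such a step might look like, not the kpubdata SDK's API: the XML payload is invented for illustration (real data.go.kr responses vary by API), and `COLUMN_MAP` and `parse_items` are hypothetical names. Field names are renamed to English while the values are kept as the original strings.

```python
import xml.etree.ElementTree as ET

# Invented sample payload; real data.go.kr responses vary by API.
SAMPLE_XML = """
<response>
  <body>
    <items>
      <item><법정동>역삼동</법정동><거래금액>150,000</거래금액></item>
      <item><법정동>합정동</법정동><거래금액>98,000</거래금액></item>
    </items>
    <totalCount>2</totalCount>
  </body>
</response>
"""

# Hypothetical mapping from Korean source fields to English column names.
COLUMN_MAP = {"법정동": "legal_dong", "거래금액": "deal_amount"}


def parse_items(xml_text: str) -> list[dict[str, str]]:
    """Turn each <item> element into a dict keyed by English column names."""
    root = ET.fromstring(xml_text)
    rows: list[dict[str, str]] = []
    for item in root.iter("item"):
        row: dict[str, str] = {}
        for child in item:
            # Rename the field if a mapping exists; otherwise keep the source tag.
            name = COLUMN_MAP.get(child.tag, child.tag)
            # Keep the value as the original text; only surrounding whitespace is trimmed.
            row[name] = (child.text or "").strip()
        rows.append(row)
    return rows


records = parse_items(SAMPLE_XML)
print(records)
```

Keeping values as untouched strings (for example `"150,000"` rather than an integer) matches the source fidelity principle; any type conversion would be a separate, explicitly documented step.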

Contributing

We welcome contributions! If there is a Korean public dataset you would like to see on HuggingFace:

Check if the source API is available on data.go.kr
Open an issue on kpubdata-builder
Or submit a PR with a new dataset config (see publishing standards)

License

Datasets are published under licenses compatible with their original government data licenses. Most Korean public data is released under 공공누리 (Korea Open Government License, KOGL), which we map to CC-BY-4.0.

See individual dataset cards for specific licensing details.
