Semantic web technologies — OWL, RDF, SPARQL, knowledge graphs — sit in an interesting corner of AI that most practitioners don’t engage with. They predate the deep learning era and come from a different intellectual tradition: formal logic and knowledge representation. This project was an exercise in that tradition, applied to a domain I found meaningful: mapping out the AI field itself.
The ontology
The domain model captures:
- AI subfields and the foundational concepts each requires (e.g., reinforcement learning requires probability theory and calculus)
- Career roles within AI (research scientist, ML engineer, AI product manager, etc.) and the skills each demands
- Skills themselves, each classified by type and difficulty level
- Learning resources mapped to skills and difficulty levels, to support recommendation queries
- Prerequisite relationships between concepts — enabling path queries like “what do I need to learn before I can study transformers?”
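The prerequisite question in the last bullet amounts to a transitive closure over a single property. A minimal sketch, assuming a hypothetical `ai:requires` property, a placeholder namespace, and made-up concept names (the actual vocabulary in ai-ontology.rdf may differ): the SPARQL string shows how a property path could express the question, and the pure-Python function computes the same closure over an in-memory edge map so the idea can be checked without a triple store.

```python
from collections import deque

# Hypothetical namespace and property name; the real ontology's IRIs may differ.
PREREQ_QUERY = """
PREFIX ai: <http://example.org/ai-ontology#>
SELECT DISTINCT ?prereq WHERE {
  ai:Transformers ai:requires+ ?prereq .
}
"""

# Toy prerequisite edges (illustrative data, not taken from the ontology).
REQUIRES = {
    "Transformers": ["AttentionMechanisms", "NeuralNetworks"],
    "AttentionMechanisms": ["LinearAlgebra"],
    "NeuralNetworks": ["LinearAlgebra", "Calculus"],
}

def all_prerequisites(concept, edges):
    """Return every concept reachable via one or more 'requires' edges."""
    seen = set()
    queue = deque(edges.get(concept, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(edges.get(current, []))
    return seen
```

The `+` in the property path means "one or more hops", which is what the breadth-first walk reproduces: direct prerequisites first, then their prerequisites, and so on.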
The ontology is authored in RDF/XML (ai-ontology.rdf) and loaded into a local GraphDB instance. Competency questions, the questions the ontology must be able to answer, are defined up front as SPARQL queries and tested against the populated graph.
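A competency question can be treated like a unit test: a query plus the answers it is expected to return. A minimal sketch, assuming the endpoint replies in the standard SPARQL 1.1 JSON results format; the helper names, the sample response, and the IRIs are hypothetical.

```python
import json

def binding_values(results_json, var):
    """Extract the values bound to one variable from SPARQL JSON results."""
    data = json.loads(results_json)
    return {
        row[var]["value"]
        for row in data["results"]["bindings"]
        if var in row  # unbound variables are simply absent from a row
    }

def check_competency(results_json, var, expected):
    """Return (passed, missing, unexpected) for one competency question."""
    actual = binding_values(results_json, var)
    missing = set(expected) - actual
    unexpected = actual - set(expected)
    return (not missing and not unexpected), missing, unexpected

# Hand-written sample response standing in for a real endpoint reply.
NS = "http://example.org/ai-ontology#"  # hypothetical namespace
SAMPLE = json.dumps({
    "head": {"vars": ["prereq"]},
    "results": {"bindings": [
        {"prereq": {"type": "uri", "value": NS + "Calculus"}},
        {"prereq": {"type": "uri", "value": NS + "ProbabilityTheory"}},
    ]},
})
```

Reporting the missing and unexpected answers separately makes it easier to tell whether a failure comes from an absent triple or from an over-broad query.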
The application
A Streamlit app connects live to the GraphDB SPARQL endpoint. Users can run the predefined competency queries, explore the graph structure, and interact with a local Ollama model (llama3.2:1b) for natural language querying — the LLM translates plain English questions into SPARQL and interprets the results.
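The question-to-SPARQL step could look roughly like the following. This is a sketch, assuming Ollama's default local address and its `/api/generate` endpoint with streaming disabled; the prompt wording, the function names, and the fence-stripping heuristic are illustrative rather than the app's exact code.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
FENCE = "`" * 3  # markdown code-fence marker that models often wrap output in

def build_request(question, schema_hint, model="llama3.2:1b"):
    """Build (but do not send) a non-streaming generate request."""
    prompt = (
        "Translate the question into a SPARQL SELECT query.\n"
        f"Ontology summary:\n{schema_hint}\n"
        f"Question: {question}\n"
        "Return only the query."
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_sparql(model_text):
    """Pull the query out of a reply that may wrap it in a code fence."""
    pattern = FENCE + r"(?:sparql)?\s*(.*?)" + FENCE
    match = re.search(pattern, model_text, re.DOTALL | re.IGNORECASE)
    query = match.group(1) if match else model_text
    return query.strip()

def ask(question, schema_hint):
    """Send the request and return the generated SPARQL (needs Ollama running)."""
    with urllib.request.urlopen(build_request(question, schema_hint)) as resp:
        return extract_sparql(json.loads(resp.read())["response"])
```

Keeping request construction and reply parsing separate from the network call means both can be exercised without a running model.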
A reload_ontology.py utility clears and reloads the graph data, which made it quick to iterate on the ontology schema during development.
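A clear-and-reload cycle against GraphDB can go through its RDF4J-compatible REST API. The sketch below is an assumption about how such a utility might work, not a copy of reload_ontology.py: the repository ID `ai-ontology` and the base URL (GraphDB's default port 7200) are placeholders.

```python
import urllib.request

GRAPHDB = "http://localhost:7200"   # GraphDB's default port; adjust as needed
REPO = "ai-ontology"                # hypothetical repository ID

def statements_url(base=GRAPHDB, repo=REPO):
    """URL of the repository's statements resource (RDF4J REST API)."""
    return f"{base}/repositories/{repo}/statements"

def clear_request():
    """DELETE on the statements resource removes every triple."""
    return urllib.request.Request(statements_url(), method="DELETE")

def load_request(rdf_xml_bytes):
    """POST adds the RDF/XML payload to the repository."""
    return urllib.request.Request(
        statements_url(),
        data=rdf_xml_bytes,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )

def reload(path="ai-ontology.rdf"):
    """Clear the repository, then load the file (needs GraphDB running)."""
    # Read the file first so a missing file fails before anything is deleted.
    with open(path, "rb") as f:
        payload = f.read()
    for req in (clear_request(), load_request(payload)):
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

Reading the file before issuing the DELETE is a small safeguard: a bad path raises an error while the existing graph is still intact.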
Stack
- Python · OWL/RDF (RDF/XML) · GraphDB · SPARQL · Streamlit · Ollama (llama3.2:1b) · Poetry
Reflection
Formal ontology design forces a kind of precision that machine learning doesn’t require. Every relationship needs a name, a direction, and a formal type. Every instance needs to fit cleanly into the class hierarchy. That discipline produces a knowledge base that’s queryable in ways a document corpus or a vector store isn’t — you can ask “which roles require both Python and statistics?” and get a precise, reproducible answer grounded in defined relationships.
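The roles question maps onto a conjunctive graph pattern: one triple pattern per required skill, all sharing the same `?role` variable. A sketch, assuming a hypothetical `ai:requiresSkill` property, a placeholder namespace, and skill names that are valid as prefixed local names (the real identifiers in the ontology may differ):

```python
PREFIX = "PREFIX ai: <http://example.org/ai-ontology#>"  # hypothetical namespace

def roles_requiring_all(skills):
    """Build a SELECT query for roles that require every listed skill."""
    if not skills:
        raise ValueError("at least one skill is required")
    # One triple pattern per skill; sharing ?role makes the match conjunctive.
    patterns = "\n".join(f"  ?role ai:requiresSkill ai:{s} ." for s in skills)
    return f"{PREFIX}\nSELECT DISTINCT ?role WHERE {{\n{patterns}\n}}"

query = roles_requiring_all(["Python", "Statistics"])
```

Because every pattern must match for the same `?role`, a role that requires only one of the two skills is excluded, and running the query again on unchanged data gives the same result.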
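The limitation, of course, is that someone has to maintain the knowledge base by hand: ontologies don’t learn from data. But for stable domains with well-understood structure, that is a feature rather than a bug, because the knowledge base changes only when someone deliberately changes it.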