Mapping Credibility Signal Questionnaire Data to RDF

TBD

Actually, it's not even "W3C-internal", it's more like "W3C-related". (But the above paragraph is boilerplate.)

## Introduction Needed by: https://github.com/sandhawke/meedancheck-to-rdf ## Responses as RDF statements Do we treat data expressing how a fixed question was answered about a given subject as essentially an RDF statement? The triple { given subject, question, answer } seems very triple-like. The rest of these issues assume, Yes, we're treating the answering of a question as an RDF statement. If we said No, then things would be simpler at this point, but more conversion steps would probably be needed later on during data integration and processing. This discarded option might look like:

:resp6321
    cred:question "Is the language of the headline extremely negative, extremely positive, or somewhere in the middle?";
    cred:answer "Neither negative nor positive".
    cred:observerProfile ;
    dc:date "2018-12-17T15:44:24-05:00Z"^^xsd:dateTimeStamp

## Encoding as Predicates and Objects Assuming we use an RDF or RDF-like encoding, with predicates identified by IRIs, we still have a few big design decisions. ### Conveying the list of allowed answers Do we need to record what the possible answers were for multiple choice questions, or can we just express what the answer was? For example, if user 1 is asked "Which is your favorite flavor ice cream: vanilla, chocolate, or strawberry" and user 2 is asked "Which is your favorite flavor ice cream: vanilla, chocolate, strawberry, or mint?", and they both answer "vanilla", will the data for both users be the same? Conveying the allowed answers is more scientifically accurate, but it makes things seem rather more complicated. In shorthand below, we refer to conveying the list of allowed answers as "**all**" or "**a**" (*all* the possible answers are encoded), and the opposite as "**one**" or "**o**" (only *one* of the possible answers is encoded). ### Division between Predicate and Object While { given subject, question, answer } looks simple, questions are not exactly predicates, so perhaps the response should map to an RDF statement in a different way. Some options are: Option **tf**: true/false. The question and answer text are both encoded in the predicate, and the object is xsd:true when this answer was given to this question. Option **ag**: agreement level. The predicate encodes a propositional statement and the object encodes a level of agreement the user has with that statement. This works nicely for some questions, and may fit into other uncertainty-reasoning systems. There is a question whether the answers are merely ordinal (ordered values) or can be calibrated as [interval or ratio](https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/) readings. Option **mc**: multiple choice. Treat the question as a predicate and the answer as the value. The triple's object would be a named node connected to definition text, and also a sortable literal for ordinal answers and a numeric answer for interval/ratio answers. ### Experimental Representation Combining these three options with the two from the previous issue, we get six variants, which we give 3-letter codes: | | a: all possible answers encoded | o: only the actually-given answer encoded | |---------------------|---------------------------------|-------------------------------------------| | tf: true/false | tfa | tfo | | ag: agreement | aga | ago | | mc: multiple choice | mca | mco | Then, for experimentation, we can have multiple versions of each predicate. Perhaps like this: for the signal "Clickbaity" the true/false+all-answers predicate would be :tfaClickbaity; others would be :tfoClickbaity, etc. Or we could make it part of the namespace. The tf and mc versions can be machine generated, while an ag one requires human configuration Also the data needed for the "a" versions requires human configuration, because the instance data might not have all the possible answers and doesn't say the order in which they were presented. Human configuration is also needed to map to signal names (and thus nice predicate names). My guess right now is aga and mco will be best, depending on the question. ## Representing metadata How do we represent metadata about the triple, like who answered it and when? On this question, there are a set of answers that use a "named graph" approach, and a set that use a "reification" approach. ### Named Graph designs For names graphs, we can do something like this:

:resp6321
   cred:observerProfile ?observerProfileURL;
   dc:date ?timeStamp.
:resp6321 { ?item ?signal ?reading }

Or we can separate the graph label from the response, like this

:resp6321
   cred:observerProfile ?observerProfileURL;
   dc:date ?timeStamp;
   cred:inGraph _:g6321.    
_:g6321 { ?item ?signal ?reading }

If there are multiple responses with the same metadata (unlikely if the timestamp includes time-of-day), then the readings can be merged into one graph:

:gr1
   cred:observerProfile ?observerProfileURL;
   dc:date ?timeStamp;
:gr1 { ?item ?signal1 ?reading1; ?signal2 ?reading2 }

### Reification designs Some options for reification follow. Custom vocab:

:resp6321
   cred:observerProfile ?observerProfileURL;
   dc:date ?timeStamp;
   cred:item ?item;
   cred:signal ?signal;
   cred:reading ?reading.

Using RDF reification vocab directly:

:resp6321
   cred:observerProfile ?observerProfileURL;
   dc:date ?timeStamp;
   rdf:subject ?item;
   rdf:predicate ?signal;
   rdf:object ?reading.

Using RDF reification vocab indirectly:

:resp6321
   cred:observerProfile ?observerProfileURL;
   dc:date ?timeStamp;
   cred:statement [
     rdf:subject ?item;
     rdf:predicate ?signal;
     rdf:object ?reading ].

Or doing it wikidata-style: (What is this called? Also they rely on namespace swapping instead of providing a metadataAccessVia link.)

?signal
   :metadataAccessVia ?signalMETA
?item
   ?signalMETA :resp6321
:resp6321
   ?signal ?reading.
   cred:observerProfile ?observerProfileURL;
   dc:date ?timeStamp;