Table Annotation
Semantic Annotation Toolkit for Tabular Data
2.0

Authentication prerequisite

Access to this API is secured by the OAuth 2.0 framework with the Client Credentials grant type, which means that you must present an OAuth 2.0 access_token with every request to this API.

It's easy to negotiate this access_token: just send a request to the proper token negotiation endpoint, with a Basic Authentication header built from your own client_id and client_secret.

For this API, the token negotiation endpoint is: https://api.orange.com/oauth/v3/token

A technical guide is available to learn how to negotiate and manage these access tokens.
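
As a rough illustration of the negotiation step, the sketch below builds the token request with Python's standard library. The `grant_type=client_credentials` form field follows the OAuth 2.0 Client Credentials grant; the helper name `build_token_request` and the placeholder credentials are assumptions made for illustration, and the response format should be checked against the technical guide.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.orange.com/oauth/v3/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) the token negotiation request."""
    # Basic Authentication value: base64("client_id:client_secret")
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    # Client Credentials grant: form-encoded request body
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Usage with hypothetical credentials (sending requires network access):
# with urllib.request.urlopen(build_token_request("my_id", "my_secret")) as resp:
#     print(resp.read().decode())
```

Building the request separately from sending it keeps the credential handling easy to inspect before any network call is made.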

Base URL

If a request to this API returns a 404 NOT FOUND HTTP error, first check that the Base URL is correct.

The Base URL for this API is:

https://api.orange.com/table_annotation/v2/

The documentation below assumes that, whenever you make requests to this API, you prepend the Base URL to the resource paths defined for this API.
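
One way to avoid 404 errors caused by a missing or doubled slash is to join the Base URL and the resource path in a single helper. This is a minimal sketch; the function name `resource_url` is hypothetical.

```python
BASE_URL = "https://api.orange.com/table_annotation/v2/"


def resource_url(path: str) -> str:
    """Prepend the Base URL to a resource path such as '/status'."""
    # Strip slashes on both sides so exactly one separates the two parts
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")


print(resource_url("/status"))
```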

GENERALITIES ABOUT RESOURCES

Requests sequence

[Figure: request sequence diagram. In the diagram, "beta" is replaced by "v2".]

Request rate limit

  • The API accepts at most 10 requests per second across all applications. If the rate limit is exceeded, the request is rejected with an error; wait a short delay (on the order of a second) before retrying.

  • An application can make up to 600 requests per minute.
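
The retry advice above can be sketched as a small wrapper. The source does not say which HTTP status signals a rate-limit rejection, so `429` is an assumption here, as are the names `call_with_retry` and `send` (any callable that performs one request and returns a status code and a body).

```python
import time
from typing import Callable, Tuple

# Assumption: the documentation does not state the status code used for
# rate-limit rejections; 429 (Too Many Requests) is a common choice.
RATE_LIMIT_STATUS = 429


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry after a short delay while it is rate limited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != RATE_LIMIT_STATUS:
            return status, body
        if attempt < max_attempts:
            # Wait about a second before the next attempt
            sleep(delay_seconds)
    # Still rate limited after the last attempt: return the last response
    return status, body
```

Passing `sleep` as a parameter makes the wrapper easy to exercise without real delays.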

RESOURCES

0. Status Checking

 curl https://api.orange.com/table_annotation/v2/status \
    -H 'Authorization: Bearer XXXX' 

 {
   "Annotation": "OK", 
   "Lookup": "OK", 
   "Preprocessing": "OK"
 }
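
A client might check this response before submitting work. The sketch below lists the services whose state is not "OK"; the function name `unavailable_services` is hypothetical, and it assumes that any value other than "OK" means the service is unavailable.

```python
import json
from typing import List


def unavailable_services(status_body: str) -> List[str]:
    """Return the names of services whose reported state is not "OK"."""
    status = json.loads(status_body)
    return sorted(name for name, state in status.items() if state != "OK")


body = '{"Annotation": "OK", "Lookup": "OK", "Preprocessing": "OK"}'
print(unavailable_services(body))
```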

1. Table Preprocessing

A preprocessing system for tabular data. It reads tables from files and extracts table metadata: orientation detection, header detection, key column detection and primitive typing of columns.

POST /preprocessing

Submits tables to be preprocessed. Two content types are supported: JSON-encoded data and file upload.

  • JSON data

Following the DAGOBAH Standard API Input/Output Format, JSON-encoded content has the form:

 curl -X POST https://api.orange.com/table_annotation/v2/preprocessing \
    -H 'Authorization: Bearer XXXX' \
    -H "Content-Type: application/json" \
    -d '{
      "data": [
        { "tableDataRaw": [["l1c1","l1c2", ...],["l2c1","l2c2", ...], ...],
          "requestInfo": {"id": 1235, "title": "table 1"} },
        { "tableDataRaw": [["l1c1","l1c2", ...],["l2c1","l2c2", ...], ...],
          "requestInfo": {"id": "abc", "title": "table 2"} },
        ...
      ]
    }'

where the field "tableDataRaw" is your raw table, represented as a 2D array, and the field "requestInfo" identifies the associated table. This makes sense when you send multiple tables simultaneously: you will need "requestInfo" to tell the preprocessed tables apart. You are free to add any subfields you want to requestInfo to better describe your tables.
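
As an illustration of this body format, the following sketch serializes a list of tables into the "data" array. The helper name `build_payload` and the default of numbering tables from 1 are assumptions made for the example, not part of the API.

```python
import json
from typing import Any, Dict, List, Optional

Table = List[List[str]]


def build_payload(tables: List[Table],
                  infos: Optional[List[Dict[str, Any]]] = None) -> str:
    """Serialize tables into the {"data": [...]} request body."""
    if infos is None:
        # Illustrative default: a 1-based id per table
        infos = [{"id": i} for i in range(1, len(tables) + 1)]
    if len(infos) != len(tables):
        raise ValueError("one requestInfo per table is required")
    data = [
        {"tableDataRaw": table, "requestInfo": info}
        for table, info in zip(tables, infos)
    ]
    return json.dumps({"data": data})


print(build_payload([[["Title", "Year"], ["Pulp Fiction", "1994"]]]))
```

Generating the body with a JSON serializer rather than by hand avoids quoting mistakes when cell values contain quotes or non-ASCII characters.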

Example with a table:

curl -k -X POST https://api.orange.com/table_annotation/v2/preprocessing \
  -H 'Authorization: Bearer XXXX' \
  -H "Content-Type: application/json" \
  -d '{"data": [{"requestInfo": {"id": 1}, "tableDataRaw": [["Title","Year","Cast","col3"], 
                                                    ["Pulp Fiction","1994","John Travolta","Gangster"], 
                                                    ["Casino Royale","1967","David Niven","James Bond"], 
                                                    ["Outsiders","1983","Matt Dillon","Drama"], 
                                                    ["Hearts of Darkness: A Filmmakers Apocalypse","1991","Marlon Brando","Documentary"], 
                                                    ["Virgin Suicides","1999","Kristen Dunst","Drama"]]}]}' 

  • Return: a task is created and executed by API workers, independently of the calling application. An ID is assigned to the task so that you can refer to it in later requests.

HTTP/1.1 202 ACCEPTED
{ "task_id": "7e5c1c3b-2666-4bef-a995-f5a30e0a737f" }

See the Swagger specification in the API Reference for more details on the responses.

  • File

Attention: the file size is currently limited to 4 MB. Supported formats: .csv, .txt, .tsv, .xlsx.

The API is able to automatically detect 1 table per {.csv, .txt, .tsv} file and multiple tables per sheet per .xlsx file.
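
A client can check these limits locally before uploading. This is a minimal sketch: the name `check_upload` is hypothetical, "4MB" is read as 4 MiB, and the spreadsheet extension is assumed to be .xlsx.

```python
import os

# Assumption: "4MB" is interpreted as 4 MiB (4 * 1024 * 1024 bytes)
MAX_FILE_BYTES = 4 * 1024 * 1024
# Assumption: the spreadsheet format is the standard .xlsx extension
SUPPORTED_EXTENSIONS = {".csv", ".txt", ".tsv", ".xlsx"}


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file would be rejected by the stated limits."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes")


# Usage: check_upload(path, os.path.getsize(path)) before sending the file
check_upload("movies.csv", 1024)
```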

curl -k -X POST https://api.orange.com/table_annotation/v2/preprocessing \
  -H 'Authorization: Bearer XXXX' \
  -F file=@file

GET /preprocessing/<task_id>

To check the status of the submitted preprocessing task.

curl -k -X GET  -H 'Authorization: Bearer XXXX' \
  https://api.orange.com/table_annotation/v2/preprocessing/<task_id>
  • Return: status of the task.

Since our worker backend is based on Celery, possible states include:

  • STARTED: the task has been started.
  • PENDING: the task is waiting for execution or is running.
  • IN PROGRESS: the task is being executed.
  • FAILURE: the task raised an exception or has exceeded the retry limit.
  • SUCCESS: the task executed successfully.
curl -k -X GET  -H 'Authorization: Bearer XXXX' \
    https://api.orange.com/table_annotation/v2/preprocessing/7e5c1c3b-2666-4bef-a995-f5a30e0a737f

HTTP/1.1 200 OK
{ "task_status": "SUCCESS" }

See the Swagger specification in the API Reference for more details on the responses.
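
Because tasks run asynchronously, a client typically polls this endpoint until the task reaches a final state. The sketch below treats SUCCESS and FAILURE as final, based on the state list above. The name `wait_for_task`, the polling interval and the poll limit are illustrative assumptions; `get_status` stands for any callable that performs the GET request and returns the "task_status" value.

```python
import time
from typing import Callable

# States after which the task will not change any more
TERMINAL_STATES = {"SUCCESS", "FAILURE"}


def wait_for_task(
    get_status: Callable[[], str],
    poll_seconds: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the task succeeds or fails."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # PENDING, STARTED or IN PROGRESS: wait before asking again
        sleep(poll_seconds)
    raise TimeoutError(f"task still not finished after {max_polls} polls")
```

A two-second interval stays well within the request rate limit while keeping the wait short for small tables.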

GET /preprocessing/<task_id>/result

To get the preprocessing results. Result format:

  • Light result (default): /result or /result?format=light returns only the preprocessing results (header, orientation, key column, ...) and requestInfo.
curl -k -X GET -H 'Authorization: Bearer XXXX' \
  'https://api.orange.com/table_annotation/v2/preprocessing/<task_id>/result?format=light'
  • Full result: /result?format=full also returns the raw table and the restructured table, which can be bulky and are often not needed.
curl -k -X GET -H 'Authorization: Bearer XXXX' \
  'https://api.orange.com/table_annotation/v2/preprocessing/<task_id>/result?format=full'
  • Return
curl  -k -X GET  -H 'Authorization: Bearer XXXX' \
    https://api.orange.com/table_annotation/v2/preprocessing/7e5c1c3b-2666-4bef-a995-f5a30e0a737f/result

HTTP/1.1 200 OK 
[
  {
    "preprocessed": {
      "headerInfo": {
        "hasHeader": true,
        "headerLabel": [
          "Title",
          "Year",
          "Cast",
          "col3"
        ],
        "headerPosition": 0,
        "headerScore": 0.22
      },
      "primaryKeyInfo": {
        "hasPrimaryKey": true,
        "primaryKeyPosition": 0,
        "primaryKeyScore": 0.34
      },
      "primitiveTyping": [
        {
          "columnIndex": 0,
          "typing": [
            {
              "typingLabel": "UNKNOWN",
              "typingScore": 0.6
            },
            {
              "typingLabel": "WORK_OF_ART",
              "typingScore": 0.2
            },
            ...
          ]
        },
        ...
      ],
      "tableOrientation": {
        "orientationLabel": "HORIZONTAL",
        "orientationScore": 0.89
      }
    },
    "raw": {
      "tableContent": null,
      "tableEndOffset": null,
      "tableNum": null,
      "tableOffset": null
    },
    "requestInfo": {
      "id": 1
    }
  }
]

See the Swagger specification in the API Reference for more details on the responses.
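
To show how a client might consume this result, the sketch below reduces each table to its header, orientation, key column and the highest-scoring primitive type per column. It relies only on the fields visible in the sample response; the function name `summarize_preprocessing` and the shape of the summary are assumptions for illustration.

```python
from typing import Any, Dict, List


def summarize_preprocessing(result: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Condense a preprocessing result into one small dict per table."""
    summaries = []
    for table in result:
        pre = table["preprocessed"]
        header = pre["headerInfo"]
        key = pre["primaryKeyInfo"]
        # Keep only the highest-scoring primitive type of each column
        best_types = {}
        for column in pre.get("primitiveTyping", []):
            candidates = column.get("typing", [])
            if candidates:
                top = max(candidates, key=lambda c: c["typingScore"])
                best_types[column["columnIndex"]] = top["typingLabel"]
        summaries.append({
            "id": table.get("requestInfo", {}).get("id"),
            "header": header["headerLabel"] if header["hasHeader"] else None,
            "orientation": pre["tableOrientation"]["orientationLabel"],
            "key_column": key["primaryKeyPosition"] if key["hasPrimaryKey"] else None,
            "types": best_types,
        })
    return summaries
```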

DELETE /preprocessing/<task_id>

Deletes your resources (input file and result) once the task is done and you have retrieved the result.

curl -k -X DELETE -H 'Authorization: Bearer XXXX' \
  https://api.orange.com/table_annotation/v2/preprocessing/<task_id>
  • Return
HTTP/1.1 204 No Content

See the Swagger specification in the API Reference for more details on the responses.

2. Table Annotation

A semantic annotation system for tabular data. Its goal is to automatically understand a table by matching its elements with the concepts and relations of a Knowledge Graph such as Wikidata, DBpedia or an enterprise KG.

Reference: DAGOBAH: Table and Graph Contexts For Efficient Semantic Annotation Of Tabular Data. SemTab@ISWC 2021

POST /annotation

Parameters:

  • KG: name of the knowledge graph used to annotate the table: wikidata (default) or dbpedia (not available yet).

  • efficientMode (default: False): if True, the annotation is limited to 500 entity candidates per table cell.

Two content types are supported: JSON-encoded data and file upload.

  • JSON data

Following the DAGOBAH Standard API Input/Output Format, JSON-encoded content has the form:

 curl -X POST https://api.orange.com/table_annotation/v2/annotation \
    -H 'Authorization: Bearer XXXX' \
    -H "Content-Type: application/json" \
    -d '{
      "data": [
        { "tableDataRaw": [["l1c1","l1c2", ...],["l2c1","l2c2", ...], ...],
          "requestInfo": {"id": 1235, "title": "table 1"} },
        { "tableDataRaw": [["l1c1","l1c2", ...],["l2c1","l2c2", ...], ...],
          "requestInfo": {"id": "abc", "title": "table 2"} },
        ...
      ]
    }'

where the field "tableDataRaw" is your raw table, represented as a 2D array, and the field "requestInfo" identifies the associated table. This makes sense when you send multiple tables simultaneously: you will need "requestInfo" to tell the annotated tables apart. You are free to add any subfields you want to requestInfo to better describe your tables.

E.g.

curl -k -X POST https://api.orange.com/table_annotation/v2/annotation \
  -H 'Authorization: Bearer XXXX' \
  -H "Content-Type: application/json" \
  -d '{"data": [{"requestInfo": {"id": 1}, "tableDataRaw": [["Title","Year","Cast","col3"], 
                                                            ["Pulp Fiction","1994","John Travolta","Gangster"], 
                                                            ["Casino Royale","1967","David Niven","James Bond"], 
                                                            ["Outsiders","1983","Matt Dillon","Drama"], 
                                                            ["Hearts of Darkness: A Filmmakers Apocalypse","1991","Marlon Brando","Documentary"], 
                                                            ["Virgin Suicides","1999","Kristen Dunst","Drama"]]}]}'  

{ "task_id": "7aef5b18-f8ce-4281-8754-82b70c2ad2ea" }
  • File

Attention: the file size is currently limited to 4 MB. Supported formats: .csv, .txt, .tsv, .xlsx.

The API is able to automatically detect 1 table per {.csv, .txt, .tsv} file and multiple tables per sheet per .xlsx file.

curl -k -X POST https://api.orange.com/table_annotation/v2/annotation \
  -H 'Authorization: Bearer XXXX' \
  -F file=@file

See the Swagger specification in the API Reference for more details on the responses.

GET /annotation/<task_id>

To check the status of the submitted annotation task and also its progress (via task_current_progress response field).

curl -k -X GET  -H 'Authorization: Bearer XXXX' \
  https://api.orange.com/table_annotation/v2/annotation/<task_id>

E.g.

curl -k -X GET  -H 'Authorization: Bearer XXXX' https://api.orange.com/table_annotation/v2/annotation/7aef5b18-f8ce-4281-8754-82b70c2ad2ea

{
"task_current_progress": "Preprocessing step.", 
"task_status": "IN PROGRESS"
}

...

curl -k -X GET  -H 'Authorization: Bearer XXXX' https://api.orange.com/table_annotation/v2/annotation/7aef5b18-f8ce-4281-8754-82b70c2ad2ea

{
"task_current_progress": "Entity Scoring step: finished 2/6 table rows.", 
"task_status": "IN PROGRESS"
}

...

curl -k -X GET  -H 'Authorization: Bearer XXXX' https://api.orange.com/table_annotation/v2/annotation/7aef5b18-f8ce-4281-8754-82b70c2ad2ea

{"task_status": "SUCCESS"}

See the Swagger specification in the API Reference for more details on the responses.

GET /annotation/<task_id>/result

Attention: the annotation process can take several minutes, depending on the size of the input file.

  • Light result (default): /result or /result?format=light returns only the annotation results (CEA, CTA, CPA) and execution times.
curl -k -X GET -H 'Authorization: Bearer XXXX' \
  'https://api.orange.com/table_annotation/v2/annotation/<task_id>/result?format=light'
  • Full result: /result?format=full also returns the raw table and the restructured table, which can be bulky and are often not needed.
curl -k -X GET -H 'Authorization: Bearer XXXX' \
  'https://api.orange.com/table_annotation/v2/annotation/<task_id>/result?format=full'
  • Return

curl  -k -X GET  -H 'Authorization: Bearer XXXX' \
    https://api.orange.com/table_annotation/v2/annotation/7aef5b18-f8ce-4281-8754-82b70c2ad2ea/result

HTTP/1.1 200 OK
[
  {
    "annotated": {
      "CEA": [
        {
          "annotation": {
            "label": "Pulp Fiction",
            "score": 0.93,
            "uri": "https://www.wikidata.org/wiki/Q104123"
          },
          "column": 0,
          "row": 1
        },
        ...
      ],
      "CPA": [
        {
          "annotation": {
            "coverage": 1.0,
            "label": "cast member",
            "score": 0.95,
            "uri": "https://www.wikidata.org/wiki/Property:P161"
          },
          "headColumn": 0,
          "tailColumn": 2
        },
        ...
      ],
      "CTA": [
        {
          "annotation": [
            {
              "coverage": 1.0,
              "label": "film",
              "score": 0.95,
              "uri": "https://www.wikidata.org/wiki/Q11424"
            }
          ],
          "column": 0
        },
        ...
      ]
    },
    "raw": {
      "tableContent": null,
      "tableEndOffset": null,
      "tableNum": null,
      "tableOffset": null
    },
    "requestInfo": {
      "id": 1
    }
  }
]

See the Swagger specification in the API Reference for more details on the responses.
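
As an example of post-processing this result, the sketch below maps each annotated cell to its Wikidata identifier, optionally dropping low-scoring annotations. It assumes the CEA structure shown in the sample response; the names `wikidata_id` and `cell_entities` and the `min_score` threshold are hypothetical.

```python
from typing import Any, Dict, Tuple


def wikidata_id(uri: str) -> str:
    """Extract the identifier from a URI such as .../wiki/Q104123 or .../wiki/Property:P161."""
    last_segment = uri.rstrip("/").rsplit("/", 1)[-1]
    # Property pages carry a "Property:" prefix before the identifier
    return last_segment.rsplit(":", 1)[-1]


def cell_entities(table_result: Dict[str, Any],
                  min_score: float = 0.0) -> Dict[Tuple[int, int], str]:
    """Map (row, column) to the entity identifier of each CEA annotation."""
    cells = {}
    for item in table_result["annotated"]["CEA"]:
        annotation = item["annotation"]
        if annotation["score"] >= min_score:
            cells[(item["row"], item["column"])] = wikidata_id(annotation["uri"])
    return cells


print(wikidata_id("https://www.wikidata.org/wiki/Property:P161"))
```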

DELETE /annotation/<task_id>

curl -k -X DELETE -H 'Authorization: Bearer XXXX' \
  https://api.orange.com/table_annotation/v2/annotation/<task_id>

See the Swagger specification in the API Reference for more details on the responses.

3. Entity Lookup

A lookup service that finds relevant entity candidates for an input label in a reference Knowledge Graph.

The Lookup API also has a synchronous mode, which lets you get the lookup results for your input labels in a single request. This mode is preferable when the input is small (few labels to search).

POST /lookup

Submits a single label or a list of labels to search for candidate entities in the KG. Only JSON-encoded content is supported.

Parameters:

  • KG: name of the knowledge graph: wikidata (default) or dbpedia.

  • score: whether to return a ratio score for each entity in the output (False by default).

  • requestMode: lookup request execution mode: sync (default, preferable for small inputs) or async.
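
These parameters are passed in the query string. A minimal sketch of building the lookup URL follows; the function name `lookup_url` is hypothetical, and the lowercase `true`/`false` encoding of `score` is inferred from the request examples in this section.

```python
from urllib.parse import urlencode

LOOKUP_URL = "https://api.orange.com/table_annotation/v2/lookup"


def lookup_url(kg: str = "wikidata", score: bool = False,
               request_mode: str = "sync") -> str:
    """Build the POST /lookup URL with its query parameters."""
    if request_mode not in ("sync", "async"):
        raise ValueError("request_mode must be 'sync' or 'async'")
    params = {
        "KG": kg,
        "score": str(score).lower(),  # True -> "true", False -> "false"
        "requestMode": request_mode,
    }
    return f"{LOOKUP_URL}?{urlencode(params)}"


print(lookup_url(score=True, request_mode="async"))
```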

Synchronous mode:

curl -k -X POST 'https://api.orange.com/table_annotation/v2/lookup?score=true&KG=wikidata' \
  -H 'Authorization: Bearer XXXX' \
  -H 'Content-Type: application/json' \
    -d '{"data":"milo aukerman"}'

[{"entities":[{"entity":"Q1935670","label":"Milo Aukerman","score":0.9487179487179488},
 {"entity":"Q22133501","label":"Aukerman","score":0.8095238095238094},
 {"entity":"Q20995644","label":"Aukerman","score":0.8095238095238094}],
 "label":"milo aukerman"}]

Asynchronous mode:

curl -k -X POST 'https://api.orange.com/table_annotation/v2/lookup?score=true&KG=wikidata&requestMode=async' \
  -H 'Authorization: Bearer XXXX' \
  -H 'Content-Type: application/json' \
    -d '{"data":"milo aukerman"}'

{"task_id":"fbcac42e-74cc-4c15-be08-1efbc8d6f15d"}

See the Swagger specification in the API Reference for more details on the responses.

GET /lookup/<task_id>

Checks the task status (as in the Preprocessing part). See the Swagger specification in the API Reference for more details on the responses.

curl -k -X GET -H 'Authorization: Bearer XXXX' \
  https://api.orange.com/table_annotation/v2/lookup/<task_id>

GET /lookup/<task_id>/result

curl -k -X GET -H 'Authorization: Bearer XXXX' \
  https://api.orange.com/table_annotation/v2/lookup/<task_id>/result

E.g.

curl -k -X GET -H 'Authorization: Bearer XXXX' \
  https://api.orange.com/table_annotation/v2/lookup/fbcac42e-74cc-4c15-be08-1efbc8d6f15d/result

[{"entities":[{"entity":"Q1935670","label":"Milo Aukerman","score":0.9487179487179488},{"entity":"Q22133501","label":"Aukerman","score":0.8095238095238094},{"entity":"Q20995644","label":"Aukerman","score":0.8095238095238094}],"label":"milo aukerman"}]

See the Swagger specification in the API Reference for more details on the responses.
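
A common next step is to keep the best candidate for each label. The sketch below does this for a lookup result shaped like the example above, assuming the request was made with score=true; the function name `best_candidates` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def best_candidates(lookup_result: List[Dict[str, Any]]) -> Dict[str, Optional[str]]:
    """Map each input label to its highest-scoring entity, or None if there is none."""
    best = {}
    for item in lookup_result:
        entities = item.get("entities", [])
        if entities:
            # max() returns the first entity when several share the top score
            top = max(entities, key=lambda e: e.get("score", 0.0))
            best[item["label"]] = top["entity"]
        else:
            best[item["label"]] = None
    return best
```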

DELETE /lookup/<task_id>

Deletes your resources (input and result) once the task is done and you have retrieved the result.

curl -k -X DELETE -H 'Authorization: Bearer XXXX' \
  https://api.orange.com/table_annotation/v2/lookup/<task_id>

HTTP/1.1 204 No Content

See the Swagger specification in the API Reference for more details on the responses.