Library Docstrings

The Anacode Toolkit library consists of two modules, anacode.api and anacode.agg. anacode.api simplifies use of the Anacode API, whereas anacode.agg provides functionality for further analysis, aggregation and visualization of the results.

anacode.api

Writers

class anacode.api.writers.Writer

Base “abstract” class containing the common methods needed by all implementations of the Writer interface.

The writer interface consists of init, close and write_bulk methods.
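As a rough illustration of the interface described above, here is a minimal in-memory writer sketch. It is duck-typed rather than derived from anacode.api.writers.Writer so that it runs without the library installed. The class name ListWriter and its rows and is_open attributes are hypothetical and not part of the library.

```python
class ListWriter:
    """Hypothetical in-memory writer exposing init, close and write_bulk."""

    def __init__(self):
        self.rows = []
        self.is_open = False

    def init(self):
        # Prepare empty storage before any results arrive.
        self.rows = []
        self.is_open = True

    def write_bulk(self, results):
        # results is a list of (call_id, call_result) tuples.
        if not self.is_open:
            raise RuntimeError("call init() before write_bulk()")
        for call_id, call_result in results:
            self.rows.append((call_id, call_result))

    def close(self):
        # Nothing to release for an in-memory store; just mark it closed.
        self.is_open = False


writer = ListWriter()
writer.init()
writer.write_bulk([(1, {"example": "payload"})])  # 1 is a placeholder call ID
writer.close()
```

A real subclass would typically open files or connections in init and release them in close.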

close()

Not implemented here! Each subclass should decide what to do here.

init()

Not implemented here! Each subclass should decide what to do here.

write_absa(analyzed, single_document=False)

Converts absa analysis result to flat lists and stores them.

Parameters:
  • analyzed (list) – JSON absa analysis result
  • single_document (bool) – Whether the analysis describes a single document
write_analysis(analyzed)

Inspects the analysis result to find which analyses were performed and delegates persisting of the results to the appropriate write methods.

Parameters:analyzed (dict) – JSON object analysis response
write_bulk(results)

Stores multiple Anacode API JSON responses, each paired with its call ID as a tuple (call_id, call_result). Both scrape and analyze call IDs are defined in the anacode.codes module.

Parameters:results (list) – List of Anacode responses with the IDs of the calls used
write_categories(analyzed, single_document=False)

Converts categories analysis result to flat lists and stores them.

Parameters:
  • analyzed (list) – JSON categories analysis result
  • single_document (bool) – Whether the analysis describes a single document
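To make "converts to flat lists" more concrete, the following sketch flattens a nested per-document result into rows. The input shape (one list of label/score dicts per document), the key names and the function name flatten_categories are all invented for illustration; they are not the Anacode response format or the library's implementation.

```python
def flatten_categories(analyzed):
    """Turn a nested per-document result into flat (doc_id, label, score) rows.

    The input shape is assumed for illustration only: a list holding one
    list of {"label": ..., "score": ...} dicts per document.
    """
    rows = []
    for doc_id, doc_categories in enumerate(analyzed):
        for item in doc_categories:
            rows.append((doc_id, item["label"], item["score"]))
    return rows


sample = [
    [{"label": "auto", "score": 0.9}, {"label": "travel", "score": 0.1}],
    [{"label": "food", "score": 0.7}],
]
rows = flatten_categories(sample)
```

Each row carries the index of the document it came from, so the flat list can later be grouped back by document.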
write_concepts(analyzed, single_document=False)

Converts concepts analysis result to flat lists and stores them.

Parameters:
  • analyzed (list) – JSON concepts analysis result
  • single_document (bool) – Whether the analysis describes a single document
write_row(call_type, call_result)

Determines what kind of data it received and calls the appropriate write method.

Parameters:
  • call_type (int) – Library’s ID of the Anacode call
  • call_result (list) – JSON response from Anacode API
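One plausible way to picture the relationship between write_bulk and write_row is a lookup from call ID to write method. The sketch below uses a stand-in class; the integer constants are hypothetical placeholders (the real values live in anacode.codes), and the recording behaviour exists only so the example can be checked.

```python
# Hypothetical call IDs; the real values are defined in anacode.codes.
CATEGORIES, CONCEPTS, SENTIMENT, ABSA = range(4)


class RecordingWriter:
    """Stand-in writer that records which write method handled each result."""

    def __init__(self):
        self.calls = []

    def write_categories(self, analyzed, single_document=False):
        self.calls.append(("categories", analyzed))

    def write_concepts(self, analyzed, single_document=False):
        self.calls.append(("concepts", analyzed))

    def write_sentiment(self, analyzed, single_document=False):
        self.calls.append(("sentiment", analyzed))

    def write_absa(self, analyzed, single_document=False):
        self.calls.append(("absa", analyzed))

    def write_row(self, call_type, call_result):
        # Map each call ID to the method that knows how to store its result.
        handlers = {
            CATEGORIES: self.write_categories,
            CONCEPTS: self.write_concepts,
            SENTIMENT: self.write_sentiment,
            ABSA: self.write_absa,
        }
        if call_type not in handlers:
            raise ValueError("unknown call type: %r" % (call_type,))
        handlers[call_type](call_result)

    def write_bulk(self, results):
        # Each entry is a (call_id, call_result) tuple.
        for call_type, call_result in results:
            self.write_row(call_type, call_result)


recorder = RecordingWriter()
recorder.write_bulk([(SENTIMENT, [0.5]), (CONCEPTS, [])])
```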
write_sentiment(analyzed, single_document=False)

Converts sentiment analysis result to flat lists and stores them.

Parameters:
  • analyzed (list) – JSON sentiment analysis result
  • single_document (bool) – Whether the analysis describes a single document
class anacode.api.writers.CSVWriter(target_dir='.')
__init__(target_dir='.')

Initializes the writer to store Anacode API analysis results as CSV files in target_dir.

Parameters:target_dir (str) – Path to the directory in which to store the CSV files
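The idea of persisting flat rows as CSV files inside a target directory can be sketched with the standard csv module. The helper name write_rows_to_csv, the file name categories.csv and the column names are assumptions made for this example; the files CSVWriter actually produces are not specified here.

```python
import csv
import os
import tempfile


def write_rows_to_csv(target_dir, name, header, rows):
    """Write header and rows to <target_dir>/<name>.csv and return the path."""
    path = os.path.join(target_dir, name + ".csv")
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# A temporary directory stands in for target_dir.
with tempfile.TemporaryDirectory() as target_dir:
    path = write_rows_to_csv(
        target_dir,
        "categories",
        ["doc_id", "label", "score"],
        [(0, "auto", 0.9), (1, "food", 0.7)],
    )
    with open(path, newline="") as handle:
        written = list(csv.reader(handle))
```

Reading the file back yields strings only, since CSV carries no type information.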
class anacode.api.writers.DataFrameWriter(frames=None)

Writes Anacode API output into pandas.DataFrame instances.

__init__(frames=None)

Initializes the dictionary of result frames. If a frames dict is given, it is used for storage instead.

Parameters:frames (dict) – Optional existing dict to use for storage instead of a new one
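The optional frames argument follows a common pattern: reuse the caller's dictionary when one is given, otherwise create a fresh one. A minimal sketch of that pattern follows, with plain lists standing in for pandas.DataFrame objects so that it runs without pandas. FrameStore and its store method are hypothetical names.

```python
class FrameStore:
    """Hypothetical holder that keeps named result tables in a dict."""

    def __init__(self, frames=None):
        # Test for None explicitly: an empty dict passed by the caller is
        # falsy, and "frames or {}" would silently replace it.
        self.frames = {} if frames is None else frames

    def store(self, name, table):
        self.frames[name] = table


shared = {}
frame_store = FrameStore(frames=shared)
frame_store.store("categories", [(0, "auto", 0.9)])
```

Because the store writes into the dictionary it was given, the caller can read results from shared without asking the store for them.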

Querying

class anacode.api.client.AnacodeClient(auth, base_url='https://api.anacode.de/')

Simplifies posting data to the server for analysis by storing the user’s auth, the URL of the Anacode API server and the paths for analysis calls.

To find out more about specific API calls and analyses and their output format, please refer to https://api.anacode.de/api-docs/calls.html.

__init__(auth, base_url='https://api.anacode.de/')

The default value for base_url is taken from the environment variable ANACODE_API_URL if set; otherwise, ‘https://api.anacode.de/’ is used.

Parameters:
  • auth (str) – User’s token
  • base_url (str) – Anacode API server URL
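The default lookup described above can be sketched with os.environ. The function name resolve_base_url and its environ parameter are hypothetical; the parameter exists only so the behaviour can be checked without touching the real environment.

```python
import os

DEFAULT_API_URL = "https://api.anacode.de/"


def resolve_base_url(environ=None):
    """Return ANACODE_API_URL from the environment, or the default URL."""
    env = os.environ if environ is None else environ
    return env.get("ANACODE_API_URL", DEFAULT_API_URL)


default_url = resolve_base_url({})
custom_url = resolve_base_url({"ANACODE_API_URL": "http://localhost:8000/"})
```

An explicit base_url argument passed to the client would presumably take precedence over either value.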
analyze(texts, analyses, external_entity_data=None, single_document=False)

Uses the Anacode API to perform the specified linguistic analyses on texts. Please consult https://api.anacode.de/api-docs/calls.html for more details and a better understanding of the parameters.

Parameters:
  • texts – List of texts to analyze
  • analyses – List of analyses to perform. Can contain ‘categories’, ‘concepts’, ‘sentiment’ and ‘absa’
  • external_entity_data – Additional entities to relate to the sentiment evaluation
  • single_document (bool) – Makes the API treat texts as paragraphs of one document instead of as separate documents
Returns:

dict –
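Since analyses accepts only the four names listed above, a caller may want to check the list before sending a request. The helper below is a hypothetical sketch of such a check; whether the library itself validates the list on the client side is not stated here.

```python
ALLOWED_ANALYSES = ("categories", "concepts", "sentiment", "absa")


def check_analyses(analyses):
    """Return analyses as a list, or raise ValueError on unknown names."""
    unknown = [name for name in analyses if name not in ALLOWED_ANALYSES]
    if unknown:
        raise ValueError("unsupported analyses: %s" % ", ".join(unknown))
    return list(analyses)


selected = check_analyses(["sentiment", "absa"])
```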

call(task)

Given a tuple of an Anacode API analysis code and the arguments for that analysis, calls the appropriate method out of scrape, categories, concepts, sentiment or absa and returns its result.

Parameters:task (tuple) – Task definition tuple - (analysis code, analysis args)
Returns:dict –
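A task tuple of the form (analysis code, analysis args) can be unpacked and routed as in the sketch below. StubClient returns canned dictionaries instead of contacting the API, only two of the five methods are shown, and the SCRAPE and SENTIMENT constants are hypothetical stand-ins for the codes in anacode.codes.

```python
# Hypothetical analysis codes; the library defines its own in anacode.codes.
SCRAPE, SENTIMENT = 0, 1


class StubClient:
    """Stand-in client that returns canned dicts instead of calling the API."""

    def scrape(self, link):
        return {"call": "scrape", "link": link}

    def sentiment(self, texts):
        return {"call": "sentiment", "count": len(texts)}

    def call(self, task):
        # A task is (analysis code, tuple of positional arguments).
        code, args = task
        methods = {SCRAPE: self.scrape, SENTIMENT: self.sentiment}
        return methods[code](*args)


stub = StubClient()
scraped = stub.call((SCRAPE, ("http://example.com/",)))
scored = stub.call((SENTIMENT, (["good", "bad"],)))
```

Packing calls into tuples like this makes them easy to queue and hand to worker threads later.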
scrape(link)

Uses the Anacode API’s scrape call to scrape a page from a web URL and returns the result.

Parameters:link (str) – URL that should be scraped
Returns:dict –
class anacode.api.client.Analyzer(client, writer, threads=1, bulk_size=100)

This class simplifies querying with multiple threads and storing results in formats other than a list of JSON objects.

__init__(client, writer, threads=1, bulk_size=100)
Parameters:
  • client (anacode.api.client.AnacodeClient) – Used to post analyses to the Anacode API
  • writer (anacode.api.writers.Writer) – Needs to implement init, close and write_bulk methods from Writer interface
  • threads (int) – Number of concurrent threads to use, defaults to 1
  • bulk_size (int) – How often the writer’s write_bulk method should be invoked, defaults to 100
analyze(texts, analyses, external_entity_data=None, single_document=False)

Dummy clone for anacode.api.client.AnacodeClient.analyze()

analyze_bulk()

Performs bulk analysis. Uses multiprocessing.dummy.Pool to post data to the Anacode API if the number of threads is greater than one.

Analysis results are not returned, but cached internally.
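multiprocessing.dummy.Pool offers the multiprocessing Pool interface backed by threads, which suits I/O-bound work such as HTTP requests. The sketch below shows the general pattern of using it only when more than one thread is requested. fake_call stands in for a network request, and run_tasks is a hypothetical helper, not the library's analyze_bulk.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool


def fake_call(task):
    # Stand-in for a network request: echoes the task back with its code.
    code, text = task
    return (code, {"echo": text})


def run_tasks(tasks, threads=1):
    """Run fake_call over tasks, using a thread pool if threads > 1."""
    if threads > 1:
        with Pool(threads) as pool:
            # map blocks until all tasks finish and preserves input order.
            return pool.map(fake_call, tasks)
    return [fake_call(task) for task in tasks]


tasks = [(1, "first"), (1, "second"), (1, "third")]
threaded_results = run_tasks(tasks, threads=2)
serial_results = run_tasks(tasks, threads=1)
```

Because map preserves input order, results can be matched back to their tasks by position regardless of which thread finished first.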

flush_analysis_data()

Writes all cached analysis results using the writer.

scrape(link)

Dummy clone for anacode.api.client.AnacodeClient.scrape()

should_start_analysis()

Checks how many tasks are in the queue and returns a boolean indicating whether analysis should be performed.

Returns:bool – True if analysis should happen now, False otherwise
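A queue-length check of this kind might look like the sketch below. It assumes that analysis should start once the number of queued tasks reaches bulk_size; that threshold is a guess based on the bulk_size parameter, not something this reference confirms. TaskQueue is a hypothetical class.

```python
class TaskQueue:
    """Hypothetical task cache with a size-based trigger."""

    def __init__(self, bulk_size=100):
        self.bulk_size = bulk_size
        self.tasks = []

    def add(self, task):
        self.tasks.append(task)

    def should_start_analysis(self):
        # Assumed rule: start once bulk_size tasks have accumulated.
        return len(self.tasks) >= self.bulk_size


queue = TaskQueue(bulk_size=2)
queue.add((1, "first"))
ready_after_one = queue.should_start_analysis()
queue.add((1, "second"))
ready_after_two = queue.should_start_analysis()
```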
anacode.api.client.analyzer(auth, writer, threads=1, bulk_size=100, base_url='https://api.anacode.de/')

Convenience function for initializing a bulk analyzer and, potentially, a temporary writer instance as well.

Parameters:
  • auth (str) – User’s token string
  • threads (int) – Number of threads to use for HTTPS communication with the server
  • writer (str) – Writer instance that will store analysis results, or path to a folder where CSV files should be saved, or dictionary where data frames should be stored
  • bulk_size (int) –
  • base_url (str) – Anacode API server URL
Returns:

anacode.api.client.Analyzer – Bulk analyzer instance
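Because writer may be a writer instance, a folder path or a dictionary, the function has to decide what to build from the argument's type. The sketch below illustrates one way to do that. CsvTarget and FrameTarget are hypothetical stand-ins for CSVWriter and DataFrameWriter, and resolve_writer is not a library function.

```python
class CsvTarget:
    """Stand-in for a CSV-backed writer."""

    def __init__(self, target_dir="."):
        self.target_dir = target_dir


class FrameTarget:
    """Stand-in for a data-frame-backed writer."""

    def __init__(self, frames=None):
        self.frames = {} if frames is None else frames


def resolve_writer(writer):
    """Build a writer from a path or dict; pass anything else through."""
    if isinstance(writer, str):
        return CsvTarget(writer)
    if isinstance(writer, dict):
        return FrameTarget(writer)
    # Assume the caller already supplied a writer instance.
    return writer


from_path = resolve_writer("results")
frames = {}
from_dict = resolve_writer(frames)
existing = CsvTarget("out")
passed_through = resolve_writer(existing)
```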