Library Docstrings¶

The Anacode Toolkit library consists of the two modules anacode.api and anacode.agg. anacode.api simplifies the use of the API, whereas anacode.agg provides functionality for further analysis, aggregation and visualization of the results.

anacode.api
- Writers
- Querying
anacode.agg
- Dataset loader
- API Datasets
anacode.agg.plotting

anacode.api ¶

Writers ¶

class anacode.api.writers.Writer¶

Base “abstract” class containing the common methods needed by all implementations of the Writer interface.

The writer interface consists of init, close and write_bulk methods.

close()¶: Not implemented here! Each subclass should decide what to do here.

init()¶: Not implemented here! Each subclass should decide what to do here.

write_absa(analyzed, single_document=False)¶

Converts absa analysis result to flat lists and stores them.

Parameters:
- analyzed (list) – JSON absa analysis result
- single_document (bool) – Whether the analysis describes a single document

write_analysis(analyzed)¶

Inspects analysis result for performed analysis and delegates persisting of results to appropriate write methods.

Parameters:	analyzed (dict) – JSON object analysis response

write_bulk(results)¶

Stores multiple Anacode API JSON responses, each paired with its call ID as a (call_id, call_result) tuple. Both the scrape and analyze call IDs are defined in the anacode.codes module.

Parameters:	results (list) – List of anacode responses with IDs of calls used
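The writer interface described above (init, close and write_bulk) can be illustrated with a small stand-in. This is only a sketch: `InMemoryWriter` is a hypothetical class that does not exist in the toolkit, and the call IDs 0 and 1 are placeholders rather than values from anacode.codes.

```python
class InMemoryWriter:
    """Hypothetical stand-in that follows the init/close/write_bulk interface."""

    def __init__(self):
        self.rows = []
        self.is_open = False

    def init(self):
        # Prepare storage before any results arrive.
        self.rows = []
        self.is_open = True

    def write_bulk(self, results):
        # Each item is assumed to be a (call_id, call_result) tuple.
        for call_id, call_result in results:
            self.rows.append((call_id, call_result))

    def close(self):
        # Nothing to release for an in-memory list; just mark as closed.
        self.is_open = False


writer = InMemoryWriter()
writer.init()
writer.write_bulk([(0, {"text": "scraped"}), (1, [{"label": "auto"}])])
writer.close()
```

Any object with these three methods should satisfy the interface as it is described here; the real subclasses additionally inherit the write_* helpers from Writer.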

write_categories(analyzed, single_document=False)¶

Converts categories analysis result to flat lists and stores them.

Parameters:
- analyzed (list) – JSON categories analysis result
- single_document (bool) – Whether the analysis describes a single document
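To make "converts to flat lists" more concrete, the sketch below flattens a nested result into rows. The input shape (a list of documents, each a list of dicts with `label` and `probability` keys), the function name `flatten_categories` and the handling of single_document are all assumptions made for illustration; consult the API documentation for the actual response format.

```python
def flatten_categories(analyzed, single_document=False):
    """Flatten a nested result into (doc_id, label, probability) rows.

    The input shape is assumed for illustration, not taken from the API docs.
    """
    rows = []
    for index, document in enumerate(analyzed):
        # Assumption: with single_document=True every entry belongs to document 0.
        doc_id = 0 if single_document else index
        for category in document:
            rows.append((doc_id, category["label"], category["probability"]))
    return rows


sample = [
    [{"label": "auto", "probability": 0.9}],
    [{"label": "travel", "probability": 0.6}],
]
rows = flatten_categories(sample)
```

Flat rows of this kind map directly onto CSV lines or data frame records, which is presumably why the writers work with them.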

write_concepts(analyzed, single_document=False)¶

Converts concepts analysis result to flat lists and stores them.

Parameters:
- analyzed (list) – JSON concepts analysis result
- single_document (bool) – Whether the analysis describes a single document

write_row(call_type, call_result)¶

Decides what kind of data it got and calls appropriate write method.

Parameters:
- call_type (int) – Library’s ID of anacode call
- call_result (list) – JSON response from Anacode API
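Dispatching on the call type could look roughly like the following. The constants `SCRAPE` and `ANALYZE`, the class `DispatchingWriter` and its `write_scrape` method are hypothetical names chosen for this sketch; the real call IDs live in anacode.codes.

```python
# Hypothetical call codes; the real ones are defined in anacode.codes.
SCRAPE, ANALYZE = 0, 1


class DispatchingWriter:
    """Illustrative writer that routes results by call type."""

    def __init__(self):
        self.log = []

    def write_scrape(self, call_result):
        self.log.append(("scrape", call_result))

    def write_analysis(self, call_result):
        self.log.append(("analysis", call_result))

    def write_row(self, call_type, call_result):
        # Look up the handler for this call type and delegate to it.
        handlers = {SCRAPE: self.write_scrape, ANALYZE: self.write_analysis}
        try:
            handler = handlers[call_type]
        except KeyError:
            raise ValueError("unknown call type: {!r}".format(call_type))
        handler(call_result)
```

A lookup table keeps the routing in one place, so supporting another call type only means adding one entry.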

write_sentiment(analyzed, single_document=False)¶

Converts sentiment analysis result to flat lists and stores them.

Parameters:
- analyzed (list) – JSON sentiment analysis result
- single_document (bool) – Whether the analysis describes a single document

class anacode.api.writers.CSVWriter(target_dir='.')¶

__init__(target_dir='.')¶

Initializes the Writer to store Anacode API analysis results as CSV files in target_dir.

Parameters:	target_dir (str) – Path to the directory where the CSV files are stored

class anacode.api.writers.DataFrameWriter(frames=None)¶

Writes Anacode API output into pandas.DataFrame instances.

__init__(frames=None)¶

Initializes the dictionary of result frames. If a frames dict is given, it is used for storage instead.

Parameters:	frames (dict) – Optional existing dict to use for storage instead of a new one

Querying ¶

class anacode.api.client.AnacodeClient(auth, base_url='https://api.anacode.de/')¶

Makes posting data to server for analysis simpler by storing user’s auth, the URL of the Anacode API server and paths for analysis calls.

To find out more about specific API calls and analyses and their output format, please refer to https://api.anacode.de/api-docs/calls.html.

__init__(auth, base_url='https://api.anacode.de/')¶

The default value for base_url is taken from the environment variable ANACODE_API_URL if it is set; otherwise, ‘https://api.anacode.de/’ is used.

Parameters:
- auth (str) – User’s token
- base_url (str) – Anacode API server URL
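The environment-variable fallback for base_url can be pictured as below. This is a sketch of the described behaviour, not the library's code: `resolve_base_url` and `DEFAULT_URL` are names invented here, and the `environ` parameter exists only to make the example easy to try.

```python
import os

DEFAULT_URL = "https://api.anacode.de/"


def resolve_base_url(environ=None):
    """Return ANACODE_API_URL if it is set, else the public default."""
    environ = os.environ if environ is None else environ
    return environ.get("ANACODE_API_URL", DEFAULT_URL)


# Example with explicit mappings instead of the real process environment.
default_url = resolve_base_url({})
custom_url = resolve_base_url({"ANACODE_API_URL": "http://localhost:8000/"})
```

In practice this means a self-hosted or staging server can be selected by exporting ANACODE_API_URL, without touching the calling code.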

analyze(texts, analyses, external_entity_data=None, single_document=False)¶

Use the Anacode API to perform the specified linguistic analyses on texts. Please consult https://api.anacode.de/api-docs/calls.html for more details on the parameters.

Parameters:
- texts – List of texts to analyze
- analyses – List of analyses to perform. Can contain ‘categories’, ‘concepts’, ‘sentiment’ and ‘absa’
- external_entity_data – Additional entities to relate to sentiment evaluation
- single_document (bool) – Makes the API treat texts as paragraphs of one document instead of as separate documents
Returns:	dict

call(task)¶

Given a tuple of an Anacode API analysis code and the arguments for that analysis, calls the appropriate method out of scrape, categories, concepts, sentiment or absa and returns its result.

Parameters:	task (tuple) – Task definition tuple - (analysis code, analysis args)
Returns:	dict

scrape(link)¶

Use the Anacode API’s scrape call to scrape a page from a web URL and return the result.

Parameters:	link (str) – URL that should be scraped
Returns:	dict

class anacode.api.client.Analyzer(client, writer, threads=1, bulk_size=100)¶

This class simplifies querying with multiple threads and storing results in formats other than a list of JSON objects.

__init__(client, writer, threads=1, bulk_size=100)¶

Parameters:
- client (`anacode.api.client.AnacodeClient`) – Will be used to post analysis to the Anacode API
- writer (`anacode.api.writers.Writer`) – Needs to implement the init, close and write_bulk methods from the Writer interface
- threads (int) – Number of concurrent threads to use, defaults to 1
- bulk_size (int) – How often the writer’s write_bulk method should be invoked, defaults to 100

analyze(texts, analyses, external_entity_data=None, single_document=False)¶: Dummy clone for anacode.api.client.AnacodeClient.analyze()

analyze_bulk()¶

Performs bulk analysis. Uses multiprocessing.dummy.Pool to post data to the Anacode API if the number of threads is greater than one.

Analysis results are not returned, but cached internally.
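The threading approach mentioned above can be sketched with multiprocessing.dummy.Pool, which offers the multiprocessing Pool API on top of threads. `fake_call` and `run_tasks` are hypothetical helpers: the first stands in for a network request so the example runs offline, and the task code 1 is a placeholder.

```python
from multiprocessing.dummy import Pool  # thread-based Pool with the same API


def fake_call(task):
    """Stand-in for a network call; returns (code, result)."""
    code, args = task
    return code, {"echo": args}


def run_tasks(tasks, threads=1):
    """Run tasks in a thread pool when more than one thread is requested."""
    if threads > 1:
        pool = Pool(threads)
        try:
            # map() returns results in the same order as the input tasks.
            return pool.map(fake_call, tasks)
        finally:
            pool.close()
            pool.join()
    return [fake_call(task) for task in tasks]


tasks = [(1, "first text"), (1, "second text")]
results = run_tasks(tasks, threads=2)
```

Threads suit this kind of workload because the time is spent waiting on HTTP responses rather than on computation.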

flush_analysis_data()¶: Writes all cached analysis results using writer.

scrape(link)¶: Dummy clone for anacode.api.client.AnacodeClient.scrape()

should_start_analysis()¶

Checks how many tasks are in queue and returns boolean indicating whether analysis should be performed.

Returns:	bool – True if analysis should happen now, False otherwise
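One plausible reading of this check is a simple threshold on the number of queued tasks. The sketch below assumes that analysis starts once at least bulk_size tasks are waiting; `BatchQueue`, `add` and `drain` are hypothetical names, and the exact condition used by the library may differ.

```python
class BatchQueue:
    """Hypothetical queue that signals when bulk_size tasks have accumulated."""

    def __init__(self, bulk_size=100):
        self.bulk_size = bulk_size
        self.tasks = []

    def add(self, task):
        self.tasks.append(task)

    def should_start_analysis(self):
        # Assumption: start once the queue holds at least bulk_size tasks.
        return len(self.tasks) >= self.bulk_size

    def drain(self):
        # Hand back the queued tasks and start over with an empty queue.
        batch, self.tasks = self.tasks, []
        return batch
```

A larger bulk_size means fewer, bigger writes; a smaller one keeps less data in memory between writes.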

anacode.api.client.analyzer(auth, writer, threads=1, bulk_size=100, base_url='https://api.anacode.de/')¶

Convenience function for initializing a bulk analyzer and, if needed, a temporary writer instance as well.

Parameters:
- auth (str) – User’s token string
- threads (int) – Number of threads to use for HTTPS communication with the server
- writer (Writer, str or dict) – Writer instance that will store analysis results, or path to the folder where CSV files should be saved, or dictionary where data frames should be stored
- bulk_size (int)
- base_url (str) – Anacode API server URL
Returns:	`anacode.api.client.Analyzer` – Bulk analyzer instance
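Since the writer argument accepts three kinds of value, the selection logic might resemble the sketch below. `describe_writer` is a hypothetical helper that only reports which branch would apply; how the library actually constructs its CSV or data frame writer is not shown here.

```python
def describe_writer(writer):
    """Classify the writer argument; the returned labels are illustrative only."""
    if isinstance(writer, str):
        # A string is treated as a directory path for CSV output.
        return ("csv", writer)
    if isinstance(writer, dict):
        # A dict is treated as storage for data frames.
        return ("dataframe", writer)
    required = ("init", "close", "write_bulk")
    if all(callable(getattr(writer, name, None)) for name in required):
        # Anything exposing the writer interface is used as-is.
        return ("instance", writer)
    raise TypeError("unsupported writer: {!r}".format(writer))
```

For example, `describe_writer("results/")` selects the CSV branch and `describe_writer({})` the data frame branch.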
