Library Docstrings¶
The Anacode Toolkit library consists of the two modules anacode.api and anacode.agg. anacode.api simplifies the use of the API, whereas anacode.agg provides functionality for further analysis, aggregation and visualization of the results.
anacode.api¶
Writers¶
-
class anacode.api.writers.Writer¶ Base “abstract” class containing common methods that are needed by all implementations of the Writer interface.
The writer interface consists of init, close and write_bulk methods.
-
close()¶ Not implemented here! Each subclass should decide what to do here.
-
init()¶ Not implemented here! Each subclass should decide what to do here.
-
write_absa(analyzed, single_document=False)¶ Converts absa analysis result to flat lists and stores them.
Parameters: - analyzed (list) – JSON absa analysis result
- single_document (bool) – Whether the analysis describes just one document
-
write_analysis(analyzed)¶ Inspects analysis result for performed analysis and delegates persisting of results to appropriate write methods.
Parameters: analyzed (dict) – JSON object analysis response
-
write_bulk(results)¶ Stores multiple Anacode API JSON responses, each paired with its call ID as a tuple (call_id, call_result). Both scrape and analyze call IDs are defined in the anacode.codes module.
Parameters: results (list) – List of anacode responses with IDs of calls used
-
write_categories(analyzed, single_document=False)¶ Converts categories analysis result to flat lists and stores them.
Parameters: - analyzed (list) – JSON categories analysis result
- single_document (bool) – Whether the analysis describes just one document
-
write_concepts(analyzed, single_document=False)¶ Converts concepts analysis result to flat lists and stores them.
Parameters: - analyzed (list) – JSON concepts analysis result
- single_document (bool) – Whether the analysis describes just one document
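The writer interface described above (init, close and write_bulk) can be illustrated with a small in-memory stand-in. This is only a sketch: InMemoryWriter is a hypothetical name, it is duck-typed rather than derived from anacode.api.writers.Writer so that it runs without the library installed, and the call IDs and result dictionaries are placeholders, not real values from anacode.codes.

```python
class InMemoryWriter:
    """Hypothetical duck-typed writer; not part of the anacode package."""

    def __init__(self):
        self.rows = []
        self.is_open = False

    def init(self):
        # Prepare storage before any results arrive.
        self.rows.clear()
        self.is_open = True

    def write_bulk(self, results):
        # results is a list of (call_id, call_result) tuples.
        if not self.is_open:
            raise RuntimeError("call init() before write_bulk()")
        for call_id, call_result in results:
            self.rows.append((call_id, call_result))

    def close(self):
        # Nothing to release for an in-memory store; just mark it closed.
        self.is_open = False


writer = InMemoryWriter()
writer.init()
# Placeholder call IDs and payloads, used only for illustration.
writer.write_bulk([(0, {"placeholder": 1}), (1, {"placeholder": 2})])
writer.close()
```

In real use one would presumably subclass anacode.api.writers.Writer instead, so that the write_analysis, write_absa, write_categories and write_concepts helpers are inherited.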
-
-
class anacode.api.writers.CSVWriter(target_dir='.')¶
-
class anacode.api.writers.DataFrameWriter(frames=None)¶ Writes Anacode API output into pandas.DataFrame instances.
Querying¶
-
class anacode.api.client.AnacodeClient(auth, base_url='https://api.anacode.de/')¶ Makes posting data to server for analysis simpler by storing user’s auth, the URL of the Anacode API server and paths for analysis calls.
To find out more about specific API calls and analyses and their output format, please refer to https://api.anacode.de/api-docs/calls.html.
-
__init__(auth, base_url='https://api.anacode.de/')¶ Default value for base_url is taken from the environment variable ANACODE_API_URL if set; otherwise, ‘https://api.anacode.de/’ is used.
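The fallback rule for base_url can be sketched with the standard library alone. The helper name resolve_base_url and its environ parameter are hypothetical and exist only for illustration; how the client reads the variable internally is not shown on this page.

```python
import os

DEFAULT_URL = "https://api.anacode.de/"


def resolve_base_url(environ=None):
    """Hypothetical helper: prefer ANACODE_API_URL, else the public default."""
    # Accept a mapping for easy testing; fall back to the process environment.
    environ = os.environ if environ is None else environ
    return environ.get("ANACODE_API_URL", DEFAULT_URL)
```

Passing an explicit mapping, for example resolve_base_url({}), makes the rule easy to check without touching the real environment.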
-
analyze(texts, analyses, external_entity_data=None, single_document=False)¶ Use Anacode API to perform specified linguistic analysis on texts. Please consult https://api.anacode.de/api-docs/calls.html for more details and a better understanding of the parameters.
Parameters: - texts – List of texts to analyze
- analyses – List of analyses to perform. Can contain ‘categories’, ‘concepts’, ‘sentiment’ and ‘absa’
- external_entity_data – Provide additional entities to relate to sentiment evaluation.
- single_document (bool) – Makes API treat texts as paragraphs of one document instead of treating them as separate documents
Returns: dict –
-
-
class anacode.api.client.Analyzer(client, writer, threads=1, bulk_size=100)¶ This class simplifies querying with multiple threads and storing results in formats other than a list of JSON objects.
-
__init__(client, writer, threads=1, bulk_size=100)¶ Parameters: - client (anacode.api.client.AnacodeClient) – Will be used to post analysis to anacode api
- writer (anacode.api.writers.Writer) – Needs to implement init, close and write_bulk methods from Writer interface
- threads (int) – Number of concurrent threads to use, defaults to 1
- bulk_size (int) – How often should writer’s write_bulk method be invoked, defaults to 100
-
analyze(texts, analyses, external_entity_data=None, single_document=False)¶ Dummy clone for anacode.api.client.AnacodeClient.analyze()
-
analyze_bulk()¶ Performs bulk analysis. Will use multiprocessing.dummy.Pool to post data to anacode api if number of threads is more than one. Analysis results are not returned, but cached internally.
-
flush_analysis_data()¶ Writes all cached analysis results using writer.
-
scrape(link)¶ Dummy clone for anacode.api.client.AnacodeClient.scrape()
-
should_start_analysis()¶ Checks how many tasks are in queue and returns boolean indicating whether analysis should be performed.
Returns: bool – True if analysis should happen now, False otherwise
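The interplay of queuing, analyze_bulk, the internal cache and flush_analysis_data might look roughly like the sketch below. It is not the library's implementation: MiniBulkAnalyzer, ListWriter, fake_analyze and the submit method are hypothetical names, the network call is replaced by a local function, and the rule that analysis starts once bulk_size tasks are queued is an assumption. Only multiprocessing.dummy.Pool and the method names analyze_bulk, flush_analysis_data, should_start_analysis and write_bulk come from the text above.

```python
from multiprocessing.dummy import Pool  # thread-based pool from the standard library


class MiniBulkAnalyzer:
    """Hypothetical illustration of queue, bulk analysis and flush."""

    def __init__(self, analyze_fn, writer, threads=1, bulk_size=100):
        self.analyze_fn = analyze_fn  # stand-in for the client call
        self.writer = writer          # any object with write_bulk(results)
        self.threads = threads
        self.bulk_size = bulk_size
        self.queue = []               # tasks waiting to be analyzed
        self.cache = []               # results not yet written

    def should_start_analysis(self):
        # Assumption: start once bulk_size tasks are waiting.
        return len(self.queue) >= self.bulk_size

    def submit(self, task):
        self.queue.append(task)
        if self.should_start_analysis():
            self.analyze_bulk()

    def analyze_bulk(self):
        tasks, self.queue = self.queue, []
        if self.threads > 1:
            # Use a thread pool only when more than one thread is requested.
            with Pool(self.threads) as pool:
                results = pool.map(self.analyze_fn, tasks)
        else:
            results = [self.analyze_fn(task) for task in tasks]
        # Results are cached internally rather than returned.
        self.cache.extend(results)

    def flush_analysis_data(self):
        if self.cache:
            self.writer.write_bulk(self.cache)
            self.cache = []


class ListWriter:
    """Hypothetical writer that keeps everything in a list."""

    def __init__(self):
        self.written = []

    def write_bulk(self, results):
        self.written.extend(results)


def fake_analyze(text):
    # Placeholder for a network call; returns a (call_id, call_result) tuple.
    return (0, {"length": len(text)})


sink = ListWriter()
bulk = MiniBulkAnalyzer(fake_analyze, sink, threads=2, bulk_size=3)
for text in ["a", "bb", "ccc", "dddd"]:
    bulk.submit(text)
bulk.analyze_bulk()          # process the task left in the queue
bulk.flush_analysis_data()   # hand all cached results to the writer
```

Keeping results in a cache until an explicit flush means the writer is called with batches rather than once per text, which is the behaviour the bulk_size parameter appears to control.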
-
-
anacode.api.client.analyzer(auth, writer, threads=1, bulk_size=100, base_url='https://api.anacode.de/')¶ Convenience function for initializing a bulk analyzer and, if needed, a temporary writer instance as well.
Parameters: - auth (str) – User’s token string
- threads (int) – Number of threads to use for https communication with server
- writer (str) – Writer instance that will store analysis results, a path to a folder where CSV files should be saved, or a dictionary where data frames should be stored
- bulk_size (int) –
- base_url (str) – Anacode API server URL
Returns: anacode.api.client.Analyzer – Bulk analyzer instance
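Because the writer argument accepts three kinds of values, a type-based dispatch is one plausible way to read it. The sketch below only reports which writer each value would map to; describe_writer_choice is a hypothetical helper, and whether analyzer() performs exactly this dispatch is an assumption. The keyword names target_dir and frames are taken from the CSVWriter and DataFrameWriter signatures listed on this page.

```python
def describe_writer_choice(writer):
    """Hypothetical helper: report which writer the argument would map to."""
    if isinstance(writer, str):
        # A string is read as a folder path for CSV output.
        return ("CSVWriter", {"target_dir": writer})
    if isinstance(writer, dict):
        # A dict is read as the container for data frames.
        return ("DataFrameWriter", {"frames": writer})
    # Anything else is assumed to be a ready-made Writer instance.
    return ("as given", {"writer": writer})
```

For example, describe_writer_choice("results") maps to the CSV case, while passing an empty dictionary maps to the data frame case.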