Socilyzer

Extract Data for Social Network Analysis

Relational data can be found in many data sets but it is often tedious to extract. To help you find the networks hidden in your data sets, we've written a script for the Excel-plugin DataNitro that can extract information about nodes and the relationships between them.

Getting Started

To get started:

  1. download and install DataNitro (Windows only) and install the included Python package
  2. download the script
  3. download the Excel-file example

The following is a demonstration of how the script is used to extract a list of nodes and a list of edges from a data set with incorporated relational data. Followed by a description of how to adapt the script to your own code.

Example: Running the Script

This script will construct a traditional network (an one-mode network) where all node can be connected directly to one another. See the last paragraph for scripts for two-mode networks.

Once you've opened the Excel-file, navigate to the DataNitro pane in the top and click 'Import script'. Load the Python script (sna_extract.py) you downloaded earlier and click 'Run'.

Social network data set up in Excel

A prompt will open to show which step the script is currently executing:

Running DataNitro script to extract social network data

As the prompt closes you'll be shown the 'Controls' sheet. This sheet gives you instructions on how to proceed and also allows you to adjust thresholds for the ratings. Ratings are used to show the strength of a relationship. The column on the far right titled 'Rating Counts' shows how many of each rating can currently be found in the list of relationships. This last column is updated as the thresholds are changed.

Adjust thresholds to inspect social network on different levels

Example: Using the Data

To use the extracted data export the 'Attributes' and 'Edgelist' sheets as Excel files.

Save edge data as an edgelist

These Excel files are imported as shown in the Data Entry guide

Once the two files have been imported into Socilyzer they can be visualized like this (here colored by cluster):

Social network graph showing number of client engagements

Modifying the Script to Your Data

The script is easy to adapt to your own data. Simply open the script in a text editor (e.g. Nodepad) or a code editor and change the variables listed near the top:

Configuration of DataNitro script for social network data

rater_nodes should equal any cell in the column where start nodes are listed, i.e., the origin of a relationship.
rated_nodes similar to the variable above, rated_nodes should be any cell in the column listing end nodes, i.e., the destionation of a relationship.
ignore_first_line should be set to True if headlines are used in the data sheet.

By default the script only extracts the name of the nodes but it can easily be changed to group (rater) nodes. This is done by setting "include_groups" to "True" and setting "groups" to a cell in the column with group names. This produces the following output in the Controls sheet:

Identifying groups to be used for the organizational network analysis

SQL Scripts

Relational data can also be extracted directly from a database using SQL. The example queries below use this publicly available dataset.

Extract edges:

SELECT "CUSTOMER-" || CustomerID as "Customer", "EMPLOYEE-" || EmployeeID as "Employee" FROM Orders;

Extract attributes:

SELECT "CUSTOMER-" || CustomerID as "id", ContactName || ", " || CustomerName as "name", country as "group" FROM Customers UNION SELECT "EMPLOYEE-" || EmployeeID, FirstName || " " || LastName, "Employee" FROM Employees;

Scripts for Two-Mode Networks

For data sets with two node sets (e.g., clients and sales people) you can construct a two-mode network where each node set is connected through the other node set. To help you extract such information you can use this DataNitro script.

A two-mode network can also be converted into an one-mode network where the number of shared intermediate nodes (second node set) determine the intensity of the relationship between the two nodes. As an example, two sales people can be strongly connected because they share many clients. This script helps you extract such information.

Contact

It's as easy as that. Should you run into any problems using the script with Socilyzer, feel free to reach out to us at hi@socilyzer.com