Prof. Meng - Emrecan Tarakci

Latest revision as of 21:27, 2 May 2015

Introduction

After discussing and agreeing on the project with Professor Weiyi Meng, I, Emrecan Tarakci, started working on the project known as Publication Analysis on Google Scholar. The project fulfills the requirements of the Senior Project I & II courses. The program extracts records from Google Scholar, such as author names, paper title, year of publication, publication venue, and citation count. After extracting and storing these records, the program computes the number of self-citations and non-self-citations, the i10-index, the H-index, and the number of publications per year for each academic. Because the program is intended to serve the Watson faculty members first, it also computes across those faculty members the total citation count, the total non-self-citation count, the average citation count, the average non-self-citation count, the total i10-index, the total i10-index based on non-self-citations, the average i10-index, the average i10-index based on non-self-citations, the average H-index, the average H-index based on non-self-citations, and the ratio of total non-self-citations to total citations.

Technical Details

The program was developed in Visual Studio 2013 for Desktop using C# as the programming language. Microsoft SQL Server is used for database management.

Project Requirements

The goal of this project is to extract and analyze the publications and citations of university faculty based on their Google Scholar pages.

Stage 1: Extract basic publication and citation information for a given faculty member.
Input: The URL of the faculty member's Google Scholar profile page.
Outputs: Every individual publication of the faculty member. For each publication, extract its title, authors, publication venue (conference name for conference publications, journal name and volume/issue numbers for journal publications), page numbers, publication year, citation count, and citation link. Individual author names should be separated. The citation link is the URL of the (first) page that lists the publications citing the publication under consideration. The extracted records are exported to an XML file and an Excel file.
Requirement: Minimize the number of query submissions/downloads from the Google Scholar site.
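
As a rough illustration of the Stage 1 export step, the sketch below separates author names and writes publication records to XML. The original program was written in C#; Python is used here only for brevity. The element names, the dictionary keys, and the assumption that authors arrive as one comma-separated string are all hypothetical, not taken from the actual program.

```python
import xml.etree.ElementTree as ET


def split_authors(author_field):
    """Split a comma-separated author string into individual names (assumed input format)."""
    return [name.strip() for name in author_field.split(",") if name.strip()]


def publications_to_xml(publications):
    """Build an XML string from a list of publication dictionaries.

    Each dictionary is assumed to carry the keys used below; the key and
    element names are illustrative only.
    """
    root = ET.Element("publications")
    for pub in publications:
        node = ET.SubElement(root, "publication")
        ET.SubElement(node, "title").text = pub["title"]
        authors = ET.SubElement(node, "authors")
        for name in split_authors(pub["authors"]):
            ET.SubElement(authors, "author").text = name
        ET.SubElement(node, "venue").text = pub["venue"]
        ET.SubElement(node, "year").text = str(pub["year"])
        ET.SubElement(node, "citationCount").text = str(pub["citation_count"])
        ET.SubElement(node, "citationLink").text = pub["citation_link"]
    return ET.tostring(root, encoding="unicode")
```

Storing each author in its own element keeps the names separated, which the later self-citation check relies on.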

Stage 2: Extract information about all publications that cite a given publication P and determine whether each citation is a self-citation.
Input: A given publication P and the URL L of the (first) page that lists the publications citing P.
Output: The number of self-citations and non-self-citations for P among the publications that cite P. A publication p1 that cites P is a self-citation if p1 and P share at least one author.
Requirement: Minimize the number of query submissions/downloads from the Google Scholar site.
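
The self-citation rule above (p1 and P share at least one author) can be sketched as a set intersection. This is a minimal Python illustration, not the C# code of the actual program. Matching names by exact comparison after lowercasing and collapsing whitespace is an assumption; abbreviated or differently spelled names would need more careful handling.

```python
def normalize(name):
    """Lowercase a name and collapse repeated whitespace (assumed matching rule)."""
    return " ".join(name.lower().split())


def is_self_citation(cited_authors, citing_authors):
    """Return True if the citing publication shares at least one author with P."""
    cited = {normalize(n) for n in cited_authors}
    citing = {normalize(n) for n in citing_authors}
    return not cited.isdisjoint(citing)


def count_citations(cited_authors, citing_publications):
    """Return (self_citation_count, non_self_citation_count) for publication P.

    citing_publications is a list of author lists, one per citing publication.
    """
    self_count = sum(
        1 for authors in citing_publications
        if is_self_citation(cited_authors, authors)
    )
    return self_count, len(citing_publications) - self_count
```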

Stage 3: Combine the Stage 1 and Stage 2 programs to find the non-self-citation count for every publication of a given faculty member from the Google Scholar site.
Input: The URL of the faculty member's Google Scholar profile page.
Output: The same as for Stage 1, except that the non-self-citation count for each publication is added to the result.

Stage 4: Compute the i10-index (the number of publications that have at least 10 citations) and the H-index (the largest number h such that there are h papers each having at least h citations), based on both the total citations and the non-self-citations. Also compute the ratio of non-self-citations to total citations.
Input: The output of Stage 3.
Output: The i10-index based on total citations, the i10-index based on non-self-citations, the H-index based on total citations, the H-index based on non-self-citations, and the ratio of non-self-citations to total citations.
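
The two index definitions in Stage 4 translate directly into short functions. The Python sketch below is illustrative only (the project itself used C#). The function names are hypothetical, and returning 0.0 for the ratio when there are no citations is an assumption. Each function takes a list of per-publication counts, so the same code serves both the total-citation and the non-self-citation variants.

```python
def i10_index(citation_counts):
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citation_counts if count >= 10)


def h_index(citation_counts):
    """Largest h such that h publications each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def non_self_ratio(total_counts, non_self_counts):
    """Ratio of non-self-citations to total citations (0.0 if there are none)."""
    total = sum(total_counts)
    return sum(non_self_counts) / total if total else 0.0
```

For example, `h_index([10, 8, 5, 4, 3])` evaluates to 4, because four publications have at least 4 citations each but there are not five with at least 5.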

Stage 5: Divide the publication records of a given faculty member by year.
Input: The output of Stage 1.
Output: The input divided by year, with the publications for more recent years listed first.
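
A minimal Python sketch of the Stage 5 grouping follows, assuming each publication record is a dictionary with a "year" key (a hypothetical representation, not the program's actual data model).

```python
from collections import defaultdict


def group_by_year(publications):
    """Group publication records by year, most recent year first.

    Returns a list of (year, publications) pairs; within a year the
    original input order is preserved.
    """
    by_year = defaultdict(list)
    for pub in publications:
        by_year[pub["year"]].append(pub)
    return [(year, by_year[year]) for year in sorted(by_year, reverse=True)]
```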

Stage 6: For the list of Google Scholar faculty profiles, compute the total citation count, the total non-self-citation count, the average citation count, the average non-self-citation count, the total i10-index, the total i10-index based on non-self-citations, the average i10-index, the average i10-index based on non-self-citations, the average H-index, the average H-index based on non-self-citations, and the ratio of total non-self-citations to total citations.
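
The Stage 6 aggregates could be computed along the following lines. This Python sketch assumes that each faculty profile has already been reduced to a dictionary of per-faculty figures and that "average" means the average per faculty member. Both the key names and that interpretation are assumptions rather than details confirmed by the project description.

```python
def summarize_faculty(profiles):
    """Aggregate per-faculty figures into the Stage 6 totals, averages, and ratio.

    Each profile is assumed to have the keys: citations, non_self_citations,
    i10, i10_non_self, h, h_non_self.
    """
    n = len(profiles)
    if n == 0:
        raise ValueError("at least one faculty profile is required")

    def total(key):
        return sum(p[key] for p in profiles)

    total_citations = total("citations")
    total_non_self = total("non_self_citations")
    return {
        "total_citations": total_citations,
        "total_non_self_citations": total_non_self,
        "average_citations": total_citations / n,
        "average_non_self_citations": total_non_self / n,
        "total_i10": total("i10"),
        "total_i10_non_self": total("i10_non_self"),
        "average_i10": total("i10") / n,
        "average_i10_non_self": total("i10_non_self") / n,
        "average_h": total("h") / n,
        "average_h_non_self": total("h_non_self") / n,
        # Guard against division by zero when nobody has any citations.
        "non_self_ratio": total_non_self / total_citations if total_citations else 0.0,
    }
```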

Weekly Progress

Since I completed the first two phases during the first semester, I started the spring semester with the third phase.

Week 1 & 2 & 3 & 4

Working on Phase 3

The major difficulty was that sending many requests to Google's servers resulted in being banned by Google (essentially a ban on the local IP address).
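
One general way to reduce the number of requests is to cache every downloaded page and to space out the remaining downloads. The Python sketch below illustrates that idea only. It is not the approach the project actually used, the class name and the default interval are hypothetical, and nothing here guarantees that a ban is avoided. The page-download function is passed in as a parameter, so no specific HTTP library is assumed.

```python
import time


class ThrottledCache:
    """Cache fetched pages and enforce a minimum delay between real downloads."""

    def __init__(self, fetch, min_interval_seconds=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # function: url -> page content
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_request = None          # time of the most recent real download

    def get(self, url):
        # A cached page costs no request at all.
        if url in self._cache:
            return self._cache[url]
        # Otherwise wait until the minimum interval has elapsed.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        page = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = page
        return page
```

Injecting the clock and sleep functions keeps the class testable without real delays.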

Week 5 & 6 & 7

Working on Phase 4

Week 8 & 9

Working on Phase 5

Week 10 & 11 & 12

Working on Phase 6

Charts

First Semester

[[File:Screen_Shot_2014-12-16_at_11.35.59_PM1.png]]

Second Semester

[[File:Screen_Shot_2014-12-17_at_12.37.20_AM.png]]