Prof. Meng - Emrecan Tarakci

Latest revision as of 21:27, 2 May 2015

Introduction

After discussing and agreeing on the project with Professor Weiyi Meng, I (Emrecan Tarakci) started working on the project known as Publication Analysis on Google Scholar. The project fulfills the requirements of the Senior Project I & II courses. The program focuses on extracting records from Google Scholar: author names, paper title, year of publication, publication venue, and citation count. After extracting and storing these records, the program computes the number of self-citations and non-self-citations, the i10-index, the H-index, and the number of publications per year for each academic. Because the program will first be used for the Watson faculty members, it also computes figures across the whole faculty: the total citation count, the total non-self-citation count, the average citation count, the average non-self-citation count, the total i10-index, the total i10-index based on non-self-citations, the average i10-index, the average i10-index based on non-self-citations, the average H-index, the average H-index based on non-self-citations, and the ratio of total non-self-citations to total citations.

Technical Details

The program was developed in Visual Studio 2013 for Desktop, using C# as the programming language. Microsoft SQL Server is used for database management.

Project Requirements

The goal of this project is to extract and analyze the publications and citations of university faculty members based on their Google Scholar pages.

Stage 1: Extract basic publication and citation information for a given faculty member.
Input: The URL of the faculty member's Google Scholar profile page.
Output: Every individual publication of the faculty member. For each publication, extract its title, authors, publication venue (conference name for conference publications; journal name and volume/issue numbers for journal publications), page numbers, publication year, citation count, and citation link. Individual author names should be separated. The citation link is the URL of the (first) page that lists the publications citing the publication under consideration. The extracted records are exported to an XML file and an Excel file.
Requirement: Minimize the number of query submissions/downloads from the Google Scholar site.
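The record structure and XML export described in Stage 1 can be sketched as follows. This is only an illustration in Python, not the project's C# code. The `Publication` class, the XML element names, and the assumption that author names arrive as one comma-separated string are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Publication:
    """One extracted record (field names are assumptions for this sketch)."""
    title: str
    authors: List[str]
    venue: str
    year: int
    citation_count: int
    citation_link: str


def split_authors(raw: str) -> List[str]:
    """Separate individual author names, assuming a comma-separated string."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def to_xml(publications: List[Publication]) -> str:
    """Serialize the records to an XML string (element names are hypothetical)."""
    root = ET.Element("publications")
    for pub in publications:
        node = ET.SubElement(root, "publication")
        ET.SubElement(node, "title").text = pub.title
        authors = ET.SubElement(node, "authors")
        for name in pub.authors:
            ET.SubElement(authors, "author").text = name
        ET.SubElement(node, "venue").text = pub.venue
        ET.SubElement(node, "year").text = str(pub.year)
        ET.SubElement(node, "citationCount").text = str(pub.citation_count)
        ET.SubElement(node, "citationLink").text = pub.citation_link
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    pub = Publication(
        title="Example Paper",
        authors=split_authors("A Smith, B Jones"),
        venue="Example Conference",
        year=2014,
        citation_count=12,
        citation_link="https://example.org/cites/1",  # placeholder URL
    )
    print(to_xml([pub]))
```

Keeping each author as a separate element makes the shared-author check needed in Stage 2 straightforward.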

Stage 2: Extract information about all publications that cite a given publication P and determine whether each citation is a self-citation.
Input: A given publication P and the URL L of the (first) page that lists the publications citing P.
Output: The number of self-citations and non-self-citations for P among the publications that cite P. A publication p1 that cites P is a self-citation if p1 and P share at least one author.
Requirement: Minimize the number of query submissions/downloads from the Google Scholar site.
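The self-citation rule above (p1 and P share at least one author) can be sketched as a set intersection. This is an illustrative Python sketch, not the project's C# implementation. The function names are hypothetical, and comparing names after lower-casing and whitespace cleanup is an assumption; real author names may need more careful matching.

```python
from typing import Iterable, List, Tuple


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace (an assumed, simplistic matching rule)."""
    return " ".join(name.lower().split())


def is_self_citation(cited_authors: Iterable[str], citing_authors: Iterable[str]) -> bool:
    """True if the citing publication shares at least one author with P."""
    cited = {normalize(a) for a in cited_authors}
    citing = {normalize(a) for a in citing_authors}
    return bool(cited & citing)


def count_citations(
    cited_authors: List[str], citing_publications: List[List[str]]
) -> Tuple[int, int]:
    """Return (self_citations, non_self_citations) for publication P.

    citing_publications holds the author list of each publication citing P.
    """
    self_count = sum(
        1 for authors in citing_publications
        if is_self_citation(cited_authors, authors)
    )
    return self_count, len(citing_publications) - self_count


if __name__ == "__main__":
    p_authors = ["W Meng", "C Yu"]
    citing = [["C Yu", "X Li"], ["Y Chen"], ["w  meng"]]
    print(count_citations(p_authors, citing))
```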

Stage 3: Combine the Stage 1 and Stage 2 programs to find the non-self-citation count for every publication of a given faculty member on the Google Scholar site.
Input: The URL of the faculty member's Google Scholar profile page.
Output: The same as for Stage 1, except that the non-self-citation count for each publication is added to the result.

Stage 4: Compute the i10-index (the number of publications that have at least 10 citations) and the H-index (the largest number h such that there are h papers each having at least h citations), based on both total citations and non-self-citations. Also compute the ratio of non-self-citations to total citations.
Input: The output of Stage 3.
Output: The i10-index based on total citations, the i10-index based on non-self-citations, the H-index based on total citations, the H-index based on non-self-citations, and the ratio of non-self-citations to total citations.
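The two index definitions given in Stage 4 can be sketched directly from per-publication citation counts. This is an illustrative Python sketch, not the project's C# code. The function names and sample counts are made up, and returning 0.0 for the ratio when there are no citations is an assumption.

```python
from typing import List


def i10_index(counts: List[int]) -> int:
    """Number of publications with at least 10 citations."""
    return sum(1 for c in counts if c >= 10)


def h_index(counts: List[int]) -> int:
    """Largest h such that h publications each have at least h citations."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def non_self_ratio(total_counts: List[int], non_self_counts: List[int]) -> float:
    """Ratio of non-self-citations to total citations (0.0 if there are none)."""
    total = sum(total_counts)
    return sum(non_self_counts) / total if total else 0.0


if __name__ == "__main__":
    total = [25, 12, 10, 4, 3, 0]     # made-up citation counts per publication
    non_self = [20, 9, 8, 4, 1, 0]    # made-up non-self-citation counts
    print(i10_index(total), i10_index(non_self))  # 3 1
    print(h_index(total), h_index(non_self))      # 4 4
    print(round(non_self_ratio(total, non_self), 3))
```

Running the same two functions once on the total counts and once on the non-self counts yields all four indices required by this stage.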

Stage 5: Divide the publication records of a given faculty member by year.
Input: The output of Stage 1.
Output: The input divided by year, with the publications for more recent years listed first.
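The grouping in Stage 5 can be sketched as follows. This is an illustrative Python sketch, not the project's C# code, and it assumes each record has been reduced to a (title, year) pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_year(publications: List[Tuple[str, int]]) -> List[Tuple[int, List[str]]]:
    """Group (title, year) records by year, most recent year first."""
    by_year: Dict[int, List[str]] = defaultdict(list)
    for title, year in publications:
        by_year[year].append(title)
    return [(year, by_year[year]) for year in sorted(by_year, reverse=True)]


if __name__ == "__main__":
    records = [("Paper A", 2012), ("Paper B", 2014), ("Paper C", 2012)]
    for year, titles in group_by_year(records):
        print(year, len(titles), titles)
```

The length of each group gives the number of publications per year mentioned in the introduction.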

Stage 6: For a list of Google Scholar faculty profiles, compute the total citation count, the total non-self-citation count, the average citation count, the average non-self-citation count, the total i10-index, the total i10-index based on non-self-citations, the average i10-index, the average i10-index based on non-self-citations, the average H-index, the average H-index based on non-self-citations, and the ratio of total non-self-citations to total citations.
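The faculty-wide figures listed in Stage 6 can be sketched as sums and means over per-profile results. This is an illustrative Python sketch, not the project's C# code. The dictionary keys and sample numbers are hypothetical, and the averages are assumed to be taken per faculty profile.

```python
from typing import Dict, List


def summarize(profiles: List[Dict[str, int]]) -> Dict[str, float]:
    """Aggregate per-profile metrics across a list of faculty profiles."""
    n = len(profiles)
    if n == 0:
        raise ValueError("at least one profile is required")

    def total(key: str) -> int:
        return sum(p[key] for p in profiles)

    citations = total("citations")
    non_self = total("non_self_citations")
    return {
        "total_citations": citations,
        "total_non_self_citations": non_self,
        "avg_citations": citations / n,
        "avg_non_self_citations": non_self / n,
        "total_i10": total("i10"),
        "total_i10_non_self": total("i10_non_self"),
        "avg_i10": total("i10") / n,
        "avg_i10_non_self": total("i10_non_self") / n,
        "avg_h": total("h") / n,
        "avg_h_non_self": total("h_non_self") / n,
        # Ratio of total non-self-citations to total citations (0.0 if none).
        "non_self_ratio": non_self / citations if citations else 0.0,
    }


if __name__ == "__main__":
    faculty = [  # made-up per-profile results from the earlier stages
        {"citations": 100, "non_self_citations": 80, "i10": 5,
         "i10_non_self": 4, "h": 6, "h_non_self": 5},
        {"citations": 50, "non_self_citations": 40, "i10": 3,
         "i10_non_self": 2, "h": 4, "h_non_self": 3},
    ]
    for name, value in summarize(faculty).items():
        print(name, value)
```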

Weekly Progress

Since I completed the first two phases during the first semester, I started the spring semester with the third phase.

Week 1 & 2 & 3 & 4

Working on Phase 3

The major difficulty was that sending multiple requests to Google's server resulted in being banned by Google (essentially a ban on the local IP address).
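One way to reduce the number of requests, in line with the "minimize the number of query submissions/downloads" requirement, is to keep every downloaded page and reuse it. The following is a minimal Python sketch of that idea and not what the project actually did. `PageCache` and the injected `download` function are hypothetical names, and no real network call is made here.

```python
from typing import Callable, Dict


class PageCache:
    """Remember downloaded pages so each URL is requested at most once."""

    def __init__(self, download: Callable[[str], str]) -> None:
        self._download = download          # function that actually fetches a URL
        self._pages: Dict[str, str] = {}   # URL -> page content
        self.download_count = 0            # number of real downloads performed

    def get(self, url: str) -> str:
        """Return the page for url, downloading it only on first use."""
        if url not in self._pages:
            self._pages[url] = self._download(url)
            self.download_count += 1
        return self._pages[url]


if __name__ == "__main__":
    def fake_download(url: str) -> str:
        # Stand-in for a real HTTP request (assumption for this sketch).
        return "<html>page for " + url + "</html>"

    cache = PageCache(fake_download)
    cache.get("https://example.org/profile")
    cache.get("https://example.org/profile")  # served from the cache
    print(cache.download_count)
```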

Week 5 & 6 & 7

Working on Phase 4

Week 8 & 9

Working on Phase 5

Week 10 & 11 & 12

Working on Phase 6

Charts

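First Semester

[Figure: Screen_Shot_2014-12-16_at_11.35.59_PM1.png]

Second Semester

[Figure: Screen_Shot_2014-12-17_at_12.37.20_AM.png]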